Resource

Analysis of Normal Human Mammary Reveals Cell-Specific Active States and Associated Transcription Factor Networks

Graphical Abstract Authors Davide Pellacani, Misha Bilenky, Nagarajan Kannan, ..., Samuel Aparicio, Martin Hirst, Connie J. Eaves

Correspondence [email protected] (M.H.), [email protected] (C.J.E.)

In Brief Pellacani et al. present comprehensive and DNA modification profiles for four cell types in normal human breast tissue and three immortalized human mammary epithelial cell lines. Analysis of activated enhancers place luminal progenitors in between bipotent progenitor-containing basal cells and nonproliferative luminal cells. Explore consortium data at the Cell Press IHEC webportal at http://www.cell.com/ consortium/IHEC.

Highlights d Epigenomes of four normal human breast cell fractions and three cell lines are reported d Luminal progenitor and mature luminal cell epigenomes differ greatly d Enhancers define luminal progenitors as intermediate between basal and luminal cells d Transcription factor binding sites in active enhancers point to distinct regulators

Pellacani et al., 2016, Cell Reports 17, 2060–2074 November 15, 2016 ª 2016 The Author(s). http://dx.doi.org/10.1016/j.celrep.2016.10.058 Cell Reports Resource

Analysis of Normal Human Mammary Epigenomes Reveals Cell-Specific Active Enhancer States and Associated Transcription Factor Networks

Davide Pellacani,1 Misha Bilenky,2 Nagarajan Kannan,1 Alireza Heravi-Moussavi,2 David J.H.F. Knapp,1 Sitanshu Gakkhar,2 Michelle Moksa,3 Annaick Carles,3 Richard Moore,2 Andrew J. Mungall,2 Marco A. Marra,2 Steven J.M. Jones,2 Samuel Aparicio,4,5 Martin Hirst,2,3,* and Connie J. Eaves1,6,* 1Terry Fox Laboratory, BC Cancer Agency, Vancouver, BC V5Z 1L3, Canada 2Canada’s Michael Smith Sciences Centre, BC Cancer Agency, Vancouver, BC V5Z 1L3, Canada 3Michael Smith Laboratories, Department of Microbiology and Immunology, University of British Columbia, Vancouver, BC V6T 1Z4, Canada 4Department of Molecular Oncology, BC Cancer Agency, Vancouver, BC V5Z 1L3, Canada 5Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, BC V6T 2B5, Canada 6Lead Contact *Correspondence: [email protected] (M.H.), [email protected] (C.J.E.) http://dx.doi.org/10.1016/j.celrep.2016.10.058

SUMMARY 3D cultures (Eirew et al., 2008; Kannan et al., 2013; Raouf et al., 2008). Most colonies derived from the BCs contain a The normal adult human mammary gland is a contin- mixture of cells with either basal or luminal features, although uous bilayered epithelial system. Bipotent and sometimes only basal progeny are produced. Human LPs myoepithelial progenitors are prominent and unique contain all of the in vitro progenitor activity of the luminal components of the outer (basal) layer. The inner compartment and produce colonies of cells expressing luminal (luminal) layer includes both luminal-restricted pro- features exclusively. Because LCs lack substantial proliferative genitors and a phenotypically separable fraction potential in vitro, they are assumed to represent a mature subset of luminal cells (Fridriksdottir et al., 2015; Kannan et al., 2013). that lacks progenitor activity. We now report an epi- When transplanted into immunodeficient mice, a small propor- genomic comparison of these three subsets with one tion (<1%) of human BCs, but not LPs or LCs, can also individu- another, with their associated stromal cells, and with ally regenerate normal-appearing bilayered mammary structures three immortalized, non-tumorigenic human mam- that contain the same spectrum of cell types and progenitors mary cell lines. Each genome-wide analysis contains found in the normal human gland. This property suggests the profiles for six histone marks, methylated DNA, and cells of origin of these structures represent a human mammary RNA transcripts. Analysis of these datasets shows stem cell population (Eirew et al., 2008; Kuperwasser et al., that each cell type has unique features, primarily 2004; Lim et al., 2009; Nguyen et al., 2014). within genomic regulatory regions, and that the cell Despite these advances, the molecular regulators that specify lines group together. Analyses of the and the unique features of each of these normal human mammary enhancer profiles place the luminal progenitors in subsets have only recently become amenable to interrogation. profiling (Choudhury et al., 2013; Gascard et al., between the basal cells and the non-progenitor 2015; Huh et al., 2015; Kannan et al., 2013; Lim et al., 2009, luminal subset. Integrative analysis reveals networks 2010; Makarem et al., 2013; Maruyama et al., 2011; Raouf of subset-specific transcription factors. et al., 2008; Dos Santos et al., 2015), together with functional studies, has enabled several key transcription factors (TFs) to INTRODUCTION be identified in the subpopulations of normal human as well as analogous mouse mammary cell types (Fu et al., 2014; Visvader The normal adult human female mammary gland is a continuous and Stingl, 2014). Thus implicated in BCs are SLUG, TP63 and bilayered, branching epithelial structure containing three pheno- TP53, SOX9, STAT3, MYC, and TAZ. In cells with luminal fea- typically distinct subsets of cells. These can now be isolated and tures, CEBPB and NOTCH3 (Raouf et al., 2008), together with biologically distinguished (Fu et al., 2014). They are referred to as GATA3, ELF5 (Chakrabarti et al., 2012), and FOXA1, which reg- basal cells (BCs), luminal progenitors (LPs), and luminal cells ulates estrogen receptor activity (Theodorou et al., 2013), have (LCs) because their surface marker expression profiles identify been identified as important. their location within the outer (basal) or inner (luminal) layer of Previous human mammary profiles produced as the gland. A prominent fourth cell population in the breast is part of the Roadmap Consortium (Roadmap Epi- comprised of mesodermally derived stromal cells (SCs). Approx- genomics Consortium et al., 2015) have provided an initial global imately 20%–30% of human mammary BCs and LPs have annotation of active enhancer regions (Gascard et al., 2015). We progenitor (clonogenic) activity demonstrable in both 2D and now report a more extensive set of reference epigenome profiles

2060 Cell Reports 17, 2060–2074, November 15, 2016 ª 2016 The Author(s). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). A Normal FACS 6 samples pooled Molecular analyses Integrative analyses breast ( 5x107 cells per fraction) samples RNA Datasets n = 6 RNA-seq CD45- CD31- LC DNA Expression states single cells WGBS LC LP Chromatin LP Enhancers BC BC

EpCAM-PE Promoters SC SC Transcription factor CD49f-APC networks

Colony assays

B C Type Study LP vs BC LP vs LC Up: 806 BC CEMT Do Do BC Nguyen BC 1 Nguyen 1 BC Nguyen (RPKM) (RPKM) 10 myo REMC 10 myo REMC myo REMC LC log LP CEMT BC log LP Nguyen LP Nguyen LP Nguyen 1 1 lum REMC LP log10(RPKM) LP log10(RPKM) LC CEMT lum REMC BC vs LC DE genes lum REMC Do MCF10A CEMT LP vs BC 184−L9 CEMT

184−L2 CEMT 1

vHMEC REMC (RPKM)

vHMEC REMC 10 LP vs LC

SC CEMT log

fibr REMC LC fibr REMC BC vs LC

0.20 0.15 0.10 0.05 0.00 1 246810 BC log10(RPKM) Dist. = 1−Spearman | Link. = complete absolute log2(

D E Type Study myo REMC BC CEMT myo REMC SC CEMT LP CEMT LC CEMT lum REMC lum REMC fibr REMC fibr REMC vHMEC REMC MCF10A CEMT 184−L9 CEMT 184−L2 CEMT

0.4 0.3 0.2 0.1 0.0 Dist. = 1−Spearman | Link. = complete

Figure 1. Isolation of Purified Cell Subsets and Epigenomic Profiling (A) Experimental design. (B) Unsupervised clustering of RNA-sequencing experiments comparing the samples analyzed in this study (CEMT) with other studies (Nguyen et al., 2015; Roadmap Epigenomics Consortium et al., 2015; Gascard et al., 2015). (legend continued on next page)

Cell Reports 17, 2060–2074, November 15, 2016 2061 that include a separate analysis of human LPs and LCs and pro- These findings are consistent with and extend previously re- files for SCs isolated from the same normal breast tissue, as well ported findings obtained both by RNA-seq (Gascard et al., as profiles generated from three widely used immortalized, but 2015; Nguyen et al., 2015) and by 30 tag-based and microarray non-tumorigenic, human mammary epithelial cell lines: i.e., analysis (Kannan et al., 2013; Lim et al., 2009; Raouf et al., MCF-10A cells (Debnath et al., 2003) and 184-hTERT-L2 (184- 2008), including an identification of many of the same genes as L2) and 184-hTERT-L9 (184-L9) cells (Burleigh et al., 2015). ‘‘BC specific’’ (e.g., TP63 (deltaN), ACTA2, MME, KRT14, The results demonstrate that the cell lines are not representative THY1, and KRT5) or ‘‘LP specific’’ (e.g., MUC1, PROM1, KIT, of any freshly isolated normal human mammary subset. They KRT18, EPCAM, and KRT8)(Figure S2A). Similar clustering have also allowed a hierarchy of enhancer states in normal and differences between the same four cell types were human mammary tissue and their corresponding cell-specific observed when either microRNA (miRNA) or large intergenic TF networks to be identified. noncoding RNA (lincRNA) profiles were analyzed (Figures 1D, S2D, and S2E). RESULTS AND DISCUSSION Notably, the for all three immortalized human mammary cell lines clustered independently from both the pri- Generation and Validation of Comprehensive Reference mary epithelial subsets and the SCs isolated directly from the Epigenomes same freshly dissociated normal human breast tissue (Figure 1B). Reference epigenomic profiles were generated following the rec- Moreover, 66 of the 100 genes whose differential expression ommendations of the International Con- contributed most to the separate clustering of the mammary sortium (IHEC; http://ihec-epigenomes.org). The subsets of cell lines (highest rank differences) included genes of known normal human breast cells isolated were >95% pure CD49f+ importance in the mammary gland (e.g., PROM1, KIT, CXCR4, EPCAMlow/ cells (BCs), CD49f+EPCAMhigh cells (LPs), CD49f SOX10, ELF3, RARRES1, MMP7, TGFBI, and NT5E) and were EPCAMhigh cells (LCs), and CD49fEPCAM cells (SCs). These found to be differentially expressed in the different subsets of pri- subsets were initially isolated separately from each of six donor mary epithelial cells (Figure 1E). These results show that these samples (aged 17-49 years) using gates that excluded CD45+ culture-adapted immortalized cell lines, commonly used as hematopoietic cells, CD31+ endothelial cells, and dead (DAPI+) models of ‘‘normal’’ human mammary cells, exhibit significant cells (Figures 1A and S1). Like fractions of cells from each and consistent differences in their transcriptional profiles from sample were then pooled for epigenomic profiling. In vitro 2D all subsets of epithelial mammary cells isolated directly from colony-forming cell (CFC) assays confirmed the expected normal human breast tissue. content of mammary progenitors in each subset (24% of BCs, 18% of LPs, and <0.3% of LCs). Chromatin immunopre- Variations in the Chromatin States of Different Normal cipitation sequencing (ChIP-seq) data for six histone marks Human Breast Cell (H3K4me3, H3K4me1, H3K27ac, H3K27me3, H3K9me3, and To examine and compare the chromatin states of the same pri- H3K36me3), whole-genome bisulfite sequencing (WGBS), and mary breast cell types as well as the three cell lines, we used RNA sequencing (RNA-seq) datasets (both mRNA and miRNA) ChromHMM (Ernst and Kellis, 2012) to generate an 18-state were obtained for all four subsets. Parallel data were obtained model from the ChIP-seq profiles (Figures 2A and 2B). The re- on MCF-10A cells and 184-L2 and 184-L9 cells. Fully anno- sults recapitulate many features of the 18-state model recently tated tracks can be viewed in current genome browsers published for 111 primary human tissues and cell types (Road- or downloaded directly for offline analysis at http://www. map Epigenomics Consortium et al., 2015), including the epigenomes.ca. Quality measures and methods to access the reported relationship between chromatin states and DNA raw and processed data are provided in Experimental Proced- (Figure 2C), with two notable differences. One ures and Table S1. was the detection of a state characterized by short regions Unsupervised hierarchical clustering analysis of the transcrip- (median size = 400 bp) containing both H3K9me3 and tome data indicated all three mammary epithelial subsets are H3K27me3 (Figure S3A, ‘‘Repr’’ state) that frequently overlap- more similar to each other than to the SCs but, nevertheless, ped with lamina-associated domains (LADs) across the entire distinct, with the LPs clustering closer to LCs than BCs. (Fig- genome (15%, 1.3-fold enrichment). The second was a split of ure 1B). Paired comparison of the gene expression profiles each of the previously designated ‘‘quiescent’’ (Quies and confirmed a similarity of LP and LC transcriptomes (p < 0.001, Quies_G – no marks) and ‘‘transcribed’’ (Tx and Tx_3p – Mann-Whitney) across both protein coding (Figures 1C and H3K36me3) states into two states specified by their transition S2B; Table S2) and non-coding space (Figure S2C; Table S3). rather than their emission probabilities (Figure 2A).

(C) Differential gene expression analysis for protein-coding genes. Each dot-plot shows all protein coding genes (gray), those with upregulated expression (green), and those with downregulated expression (red). Bottom right panel: boxplot showing the distribution of absolute fold changes in the three comparisons made. (D) Unsupervised clustering of miRNA sequencing data for the samples analyzed in this study (CEMT) together with those previously published within the REMC (Gascard et al., 2015). (E) Heatmap showing the expression (Z scores) of the 100 genes with the largest rank difference in cultured cells compared to freshly isolated cells. (Diff. Expr., gene differentially expressed between the primary epithelial subpopulations profiled here). See also Figures S1 and S2 and Tables S1, S2, and S3.

2062 Cell Reports 17, 2060–2074, November 15, 2016 ABchr2173.25 173.3 173.35 173.4 Mb

ITGA6 PDK1

TSS_ATSS_Flnk_1TSS_Flnk_2TSS_Flnk_DTx Tx_3pEnh Enh_AEnh_GEnh_G_ATSS_BivEnh_BivRepr_PCZnf_RptsHet ReprQuiesQuies_G

H3K36me3 1.0 H3K9me3 0.8 ChromHMM H3K27me3 0.6 H3K4me1 0.4 10 0.2 H3K27ac Emission probability H3K4me3 0.0 H3K4me3 Genome % 0 CpGIsland max 2 Exon Gene H3K4me1 TES TSS Overlap 0 TSS±2kb min 3.5 LADs enrichment) (fold TSS_A H3K27ac TSS_Flnk_1 TSS_Flnk_2 0.0 TSS_Flnk_D 2 Tx Tx_3p H3K36me3 Enh 1.0 Enh_A 0.8 0 Enh_G 0.6 2 Enh_G_A 0.4 0.2 H3K27me3 TSS_Biv Transition probability Enh_Biv 0.0 Repr_PC 0 2 Transition to state Transition Znf_Rpts Het Repr H3K9me3 Quies Quies_G 0

CD1.0 184−L2 18/18 184−L9 0.8 14/18 MCF10A 0.6 18/18 SC 16/18 0.4 BC 17/18 0.2 LP 16/18

Fractional Methylation Fractional LC 0.0 Tx Het

Enh 1.0 0.8 0.6 0.4 0.2 0.0 Repr Quies Tx_3p

Enh_A 1 − Mean Norm. Jaccard Enh_G TSS_A Enh_Biv TSS_Biv Quies_G Znf_Rpts Repr_PC Enh_G_A TSS_Flnk_2 TSS_Flnk_1 TSS_Flnk_D

E F Jaccard value 0.0 0.2 0.4 0.6 0.8 TSS_A TSS_Flnk_1 TSS_Flnk_2 TSS_Flnk_D Tx Tx_3p Enh Enh_A Enh_G Enh_G_A TSS_Biv Enh_Biv Repr_PC Znf_Rpts Het Repr Quies Quies_G

Figure 2. Chromatin State Differences between Cell Types (A) Chromatin state definitions, histone mark probabilities, overlap with genomic features, and transition probabilities for an 18-state model defined using the ChromHMM software. (B) Example of the chromatin state model (ChromHMM) and ChIP-seq signal tracks (y axis = signal per million reads [SPMR]) around the ITGA6 gene (CD49f) for the LP subset. (C) Distributions of DNA methylation levels across all chromatin states in the four primary samples analyzed. Boxplots show 10%, 25%, 50%, 75%, and 90% quantiles. (D) Aggregated hierarchical clustering based on the normalized Jaccard values of each chromatin state. Fractions show the number of states for which each node is present in the individual dendrograms. (E) Heatmap showing the Jaccard value of enhancers associated states with the ChromHMM profiles generated by the Roadmap Epigenomics Consortium. Only the top five matches for each sample are represented (black indicates high Jaccard). See complete dataset in Figure S3B. (F) Jaccard values of each chromatin state between cell types (primary samples only). Each dot shows one pairwise comparison (lines indicate medians). See also Figure S3.

Cell Reports 17, 2060–2074, November 15, 2016 2063 AB

CD

E

F

G

H

(legend on next page)

2064 Cell Reports 17, 2060–2074, November 15, 2016 Comparison of these chromatin states again showed a Chromatin Features at Promoters Point to a Hierarchy of consistent difference between all three mammary cell lines Normal Mammary Epithelial Cell States and the four primary cell types (Figure 2D). In fact, a com- The importance of H3K4me3 and H3K27me3 modifications parison of the enhancer states of these cells with those in developmental specification processes (Voigt et al., 2013) of 127 cell and tissue types reported by the Roadmap Epi- prompted us to examine these modifications at gene promoters genomics Consortium (Roadmap Epigenomics Consortium genome-wide, both individually and in combination (Figure 3). et al., 2015) showed our three cell lines were similar to H3K4me3 levels at promoters were highly consistent across all cultured epithelial cells derived from breast and skin, and primary cell types (Spearman correlation coefficient R 0.9). the BC fraction was most similar to previously analyzed These correlated positively with the transcript levels of the asso- freshly isolated breast myoepithelial cells (Figures 2Eand ciated protein-coding genes that were differentially expressed S3B). The present findings thus underscore the magnitude in different cell types. A similar trend was seen for noncoding of changes imposed by culture adaptation on the chromatin RNAs. In general, H3K27me3 levels at promoters were more var- landscape (Antequera et al., 1990), in addition to effects on iable in all cell types (average Spearman correlation coefficient = the transcriptome. 0.69), and these correlated negatively with the levels of tran- Pairwise comparisons between the chromatin states of each scripts of the associated differentially expressed genes (Figures of the four primary mammary cell types revealed substantive 3A–3C and S4A–S4E). additional differences, involving 25% of the genome in We next used false discovery rate (FDR)-based thresholds each case. Moreover, differences distinguishing LPs, LCs, (q < 0.01 and at least 400 bp covered) to categorize gene and BCs from each other proved to be as extensive as those promoters based on the presence of H3K4me3, H3K27me3, or distinguishing any one of these from the developmentally unre- both (bivalent) in all three primary mammary subsets (Figure 3D; lated SCs (i.e., for each chromatin state, the Jaccard indices Table S4). LCs showed a higher number of H3K27me3-marked of paired comparisons of SCs with each of the mammary and bivalent promoters compared to BCs, consistent with an cell types were not significantly different, p R 0.1 Mann-Whit- important role of this polycomb-deposited mark in establishing ney; Figure 2F). Enhancer states (abbreviated as Enh, Enh-A, LC identity, as suggested previously (Maruyama et al., 2011; Enh-G, and Enh-AG) appeared to be the most variable Pal et al., 2013). H3K27me3-marked promoters specific to features, as reported previously (Roadmap Epigenomics each cell type were associated with repressed gene expression Consortium et al., 2015). Many promoter-associated states (p < 0.01, Mann-Whitney on the group of genes; Figure S4F). (abbreviated as TSS-Flnk-1, TSS-Flnk-2, and TSS-Flnk-D) However, a majority of these promoters lacked H3K4me3 in were also highly variable between all cell types (Figures S3C the other cell types (Figure 3E). Thus, for these genes, transcrip- and S3D). tion appears to be independent of promoter modification by In summary, the chromatin state models of the data reported H3K4me3. Approximately 20%–30% of these promoters were here show that the three major epithelial cell types in normal marked by H3K4me1 (Figures 3F and 3G). They were also en- adult human mammary tissue possess many unique features ex- riched in noncoding RNAs (Figure 3H), of which miRNAs and tending throughout a large part of their . Nevertheless, lincRNAs were the most frequent types. This suggests that the their epigenomes are more different from those of three immor- role of H3K4me1 and H3K27me3 in regulating noncoding RNA talized non-tumorigenic human mammary cell lines by compari- production seen in other tissues (Amin et al., 2015) also holds son to the epigenome of developmentally unrelated SCs isolated for subsets of freshly isolated normal human mammary epithelial from the same human breast tissue samples. The present pri- cells. mary mammary cell epigenome datasets also confirm a marked We identified 2,700 bivalent promoters in each of the three cell-type specificity in the chromatin states of both enhancer primary mammary cell types. This included the LCs, even though and promoter regions. these cells are thought to represent a terminally differentiated

Figure 3. H3K27me3 Is Highly Variable at Promoters (A) Heatmaps showing Spearman correlation coefficients of H3K4me3 (top) and H3K27me3 (bottom) marks at promoters in all pairwise comparisons of the four subsets of primary cells analyzed. (B) Density plot showing the distribution of promoters marked by H3K4me3 (top) and H3K27me3 (bottom) in pairwise comparisons between epithelial cell types (warmer colors indicate higher densities). SPKM, signal per kilobase per million mapped reads. (C) Dot plot showing the difference in median levels of histone marks at promoters of differentially expressed protein-coding genes. Each dot shows one pairwise comparison. (D) Venn diagrams showing the numbers of H3K4me3, H3K27me3, and bivalently marked promoters marked in each epithelial cell type. (E) Hive plots showing the proportion of H3K27me3-marked promoters specific to each cell type (dark red), being bivalent (blue), and H3K4me3-marked (dark green), or not marked (black) in other cell types. Ribbons are proportional to the number of promoters changing marking between two cell types. (F) Fraction of cell-specific H3K27me3-marked promoters not marked by H3K4me3 in other cell types but marked by H3K4me1. Groups of bars represent cell- specific H3K27me3-marked promoters. Individual bars show the H3K4me1 marking in each cell type. (G) Average plots of H3K4me1-marked cell-type-specific H3K27me3-marked promoters, not marked by H3K4me3 in other cell types. Left, BCs; middle, LPs; and right, LCs. (H) Fraction of genes associated with cell-type-specific H3K27me3-marked promoters, but not marked by H3K4me3 in other cell types, that are classifiedas protein coding. Gray boxplots indicate distributions of equally sized random gene sets. See also Figure S4 and Table S4.

Cell Reports 17, 2060–2074, November 15, 2016 2065 (legend on next page)

2066 Cell Reports 17, 2060–2074, November 15, 2016 human mammary cell type (Figure 3D). Although heterogeneity intermediate between BCs and LCs. Interestingly, in LPs, of promoter marking between individual LCs cannot be excluded only 2% (11/449) of the bivalent promoters were marked as an explanation for the bivalency observed, a similar variability by H3K27me3 in BCs and H3K4me3 in LCs, whereas 15% across donors in gene expression of affected genes was evident (68/449) showed the opposite pattern. The latter were also between cell types in three independent studies (Figure S5). significantly enriched in genes encoding proteins that regulate In addition, we found that bivalent promoters in BCs overlap extracellular matrix organization (p < 0.01, hypergeometric distri- significantly with the bivalent chromatin states identified in the bution with Bonferroni correction). Surprisingly, in both BCs and Roadmap Epigenomics Consortium dataset for a myoepithelial LPs, 88% (538/610) of the bivalent promoters specific to LCs cell population (essentially the same as BCs) obtained from a were classified as H3K4me3, and these were enriched in genes single donor (Fisher’s p value < 105). Thus it seems unlikely encoding proteins responsible for cell migration, proliferation, that the bivalent promoters identified here reflect a mixture of and receptor tyrosine kinases (e.g., VIM, SNAI2, CDK6, KIT, different promoter marking in the same phenotypes of cells and EGFR; p < 0.01, hypergeometric distribution with Bonferroni isolated from different donors. correction). This latter association suggests an unanticipated Most promoters that were bivalent in a given subset of mam- role of this promoter state in repressing processes specific to mary epithelial cells contained at least one of the two marks BCs and LPs. in the other two subsets. These changes in promoter marking Previous studies in breast and other tissues have indicated an correlated with expected changes in gene expression; i.e., association of bivalently marked promoters with restricted increased transcript levels were associated with loss of promoter expression of genes important for lineage determination (Bern- H3K27me3 marks and decreased transcript levels were associ- stein et al., 2006; Maruyama et al., 2011; Visvader and Lindeman, ated with loss of promoter H3K4me3 marks (Figures 4A–4C). 2012; Voigt et al., 2013). The present results suggest that pro- More than 90% of the bivalent promoters specific to BCs (422/ moter bivalency plays a broader role in regulating gene expres- 459) ‘‘resolved’’ to a display of the same single modification in sion among normal human mammary cells. Our analysis also both LPs and LCs, with 60% showing H3K4me3 and 40% supports a hierarchical organization of mammary epithelial cell showing H3K27me3 modifications. These findings are reminis- states in which a bivalent promoter landscape established in cent of the changes in bivalent promoter modifications that BCs is resolved in LPs and is then maintained in LCs. This in- accompany the differentiation of embryonic stem cells (Bern- cludes some de novo gains of bivalency in LCs suggesting that stein et al., 2006). this chromatin state may play a role in the repression of genes Gene promoters that were bivalently marked in BCs, but involved in proliferation. marked by H3K27me3 in LPs or LCs, included many genes that proteins and TFs with a reported role in regulating Chromatin Features at Enhancers Point to a Hierarchy of stem cell properties (e.g., SOX17, HOXD10, TCF15, TAL1, Normal Mammary Epithelial Cell States PIWIL1, and the imprinted gene, IGF2). Conversely, gene pro- We next examined the distribution of enhancers in each subset moters marked by H3K4me3 in both LCs and LPs with were of primary cells based on the identification of sites marked by significantly enriched in genes encoding proteins implicated H3K4me1, H3K4me3, H3K27ac and H3K27me3 (Figure 5; Table in their differentiation (e.g., ESR1, SOX9, ERBB4, NOTCH2, S5; see Experimental Procedures for details of this analysis). RUNX1, CCND1, and EPCAM; p < 0.01, hypergeometric distri- Distal ‘‘primed’’ enhancers were defined as any region of chro- bution with Bonferroni correction; Figure 4D). These findings matin marked by H3K4me1, but not also high in H3K4me3, support a model in which genes crucial for luminal differentiation assuming the latter to have promoter rather than enhancer activ- may be poised in BCs but appear activated in cells that initiate ity (Figures 5B, 5C, and S6). We termed those also strongly luminal differentiation. marked by H3K27ac as ‘‘active,’’ those strongly co-marked by 60% (269/449) of LP-specific bivalent promoters were marked H3K27me3 as ‘‘poised,’’ and the very rare regions strongly by H3K4me3 in BCs and LCs. Associated genes were signifi- marked by both H3K27me3 and H3K27ac (<0.1%) as ‘‘mixed’’ cantly enriched in genes involved in epithelial tissue morphogen- (anticipating this latter group to represent regions of differential esis and development (e.g., BMP4, TGFB1, PDGFA, and RXRA; marking in individual cells within the same subset). p < 0.01, hypergeometric distribution with Bonferroni correction). Using these definitions, we identified 141,400 putative primed This result is again consistent with LPs occupying a state enhancers (Figures 5D and 5E) with a median distance to their

Figure 4. Bivalent Promoter Differences between Human Mammary Epithelial Cell Types (A) Genome browser tracks of the ChIP-seq signals for H3K4me3 (dark green) and H3K27me3 (dark red) (y axis = signal per million reads [SPMR) in the epithelial cell types around the NOTCH3 gene (bivalent in BCs), the MME gene (bivalent in ]LPs) and the EGFR gene (bivalent in LCs). Light gray rectangles indicate gene promoters. (B) Hive plots showing the proportion of bivalent promoters specific to each cell type (blue), being H3K27me3-marked (dark red), H3K4me3-marked (dark green), or not marked (black) in other cell types. Ribbons are proportional to the number of differently marked promoters in two cell types. (C) Levels of expression of genes associated with cell-specific bivalent promoters that show different marks in other cell types. Promoter categories are shown below the x axis label. Black lines indicate medians of each group. p values are from a two-tailed Mann-Whitney U test. (D) Top five gene ontologies enriched in association with cell-specific bivalent promoters that show different marks in other cell types. Promoter categories are shown in the top right corner. See also Figure S5.

Cell Reports 17, 2060–2074, November 15, 2016 2067 Figure 5. Identification of Different Classes of Enhancers in Different Human Mammary Cell Subsets (A) Table showing the different categories of enhancers identified, the histone marking used to define each category (X for absent, O for present, X/O for optional), and the numbers of enhancers in each cell type. (B) Expression of THY1 in the four mammary cell types analyzed. (C) Example of the enhancers identified (Enh track), ChIP-seq signal tracks (y axis = signal per million reads), unmethylated regions (UMR track), and DNA methylation levels (Fr. Meth.) and coverage (log2 scale) in all cell types around the THY1 gene. Light gray rectangles indicate all enhancers identified. (D) Heatmap showing the signal for H3K4me1, H3K27ac, and H3K27me3 on each enhancer region identified in LPs, ordered by region size and categorized as off (gray), primed (green), active (red), poised (blue), and mixed (magenta). Each row represents one enhancer region. (E) H3K4me1 levels (SPKM) at primed enhancers in all cell types. Each column represents one individual enhancer. Enhancers are grouped based on their presence in each cell type. (F) H3K27ac levels (SPKM) at active enhancers in all cell types. Each column represents one individual enhancer. Enhancers are grouped based on their presence in each cell type. See also Figure S6 and Table S5.

2068 Cell Reports 17, 2060–2074, November 15, 2016 A

BCD

E

G

F

HI

(legend on next page)

Cell Reports 17, 2060–2074, November 15, 2016 2069 closest transcription start site (TSS) of 22 kb. Notably, these en- HOXA4, and FOXO1). Significantly fewer distal enhancers were hancers overlap significantly (Fisher’s p value < 105 for both defined as ‘‘poised’’ (on average, only 5% of the primed en- robust and permissive datasets) with enhancers previously in- hancers in each cell type), and these showed a weak inverse ferred from RNA expression data (Andersson et al., 2014). Of relationship with gene expression (less with upregulated genes these, 374 overlap with functionally in vivo validated enhancers and more with downregulated genes, Figure S7D). (Fisher’s p value < 105) present in the VISTA enhancer browser We then compared the state of enhancer priming and activa- (Visel et al., 2007). They were characterized by an overall low tion within the different mammary subsets. This analysis showed average level of methylation (on average, 50% at enhancer that up to 40% of the enhancers that were specifically active in midpoints, and 75% on equal numbers of randomly selected a particular subset were in a primed state in the other two sub- genomic regions; Figure S7A). They also showed significant sets (Fisher p value < 105). This change in enhancer state was overlap (Fisher’s p value < 105) with unmethylated regions seen more often in LPs than in either LCs or BCs (Figure 6D). (UMRs) and low-methylated regions (LMRs) (Figure S7B), as A deeper analysis of H3K4me1 marking of active enhancers defined by applying a hidden Markov model to the WGBS data- revealed a significant imbalance in cell-specific marking (Fig- set (Stadler et al., 2011). ure 6E). Thus, of the enhancers classified as active only in LCs, The repertoire of active enhancers comprised 40% of those 50% (2,500) of those that were primed in LPs were unmarked defined as primed (Figure 5D), of which 30% to 50% were pre- in BCs. However, only 30% (1,000) of those primed in BCs sent in only one mammary subset (using our thresholds, Figures were unmarked in LPs. An analogous pattern was seen for 5F and 6A). These were associated with 40% to 60% of the enhancers classified as active only in BCs, with 60% (4,000) genes showing upregulated expression but only 10% to 30% of those identified as primed in LPs being unmarked in LCs of the genes showing downregulated expression in pairwise and only 30% (1,200) of those identified as primed in LCs being comparisons of all cell types (Figures 5B, 5C, 6B, and S6). unmarked in LPs. Similar numbers of active LP-specific en- A similar, but generally less significant, association was seen hancers were also found to be primed in BCs and LCs. However, with primed enhancers (excluding those defined as active, approximately half of these were unmarked in the other cell type poised or mixed; Figure 6C). Interestingly, regions correspond- (55% and 50% in BCs and LCs, respectively). ing to those identified as super enhancers in the mouse mam- This evolving pattern of enhancer activation states is again mary gland (Shin et al., 2016) significantly overlapped with sites consistent with an underlying hierarchy of human mammary identified as active enhancers in human mammary epithelial cell differentiation states in which LPs represent an epigenomic cells, but not in the human SCs (Figure S7C). Thus, at least intermediate between BCs and LCs, with enhancer activation some of the enhancers identified here as active in human mam- and decommissioning being governed by H3K27ac addition mary cells appear to also be active in the mouse mammary and removal, respectively. Importantly, functional annotation of gland. genes associated with enhancers found to be active in BCs This analysis also revealed cell-type-specific enhancer acti- and primed in LPs showed enrichment for terms related to cell vation of many genes with an established role in mammary motility and adhesion (p < 0.01 FDR, binomial testing; Figure 6F). epithelial biology. For example, many enhancers in the vicinity Genes associated with these terms and active enhancers in of BC-specific genes were defined as active only in BCs (e.g., BCs were also expressed at a higher level in BCs than in TP63, THY1, VIM, and SNAI2). Similarly, multiple enhancers LPs and LCs (Figure 6G). near LC-specific genes were active only in these cells (e.g., Strikingly, >60% of the enhancers uniquely active in one of the ESR1, KLF4, FOXA1, S100P, and TBX3), and several enhancers mammary cell types were not marked by H3K4me1 in either of near KIT and ELF5 were active exclusively in LPs. Likewise, we the other two. This suggests that a majority of enhancers are found shared active enhancers for many genes that are ex- activated de novo in each cell type. However, 50% of the genes pressed at higher levels in BCs and LPs, the two mammary sub- associated with de novo activated enhancers were also found to sets that show high progenitor activity (e.g., KRT5, EGFR, ITGA6, be associated with active enhancers that are primed or active in

Figure 6. Differential Enhancer Activation in Different Human Mammary Cell Types (A) Venn diagrams showing the numbers of enhancers identified in each epithelial cell type as primed, poised, active, and mixed. (B and C) Fraction of differentially expressed genes associated with at least one active (B) or primed (C) enhancer in only one of the two cell types in each pairwise comparison. p values are from a two-tailed Fisher’s test. (D) Fraction of active enhancers present in only one epithelial cell type marked by H3K4me1 in the other two samples. p values are from a two-tailed Fisher’s test of a comparison between cell types. (E) Hive plots showing the proportion of active enhancers specific to each cell type (red), found to be off (gray), primed (green), poised (blue), or mixed (magenta) in the other cell types. Ribbons are proportional to the number of enhancers changing marking between two cell types. Left: active enhancers specific to BCs. Middle: active enhancers specific to LPs. Right: active enhancers specific to LCs. (F) Top eight gene ontologies enriched in association with enhancers classified as active in BCs, primed in LPs, and off in LCs. (G) Expression of genes associated with the gene ontologies shown in (F) and with enhancers classified as active in BCs, primed in LPs, and off in LCs. (Black lines represent medians). (H) Number of genes whose expression was associated only with active enhancers not primed in other cell types or with active enhancers that were either primed or not in other cell types. (I) Heatmap showing the levels of DNA methylation of the most variable CpG sites within UMRs and LMRs. Each row represents one CpG site. Sites are grouped according to their changes in methylation across the epithelial cell types. See also Figure S7.

2070 Cell Reports 17, 2060–2074, November 15, 2016 (legend on next page)

Cell Reports 17, 2060–2074, November 15, 2016 2071 one of the other two mammary cell types (Figure 6H). This asso- cell types, independently of whether all or only inter-genic or ciation might be explained by two coexisting modes of enhancer only intra-genic enhancers were considered (Figure 7B; Table activation within these mammary epithelial subsets: one in which S6). In BCs, these included binding sites for EGR1 and EGR2, active enhancers fall close to enhancers primed or active in the as well as members of the TEAD EBF and TCF families, and other cell subpopulations and one independent of pre-existing TP53. In LCs, this analysis identified binding sites for many enhancer states. members of the AP-1 TF complex, together with ESR1 and In summary, from a complete description of human mammary FOXP1. The TFs similarly identified in LPs included GRHL2, epithelial cell enhancer activation states, we show that their ELF1, and ETS1. enhancer profiles display extensive cell-type specificity, with Lastly, we used these enriched TF binding sites to derive a TF no greater similarity between LPs and LCs than between LPs regulatory network for each of the three human mammary cell and BCs, or between LCs and BCs. This epigenomic distinction types. To construct these networks, we inferred a positive inter- of LPs and LCs suggests that enhancer activation plays an action between two TFs whenever a binding site for one of the important role in regulating their distinct transcriptional profiles, TFs was found in an active enhancer within 40 kb (62% of all with many BC- and LC-specific genes primed for activation or regions analyzed) of the gene for the other TF and expression re-activation in LPs, even though they maintain a BC- or LC-spe- of the two TF genes was positively or negatively correlated (ab- cific transcriptional profile. The DNA methylation data obtained solute Spearman correlation coefficient > 0.25 from a published on the same three cell samples also showed most of the changes array dataset; Kannan et al., 2013). The networks thus generated in DNA methylation followed a progressive shift from hyper- to show a high degree of independence in cell-type-specific TFs, hypo-methylation, or vice versa from hypo- to hyper-methyl- with very few shared nodes and edges (Figures 7C–7E; Table ation, in BCs, then LPs, and then LCs. Many CpG sites were S7). The network operative in BCs shows a highly interconnected also specifically hypo-methylated in LPs, indicative of a marked central group of TFs in which TP63, NF1 (NFIB), EGR2, and ETS1 cell-type-specific methylation profile (Figure 6I). are the most prominent. In contrast, the network generated for LCs is much smaller, centered on FOXA1, BATF, and STAT5, Identification of Distinct TF Regulatory Networks with many TFs potentially contributing to the regulation of We next mapped the DNA binding sites of 319 TFs to genomic re- ESR1. The network derived for LPs is dominated by ELF5 and gions identified as cell-type-specific active enhancers (Figures EHF, which regulate each other, as well as EHF regulating itself. 7A and 7B). After filtering for significance (p < 106) and expres- Interestingly, the LP network contains a module common to the sion (reads per kilobase of transcript per million mapped reads BC network that involves NFIB and ETS1, and it includes five [RPKM] > 1), binding sites for 64 TFs were found to be enriched TFs present in the LC network (ATF3, BATF, FOS, FOSL1, and in BCs, 21 in LPs, and 25 in LCs (an average of 72% of the FOSL2), reinforcing the concept of LPs as an epigenomically in- H3K27ac peaks analyzed contained at least one binding site for termediate state between BCs and LCs. the TFs enriched in the corresponding cell type). This analysis re- These findings illustrate the power of complete epigenome vealed TP63 and FOXA1 to have the most prevalent binding sites profiles to reveal novel information about mechanisms that con- in BCs and LCs, respectively, consistent with their reported se- trol the transcriptional landscape of biologically complex normal lective functions in these cells (Bernardo et al., 2010; Chakrabarti adult human tissues. We anticipate the networks defined here et al., 2014). Analogously identified TFs in LPs were EHF and will be further refined by a deeper subsetting of the populations ELF5, two members of the ETS family of TFs with very similar analyzed, as has been suggested (Visvader and Stingl 2014). DNA binding motifs. ELF5 is a recently identified luminal-specific However, the fact that the present datasets have been generated TF, but a role of EHF in the mammary gland has not been from very large pools of highly purified cells isolated from six previously reported. However, EHF has a known role in the estab- donors makes it likely that they will be generally representative lishment and maintenance of epithelial cell phenotypes in other of normal breast cells from premenopausal women. tissues (Albino et al., 2012; Stephens et al., 2013). An examination of several other reported mammary datasets confirmed higher EXPERIMENTAL PROCEDURES levels of ELF5 and EHF transcripts in LPs as compared to LCs and their very low levels in BCs. Comparison of the relative prev- Breast Tissue Dissociation, Cell Sorting, and In Vitro Progenitor Assays alence of each of these four enhancer-based TF binding sites Histologically normal breast tissue was obtained with informed consent confirmed their respective cell-type specificities. from six healthy premenopausal adult female donors undergoing reduction This analysis also identified a number of other TF binding sites mammoplasty surgeries and used according to procedures approved by the uniquely prevalent in one of the three normal human mammary Research Ethics Board of the University of British Columbia. Fresh tissue

Figure 7. Distinct Transcriptional Regulatory Networks Derived for Each Mammary Epithelial Subpopulation (A) Table showing the top five TF binding sites enriched in the sequences associated with active enhancers specific to each epithelial cell type. (B) Heatmap showing the normalized enrichment value (ln(p value)/max((ln(p value))) of all TF binding sites found to be significantly enriched (p < 106) and expressed (RPKM > 1) in each epithelial cell type. (C–E) TF regulatory networks constructed for BCs (C), LPs (D), and LCs (E). Node size is proportional to the enrichment value shown in (B), and edge thickness is proportional to the gene expression correlation of the two nodes connected. Green arrows between the two genes indicate a positive correlation. Magenta lines indicate a negative correlation. See also Tables S6 and S7.

2072 Cell Reports 17, 2060–2074, November 15, 2016 was viably cryopreserved as enzymatically isolated organoids and then REFERENCES thawed and dissociated into single-cell suspensions and the four defined mammary cell types isolated by fluorescence-activated cell sorting (FACS) Albino, D., Longoni, N., Curti, L., Mello-Grand, M., Pinton, S., Civenni, G., Thal- (Figure 1A) as previously described (Kannan et al., 2014). Similar sub- mann, G., D’Ambrosio, G., Sarti, M., Sessa, F., et al. (2012). ESE3/EHF con- sets from each sort were combined in the same proportions to obtain pools trols epithelial cell differentiation and its loss leads to prostate tumors with of R50 3 106 cells for each subset. The cells were washed in PBS containing mesenchymal and stem-like features. Cancer Res. 72, 2889–2900. protease inhibitors and frozen until processed for ChIP-seq or DNA/RNA Amin, V., Harris, R.A., Onuchic, V., Jackson, A.R., Charnecki, T., Paithankar, extraction. The progenitor content of each subset was determined indepen- S., Lakshmi Subramanian, S., Riehle, K., Coarfa, C., and Milosavljevic, A. dently, as previously described (Kannan et al., 2013). (2015). Epigenomic footprints across 111 reference epigenomes reveal tis- sue-specific epigenetic regulation of lincRNAs. Nat. Commun. 6, 6370. Production of Reference Epigenomes Andersson, R., Gebhard, C., Miguel-Escalada, I., Hoof, I., Bornholdt, J., Boyd, RNA and DNA extraction and ChIP library construction and sequencing were M., Chen, Y., Zhao, X., Schmidl, C., Suzuki, T., et al.; FANTOM Consortium performed following the guidelines formulated by the IHEC (http://www. (2014). An atlas of active enhancers across human cell types and tissues. Na- ihec-epigenomes.org). These guidelines, as well as the standard operating ture 507, 455–461. procedures for RNA-seq, ChIP-seq, and whole-genome bisulfite se- quencing library construction, are available at http://www.epigenomes.ca/ Antequera, F., Boyes, J., and Bird, A. (1990). High levels of de novo methylation protocols-and-standards or by request. and altered chromatin structure at CpG islands in cell lines. Cell 62, 503–514. Bernardo, G.M., Lozada, K.L., Miedler, J.D., Harburg, G., Hewitt, S.C., Mosley, Epigenomic Data Analysis J.D., Godwin, A.K., Korach, K.S., Visvader, J.E., Kaestner, K.H., et al. (2010). Detailed methods explaining the data analysis performed for the RNA-seq, FOXA1 is an essential determinant of ERalpha expression and mammary ChIP-seq, and WGBS datasets are provided in Supplemental Experimental ductal morphogenesis. Development 137, 2045–2054. Procedures. Bernstein, B.E., Mikkelsen, T.S., Xie, X., Kamal, M., Huebert, D.J., Cuff, J., Fry, B., Meissner, A., Wernig, M., Plath, K., et al. (2006). A bivalent chromatin struc- ture marks key developmental genes in embryonic stem cells. Cell 125, ACCESSION NUMBERS 315–326. The accession numbers for the raw sequence datasets reported in this Burleigh, A., McKinney, S., Brimhall, J., Yap, D., Eirew, P., Poon, S., Ng, V., paper are EGA: EGAS00001000552 and EpIRR: IHECRE00000223, Wan, A., Prentice, L., Annab, L., et al. (2015). A co-culture genome-wide IHECRE00000228, IHECRE00000231, IHECRE00000242, IHECRE00000001, RNAi screen with mammary epithelial cells reveals transmembrane signals IHECRE00000225, and IHECRE00000227. Processed signal tracks are also required for growth and differentiation. Breast Cancer Res. 17,4. available at http://www.epigenomes.ca. Chakrabarti, R., Hwang, J., Andres Blanco, M., Wei, Y., Lukaci sin, M., Romano, R.-A., Smalley, K., Liu, S., Yang, Q., Ibrahim, T., et al. (2012). Elf5 SUPPLEMENTAL INFORMATION inhibits the epithelial-mesenchymal transition in mammary gland development and breast cancer metastasis by transcriptionally repressing Snail2. Nat. Cell Supplemental Information includes Supplemental Experimental Procedures, Biol. 14, 1212–1222. seven figures, and seven tables and can be found with this article online at Chakrabarti, R., Wei, Y., Hwang, J., Hang, X., Andres Blanco, M., Choudhury, http://dx.doi.org/10.1016/j.celrep.2016.10.058. A., Tiede, B., Romano, R.-A., DeCoste, C., Mercatali, L., et al. (2014). DNp63 promotes stem cell activity in mammary gland development and basal-like AUTHOR CONTRIBUTIONS breast cancer by enhancing Fzd7 expression and Wnt signalling. Nat. Cell Biol. 16, 1004–1015, 1–13. D.P., N.K., M.H., and C.J.E. designed the project, and C.J.E. and M.H. were Choudhury, S., Almendro, V., Merino, V.F., Wu, Z., Maruyama, R., Su, Y., Mar- jointly responsible for its overall execution and quality control. N.K. sorted tins, F.C., Fackler, M.J., Bessarabova, M., Kowalczyk, A., et al. (2013). Molec- the primary samples and performed in vitro assays. M. Moksa, M. Marra, ular profiling of human mammary gland links breast cancer risk to a p27(+) cell S.J.M.J., R.M., A.M., and S.A. oversaw the generation of sequence data under population with progenitor characteristics. Cell Stem Cell 13, 117–130. the guidance of M.H., and D.P., M.B., A.H.-M., D.J.H.F.K., S.G., and A.C. Debnath, J., Muthuswamy, S.K., and Brugge, J.S. (2003). Morphogenesis and analyzed it. D.P., M.H., and C.J.E. wrote the manuscript. All authors contrib- oncogenesis of MCF-10A mammary epithelial acini grown in three-dimen- uted to the interpretation of the results and read and approved the manuscript. sional basement membrane cultures. Methods 30, 256–268. Dos Santos, C.O., Dolzhenko, E., Hodges, E., Smith, A.D., and Hannon, G.J. ACKNOWLEDGMENTS (2015). An epigenetic memory of pregnancy in the mouse mammary gland. Cell Rep. 11, 1102–1109. The authors thank D. Wilkinson, G. Edin, and M. Hale for excellent technical support; Drs. E. Bovill, J. Boyle, S. Bristol, P. Gdalevitch, A. Seal, J. Sproul, Eirew, P., Stingl, J., Raouf, A., Turashvili, G., Aparicio, S., Emerman, J.T., and and N. van Laeken for access to discarded reduction mammoplasty tissue; Eaves, C.J. (2008). A method for quantifying normal human mammary epithe- and E. Chun and Dr. A. Gagliardi for critical reading of this manuscript. This lial stem cells with in vivo regenerative ability. Nat. Med. 14, 1384–1389. work was supported by the Canadian Cancer Society Research Institute (grants Ernst, J., and Kellis, M. (2012). ChromHMM: automating chromatin-state dis- 702294 and 702851) and by Genome British Columbia and the Canadian Insti- covery and characterization. Nat. Methods 9, 215–216. tutes of Health Research as part of the Canadian , Environment Fridriksdottir, A.J., Kim, J., Villadsen, R., Klitgaard, M.C., Hopkinson, B.M., Pe- and Health Research Consortium Network (CIHR-262119). N.K. was supported tersen, O.W., and Rønnov-Jessen, L. (2015). Propagation of oestrogen recep- by a MITACS Elevate Fellowship (application ref. IT02817). D.J.H.F.K. received a tor-positive and oestrogen-responsive normal human breast cells in culture. Vanier Canada Graduate Scholarship from CIHR (CGV-121184). S.A. is sup- Nat. Commun. 6, 8786. ported by a Canada Research Chair in Molecular Oncology. Fu, N., Lindeman, G.J., and Visvader, J.E. (2014). The mammary stem cell Received: January 14, 2016 hierarchy. Curr. Top. Dev. Biol. 107, 133–160. Revised: August 10, 2016 Gascard, P., Bilenky, M., Sigaroudinia, M., Zhao, J., Li, L., Carles, A., Delaney, Accepted: September 30, 2016 A., Tam, A., Kamoh, B., Cho, S., et al. (2015). Epigenetic and transcriptional de- Published: November 15, 2016 terminants of the human breast. Nat. Commun. 6, 6351.

Cell Reports 17, 2060–2074, November 15, 2016 2073 Huh, S.J., Clement, K., Jee, D., Merlini, A., Choudhury, S., Maruyama, R., Yoo, complex clonal dynamics of de novo transformed human mammary cells. Na- R., Chytil, A., Boyle, P., Ran, F.A., et al. (2015). Age- and pregnancy-associ- ture 528, 267–271. ated DNA methylation changes in mammary epithelial cells. Stem Cell Reports Pal, B., Bouras, T., Shi, W., Vaillant, F., Sheridan, J.M., Fu, N., Breslin, K., 4, 297–311. Jiang, K., Ritchie, M.E., Young, M., et al. (2013). Global changes in the mam- Kannan, N., Huda, N., Tu, L., Droumeva, R., Aubert, G., Chavez, E., Brinkman, mary epigenome are induced by hormonal cues and coordinated by Ezh2. Cell R.R., Lansdorp, P., Emerman, J., Abe, S., et al. (2013). The luminal progenitor Rep. 3, 411–426. compartment of the normal human mammary gland constitutes a unique site Raouf, A., Zhao, Y., To, K., Stingl, J., Delaney, A., Barbara, M., Iscove, N., of telomere dysfunction. Stem Cell Reports 1, 28–37. Jones, S., McKinney, S., Emerman, J., et al. (2008). Transcriptome analysis Kannan, N., Nguyen, L.V., Makarem, M., Dong, Y., Shih, K., Eirew, P., Raouf, of the normal human mammary cell commitment and differentiation process. A., Emerman, J.T., and Eaves, C.J. (2014). Glutathione-dependent and -inde- Cell Stem Cell 3, 109–118. pendent oxidative stress-control mechanisms distinguish normal human mammary epithelial cell subsets. Proc. Natl. Acad. Sci. USA 111, 7789–7794. Roadmap Epigenomics Consortium; Kundaje, A., Meuleman, W., Ernst, J., Bi- lenky, M., Yen, A., Heravi-Moussavi, A., Kheradpour, P., Zhang, Z., Wang, J., Kuperwasser, C., Chavarria, T., Wu, M., Magrane, G., Gray, J.W., Carey, L., Ziller, M.J., et al. (2015). Integrative analysis of 111 reference human epige- Richardson, A., and Weinberg, R.A. (2004). Reconstruction of functionally nomes. Nature 518, 317–330. normal and malignant human breast tissues in mice. Proc. Natl. Acad. Sci. USA 101, 4966–4971. Shin, H.Y., Willi, M., Yoo, K.H., Zeng, X., Wang, C., Metser, G., and Hennigh- ausen, L. (2016). Hierarchy within the mammary STAT5-driven Wap super- Lim, E., Vaillant, F., Wu, D., Forrest, N.C., Pal, B., Hart, A.H., Asselin-Labat, enhancer. Nat. Genet. 48, 904–911. M.-L., Gyorki, D.E., Ward, T., Partanen, A., et al.; kConFab (2009). Aberrant luminal progenitors as the candidate target population for basal tumor devel- Stadler, M.B., Murr, R., Burger, L., Ivanek, R., Lienert, F., Scho¨ ler, A., van Nim- opment in BRCA1 mutation carriers. Nat. Med. 15, 907–913. wegen, E., Wirbelauer, C., Oakeley, E.J., Gaidatzis, D., et al. (2011). DNA-bind- Lim, E., Wu, D., Pal, B., Bouras, T., Asselin-Labat, M.-L., Vaillant, F., Yagita, H., ing factors shape the mouse methylome at distal regulatory regions. Nature Lindeman, G.J., Smyth, G.K., and Visvader, J.E. (2010). Transcriptome ana- 480, 490–495. lyses of mouse and human mammary cell subpopulations reveal multiple Stephens, D.N., Klein, R.H., Salmans, M.L., Gordon, W., Ho, H., and Andersen, conserved genes and pathways. Breast Cancer Res. 12, R21. B. (2013). The Ets transcription factor EHF as a regulator of cornea epithelial Makarem, M., Kannan, N., Nguyen, L.V., Knapp, D.J.H.F., Balani, S., Prater, cell identity. J. Biol. Chem. 288, 34304–34324. M.D., Stingl, J., Raouf, A., Nemirovsky, O., Eirew, P., and Eaves, C.J. (2013). Theodorou, V., Stark, R., Menon, S., and Carroll, J.S. (2013). GATA3 acts up- Developmental changes in the in vitro activated regenerative activity of prim- stream of FOXA1 in mediating ESR1 binding by shaping enhancer accessi- itive mammary epithelial cells. PLoS Biol. 11, e1001630. bility. Genome Res. 23, 12–22. Maruyama, R., Choudhury, S., Kowalczyk, A., Bessarabova, M., Beresford- Visel, A., Minovitsky, S., Dubchak, I., and Pennacchio, L.A. (2007). VISTA Smith, B., Conway, T., Kaspi, A., Wu, Z., Nikolskaya, T., Merino, V.F., et al. Enhancer Browser–a database of tissue-specific human enhancers. Nucleic (2011). Epigenetic regulation of cell type-specific expression patterns in the Acids Res. 35, D88–D92. human mammary epithelium. PLoS Genet. 7, e1001369. Visvader, J.E., and Lindeman, G.J. (2012). Cancer stem cells: current status Nguyen, L.V., Makarem, M., Carles, A., Moksa, M., Kannan, N., Pandoh, P., and evolving complexities. Cell Stem Cell 10, 717–728. Eirew, P., Osako, T., Kardel, M., Cheung, A.M.S., et al. (2014). Clonal analysis via barcoding reveals diverse growth and differentiation of transplanted mouse Visvader, J.E., and Stingl, J. (2014). Mammary stem cells and the differentia- and human mammary stem cells. Cell Stem Cell 14, 253–263. tion hierarchy: current status and perspectives. Genes Dev. 28, 1143–1158. Nguyen, L.V., Pellacani, D., Lefort, S., Kannan, N., Osako, T., Makarem, M., Voigt, P., Tee, W.-W., and Reinberg, D. (2013). A double take on bivalent Cox, C.L., Kennedy, W., Beer, P., Carles, A., et al. (2015). Barcoding reveals promoters. Genes Dev. 27, 1318–1338.

2074 Cell Reports 17, 2060–2074, November 15, 2016