Downloaded from genome.cshlp.org on October 2, 2021 - Published by Cold Spring Harbor Laboratory Press

Hah et al. (Kraus) April 16, 2013

Enhancer Transcripts Mark Active Estrogen Binding Sites

Running Title: eRNAs at Binding Sites

Nasun Hah1, 2, 6, Shino Murakami3, 4, Anusha Nagari3, Charles Danko1, 5, W. Lee Kraus1, 2, 3, 4, 7

1 Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853. 2 Graduate Field of Biochemistry, Molecular and Cell Biology, Cornell University, Ithaca, NY 14853. 3 Cecil H. and Ida Green Center for Reproductive Biology Sciences and Division of Basic Reproductive Biology Research, Department of Obstetrics and Gynecology, University of Texas Southwestern Medical Center, Dallas, TX, 75390. 4 Graduate School of Biomedical Sciences, Program in Integrative Biology, University of Texas Southwestern Medical Center, Dallas, TX, 75390. 5 Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY 14853.

6 Current address: Expression Laboratory, The Salk Institute for Biological Studies, La Jolla, CA 92037.

7 Address correspondence to: W. Lee Kraus, Ph.D., Cecil H. and Ida Green Center for Reproductive Biology Sciences, The University of Texas Southwestern Medical Center at Dallas, 5323 Harry Hines Boulevard, Dallas, TX 75390-8511

Phone: 214-648-2388 Fax: 214-648-0383 E-mail: [email protected]

Keywords: Estrogen, ESR1, Estrogen receptor binding site (ERBS), Enhancer, Enhancer RNA (eRNA), GRO-seq, Transcription

Abstract

We have integrated and analyzed a large number of data sets from a variety of genomic assays (e.g., GRO-seq, ChIP-seq, DNase-seq, ChIA-PET) using a novel computational pipeline to provide a global view of estrogen receptor 1 (ESR1; a.k.a. ERα) enhancers in MCF-7 human breast cancer cells. Using this approach, we have defined a class of primary transcripts (eRNAs) that are transcribed uni- or bidirectionally from estrogen receptor binding sites (ERBSs) with an average transcription unit length of ~3 to 5 kb. The majority are upregulated by short treatments with estradiol (i.e., 10, 25, or 40 min.) with kinetics that generally precede or match the induction of the target gene. The production of eRNAs at ERBSs is strongly correlated with the enrichment of a number of genomic features that have been shown to be associated with enhancers (e.g., H3K4me1, H3K27ac, EP300/CREBBP, RNA polymerase II, open chromatin architecture), as well as enhancer looping to target gene promoters. In the absence of eRNA production, strong enrichment of these features is not observed, even though ESR1 binding is evident. We find that flavopiridol, a CDK9 inhibitor that blocks transcription elongation, inhibits eRNA production, but does not affect other molecular indicators of enhancer activity (e.g., RNA pol II binding, H3K4me1 levels, enhancer looping). These results indicate that the assembly of enhancer complexes can be dissociated from eRNA production, suggesting that eRNA production occurs after the assembly of active enhancers. Finally, we show that an enhancer transcription “signature” based on GRO-seq data can be used for de novo enhancer prediction across cell types. Together, our studies shed new light on the activity of ESR1 at its enhancer sites and provide new insights about enhancer function in general.

Introduction

Enhancers are genomic regulatory elements that (1) carry sequence information for binding, (2) may be located far from TSSs, (3) regulate regardless of location and orientation, and (4) play key roles in controlling tissue-specific gene expression (Bulger and Groudine 2011; Ong and Corces 2011). Current models posit that enhancers function by promoting communication with target gene promoters through chromatin loops or by tracking of enhancer-bound transcription factors through intervening chromatin to target gene promoters (Bulger and Groudine 2011; Ong and Corces 2011; Kolovos et al. 2012).

Recent studies have focused intense interest on the properties of enhancers, beyond the binding of sequence-specific transcription factors, which might give clues to their mechanisms of action and aid in their identification. In this regard, histone modifications (e.g., H3 lysine 4 monomethyl, H3K4me1; H3 lysine 27 acetyl, H3K27ac), histone variants (e.g., H2A.Z), coactivators (e.g., EP300, CREBBP, Mediator), and an open chromatin architecture (e.g., DNase I hypersensitivity) have been identified as genomic features that mark or identify enhancers (Melgar et al. 2011; Natoli and Andrau 2012). Differential association of these features with enhancers in a given cell may define distinct classes of enhancers that specify distinct gene regulatory mechanisms and biological outcomes (Creyghton et al. 2010; Ghisletti et al. 2010; Rada-Iglesias et al. 2011; Wang et al. 2011; Zentner et al. 2011; Pham et al. 2012; Rada-Iglesias et al. 2012; Shen et al. 2012; Vahedi et al. 2012; Whyte et al. 2012; Xu et al. 2012; Ostuni et al. 2013). Enhancer profiles may even provide useful clinical signatures for cancer diagnosis and prognosis (Akhtar-Zaidi et al. 2012; Ross-Innes et al. 2012).

More recently, a number of studies have shown that many enhancers overlap with sites of RNA pol II loading, active RNA pol II transcription, and the production of enhancer RNAs (“eRNAs”) (De Santa et al. 2010; Kim et al. 2010; Hah et al. 2011; Wang et al. 2011; Djebali et al. 2012). A common signature of enhancer transcription is the production of short (i.e., ~1 to 2 kb) eRNAs that are transcribed bidirectionally (Kim et al. 2010). We and others have recently shown that the genomic binding sites for the estrogen receptor (ESR1) and other steroid hormone receptors overlap with sites of transcription (Hah et al. 2011; Wang et al. 2011). The role of transcription in enhancer function is unknown, but the act of transcription may help to create an open chromatin environment that promotes enhancer function (Natoli and Andrau 2012). Alternatively, the stable accumulation of eRNAs may play a functional, perhaps even structural, role and may facilitate gene looping (Orom et al. 2010; Orom and Shiekhattar 2011; Natoli and Andrau 2012; Lai et al. 2013; Melo et al. 2013).

In the studies described herein, we used Global Run-On Sequencing (GRO-seq), a method that assays the location and orientation of all active RNA polymerases genome-wide (Core et al. 2008), to generate a global profile of active transcription at ESR1 binding sites (ERBSs) in MCF-7 human breast cancer cells in response to a short time course of E2 treatment. We integrated the data from our GRO-seq assays with data from a variety of other genomic assays (e.g., ChIP-seq, DNase-seq, ChIA-PET) using a novel computational pipeline to provide a comprehensive and global view of ESR1 enhancers and their regulation by E2 in MCF-7 cells. Together, our studies have shed new light on the activity of ESR1 at its enhancer sites and provide new insights about enhancer function in general, including the potential roles of enhancer transcription.

Results

ESR1 enhancers are sites of estrogen-induced transcription

In a previous study using GRO-seq to characterize the estrogen-regulated transcriptome in MCF-7 cells, we identified hundreds of transcribed regions in the genome generating primary transcripts that overlap estrogen receptor binding sites (ERBSs) (Hah et al. 2011). In this paper, we have undertaken a comprehensive identification and analysis of ESR1 enhancer transcription in MCF-7 human breast cancer cells by integrating a wide array of genomic data sets with locus-specific molecular analyses. Multiple examples of transcribed ESR1 enhancers located upstream of estrogen-regulated target genes are shown in browser track representation in Fig. 1A and Supplemental Figs. 1 and 2. These include an ERBS, which we refer to as ERBS1, located ~20 kb upstream of the promoter of the P2RY2 gene, as well as these additional enhancer/gene pairs: ERBS2/GREB1, ERBS3/SBNO2, ERBS4/SMAD7, and ERBS5/PGR. We also include as a negative control an ERBS (ERBS6) that does not produce enhancer transcripts. As shown in the GRO-seq browser tracks in Fig. 1A, transcription of the P2RY2 gene and a region around ERBS1 is upregulated rapidly in a short time course of treatment with 17β-estradiol. The transcripts from ERBS1 (Fig. 1A), as well as ERBSs 2 through 5 (Supplemental Fig. 2), are produced bidirectionally from both strands of DNA, reminiscent of the enhancer RNAs (eRNAs) described previously (Kim et al. 2010), and the transcribed regions are associated with RNA pol II and previously identified transcription start sites (TSSs) (Yamashita et al. 2011).

As expected, these ERBSs are also associated with previously characterized enhancer features, including the pioneer transcription factor FOXA1, histone H3 lysine 4 monomethylation (H3K4me1), the histone acetyltransferases EP300 and CREBBP (a.k.a. p300 and CBP), p160 steroid receptor coactivator (NCOA1, 2, and 3), and an open chromatin architecture as defined by DNase-seq and formaldehyde-assisted isolation of regulatory elements (FAIRE)-seq (Fig. 1A and Supplemental Fig. 2). Like the ERBS eRNAs, many of these enhancer features are induced by treatment with E2. These ERBSs are also involved in chromatin looping events that promote physical interactions with their target genes, as defined by chromatin interaction analysis by paired-end tag sequencing (ChIA-PET; Fullwood et al. 2009) (Fig. 1A and Supplemental Fig. 2). Locus-specific molecular assays, including ChIP-qPCR, RT-qPCR, and 3C-PCR, confirm the localization of RNA pol II, H3K4me1, and H3K27ac at these ERBSs (Fig. 1, B and C; Supplemental Figs. 3A, 3B, 4A through 4D, and 5), as well as the steady-state production of the eRNAs (Fig. 1D; Supplemental Figs. 3C and 6A through 6E) and enhancer looping events (Fig. 1E; Supplemental Figs. 3D and 3E) as the genes are induced by E2. Interestingly, an ERBS not associated with eRNA production (ERBS6) lacks many of the enhancer features described above (Supplemental Figs. 1, 2, 4D, 5, and 6F). Collectively, these genomic and locus-specific analyses illustrate clearly the range of features associated with ESR1 enhancers, including the production of eRNAs.

Global identification and characterization of estrogen-regulated ESR1 enhancer transcripts

To obtain a global view of ESR1 enhancer transcripts, we developed a computational pipeline to identify eRNAs overlapping with ERBSs using GRO-seq data (Fig. 2A). Starting from a set of all ERBSs (~10,000) defined previously (Welboren et al. 2009), we narrowed the list to those that are intergenic (i.e., >10 kb away from the beginning or end of an annotated RefSeq gene; ~3,000), which allowed us to avoid ambiguities caused by eRNAs overlapping annotated transcribed regions. We then separated them into those that have (~1,500) and those that do not have (~1,500) an overlapping transcript, as defined by GRO-seq. Next, we classified those with overlapping transcripts based on the production of transcripts from both strands of DNA (“Paired”; 715) or from one strand of DNA (“Unpaired”; 882). Finally, we applied a length filter for the transcription unit/primary transcript, defining those <9 kb as “short” and those ≥9 kb as “long” to exclude those that might overlap with annotated promoters (Fig. 2, B and C). For the purposes of the remaining studies shown herein, we focused on two classes of enhancer transcripts: (1) short unpaired (S-U) and (2) short-short paired (S-S), which we call eRNAs. Many of the transcripts in the remaining classes (i.e., long unpaired, long-short paired, and long-long paired) do not resemble eRNAs as previously described (Kim et al. 2010) and are likely to represent other types of non-coding RNAs.
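The filtering and classification steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the `Interval` type, the function names, and the rule of taking the first overlapping transcript on each strand are hypothetical. Only the two thresholds (>10 kb from an annotated gene, and <9 kb for a "short" transcription unit) are taken from the text.

```python
from dataclasses import dataclass

INTERGENIC_MARGIN = 10_000  # ">10 kb away from the beginning or end" of a gene
SHORT_CUTOFF = 9_000        # units <9 kb are "short", >=9 kb are "long"


@dataclass(frozen=True)
class Interval:
    """Hypothetical container for a binding site, gene, or transcript."""
    chrom: str
    start: int
    end: int
    strand: str = "."


def overlaps(a, b):
    """Half-open interval overlap on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def gap(a, b):
    """Distance in bp between two intervals (0 if they overlap)."""
    return max(0, a.start - b.end, b.start - a.end)


def is_intergenic(site, genes):
    """True if the site is more than 10 kb from every gene on its chromosome."""
    return all(g.chrom != site.chrom or gap(site, g) > INTERGENIC_MARGIN
               for g in genes)


def length_class(t):
    return "short" if t.end - t.start < SHORT_CUTOFF else "long"


def classify_erbs(site, genes, transcripts):
    """Assign one ERBS to a transcript class (paired/unpaired, short/long)."""
    if not is_intergenic(site, genes):
        return "not intergenic"
    hits = [t for t in transcripts if overlaps(site, t)]
    plus = [length_class(t) for t in hits if t.strand == "+"]
    minus = [length_class(t) for t in hits if t.strand == "-"]
    if not plus and not minus:
        return "no transcript"
    if plus and minus:
        # Simplification: use the first overlapping transcript on each strand.
        return "-".join(sorted([plus[0], minus[0]])) + " paired"
    return (plus or minus)[0] + " unpaired"


# Toy annotation and transcript calls (made-up coordinates).
genes = [Interval("chr1", 100_000, 120_000, "+")]
transcripts = [
    Interval("chr1", 49_000, 51_000, "+"),
    Interval("chr1", 48_500, 50_200, "-"),
    Interval("chr1", 200_000, 203_000, "+"),
]
for site in (Interval("chr1", 49_900, 50_100), Interval("chr1", 201_000, 201_200)):
    print(site.start, classify_erbs(site, genes, transcripts))
```

In this toy input, the first site is overlapped by two sub-9 kb transcripts on opposite strands and falls in the short-short paired (S-S) class; the second is overlapped by a single short transcript and falls in the short unpaired (S-U) class.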

Production of the short unpaired and short-short paired ESR1 enhancer transcripts is regulated by E2 over a short time course of treatment (0, 10, 40, and 160 min.) (Fig. 2D). About 70 percent of the transcripts are E2 upregulated, with maximum effects for most upregulated transcripts occurring at 40 min. and for most downregulated transcripts occurring at 160 min. (Fig. 2D). The upregulation is evident in metaplots of the GRO-seq data (Fig. 2E) and corresponds to the levels of RNA pol II at the ERBS, as expected (Fig. 2, F and G). For the paired eRNAs, the extent of transcription from each strand is highly correlated with the other (Pearson’s R = 0.7470408, p-value < 2.2 × 10^-16; Supplemental Fig. 7), suggesting a common regulatory mechanism. Unbiased motif analyses indicate that ERBS with or without transcripts are highly enriched for predicted estrogen response elements and share some other motifs in common as well (e.g., SP1), which may function as tethering factors for ESR1 (Supplemental Fig. 8).
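The strand-to-strand comparison for paired eRNAs is a standard Pearson correlation. The sketch below shows the calculation on made-up read counts; the variable names and values are illustrative assumptions, and the text does not state whether the counts were transformed (e.g., log-scaled) before correlating.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical GRO-seq read counts for the two strands of six paired eRNAs.
plus_strand_reads = [12, 30, 45, 8, 60, 22]
minus_strand_reads = [10, 28, 50, 5, 55, 30]
r = pearson_r(plus_strand_reads, minus_strand_reads)
print(f"Pearson r = {r:.3f}")
```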

The production of eRNAs from ERBSs positively correlates with a wide variety of enhancer features

To better understand how the production of eRNAs from ERBSs may relate to enhancer function, we mined a large number of existing genomic data sets from MCF-7 cells (see Supplemental Table 1). Although all three classes of ERBSs that we examined (i.e., those with S-S paired eRNAs, short unpaired eRNAs, and no eRNAs) have similar mean and median levels of ESR1 binding by ChIP-seq (<2-fold difference; Fig. 3A), considerable differences were observed among these groups with respect to other enhancer properties. For example, ERBSs producing short bidirectional eRNAs (i.e., S-S paired transcripts) have significantly higher mean and median levels of pioneer factors (e.g., FOXA1; TFAP2C, a.k.a. AP2γ), ESR1 coregulators (e.g., CREBBP, EP300, NCOA1, NCOA2, and NCOA3), enhancer histone modifications (e.g., H3K4me1), as well as the most accessible chromatin structures (defined by DNase-seq and FAIRE-seq), than ERBS producing no transcripts (Wilcoxon rank sum test, p-value < 0.001; Fig. 3; Supplemental Fig. 9). Moderate differences in the levels of ESR1 binding across the three classes of ERBSs cannot account for the differences in transcript production, as ERBS without transcripts have disproportionately lower levels of GRO-seq signals than ERBS with transcripts (Supplemental Fig. 10A). Furthermore, ERBSs selected for similar levels of ESR1 binding show significant differences in the enrichment of enhancer features (Wilcoxon rank sum test, p-value < 0.001; Supplemental Fig. 10B). Together, these data indicate that the production of eRNAs at ERBS correlates with properties that are generally associated with active enhancers.
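The group comparisons above rely on the Wilcoxon rank sum test. For readers unfamiliar with it, the following is a minimal pure-Python sketch of the two-sided test using midranks for ties and a normal approximation without continuity correction. It is an illustration only: the function name and the toy signal values are assumptions, and the software and exact settings used in the study are not stated in the text.

```python
import math


def rank_sum_test(a, b):
    """Two-sided Wilcoxon rank sum test via the normal approximation.

    Returns (U statistic for sample a, z score, p-value).
    """
    n1, n2 = len(a), len(b)
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u = w - n1 * (n1 + 1) / 2
    n = n1 + n2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # all observations identical
        return u, 0.0, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # 2 * (1 - Phi(|z|))
    return u, z, p


# Hypothetical coregulator ChIP-seq signal at ERBSs with and without eRNAs.
with_erna = [5.1, 6.3, 7.2, 8.8, 9.4, 10.0, 11.5, 12.1]
without_erna = [1.0, 1.9, 2.5, 3.3, 3.9, 4.2, 4.8, 5.0]
u, z, p = rank_sum_test(with_erna, without_erna)
print(f"U = {u:.0f}, z = {z:.2f}, p = {p:.2g}")
```

For small samples an exact test is preferable to the normal approximation; the approximation is used here only to keep the sketch short.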

The production of eRNAs from ERBSs positively correlates with enhancer looping to target genes

Models of enhancer function implicate looping to target gene promoters as a key mechanistic step controlling gene expression. Indeed, ESR1 enhancers participate in extensive looping interactions (Fullwood et al. 2009), and estrogen-dependent looping from distal ERBSs correlates with estrogen-dependent target gene activation (Carroll et al. 2005; Pan et al. 2008) (Fig. 1, D and E). To determine how the production of eRNAs from ERBSs might relate to chromatin looping, we mined an existing ESR1 ChIA-PET data set (Fullwood et al. 2009), which maps ESR1-dependent looping on a genome-wide basis, for looping events between an intergenic ERBS and a target gene promoter. Specifically, we assayed for loops originating within a 2 kb window around the peak center of ERBSs with transcripts (i.e., either S-S paired or S-U) and looping to a 10 kb window around the TSSs of target genes (Fig. 4A). We then integrated the looping information with enhancer transcription data from our GRO-seq analyses, producing metaplots of GRO-seq reads for ERBS with or without loops. We observed that ~40% of ERBSs with an eRNA are involved in looping events versus ~20% of ERBSs without an eRNA. This represents a 2-fold enrichment (p < 0.0001), indicating a greater likelihood of looping when the enhancer is transcribed.
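The loop assignment and the fold enrichment reported above can be illustrated with a short sketch. It assumes, for simplicity, a single chromosome, loops given as pairs of anchor positions, and windows centered on the ERBS peak (±1 kb) and on the TSS (±5 kb); the centering, the function names, and all coordinates are assumptions made for the example. The statistical test behind the reported p-value is not specified in the text and is not reproduced here.

```python
ENHANCER_HALF_WINDOW = 1_000  # 2 kb window around the ERBS peak center (assumed centered)
PROMOTER_HALF_WINDOW = 5_000  # 10 kb window around the TSS (assumed centered)


def loops_to_promoter(erbs_center, loops, tss_positions):
    """True if any loop joins the ERBS window to a TSS window."""
    for anchor_a, anchor_b in loops:
        # A loop has no direction, so try both anchor assignments.
        for enh, prom in ((anchor_a, anchor_b), (anchor_b, anchor_a)):
            near_erbs = abs(enh - erbs_center) <= ENHANCER_HALF_WINDOW
            near_tss = any(abs(prom - tss) <= PROMOTER_HALF_WINDOW
                           for tss in tss_positions)
            if near_erbs and near_tss:
                return True
    return False


def looping_fraction(erbs_centers, loops, tss_positions):
    looped = sum(loops_to_promoter(c, loops, tss_positions) for c in erbs_centers)
    return looped / len(erbs_centers)


# Toy data: two TSSs, three ChIA-PET loops, five ERBSs per class.
tss_positions = [100_000, 500_000]
loops = [(20_500, 98_000), (300_200, 503_000), (700_000, 900_000)]
erbs_with_erna = [20_000, 300_000, 650_000, 800_000, 40_000]
erbs_without_erna = [300_900, 150_000, 250_000, 600_000, 950_000]

frac_with = looping_fraction(erbs_with_erna, loops, tss_positions)
frac_without = looping_fraction(erbs_without_erna, loops, tss_positions)
fold = frac_with / frac_without
print(f"{frac_with:.0%} vs {frac_without:.0%}: {fold:.1f}-fold enrichment")
```

The toy counts were chosen to reproduce the 40% versus 20% proportions quoted in the text, so the fold enrichment comes out at 2.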

We also observed that ERBSs looping to target gene promoters have significantly greater mean and median levels of enhancer transcription and RNA pol II loading than those ERBSs that do not loop to a promoter (Wilcoxon rank sum test, p-value < 0.001; Fig. 4B; Supplemental Fig. 11A). In addition, we assayed ~1,300 E2-upregulated target genes and found that those that are looped to from an intergenic ERBS have elevated levels of transcription, as defined by GRO-seq, compared to those that are not looped to from an intergenic ERBS (Fig. 4, C and D; Supplemental Fig. 12). Further integration with additional genomic data indicates that E2-dependent production of eRNAs from ERBSs also positively correlates in a significant manner with coregulator recruitment and an open chromatin structure, but not the levels of histone methylation, at the enhancers (Wilcoxon rank sum test, p-value < 0.001; Fig. 4, E and F; Supplemental Fig. 11, B through G). RAD21, a component of the cohesin complex, which is thought to facilitate gene looping (Kagey et al. 2010), is enriched at ERBSs that loop to target genes (Supplemental Fig. 13A). siRNA-mediated knockdown of RAD21 results in an ~2 to 3-fold increase in basal eRNA production, as well as a modest (~25% to 50%) reduction in E2-induced eRNA production at four ERBSs that we examined in gene-specific assays (Supplemental Fig. 13, B and C). These results suggest a functional link between E2-dependent production of eRNAs at ERBSs, enhancer looping, and target gene activation.

Inhibition of eRNA production by flavopiridol does not inhibit enhancer complex assembly or looping to target gene promoters

Our results have shown that the production of eRNAs from ERBSs correlates with many indicators of “active” enhancers (e.g., RNA pol II, coregulators, H3K4me1, looping to target gene promoters), but the precise role of eRNAs in enhancer function is unknown. To address the role of E2-induced eRNAs in ESR1 enhancer function, we used the small molecule drug flavopiridol (FP), an inhibitor of the CDK9 kinase of the P-TEFb complex (Chao et al. 2000), to block enhancer transcription and the stable, steady-state accumulation of eRNAs originating from ERBSs. MCF-7 cells were pretreated with FP for 1 hour prior to treatment with E2. In locus-specific assays using RT-qPCR, FP efficiently blocked the production and accumulation of eRNAs from all five of the ERBSs that we examined, as well as mRNAs from nearby E2-regulated target genes looped to from these enhancers (Fig. 5, A and B; Supplemental Fig. 6). This experimental system gave us the opportunity to examine the assembly of ESR1 enhancer complexes in the absence of eRNAs.

Treatment of MCF-7 cells with FP did not affect the E2-dependent binding of ESR1 or loading of RNA pol II at the ERBSs (Fig. 5, C and D; Supplemental Fig. 14, A through D), which occurred normally in the presence of the drug. Likewise, treatment with FP did not dramatically affect the recruitment of coregulators to the ERBSs (Fig. 5, E and F; Supplemental Fig. 15) or the levels of H3K4me1 and H3K27ac at the ERBSs (Fig. 5, G and H; Supplemental Fig. 14, E through H). Thus, although the production of eRNAs correlates well with markers of active enhancers, the production and stable accumulation of eRNAs are not required for the assembly of ESR1 enhancer complexes at ERBSs. Furthermore, treatment with FP did not affect E2-dependent looping between ERBSs and E2-regulated target genes (Fig. 5, I and J), indicating that the production and stable accumulation of eRNAs are not required for chromatin looping, at least under the conditions that we tested herein.

eRNAs defined by GRO-seq can be used to predict enhancers

As clearly illustrated in the data shown above, the production of eRNAs from ERBSs tracks very well with known features of active enhancers. As such, we reasoned that eRNA production can be used in the absence of any other genomic information to identify active enhancers de novo. We tested this hypothesis in the series of experiments described below.

First, we used a directed approach to determine whether any known motifs for sequence-specific DNA binding transcription factors are enriched in GRO-seq reads in MCF-7 cells. To this end, we mapped all occurrences of 78 transcription factor motifs using position weight matrices (PWMs) from the JASPAR database (Bryne et al. 2008) to the genome using FIMO (Grant et al. 2011), filtering for those located in intergenic regions (i.e., >10 kb away from the start or end of an annotated gene) (Fig. 6A). Seventy-five out of the 78 motifs overlapped at least one S-S paired or S-U transcript originating within a 2 kb window (± 1 kb) around the center of the motif. Then, for each occurrence of each motif, we collected all GRO-seq reads within a 1 kb window (± 0.5 kb) around the center of the motif and normalized them to the total number of occurrences of the motif. As expected, this analysis identified motifs for estrogen receptors (e.g., ESR1, ESR2) as dramatically enriched (i.e., well above the median) in GRO-seq reads in the E2-treated condition (Fig. 6, B and C).
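The read-counting and normalization step described above (reads within ± 0.5 kb of each motif center, divided by the number of motif occurrences) can be sketched as follows. The sketch assumes a single chromosome, reads reduced to single positions, and inclusive window edges; the function name, motif coordinates, and read positions are invented for the example.

```python
from bisect import bisect_left, bisect_right
from statistics import median


def reads_per_occurrence(motif_centers, sorted_reads, half_window=500):
    """Mean number of reads within +/- half_window bp of each motif center."""
    total = 0
    for center in motif_centers:
        lo = bisect_left(sorted_reads, center - half_window)
        hi = bisect_right(sorted_reads, center + half_window)
        total += hi - lo
    return total / len(motif_centers)


# Toy GRO-seq read positions and intergenic motif occurrences.
reads = sorted([100, 450, 500, 1499, 1500, 1501, 5000])
motif_occurrences = {
    "ESR1": [1000, 5000],
    "SP1": [400],
    "SPZ1": [9000],
}
signal = {name: reads_per_occurrence(centers, reads)
          for name, centers in motif_occurrences.items()}
median_signal = median(signal.values())
for name, value in signal.items():
    flag = "above median" if value > median_signal else "at or below median"
    print(f"{name}: {value:.1f} reads per occurrence ({flag})")
```

Comparing each motif with the median across motifs mirrors the "well above the median" criterion mentioned in the text, although the exact cutoff used in the study is not stated.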

Other mapped motifs (e.g., GABPA, KLF4, EGR1, PAX5, MYCN, and SP1) also showed enrichment in GRO-seq reads in patterns that were either constitutive or repressed by E2 (Fig. 6, B and C; Supplemental Fig. 16). In addition, we observed that some mapped motifs (e.g., SPZ1) showed no enrichment of GRO-seq reads, which served as a useful control in this experiment. These motifs show patterns of RNA pol II, H3K4me1, and CREBBP based on ChIP-seq data that correspond well with the GRO-seq results (Fig. 6D; Supplemental Fig. 17). Furthermore, ChIP-qPCR assays confirm the binding of GABPA, KLF4, and EGR1 at their cognate predicted enhancers, but not control regions lacking the respective motifs (Fig. 6, E and F; Supplemental Fig. 18). Together, the results from the directed search suggest that enhancers for a variety of transcription factors are active in MCF-7 cells and that GRO-seq can be used to identify them.

Second, we used an unbiased search for eRNA “signatures” to identify active enhancers in MCF-7 cells in the absence of any transcription factor binding or motif information. Specifically, we searched the called transcripts from our MCF-7 GRO-seq data for short, divergent, overlapping, intergenic eRNA pairs that fit the average signature obtained from ERBSs [see Fig. 2B(c)], limiting our search to S-S paired transcripts to make it as stringent as possible (Fig. 7A). We identified 771 occurrences that fit the criteria, which we further divided based on the presence of an ERBS within 1 kb of the center of the peak (202 with an ERBS and 569 without an ERBS). Given the stringency of our search criteria, this is unlikely to be a comprehensive set, but it demonstrates proof-of-principle. The putative enhancers that we identified without an ERBS are of particular interest, since they represent an unbiased identification that does not rely on transcription factor binding data. Interestingly, the transcription of many enhancers in this group is regulated by E2 (33% are downregulated and 13% are upregulated) as indicated by our GRO-seq data (Figs. 7B, top left, and 7C), even though they lack ESR1 binding, as determined by ChIP-seq. These results suggest that the estrogen signaling pathway can impact enhancers that lack ESR1 binding. The patterns of E2-dependent regulation of transcription for these enhancers mirror those of the nearest neighbor genes (Supplemental Fig. 19), as might be expected if there is a functional link between distal enhancers and nearby target genes.
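The signature search described above can be approximated by a simple scan over called transcripts. In the sketch below, a "divergent, overlapping" pair is taken to mean a minus-strand transcript and a plus-strand transcript that overlap, with the minus-strand unit extending further to the left and the plus-strand unit further to the right; this operational definition, the tuple layout, and the coordinates are assumptions for illustration, and transcripts are presumed to be intergenic and on one chromosome already. The <9 kb cutoff and the 1 kb ERBS distance come from the text.

```python
SHORT_CUTOFF = 9_000   # both members of a pair must be "short" (<9 kb)
ERBS_MAX_DIST = 1_000  # an ERBS within 1 kb of the pair center


def find_divergent_pairs(transcripts):
    """Return (minus, plus, center) for short, divergent, overlapping pairs.

    Each transcript is a (start, end, strand) tuple on one chromosome.
    """
    short = [t for t in transcripts if t[1] - t[0] < SHORT_CUTOFF]
    plus = [t for t in short if t[2] == "+"]
    minus = [t for t in short if t[2] == "-"]
    pairs = []
    for m in minus:
        for p in plus:
            # minus starts left of plus, plus starts inside minus,
            # and plus ends right of minus: divergent and overlapping.
            if m[0] < p[0] < m[1] < p[1]:
                center = (p[0] + m[1]) // 2  # midpoint of the overlap
                pairs.append((m, p, center))
    return pairs


def has_erbs(center, erbs_centers):
    return any(abs(center - e) <= ERBS_MAX_DIST for e in erbs_centers)


# Toy transcript calls and ERBS peak centers.
transcripts = [
    (1_000, 3_000, "-"), (2_800, 5_200, "+"),      # short divergent pair
    (19_000, 20_500, "-"), (20_000, 32_000, "+"),  # plus-strand unit too long
    (40_000, 42_000, "+"),                         # unpaired
]
erbs_centers = [3_500]
pairs = find_divergent_pairs(transcripts)
with_erbs = [p for p in pairs if has_erbs(p[2], erbs_centers)]
without_erbs = [p for p in pairs if not has_erbs(p[2], erbs_centers)]
print(len(pairs), len(with_erbs), len(without_erbs))
```

The nested loop is quadratic in the number of transcripts, which is acceptable for a sketch; a genome-scale implementation would sort the transcripts or use an interval index.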

We used the set of putative enhancers without an ERBS to align other genomic data to the center of the divergent eRNA overlap, which revealed features that would be expected of active enhancers, including an enrichment of RNA Pol II, CREBBP, and H3K4me1, as well as an open chromatin architecture, as determined by DNase-seq (Fig. 7B). In spite of the fact that the transcription of many enhancers in this group is dramatically reduced by E2 treatment (Fig. 7B, top left, and Fig. 7C), the other enhancer properties that we examined were largely unaffected (Fig. 7, B and C). This result mirrors what we observed with the ESR1 enhancers in the presence of FP, namely a preservation of enhancer features in spite of inhibited transcription.
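Aligning a genomic signal to a set of enhancer centers, as done above, amounts to building an average profile (metaplot). The following sketch bins read positions around each center and averages across centers. The flank size, bin size, function name, and data are assumptions chosen for the example; the text does not give the values used for the figures.

```python
def metaplot(centers, signal_positions, flank=2_000, bin_size=200):
    """Average binned signal in a window of +/- flank bp around each center."""
    n_bins = (2 * flank) // bin_size
    profile = [0.0] * n_bins
    for center in centers:
        window_start = center - flank
        for pos in signal_positions:
            offset = pos - window_start
            if 0 <= offset < 2 * flank:
                profile[offset // bin_size] += 1
    return [count / len(centers) for count in profile]


# Toy enhancer centers and ChIP-seq or DNase-seq read positions.
centers = [10_000, 50_000]
signal_positions = [10_000, 10_050, 9_000, 50_010, 48_000, 70_000]
profile = metaplot(centers, signal_positions)
for i, value in enumerate(profile):
    if value:
        left = -2_000 + i * 200
        print(f"bin starting at {left:+d} bp: {value:.1f} reads per enhancer")
```

Running the same function separately on the vehicle and E2-treated data sets would give the kind of side-by-side profiles used to judge whether a feature changes with treatment.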

To determine which transcription factors might underlie the predicted enhancers identified based on the called transcripts from GRO-seq, we performed de novo motif analyses on a 1 kb region around the center of the eRNA overlap using MEME (Bailey et al. 2009). The predicted motifs from MEME were then matched to known motifs using STAMP (Parks and Beiko 2010). As expected, the predicted enhancers with an ERBS were enriched for EREs, whereas those without an ERBS were not (Fig. 7D and data not shown). The predicted enhancers without an ERBS were enriched in motifs for transcription factors such as SP1, KLF4, Bicoid-related factors, and EGR1 (Fig. 7D). These same motifs were also enriched in our directed search (Fig. 6C; Supplemental Fig. 16), giving more confidence to this approach.

To determine how enhancer prediction using GRO-seq compares to enhancer prediction using H3K4me1 or H3K27ac ChIP-seq, we called enhancers using all three parameters under basal (i.e., –E2) conditions. We found that a greater percentage of enhancers called by GRO-seq have Pol II-dependent promoter looping events than those called by H3K4me1 or H3K27ac (Supplemental Fig. 20, A and B). In addition, enhancers called by GRO-seq have a greater enrichment of DNase-seq and CREBBP ChIP-seq signals than those called by H3K4me1 or H3K27ac (Supplemental Fig. 20C). Furthermore, DNase-seq and CREBBP ChIP-seq signals at enhancers called by eRNAs (and to a lesser extent H3K4me1) track better with Pol II looping than those called by H3K27ac (Supplemental Fig. 20D). Collectively, these results indicate that (1) enhancers called by GRO-seq are “active” and (2) the presence of eRNAs may more accurately predict active enhancers than H3K4me1 or H3K27ac. Finally, application of our GRO-seq-based enhancer prediction pipeline to data from mouse embryonic stem cells (Supplemental Fig. 21) further demonstrates the power of using GRO-seq to perform unbiased enhancer prediction in the absence of other genomic information.

Discussion

In the studies described herein, we integrated and analyzed a large number of genomic data sets using a novel computational pipeline to provide a comprehensive and global view of ESR1 enhancers in the MCF-7 human breast cancer cell line. Our results indicate that intergenic ERBSs produce enhancer transcripts that are similar to eRNAs described previously (Kim et al. 2010) and have the following properties: (1) they originate from one or both strands of DNA, with an average transcription unit (i.e., primary transcript) length of ~3 to 5 kb, (2) the majority are upregulated by short treatments with E2 (i.e., 10, 25, or 40 min.) with kinetics that, in many cases, precede or match the induction of the target gene, and (3) they are detectable by RT-qPCR using either random hexamer or oligo(dT) primers for RT, suggesting that they may be polyadenylated, but perhaps minimally so since they do not give strong signals in poly(A) RNA-seq data sets from MCF-7 cells (data not shown).

Relation of eRNA production to features of active enhancers

The enrichment of a number of genomic features has been proposed to mark active enhancers, including H3K4me1, H3K27ac, EP300/CREBBP, RNA pol II, and an open chromatin architecture (reviewed in Maston et al. 2012; Natoli and Andrau 2012). Our results indicate that the production of eRNAs at ERBSs strongly correlates with the enrichment of these features and, in fact, may be a good indicator of active enhancers. In the absence of eRNA production, strong enrichment of these features is not observed, even though ESR1 binding is evident. ERBSs producing eRNAs are also enriched for enhancer-promoter loops. The formation of such loops is thought to play an important role in the communication between signal-dependent enhancers and their regulated target genes (Fullwood et al. 2009; Bulger and Groudine 2011). Indeed, target genes that are looped to by an intergenic ERBS show greater levels of transcription than those that are not looped to by an intergenic ERBS. Interestingly, the enrichment of coregulators (e.g., CREBBP, EP300, NCOAs) correlates well with enhancer looping, whereas the enrichment of H3K4me1, H3K4me2, and H3K4me3 does not. Thus, histone H3K4 methylation at enhancers is not a good predictor of ERBS looping to target gene promoters, even though it clearly marks ERBSs.

Our results indicate that although ESR1 may bind to some enhancer “hot spots” (i.e., genomic locations with pre-opened chromatin where a number of different sequence-specific DNA-binding transcription factors bind; Li et al. 2011), ESR1 can also bind to genomic locations with closed chromatin, inducing chromatin opening in an E2-dependent manner. This is suggested by the decrease in H3K4me1 levels upon ESR1 binding (Fig. 3E), is evident from our analysis of the DNase-seq data (Fig. 3G), and is consistent with previous reports for ESR1 action at enhancers (He et al. 2012). In this regard, we observed a unimodal enrichment of H3K4me1, 2, and 3 at ERBSs, which colocalizes with the site of ESR1 binding (Fig. 3, E and F; Supplemental Fig. 8E), in a region thought to be occupied by a nucleosome that is remodeled upon ESR1 binding (Wang et al. 2011; He et al. 2012). In contrast, studies of other transcription factors have revealed a bimodal enrichment pattern with transcription factor binding in the middle, in a region thought to be nucleosome-free (He et al. 2010; Kim et al. 2010; Wang et al. 2011; He et al. 2012). Nucleosome remodeling events at steroid receptor enhancers are a common theme, as illustrated by ESR1, , and , although the precise mechanisms of binding and nucleosome rearrangements may differ (He et al. 2010; John et al. 2011; He et al. 2012).

The assembly of enhancer complexes can be dissociated from eRNA production

The functions of enhancer transcription and of the stable, steady-state accumulation of eRNAs are unknown. Some have suggested that the act of transcription helps to create an open chromatin environment that promotes enhancer function, while others have suggested that the stable accumulation of eRNAs may play a functional, perhaps even structural, role (Bulger and Groudine 2011; Orom and Shiekhattar 2011; Maston et al. 2012; Natoli and Andrau 2012). Two aspects of our results shed some light on these questions, as well as on the order of operations at signal-regulated enhancers. First, using the drug flavopiridol (FP), we showed that many features of enhancers, including the assembly of enhancer complexes and the modification of histones, can be dissociated from eRNA production. FP blocks transcription elongation, but not preinitiation complex assembly or transcription initiation, so it targets a specific aspect of enhancer function. As we observed, FP efficiently blocked enhancer transcription and the stable, steady-state accumulation of eRNAs at ERBSs, but had no effect on any of the E2-dependent enhancer features that we examined, including enhancer-promoter looping. Second, in our unbiased search for enhancers, we identified a large set of putative enhancers that are actively transcribed, but lack ERBSs. Transcription of these enhancers was dramatically reduced by treatment with E2, in many cases being completely abrogated. However, as observed in the experiments with FP, inhibition of the transcription of these enhancers had, in most cases, little or no effect on the enhancer features that we monitored. Together, these results clearly show that the assembly of enhancer complexes can be dissociated from eRNA production, suggesting that eRNA production occurs after the assembly of active enhancers. These results are consistent with previous results showing that eRNAs are not expressed, despite normal levels of RNA pol II loading at enhancers, when the cognate promoter is removed (Kim et al. 2010). They do not, however, suggest that eRNA production is unnecessary for enhancer function or target gene activation. In fact, FP also inhibits target gene activation, so the potential role of eRNA production in that aspect of enhancer function could not be assayed in our studies. Further studies will be required to resolve these issues.

Unbiased prediction of enhancers based on eRNA production

Our genomic analyses have defined an enhancer transcription “signature” based on GRO-seq data (i.e., short, bi-directional transcription units) that can be used for de novo enhancer prediction. Using GRO-seq-defined transcription and the enhancer transcription signature, one can query any cell for active enhancers in the absence of any histone modification or transcription factor binding information from ChIP-seq, as we did for MCF-7 cells and mES cells. The power of this approach is the ability to use a single genomic data set (i.e., GRO-seq) for each cell type, which would allow profiling of enhancers across a large collection of cell lines or tissues. Then, using transcription factor binding site motif analyses, one could work backwards to determine which transcription factors may underlie those enhancers (see Fig. 7D, for example), which after selection could be assayed by ChIP-seq. Collectively, our results indicate that (1) enhancers called by GRO-seq are “active” and (2) the presence of eRNAs may more accurately predict active enhancers than other commonly used enhancer marks.

Methods

Additional details about all of the methods listed below, information about the antibodies used, and additional methods for the analysis of GRO-seq data and other genomic data can be found in the supplemental materials.

Cell culture and treatments

MCF-7 cells were plated for experiments in phenol red-free MEM (Sigma) supplemented with 5% charcoal-dextran treated calf serum (CDCS) prior to treatment. As indicated, the cells were treated with 100 nM E2 for the times specified, with or without a 1 hour pre-treatment with 1 μM flavopiridol (Sigma).

Locus-specific molecular assays: ChIP-qPCR, RT-qPCR, and 3C-PCR

Chromatin immunoprecipitation-quantitative PCR (ChIP-qPCR) (Kininis et al. 2007), reverse transcription-quantitative PCR (RT-qPCR) (Hah et al. 2011), and chromosome conformation capture-PCR (3C-PCR) (Pan et al. 2008) were performed essentially as described previously using MCF-7 cells and locus-specific primers designed to detect the end products of interest by qPCR or PCR (see Supplemental Table S2). Each experiment was performed a minimum of three times with independent biological samples to ensure reproducibility.

GRO-seq

Global run-on sequencing (GRO-seq) was performed as described previously (Hah et al. 2011) from two biological replicates of E2-treated (0, 10, and 40 min.) MCF-7 cells. The data sets, which were used for all genome-scale analyses, are available from NCBI/GEO using accession number GSE43836. For some gene-specific analyses showing genome browser tracks, we also used data from GRO-seq libraries generated from other E2 treatment time points (e.g., 25 min; GEO accession number GSE41324).

Analysis of GRO-seq data

GRO-seq data were analyzed using software described previously (Hah et al. 2011) and the approaches described below. Software, scripts, and other information can be obtained by contacting W. Lee Kraus. Read alignment, transcript calling, determination of regulation by E2, and the generation of heat maps were performed as described previously (Hah et al. 2011).

Defining Classes of ESR1 enhancer transcripts (eRNAs). The repertoire of genomic ESR1 binding sites (ERBSs) was extracted from ChIP-seq data provided in (Welboren et al. 2009) (GEO accession number GSM365926). Those ERBSs >10 kb away from the 5’ or 3’ ends of annotated genes were defined as “Intergenic ERBSs.” They were divided into three classes based on the presence, location, and orientation of GRO-seq-defined transcripts: (1) those overlapping transcripts originating from both strands of DNA, running in opposite directions as a divergent pair; (2) those overlapping a transcript originating from one strand of DNA only (“Unpaired”); and (3) those not overlapping a transcript. The transcripts in classes 1 and 2 were further categorized based on the length of the transcript unit/primary transcript as ‘short’ (length < 9 kb; eRNAs, by our definition) and ‘long’ (length > 9 kb; which likely represent other classes of non-coding RNAs, such as lncRNAs).
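The classification rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used in this study: the `Interval` type and the function names are hypothetical, transcripts are treated as simple strand-tagged intervals, the longest overlapping transcript on each strand is used when several overlap, and the divergent geometry of paired transcripts is not checked (only the presence of transcripts on both strands).

```python
from dataclasses import dataclass

SHORT_MAX_BP = 9_000        # length cutoff separating 'short' from 'long' (from the text)
INTERGENIC_MIN_BP = 10_000  # minimum distance from annotated gene ends (from the text)


@dataclass(frozen=True)
class Interval:
    """Hypothetical container: 0-based start, exclusive end, optional strand."""
    chrom: str
    start: int
    end: int
    strand: str = "."


def overlaps(a, b):
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def gap(a, b):
    """Distance in bp between two intervals (0 if they overlap); None across chromosomes."""
    if a.chrom != b.chrom:
        return None
    return max(0, a.start - b.end, b.start - a.end)


def is_intergenic(erbs, genes):
    """True if the ERBS lies more than 10 kb from every annotated gene."""
    for gene in genes:
        d = gap(erbs, gene)
        if d is not None and d <= INTERGENIC_MIN_BP:
            return False
    return True


def length_class(transcript):
    return "short" if (transcript.end - transcript.start) < SHORT_MAX_BP else "long"


def classify_erbs(erbs, transcripts):
    """Return 'no transcript', '<len> unpaired', or '<len>-<len> paired'."""
    hits = [t for t in transcripts if overlaps(erbs, t)]
    plus = [t for t in hits if t.strand == "+"]
    minus = [t for t in hits if t.strand == "-"]
    size = lambda t: t.end - t.start
    if plus and minus:
        # Assumption: use the longest overlapping transcript on each strand.
        labels = sorted([length_class(max(plus, key=size)),
                         length_class(max(minus, key=size))])
        return "-".join(labels) + " paired"
    if plus or minus:
        return length_class(max(plus or minus, key=size)) + " unpaired"
    return "no transcript"
```

For example, an ERBS overlapped by a 2.9 kb plus-strand transcript and a 3.15 kb minus-strand transcript would be labeled "short-short paired" by this sketch.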

Analysis of ChIP-seq data

ChIP-seq datasets from MCF-7 cells were obtained from the NCBI Gene Expression Omnibus (GEO) (http://www.ncbi.nlm.nih.gov/geo/) and the ArrayExpress (http://www.ebi.ac.uk/arrayexpress/) on-line databases (see Supplemental Table 1). The raw files were aligned to hg18 using BOWTIE (Langmead et al. 2009). Uniquely mappable reads were converted into bigWig files using BEDTools (Quinlan and Hall 2010) for visualization in the UCSC genome browser.

Analysis of ChIA-PET data

We used the ESR1-dependent intra-chromosomal loops defined in the ChIA-PET data set from (Fullwood et al. 2009) to examine whether the ERBSs defined by (Welboren et al. 2009) (1) are involved in loops and (2) loop to the promoters of 19,008 unique RefSeq genes. We assayed for loops within a 2 kb window (± 1 kb) around ESR1 peak centers and a 10 kb window (± 5 kb) around the transcription start sites (TSSs) of genes. We used GRO-seq data to determine the amount of transcription at the ERBSs and the target genes involved in looping events. Various metaplot representations of GRO-seq and ChIP-seq data were used to compare the properties of ERBSs and genes with or without looping events.
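The window-based looping test described above can be illustrated with a short Python sketch. It assumes, for illustration only, that each ChIA-PET loop is given as a pair of anchor intervals ("head" and "tail") on the same chromosome as the ERBS and the TSS; the function names and the tuple representation are hypothetical and are not taken from the software used in this study.

```python
ERBS_HALF_WINDOW = 1_000  # ± 1 kb around the ESR1 peak center (from the text)
TSS_HALF_WINDOW = 5_000   # ± 5 kb around the TSS (from the text)


def window(center, half):
    """Closed-open window [center - half, center + half), clipped at 0."""
    return (max(0, center - half), center + half)


def intervals_overlap(a, b):
    return a[0] < b[1] and b[0] < a[1]


def loops_to_promoter(peak_center, tss, loop):
    """True if one loop anchor falls in the ERBS window and the other in the TSS window.

    `loop` is a (head, tail) pair of (start, end) tuples; all coordinates are
    assumed to be on the same chromosome.
    """
    head, tail = loop
    erbs_win = window(peak_center, ERBS_HALF_WINDOW)
    tss_win = window(tss, TSS_HALF_WINDOW)
    return ((intervals_overlap(head, erbs_win) and intervals_overlap(tail, tss_win))
            or (intervals_overlap(tail, erbs_win) and intervals_overlap(head, tss_win)))


def is_looped_erbs(peak_center, tss_positions, loops):
    """True if the ERBS loops to at least one of the supplied TSSs."""
    return any(loops_to_promoter(peak_center, tss, loop)
               for tss in tss_positions for loop in loops)
```

Checking both anchor orientations reflects the assumption that the head/tail labels of a loop carry no information about which end is the enhancer.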

Genomic data analysis and visualization

We used metaplots to illustrate the distribution of GRO-seq, ChIP-seq, and ChIA-PET reads around ESR1 peak maxima (or other genomic features) using the metaplot function in our GRO-seq package (Hah et al. 2011). We also used boxplots to minimize the bias caused by outliers in the data, which can overly influence metaplot representations. The interquartile ranges (IQRs) of the boxplots were used to plot metaplots centered on the ERBSs.
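As a rough illustration of the two representations described above, the following Python sketch computes a binned average read profile around a set of site centers and selects the sites whose signal falls within the interquartile range. It is not the metaplot function of the GRO-seq package; the function names, the single-chromosome read list, and the default window and bin sizes are assumptions made for the example.

```python
from bisect import bisect_left
from statistics import quantiles


def metaplot(read_positions, centers, half_window=1000, bin_size=100):
    """Mean read count per bin across sites, in bins spanning center ± half_window.

    Reads and centers are assumed to lie on a single chromosome.
    """
    reads = sorted(read_positions)
    n_bins = (2 * half_window) // bin_size
    totals = [0] * n_bins
    for center in centers:
        left = center - half_window
        lo = bisect_left(reads, left)
        hi = bisect_left(reads, left + n_bins * bin_size)
        for pos in reads[lo:hi]:
            totals[(pos - left) // bin_size] += 1
    if not centers:
        return totals
    return [t / len(centers) for t in totals]


def sites_within_iqr(site_values):
    """Indices of sites whose value lies between the first and third quartiles.

    Requires at least two values; quartiles use the 'inclusive' method.
    """
    q1, _, q3 = quantiles(site_values, n=4, method="inclusive")
    return [i for i, v in enumerate(site_values) if q1 <= v <= q3]
```

Restricting a profile to the sites returned by `sites_within_iqr` is one way to keep a handful of extreme sites from dominating the average, which is the concern raised in the text.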

Predicting enhancers based on GRO-seq data

We used GRO-seq data combined with DNA-binding transcription factor motif information to predict active enhancers with directed and unbiased approaches.

Directed search for enhancers based on GRO-seq data. Motifs for all curated, non-redundant, vertebrate transcription factors (130 total) in the JASPAR database (Bryne et al. 2008) were searched using FIMO (Grant et al. 2011) with a p-value threshold of 1 × 10⁻⁶; 78 out of 130 motifs were identified and mapped in hg18 at this p-value. From the total set of identified motifs, we selected only intergenic motifs (>10 kb from RefSeq genes) that have eRNAs originating within a 2 kb window around the center of the motif (i.e., ± 1 kb relative to the motif). For each occurrence of a specific intergenic motif with an overlapping eRNA that we selected, we collected all GRO-seq reads within a 1 kb window around the center of the motif (i.e., ± 0.5 kb relative to the motif) and normalized them to the total number of occurrences of the motif.
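The per-motif normalization described above can be sketched as follows. This is a simplified illustration, not the code used in this study: all coordinates are assumed to be on one chromosome, the intergenic filter is assumed to have been applied already, the function names are hypothetical, and the denominator is taken to be the number of selected motif occurrences (those with an eRNA start within ± 1 kb), which is one reading of "the total number of occurrences of the motif."

```python
from bisect import bisect_left


def count_in_range(sorted_positions, start, end):
    """Number of positions p with start <= p < end (input must be sorted)."""
    return bisect_left(sorted_positions, end) - bisect_left(sorted_positions, start)


def reads_per_occurrence(motif_centers, erna_starts, read_positions,
                         erna_half_window=1000, read_half_window=500):
    """GRO-seq reads within ± 0.5 kb of selected motif centers, per selected occurrence."""
    starts = sorted(erna_starts)
    reads = sorted(read_positions)
    # Keep motif occurrences with an eRNA originating within ± 1 kb of the center.
    selected = [c for c in motif_centers
                if count_in_range(starts, c - erna_half_window,
                                  c + erna_half_window + 1) > 0]
    if not selected:
        return 0.0
    total = sum(count_in_range(reads, c - read_half_window, c + read_half_window + 1)
                for c in selected)
    return total / len(selected)
```

Comparing this value between the untreated and E2-treated read sets for each motif gives the kind of per-occurrence read density contrasted in the directed search.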

Unbiased search for enhancers based on GRO-seq data. We searched our MCF-7 GRO-seq data sets for sites in the genome expressing intergenic (>10 kb away from the start or end of an annotated RefSeq gene) short-short paired eRNAs using the following parameters: primary transcript length shorter than 9 kb and an average overlap of 3 kb. All occurrences that fit these criteria were collected and subjected to motif analysis. De novo motif searching was performed on a 1 kb region around the center of the plus and minus strand overlap (+/- 500 bp) using the command-line version of MEME (Bailey and Elkan 1994). The predicted motifs from MEME were matched to known motifs using STAMP (Parks and Beiko 2010).
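A minimal sketch of how short-short paired transcripts might be collected and how the 1 kb motif-search window could be derived is shown below. It rests on several assumptions: transcripts are (start, end) tuples on a single chromosome, the intergenic filter has already been applied, a pair is any plus-strand and minus-strand transcript that overlap by at least 1 bp, and the "average overlap of 3 kb" criterion is not modeled. The function names are hypothetical.

```python
SHORT_MAX_BP = 9_000     # primary transcript length cutoff (from the text)
MOTIF_HALF_WINDOW = 500  # +/- 500 bp around the overlap center (from the text)


def short_short_pairs(plus_transcripts, minus_transcripts):
    """Yield (plus, minus) pairs of short transcripts that overlap on opposite strands."""
    for p in plus_transcripts:
        if p[1] - p[0] >= SHORT_MAX_BP:
            continue
        for m in minus_transcripts:
            if m[1] - m[0] >= SHORT_MAX_BP:
                continue
            if p[0] < m[1] and m[0] < p[1]:
                yield p, m


def motif_search_window(plus, minus, half_window=MOTIF_HALF_WINDOW):
    """1 kb window centered on the midpoint of the plus/minus strand overlap."""
    overlap_start = max(plus[0], minus[0])
    overlap_end = min(plus[1], minus[1])
    center = (overlap_start + overlap_end) // 2
    return (center - half_window, center + half_window)
```

The sequences under the returned windows would then be extracted and passed to a de novo motif finder; that step is omitted here because it depends on the genome files and tool version in use.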

Data Access

The new GRO-seq data sets described herein are available from NCBI/GEO (http://www.ncbi.nlm.nih.gov/geo/) using accession number GSE43836.

Acknowledgements

We thank TK Kim, Xin Luo, Miao Sun, and Hector Franco for critical comments on this manuscript. This work was supported by an NIH training award (T32HD052471) and a postdoctoral fellowship from the PhRMA Foundation to C.G.D. and a grant from the NIH/NIDDK (DK058110) to W.L.K.

Disclosure Declaration

The authors have no conflicts of interest to disclose.

Figure Legends

Figure 1. The ESR1 enhancer of the estrogen-responsive P2RY2 gene produces bi-directional transcripts in MCF-7 cells.

(A) Browser tracks of GRO-seq, ChIP-seq (Pol II, ESR1, FOXA1, and H3K4me1), ChIA-PET, TSS locations, and gene annotation for P2RY2 and its distal ESR1 binding site (ERBS1). The data are from MCF-7 cells treated with a time course of E2 (GRO-seq) or a single time point of E2 (45 or 60 min.). TSSs identified previously based on a published data set from MCF-7 cells (Yamashita et al. 2011) are located as indicated. The locations of primers used for 3C assays are indicated by orange arrows. The black bars shown for the ChIA-PET data indicate the “head” and “tail” making contact in the gene loops, which are indicated by the dotted black lines. Scale bars show the length of the indicated region. A more detailed set of genomic data for P2RY2, as well as data for additional enhancer/gene pairs, can be found in Supplemental Figs. 1 and 2.

(B and C) ChIP-qPCR analyses showing recruitment of ESR1 and Pol II (B) or levels of H3K4me1, me3, and H3 (C) at ERBS1 in response to a time course of E2 treatment. Each bar represents the mean + the SEM for three or more independent biological replicates.

(D) RT-qPCR analyses showing the expression of ERBS1 eRNA and P2RY2 mRNA in response to a time course of E2 treatment. Each bar represents the mean + the SEM for three or more independent biological replicates.

(E) 3C-PCR assay showing E2-induced looping between ERBS1 and the P2RY2 gene. The lower case letters correspond to the primers denoted by orange arrows shown in panel A. The assays were conducted in the presence (experimental) or absence (control) of DNA ligase, as indicated. Digested and ligated bacterial artificial chromosome (BAC) DNA spanning the entire P2RY2 locus was used as a PCR control. The size of the PCR fragments in bp is shown. One representative experiment from three conducted is shown.

Figure 2. Genome-wide identification of ESR1 enhancer transcripts in MCF-7 cells using GRO-seq.

(A) Flowchart of ERBS classification in MCF-7 cells based on genomic location, eRNA production, and length of the transcribed region based on ChIP-seq and GRO-seq.

(B) Schematics of average transcribed regions overlapping ERBSs in MCF-7 cells in five classes: (a) short unpaired, (b) long unpaired, (c) short-short paired, (d) long-short paired, and (e) long-long paired. “Short” and “Long” indicate a transcribed region <9 kb or ≥9 kb, respectively. Red and blue boxes indicate transcription from opposite strands.

(C) Graphical representation of the positions and orientations of eRNAs (indicated by red and blue lines) relative to ESR1 binding sites (indicated by yellow oval and line) for unpaired and paired eRNAs in MCF-7 cells. The position relative to the ERBS is indicated in kb. (a) through (e) correspond to the categories shown in panel B. Red and blue lines indicate transcription from opposite strands.

(D) Heat map showing the expression of E2-regulated short unpaired (S-U) and short-short paired (S-S) eRNAs over a time course of E2 treatment in MCF-7 cells based on GRO-seq data. The data were median centered and scaled to the 0 min. time point. Yellow and blue indicate upregulated and downregulated transcripts, respectively. Only unique transcripts are shown (i.e., those transcripts that overlap more than one ERBS are represented once).

(E) Metaplot analyses of GRO-seq reads surrounding ERBS associated with short-short paired transcripts, short unpaired transcripts, or no transcripts in MCF-7 cells ± E2 treatment.

(F) Metaplot analyses of Pol II ChIP-seq reads surrounding ERBS associated with short-short paired transcripts, short unpaired transcripts, or no transcripts in MCF-7 cells ± E2 treatment.

(G) Box plot representation of GRO-seq and Pol II ChIP-seq reads associated with short-short paired transcripts (S-S), short unpaired transcripts (S-U), or no transcripts in MCF-7 cells ± E2 treatment.

Figure 3. The production of eRNAs from ERBSs positively correlates with the recruitment of coactivators, the levels of histone modifications, and the chromatin state in MCF-7 cells.

Browser tracks, metaplots, and boxplots showing a positive correlation between eRNA production at ERBSs and known markers of enhancer function. (Left two panels) Browser track representations of coactivator or histone modification ChIP-seq data, or DNase-seq data, as indicated on the y-axis for ERBS1 and ERBS6. (Middle three panels) Metaplot analyses of ChIP-seq or DNase-seq read counts for sets of ERBSs with short-short paired, short unpaired, or no transcripts in the presence (green line) or absence (black line) of E2 treatment. (Right panel) Box plot representations of ChIP-seq or DNase-seq data for sets of ERBSs with short-short paired (blue boxes), short unpaired (maroon boxes), or no transcripts (yellow boxes) in the presence (+) or absence (-) of E2 treatment.

(A) ESR1 ChIP-seq.

(B) FOXA1 ChIP-seq.

(C) CREBBP ChIP-seq.

(D) NCOA3 ChIP-seq.

(E) H3K4me1 ChIP-seq.

(F) H3K4me3 ChIP-seq.

(G) DNase-seq.

Figure 4. The production of eRNAs from ERBSs positively correlates with enhancer looping to target genes in MCF-7 cells.

(A and C) Schematics of the looping analyses based on ESR1 ChIA-PET data. ERBSs (enhancers) were queried to determine if they loop to the promoters of RefSeq genes, based on ESR1 ChIA-PET data. Looping was assayed within a 2 kb (± 1 kb) window around ESR1 peak centers and a 10 kb (± 5 kb) window around the TSSs of target genes.

(B) Metaplots and boxplots for GRO-seq (top) and Pol II ChIP-seq (bottom) data for ERBSs that either loop to (Loop, red lines) or do not loop to (No Loop, blue lines) target gene promoters.

(D) Metaplots and boxplots for GRO-seq (top) and Pol II ChIP-seq (bottom) data for target gene promoters that either are looped to (Loop, red lines) or are not looped to (No Loop, blue lines) by ERBSs.

(E) Metaplots and boxplots for CREBBP ChIP-seq (top) and NCOA3 ChIP-seq (bottom) data for ERBSs that either loop to (Loop, red lines) or do not loop to (No Loop, blue lines) target gene promoters.

(F) Metaplots and boxplots for H3K4me1 ChIP-seq (top) and DNase-seq (bottom) data for ERBSs that either loop to (Loop, red lines) or do not loop to (No Loop, blue lines) target gene promoters.

Figure 5. Inhibition of eRNA production by flavopiridol does not inhibit ESR1, Pol II, or coregulator binding, alter H3K4me1 or H3K27ac levels, or prevent enhancer looping at ERBSs in MCF-7 cells.

Locus-specific assays for E2-responsive enhancers showing the effects of a 1 hour pre-treatment with flavopiridol (FP) on various molecular outcomes in MCF-7 cells. Each bar represents the mean + the SEM for three or more independent biological replicates.

(A and B) Treatment with flavopiridol inhibits the E2-dependent production and steady-state accumulation of eRNAs and target gene mRNAs. RT-qPCR analyses for selected eRNAs and mRNAs in response to E2 treatment. (A) ERBS1 eRNA/P2RY2 mRNA and (B) ERBS2 eRNA/GREB1 mRNA.

(C and D) ChIP-qPCR analyses for ESR1 (left) and Pol II (right) for (C) ERBS1 and (D) ERBS2 in the absence or presence of E2 and flavopiridol, as indicated.

(E and F) ChIP-qPCR analyses for CREBBP (left), EP300 (middle), and Pol II (right) for (E) ERBS1 and (F) ERBS2 in the absence or presence of E2 and flavopiridol, as indicated.

(G and H) ChIP-qPCR analyses for H3K4me1 (left), H3K27ac (middle), and H3 (right) for (G) ERBS1 and (H) ERBS2 in the absence or presence of E2 and flavopiridol, as indicated.

(I and J) 3C-PCR analyses showing that looping between distal ERBSs and target genes in the presence of E2 is not blocked by flavopiridol (FP). (I) ERBS1/P2RY2 and (J) ERBS2/GREB1. The lower case letters correspond to the primers denoted by orange arrows shown in Fig. 1A. The assays were conducted in the presence (experimental) or absence (control) of DNA ligase, as indicated. The size of the PCR fragments in bp is shown. One representative experiment from three conducted is shown.

Figure 6. Directed search for ESR1 and non-ESR1 enhancers in MCF-7 cells using GRO-seq data.

(A) Schematic of the directed enhancer search using GRO-seq data. Seventy-eight motifs from the JASPAR database were mapped to the human genome using FIMO. For all intergenic motifs (>10 kb from RefSeq genes) with eRNAs (either short-short paired or short unpaired) originating within a 2 kb window around the center of the motif (i.e., ± 1 kb relative to the motif), we collected the GRO-seq reads within a 1 kb window around the center of the motif (i.e., ± 0.5 kb relative to the motif) and normalized them to the total number of occurrences of the motif.

(B) Bar graph showing the normalized GRO-seq read count density per occurrence for 9 selected motifs from the JASPAR database ±E2.

(C) Web logos for the 9 selected motifs shown in panel B generated using the JASPAR position weight matrices (PWMs).

(D) Metaplots of Pol II, H3K4me1, and CREBBP ChIP-seq data from MCF-7 cells treated with (green lines) or without (black lines) E2 for 4 selected motifs from panel B.

(E and F) ChIP-qPCR assays of KLF4 (E) and EGR1 (F) binding (left panels) and enhancer-associated histone modifications (H3K4me1 and H3K27ac; right panels) at cognate predicted binding sites. Each bar represents the mean + SEM for three or more independent biological replicates.

Figure 7. Unbiased search for ESR1 and non-ESR1 enhancers in MCF-7 cells using GRO-seq data.

(A) Schematic of the unbiased enhancer search using GRO-seq data. All intergenic (>10 kb away from the start or end of an annotated RefSeq gene) short-short paired eRNAs less than 9 kb and with an average overlap of 3 kb were identified in the set of called transcripts from MCF-7 cells.

(B) GRO-seq data, Pol II, ESR1, CREBBP, H3K4me1, and RAD21 ChIP-seq data, and DNase-seq data (as indicated) from the analysis described in panel A were collected, mapped relative to the center of the plus and minus strand overlap of the short-short paired eRNAs, and expressed as metaplots.

(C) Browser tracks of GRO-seq and selected ChIP-seq data for two enhancers without ERBSs identified in the unbiased search described in panel A.

(D) Web logos and statistical parameters for the top motifs identified in a search of enhancers identified in panel A. All occurrences of the short-short paired transcripts were collected and subjected to motif analysis. De novo motif searching was performed on a 1 kb region around the center of the plus and minus strand overlap (+/- 500 bp) using MEME. The predicted motifs from MEME were matched to known motifs using STAMP.

References

Akhtar-Zaidi B, Cowper-Sal-lari R, Corradin O, Saiakhova A, Bartels CF, Balasubramanian D, Myeroff L, Lutterbaugh J, Jarrar A, Kalady MF et al. 2012. Epigenomic enhancer profiling defines a signature of colon cancer. Science 336(6082): 736-739.

Bailey TL, Boden M, Buske FA, Frith M, Grant CE, Clementi L, Ren J, Li WW, Noble WS. 2009. MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res 37(Web Server issue): W202-208.

Bailey TL, Elkan C. 1994. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc Int Conf Intell Syst Mol Biol 2: 28-36.

Bryne JC, Valen E, Tang MH, Marstrand T, Winther O, da Piedade I, Krogh A, Lenhard B, Sandelin A. 2008. JASPAR, the open access database of transcription factor-binding profiles: new content and tools in the 2008 update. Nucleic Acids Res 36(Database issue): D102-106.

Bulger M, Groudine M. 2011. Functional and mechanistic diversity of distal transcription enhancers. Cell 144(3): 327-339.

Carroll JS, Liu XS, Brodsky AS, Li W, Meyer CA, Szary AJ, Eeckhoute J, Shao W, Hestermann EV, Geistlinger TR et al. 2005. Chromosome-wide mapping of estrogen receptor binding reveals long-range regulation requiring the forkhead FoxA1. Cell 122(1): 33-43.

Chao SH, Fujinaga K, Marion JE, Taube R, Sausville EA, Senderowicz AM, Peterlin BM, Price DH. 2000. Flavopiridol inhibits P-TEFb and blocks HIV-1 replication. J Biol Chem 275(37): 28345-28348.

Core LJ, Waterfall JJ, Lis JT. 2008. Nascent RNA sequencing reveals widespread pausing and divergent initiation at human promoters. Science 322(5909): 1845-1848.

Creyghton MP, Cheng AW, Welstead GG, Kooistra T, Carey BW, Steine EJ, Hanna J, Lodato MA, Frampton GM, Sharp PA et al. 2010. Histone H3K27ac separates active from poised enhancers and predicts developmental state. Proc Natl Acad Sci U S A 107(50): 21931-21936.

De Santa F, Barozzi I, Mietton F, Ghisletti S, Polletti S, Tusi BK, Muller H, Ragoussis J, Wei CL, Natoli G. 2010. A large fraction of extragenic RNA pol II transcription sites overlap enhancers. PLoS Biol 8(5): e1000384.

Djebali S, Davis CA, Merkel A, Dobin A, Lassmann T, Mortazavi A, Tanzer A, Lagarde J, Lin W, Schlesinger F et al. 2012. Landscape of transcription in human cells. Nature 489(7414): 101-108.

Fullwood MJ, Liu MH, Pan YF, Liu J, Xu H, Mohamed YB, Orlov YL, Velkov S, Ho A, Mei PH et al. 2009. An oestrogen-receptor-alpha-bound human chromatin interactome. Nature 462(7269): 58-64.

Ghisletti S, Barozzi I, Mietton F, Polletti S, De Santa F, Venturini E, Gregory L, Lonie L, Chew A, Wei CL et al. 2010. Identification and characterization of enhancers controlling the inflammatory gene expression program in macrophages. Immunity 32(3): 317-328.

Grant CE, Bailey TL, Noble WS. 2011. FIMO: scanning for occurrences of a given motif. Bioinformatics 27(7): 1017-1018.

Hah N, Danko CG, Core L, Waterfall JJ, Siepel A, Lis JT, Kraus WL. 2011. A rapid, extensive, and transient transcriptional response to estrogen signaling in breast cancer cells. Cell 145(4): 622-634.

He HH, Meyer CA, Chen MW, Jordan VC, Brown M, Liu XS. 2012. Differential DNase I hypersensitivity reveals factor-dependent chromatin dynamics. Genome Res 22(6): 1015-1025.

He HH, Meyer CA, Shin H, Bailey ST, Wei G, Wang Q, Zhang Y, Xu K, Ni M, Lupien M et al. 2010. Nucleosome dynamics define transcriptional enhancers. Nat Genet 42(4): 343-347.

John S, Sabo PJ, Thurman RE, Sung MH, Biddie SC, Johnson TA, Hager GL, Stamatoyannopoulos JA. 2011. Chromatin accessibility pre-determines glucocorticoid receptor binding patterns. Nat Genet 43(3): 264-268.

Kagey MH, Newman JJ, Bilodeau S, Zhan Y, Orlando DA, van Berkum NL, Ebmeier CC, Goossens J, Rahl PB, Levine SS et al. 2010. Mediator and cohesin connect gene expression and chromatin architecture. Nature 467(7314): 430-435.

Kim TK, Hemberg M, Gray JM, Costa AM, Bear DM, Wu J, Harmin DA, Laptewicz M, Barbara-Haley K, Kuersten S et al. 2010. Widespread transcription at neuronal activity-regulated enhancers. Nature 465(7295): 182-187.

Kininis M, Chen BS, Diehl AG, Isaacs GD, Zhang T, Siepel AC, Clark AG, Kraus WL. 2007. Genomic analyses of transcription factor binding, histone acetylation, and gene expression reveal mechanistically distinct classes of estrogen-regulated promoters. Mol Cell Biol 27(14): 5090-5104.

Kolovos P, Knoch TA, Grosveld FG, Cook PR, Papantonis A. 2012. Enhancers and silencers: an integrated and simple model for their function. Epigenetics Chromatin 5(1): 1.

Lai F, Orom UA, Cesaroni M, Beringer M, Taatjes DJ, Blobel GA, Shiekhattar R. 2013. Activating RNAs associate with Mediator to enhance chromatin architecture and transcription. Nature 494(7438): 497-501.

Langmead B, Trapnell C, Pop M, Salzberg SL. 2009. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10(3): R25.

Li XY, Thomas S, Sabo PJ, Eisen MB, Stamatoyannopoulos JA, Biggin MD. 2011. The role of chromatin accessibility in directing the widespread, overlapping patterns of Drosophila transcription factor binding. Genome Biol 12(4): R34.

Maston GA, Landt SG, Snyder M, Green MR. 2012. Characterization of enhancer function from genome-wide analyses. Annu Rev Genomics Hum Genet.

Melgar MF, Collins FS, Sethupathy P. 2011. Discovery of active enhancers through bidirectional expression of short transcripts. Genome Biol 12(11): R113.

Melo CA, Drost J, Wijchers PJ, van de Werken H, de Wit E, Oude Vrielink JA, Elkon R, Melo SA, Leveille N, Kalluri R et al. 2013. eRNAs Are Required for p53-Dependent Enhancer Activity and Gene Transcription. Mol Cell 49(3): 524-535.

Natoli G, Andrau JC. 2012. Noncoding transcription at enhancers: general principles and functional models. Annu Rev Genet.

Ong CT, Corces VG. 2011. Enhancer function: new insights into the regulation of tissue-specific gene expression. Nat Rev Genet 12(4): 283-293.

Orom UA, Derrien T, Beringer M, Gumireddy K, Gardini A, Bussotti G, Lai F, Zytnicki M, Notredame C, Huang Q et al. 2010. Long noncoding RNAs with enhancer-like function in human cells. Cell 143(1): 46-58.

Orom UA, Shiekhattar R. 2011. Noncoding RNAs and enhancers: complications of a long-distance relationship. Trends Genet 27(10): 433-439.

Ostuni R, Piccolo V, Barozzi I, Polletti S, Termanini A, Bonifacio S, Curina A, Prosperini E, Ghisletti S, Natoli G. 2013. Latent enhancers activated by stimulation in differentiated cells. Cell 152(1-2): 157-171.

Pan YF, Wansa KD, Liu MH, Zhao B, Hong SZ, Tan PY, Lim KS, Bourque G, Liu ET, Cheung E. 2008. Regulation of estrogen receptor-mediated long range transcription via evolutionarily conserved distal response elements. J Biol Chem 283(47): 32977-32988.

Parks DH, Beiko RG. 2010. Identifying biologically relevant differences between metagenomic communities. Bioinformatics 26(6): 715-721.

Pham TH, Benner C, Lichtinger M, Schwarzfischer L, Hu Y, Andreesen R, Chen W, Rehli M. 2012. Dynamic epigenetic enhancer signatures reveal key transcription factors associated with monocytic differentiation states. Blood 119(24): e161-171.

Quinlan AR, Hall IM. 2010. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26(6): 841-842.

Rada-Iglesias A, Bajpai R, Prescott S, Brugmann SA, Swigut T, Wysocka J. 2012. Epigenomic annotation of enhancers predicts transcriptional regulators of human neural crest. Cell Stem Cell 11(5): 633-648.

Rada-Iglesias A, Bajpai R, Swigut T, Brugmann SA, Flynn RA, Wysocka J. 2011. A unique chromatin signature uncovers early developmental enhancers in humans. Nature 470(7333): 279-283.

Ross-Innes CS, Stark R, Teschendorff AE, Holmes KA, Ali HR, Dunning MJ, Brown GD, Gojis O, Ellis IO, Green AR et al. 2012. Differential oestrogen receptor binding is associated with clinical outcome in breast cancer. Nature 481(7381): 389-393.

Shen Y, Yue F, McCleary DF, Ye Z, Edsall L, Kuan S, Wagner U, Dixon J, Lee L, Lobanenkov VV et al. 2012. A map of the cis-regulatory sequences in the mouse genome. Nature 488(7409): 116-120.

Vahedi G, Takahashi H, Nakayamada S, Sun HW, Sartorelli V, Kanno Y, O'Shea JJ. 2012. STATs shape the active enhancer landscape of T cell populations. Cell 151(5): 981-993.

Wang D, Garcia-Bassets I, Benner C, Li W, Su X, Zhou Y, Qiu J, Liu W, Kaikkonen MU, Ohgi KA et al. 2011. Reprogramming transcription by distinct classes of enhancers functionally defined by eRNA. Nature 474(7351): 390-394.

Welboren WJ, van Driel MA, Janssen-Megens EM, van Heeringen SJ, Sweep FC, Span PN, Stunnenberg HG. 2009. ChIP-Seq of ERalpha and RNA polymerase II defines genes differentially responding to ligands. EMBO J 28(10): 1418-1428.

Whyte WA, Bilodeau S, Orlando DA, Hoke HA, Frampton GM, Foster CT, Cowley SM, Young RA. 2012. Enhancer decommissioning by LSD1 during embryonic stem cell differentiation. Nature 482(7384): 221-225.

Xu J, Shao Z, Glass K, Bauer DE, Pinello L, Van Handel B, Hou S, Stamatoyannopoulos JA, Mikkola HK, Yuan GC et al. 2012. Combinatorial assembly of developmental stage-specific enhancers controls gene expression programs during human erythropoiesis. Dev Cell 23(4): 796-811.

Yamashita R, Sathira NP, Kanai A, Tanimoto K, Arauchi T, Tanaka Y, Hashimoto S, Sugano S, Nakai K, Suzuki Y. 2011. Genome-wide characterization of transcriptional start sites in humans by integrative transcriptome analysis. Genome Res 21(5): 775-789.

Zentner GE, Tesar PJ, Scacheri PC. 2011. Epigenetic signatures distinguish multiple classes of enhancers with distinct cellular functions. Genome Res 21(8): 1273-1283.

Fig. 1 - Hah et al. (2013). [Figure not reproduced.] Enhancer (ERBS1) and gene (P2RY2) locus. Panels: GRO-seq E2 time course (0, 10, 25, 40 min.) with Pol II (with and without E2), ESR1 (with and without E2), FOXA1 (+E2), H3K4me1 (+E2) and ESR1 ChIA-PET tracks; ChIP for ESR1, Pol II, H3K4me1, H3K4me3 and H3 (percent input) at 0, 10, 40 and 160 min. of E2; relative expression of eRNA and mRNA at 0, 40 and 160 min. of E2; 3C assay (Hub and fragments a-d, with and without ligase and E2).

Fig. 2 - Hah et al. (2013). [Figure not reproduced.] Classification of ESR1 binding sites by overlapping transcripts:
All ESR1 binding sites (10,191)
Intergenic ESR1 binding sites (3,191)
ERBSs overlapping a transcript (1,597); ERBSs not overlapping a transcript (1,594)
ERBSs with unpaired eRNAs (882): Short (452), Long (430)
ERBSs with paired eRNAs (715): Short-Short (283 pairs), Short-Long (128 pairs), Long-Long (304 pairs)
Length filter: <9 kb = Short; ≥9 kb = Long.
Other panels: number of transcripts versus relative position (kb); GRO-seq and Pol II normalized read counts versus distance from ERBS (kb), with and without E2, for Short-Short Paired (S-S), Short Unpaired (S-U) and None classes.

Fig. 3 - Hah et al. (2013). [Figure not reproduced.] Normalized read counts versus distance from ERBS (kb), with and without E2, for Short-Short Paired (S-S), Short Unpaired (S-U) and None classes. Panels: (A) ESR1, (B) FOXA1, (C) CREBBP, (D) NCOA3, (E) H3K4me1, (F) H3K4me3, (G) DNase I.

Fig. 4 - Hah et al. (2013). [Figure not reproduced.] Loop versus No Loop comparisons for ESR1 binding sites with S-S Paired or Short Unpaired transcripts (735) and RefSeq genes upregulated with 40 min. E2 (1317). Panels: GRO-seq and Pol II normalized read counts at enhancers (distance from ERBS, kb) and at transcription start sites (distance from TSS, kb); CREBBP, H3K4me1, NCOA3 and DNase I at enhancers; all with and without E2.

Fig. 5 - Hah et al. (2013). [Figure not reproduced.] RT-qPCR, ChIP-qPCR and 3C at ERBS1 / P2RY2 and ERBS2 / GREB1, with and without E2 and FP. Panels: relative expression of eRNA and mRNA; ESR1 and Pol II ChIP (percent input); CREBBP, EP300 and NCOA3 ChIP; H3K4me1, H3K27ac and H3 ChIP; 3C assay (Hub and fragments, with and without ligase).

Fig. 6 - Hah et al. (2013). [Figure not reproduced.] Transcription at distal transcription factor motifs. Panel notes: 78 JASPAR motifs mapped using FIMO; 75 out of 78 had at least one Short-Short Paired or Short Unpaired transcript; limited to distal motifs (>10 kb from RefSeq gene). Motifs shown: ESR1, ESR2, GABPA, KLF4, EGR1, PAX5, MYCN, SP1, SPZ1 (logos derived from JASPAR PWMs). Other panels: normalized GRO-seq reads with and without E2; Pol II, H3K4me1 and CBP normalized read counts versus distance from center of binding site (kb) for ESR1, KLF4, EGR1 and SPZ1; ChIP-qPCR (percent input) for KLF4 and EGR1 with IgG, H3K4me1, H3K27ac and H3.

Fig. 7 - Hah et al. (2013). [Figure not reproduced.] (A) Search for S-S Paired eRNA signature >10 kb from annotated genes; identify those with and without an ERBS within 1 kb of the center of overlap: S-S Paired eRNAs with ERBS (202), S-S Paired eRNAs without ERBS (569). (B) S-S Paired eRNAs without ERBS (569): GRO-seq, CREBBP, H3K4me1, Pol II, ESR1, DNase I and RAD21 normalized read counts versus distance from center of overlap (kb). (C) Browser tracks at Chr. 22 / 26,197,000 and Chr. 6 / 13,978,000: GRO-seq, CREBBP, H3K4me1 and RAD21, with and without E2. (D) MEME motifs:
With ERBS, Motif 1: MEME E-value 2.8 x 10^-126; 55 sites; STAMP match ESR1 / ESR2 (E-values 2.4 x 10^-5 / 5.8 x 10^-5)
Without ERBS, Motif 1: MEME E-value 2.1 x 10^-396; 569 sites; STAMP match SP1 / KLF4 (E-values 1.1 x 10^-4 / 1.6 x 10^-4)
Without ERBS, Motif 2: MEME E-value 2.2 x 10^-304; 108 sites; STAMP match BCD (E-value 2.7 x 10^-7)
Without ERBS, Motif 3: MEME E-value 1.3 x 10^-242; 146 sites; STAMP match EGR1 (E-value 1.1 x 10^-3)

Enhancer Transcripts Mark Active Estrogen Receptor Binding Sites

Nasun Hah, Shino Murakami, Anusha Nagari, et al.

Genome Res. published online May 1, 2013. Access the most recent version at doi:10.1101/gr.152306.112

Supplemental Material: http://genome.cshlp.org/content/suppl/2013/05/16/gr.152306.112.DC1

Accepted Manuscript: Peer-reviewed and accepted for publication but not copyedited or typeset; accepted manuscript is likely to differ from the final, published version.

Creative Commons License: This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 3.0 Unported), as described at http://creativecommons.org/licenses/by-nc/3.0/.

Advance online articles have been peer reviewed and accepted for publication but have not yet appeared in the paper journal (edited, typeset versions may be posted when available prior to final publication). Advance online articles are citable and establish publication priority; they are indexed by PubMed from initial publication. Citations to Advance online articles must include the digital object identifier (DOI) and date of initial publication.

© 2013, Published by Cold Spring Harbor Laboratory Press