Downloaded from genome.cshlp.org on October 5, 2021 - Published by Cold Spring Harbor Laboratory Press
Comparative analyses of super-enhancers reveal conserved elements in vertebrate genomes
Yuvia A. Pérez-Rico1, 2, 3, Valentina Boeva2, 4, Allison C. Mallory1, Angelo Bitetti1, 3, Sara
Majello1, Emmanuel Barillot2, 4, Alena Shkumatava1
1 Institut Curie, PSL Research University, INSERM U934, CNRS UMR 3215, F-75005, Paris,
France.
2 INSERM, U900, F-75005, Paris, France.
3 Sorbonne Universités, UPMC Univ Paris 06, F-75005, Paris, France.
4 Institut Curie, Mines ParisTech, PSL Research University, F-75005, Paris, France.
Address correspondence to [email protected]
Institut Curie, PSL Research University, CNRS, UMR 3215, 26 rue d’Ulm, F-75005, Paris,
France.
Running title: Conserved super-enhancers in vertebrates.
Keywords: enhancers, super-enhancers, H3K27ac, hyperactive chromatin, zebrafish
ABSTRACT
Super-enhancers (SEs) are key transcriptional drivers of cellular, developmental and
disease states in mammals, yet the conservational and regulatory features of these
enhancer elements in non-mammalian vertebrates are unknown. To define SEs in zebrafish
and enable sequence and functional comparisons to mouse and human SEs, we used
genome-wide histone H3 lysine 27 acetylation (H3K27ac) occupancy as a primary SE
delineator. Our study determined the set of SEs in pluripotent state cells and adult zebrafish
tissues and revealed both similarities and differences between zebrafish and mammalian
SEs. Although the total number of SEs was proportional to the genome size, the genomic
distribution of zebrafish SEs differed from that of the mammalian SEs. Despite the
evolutionary distance separating zebrafish and mammals and the low overall SE sequence
conservation, ~42% of zebrafish SEs were located in close proximity to orthologs that also
were associated with SEs in mouse and human. Compared to their non-associated
counterparts, higher sequence conservation was revealed for those SEs that have
maintained orthologous gene associations. Functional dissection of two of these SEs
identified conserved sequence elements and tissue-specific expression patterns, while
chromatin accessibility analyses predicted transcription factors governing the function of
pluripotent state zebrafish SEs. Our zebrafish annotations and comparative studies show the
extent of SE usage and their conservation across vertebrates, permitting future gene
regulatory studies in several tissues.

INTRODUCTION
The identification of transcriptional regulators is central for understanding tissue-specific
expression programs. Enhancers are cis-regulatory elements able to recruit transcription
factors (TFs) and the transcriptional apparatus to activate their target gene expression
(Smith and Shilatifard 2014; Heinz et al. 2015; Ren and Yue 2015). Chromatin
immunoprecipitation followed by high-throughput sequencing (ChIP-seq) has been a
frequently used strategy to generate genome-wide enhancer annotations (Visel et al. 2009;
Creyghton et al. 2010; Bernstein et al. 2010; Rada-Iglesias et al. 2011; Kieffer-Kwon et al.
2013; Vermunt et al. 2014; Villar et al. 2015; Prescott et al. 2015). ChIP-seq-based
approaches have shown that a subset of mammalian enhancers are found in close
sequence proximity to one another, forming large regions of hyperactive chromatin referred
to as super-enhancers (SEs) or stretch enhancers (Whyte et al. 2013; Lovén et al. 2013;
Parker et al. 2013). This structure distinguishes them from shorter, more compacted regions
referred to as typical enhancers.
SEs are characterized by their high level of histone H3 lysine 27 acetylation (H3K27ac)
density, a mark associated with active enhancers and promoters (Creyghton et al. 2010;
Rada-Iglesias et al. 2011), and the binding of a high abundance of TFs, transcriptional
coactivators and chromatin remodelers (Whyte et al. 2013; Hnisz et al. 2013). Analyses of
the SE dynamics during lineage commitment of specific cell types have shown that SEs are
remodeled during differentiation, having crucial roles in cell fate determination (Vahedi et al.
2015; Adam et al. 2015; Thakurela et al. 2015). Moreover, SEs are enriched for single
nucleotide polymorphisms (SNPs) associated with a broad spectrum of diseases including
but not limited to cancers, type 1 diabetes, Alzheimer’s disease and multiple sclerosis (Hnisz
et al. 2013; Parker et al. 2013; Vahedi et al. 2015). For example, a fraction of human T-cell
acute lymphoblastic leukemia cases exhibits somatic mutations that create MYB TF binding
sites that generate a SE adjacent to the TAL1 oncogene (Mansour et al. 2014). Despite a
basic understanding of the features and functions of mammalian SEs and a recently
published catalog of SEs in non-vertebrates (Wei et al. 2016), the extent to which the
defining characteristics of mammalian SEs also apply to similar regulatory regions in species
outside of the mammalian clade is not known.
Comparative analyses of enhancers in different species have been invaluable for our
understanding of their evolution (reviewed in Rubinstein and de Souza 2013; Domené et al.
2013). Here, we employed the zebrafish model as an exemplar to define SE biology in
vertebrates (Patton et al. 2005; Howe et al. 2013; White et al. 2013; Vacaru et al. 2014;
Kaufman et al. 2016). Previous studies of zebrafish have successfully identified stage-
specific enhancers involved in early development and have highlighted their generally low
sequence conservation (Aday et al. 2011; Bogdanović et al. 2012; Lee et al. 2015). Although
these enhancer annotations open the possibility to gain fundamental insights into gene
regulation during embryonic development, they do not address the tissue-specificity of
enhancers in zebrafish.
To identify cell- and tissue-specific enhancers, in particular SEs, we analyzed the distribution
of H3K27ac in zebrafish pluripotent cells and four adult tissues. Our comparative analyses of
zebrafish, mouse and human SEs highlight their differences and similarities and advance the
study of gene regulation in zebrafish by identifying a set of SE candidates involved in cellular
identity.

RESULTS
H3K27ac marks hundreds of SEs in zebrafish
To assess characteristic features of vertebrate SEs, we identified enhancer regions in
zebrafish (Danio rerio), mouse and human brain, heart, intestine, testis and pluripotent cells.
For zebrafish, we used the early embryonic dome stage as a comparative stage to the
pluripotent state of mouse and human ESCs (Schier and Talbot 2005). All mouse and
human enhancer annotations, as well as zebrafish pluripotent state enhancer annotations,
were based on publicly available datasets of the H3K27ac mark, whereas those of the
zebrafish adult brain, heart, intestine and testis were generated from in-house
H3K27ac ChIP-seq datasets (Fig. 1A; Supplemental Table S1; Bernstein et al. 2010; Rada-
Iglesias et al. 2011; Mouse ENCODE Consortium 2012; Bogdanović et al. 2012; Chadwick
et al. 2012; Nord et al. 2013; Yue et al. 2014). To identify typical enhancers and SEs,
H3K27ac-enriched regions were identified with SICER (Zang et al. 2009), filtered to discard
active promoters and stitched by the ROSE software (Fig. 1A; Whyte et al. 2013; Lovén et
al. 2013). We identified an average of 743 and 1,183 SEs for zebrafish and mammals,
respectively (Fig. 1B; Supplemental Table S1; Supplemental Dataset S1). Similar to
mammalian SEs, most zebrafish SEs were longer than typical enhancers, although the
length parameter was not explicitly considered for their identification (Supplemental Fig.
S1A-C; examples of typical enhancers and SEs are shown in Supplemental Fig. S2A).
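The stitch-and-rank logic behind this classification can be illustrated with a short sketch. It is not the ROSE implementation: the 12.5 kb stitching distance and the slope-1 tangent cutoff are assumptions based on common descriptions of the ROSE approach rather than parameters stated in this section, and the function names and the tuple layout (chromosome, start, end, signal) are hypothetical.

```python
def stitch(peaks, max_gap=12_500):
    """Merge peaks (chrom, start, end, signal) lying within max_gap bp of each other.

    max_gap=12_500 is an assumed value, not one reported in this section.
    """
    stitched = []
    for chrom, start, end, signal in sorted(peaks):
        if stitched and stitched[-1][0] == chrom and start - stitched[-1][2] <= max_gap:
            last = stitched[-1]
            # Extend the previous region and accumulate its signal.
            stitched[-1] = (chrom, last[1], max(last[2], end), last[3] + signal)
        else:
            stitched.append((chrom, start, end, signal))
    return stitched


def signal_cutoff(signals):
    """Signal value where the scaled rank-vs-signal curve has a tangent of slope 1.

    Both axes are scaled to [0, 1]; for a convex increasing curve the slope-1
    tangent point is the one that minimizes (scaled signal - scaled rank).
    """
    ranked = sorted(signals)
    n, top = len(ranked), ranked[-1]
    if n < 2 or top <= 0:
        raise ValueError("need at least two regions with positive signal")
    best = min(range(n), key=lambda i: ranked[i] / top - i / (n - 1))
    return ranked[best]


def classify(regions):
    """Split stitched regions into (super_enhancers, typical_enhancers)."""
    cutoff = signal_cutoff([r[3] for r in regions])
    ses = [r for r in regions if r[3] > cutoff]
    typical = [r for r in regions if r[3] <= cutoff]
    return ses, typical
```

Note that region length never enters the cutoff, which is in line with the statement above that length was not explicitly considered when SEs were called.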

Genomic distribution of zebrafish typical enhancers and SEs differs from that of
mammalian regions
In contrast to mammalian SEs, which tend to overlap with gene bodies (Whyte et al. 2013;
Lovén et al. 2013), neither zebrafish typical enhancers nor zebrafish SEs were preferentially
enriched in the TSS downstream regions in any tissue or at any embryonic stage analyzed
(Fig. 2A; Supplemental Fig. S2B). To assess if zebrafish typical enhancers and SEs were
enriched in gene bodies, the proportion of genes covered by typical enhancers and SEs was
calculated and compared to the proportion of genes covered by random control regions. As
expected, mouse and human typical enhancers and SEs from all analyzed samples showed
significant enrichments in gene bodies (P-values from z-scores ≤ 4.71x10^-18), whereas gene-
body enrichment of zebrafish typical enhancers and SEs showed variation among the
different cells and tissues analyzed (Fig. 2B). Furthermore, we found that on average for all
cells and tissues analyzed, ~65% and ~73% of mouse and ~70% and ~80% of human
typical enhancer and SE sequences, respectively, overlapped introns (Fig. 2C). In zebrafish,
only ~28% of typical enhancer and ~29% of SE sequences overlapped introns, and the
majority of zebrafish typical enhancer and SE sequences (~67% and ~66%, respectively)
overlapped intergenic regions in all zebrafish cells and adult tissues (Fig. 2C; Supplemental
Fig. S2C). These drastic differences in genomic distribution cannot be solely explained by
differences in the global genome composition of the three species, as more than 50% of the
zebrafish, mouse and human genomes correspond to intergenic sequences (Supplemental
Fig. S2D).
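A z-score comparison of this kind can be sketched as follows. The observed proportion of covered genes is compared with the proportions obtained for repeated random control sets. The function name and the toy values are hypothetical, and the two-sided normal tail is an assumption, since the text does not state whether one- or two-sided P-values were used.

```python
import math
import statistics


def enrichment_z_test(observed, control_values):
    """Return (z, p) for an observed proportion against bootstrap control proportions.

    p is a two-sided tail probability under a normal approximation (an assumption).
    """
    mu = statistics.mean(control_values)
    sd = statistics.stdev(control_values)
    z = (observed - mu) / sd
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Hypothetical example: 5 control resamplings instead of the 100 used in the study.
controls = [0.10, 0.12, 0.11, 0.09, 0.13]
z, p = enrichment_z_test(0.20, controls)
```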

Vertebrate SEs are more cell- and tissue-specific than typical enhancers
A notable characteristic of mammalian SEs is their association with key cellular identity
genes (Whyte et al. 2013; Hnisz et al. 2013; Fig. 3A). Similar to mouse and human SEs,
gene ontology (GO) annotations of the zebrafish SEs in pluripotent state, brain, heart,
intestine and testis showed enriched terms related to early development and pluripotency,
neuronal components, signal transduction, immune pathways and chromatin organization,
respectively (Supplemental Fig. S3). In addition, our intraspecies comparisons showed that,
similar to mammals (Hnisz et al. 2013), zebrafish SEs exhibit higher cell- and tissue-
specificity than typical enhancers (P-values from G-tests of independence ≤ 8.5x10^-13, with
the exception of zebrafish heart; Fig. 3B; Supplemental Fig. S4).
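For readers unfamiliar with the G-test of independence, a minimal sketch for a 2x2 table (for example, specific versus non-specific regions for SEs versus typical enhancers) is given below. The table layout and the example counts are hypothetical and are not taken from the study; the P-value uses the chi-square distribution with one degree of freedom, which applies to 2x2 tables only.

```python
import math


def g_test_2x2(table):
    """G-test of independence for a 2x2 table of counts. Returns (G, p)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if observed > 0:  # a zero cell contributes nothing to G
                g += 2.0 * observed * math.log(observed / expected)
    # Survival function of chi-square with 1 degree of freedom.
    p = math.erfc(math.sqrt(max(g, 0.0) / 2.0))
    return g, p


# Hypothetical counts: rows = SEs, typical enhancers; columns = specific, shared.
g, p = g_test_2x2([[90, 10], [10, 90]])
```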

SEs associate with a conserved set of genes throughout vertebrate evolution
Collectively, typical enhancers and SEs showed higher sequence conservation than their
immediate flanking regions (P-values from Wilcoxon rank-sum test ≤ 2.8x10^-4, with the
exception of typical enhancers from the right ventricle of the human heart; Fig. 4A). While
zebrafish SEs from most tissues analyzed had significantly higher sequence conservation
than zebrafish typical enhancers (P-values from Wilcoxon rank-sum test ≤ 9.3x10^-4), mouse
and human sequence conservation differences were dependent on the tissue analyzed
(Supplemental Fig. S5A). When we compared individual intergenic regions enriched for
H3K27ac within typical enhancers and SEs, the higher conservation found for full-length SEs
was diminished, and, for most of the datasets, typical enhancer regions were more
conserved than SE regions (P-values from Wilcoxon rank-sum test ≤ 3.7x10^-3; Supplemental
Fig. S5B). This observation is consistent with the fact that a higher proportion of SE
constituent regions overlaps intragenic sequences, which could artificially inflate the SE
conservation estimate when analyzed as a whole unit (Supplemental Fig. S5C).
Next, to determine if SEs tend to maintain their spatial association with orthologous genes
throughout evolution, the genes associated with zebrafish, mouse and human typical
enhancers and SEs were compared based on homology annotations. The proportion of
orthologous genes associated with typical enhancers in all three species was significantly
larger than that associated with SEs (P-values from G-tests of independence ≤ 5.497x10^-8;
Fig. 4B; Supplemental Fig. S6A-D; Supplemental Table S2). Approximately 42% of zebrafish
SEs were associated with orthologous genes in mouse and human (pluripotent state =
110/473; brain = 321/664; heart = 325/850; intestine = 462/1145; testis = 362/581), and
~27% and ~21% of the mouse and human SEs, respectively, maintained their orthologous
associations (examples are illustrated in Fig. 4C and Supplemental Fig. S6E-H). Importantly,
mammalian SEs with conserved orthologous gene associations in the three species had
higher sequence conservation than the non-associated SEs (P-values from Wilcoxon rank-
sum test ≤ 4.7x10^-3). Similar results were also observed for the zebrafish brain and testis
SEs (P-values from Wilcoxon rank-sum test ≤ 9.1x10^-3; Fig. 4D; Supplemental Fig. S6I). Thus,
despite overall low sequence conservation in vertebrates, SEs that maintained orthologous
gene associations exhibited higher conservation at the sequence level than those lacking
such associations.
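The ~42% figure can be checked against the per-tissue counts quoted in parentheses above. The text does not say whether the value is the mean of the per-tissue fractions or the pooled fraction, so the sketch below computes both; the variable names are illustrative only. The five totals also average to about 743 SEs per zebrafish dataset, in line with the average reported earlier.

```python
# (SEs with conserved orthologous association, total SEs) per zebrafish dataset,
# copied from the counts given in the text.
counts = {
    "pluripotent state": (110, 473),
    "brain": (321, 664),
    "heart": (325, 850),
    "intestine": (462, 1145),
    "testis": (362, 581),
}

fractions = {name: hits / total for name, (hits, total) in counts.items()}

# Mean of the per-tissue fractions.
mean_fraction = sum(fractions.values()) / len(fractions)

# Pooled fraction over all zebrafish SEs.
total_hits = sum(hits for hits, _ in counts.values())
total_ses = sum(total for _, total in counts.values())
pooled_fraction = total_hits / total_ses

# Average number of SEs per zebrafish dataset.
mean_se_count = total_ses / len(counts)
```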

Analysis of accessible chromatin identifies differences between zebrafish typical
enhancer and SE composition
Within zebrafish SEs, we sought to demarcate transcription factor binding site (TFBS)
hotspots or epicenters, defined as regions shorter than 1 kb bound by at least five TFs
involved in cell identity (Siersbæk et al. 2014; Adam et al. 2015). To overcome the lack of
zebrafish ChIP-seq data, we focused on the identification of accessible chromatin regions by
ATAC-seq (Buenrostro et al. 2013; Supplemental Fig. S7A). To confirm that ATAC-seq data
can be mined to identify TFBSs in zebrafish, we compared ATAC-seq and Nanog ChIP-seq
peaks (Xu et al. 2012). These comparisons showed significant overlap at both the genome-
wide level and within SEs (P-values based on hypergeometric distributions ≤ e^-2917.71; Fig.
5A).
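P-values as small as the one reported above cannot be represented as double-precision numbers, so an overlap test of this kind is typically evaluated in log space. The sketch below computes the natural log of the hypergeometric upper tail. How the population size is defined (for example, a number of genomic bins or candidate regions) is an assumption that the text does not specify, and the function names are hypothetical.

```python
import math


def log_choose(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def log_hypergeom_sf(k, population, successes, draws):
    """ln P(X >= k) for X ~ Hypergeometric(population, successes, draws).

    population: total number of candidate regions (assumed definition)
    successes:  regions in the first set (e.g. ATAC-seq peaks)
    draws:      regions in the second set (e.g. Nanog peaks)
    k:          observed number of regions in both sets
    """
    failures = population - successes
    upper = min(successes, draws)
    terms = [
        log_choose(successes, i) + log_choose(failures, draws - i)
        - log_choose(population, draws)
        for i in range(k, upper + 1)
        if 0 <= draws - i <= failures
    ]
    if not terms:
        return float("-inf")
    # Log-sum-exp keeps the sum stable when every term underflows on its own.
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))
```

The returned value is directly comparable to an exponent reported on the natural-log scale.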
A differential analysis of ATAC-seq peaks within typical enhancers and SEs identified 12
clusters of over-represented motifs within SEs (Supplemental Fig. S7B). Our set of
consensus motifs included those with similarity to matrix models of pluripotency-associated
TFs, such as SOX2, EOMES and FOXD3 (Sutton et al. 1996; Hromas et al. 1999; Avilion et
al. 2003; Kidder and Palmer 2010). The motif that correlated with the SOX2 matrix was the
consensus of two motifs: one similar to the SOX2 matrix model and the second motif similar
to the SOX9 and ESRRA matrix models (Fig. 5B). GO annotation of the SE ATAC-seq peaks
containing sites of these two motifs showed enrichment for TF function and pluripotency
terms that were not identified by the global analysis of pluripotent state SEs (Fig. 5C;
Supplemental Fig. S3A). Thus, our results predict a set of TFs with enriched binding to
accessible chromatin regions highly associated with pluripotency.

Dissections of vertebrate SEs identify functionally conserved elements
To determine the contributions of different regions within SEs, two SEs with conserved
association with irf2bpl and zic2a (hereafter referred to as SE-irf2bpl and SE-zic2a; Fig. 4C;
Supplemental Fig. S6A) were tested by GFP reporter assays in zebrafish embryos
(Supplemental Fig. S8A). Twelve zebrafish gene distal regions were selected for the
enhancer activity test based on their H3K27ac, ATAC-seq and Nanog ChIP-seq profiles (Fig.
6; Supplemental Table S3). To evaluate the functional conservation of the equivalent mouse
SEs, nine mouse regions, selected based on presence or absence of TFBSs for 14
pluripotent state TFs, were tested (Supplemental Fig. S7A; Supplemental Table S3; Chen et
al. 2008; Heng et al. 2010; Ma et al. 2011; Vella et al. 2012; Betschinger et al. 2013; Whyte
et al. 2013). It should be noted that while the mouse Zic2-associated region is a typical
enhancer at the pluripotent state (Fig. 6C), it is identified as a SE in the brain (Fig. 4C).
For zebrafish SE-irf2bpl, there was a strong concordance between enhancer activity and the
presence of a high ATAC-seq signal (Fig. 6A-B, Supplemental Fig. S8B). Remarkably, the
GFP expression pattern driven by the conserved zebrafish region D and mouse region K
(Fig. 6A) substantially overlapped within the olfactory placode (Fig. 6B). Similarly, the mouse
region G (Fig. 6A) drove dim GFP expression in the olfactory placode at ~24 hours post-
fertilization (hpf) with peak GFP expression in the roof plate at 48 hpf (Supplemental Fig.
S8B).
For zebrafish SE-zic2a, 75% of SE-zic2a regions exhibiting enhancer activity also contained
ATAC-seq peaks and displayed high sequence conservation (the P, Q and R regions; Fig.
6C-D; Supplemental Fig. S8C). Interestingly, the zebrafish S region, originally selected as a
control region based on the lack of sequence conservation and the absence of H3K27ac and
ATAC-seq signals, drove specific GFP expression in the notochord and telencephalon (Fig.
6D) similar to the spinal cord and telencephalon expression driven by the equivalent mouse
T region (Fig. 6D). As the S region contained a mildly enriched Nanog peak (Fig. 6C) and
predicted TFBSs (Supplemental Table S3), it likely corresponds to a redundant or “shadow”
enhancer that is not active under homeostatic conditions and, consequently, is not found by
ATAC-seq (Fig. 6C).
Taken together, our results confirm that SEs contain regions with evolutionarily conserved
enhancer functions and emphasize the importance of analyzing comprehensive hyperactive
chromatin regions instead of isolated enhancers to allow the identification of enhancers with
partially redundant activities.

DISCUSSION
In this study, we identify tissue-specific enhancers in zebrafish, focusing on hyperactive
chromatin regions or SEs. Our comparative analyses support a model in which SEs specify
uniquely important cell- and tissue-specific regulatory regions across species (Hnisz et al.
2013; Saint-André et al. 2016), and highlight the difference in genomic distribution between
zebrafish and mammalian SEs. While the majority of mammalian SEs overlap with their
target genes (Whyte et al. 2013), zebrafish typical enhancers and SEs are mainly located
within intergenic regions. Similarly, during early zebrafish development, differentially
methylated DNA regions, ~50% of which are enriched for enhancer-associated chromatin
marks including H3K27ac, are mainly embedded within intergenic sequences (Lee et al.
2015). Future analyses incorporating the enhancer annotations of additional species may
reveal if the intergenic distribution of zebrafish regulatory regions is a distinctive feature.
Similar to what has been shown for zebrafish and mammalian enhancers (Bogdanović et al.
2012; Lee et al. 2015; Villar et al. 2015), our PhastCons value-based sequence conservation
analysis showed that both zebrafish typical enhancers and SEs have overall low sequence
conservation, and that SE intergenic constituent regions do not display higher conservation
than those of typical enhancers. However, the sequence conservation was detectably higher
in the fraction of SEs that has maintained an association with orthologous genes in
zebrafish, mouse and human compared to the fraction lacking conserved orthologous
associations. It remains to be determined if those SEs with orthologous gene associations
have a common evolutionary origin, or if they independently evolved in the three species.
Notably, enhancers shared between human and chimp also display higher sequence
conservation than species-biased enhancers (Prescott et al. 2015).
Previous studies have reported enhancer regions with overlapping functions in
phylogenetically distant species (Hare et al. 2008; Taher et al. 2011; Clarke et al. 2012).
However, the genome-wide prediction of those regions is not trivial (Taher et al. 2011), as
sequence conservation alone does not necessarily predict functional conservation, and
regions with high sequence conservation can drive different patterns of expression in
reporter assays (Goode et al. 2011). Thus, it is remarkable that we defined equivalent
subregions in two SEs with conserved enhancer functions. Although the extent of enhancer
redundancy is poorly understood, a recent study has shown the genome-wide pervasiveness
of shadow enhancers during Drosophila development (Cannavò et al. 2016). Indeed, one of
the zebrafish SE regions identified in this study likely represents a shadow enhancer with a
conserved function. For these reasons, we propose that the future identification of shadow
enhancers will benefit from the analysis of whole hyperactive chromatin regions rather than
the analysis of isolated enhancers.
Our study reveals the genome-wide distribution of tissue-specific cis-regulatory elements in
zebrafish and identifies the key SE complement in this important model system. Moreover,
the characterized genomic distribution of zebrafish typical enhancers and SEs, together with
our comparative analyses to those of mammals, solidifies our understanding of pervasive and
conserved vertebrate transcriptional mechanisms.

METHODS
ChIP-seq assays
Whole brains, hearts, intestines and testes were dissected from same-age adult male AB
zebrafish. Two biological replicates were prepared from each tissue. ChIP-seq was
performed as previously described (Guenther et al. 2008) using Abcam H3K27ac antibody
(ab4729, lot# GR259887-1). Purified chromatin was used for single-end library preparation
following standard Illumina protocols. For more details, see Supplemental Material.

Identification of typical enhancers and SEs
H3K27ac ChIP-seq datasets were mapped to their corresponding reference genomes (Zv9
for zebrafish, mm10 for mouse and hg38 for human) using Bowtie 2 version 2.1.0
(Langmead and Salzberg 2012). Peak calling was performed with SICER version 1.1 (Zang
et al. 2009); when available, input libraries were used as controls for the peak calling
(Supplemental Table S1). Identified peaks were filtered to discard peaks for which the main
summit was within promoter regions and used as input for the ROSE algorithm version 0.1 to
identify typical enhancers and SEs. For detailed parameters, see Supplemental Material,
Supplemental File S1 and Supplemental File S2.

Computational analyses
The calculation of typical enhancer and SE distributions around TSSs was performed using
Nebula (Boeva et al. 2012). Typical enhancer and SE enrichments over gene bodies were
calculated with a customized script (Supplemental File S3) and control enrichments were
obtained by bootstrap resampling with 100 iterations. To calculate the percentage of typical
enhancer and SE sequences overlapping with genomic features, typical enhancer and SE
annotations were compared to RefSeq Gene annotations (Rosenbloom et al. 2015) using
the BEDTools intersect function (Quinlan and Hall 2010). Sequence conservation scores were
calculated based on the vertebrate conservation PhastCons tracks from UCSC (Siepel and
Haussler 2005; Siepel et al. 2005) associated with each of the genome versions used for
read mapping using hgWiggle (Kent et al. 2002) and a customized Python script
(Supplemental File S4). For ortholog comparisons, typical enhancer and SE target genes
were annotated based on gene proximity using Nebula. All gene names were converted to
Ensembl IDs and compared based on homology annotations from Ensembl (Genes 82;
Cunningham et al. 2015). Analysis of the ATAC-seq library was performed as previously
described (Buenrostro et al. 2013). Over-represented motifs in ATAC-seq peaks within SEs
were identified using the RSAT peak-motifs tool (Thomas-Chollier et al. 2012a; Thomas-
Chollier et al. 2012b). For more details, see Supplemental Material.
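The percentage of enhancer sequence overlapping a class of genomic features reduces to an interval-intersection calculation. The sketch below is a plain-Python illustration of that calculation for a single chromosome, using half-open coordinates as in BED files. It is not the BEDTools implementation or the authors' script, and the function names and coordinates are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def overlap_bp(regions, features):
    """Number of base pairs shared by two interval sets on one chromosome."""
    a, b = merge_intervals(regions), merge_intervals(features)
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def percent_overlap(regions, features):
    """Percentage of region base pairs that fall inside the features."""
    region_bp = sum(end - start for start, end in merge_intervals(regions))
    return 100.0 * overlap_bp(regions, features) / region_bp


# Hypothetical example: two enhancer regions and one intron.
enhancers = [(0, 100), (200, 300)]
introns = [(50, 250)]
intron_percent = percent_overlap(enhancers, introns)
```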

Microinjections
Each of the vectors containing SE regions (for cloning details see the Supplemental
Material) was co-injected with Tol2 mRNA into one-cell stage zebrafish embryos. GFP
expression was monitored during the first three days post-fertilization. All injection
experiments were repeated at least twice (Supplemental Table S3). For more details, see
Supplemental Material.

DATA ACCESS
Zebrafish H3K27ac ChIP-seq data generated in this study have been submitted to the NCBI
Gene Expression Omnibus (GEO; http://www.ncbi.nlm.nih.gov/geo/) (Edgar et al. 2002)
under accession number GSE75734.

ACKNOWLEDGEMENTS
We thank Igor Ulitsky, Matthew Guenther and Violaine Saint-André for helpful comments on
this manuscript. We also thank all members of the Shkumatava lab for help with zebrafish
dissections and for useful discussions. High-throughput sequencing was performed by the
ICGex NGS platform of Institut Curie supported by the grants ANR-10-EQPX-03 (Equipex)
and ANR-10-INBS-09-08 (France Génomique Consortium) from the ANR (“Investissements
d’Avenir” program) and by the Canceropole Île-de-France. This work was supported by
grants from ERC (FLAME-337440), ATIP-Avenir and La Fondation Bettencourt Schueller
and FRM (DBI201312285578). YAPR was partially funded by a scholarship from Secretaría
de Ciencia, Tecnología e Innovación – Seciti, México.

Author contributions
AS conceived and designed the project. YAPR, VB, ACM and AS designed experiments;
ACM and AS performed zebrafish ChIP-seq; YAPR performed computational analyses and
prepared plasmid constructs; YAPR and AB performed microinjections and microscopy; SM
assisted with experimental work; YAPR, ACM and AS wrote the manuscript. All authors
reviewed and approved the manuscript; VB, EB and AS supervised the project.

DISCLOSURE DECLARATION
The authors declare no competing interests.

FIGURE LEGENDS
Figure 1. Identification of typical enhancers and SEs in vertebrate genomes.
(A) Workflow for the identification of vertebrate typical enhancers and SEs. Schematic
representations depict the cells and tissues analyzed.
(B) Saturation curves of H3K27ac density across brain datasets (whole brain for zebrafish,
olfactory bulb for mouse and middle frontal lobe for human). The number of ranked typical
enhancers and SEs by H3K27ac density (x-axis) and their densities (y-axis) are plotted.
Horizontal dotted lines represent density cutoffs used for the classification of SEs and
vertical dotted lines demarcate SEs from typical enhancers. The total number of predicted SEs
is noted on the right side of each graph.

Figure 2. Genomic distribution of typical enhancers and SEs.
(A) Density plots representing the proportion of genes (y-axis) covered by typical enhancers
and SEs in the vicinity of TSSs (x-axis) in zebrafish brain, mouse cerebellum and human
angular gyrus.
(B) Proportion of gene bodies overlapping with typical enhancers, SEs and control regions
(y-axis) in different zebrafish, mouse and human cells and tissues (x-axis). The mean and
the standard deviation (black bars) calculated from bootstrap analyses of control regions are
shown. All comparisons between typical enhancers and SEs and their controls have
significant differences (P-values from z-scores ≤ 3x10^-4), with the exception of zebrafish
pluripotent state and heart typical enhancers. NS, not significant.
(C) Distribution of typical enhancer and SE sequences across genomic features. The y-axis
shows the percentage of total brain typical enhancer or SE base pairs overlapping the
different genomic features represented in the legend. Adult brain datasets for mouse and
human correspond to olfactory bulb and cingulate gyrus, respectively.
351
352 Figure 3. Cell and tissue specificity of vertebrate typical enhancers and SEs.
353 (A) Distribution of H3K27ac at selected genes (genomic position represented on the x-axis)
354 in both pluripotent state and adult brain of zebrafish, mouse and human (raw tag counts
355 represented on the y-axis). Typical enhancers and SEs are denoted by grey bars and red
356 bars, respectively.
357 (B) Chow-Ruskey diagrams representing the overlap between pluripotent state (orange),
358 brain (green), heart (purple), intestine (red) and testis (blue) typical enhancers and SEs in
359 zebrafish. Color-coded tables show the percentages of cell- or tissue-specific and non-
360 specific regions for each dataset.
361
15
Downloaded from genome.cshlp.org on October 5, 2021 - Published by Cold Spring Harbor Laboratory Press
362 Figure 4. SE conservation in vertebrates.
363 (A) Metagenes of sequence conservation of typical enhancers and SEs from zebrafish whole
364 brain, mouse olfactory bulb and human middle frontal lobe. The x-axis depicts the start and
365 end of typical enhancers and SEs flanked by 3 kb of adjacent sequence. The y-axis
366 represents sequence conservation calculated by PhastCons.
367 (B) Venn diagrams show the number of orthologous genes associated with brain typical
368 enhancers (left) and SEs (right) in zebrafish (green), mouse (blue) and human (purple).
369 Color-coded tables show the percentages of intersection and difference for each species.
370 The observed differences in overlap between typical enhancers and SEs in the three species
371 are significant (p-values ≤ 5.497x10-8) based on G-tests of independence.
372 (C) ChIP-seq binding profiles for H3K27ac at the indicated loci in zebrafish, mouse and
373 human brain (raw tag counts represented on the y-axis). Typical enhancers and SEs are
374 denoted by grey bars and red bars, respectively. Gene positions are noted along the x-axis.
375 (D) Box plots depicting average sequence conservation of brain SEs with maintained
376 orthologous association in zebrafish, mouse and human and with no maintained orthologous
377 association. The y-axis shows sequence conservation calculated by PhastCons. The box
378 bounds the interquartile range divided by the median and the notch approximates a 95%
379 confidence interval for the median. All observed differences in conservation between SE
380 categories are significant (p-value ≤ 9.1 × 10⁻³) based on Wilcoxon rank-sum tests.
381
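The G-test of independence named in the legend of Figure 4B can be illustrated with a minimal sketch. It assumes a 2x2 table (element class by "shared" versus "not shared" gene association); the legend does not state the table layout used in the study, so both the layout and the counts below are hypothetical. The function name g_test_2x2 is introduced here for illustration only. The statistic is G = 2 Σ O ln(O/E), compared against a chi-square distribution with 1 degree of freedom.

```python
import math


def g_test_2x2(table):
    """G-test of independence for a 2x2 contingency table of counts.

    Returns (G, p). The p-value uses the chi-square distribution with
    1 degree of freedom; no continuity correction is applied.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)

    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if observed > 0:  # a zero cell contributes nothing to the sum
                g += observed * math.log(observed / expected)
    g = max(2.0 * g, 0.0)  # guard against tiny negative rounding error

    # Chi-square survival function for 1 df: P(X > g) = erfc(sqrt(g / 2))
    p = math.erfc(math.sqrt(g / 2.0))
    return g, p


# Hypothetical counts, NOT taken from the paper.
# Rows: typical enhancers, SEs. Columns: shared association, not shared.
hypothetical_counts = [[300, 700], [60, 940]]
g_stat, p_value = g_test_2x2(hypothetical_counts)
print(f"G = {g_stat:.2f}, p = {p_value:.3g}")
```

A comparison across three species at once would need a larger table and more degrees of freedom, which this sketch does not cover.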
382 Figure 5. Analysis of zebrafish SE composition by ATAC-seq.
383 (A) Venn diagrams representing the overlap between ATAC-seq peaks (purple) and Nanog
384 peaks (orange) genome-wide (left) and within pluripotent state SEs (right).
385 (B) Cluster, consensus motif sequence and logos of SOX-related de-novo-found motifs in
386 ATAC-seq peaks within SEs (left). JASPAR matrix models (right) of SOX2, SOX9 and
387 ESRRA. Ncorr, normalized correlation between identified motifs and JASPAR models.
388 (C) Top molecular function and wiki pathway GO terms enriched for the ATAC-seq peaks
389 containing sites of the de-novo identified oligos_7nt_m2 (left) and oligos_6nt_m3 (right)
390 motifs shown in (B). Binomial FDR q-values for the GO terms are displayed in a color-scale
391 (q-values ≤ 6.7 × 10⁻⁴).
392
393 Figure 6. Functional analysis of vertebrate SEs.
394 (A) Genomic context and conservation of the zebrafish (left) and mouse (right) irf2bpl and
395 Irf2bpl loci. Horizontal bars represent SEs (red). Raw H3K27ac ChIP-seq, ATAC-seq and
396 Nanog ChIP-seq profiles are shown in tag counts (y-axis). The TFBS track represents the
397 TFBS enrichment along the mouse locus. The Vertebrate Cons tracks represent
398 conservation scores calculated by PhastCons. Grey and green highlighted regions
399 correspond to the regions tested in reporter assays. Regions driving specific GFP
400 expression are indicated in green.
401 (B) GFP expression driven by the zebrafish SE-irf2bpl D region (left) and the mouse K
402 region (right) in transgenic zebrafish embryos at 48 hpf. White arrows indicate the olfactory
403 placode (op).
404 (C) Genomic context and conservation of the zebrafish and mouse zic2a and Zic2 loci as
405 described in (A). Horizontal bars represent typical enhancers (grey) and SEs (red).
406 (D) GFP expression driven by the zebrafish P, Q and S regions (left) and the mouse T region
407 (right). H, hindbrain; nt, notochord; r, retina; sc, spinal cord; t, telencephalon.
408
409 REFERENCES
410 Adam RC, Yang H, Rockowitz S, Larsen SB, Nikolova M, Oristian DS, Polak L, Kadaja M,
411 Asare A, Zheng D, et al. 2015. Pioneer factors govern super-enhancer dynamics in
412 stem cell plasticity and lineage choice. Nature 521: 366–370.
413 Aday AW, Zhu LJ, Lakshmanan A, Wang J, Lawson ND. 2011. Identification of cis regulatory
414 features in the embryonic zebrafish genome through large-scale profiling of
415 H3K4me1 and H3K4me3 binding sites. Dev Biol 357: 450–462.
416 Avilion AA, Nicolis SK, Pevny LH, Perez L, Vivian N, Lovell-Badge R. 2003. Multipotent cell
417 lineages in early mouse development depend on SOX2 function. Genes Dev 17:
418 126–140.
419 Bernstein BE, Stamatoyannopoulos JA, Costello JF, Ren B, Milosavljevic A, Meissner A,
420 Kellis M, Marra MA, Beaudet AL, Ecker JR, et al. 2010. The NIH Roadmap
421 Epigenomics Mapping Consortium. Nat Biotechnol 28: 1045–1048.
422 Betschinger J, Nichols J, Dietmann S, Corrin PD, Paddison PJ, Smith A. 2013. Exit from
423 Pluripotency Is Gated by Intracellular Redistribution of the bHLH Transcription Factor
424 Tfe3. Cell 153: 335–347.
425 Boeva V, Lermine A, Barette C, Guillouf C, Barillot E. 2012. Nebula--a web-server for
426 advanced ChIP-seq data analysis. Bioinformatics 28: 2517–2519.
427 Bogdanović O, Fernandez-Minan A, Tena JJ, de la Calle-Mustienes E, Hidalgo C, van
428 Kruysbergen I, van Heeringen SJ, Veenstra GJC, Gomez-Skarmeta JL. 2012.
429 Dynamics of enhancer chromatin signatures mark the transition from pluripotency to
430 cell specification during embryogenesis. Genome Res 22: 2043–2053.
431 Buenrostro JD, Giresi PG, Zaba LC, Chang HY, Greenleaf WJ. 2013. Transposition of native
432 chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding
433 proteins and nucleosome position. Nat Methods 10: 1213–1218.
434 Cannavò E, Khoueiry P, Garfield DA, Geeleher P, Zichner T, Gustafson EH, Ciglar L, Korbel
435 JO, Furlong EE. 2016. Shadow enhancers are pervasive features of developmental
436 regulatory networks. Curr Biol 26: 38–51.
437 Chadwick LH. 2012. The NIH Roadmap Epigenomics Program data resource. Epigenomics
438 4: 317–324.
439 Chen X, Xu H, Yuan P, Fang F, Huss M, Vega VB, Wong E, Orlov YL, Zhang W, Jiang J, et
440 al. 2008. Integration of External Signaling Pathways with the Core Transcriptional
441 Network in Embryonic Stem Cells. Cell 133: 1106–1117.
442 Clarke SL, VanderMeer JE, Wenger AM, Schaar BT, Ahituv N, Bejerano G. 2012. Human
443 developmental enhancers conserved between deuterostomes and protostomes.
444 PLoS Genet 8: e1002852. doi: 10.1371/journal.pgen.1002852.
445 Creyghton MP, Cheng AW, Welstead GG, Kooistra T, Carey BW, Steine EJ, Hanna J,
446 Lodato MA, Frampton GM, Sharp PA, et al. 2010. Histone H3K27ac separates active
447 from poised enhancers and predicts developmental state. Proc Natl Acad Sci 107:
448 21931–21936.
449 Cunningham F, Amode MR, Barrell D, Beal K, Billis K, Brent S, Carvalho-Silva D, Clapham
450 P, Coates G, Fitzgerald S, et al. 2015. Ensembl 2015. Nucleic Acids Res 43: D662–
451 D669.
452 Domené S, Bumaschny VF, de Souza FSJ, Franchini LF, Nasif S, Low MJ, Rubinstein M.
453 2013. Enhancer turnover and conserved regulatory function in vertebrate evolution.
454 Philos Trans R Soc Lond B Biol Sci 368: 20130027.
455 Edgar R, Domrachev M, Lash AE. 2002. Gene Expression Omnibus: NCBI gene expression
456 and hybridization array data repository. Nucleic Acids Res 30: 207–210.
457 Goode DK, Callaway HA, Cerda GA, Lewis KE, Elgar G. 2011. Minor change, major
458 difference: Divergent functions of highly conserved cis-regulatory elements
459 subsequent to whole genome duplication events. Development 138: 879–884.
460 Guenther MG, Lawton LN, Rozovskaia T, Frampton GM, Levine SS, Volkert TL, Croce CM,
461 Nakamura T, Canaani E, Young RA. 2008. Aberrant chromatin at genes encoding
462 stem cell regulators in human mixed-lineage leukemia. Genes Dev 22: 3403–3408.
463 Hare EE, Peterson BK, Iyer VN, Meier R, Eisen MB. 2008. Sepsid even-skipped enhancers
464 are functionally conserved in Drosophila despite lack of sequence conservation.
465 PLoS Genet 4: e1000106. doi: 10.1371/journal.pgen.1000106.
466 Heinz S, Romanoski CE, Benner C, Glass CK. 2015. The selection and function of cell type-
467 specific enhancers. Nat Rev Mol Cell Biol 16: 144–154.
468 Heng JCD, Feng B, Han J, Jiang J, Kraus P, Ng JH, Orlov YL, Huss M, Yang L, Lufkin T, et
469 al. 2010. The Nuclear Receptor Nr5a2 Can Replace Oct4 in the Reprogramming of
470 Murine Somatic Cells to Pluripotent Cells. Cell Stem Cell 6: 167–174.
471 Hnisz D, Abraham BJ, Lee TI, Lau A, Saint-André V, Sigova AA, Hoke HA, Young RA. 2013.
472 Super-Enhancers in the Control of Cell Identity and Disease. Cell 155: 934–947.
473 Howe K, Clark MD, Torroja CF, Torrance J, Berthelot C, Muffato M, Collins JE, Humphray S,
474 McLaren K, Matthews L, et al. 2013. The zebrafish reference genome sequence and
475 its relationship to the human genome. Nature 496: 498–503.
476 Hromas R, Ye H, Spinella M, Dmitrovsky E, Xu D, Costa RH. 1999. Genesis, a Winged Helix
477 transcriptional repressor, has embryonic expression limited to the neural crest, and
478 stimulates proliferation in vitro in a neural development model. Cell Tissue Res 297:
479 371–382.
480 Kaufman CK, Mosimann C, Fan ZP, Yang S, Thomas AJ, Ablain J, Tan JL, Fogley RD, van
481 Rooijen E, Hagedorn EJ, et al. 2016. A zebrafish melanoma model reveals
482 emergence of neural crest identity during melanoma initiation. Science 351:
483 aad2197. doi: 10.1126/science.aad2197.
484 Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D. 2002. The
485 human genome browser at UCSC. Genome Res 12: 996–1006.
486 Kidder BL, Palmer S. 2010. Examination of transcriptional networks reveals an important
487 role for TCFAP2C, SMARCA4, and EOMES in trophoblast stem cell maintenance.
488 Genome Res 20: 458–472.
489 Kieffer-Kwon KR, Tang Z, Mathe E, Qian J, Sung MH, Li G, Resch W, Baek S, Pruett N,
490 Grøntved L, et al. 2013. Interactome maps of mouse gene regulatory domains reveal
491 basic principles of transcriptional regulation. Cell 155: 1507–1520.
492 Langmead B, Salzberg SL. 2012. Fast gapped-read alignment with Bowtie 2. Nat Methods 9:
493 357–359.
494 Lee HJ, Lowdon RF, Maricque B, Zhang B, Stevens M, Li D, Johnson SL, Wang T. 2015.
495 Developmental enhancers revealed by extensive DNA methylome maps of zebrafish
496 early embryos. Nat Commun 6: 6315. doi: 10.1038/ncomms7315.
497 Lovén J, Hoke HA, Lin CY, Lau A, Orlando DA, Vakoc CR, Bradner JE, Lee TI, Young RA.
498 2013. Selective Inhibition of Tumor Oncogenes by Disruption of Super-Enhancers.
499 Cell 153: 320–334.
500 Ma Z, Swigut T, Valouev A, Rada-Iglesias A, Wysocka J. 2011. Sequence-specific regulator
501 Prdm14 safeguards mouse ESCs from entering extraembryonic endoderm fates. Nat
502 Struct Mol Biol 18: 120–127.
503 Mansour MR, Abraham BJ, Anders L, Berezovskaya A, Gutierrez A, Durbin AD, Etchin J,
504 Lawton L, Sallan SE, Silverman LB, et al. 2014. An oncogenic super-enhancer
505 formed through somatic mutation of a noncoding intergenic element. Science 346:
506 1373–1377.
507 Mouse ENCODE Consortium. 2012. An encyclopedia of mouse DNA elements (Mouse
508 ENCODE). Genome Biol 13: 418. doi: 10.1186/gb-2012-13-8-418.
509 Nord AS, Blow MJ, Attanasio C, Akiyama JA, Holt A, Hosseini R, Phouanenavong S,
510 Plajzer-Frick I, Shoukry M, Afzal V, et al. 2013. Rapid and Pervasive Changes in
511 Genome-wide Enhancer Usage during Mammalian Development. Cell 155: 1521–
512 1531.
513 Parker SC, Stitzel ML, Taylor DL, Orozco JM, Erdos MR, Akiyama JA, van Bueren KL,
514 Chines PS, Narisu N, Black BL, et al. 2013. Chromatin stretch enhancer states drive
515 cell-specific gene regulation and harbour human disease risk variants. Proc Natl
516 Acad Sci 110: 17921–17926.
517 Patton EE, Widlund HR, Kutok JL, Kopani KR, Amatruda JF, Murphey RD, Berghmans S,
518 Mayhall EA, Traver D, Fletcher CD, et al. 2005. BRAF mutations are sufficient to
519 promote nevi formation and cooperate with p53 in the genesis of melanoma. Curr
520 Biol 15: 249–254.
521 Prescott SL, Srinivasan R, Marchetto MC, Grishina I, Narvaiza I, Selleri L, Gage FH, Swigut
522 T, Wysocka J. 2015. Enhancer Divergence and cis-Regulatory Evolution in the
523 Human and Chimp Neural Crest. Cell 163: 68–83.
524 Quinlan AR, Hall IM. 2010. BEDTools: a flexible suite of utilities for comparing genomic
525 features. Bioinformatics 26: 841–842.
526 Rada-Iglesias A, Bajpai R, Swigut T, Brugmann SA, Flynn RA, Wysocka J. 2011. A unique
527 chromatin signature uncovers early developmental enhancers in humans. Nature
528 470: 279–283.
529 Ren B, Yue F. 2015. Transcriptional enhancers: Bridging the Genome and Phenome. Cold
530 Spring Harb Symp Quant Biol 2015 November 18. doi: 10.1101/sqb.2015.80.027219.
531 Rosenbloom KR, Armstrong J, Barber GP, Casper J, Clawson H, Diekhans M, Dreszer TR,
532 Fujita PA, Guruvadoo L, Haeussler M, et al. 2015. The UCSC Genome Browser
533 database: 2015 update. Nucleic Acids Res 43: D670–D681.
534 Rubinstein M, de Souza FSJ. 2013. Evolution of transcriptional enhancers and animal
535 diversity. Philos Trans R Soc Lond B Biol Sci 368: 20130017.
536 Saint-André V, Federation AJ, Lin CY, Abraham BJ, Reddy J, Lee TI, Bradner JE, Young
537 RA. 2016. Models of human core transcriptional regulatory circuitries. Genome Res
538 26: 385–396.
539 Schier A, Talbot WS. 2005. Molecular genetics of axis formation in zebrafish. Annu Rev
540 Genet 39: 561–613.
541 Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth
542 J, Hillier LW, Richards S, et al. 2005. Evolutionarily conserved elements in
543 vertebrate, insect, worm, and yeast genomes. Genome Res 15: 1034–1050.
544 Siepel A, Haussler D. 2005. Phylogenetic hidden Markov models. In Statistical methods in
545 molecular evolution (ed. R. Nielsen), pp. 325–351. Springer, New York.
546 Siersbæk R, Rabiee A, Nielsen R, Sidoli S, Traynor S, Loft A, La Cour Poulsen L,
547 Rogowska-Wrzesinska A, Jensen ON, Mandrup S. 2014. Transcription Factor
548 Cooperativity in Early Adipogenic Hotspots and Super-Enhancers. Cell Rep 7: 1443–
549 1455.
550 Smith E, Shilatifard A. 2014. Enhancer biology and enhanceropathies. Nat Struct Mol Biol
551 21: 210–219.
552 Sutton J, Costa R, Klug M, Field L, Xu D, Largaespada DA, Fletcher CF, Jenkins NA,
553 Copeland NG, Klemsz M, et al. 1996. Genesis, a winged helix transcriptional
554 repressor with expression restricted to embryonic stem cells. J Biol Chem 271:
555 23126–23133.
556 Taher L, McGaughey DM, Maragh S, Aneas I, Bessling SL, Miller W, Nobrega MA,
557 McCallion AS, Ovcharenko I. 2011. Genome-wide identification of conserved
558 regulatory function in diverged sequences. Genome Res 21: 1139–1149.
559 Thakurela S, Sahu SK, Garding A, Tiwari VK. 2015. Dynamics and function of distal
560 regulatory elements during neurogenesis and neuroplasticity. Genome Res 25:
561 1309–1324.
562 Thomas-Chollier M, Herrmann C, Defrance M, Sand O, Thieffry D, van Helden J. 2012a.
563 RSAT peak-motifs: motif analysis in full-size ChIP-seq datasets. Nucleic Acids Res
564 40: e31. doi: 10.1093/nar/gkr1104.
565 Thomas-Chollier M, Darbo E, Herrmann C, Defrance M, Thieffry D, van Helden J. 2012b. A
566 complete workflow for the analysis of full-size ChIP-seq (and similar) data sets using
567 peak-motifs. Nat Protoc 7: 1551–1568.
568 Vacaru AM, Di Narzo AF, Howarth DL, Tsedensodnom O, Imrie D, Cinaroglu A, Amin S, Hao
569 K, Sadler KC. 2014. Molecularly defined unfolded protein response subclasses have
570 distinct correlations with fatty liver disease in zebrafish. Dis Model Mech 7: 823–835.
571 Vahedi G, Kanno Y, Furumoto Y, Jiang K, Parker SCJ, Erdos MR, Davis SR, Roychoudhuri
572 R, Restifo NP, Gadina M, et al. 2015. Super-enhancers delineate disease-associated
573 regulatory nodes in T cells. Nature 520: 558–562.
574 Vella P, Barozzi I, Cuomo A, Bonaldi T, Pasini D. 2012. Yin Yang 1 extends the Myc-related
575 transcription factors network in embryonic stem cells. Nucleic Acids Res 40: 3403–
576 3418.
577 Vermunt MW, Reinink P, Korving J, de Bruijn E, Creyghton PM, Basak O, Geeven G,
578 Toonen PW, Lansu N, Meunier C, et al. 2014. Large-scale identification of
579 coregulated enhancer networks in the adult human brain. Cell Rep 9: 767–779.
580 Villar D, Berthelot C, Aldridge S, Rayner TF, Lukk M, Pignatelli M, Park TJ, Deaville R,
581 Erichsen JT, Jasinska AJ, et al. 2015. Enhancer Evolution across 20 Mammalian
582 Species. Cell 160: 554–566.
583 Visel A, Blow MJ, Li Z, Zhang T, Akiyama JA, Holt A, Plajzer-Frick I, Shoukry M, Wright C,
584 Chen F, et al. 2009. ChIP-seq accurately predicts tissue-specific activity of
585 enhancers. Nature 457: 854–858.
586 Wei Y, Zhang S, Shang S, Zhang B, Li S, Wang X, Wang F, Su J, Wu Q, Liu H, et al. 2016.
587 SEA: a super-enhancer archive. Nucleic Acids Res 44: D172–D179.
588 White R, Rose K, Zon L. 2013. Zebrafish cancer: the state of the art and the path forward.
589 Nat Rev Cancer 13: 624–636.
590 Whyte WA, Orlando DA, Hnisz D, Abraham BJ, Lin CY, Kagey MH, Rahl PB, Lee TI, Young
591 RA. 2013. Master Transcription Factors and Mediator Establish Super-Enhancers at
592 Key Cell Identity Genes. Cell 153: 307–319.
593 Xu C, Fan ZP, Müller P, Fogley R, DiBiase A, Trompouki E, Unternaehrer J, Xiong F,
594 Torregroza I, Evans T, et al. 2012. Nanog-like Regulates Endoderm Formation
595 through the Mxtx2-Nodal Pathway. Dev Cell 22: 625–638.
596 Yue F, Cheng Y, Breschi A, Vierstra J, Wu W, Ryba T, Sandstrom R, Ma Z, Davis C, Pope
597 BD, et al. 2014. A comparative encyclopedia of DNA elements in the mouse genome.
598 Nature 515: 355–364.
599 Zang C, Schones DE, Zeng C, Cui K, Zhao K, Peng W. 2009. A clustering approach for
600 identification of enriched domains from histone modification ChIP-Seq data.
601 Bioinformatics 25: 1952–1958.
[Figure 1 image (Pérez-Rico203679_Fig1). (A) Workflow schematic: pluripotent state, brain, heart, intestine and testis samples (tree labels: 450 My, 80 My); H3K27ac ChIP-seq; mapping to reference genome; peak calling; super-enhancer identification (typical enhancer versus SE); intra- and interspecies comparisons. (B) Enhancers ranked by H3K27ac (x-axis) against H3K27ac density (y-axis) for zebrafish, mouse and human. Printed cutoffs, in panel order: 5720.4492, 9832.6656 and 6386.5332. Printed SE counts, in panel order: 664, 993 and 1,323.]
[Figure 2 image (Pérez-Rico203679_Fig2). (A) Proportion of genes covered (density) by typical enhancers and by SEs from −100 kb to +100 kb around the TSS in zebrafish, mouse and human. (B) Proportion of gene bodies with overlap for typical enhancers, SEs and control regions across datasets (pluripotent state, ESCs, brain, olfactory bulb, cingulate gyrus, heart, right ventricle, intestine, testis); two comparisons are labeled NS. (C) Percentage of base pairs of typical enhancers and SEs falling in each genomic category (2 kb upstream, UTR, exon, intron, intergenic) in zebrafish, mouse and human.]
[Figure 3 image (Pérez-Rico203679_Fig3). (A) H3K27ac tag-count tracks in pluripotent state and adult brain at nanog, neurod2, Esrrb, the Dlx1/Dlx2 locus, ZIC3 and the SLC6A1/SLC6A11 locus. (B) Chow-Ruskey diagrams of typical enhancers and SEs for intestine, pluripotent state, brain, heart and testis. Percentage tables (columns are color-coded in the original): typical enhancers, specific 35, 38, 24, 30, 31 and non-specific 65, 62, 76, 70, 69; SEs, specific 58, 62, 28, 41, 53 and non-specific 42, 38, 72, 59, 47. The individual region counts in the diagrams are not recoverable from the extracted text.]
[Figure 4 image (Pérez-Rico203679_Fig4). (A) Conservation by PhastCons across typical enhancers and SEs, from −3 kb before the start to +3 kb after the end, for zebrafish, mouse and human. (B) Venn diagrams for typical enhancers and SEs in zebrafish, mouse and human. Percentage tables of intersection (Int.) and difference (Dif.), rows color-coded in the original: typical enhancers 31/69, 27/73 and 26/74; SEs 23/77, 8/92 and 6/94. (C) H3K27ac tag-count tracks at the zebrafish zic5/zic2a, mouse Zic5/Zic2 and human ZIC5/ZIC2 loci. (D) Conservation by PhastCons of SEs with conserved versus non-conserved association across brain datasets (brain, olfactory bulb, cerebellum, angular gyrus, anterior caudate, cingulate gyrus, hippocampus middle, inferior temporal lobe, middle frontal lobe).]
[Figure 5 image (Pérez-Rico203679_Fig5). (A) Venn diagrams of ATAC-seq peaks and Nanog ChIP-seq peaks, genome-wide and within SEs. Printed counts: 33,101, 9,973 and 14,870; 1,047, 1,264 and 980. Printed p-values: e-2917.71 and e-7462.03. (B) De-novo-found motifs oligos_7nt_m2 and oligos_6nt_m3 beside JASPAR matrix models MA0143.3 SOX2, MA0077.1 SOX9 and MA0592.2 ESRRA. Printed consensus: wwtcArGGCCwTTGkkw. Printed site counts: 1,476, 585, 421, 76 and 7,063. Printed Ncorr values: 0.401, 0.479 and 0.531. (C) Enriched terms, colored by Log10 (FDR q-value).
oligos_7nt_m2, molecular function: protein heterodimerization activity (GO:0046982); protein dimerization activity (GO:0046983); protein binding (GO:0005515); DNA binding (GO:0003677); nucleic acid binding (GO:0003676).
oligos_6nt_m3, molecular function: DNA binding (GO:0003677); protein dimerization activity (GO:0046983); protein heterodimerization activity (GO:0046982); seq. specific DNA binding TF activity (GO:0003700); protein binding (GO:0005515).
Wiki pathways: noncanonical wnt pathway (WP215); canonical wnt - zebrafish (WP566); FGF signaling pathway (WP152); Id signaling pathway (WP1374); Nodal signaling pathway (WP341); Wnt signaling pathway (WP1325); Wnt signaling pathway and pluripotency (WP1344).]
[Figure 6 image (Pérez-Rico203679_Fig6). (A) SE-irf2bpl: tested regions labeled A to K; H3K27ac, ATAC-seq, Nanog, TFBSs and Vertebrate Cons tracks at the zebrafish irf2bpl and mouse Irf2bpl loci. (B) Reporter images for the D region and the K region; op, olfactory placode; scale bars 200 µm. (C) SE-zic2a: tested regions labeled L to U; the same tracks at the zebrafish zic5/zic2a and mouse Zic5/Zic2 loci. (D) Reporter images for the P, Q, S and T regions; scale bars 500 µm.]
Comparative analyses of super-enhancers reveal conserved elements in vertebrate genomes
Yuvia A. Pérez Rico, Valentina Boeva, Allison C. Mallory, et al.
Genome Res. published online December 13, 2016 Access the most recent version at doi:10.1101/gr.203679.115
Supplemental Material: http://genome.cshlp.org/content/suppl/2017/01/17/gr.203679.115.DC1
Accepted Manuscript: Peer-reviewed and accepted for publication but not copyedited or typeset; accepted manuscript is likely to differ from the final, published version.
Creative Commons License: This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
Advance online articles have been peer reviewed and accepted for publication but have not yet appeared in the paper journal (edited, typeset versions may be posted when available prior to final publication). Advance online articles are citable and establish publication priority; they are indexed by PubMed from initial publication. Citations to Advance online articles must include the digital object identifier (DOI) and date of initial publication.