Downloaded from genome.cshlp.org on October 5, 2021 - Published by Cold Spring Harbor Laboratory Press

Comparative analyses of super-enhancers reveal conserved elements in vertebrate genomes

Yuvia A. Pérez-Rico1, 2, 3, Valentina Boeva2, 4, Allison C. Mallory1, Angelo Bitetti1, 3, Sara Majello1, Emmanuel Barillot2, 4, Alena Shkumatava1

1 Institut Curie, PSL Research University, INSERM U934, CNRS UMR 3215, F-75005, Paris, France.

2 INSERM, U900, F-75005, Paris, France.

3 Sorbonne Universités, UPMC Univ Paris 06, F-75005, Paris, France.

4 Institut Curie, Mines ParisTech, PSL Research University, F-75005, Paris, France.

Address correspondence to [email protected]

Institut Curie, PSL Research University, CNRS, UMR 3215, 26 rue d’Ulm, F-75005, Paris, France.

Running title: Conserved super-enhancers in vertebrates.

Keywords: enhancers, super-enhancers, H3K27ac, hyperactive chromatin, zebrafish


ABSTRACT

Super-enhancers (SEs) are key transcriptional drivers of cellular, developmental and disease states in mammals, yet the conservational and regulatory features of these enhancer elements in non-mammalian vertebrates are unknown. To define SEs in zebrafish and enable sequence and functional comparisons to mouse and human SEs, we used genome-wide histone H3 lysine 27 acetylation (H3K27ac) occupancy as a primary SE delineator. Our study determined the set of SEs in pluripotent state cells and adult zebrafish tissues and revealed both similarities and differences between zebrafish and mammalian SEs. Although the total number of SEs was proportional to the genome size, the genomic distribution of zebrafish SEs differed from that of the mammalian SEs. Despite the evolutionary distance separating zebrafish and mammals and the low overall SE sequence conservation, ~42% of zebrafish SEs were located in close proximity to orthologs that also were associated with SEs in mouse and human. Compared to their non-associated counterparts, higher sequence conservation was revealed for those SEs that have maintained orthologous associations. Functional dissection of two of these SEs identified conserved sequence elements and tissue-specific expression patterns, while chromatin accessibility analyses predicted transcription factors governing the function of pluripotent state zebrafish SEs. Our zebrafish annotations and comparative studies show the extent of SE usage and their conservation across vertebrates, permitting future gene regulatory studies in several tissues.


INTRODUCTION

The identification of transcriptional regulators is central for understanding tissue-specific expression programs. Enhancers are cis-regulatory elements able to recruit transcription factors (TFs) and the transcriptional apparatus to activate their target gene expression (Smith and Shilatifard 2014; Heinz et al. 2015; Ren and Yue 2015). Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) has been a frequently used strategy to generate genome-wide enhancer annotations (Visel et al. 2009; Creyghton et al. 2010; Bernstein et al. 2010; Rada-Iglesias et al. 2011; Kieffer-Kwon et al. 2013; Vermunt et al. 2014; Villar et al. 2015; Prescott et al. 2015). ChIP-seq-based approaches have shown that a subset of mammalian enhancers are found in close sequence proximity to one another, forming large regions of hyperactive chromatin referred to as super-enhancers (SEs) or stretch enhancers (Whyte et al. 2013; Lovén et al. 2013; Parker et al. 2013). This structure distinguishes them from shorter, more compacted regions referred to as typical enhancers.

SEs are characterized by their high level of histone H3 lysine 27 acetylation (H3K27ac) density, a mark associated with active enhancers and promoters (Creyghton et al. 2010; Rada-Iglesias et al. 2011), and the binding of a high abundance of TFs, transcriptional coactivators and chromatin remodelers (Whyte et al. 2013; Hnisz et al. 2013). Analyses of the SE dynamics during lineage commitment of specific cell types have shown that SEs are remodeled during differentiation, having crucial roles in cell fate determination (Vahedi et al. 2015; Adam et al. 2015; Thakurela et al. 2015). Moreover, SEs are enriched for single nucleotide polymorphisms (SNPs) associated with a broad spectrum of diseases including but not limited to cancers, type 1 diabetes, Alzheimer’s disease and multiple sclerosis (Hnisz et al. 2013; Parker et al. 2013; Vahedi et al. 2015). For example, a fraction of human T-cell acute lymphoblastic leukemia cases exhibits somatic mutations that create MYB TF binding sites that generate a SE adjacent to the TAL1 oncogene (Mansour et al. 2014). Despite a basic understanding of the features and functions of mammalian SEs and a recently published catalog of SEs in non-vertebrates (Wei et al. 2016), the extent to which the defining characteristics of mammalian SEs also apply to similar regulatory regions in species outside of the mammalian clade is not known.

Comparative analyses of enhancers in different species have been invaluable for our understanding of their evolution (reviewed in Rubinstein and de Souza 2013; Domené et al. 2013). Here, we employed the zebrafish model as an exemplar to define SE biology in vertebrates (Patton et al. 2005; Howe et al. 2013; White et al. 2013; Vacaru et al. 2014; Kaufman et al. 2016). Previous studies of zebrafish have successfully identified stage-specific enhancers involved in early development and have highlighted their generally low sequence conservation (Aday et al. 2011; Bogdanović et al. 2012; Lee et al. 2015). Although these enhancer annotations open the possibility to gain fundamental insights into gene regulation during embryonic development, they do not address the tissue-specificity of enhancers in zebrafish.

To identify cell- and tissue-specific enhancers, in particular SEs, we analyzed the distribution of H3K27ac in zebrafish pluripotent cells and four adult tissues. Our comparative analyses of zebrafish, mouse and human SEs highlight their differences and similarities and advance the study of gene regulation in zebrafish by identifying a set of SE candidates involved in cellular identity.


RESULTS

H3K27ac marks hundreds of SEs in zebrafish

To assess characteristic features of vertebrate SEs, we identified enhancer regions in zebrafish (Danio rerio), mouse and human brain, heart, intestine, testis and pluripotent cells. For zebrafish, we used the early embryonic dome stage as a comparative stage to the pluripotent state of mouse and human ESCs (Schier and Talbot 2005). All mouse and human enhancer annotations, as well as zebrafish pluripotent state enhancer annotations, were based on publicly available datasets of the H3K27ac mark, whereas those of the zebrafish adult brain, heart, intestine and testis were generated from in-house H3K27ac ChIP-seq datasets (Fig. 1A; Supplemental Table S1; Bernstein et al. 2010; Rada-Iglesias et al. 2011; Mouse ENCODE Consortium 2012; Bogdanović et al. 2012; Chadwick et al. 2012; Nord et al. 2013; Yue et al. 2014). To identify typical enhancers and SEs, H3K27ac-enriched regions were identified with SICER (Zang et al. 2009), filtered to discard active promoters and stitched by the ROSE software (Fig. 1A; Whyte et al. 2013; Lovén et al. 2013). We identified an average of 743 and 1,183 SEs for zebrafish and mammals, respectively (Fig. 1B; Supplemental Table S1; Supplemental Dataset S1). Similar to mammalian SEs, most zebrafish SEs were longer than typical enhancers, although the length parameter was not explicitly considered for their identification (Supplemental Fig. S1A-C; examples of typical enhancers and SEs are shown in Supplemental Fig. S2A).


Genomic distribution of zebrafish typical enhancers and SEs differs from that of mammalian regions

In contrast to mammalian SEs, which tend to overlap with gene bodies (Whyte et al. 2013; Lovén et al. 2013), neither zebrafish typical enhancers nor zebrafish SEs were preferentially enriched in the TSS downstream regions in any tissue or at any embryonic stage analyzed (Fig. 2A; Supplemental Fig. S2B). To assess if zebrafish typical enhancers and SEs were enriched in gene bodies, the proportion of genes covered by typical enhancers and SEs was calculated and compared to the proportion of genes covered by random control regions. As expected, mouse and human typical enhancers and SEs from all analyzed samples showed significant enrichments in gene bodies (P-values from z-scores ≤ 4.71 × 10^-18), whereas gene-body enrichment of zebrafish typical enhancers and SEs showed variation among the different cells and tissues analyzed (Fig. 2B). Furthermore, we found that on average for all cells and tissues analyzed, ~65% and ~73% of mouse and ~70% and ~80% of human typical enhancer and SE sequences, respectively, overlapped introns (Fig. 2C). In zebrafish, only ~28% of typical enhancer and ~29% of SE sequences overlapped introns, and the majority of zebrafish typical enhancer and SE sequences (~67% and ~66%, respectively) overlapped intergenic regions in all zebrafish cells and adult tissues (Fig. 2C; Supplemental Fig. S2C). These drastic differences in genomic distribution cannot be solely explained by differences in the global genome composition of the three species, as more than 50% of the zebrafish, mouse and human genomes correspond to intergenic sequences (Supplemental Fig. S2D).
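
The gene-body enrichment comparison above (an observed proportion tested against proportions from random control regions, with P-values derived from z-scores) can be illustrated with a minimal Python sketch. This is not the authors' script (Supplemental File S3); the function name, the simulated control proportions and the one-sided normal approximation are assumptions made only for illustration.

```python
import math
import random
import statistics


def enrichment_z_test(observed, control_values):
    """Return (z, upper-tail P) for an observed proportion versus control proportions.

    Assumes the control proportions are roughly normally distributed.
    """
    mean = statistics.mean(control_values)
    sd = statistics.stdev(control_values)  # sample standard deviation
    z = (observed - mean) / sd
    p = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability
    return z, p


# Hypothetical data: 100 resampled control proportions and one observed value.
random.seed(0)
controls = [random.gauss(0.30, 0.01) for _ in range(100)]
z, p = enrichment_z_test(0.40, controls)
print(f"z = {z:.2f}, P = {p:.3g}")
```

With 100 control iterations the standard deviation is itself an estimate, so very small P-values obtained this way are best read as orders of magnitude.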


Vertebrate SEs are more cell- and tissue-specific than typical enhancers

A notable characteristic of mammalian SEs is their association with key cellular identity genes (Whyte et al. 2013; Hnisz et al. 2013; Fig. 3A). Similar to mouse and human SEs, Gene Ontology (GO) annotations of the zebrafish SEs in pluripotent state, brain, heart, intestine and testis showed enriched terms related to early development and pluripotency, neuronal components, signal transduction, immune pathways and chromatin organization, respectively (Supplemental Fig. S3). In addition, our intraspecies comparisons showed that, similar to mammals (Hnisz et al. 2013), zebrafish SEs exhibit higher cell- and tissue-specificity than typical enhancers (P-values from G-tests of independence ≤ 8.5 × 10^-13, with the exception of zebrafish heart; Fig. 3B; Supplemental Fig. S4).
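
As an illustration of the G-test of independence named above, the following Python sketch computes the statistic for a 2x2 table of specific versus shared regions in SEs and typical enhancers. The counts are hypothetical and the exact contingency tables used in the study are not given in this text, so this is only a sketch of the general test.

```python
import math


def g_test_2x2(table):
    """G-test of independence for a 2x2 table [[a, b], [c, d]]; returns (G, P)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    g = 0.0
    for i in range(2):
        for j in range(2):
            observed = table[i][j]
            expected = row_totals[i] * col_totals[j] / total
            if observed > 0:  # a zero cell contributes nothing to G
                g += 2.0 * observed * math.log(observed / expected)
    g = max(g, 0.0)  # guard against tiny negative values from rounding
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(g / 2.0))
    return g, p


# Hypothetical counts: rows are SEs and typical enhancers,
# columns are tissue-specific and shared regions.
g, p = g_test_2x2([[600, 64], [9000, 6000]])
print(f"G = {g:.1f}, P = {p:.3g}")
```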


SEs associate with a conserved set of genes throughout vertebrate evolution

Collectively, typical enhancers and SEs showed higher sequence conservation than their immediate flanking regions (P-values from Wilcoxon rank-sum test ≤ 2.8 × 10^-4, with the exception of typical enhancers from the right ventricle of the human heart; Fig. 4A). While zebrafish SEs from most tissues analyzed had significantly higher sequence conservation than zebrafish typical enhancers (P-values from Wilcoxon rank-sum test ≤ 9.3 × 10^-4), mouse and human sequence conservation differences were dependent on the tissue analyzed (Supplemental Fig. S5A). When we compared individual intergenic regions enriched for H3K27ac within typical enhancers and SEs, the higher conservation found for full-length SEs was diminished, and, for most of the datasets, typical enhancer regions were more conserved than SE regions (P-values from Wilcoxon rank-sum test ≤ 3.7 × 10^-3; Supplemental Fig. S5B). This observation is consistent with the fact that a higher proportion of SE constitutive regions overlaps intragenic sequences, which could artificially inflate the SE conservation estimate when analyzed as a whole unit (Supplemental Fig. S5C).
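
The Wilcoxon rank-sum comparisons of conservation scores reported above can be sketched in plain Python as follows. The sketch uses the normal approximation with average ranks for ties and no continuity correction; the software and settings actually used by the authors are not stated in this text, and the example scores are invented.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation); returns (z, P)."""
    pooled = sorted((value, group) for group, sample in enumerate((x, y)) for value in sample)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        tie_size = j - i
        tie_term += tie_size ** 3 - tie_size
        rank_sum_x += average_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    # Variance with the standard correction for ties (zero if all values are equal).
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Hypothetical mean PhastCons scores for two groups of regions.
typical = [0.05, 0.08, 0.10, 0.12, 0.07, 0.09]
super_enhancers = [0.15, 0.11, 0.20, 0.18, 0.14, 0.16]
z, p = rank_sum_test(typical, super_enhancers)
print(f"z = {z:.2f}, P = {p:.3g}")
```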

Next, to determine if SEs tend to maintain their spatial association with orthologous genes throughout evolution, the genes associated with zebrafish, mouse and human typical enhancers and SEs were compared based on homology annotations. The proportion of orthologous genes associated with typical enhancers in all three species was significantly larger than that associated with SEs (P-values from G-tests of independence ≤ 5.497 × 10^-8; Fig. 4B; Supplemental Fig. S6A-D; Supplemental Table S2). Approximately 42% of zebrafish SEs were associated with orthologous genes in mouse and human (pluripotent state = 110/473; brain = 321/664; heart = 325/850; intestine = 462/1145; testis = 362/581), and ~27% and ~21% of the mouse and human SEs, respectively, maintained their orthologous associations (examples are illustrated in Fig. 4C and Supplemental Fig. S6E-H). Importantly, mammalian SEs with conserved orthologous gene associations in the three species had higher sequence conservation than the non-associated SEs (P-values from Wilcoxon rank-sum test ≤ 4.7 × 10^-3). Similar results were also observed for the zebrafish brain and testis SEs (P-values from Wilcoxon rank-sum test ≤ 9.1 × 10^-3; Fig. 4D; Supplemental Fig. S6I). Thus, despite overall low sequence conservation in vertebrates, SEs that maintained orthologous gene associations exhibited higher conservation at the sequence level than those lacking such associations.
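
The per-tissue counts quoted above can be checked with a few lines of Python. The text does not say whether the ~42% figure is the mean of the per-tissue fractions or the pooled fraction, so the sketch computes both; the variable names are illustrative only.

```python
# Zebrafish SEs with conserved orthologous associations / total SEs, as quoted in the text.
counts = {
    "pluripotent state": (110, 473),
    "brain": (321, 664),
    "heart": (325, 850),
    "intestine": (462, 1145),
    "testis": (362, 581),
}

fractions = {tissue: hits / total for tissue, (hits, total) in counts.items()}
mean_fraction = sum(fractions.values()) / len(fractions)
pooled_fraction = sum(h for h, _ in counts.values()) / sum(t for _, t in counts.values())

for tissue, fraction in fractions.items():
    print(f"{tissue}: {fraction:.1%}")
print(f"mean of tissues: {mean_fraction:.1%}; pooled: {pooled_fraction:.1%}")
```

Both summaries fall between 42% and 43%, while the individual tissues range from roughly 23% (pluripotent state) to 62% (testis).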


Analysis of accessible chromatin identifies differences between zebrafish typical enhancer and SE composition

Within zebrafish SEs, we sought to demarcate transcription factor binding site (TFBS) hotspots or epicenters, defined as regions shorter than 1 kb bound by at least five TFs involved in cell identity (Siersbæk et al. 2014; Adam et al. 2015). To overcome the lack of zebrafish ChIP-seq data, we focused on the identification of accessible chromatin regions by ATAC-seq (Buenrostro et al. 2013; Supplemental Fig. S7A). To confirm that ATAC-seq data can be mined to identify TFBSs in zebrafish, we compared ATAC-seq and Nanog ChIP-seq peaks (Xu et al. 2012). These comparisons showed significant overlap at both the genome-wide level and within SEs (P-values based on hypergeometric distributions ≤ e^-2917.71; Fig. 5A).
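
P-values as small as the one quoted above cannot be represented as ordinary floating-point numbers, which is one reason to report them as exponents. The following Python sketch shows how a hypergeometric upper-tail probability for a peak overlap can be computed in log space. The counts are hypothetical, and how the authors defined the population of candidate regions is not stated in this text.

```python
import math


def log_choose(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def log_hypergeom_tail(k, total, marked, drawn):
    """Natural log of P(X >= k) when `drawn` items are sampled without
    replacement from `total` items, of which `marked` count as successes."""
    lower = max(k, 0, drawn - (total - marked))
    upper = min(marked, drawn)
    if lower > upper:
        return float("-inf")
    log_denominator = log_choose(total, drawn)
    terms = [
        log_choose(marked, i) + log_choose(total - marked, drawn - i) - log_denominator
        for i in range(lower, upper + 1)
    ]
    largest = max(terms)
    # Log-sum-exp keeps the sum stable when every term underflows on its own.
    return largest + math.log(sum(math.exp(t - largest) for t in terms))


# Hypothetical counts: 1,000,000 genomic bins, 50,000 with ATAC-seq peaks,
# 20,000 with Nanog peaks, 10,000 shared.
log_p = log_hypergeom_tail(k=10_000, total=1_000_000, marked=50_000, drawn=20_000)
print(f"ln(P) = {log_p:.2f}; direct exponentiation gives {math.exp(log_p)}")
```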

A differential analysis of ATAC-seq peaks within typical enhancers and SEs identified 12 clusters of over-represented motifs within SEs (Supplemental Fig. S7B). Our set of consensus motifs included those with similarity to matrix models of pluripotency-associated TFs, such as SOX2, EOMES and FOXD3 (Sutton et al. 1996; Hromas et al. 1999; Avilion et al. 2003; Kidder and Palmer 2010). The motif that correlated with the SOX2 matrix was the consensus of two motifs: one similar to the SOX2 matrix model and the second motif similar to the SOX9 and ESRRA matrix models (Fig. 5B). GO annotation of the SE ATAC-seq peaks containing sites of these two motifs showed enrichment for TF function and pluripotency terms that were not identified by the global analysis of pluripotent state SEs (Fig. 5C; Supplemental Fig. S3A). Thus, our results predict a set of TFs with enriched binding to accessible chromatin regions highly associated with pluripotency.


Dissections of vertebrate SEs identify functionally conserved elements

To determine the different contribution of regions within SEs, two SEs with conserved association with irf2bpl and zic2a (hereafter referred to as SE-irf2bpl and SE-zic2a; Fig. 4C; Supplemental Fig. S6A) were tested by GFP reporter assays in zebrafish embryos (Supplemental Fig. S8A). Twelve zebrafish gene distal regions were selected for the enhancer activity test based on their H3K27ac, ATAC-seq and Nanog ChIP-seq profiles (Fig. 6; Supplemental Table S3). To evaluate the functional conservation of the equivalent mouse SEs, nine mouse regions, selected based on presence or absence of TFBSs for 14 pluripotent state TFs, were tested (Supplemental Fig. S7A; Supplemental Table S3; Chen et al. 2008; Heng et al. 2010; Ma et al. 2011; Vella et al. 2012; Betschinger et al. 2013; Whyte et al. 2013). It should be noted that while the mouse Zic2-associated region is a typical enhancer at the pluripotent state (Fig. 6C), it is identified as a SE in the brain (Fig. 4C).

For zebrafish SE-irf2bpl, there was a strong concordance between enhancer activity and the presence of a high ATAC-seq signal (Fig. 6A-B, Supplemental Fig. S8B). Remarkably, the GFP expression pattern driven by the conserved zebrafish region D and mouse region K (Fig. 6A) substantially overlapped within the olfactory placode (Fig. 6B). Similarly, the mouse region G (Fig. 6A) drove dim GFP expression in the olfactory placode at ~24 hours post-fertilization (hpf) with peak GFP expression in the roof plate at 48 hpf (Supplemental Fig. S8B).

For zebrafish SE-zic2a, 75% of SE-zic2a regions exhibiting enhancer activity also contained ATAC-seq peaks and displayed high sequence conservation (the P, Q and R regions; Fig. 6C-D; Supplemental Fig. S8C). Interestingly, the zebrafish S region, originally selected as a control region based on the lack of sequence conservation and the absence of H3K27ac and ATAC-seq signals, drove specific GFP expression in the notochord and telencephalon (Fig. 6D) similar to the spinal cord and telencephalon expression driven by the equivalent mouse T region (Fig. 6D). As the S region contained a mildly enriched Nanog peak (Fig. 6C) and predicted TFBSs (Supplemental Table S3), it likely corresponds to a redundant or “shadow” enhancer that is not active under homeostatic conditions and, consequently, is not found by ATAC-seq (Fig. 6C).

Taken together, our results confirm that SEs contain regions with evolutionarily conserved enhancer functions and emphasize the importance of analyzing comprehensive hyperactive chromatin regions instead of isolated enhancers to allow the identification of enhancers with partially redundant activities.


DISCUSSION

In this study, we identify tissue-specific enhancers in zebrafish, focusing on hyperactive chromatin regions or SEs. Our comparative analyses support a model in which SEs specify uniquely important cell- and tissue-specific regulatory regions across species (Hnisz et al. 2013; Saint-André et al. 2016), and highlight the difference in genomic distribution between zebrafish and mammalian SEs. While the majority of mammalian SEs overlap with their target genes (Whyte et al. 2013), zebrafish typical enhancers and SEs are mainly located within intergenic regions. Similarly, during early zebrafish development, differentially methylated DNA regions, ~50% of which are enriched for enhancer-associated chromatin marks including H3K27ac, are mainly embedded within intergenic sequences (Lee et al. 2015). Future analyses incorporating the enhancer annotations of additional species may reveal if the intergenic distribution of zebrafish regulatory regions is a distinctive feature.

Similar to what has been shown for zebrafish and mammalian enhancers (Bogdanović et al. 2012; Lee et al. 2015; Villar et al. 2015), our PhastCons value-based sequence conservation analysis showed that both zebrafish typical enhancers and SEs have overall low sequence conservation, and that SE intergenic constitutive regions do not display higher conservation than those of typical enhancers. However, the sequence conservation was detectably higher in the fraction of SEs that has maintained an association with orthologous genes in zebrafish, mouse and human compared to the fraction lacking conserved orthologous associations. It remains to be determined if those SEs with orthologous gene associations have a common evolutionary origin, or if they independently evolved in the three species. Notably, enhancers shared between human and chimp also display higher sequence conservation than species-biased enhancers (Prescott et al. 2015).


Previous studies have reported enhancer regions with overlapping functions in phylogenetically distant species (Hare et al. 2008; Taher et al. 2011; Clarke et al. 2012). However, the genome-wide prediction of those regions is not trivial (Taher et al. 2011), as sequence conservation alone does not necessarily predict functional conservation, and regions with high sequence conservation can drive different patterns of expression in reporter assays (Goode et al. 2011). Thus, it is remarkable that we defined equivalent subregions in two SEs with conserved enhancer functions. Although the extent of enhancer redundancy is poorly understood, a recent study has shown the genome-wide pervasiveness of shadow enhancers during Drosophila development (Cannavò et al. 2016). Indeed, one of the zebrafish SE regions identified in this study likely represents a shadow enhancer with a conserved function. For these reasons, we propose that the future identification of shadow enhancers will benefit from the analysis of whole hyperactive chromatin regions rather than the analysis of isolated enhancers.

Our study reveals the genome-wide distribution of tissue-specific cis-regulatory elements in zebrafish and identifies the key SE complement in this important model system. Moreover, the characterized genomic distribution of zebrafish typical enhancers and SEs, together with our comparative analyses to those of mammals, solidifies our understanding of pervasive and conserved vertebrate transcriptional mechanisms.


METHODS

ChIP-seq assays

Whole brains, hearts, intestines and testes were dissected from same-age adult male AB zebrafish. Two biological replicates were prepared from each tissue. ChIP-seq was performed as previously described (Guenther et al. 2008) using Abcam H3K27ac antibody (ab4729, lot# GR259887-1). Purified chromatin was used for single-end library preparation following standard Illumina protocols. For more details, see Supplemental Material.


Identification of typical enhancers and SEs

H3K27ac ChIP-seq datasets were mapped to their corresponding reference genomes (Zv9 for zebrafish, mm10 for mouse and hg38 for human) using Bowtie 2 version 2.1.0 (Langmead and Salzberg 2012). Peak calling was performed with SICER version 1.1 (Zang et al. 2009); if available, input libraries were used as controls for the peak calling (Supplemental Table S1). Identified peaks were filtered to discard peaks for which the main summit was within promoter regions and used as input for the ROSE algorithm version 0.1 to identify typical enhancers and SEs. For detailed parameters, see Supplemental Material, Supplemental File S1 and Supplemental File S2.


Computational analyses

The calculation of typical enhancer and SE distributions around TSSs was performed using Nebula (Boeva et al. 2012). Typical enhancer and SE enrichments over gene bodies were calculated with a customized script (Supplemental File S3) and control enrichments were obtained by bootstrap resampling with 100 iterations. To calculate the percentage of typical enhancer and SE sequences overlapping with genomic features, typical enhancer and SE annotations were compared to RefSeq Gene annotations (Rosenbloom et al. 2015) using the BEDTools intersect function (Quinlan and Hall 2010). Sequence conservation scores were calculated based on the vertebrate conservation PhastCons tracks from UCSC (Siepel and Haussler 2005; Siepel et al. 2005) associated with each of the genome versions used for read mapping using hgWiggle (Kent et al. 2002) and a customized Python script (Supplemental File S4). For ortholog comparisons, typical enhancer and SE target genes were annotated based on gene proximity using Nebula. All gene names were converted to Ensembl IDs and compared based on homology annotations from Ensembl (Genes 82; Cunningham et al. 2015). Analysis of the ATAC-seq library was performed as previously described (Buenrostro et al. 2013). Over-represented motifs in ATAC-seq peaks within SEs were identified using the RSAT peak-motifs tool (Thomas-Chollier et al. 2012a; Thomas-Chollier et al. 2012b). For more details, see Supplemental Material.
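
The percentage of enhancer base pairs overlapping a genomic feature, computed in the study with BEDTools, can be illustrated with a small pure-Python sketch. It assumes half-open coordinates on a single chromosome, and the function names and intervals are hypothetical; it is meant to show the calculation, not to replace BEDTools.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def overlap_fraction(regions, features):
    """Fraction of region base pairs that are covered by any feature."""
    regions = merge_intervals(regions)
    features = merge_intervals(features)
    total = sum(end - start for start, end in regions)
    covered = 0
    first = 0  # index of the first feature that can still overlap a region
    for start, end in regions:
        while first < len(features) and features[first][1] <= start:
            first += 1
        k = first
        while k < len(features) and features[k][0] < end:
            covered += min(end, features[k][1]) - max(start, features[k][0])
            k += 1
    return covered / total if total else 0.0


# Hypothetical enhancer regions and intron annotations on one chromosome.
enhancers = [(0, 100), (200, 300)]
introns = [(50, 250)]
print(f"{overlap_fraction(enhancers, introns):.0%} of enhancer base pairs overlap introns")
```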


Microinjections

Each of the vectors containing SE regions (for cloning details see the Supplemental Material) was co-injected with Tol2 mRNA into one-cell stage zebrafish embryos. GFP expression was monitored during the first three days post-fertilization. All injection experiments were repeated at least twice (Supplemental Table S3). For more details, see Supplemental Material.


DATA ACCESS

Zebrafish H3K27ac ChIP-seq data generated in this study have been submitted to the NCBI Gene Expression Omnibus (GEO; http://www.ncbi.nlm.nih.gov/geo/) (Edgar et al. 2002) under accession number GSE75734.


ACKNOWLEDGEMENTS

We thank Igor Ulitsky, Matthew Guenther and Violaine Saint-André for helpful comments on this manuscript. We also thank all members of the Shkumatava lab for help with zebrafish dissections and for useful discussions. High-throughput sequencing was performed by the ICGex NGS platform of Institut Curie supported by the grants ANR-10-EQPX-03 (Equipex) and ANR-10-INBS-09-08 (France Génomique Consortium) from the ANR (“Investissements d’Avenir” program) and by the Canceropole Île-de-France. This work was supported by grants from ERC (FLAME-337440), ATIP-Avenir, La Fondation Bettencourt Schueller and FRM (DBI201312285578). YAPR was partially funded by a scholarship from Secretaría de Ciencia, Tecnología e Innovación – Seciti, México.


Author contributions

AS conceived and designed the project. YAPR, VB, ACM and AS designed experiments; ACM and AS performed zebrafish ChIP-seq; YAPR performed computational analyses and prepared plasmid constructs; YAPR and AB performed microinjections and microscopy; SM assisted with experimental work; YAPR, ACM and AS wrote the manuscript. All authors reviewed and approved the manuscript; VB, EB and AS supervised the project.


DISCLOSURE DECLARATION

The authors declare no competing interests.


FIGURE LEGENDS

Figure 1. Identification of typical enhancers and SEs in vertebrate genomes.

(A) Workflow for the identification of vertebrate typical enhancers and SEs. Schematic representations depict the cells and tissues analyzed.

(B) Saturation curves of H3K27ac density across brain datasets (whole brain for zebrafish, olfactory bulb for mouse and middle frontal lobe for human). The number of ranked typical enhancers and SEs by H3K27ac density (x-axis) and their densities (y-axis) are plotted. Horizontal dotted lines represent density cutoffs used for the classification of SEs and vertical dotted lines demarcate SEs from typical enhancers. The total number of predicted SEs is noted on the right side of each graph.


Figure 2. Genomic distribution of typical enhancers and SEs.

(A) Density plots representing the proportion of genes (y-axis) covered by typical enhancers and SEs in the vicinity of TSSs (x-axis) in zebrafish brain, mouse cerebellum and human angular gyrus.

(B) Proportion of gene bodies overlapping with typical enhancers, SEs and control regions (y-axis) in different zebrafish, mouse and human cells and tissues (x-axis). The mean and the standard deviation (black bars) calculated from bootstrap analyses of control regions are shown. All comparisons between typical enhancers and SEs and their controls have significant differences (P-values from z-scores ≤ 3 × 10^-4), with the exception of zebrafish pluripotent state and heart typical enhancers. NS, not significant.

(C) Distribution of typical enhancer and SE sequences across genomic features. The y-axis shows the percentage of total brain typical enhancer or SE base pairs overlapping the different genomic features represented in the legend. Adult brain datasets for mouse and human correspond to olfactory bulb and cingulate gyrus, respectively.


Figure 3. Cell and tissue specificity of vertebrate typical enhancers and SEs.

(A) Distribution of H3K27ac at selected genes (genomic position represented on the x-axis) in both pluripotent state and adult brain of zebrafish, mouse and human (raw tag counts represented on the y-axis). Typical enhancers and SEs are denoted by grey bars and red bars, respectively.

(B) Chow-Ruskey diagrams representing the overlap between pluripotent state (orange), brain (green), heart (purple), intestine (red) and testis (blue) typical enhancers and SEs in zebrafish. Color-coded tables show the percentages of cell- or tissue-specific and non-specific regions for each dataset.


362 Figure 4. SE conservation in vertebrates.

363 (A) Metagenes of sequence conservation of typical enhancers and SEs from zebrafish whole

364 brain, mouse olfactory bulb and human middle frontal lobe. The x-axis depicts the start and

365 end of typical enhancers and SEs flanked by 3 kb of adjacent sequence. The y-axis

366 represents sequence conservation calculated by PhastCons.

367 (B) Venn diagrams show the number of orthologous genes associated with brain typical

368 enhancers (left) and SEs (right) in zebrafish (green), mouse (blue) and human (purple).

369 Color-coded tables show the percentages of intersection and difference for each species.

370 The observed differences in overlap between typical enhancers and SEs in the three species

371 are significant (p-values ≤ 5.497x10-8) based on G-tests of independence.

372 (C) ChIP-seq binding profiles for H3K27ac at the indicated loci in zebrafish, mouse and

373 human brain (raw tag counts represented on the y-axis). Typical enhancers and SEs are

374 denoted by grey bars and red bars, respectively. Gene positions are noted along the x-axis.

375 (D) Box plots depicting average sequence conservation of brain SEs with maintained

376 orthologous association in zebrafish, mouse and human and with no maintained orthologous

377 association. The y-axis shows sequence conservation calculated by PhastCons. The box

378 bounds the interquartile range divided by the median and the notch approximates a 95%

379 confidence interval for the median. All observed differences in conservation between SE

380 categories are significant (p-value ≤ 9.1 × 10⁻³) based on Wilcoxon rank-sum tests.
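
The two simpler statistics named in this legend can be illustrated with a minimal Python sketch. It is not the authors' analysis code. It assumes a 2x2 contingency table for the G-test of independence (so one degree of freedom), and it assumes the common box plot convention in which the notch spans median ± 1.58 × IQR / √n. The function names and any counts passed to them are hypothetical.

```python
import math
import statistics


def g_test_2x2(table):
    """G-test of independence on a 2x2 table of counts; returns (G, p)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if observed > 0:  # a zero cell contributes nothing to the sum
                g += observed * math.log(observed / expected)
    g *= 2.0
    # For 1 degree of freedom, the chi-square tail probability is erfc(sqrt(G/2)).
    p = math.erfc(math.sqrt(max(g, 0.0) / 2.0))
    return g, p


def median_notch(values):
    """Approximate 95% interval for the median (assumed 1.58 * IQR / sqrt(n) rule)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    half_width = 1.58 * (q3 - q1) / math.sqrt(len(values))
    return median - half_width, median + half_width
```

In a setting like panel (B), one table row could hold the counts of shared and non-shared orthologous genes for typical enhancers and the other row the same counts for SEs. Tables with more rows or columns would need the general chi-square distribution with (rows − 1) × (columns − 1) degrees of freedom rather than the one-degree-of-freedom shortcut used here.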

381

382 Figure 5. Analysis of zebrafish SE composition by ATAC-seq.

383 (A) Venn diagrams representing the overlap between ATAC-seq peaks (purple) and Nanog

384 peaks (orange) genome-wide (left) and within pluripotent state SEs (right).

385 (B) Cluster, consensus motif sequence and logos of SOX-related de-novo-found motifs in

386 ATAC-seq peaks within SEs (left). JASPAR matrix models (right) of SOX2, SOX9 and

387 ESRRA. Ncorr, normalized correlation between identified motifs and JASPAR models.

388 (C) Top molecular function and wiki pathway GO terms enriched for the ATAC-seq peaks

389 containing sites of the de-novo identified oligos_7nt_m2 (left) and oligos_6nt_m3 (right)

390 motifs shown in (B). Binomial FDR q-values for the GO terms are displayed in a color-scale

391 (q-values ≤ 6.7 × 10⁻⁴).

392

393 Figure 6. Functional analysis of vertebrate SEs.

394 (A) Genomic context and conservation of the zebrafish (left) and mouse (right) irf2bpl and

395 Irf2bpl loci. Horizontal bars represent SEs (red). Raw H3K27ac ChIP-seq, ATAC-seq and

396 Nanog ChIP-seq profiles are shown in tag counts (y-axis). The TFBS track represents the

397 TFBS enrichment along the mouse locus. The Vertebrate Cons tracks represent

398 conservation scores calculated by PhastCons. Grey and green highlighted regions

399 correspond to the regions tested in reporter assays. Regions driving specific GFP

400 expression are indicated in green.

401 (B) GFP expression driven by the zebrafish SE-irf2bpl D region (left) and the mouse K

402 region (right) in transgenic zebrafish embryos at 48 hpf. White arrows indicate the olfactory

403 placode (op).

404 (C) Genomic context and conservation of the zebrafish and mouse zic2a and Zic2 loci as

405 described in A. Horizontal bars represent typical enhancers (grey) and SEs (red).

406 (D) GFP expression driven by the zebrafish P, Q and S regions (left) and the mouse T region

407 (right). h, hindbrain; nt, notochord; r, retina; sc, spinal cord; t, telencephalon.

408

409 REFERENCES

410 Adam RC, Yang H, Rockowitz S, Larsen SB, Nikolova M, Oristian DS, Polak L, Kadaja M,

411 Asare A, Zheng D, et al. 2015. Pioneer factors govern super-enhancer dynamics in

412 stem cell plasticity and lineage choice. Nature 521: 366–370.

413 Aday AW, Zhu LJ, Lakshmanan A, Wang J, Lawson ND. 2011. Identification of cis regulatory

414 features in the embryonic zebrafish genome through large-scale profiling of

415 H3K4me1 and H3K4me3 binding sites. Dev Biol 357: 450–462.

416 Avilion AA, Nicolis SK, Pevny LH, Perez L, Vivian N, Lovell-Badge R. 2003. Multipotent cell

417 lineages in early mouse development depend on SOX2 function. Genes Dev 17:

418 126–140.

419 Bernstein BE, Stamatoyannopoulos JA, Costello JF, Ren B, Milosavljevic A, Meissner A,

420 Kellis M, Marra MA, Beaudet AL, Ecker JR, et al. 2010. The NIH Roadmap

421 Epigenomics Mapping Consortium. Nat Biotechnol 28: 1045–1048.

422 Betschinger J, Nichols J, Dietmann S, Corrin PD, Paddison PJ, Smith A. 2013. Exit from

423 Pluripotency Is Gated by Intracellular Redistribution of the bHLH Transcription Factor

424 Tfe3. Cell 153: 335–347.

425 Boeva V, Lermine A, Barette C, Guillouf C, Barillot E. 2012. Nebula--a web-server for

426 advanced ChIP-seq data analysis. Bioinformatics 28: 2517–2519.

427 Bogdanović O, Fernandez-Minan A, Tena JJ, de la Calle-Mustienes E, Hidalgo C, van

428 Kruysbergen I, van Heeringen SJ, Veenstra GJC, Gomez-Skarmeta JL. 2012.

429 Dynamics of enhancer chromatin signatures mark the transition from pluripotency to

430 cell specification during embryogenesis. Genome Res 22: 2043–2053.

431 Buenrostro JD, Giresi PG, Zaba LC, Chang HY, Greenleaf WJ. 2013. Transposition of native

432 chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding

433 and nucleosome position. Nat Methods 10: 1213–1218.

434 Cannavò E, Khoueiry P, Garfield DA, Geeleher P, Zichner T, Gustafson EH, Ciglar L, Korbel

435 JO, Furlong EE. 2016. Shadow enhancers are pervasive features of developmental

436 regulatory networks. Curr Biol 26: 38–51.

437 Chadwick LH. 2012. The NIH Roadmap Epigenomics Program data resource. Epigenomics

438 4: 317–324.

439 Chen X, Xu H, Yuan P, Fang F, Huss M, Vega VB, Wong E, Orlov YL, Zhang W, Jiang J, et

440 al. 2008. Integration of External Signaling Pathways with the Core Transcriptional

441 Network in Embryonic Stem Cells. Cell 133: 1106–1117.

442 Clarke SL, VanderMeer JE, Wenger AM, Schaar BT, Ahituv N, Bejerano G. 2012. Human

443 developmental enhancers conserved between deuterostomes and protostomes.

444 PLoS Genet 8: e1002852. doi: 10.1371/journal.pgen.1002852.

445 Creyghton MP, Cheng AW, Welstead GG, Kooistra T, Carey BW, Steine EJ, Hanna J,

446 Lodato MA, Frampton GM, Sharp PA, et al. 2010. Histone H3K27ac separates active

447 from poised enhancers and predicts developmental state. Proc Natl Acad Sci 107:

448 21931–21936.

449 Cunningham F, Amode MR, Barrell D, Beal K, Billis K, Brent S, Carvalho-Silva D, Clapham

450 P, Coates G, Fitzgerald S, et al. 2015. Ensembl 2015. Nucleic Acids Res 43: D662–

451 D669.

452 Domené S, Bumaschny VF, de Souza FSJ, Franchini LF, Nasif S, Low MJ, Rubinstein M.

453 2013. Enhancer turnover and conserved regulatory function in vertebrate evolution.

454 Philos Trans R Soc Lond B Biol Sci 368: 20130027.

455 Edgar R, Domrachev M, Lash AE. 2002. Gene Expression Omnibus: NCBI gene expression

456 and hybridization array data repository. Nucleic Acids Res 30: 207–210.

457 Goode DK, Callaway HA, Cerda GA, Lewis KE, Elgar G. 2011. Minor change, major

458 difference: Divergent functions of highly conserved cis-regulatory elements

459 subsequent to whole genome duplication events. Development 138: 879–884.

460 Guenther MG, Lawton LN, Rozovskaia T, Frampton GM, Levine SS, Volkert TL, Croce CM,

461 Nakamura T, Canaani E, Young RA. 2008. Aberrant chromatin at genes encoding

462 stem cell regulators in human mixed-lineage leukemia. Genes Dev 22: 3403–3408.

463 Hare EE, Peterson BK, Iyer VN, Meier R, Eisen MB. 2008. Sepsid even-skipped enhancers

464 are functionally conserved in Drosophila despite lack of sequence conservation.

465 PLoS Genet 4: e1000106. doi: 10.1371/journal.pgen.1000106.

466 Heinz S, Romanoski CE, Benner C, Glass CK. 2015. The selection and function of cell type-

467 specific enhancers. Nat Rev Mol Cell Biol 16: 144–154.

468 Heng JCD, Feng B, Han J, Jiang J, Kraus P, Ng JH, Orlov YL, Huss M, Yang L, Lufkin T, et

469 al. 2010. The Nuclear Receptor Nr5a2 Can Replace Oct4 in the Reprogramming of

470 Murine Somatic Cells to Pluripotent Cells. Cell Stem Cell 6: 167–174.

471 Hnisz D, Abraham BJ, Lee TI, Lau A, Saint-André V, Sigova AA, Hoke HA, Young RA. 2013.

472 Super-Enhancers in the Control of Cell Identity and Disease. Cell 155: 934–947.

473 Howe K, Clark MD, Torroja CF, Torrance J, Berthelot C, Muffato M, Collins JE, Humphray S,

474 McLaren K, Matthews L, et al. 2013. The zebrafish reference genome sequence and

475 its relationship to the human genome. Nature 496: 498–503.

476 Hromas R, Ye H, Spinella M, Dmitrovsky E, Xu D, Costa RH. 1999. Genesis, a Winged Helix

477 transcriptional repressor, has embryonic expression limited to the neural crest, and

478 stimulates proliferation in vitro in a neural development model. Cell Tissue Res 297:

479 371–382.

480 Kaufman CK, Mosimann C, Fan ZP, Yang S, Thomas AJ, Ablain J, Tan JL, Fogley RD, van

481 Rooijen E, Hagedorn EJ, et al. 2016. A zebrafish melanoma model reveals

482 emergence of neural crest identity during melanoma initiation. Science 351:

483 aad2197. doi: 10.1126/science.aad2197.

484 Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D. 2002. The

485 human genome browser at UCSC. Genome Res 12: 996–1006.

486 Kidder BL, Palmer S. 2010. Examination of transcriptional networks reveals an important

487 role for TCFAP2C, SMARCA4, and EOMES in trophoblast stem cell maintenance.

488 Genome Res 20: 458–472.

489 Kieffer-Kwon KR, Tang Z, Mathe E, Qian J, Sung MH, Li G, Resch W, Baek S, Pruett N,

490 Grøntved L, et al. 2013. Interactome maps of mouse gene regulatory domains reveal

491 basic principles of transcriptional regulation. Cell 155: 1507–1520.

492 Langmead B, Salzberg SL. 2012. Fast gapped-read alignment with Bowtie 2. Nat Methods 9:

493 357–359.

494 Lee HJ, Lowdon RF, Maricque B, Zhang B, Stevens M, Li D, Johnson SL, Wang T. 2015.

495 Developmental enhancers revealed by extensive DNA methylome maps of zebrafish

496 early embryos. Nat Commun 6: 6315. doi: 10.1038/ncomms7315.

497 Lovén J, Hoke HA, Lin CY, Lau A, Orlando DA, Vakoc CR, Bradner JE, Lee TI, Young RA.

498 2013. Selective Inhibition of Tumor Oncogenes by Disruption of Super-Enhancers.

499 Cell 153: 320–334.

500 Ma Z, Swigut T, Valouev A, Rada-Iglesias A, Wysocka J. 2011. Sequence-specific regulator

501 Prdm14 safeguards mouse ESCs from entering extraembryonic endoderm fates. Nat

502 Struct Mol Biol 18: 120–127.

503 Mansour MR, Abraham BJ, Anders L, Berezovskaya A, Gutierrez A, Durbin AD, Etchin J,

504 Lawton L, Sallan SE, Silverman LB, et al. 2014. An oncogenic super-enhancer

505 formed through somatic mutation of a noncoding intergenic element. Science 346:

506 1373–1377.

507 Mouse ENCODE Consortium. 2012. An encyclopedia of mouse DNA elements (Mouse

508 ENCODE). Genome Biol 13: 418. doi: 10.1186/gb-2012-13-8-418.

509 Nord AS, Blow MJ, Attanasio C, Akiyama JA, Holt A, Hosseini R, Phouanenavong S,

510 Plajzer-Frick I, Shoukry M, Afzal V, et al. 2013. Rapid and Pervasive Changes in

511 Genome-wide Enhancer Usage during Mammalian Development. Cell 155: 1521–

512 1531.

513 Parker SC, Stitzel ML, Taylor DL, Orozco JM, Erdos MR, Akiyama JA, van Bueren KL,

514 Chines PS, Narisu N, Black BL, et al. 2013. Chromatin stretch enhancer states drive

515 cell-specific gene regulation and harbour human disease risk variants. Proc Natl

516 Acad Sci 110: 17921–17926.

517 Patton EE, Widlund HR, Kutok JL, Kopani KR, Amatruda JF, Murphey RD, Berghmans S,

518 Mayhall EA, Traver D, Fletcher CD, et al. 2005. BRAF mutations are sufficient to

519 promote nevi formation and cooperate with p53 in the genesis of melanoma. Curr

520 Biol 15: 249–254.

521 Prescott SL, Srinivasan R, Marchetto MC, Grishina I, Narvaiza I, Selleri L, Gage FH, Swigut

522 T, Wysocka J. 2015. Enhancer Divergence and cis-Regulatory Evolution in the

523 Human and Chimp Neural Crest. Cell 163: 68–83.

524 Quinlan AR, Hall IM. 2010. BEDTools: a flexible suite of utilities for comparing genomic

525 features. Bioinformatics 26: 841–842.

526 Rada-Iglesias A, Bajpai R, Swigut T, Brugmann SA, Flynn RA, Wysocka J. 2011. A unique

527 chromatin signature uncovers early developmental enhancers in humans. Nature

528 470: 279–283.

529 Ren B, Yue F. 2015. Transcriptional enhancers: Bridging the Genome and Phenome. Cold

530 Spring Harb Symp Quant Biol 2015 November 18. doi: 10.1101/sqb.2015.80.027219.

531 Rosenbloom KR, Armstrong J, Barber GP, Casper J, Clawson H, Diekhans M, Dreszer TR,

532 Fujita PA, Guruvadoo L, Haeussler M, et al. 2015. The UCSC Genome Browser

533 database: 2015 update. Nucleic Acids Res 43: D670–D681.

534 Rubinstein M, de Souza FSJ. 2013. Evolution of transcriptional enhancers and animal

535 diversity. Philos Trans R Soc Lond B Biol Sci 368: 20130017.

536 Saint-André V, Federation AJ, Lin CY, Abraham BJ, Reddy J, Lee TI, Bradner JE, Young

537 RA. 2016. Models of human core transcriptional regulatory circuitries. Genome Res

538 26: 385–396.

539 Schier A, Talbot WS. 2005. Molecular genetics of axis formation in zebrafish. Annu Rev

540 Genet 39: 561–613.

541 Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth

542 J, Hillier LW, Richards S, et al. 2005. Evolutionarily conserved elements in

543 vertebrate, insect, worm, and yeast genomes. Genome Res 15: 1034–1050.

544 Siepel A, Haussler D. 2005. Phylogenetic hidden Markov models. In Statistical methods in

545 molecular evolution (ed. R. Nielsen), pp. 325–351. Springer, New York.

546 Siersbæk R, Rabiee A, Nielsen R, Sidoli S, Traynor S, Loft A, La Cour Poulsen L,

547 Rogowska-Wrzesinska A, Jensen ON, Mandrup S. 2014. Transcription Factor

548 Cooperativity in Early Adipogenic Hotspots and Super-Enhancers. Cell Rep 7: 1443–

549 1455.

550 Smith E, Shilatifard A. 2014. Enhancer biology and enhanceropathies. Nat Struct Mol Biol

551 21: 210–219.

552 Sutton J, Costa R, Klug M, Field L, Xu D, Largaespada DA, Fletcher CF, Jenkins NA,

553 Copeland NG, Klemsz M, et al. 1996. Genesis, a winged helix transcriptional

554 repressor with expression restricted to embryonic stem cells. J Biol Chem 271:

555 23126–23133.

556 Taher L, McGaughey DM, Maragh S, Aneas I, Bessling SL, Miller W, Nobrega MA,

557 McCallion AS, Ovcharenko I. 2011. Genome-wide identification of conserved

558 regulatory function in diverged sequences. Genome Res 21: 1139–1149.

559 Thakurela S, Sahu SK, Garding A, Tiwari VK. 2015. Dynamics and function of distal

560 regulatory elements during neurogenesis and neuroplasticity. Genome Res 25:

561 1309–1324.

562 Thomas-Chollier M, Herrmann C, Defrance M, Sand O, Thieffry D, van Helden J. 2012a.

563 RSAT peak-motifs: motif analysis in full-size ChIP-seq datasets. Nucleic Acids Res

564 40: e31. doi: 10.1093/nar/gkr1104.

565 Thomas-Chollier M, Darbo E, Herrmann C, Defrance M, Thieffry D, van Helden J. 2012b. A

566 complete workflow for the analysis of full-size ChIP-seq (and similar) data sets using

567 peak-motifs. Nat Protoc 7: 1551–1568.

568 Vacaru AM, Di Narzo AF, Howarth DL, Tsedensodnom O, Imrie D, Cinaroglu A, Amin S, Hao

569 K, Sadler KC. 2014. Molecularly defined unfolded protein response subclasses have

570 distinct correlations with fatty liver disease in zebrafish. Dis Model Mech 7: 823–835.

571 Vahedi G, Kanno Y, Furumoto Y, Jiang K, Parker SCJ, Erdos MR, Davis SR, Roychoudhuri

572 R, Restifo NP, Gadina M, et al. 2015. Super-enhancers delineate disease-associated

573 regulatory nodes in T cells. Nature 520: 558–562.

574 Vella P, Barozzi I, Cuomo A, Bonaldi T, Pasini D. 2012. Yin Yang 1 extends the Myc-related

575 transcription factors network in embryonic stem cells. Nucleic Acids Res 40: 3403–

576 3418.

577 Vermunt MW, Reinink P, Korving J, de Bruijn E, Creyghton PM, Basak O, Geeven G,

578 Toonen PW, Lansu N, Meunier C, et al. 2014. Large-scale identification of

579 coregulated enhancer networks in the adult human brain. Cell Rep 9: 767–779.

580 Villar D, Berthelot C, Aldridge S, Rayner TF, Lukk M, Pignatelli M, Park TJ, Deaville R,

581 Erichsen JT, Jasinska AJ, et al. 2015. Enhancer Evolution across 20 Mammalian

582 Species. Cell 160: 554–566.

583 Visel A, Blow MJ, Li Z, Zhang T, Akiyama JA, Holt A, Plajzer-Frick I, Shoukry M, Wright C,

584 Chen F, et al. 2009. ChIP-seq accurately predicts tissue-specific activity of

585 enhancers. Nature 457: 854–858.

586 Wei Y, Zhang S, Shang S, Zhang B, Li S, Wang X, Wang F, Su J, Wu Q, Liu H, et al. 2016.

587 SEA: a super-enhancer archive. Nucleic Acids Res 44: D172–D179.

588 White R, Rose K, Zon L. 2013. Zebrafish cancer: the state of the art and the path forward.

589 Nat Rev Cancer 13: 624–636.

590 Whyte WA, Orlando DA, Hnisz D, Abraham BJ, Lin CY, Kagey MH, Rahl PB, Lee TI, Young

591 RA. 2013. Master Transcription Factors and Mediator Establish Super-Enhancers at

592 Key Cell Identity Genes. Cell 153: 307–319.

593 Xu C, Fan ZP, Müller P, Fogley R, DiBiase A, Trompouki E, Unternaehrer J, Xiong F,

594 Torregroza I, Evans T, et al. 2012. Nanog-like Regulates Endoderm Formation

595 through the Mxtx2-Nodal Pathway. Dev Cell 22: 625–638.

596 Yue F, Cheng Y, Breschi A, Vierstra J, Wu W, Ryba T, Sandstrom R, Ma Z, Davis C, Pope

597 BD, et al. 2014. A comparative encyclopedia of DNA elements in the mouse genome.

598 Nature 515: 355–364.

599 Zang C, Schones DE, Zeng C, Cui K, Zhao K, Peng W. 2009. A clustering approach for

600 identification of enriched domains from histone modification ChIP-Seq data.

601 Bioinformatics 25: 1952–1958.

[Figure 1 image (Pérez-Rico203679_Fig1). (A) Workflow: H3K27ac ChIP-seq of pluripotent state, brain, heart, intestine and testis samples; mapping to reference genome; peak calling; super-enhancer identification (typical enhancers versus SEs); intra- and interspecies comparisons. Divergence times shown on the species tree: 450 My and 80 My. (B) Enhancers ranked by H3K27ac (x-axis) against H3K27ac density (y-axis) for zebrafish, mouse and human; values listed in that panel order are cutoffs of 5720.4492, 9832.6656 and 6386.5332, and 664, 993 and 1,323 SEs.]

[Figure 2 image (Pérez-Rico203679_Fig2). (A) Proportion of genes covered (density) by typical enhancers and SEs from −100 kb to +100 kb around the TSS in zebrafish, mouse and human. (B) Proportion of gene bodies with overlap for typical enhancers, SEs and control regions across the analyzed datasets; NS marks non-significant comparisons. (C) Percentage of base pairs of typical enhancers and SEs falling in 2 kb upstream, UTR, exon, intron and intergenic regions in zebrafish, mouse and human.]

[Figure 3 image (Pérez-Rico203679_Fig3). (A) H3K27ac tag profiles in pluripotent state and adult brain at nanog and neurod2 (zebrafish), Esrrb and the Dlx1/Dlx2 locus (mouse), and ZIC3 and SLC6A1 (human). (B) Chow-Ruskey diagrams for typical enhancers and SEs across pluripotent state, brain, heart, intestine and testis, with tables of the percentages of specific and non-specific regions.]

[Figure 4 image (Pérez-Rico203679_Fig4). (A) Conservation by PhastCons (y-axis) from −3 kb through Start and End to +3 kb (x-axis) for typical enhancers and SEs in zebrafish, mouse and human. (B) Venn diagrams of orthologous genes associated with typical enhancers and SEs in zebrafish, mouse and human, with tables of percentage intersection (Int.) and difference (Dif.). (C) H3K27ac tag profiles at the zebrafish zic5/zic2a, mouse Zic5/Zic2 and human ZIC5/ZIC2 loci. (D) Box plots of conservation by PhastCons for SEs with conserved and non-conserved association in brain, olfactory bulb, cerebellum, angular gyrus, anterior caudate, cingulate gyrus, hippocampus middle, inferior temporal lobe and middle frontal lobe datasets.]

[Figure 5 image (Pérez-Rico203679_Fig5). (A) Venn diagrams of ATAC-seq peaks and Nanog ChIP-seq peaks: genome-wide comparison and comparison within SEs. (B) De-novo found motifs oligos_7nt_m2 and oligos_6nt_m3 alongside JASPAR matrix models MA0143.3 SOX2, MA0077.1 SOX9 and MA0592.2 ESRRA, with site counts and Ncorr values. (C) Enriched molecular function terms for oligos_7nt_m2 and oligos_6nt_m3 and enriched Wiki pathways, colored by log10 (FDR q-value).]

[Figure 6 image (Pérez-Rico203679_Fig6). (A) SE-irf2bpl locus with tested regions A to K; tracks: H3K27ac, ATAC-seq, Nanog, TFBSs and Vertebrate Cons. (B) GFP expression for the D region and K region; op, olfactory placode; scale bars, 200 µm. (C) SE-zic2a locus with tested regions L to U; same tracks as in (A). (D) GFP expression for the P, Q, S and T regions; scale bars, 500 µm.]

Comparative analyses of super-enhancers reveal conserved elements in vertebrate genomes

Yuvia A. Pérez Rico, Valentina Boeva, Allison C. Mallory, et al.

Genome Res. Published online December 13, 2016. doi:10.1101/gr.203679.115

Supplemental Material: http://genome.cshlp.org/content/suppl/2017/01/17/gr.203679.115.DC1

Accepted Manuscript: Peer-reviewed and accepted for publication but not copyedited or typeset; accepted manuscript is likely to differ from the final, published version.

Creative Commons License: This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
