LETTER doi:10.1038/nature11082

Topological domains in mammalian genomes identified by analysis of chromatin interactions

Jesse R. Dixon1,2,3, Siddarth Selvaraj1,4, Feng Yue1, Audrey Kim1, Yan Li1, Yin Shen1, Ming Hu5, Jun S. Liu5 & Bing Ren1,6

The spatial organization of the genome is intimately linked to its high quality and accurately capture the higher order chromatin struc- biological function, yet our understanding of higher order genomic tures in mammalian cells. structure is coarse, fragmented and incomplete. In the nucleus of We next visualized two-dimensional interaction matrices using a eukaryotic cells, interphase occupy distinct chro- variety of bin sizes to identify interaction patterns revealed as a result of mosome territories, and numerous models have been proposed our high sequencing depth (Supplementary Fig. 7). We noticed that at for how chromosomes fold within territories1. These bin sizes less than 100 kilobases (kb), highly self-interacting regions models, however, provide only few mechanistic details about the begin to emerge (Fig. 1a and Supplementary Fig. 7, seen as ‘triangles’ relationship between higher order chromatin structure and genome on the heat map). These regions, which we term topological domains, function. Recent advances in genomic technologies have led to rapid are bounded by narrow segments where the chromatin interactions advances in the study of three-dimensional genome organiza- appear to end abruptly. We hypothesized that these abrupt transitions tion. In particular, Hi-C has been introduced as a method for iden- may represent boundary regions in the genome that separate topo- tifying higher order chromatin interactions genome wide2. Here we logical domains. investigate the three-dimensional organization of the human and To identify systematically all such topological domains in the mouse genomes in embryonic stem cells and terminally differen- genome, we devised a simple statistic termed the directionality index tiated cell types at unprecedented resolution. We identify large, to quantify the degree of upstream or downstream interaction bias for megabase-sized local chromatin interaction domains, which we a genomic region, which varies considerably at the periphery of the term ‘topological domains’, as a pervasive structural feature of the topological domains (Fig. 1b; see Supplementary Methods for details). genome organization. These domains correlate with regions of the The directionality index was reproducible (Supplementary Table 2) genome that constrain the spread of heterochromatin. The domains and pervasive, with 52% of the genome having a directionality are stable across different cell types and highly conserved across index that was not expected by random chance (Fig. 1c, false discovery species, indicating that topological domains are an inherent rate 5 1%). We then used a Hidden Markov model (HMM) based on property of mammalian genomes. Finally, we find that the the directionality index to identify biased ‘states’ and therefore infer boundaries of topological domains are enriched for the insulator the locations of topological domains in the genome (Fig. 1a; see binding CTCF, housekeeping , transfer RNAs and Supplementary Methods for details). The domains defined by HMM short interspersed element (SINE) retrotransposons, indicating were reproducible between replicates (Supplementary Fig. 8). that these factors may have a role in establishing the topological Therefore, we combined the data from the HindIII replicates and domain structure of the genome. identified 2,200 topological domains in mouse ES cells with a median To study chromatin structure in mammalian cells, we determined size of 880 kb that occupy ,91% of the genome (Supplementary genome-wide chromatin interaction frequencies by performing the Fig. 9). As expected, the frequency of intra-domain interactions is Hi-C experiment2 in mouse embryonic stem (ES) cells, human ES cells, higher than inter-domain interactions (Fig. 1d, e). Similarly, FISH and human IMR90 fibroblasts. Together with Hi-C data for the mouse probes6 in the same topological domain (Fig. 1f) are closer in nuclear cortex generated in a separate study (Y. Shen et al., manuscript in space than probes in different topological domains (Fig. 1g), despite preparation), we analysed over 1.7-billion read pairs of Hi-C data similar genomic distances between probe pairs (Fig. 1h, i). These find- corresponding to pluripotent and differentiated cells (Supplemen- ings are best explained by a model of the organization of genomic DNA tary Table 1). We normalized the Hi-C interactions for biases in the into spatial modules linked by short chromatin segments. We define data (Supplementary Figs 1 and 2)3. To validate the quality of our Hi-C the genomic regions between topological domains as either ‘topo- data, we compared the data with previous chromosome conformation logical boundary regions’ or ‘unorganized chromatin’, depending on capture (3C), chromosome conformation capture carbon copy (5C), their sizes (Supplementary Fig. 9). and fluorescence in situ hybridization (FISH) results4–6. Our IMR90 We next investigated the relationship between the topological Hi-C data show a high degree of similarity when compared to a previ- domains and the transcriptional control process. The Hoxa locus is ously generated 5C data set from lung fibroblasts (Supplementary Fig. 4). separated into two compartments by an experimentally validated insu- In addition, our mouse ES cell Hi-C data correctly recovered a previ- lator4,7,8, which we observed corresponds to a topological domain ously described cell-type-specific interaction at the Phc1 gene5 boundary in both mouse (Fig. 1a) and human (Fig. 2a). Therefore, (Supplementary Fig. 5). Furthermore, the Hi-C interaction frequencies we hypothesized that the boundaries of the topological domains might in mouse ES cells are well-correlated with the mean spatial distance correspond to insulator or barrier elements. separating six loci as measured by two-dimensional FISH6 Many known insulator or barrier elements are bound by the zinc- (Supplementary Fig. 6), demonstrating that the normalized Hi-C data finger-containing protein CTCF (refs 9–11). We see a strong enrich- can accurately reproduce the expected nuclear distance using an inde- ment of CTCF at the topological boundary regions (Fig. 2b and pendent method. These results demonstrate that our Hi-C data are of Supplementary Fig. 10), indicating that topological boundary regions

1Ludwig Institute for Cancer Research, 9500 Gilman Drive, La Jolla, California 92093, USA. 2Medical Scientist Training Program, University of California, San Diego, La Jolla, California 92093, USA. 3Biomedical Sciences Graduate Program, University of California, San Diego, La Jolla, California 92093, USA. 4Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, California 92093, USA. 5Department of Statistics, Harvard University, 1 Oxford Street, Cambridge, Massachusetts 02138, USA. 6University of California, San Diego School of Medicine, Department of Cellular and Molecular Medicine, Institute of Genomic Medicine, UCSD Moores Cancer Center, 9500 Gilman Drive, La Jolla, California 92093, USA.

00 MONTH 2012 | VOL 000 | NATURE | 1 ©2012 Macmillan Publishers Limited. All rights reserved RESEARCH LETTER

a 100 CS5 insulator a 60 counts

interacting 0 Normalized Chr6: 50000000 51000000 52000000 53000000 54000000 Domains - 50 Normalized

DI interacting counts 0 –50_ Chr7: 27000000 27500000 HMM state Domains 5 - _ CTCF 30 0.2 _ - H3K4me3 5 - DI 0.3_ –30 _ RNA PolII 5 - HOXA1 EVX1 HIBADH 0.5_ SKAP2 BC031342 HOXA7 NS5ATP1 JAZF1 p300 3 - HOXA2 HOXA9 BC034444 TSL-A 0.2_ HOXA3 HOXA10 TAX1BP1 H3K4me1 3 - HOXA4 HOXA11 0.2_ HOXA5 HOXA11AS Stk31 Npy Mpp6 Cycs Cbx3 Skap2 Evx1 Jazf1 Creb5 Chn2 HOXA6 HOXA13 2410003K15Rik Dfna5 Npvf Nfe2l3 Hoxa1 Hibadh Tril Igf2bp3 Osbpl3 Hnrnpa2b1 Hoxa2 Hoxa9 Tax1bp1 Cpvl b CTCF c All CTCF sites Tra2a 5430402O13Rik Mir148a Hoxa3 9430076C15Rik 0.3 Ccdc126 C530044C16Rik Snx10 Hoxa4 Mir196b 31,968 D330028D13Rik Hoxa5 Hoxa10 Hoxa6 Hoxa11 Mira Hoxa13 0.2 Boundary Hoxa7 5730457N03Rik associated 4,846 Interactions downstream c 0.1 b 1.0 DI (actual) Non-boundary B e 0.8 DI (random) associated 0.6 False positive 0 27,122 –500 kb Boundary +500 kb 0.4 rate 1% CTCF binding sites per 10 kb P -value = 1.65 × 10

0.2 Distance of 80-kb A 1 – Empirical d hESC IMR90 mESC Cortex Interactions upstream cumulative density 0 f 0 10 20 30 40 50 Boundary Putative boundary DI (absolute value) separates two d LAD domains 40 Intra-domain Inter-domain 30 –126 Boundary AB Degree of bias separates LAD and Biased 20

Normalized interacting counts non-LAD domain

downstream 100 200 300 400 500 600 700 10 0 interaction counts Biased Median normalized 0 0 0.5 1.0 1.5 2.0 upstream Genomic distance (Mb) Intra Inter Boundary separates two ‘Intra-domain’ ‘Inter-domain’ non-LAD domains 1,754 shared boundaries 1,754 shared boundaries 1,159 shared f Chr11: 96200000 96300000 g Chr2: 74500000 74600000 FISH probes: FISH probes: Domain 1 Domain 2 Domain HMM state HMM state Boundary Boundary Boundary Boundary Boundary 50 - 50 - ± 500 kb ± 500 kb ± 500 kb ± 500 kb ± 500 kb mESC DI mESC DI log2 (H3K9me3/input) log2 (H3K9me3/input) log2 (Dam–laminB1/Dam) –50_ –50_ Gm53 Hoxb3 Gm11529 Lnp Evx2 Hoxd1 e 03.0 03.0 3.0 –3.0 Hoxb13 Hoxb8 Hoxb1 Skap1 Hoxd13 Mtx2 50 Mir196a-1 Hoxb2 Hoxd12 Hoxd3 Hoxb9 Mir10a Hoxd11 Hoxd4 Hoxb7 Hoxb4 Hoxd10 Mir10b Hoxb6 Hoxd9 Hoxb5 Hoxd8 Normalized hiGenomic distance Measured distance 0 interacting counts 2 Mb 100 0.12 Chr2: hg18 138000000 139000000 140000000

) 0.10

80 2 hESC domain Intra-domain _ 0.08 30 Hoxb cluster 60 hESC DI Inter-domain 0.06 –30_ 40 IMR90 domain _ Hoxd cluster 0.04 30 distance ( d 20 0.02 IMR90 DI

Squared interprobe Squared –30_ between FISH probes

between FISH probes _ Genomic distance (kb) 0 0.00 16 hESC H3K9me3 0 _ Figure 1 | Topological domains in the mouse ES cell genome. a, Normalized _ 16 Hi-C interaction frequencies displayed as a two-dimensional heat map IMR90 H3K9me3 0 _ overlayed on ChIP-seq data (from Y. Shen et al., manuscript in preparation), THSD7B SPOPL HNMT NXPH2 directionality index (DI), HMM bias state calls, and domains. For both LOC647012 directionality index and HMM state calls, downstream bias (red) and upstream bias (green) are indicated. b, Schematic illustrating topological domains and Figure 2 | Topological boundaries demonstrate classical insulator or resulting directional bias. c, Distribution of the directionality index (absolute barrier element features. a, Two-dimensional heat map surrounding the Hoxa value, in blue) compared to random (red). d, Mean interaction frequencies at all locus and CS5 insulator in IMR90 cells. b, Enrichment of CTCF at boundary genomic distances between 40 kb to 2 Mb. Above 40 kb, the intra- versus inter- regions. c, The portion of CTCF binding sites that are considered ‘associated’ domain interaction frequencies are significantly different (P , 0.005, Wilcoxon with a boundary (within 620-kb window is used as the expected uncertainty test). e, Box plot of all interaction frequencies at 80-kb distance. Intra-domain due to 40-kb binning). d, Heat maps of H3K9me3 at boundary sites in human interactions are enriched for high-frequency interactions. f–i, Diagram of intra- and mouse. e, UCSC Genome Browser shot showing heterochromatin domain (f) and inter-domain FISH probes (g) and the genomic distance spreading in the human ES cells (hESC) and IMR90 cells. The two-dimensional between pairs (h). i, Bar chart of the squared inter-probe distance (from ref. 6) heat map shows the interaction frequency in human ES cells. f,Heatmapof FISH probe pairs. mESC, mouse ES cell. Error bars indicate standard error LADs (from ref. 14) surrounding the boundary regions. Scale is the log2 ratio of (n 5 100 for each probe pair). DNA adenosine methylation (Dam)–lamin B1 fusion over Dam alone (Dam– laminB1/Dam). share this feature of classical insulators. A classical boundary element Fig. 2d are present in both pluripotent cells and their differentiated is also known to stop the spread of heterochromatin. Therefore, we progeny, the topological domains and boundaries appear to pre-mark examined the distribution of the heterochromatin mark H3K9me3 in the end points of heterochromatic spreading. Therefore, the domains humans and mice in relation to the topological domains12,13. Indeed, do not seem to be a consequence of the formation of heterochromatin. we observe a clear segregation of H3K9me3 at the boundary regions Taken together, the above observations strongly suggest that the topo- that occurs predominately in differentiated cells (Fig. 2d, e and logical domain boundaries correlate with regions of the genome dis- Supplementary Fig. 11). As the boundaries that we analysed in playing classical insulator and barrier element activity, thus revealing a

2 | NATURE | VOL 000 | 00 MONTH 2012 ©2012 Macmillan Publishers Limited. All rights reserved LETTER RESEARCH potential link between the topological domains and transcriptional a mESC only Cortex only hESC only IMR90 only control in the mammalian genome. 776 169 678 504 We compared the topological domains with previously described domain-like organizations of the genome, specifically with the A and B Overlap Overlap compartments described by ref. 2, with lamina-associated domains 893 1,289 (LADs)10,14, replication time zones15,16, and large organized chromatin 17 K9 modification (LOCK) domains . In all cases, we can see that topo- Cortex-enriched logical domains are related to, but independent from, each of these b dynamic interacting region 40 previously described domain-like structures (Supplementary Figs 12– 40 counts

15). Notably, a subset of the domain boundaries we identify appear to counts interaction Normalized 0

interaction 0 Normalized mark the transition between either LAD and non-LAD regions of the 400 kb 400 kb Chr12 51000000 Chr12 51000000 - - H3K4me3 5 5 genome (Fig. 2f and Supplementary Fig. 12), the A and B compart- 0.3_ 0.3_ - - RNA Pol II 5 5 ments (Supplementary Fig. 13, 14), and early and late replicating chro- 0.5_ 0.5_ - - CTCF 5 5 matin (Supplementary Fig. 14). Lastly, we can also confirm the 0.2_ 0.2_ H3K4me1 3 - 3 - previously reported similarities between the A and B compartments 0.2_ 0.2_ Foxg1 Foxg1 and early and late replication time zone (Supplementary Fig. 16)16. cd15 Genes at Nanog e We next compared the locations of topological boundaries iden- mESC-specific mESC-enriched Cortex-enriched interactions Phc1 interactions interactions tified in both replicates of mouse ES cells and cortex, or between both 12 replicates of human ES cells and IMR90 cells. In both human and Grik2 9 (glutamate 1,272 (96.6%) mouse, most of the boundary regions are shared between cell types Genes at receptor) 8,616 (98.2%) (Fig. 3a and Supplementary Fig. 17a), suggesting that the overall 6 cortex-specific interactions Snca domain structure between cell types is largely unchanged. At the 3 (alpha- boundaries called in only one cell type, we noticed that trend of Foxg1 RNA-seq (r.p.k.m.) Synuclein) 43 (3.4%) 154 (1.8%) 0 +3 –3 Intra-domain upstream and downstream bias in the directionality index is still readily mESC r.p.k.m. mESCCortex log Inter-domain apparent and highly reproducible between replicates (Supplementary 2 ( cortex r.p.k.m.) Fig. 17b, c). We cannot determine if the differences in domain calls f mESC in humans hESC hESC in mouse mESC between cell types is due to noise in the data or to biological phenomena, 1,944 total 3,030 total 2,792 total 2,117 total such as a change in the strength of the boundary region between cell Overlap Overlap types18. Regardless, these results indicate that the domain boundaries 1,476 1,502 are largely invariant between cell types. Lastly, only a small fraction of the boundaries show clear differences between two cell types, suggesting Mouse to human Human to mouse P < 2.2 × 10–16 P < 2.2 × 10–16 that a relatively rare subset of boundaries may actually differ between g cell types (Supplementary Fig. 18). 80 Mouse ESC The stability of the domains between cell types is surprising given counts interaction previous evidence showing cell-type-specific chromatin interactions Normalized 0 and conformations5,7. To reconcile these results, we identified cell- 96000000 type-specific chromatin interactions between mouse ES cell and mouse Chr10: 95000000 - cortex. We identified 9,888 dynamic interacting regions in the mouse 50 –50 _ genome based on 20-kb binning using a binomial test with an empirical Ccdc41 2310039L15Rik 5730420D15Rik Btg1 4932415G12Rik Socs2 Eea1 4930556N09Rik false discovery rate of ,1% based on random permutation of the Plxnc1 Mrpl42 Mir3058 Cradd Ube2n replicate data. These dynamic interacting regions are enriched for Nudt4 h differentially expressed genes (Fig. 3b–d, Supplementary Fig. 19 and 60 Human ESC Supplementary Table 5). In fact, 20% of all genes that undergo a four- counts

fold change in expression are found at dynamic interacting loci. interaction Normalized 0 This is probably an underestimate, because by binning the genome at 20 kb, any dynamic regulatory interaction less than 20 kb will be Chr12: 93000000 92000000 91000000 missed. Lastly, .96% of dynamic interacting regions occur in the same 50 _ –50 _ domain (Fig. 3e). Therefore, we favour a model where the domain PLXNC1 SOCS2 NUDT4 C12orf74 LOC256021 CCDC41 CRADD LOC643339 BTG1 organization is stable between cell types, but the regions within each LOC144486 LOC144481 PLEKHG7 CLLU1 domain may be dynamic, potentially taking part in cell-type-specific MRPL42 UBE2N EEA1 CLLU1OS regulatory events. Figure 3 | Boundaries are shared across cell types and conserved in The stability of the domains between cell types prompted us to evolution. a, Overlap of boundaries between cell types. b, Genome browser investigate if the domain structure is also conserved across evolution. shot of a cortex enriched dynamic interacting region that overlaps with the To address this, we compared the domain boundaries between mouse Foxg1 gene. c, Foxg1 expression in reads per kilobase per million reads ES cells and human ES cells using the UCSC liftover tool. Most of the sequenced (r.p.k.m.) in mouse ES cells and cortex as measured by RNA-seq. boundaries appear to be shared across evolution (53.8% of human d, Heat map of the gene expression ratio between mouse ES cell and cortex of genes at dynamic interactions. e, Pie chart of inter- and intra-domain dynamic boundaries are boundaries in mouse and 75.9% of mouse boundaries interactions. f, Overlap of boundaries between syntenic mouse and human are boundaries in humans, compared to 21.0% and 29.0% at random, 216 216 sequences (P , 2.2 3 10 compared to random, Fisher’s exact test). P value ,2.2 3 10 , Fisher’s exact test; Fig. 3f). The syntenic regions g, h, Genome browser shots showing domain structure over a syntenic region in in mouse and human in particular share a high degree of similarity in the mouse (g) and human (h) ES cells. Note: the region in humans has been their higher order chromatin structure (Fig. 3g, h), indicating that inverted from its normal UCSC coordinates for proper display purposes. there is conservation of genomic structure beyond the primary sequence of DNA. binding sites are located within boundary regions (Fig. 2c). Thus, We explored what factors may contribute to the formation of topo- CTCF binding alone is insufficient to demarcate domain boundaries. logical boundary regions in the genome. Although most topological We reasoned that additional factors might be associated with topo- boundaries are enriched for the binding of CTCF, only 15% of CTCF logical boundary regions. By examining the enrichment of a variety of

00 MONTH 2012 | VOL 000 | NATURE | 3 ©2012 Macmillan Publishers Limited. All rights reserved RESEARCH LETTER histone modifications, chromatin binding and transcription Finally, we analysed the enrichment of repeat classes around boundary factors around topological boundary regions in mouse ES cells, we elements. We observed that Alu/B1 and B2 SINE retrotransposons in observed that factors associated with active promoters and gene bodies mouse and Alu SINE elements in humans are enriched at boundary are enriched at boundaries in both mouse and humans (Fig. 4a and regions (Fig. 4a and Supplementary Figs 24 and 25). In light of recent Supplementary Figs 20–23)19,20. In contrast, non-promoter-associated reports indicating that a SINE B2 element functions as a boundary in marks, such as H3K4me1 (associated with enhancers) and H3K9me3, mice24, and SINE element retrotransposition may alter CTCF binding were not enriched or were specifically depleted at boundary regions sites during evolution25, we believe that this contributes to a growing (Fig. 4a). Furthermore, transcription start sites (TSS) and global run on body of evidence indicating a role for SINE elements in the organiza- sequencing (GRO-seq)21 signal were also enriched around topological tion of the genome. boundaries (Fig. 4a). We found that housekeeping genes were particu- In summary, we show that the mammalian chromosomes are seg- larly strongly enriched near topological boundary regions (Fig. 4b–d; mented into megabase-sized topological domains, consistent with see Supplementary Table 7 for complete GO terms enrichment). some previous models of the higher order chromatin structure1,26,27. Additionally, the tRNA genes, which have the potential to function Such spatial organization seems to be a general property of the genome: as boundary elements22,23, are also enriched at boundaries (P value it is pervasive throughout the genome, stable across different cell types ,0.05, Fisher’s exact test; Fig. 4b). These results suggest that high levels and highly conserved between mice and humans. of transcription activity may also contribute to boundary formation. In We have identified multiple factors that are associated with the support of this, we can see examples of dynamic changes in H3K4me3 boundary regions separating topological domains, including the insu- at or near some cell-type-specific boundaries that are cell-type specific lator binding factor CTCF, housekeeping genes and SINE elements. (Supplementary Fig. 24). Indeed, boundaries associated with both The association of housekeeping genes with boundary regions extends CTCF and a housekeeping gene account for nearly one-third of all topo- previous studies in yeast and insects and suggests that non-CTCF logical boundaries in the genome (Fig. 4e and Supplementary Fig. 24). factors may also be involved in insulator/barrier functions in mam- malian cells28. The topological domains we identified are well conserved between a H3K4me3 H3K36me3 H3K4me1 H3K9me3 ) ) ) ) –1 –1 –1 –2 6 –1 mice and humans. This indicates that the sequence elements and

2 0 mechanisms that are responsible for establishing higher order struc- –4 4 tures in the genome may be relatively ancient in evolution. A similar –2 1 –6 2 partitioning of the genome into physical domains has also been

(mark/input) ( × 10 (mark/input) ( × 10 29 2 2 observed in Drosophila embryos and in high-resolution studies of

0 log –8 log Peaks per 10 kb ( × Peaks per 10 kb ( × 0 –4

) –500 kb Boundary +500 kb –500 kb Boundary +500 kb –500 kb Boundary +500 kb –500 kb Boundary +500 kb the X-inactivation centre in mice (termed topologically associated –6

) 30 ) 10 TSS GRO-seq –1 IMR90 H3K4me3 mESC domains or TADs) , indicating that topological domains may be a × 4 –1 3 12 1.6 SINE element 10

( × fundamental organizing principle of metazoan genomes. 10 2 2 0.8 8 total TSS) ( × 1 METHODS SUMMARY 6 Cell culture and Hi-C experiments. J1 mouse ES cells were grown on gamma- Repeats per 10-kb

0 Peaks per 10 kb ( × 0 Median r.p.k.m. 0 4 –500 kb Boundary +500 kb –500 kb Boundary +500 kb –500 kb Boundary +500 kb –500 kb Boundary +500 kb irradiated mouse embryonic fibroblasts cells under standard conditions (85% high TSS/(10 kb glucose DMEM, 15% HyClone FBS, 0.1 mM non-essential amino acids, 0.1 mM b c Biological Cellular Molecular b-mercaptoethanol, 1 mM glutamine, LIF 500 U ml21,13 Gibco penicillin/ 2,000 process component function P < 2.2 × 10–16 Observed 10–8–8 Expected (random) 10 streptomycin). Before collecting for Hi-C, J1 mouse ES cells were passaged onto 1,500 feeder free 0.2% gelatin-coated plates for at least two passages to rid the culture of 10–6–6 10 feeder cells. H1 human ES cells and IMR90 fibroblasts were grown as previously 13 1,000 P < 2.2 × 10–16 10–4–4 described . Collecting the cells for Hi-C was performed as previously described, 10 with the only modification being that the adherent cell cultures were dissociated 10–2–2 500 10 with trypsin before fixation. P < 0.05 Sequencing and mapping of data. Hi-C analysis and paired-end libraries were Benjamini corrected P -value Benjamini corrected Boundaries with mark <20 kb 0 2 0 prepared as previously described and sequenced on the Illumina Hi-Seq2000

Cell cycle platform. Reads were mapped to reference human (hg18) or mouse genomes CTCF tRNA Translation Ribosome RNA binding genes Mitochondrion (mm9), and non-mapping reads and PCR duplicates were removed. Two- RNA processing genes 2 Housekeeping dimensional heat maps were generated as previously described .

RibonucleoproteinMembrane complex-enclosed lumen Data analysis. For detailed descriptions of the data analysis, including descrip-

Non-membrane bound organelle Structural constituent of ribosome tions of the directionality index, hidden Markov models, dynamic interactions e identification, and boundary overlap between cells and across species, see d Housekeeping Boundaries and associated marks

) Tissue specific Supplementary Methods. –6 12 20.2% 27.9% CTCF + Other gene CTCF + Received 26 September 2011; accepted 27 March 2012. other gene 3.3% housekeeping 8 gene Published online 11 April 2012. House- keeping 4 6.0% 27.6% 1. Cremer, T. & Cremer, M. Chromosome territories. Cold Spring Harb. Perspect. Biol. CTCF only 2, a003889 (2010). Nothing 14.9% 2. Lieberman-Aiden, E. et al. Comprehensive mapping of long-range interactions

TSS/(10 kb × total TSS) ( 10 0 –500 kb Boundary +500 kb reveals folding principles of the . Science 326, 289–293 (2009). 3. Yaffe, E. & Tanay, A. Probabilistic modeling of Hi-C contact maps eliminates Figure 4 | Boundary regions are enriched for housekeeping genes. systematic biases to characterize global chromosomal architecture. Nature Genet. a, Chromatin modifications, TSS, GRO-seq and SINE elements surrounding 43, 1059–1065 (2011). boundary regions in mouse ES cells or IMR90 cells. b, Boundaries associated 4. Wang, K. C. et al. A long noncoding RNA maintains active chromatin to coordinate with a CTCF binding site, housekeeping gene, or tRNA gene (purple) compared homeotic gene expression. Nature 472, 120–124 (2011). to expected at random (grey). c, P-value chart. d, Enrichment of 5. Kagey, M. H. et al. Mediator and cohesin connect gene expression and chromatin architecture. Nature 467, 430–435 (2010). housekeeping genes (gold) and tissue-specific genes (blue) as defined by 6. Eskeland, R. et al. Ring1B compacts chromatin structure and represses gene Shannon entropy scores near boundaries normalized for the number of genes expression independent of histone ubiquitination. Mol. Cell 38, 452–464 (2010). in each class (TSS/10 kb/total TSS). e, Percentage of boundaries with a given 7. Noordermeer, D. et al. The dynamic architecture of Hox gene clusters. Science 334, mark within 20 kb of the boundaries. 222–225 (2011).

4 | NATURE | VOL 000 | 00 MONTH 2012 ©2012 Macmillan Publishers Limited. All rights reserved LETTER RESEARCH

8. Kim, Y. J., Cecchini, K. R. & Kim, T. H. Conserved, developmentally regulated 24. Lunyak, V. V. et al. Developmentally regulated activation of a SINE B2 repeat as a mechanism couples chromosomal looping and heterochromatin barrier domain boundary in organogenesis. Science 317, 248–251 (2007). activity at the homeobox gene A locus. Proc. Natl Acad. Sci. USA 108, 7391–7396 25. Schmidt, D. et al. Waves of retrotransposon expansion remodel genome (2011). organization and CTCF binding in multiple mammalian lineages. Cell. 148, 9. Phillips, J. E. & Corces, V. G. CTCF: master weaver of the genome. Cell 137, 335–348 (2012). 1194–1211 (2009). 26. Jhunjhunwala, S. et al. The 3D structure of the immunoglobulin heavy-chain locus: 10. Guelen, L. et al. Domain organization of human chromosomes revealed by implications for long-range genomic interactions. Cell 133, 265–279 (2008). mapping of nuclear lamina interactions. Nature 453, 948–951 (2008). 27. Capelson, M. & Corces, V. G. Boundary elements and nuclear organization. Biol. 11. Handoko, L. et al. CTCF-mediated functional chromatin interactome in pluripotent Cell 96, 617–629 (2004). cells. Nature Genet. 43, 630–638 (2011). 28. Amouyal, M. Gene insulation. Part I: natural strategies in yeast and Drosophila. 12. Xie, W. et al. Base-resolution analyses of sequence and parent-of-origin dependent Biochem. Cell Biol. 88, 875–884 (2010). DNA methylation in the mouse genome. Cell 148, 816–831 (2012). 29. Sexton, T. et al. Three-dimensional folding and functional organization principles 13. Hawkins, R. D. et al. Distinct epigenomic landscapes of pluripotent and lineage- of the Drosophila genome. Cell 148, 458–472 (2012). committed human cells. Cell Stem Cell 6, 479–491 (2010). 30. Nora, E. P. et al. Spatial partitioning of the regulatorylandscape of the X-inactivation 14. Peric-Hupkes, D. et al. Molecular maps of the reorganization of genome-nuclear centre. Nature http://dx.doi.org/10.1038/nature11049 (this issue). lamina interactions during differentiation. Mol. Cell 38, 603–613 (2010). Supplementary Information is linked to the online version of the paper at 15. Hiratani, I. et al. Genome-wide dynamics of replication timing revealed by in vitro www.nature.com/nature. models of mouse embryogenesis. Genome Res. 20, 155–169 (2010). 16. Ryba, T. et al. Evolutionarily conserved replication timing profiles predict long- Acknowledgements We are grateful for the comments from and discussions with range chromatin interactions and distinguish closely related cell types. Genome Z. Qin, A. Desai and members of the Ren laboratory during the course of the study. We Res. 20, 761–770 (2010). also thank W. Bickmore and R. Eskeland for sharing the FISH data generated in mouse 17. Wen, B., Wu, H., Shinkai, Y., Irizarry, R. A. & Feinberg, A. P. Large histone H3 lysine 9 ES cells. This work was supported by funding from the Ludwig Institute for Cancer dimethylated chromatin blocks distinguish differentiated from embryonic stem Research, California Institute for Regenerative Medicine (CIRM, RN2-00905-1) (to B.R.) cells. Nature Genet. 41, 246–250 (2009). and NIH (B.R. R01GH003991). J.R.D. is funded by a pre-doctoral training grant from 18. Scott, K. C., Taubman, A. D. & Geyer, P. K. Enhancer blocking by the Drosophila CIRM. Y.S. is supported by a postdoctoral fellowship from the Rett Syndrome Research gypsy insulator depends upon insulator anatomy and enhancer strength. Genetics Foundation. 153, 787–798 (1999). Author Contributions J.R.D. and B.R. designed the studies. J.R.D., A.K., Y.L. and Y.S. 19. Bilodeau, S., Kagey, M. H., Frampton, G. M., Rahl, P. B. & Young, R. A. SetDB1 conducted the Hi-C experiments; J.R.D., S.S. and F.Y. carried out the data analysis; J.S.L. contributes to repression of genes encoding developmental regulators and and M.H. provided insight for analysis; F.Y. built the supporting website; J.R.D. and B.R. maintenance of ES cell state. Genes Dev. 23, 2484–2489 (2009). prepared the manuscript. 20. Marson, A. et al. Connecting microRNA genes to the core transcriptional regulatory circuitry of embryonic stem cells. Cell 134, 521–533 (2008). Author Information All Hi-C data described in this study have been deposited in the 21. Min, I. M. et al. Regulating RNA polymerase pausing and transcription elongation in GEO under accession number GSE35156. We have developed a web-based Java tool to embryonic stem cells. Genes Dev. 25, 742–754 (2011). visualize the high-resolution Hi-C data at a genomic region of interest that is available at 22. Donze, D. & Kamakaka, R. T. RNA polymerase III and RNA polymerase II promoter http://chromosome.sdsc.edu/mouse/hi-c/. Reprints and permissions information is complexes are heterochromatin barriers in Saccharomyces cerevisiae. EMBO J. 20, available at www.nature.com/reprints. The authors declare no competing financial 520–531 (2001). interests. Readers are welcome to comment on the online version of this article at 23. Ebersole, T. et al. tRNA genes protect a reporter gene from epigenetic silencing in www.nature.com/nature. Correspondence and requests for materials should be mouse cells. Cell Cycle 10, 2779–2791 (2011). addressed to B.R. ([email protected]).

00 MONTH 2012 | VOL 000 | NATURE | 5 ©2012 Macmillan Publishers Limited. All rights reserved