Topological Domains in Mammalian Genomes Identified by Analysis of Chromatin Interactions
Total Page:16
File Type:pdf, Size:1020Kb
LETTER doi:10.1038/nature11082 Topological domains in mammalian genomes identified by analysis of chromatin interactions Jesse R. Dixon1,2,3, Siddarth Selvaraj1,4, Feng Yue1, Audrey Kim1, Yan Li1, Yin Shen1, Ming Hu5, Jun S. Liu5 & Bing Ren1,6 The spatial organization of the genome is intimately linked to its high quality and accurately capture the higher order chromatin struc- biological function, yet our understanding of higher order genomic tures in mammalian cells. structure is coarse, fragmented and incomplete. In the nucleus of We next visualized two-dimensional interaction matrices using a eukaryotic cells, interphase chromosomes occupy distinct chro- variety of bin sizes to identify interaction patterns revealed as a result of mosome territories, and numerous models have been proposed our high sequencing depth (Supplementary Fig. 7). We noticed that at for how chromosomes fold within chromosome territories1. These bin sizes less than 100 kilobases (kb), highly self-interacting regions models, however, provide only few mechanistic details about the begin to emerge (Fig. 1a and Supplementary Fig. 7, seen as ‘triangles’ relationship between higher order chromatin structure and genome on the heat map). These regions, which we term topological domains, function. Recent advances in genomic technologies have led to rapid are bounded by narrow segments where the chromatin interactions advances in the study of three-dimensional genome organiza- appear to end abruptly. We hypothesized that these abrupt transitions tion. In particular, Hi-C has been introduced as a method for iden- may represent boundary regions in the genome that separate topo- tifying higher order chromatin interactions genome wide2. Here we logical domains. investigate the three-dimensional organization of the human and To identify systematically all such topological domains in the mouse genomes in embryonic stem cells and terminally differen- genome, we devised a simple statistic termed the directionality index tiated cell types at unprecedented resolution. We identify large, to quantify the degree of upstream or downstream interaction bias for megabase-sized local chromatin interaction domains, which we a genomic region, which varies considerably at the periphery of the term ‘topological domains’, as a pervasive structural feature of the topological domains (Fig. 1b; see Supplementary Methods for details). genome organization. These domains correlate with regions of the The directionality index was reproducible (Supplementary Table 2) genome that constrain the spread of heterochromatin. The domains and pervasive, with 52% of the genome having a directionality are stable across different cell types and highly conserved across index that was not expected by random chance (Fig. 1c, false discovery species, indicating that topological domains are an inherent rate 5 1%). We then used a Hidden Markov model (HMM) based on property of mammalian genomes. Finally, we find that the the directionality index to identify biased ‘states’ and therefore infer boundaries of topological domains are enriched for the insulator the locations of topological domains in the genome (Fig. 1a; see binding protein CTCF, housekeeping genes, transfer RNAs and Supplementary Methods for details). The domains defined by HMM short interspersed element (SINE) retrotransposons, indicating were reproducible between replicates (Supplementary Fig. 8). that these factors may have a role in establishing the topological Therefore, we combined the data from the HindIII replicates and domain structure of the genome. identified 2,200 topological domains in mouse ES cells with a median To study chromatin structure in mammalian cells, we determined size of 880 kb that occupy ,91% of the genome (Supplementary genome-wide chromatin interaction frequencies by performing the Fig. 9). As expected, the frequency of intra-domain interactions is Hi-C experiment2 in mouse embryonic stem (ES) cells, human ES cells, higher than inter-domain interactions (Fig. 1d, e). Similarly, FISH and human IMR90 fibroblasts. Together with Hi-C data for the mouse probes6 in the same topological domain (Fig. 1f) are closer in nuclear cortex generated in a separate study (Y. Shen et al., manuscript in space than probes in different topological domains (Fig. 1g), despite preparation), we analysed over 1.7-billion read pairs of Hi-C data similar genomic distances between probe pairs (Fig. 1h, i). These find- corresponding to pluripotent and differentiated cells (Supplemen- ings are best explained by a model of the organization of genomic DNA tary Table 1). We normalized the Hi-C interactions for biases in the into spatial modules linked by short chromatin segments. We define data (Supplementary Figs 1 and 2)3. To validate the quality of our Hi-C the genomic regions between topological domains as either ‘topo- data, we compared the data with previous chromosome conformation logical boundary regions’ or ‘unorganized chromatin’, depending on capture (3C), chromosome conformation capture carbon copy (5C), their sizes (Supplementary Fig. 9). and fluorescence in situ hybridization (FISH) results4–6. Our IMR90 We next investigated the relationship between the topological Hi-C data show a high degree of similarity when compared to a previ- domains and the transcriptional control process. The Hoxa locus is ously generated 5C data set from lung fibroblasts (Supplementary Fig. 4). separated into two compartments by an experimentally validated insu- In addition, our mouse ES cell Hi-C data correctly recovered a previ- lator4,7,8, which we observed corresponds to a topological domain ously described cell-type-specific interaction at the Phc1 gene5 boundary in both mouse (Fig. 1a) and human (Fig. 2a). Therefore, (Supplementary Fig. 5). Furthermore, the Hi-C interaction frequencies we hypothesized that the boundaries of the topological domains might in mouse ES cells are well-correlated with the mean spatial distance correspond to insulator or barrier elements. separating six loci as measured by two-dimensional FISH6 Many known insulator or barrier elements are bound by the zinc- (Supplementary Fig. 6), demonstrating that the normalized Hi-C data finger-containing protein CTCF (refs 9–11). We see a strong enrich- can accurately reproduce the expected nuclear distance using an inde- ment of CTCF at the topological boundary regions (Fig. 2b and pendent method. These results demonstrate that our Hi-C data are of Supplementary Fig. 10), indicating that topological boundary regions 1Ludwig Institute for Cancer Research, 9500 Gilman Drive, La Jolla, California 92093, USA. 2Medical Scientist Training Program, University of California, San Diego, La Jolla, California 92093, USA. 3Biomedical Sciences Graduate Program, University of California, San Diego, La Jolla, California 92093, USA. 4Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, California 92093, USA. 5Department of Statistics, Harvard University, 1 Oxford Street, Cambridge, Massachusetts 02138, USA. 6University of California, San Diego School of Medicine, Department of Cellular and Molecular Medicine, Institute of Genomic Medicine, UCSD Moores Cancer Center, 9500 Gilman Drive, La Jolla, California 92093, USA. 00 MONTH 2012 | VOL 000 | NATURE | 1 ©2012 Macmillan Publishers Limited. All rights reserved RESEARCH LETTER a 100 CS5 insulator a 60 counts interacting 0 Normalized Chr6: 50000000 51000000 52000000 53000000 54000000 Domains - 50 Normalized DI interacting counts 0 –50_ Chr7: 27000000 27500000 HMM state Domains 5 - _ CTCF 30 0.2 _ - H3K4me3 5 - DI 0.3_ –30 _ RNA PolII 5 - HOXA1 EVX1 HIBADH 0.5_ SKAP2 BC031342 HOXA7 NS5ATP1 JAZF1 p300 3 - HOXA2 HOXA9 BC034444 TSL-A 0.2_ HOXA3 HOXA10 TAX1BP1 H3K4me1 3 - HOXA4 HOXA11 0.2_ HOXA5 HOXA11AS Stk31 Npy Mpp6 Cycs Cbx3 Skap2 Evx1 Jazf1 Creb5 Chn2 HOXA6 HOXA13 2410003K15Rik Dfna5 Npvf Nfe2l3 Hoxa1 Hibadh Tril Igf2bp3 Osbpl3 Hnrnpa2b1 Hoxa2 Hoxa9 Tax1bp1 Cpvl b CTCF c All CTCF sites Tra2a 5430402O13Rik Mir148a Hoxa3 9430076C15Rik 0.3 Ccdc126 C530044C16Rik Snx10 Hoxa4 Mir196b 31,968 D330028D13Rik Hoxa5 Hoxa10 Hoxa6 Hoxa11 Mira Hoxa13 0.2 Boundary Hoxa7 5730457N03Rik associated 4,846 Interactions downstream c 0.1 b 1.0 DI (actual) Non-boundary B e 0.8 DI (random) associated 0.6 False positive 0 27,122 –500 kb Boundary +500 kb 0.4 rate 1% CTCF binding sites per 10 kb P -value = 1.65 0.2 Distance of 80-kb A 1 – Empirical d hESC IMR90 mESC Cortex Interactions upstream cumulative density 0 f 0 10 20 30 40 50 Boundary Putative boundary DI (absolute value) separates two d LAD domains 40 Intra-domain × Inter-domain 10 30 –126 Boundary AB of bias Degree separates LAD and Biased 20 Normalized interacting counts non-LAD domain downstream 100 200 300 400 500 600 700 10 0 interaction counts Biased Median normalized 0 0 0.5 1.0 1.5 2.0 upstream Genomic distance (Mb) Intra Inter Boundary separates two ‘Intra-domain’ ‘Inter-domain’ non-LAD domains 1,754 shared boundaries 1,754 shared boundaries 1,159 shared f Chr11: 96200000 96300000 g Chr2: 74500000 74600000 FISH probes: FISH probes: Domain 1 Domain 2 Domain HMM state HMM state Boundary Boundary Boundary Boundary Boundary 50 - 50 - ± 500 kb ± 500 kb ± 500 kb ± 500 kb ± 500 kb mESC DI mESC DI log2 (H3K9me3/input) log2 (H3K9me3/input) log2 (Dam–laminB1/Dam) –50_ –50_ Gm53 Hoxb3 Gm11529 Lnp Evx2 Hoxd1 e 03.0 03.0 3.0 –3.0 Hoxb13 Hoxb8 Hoxb1 Skap1 Hoxd13 Mtx2 50 Mir196a-1 Hoxb2 Hoxd12 Hoxd3 Hoxb9 Mir10a Hoxd11 Hoxd4 Hoxb7 Hoxb4 Hoxd10 Mir10b Hoxb6 Hoxd9 Hoxb5 Hoxd8 Normalized hiGenomic distance Measured distance 0 interacting counts 2 Mb 100 0.12 Chr2: hg18 138000000 139000000 140000000 ) 0.10 80 2 hESC domain _ Intra-domain d 0.08 30 Hoxb cluster 60 hESC DI Inter-domain 0.06 –30_ 40 IMR90 domain _ Hoxd cluster 0.04 30 distance ( 20 0.02 IMR90 DI Squared interprobe Squared –30_ between FISH probes between FISH probes _ Genomic distance (kb) 0 0.00 16 hESC H3K9me3 0 _ Figure 1 | Topological domains in the mouse ES cell genome. a, Normalized _ 16 Hi-C interaction frequencies displayed as a two-dimensional heat map IMR90 H3K9me3 0 _ overlayed on ChIP-seq data (from Y.