3DAROC18

Summary day #1

David Castillo, François Serra & Marc A. Marti-Renom
Structural Genomics Group (CNAG-CRG)

Data groups

• Experimental observations
• Statistical rules
• Laws of physics

Integrative modeling

• Data collection
• Data interpretation
• Representation
• Modeling
• Sampling
• Model analysis

3D modeling of genomic domains: other methods

[Figure: collage of published 3D genome models. Works shown:]

• Jhunjhunwala (2008) Cell
• Fraser (2009) Genome Biology
• Duan (2010) Nature
• Asbury (2010) BMC Bioinformatics
• Ferraiuolo (2010) Nucleic Acids Research
• Baù (2011) Nature Structural & Molecular Biology
• Kalhor (2011) Nature Biotechnology
• Umbarger (2011) Molecular Cell
• Tjong (2012) Genome Research
• Junier (2012) Nucleic Acids Research
• Hu (2013) PLoS Computational Biology

Let's start!!

Linux commands

Command      Description                         Example             Action
pwd          print working directory             pwd                 path & name of dir. I am in now
ls           list contents of directory          ls                  list contents of current dir.
                                                 ls test/            list contents of the test dir. that hangs from the current working dir.
                                                 ls -lh              vertical list of dir. contents
cd           change directory                    cd                  go to home directory
                                                 cd /home/user/Docs  go to the Docs directory
                                                 cd ..               go to parent directory
mkdir        make directory                      mkdir test          creates directory test/
rmdir        remove directory                    rmdir test          remove test/ if empty
cp           copy file                           cp fileA fileB      copy fileA to fileB
mv           move or rename file or directory    mv a b              change name from a to b
                                                 mv a ..             move a to parent directory
more         see file contents                   more a.txt          see contents of a.txt
gedit        simple text editor                  gedit a.txt         edit a.txt
firefox      a web and directory browser         firefox a.html or   use web browser to view file
                                                 firefox a.jpg
man or info  information on a command            info ls             manual page for the 'ls' command

Python definitions
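The file-handling commands in the Linux table have close counterparts in Python's standard library, which can be handy once the tutorials move from the shell to scripts. The sketch below is only an illustration: the function name `demo_file_commands` and the file names are made up, and everything runs inside a temporary directory so nothing in the working directory is touched.

```python
import os
import shutil
import tempfile


def demo_file_commands():
    """Replay a few shell commands from the table with os/shutil calls."""
    start = os.getcwd()                       # pwd
    workdir = tempfile.mkdtemp()              # scratch area (hypothetical)
    try:
        os.chdir(workdir)                     # cd <workdir>
        os.mkdir("test")                      # mkdir test
        with open("fileA", "w") as handle:    # create a small file to play with
            handle.write("hello\n")
        shutil.copy("fileA", "fileB")         # cp fileA fileB
        os.rename("fileB", "b")               # mv fileB b   (rename)
        shutil.move("b", "test")              # mv b test/   (move into directory)
        listing = sorted(os.listdir("."))     # ls
        inside = sorted(os.listdir("test"))   # ls test/
        os.remove(os.path.join("test", "b"))  # empty the directory first
        os.rmdir("test")                      # rmdir test (only works if empty)
        return listing, inside
    finally:
        os.chdir(start)                       # go back to where we started
        shutil.rmtree(workdir, ignore_errors=True)


listing, inside = demo_file_commands()
print(listing)
print(inside)
```

As with `rmdir` in the shell, `os.rmdir` refuses to delete a directory that still contains files, which is why the file is removed first.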

• variables

    a = 1
    b = 3.14
    c = 'charles'

• loops

    for i in range(0, 10, 1):
        print(i)

    i = 0
    while i < 10:
        # print(i)
        i = i + 1

• conditionals

    for i in range(0, 10, 1):
        if i == 3:
            print('we have 3')
        elif i > 3:
            print('we have many')
        else:
            print('we have few')

• lists, tuples, dictionaries

    a = [0, 1, 2, 3, 4]
    b = (0, 1, 2, 3, 4)
    c = {'one': 11, 'two': 22, 'three': 33, 'four': 79}

Quality plots of the reads

HiC mapping
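As a rough illustration of the two Hi-C mapping strategies compared in this part of the course (iterative and fragment-based), the toy sketch below uses exact string matching on a made-up genome instead of a real aligner. The sequences, the function names and the `start`/`step` parameters are all hypothetical. The one biological assumption is that a HindIII site (AAGCTT) reads AAGCTAGCTT after fill-in and ligation.

```python
RE_SITE = "AAGCTT"            # HindIII recognition site
LIGATION_SITE = "AAGCTAGCTT"  # assumed junction after fill-in and ligation

# Toy genome built from labelled parts so that positions are easy to check.
LEFT, FILLER, RIGHT = "CGTACCGGAT", "CATCATCAGGAGG", "GGCATTCA"
GENOME = "TTGA" + LEFT + RE_SITE + FILLER + RE_SITE + RIGHT + "GT"


def count_hits(genome, seq):
    """Count (possibly overlapping) exact occurrences of seq in genome."""
    count, pos = 0, genome.find(seq)
    while pos != -1:
        count += 1
        pos = genome.find(seq, pos + 1)
    return count


def iterative_map(genome, read, start=4, step=4):
    """Map a growing prefix of the read until it matches a single location.

    Returns (position, prefix length) or None if no unique match is found.
    """
    size = min(start, len(read))
    while True:
        prefix = read[:size]
        hits = count_hits(genome, prefix)
        if hits == 1:
            return genome.find(prefix), size
        if hits == 0 or size == len(read):
            return None  # unmappable, or still ambiguous at full length
        size = min(size + step, len(read))


def fragment_based_map(genome, read):
    """Split the read at the ligation site and map each piece on its own.

    Each piece gets the restriction site restored before mapping.
    Returns one position (or None) per piece.
    """
    cut = read.find(LIGATION_SITE)
    if cut == -1:
        pieces = [read]
    else:
        pieces = [read[:cut] + RE_SITE,
                  RE_SITE + read[cut + len(LIGATION_SITE):]]
    return [genome.find(p) if count_hits(genome, p) == 1 else None
            for p in pieces]


chimeric = LEFT + LIGATION_SITE + RIGHT   # read spanning a ligation junction
print(count_hits(GENOME, chimeric))       # the whole read matches nowhere
print(iterative_map(GENOME, "CATCAGGAGG"))
print(fragment_based_map(GENOME, chimeric))
```

In this toy, the read that starts inside a short repeat needs one extension step before its prefix becomes unique, while the chimeric read cannot be placed as a whole but yields two mapped pieces once it is split at the junction.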

[Figure: Hi-C mapping strategies. Panels: iterative mapping; fragment-based mapping. Labels: read1, read2, not sequenced, ligation site, mapped read1, mapped read2, mapped read2'.]

[Figure: read-pair categories used for filtering, with approximate proportions]

• Dangling-end (15%)
• Extra dangling-end (5%)
• Semi dangling-end (?)
• Self-circle (10%)
• Error (<2%)
• Random-break (<20%)
• Too short (<1%)
• Too large (<1%)
• Over-represented (<1%)
• Duplicated (20%)
• Valid pair (60% of intersection, <50% of the total)

Size annotations in the figure: < 500 b, < 5 b, > 500 b, < 100 b, > 10 kb.

Legend:

• Dynabeads with streptavidin
• Restriction enzyme (RE) site
• RE ligation site, repaired nucleotides in yellow (new cytosines are biotinylated)
• Genomic single-strand regions
• Read fragments from Hi-C
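To make a few of the filter categories more concrete, here is a minimal sketch of how a mapped read pair could be assigned to one of them. It is not TADbit's implementation: the `ReadEnd` fields, the function name `classify_pair` and the first-match order are assumptions, and the thresholds (100 b and 10 kb) are taken from the size annotations in the figure on the assumption that they belong to "too short" and "too large". Only categories that can be decided from position, strand and fragment size are covered.

```python
from collections import Counter, namedtuple

# Hypothetical record for one mapped read end: chromosome, position, strand,
# and the boundaries of the restriction fragment it falls in.
ReadEnd = namedtuple("ReadEnd", "chrom pos strand frag_start frag_end")


def classify_pair(r1, r2, seen, min_frag=100, max_frag=10000):
    """Return a single category label for a pair of mapped read ends."""
    key = (r1.chrom, r1.pos, r1.strand, r2.chrom, r2.pos, r2.strand)
    if key in seen:
        return "duplicated"            # same coordinates already observed
    seen.add(key)

    same_fragment = (r1.chrom == r2.chrom
                     and r1.frag_start == r2.frag_start)
    if same_fragment:
        if r1.strand == r2.strand:
            return "error"             # same strand inside one fragment
        left, _right = sorted((r1, r2), key=lambda end: end.pos)
        # Facing each other: un-ligated fragment; facing outwards: circle.
        return "dangling-end" if left.strand == "+" else "self-circle"

    for end in (r1, r2):
        size = end.frag_end - end.frag_start
        if size < min_frag:
            return "too short"
        if size > max_frag:
            return "too large"
    return "valid pair"


pairs = [
    (ReadEnd("chr1", 1000, "+", 900, 1500), ReadEnd("chr1", 1300, "-", 900, 1500)),
    (ReadEnd("chr1", 1000, "-", 900, 1500), ReadEnd("chr1", 1300, "+", 900, 1500)),
    (ReadEnd("chr1", 1000, "+", 900, 1500), ReadEnd("chr2", 5200, "-", 5000, 7000)),
    (ReadEnd("chr1", 1000, "+", 900, 1500), ReadEnd("chr2", 5200, "-", 5000, 7000)),
]
seen = set()
print(Counter(classify_pair(a, b, seen) for a, b in pairs))
```

A real pipeline would typically flag each pair with every filter it triggers and keep the pairs that pass all selected filters, which is why the valid-pair percentage on the slide is given relative to an intersection; the single-label version here is only meant to show the geometric reasoning.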

How comfortable are you with…

• Linux/Python to follow the tutorials?
• Reading a quality plot of your reads?
• Differences between iterative and fragment-based mapping?
• Stats for quality measure of a Hi-C experiment?
• Applied filters to reads?
• Reading out a TADbit Hi-C map?