Spatially confined folding of chromatin in the interphase nucleus

Julio Mateos-Langerak^a,1,2, Manfred Bohn^b,1, Wim de Leeuw^c, Osdilly Giromus^a, Erik M. M. Manders^a, Pernette J. Verschure^a, Mireille H. G. Indemans^d, Hinco J. Gierman^d, Dieter W. Heermann^b, Roel van Driel^a, and Sandra Goetze^a,3,4

^aSwammerdam Institute for Life Sciences, University of Amsterdam, Kruislaan 318, 1098 SM Amsterdam, The Netherlands; ^bInstitute of Theoretical Physics, University of Heidelberg, Philosophenweg 19, 69120 Heidelberg, Germany; ^cNational Research Institute for Mathematics and Computer Science, Kruislaan 413, 1098 SJ Amsterdam, The Netherlands; and ^dDepartment of Human Genetics, Academic Medical Center, University of Amsterdam, P.O. Box 22700, 1100 DE Amsterdam, The Netherlands

Edited by Jasper Rine, University of California, Berkeley, CA, and approved January 9, 2009 (received for review September 23, 2008)

CELL BIOLOGY

Genome function in higher eukaryotes involves major changes in the spatial organization of the chromatin fiber. Nevertheless, our understanding of chromatin folding is remarkably limited. Polymer models have been used to describe chromatin folding. However, none of the proposed models gives a satisfactory explanation of experimental data. In particular, they ignore that each chromosome occupies a confined space, i.e., the chromosome territory. Here, we present a polymer model that is able to describe key properties of chromatin over length scales ranging from 0.5 to 75 Mb. This random loop (RL) model assumes a self-avoiding random walk folding of the polymer backbone and defines a probability P for 2 monomers to interact, creating loops of a broad size range. Model predictions are compared with systematic measurements of chromatin folding of the q-arms of chromosomes 1 and 11. The RL model can explain our observed data and suggests that on the tens-of-megabases length scale P is small, i.e., 10–30 loops per 100 Mb. This is sufficient to enforce folding inside the confined space of a chromosome territory. On the 0.5- to 3-Mb length scale chromatin compaction differs in different subchromosomal domains. This aspect of chromatin structure is incorporated in the RL model by introducing heterogeneity along the fiber contour length due to different local looping probabilities. The RL model creates a quantitative and predictive framework for the identification of nuclear components that are responsible for chromatin-chromatin interactions and determine the 3-dimensional organization of the chromatin fiber.

genome organization | polymer model | chromatin folding

Author contributions: J.M.-L., E.M.M.M., P.J.V., R.v.D., and S.G. designed research; J.M.-L., O.G., and S.G. performed research; W.d.L., M.H.G.I., and H.J.G. contributed new reagents/analytic tools; J.M.-L., M.B., D.W.H., and S.G. analyzed data; and J.M.-L., M.B., D.W.H., R.v.D., and S.G. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
^1J.M.-L. and M.B. contributed equally to this work.
^2Present address: Institute of Human Genetics, Centre National de la Recherche Scientifique, Rue de la Cardonille 141, 34396 Montpellier, France.
^3Present address: Center for Model Organism Proteomes, Institute of Molecular Biology, University of Zurich, Winterthurerstrasse 190, 8057 Zurich, Switzerland.
^4To whom correspondence should be addressed. E-mail: [email protected].
This article contains supporting information online at www.pnas.org/cgi/content/full/0809501106/DCSupplemental.

The chromatin fiber inside the interphase nucleus of higher eukaryotes is folded and compacted on several length scales. On the smallest scale the basic filament is formed by wrapping double-stranded DNA around a histone protein octamer, forming a nucleosomal unit every ≈200 bp. This beads-on-a-string type filament in turn condenses to a fiber of 30-nm diameter, whose detailed organization is still under debate (1–3). At bigger length scales the spatial organization of chromatin in the interphase nucleus is even more unclear. Imaging techniques do not allow one to directly follow the folding path of the chromatin fiber in the interphase nucleus. Therefore, indirect approaches have been used to obtain information about chromatin folding. One way, pursued in this study, is fluorescence in situ hybridization (FISH) to measure the relationship between the physical distance between genomic sequence elements (in μm) and their genomic distance (in megabases). There have been several attempts to explain the folding of chromatin in the interphase nucleus using polymer models. The strength of polymer models is their ability to make predictions on the structure of chromatin by pointing out the driving forces for observed folding motifs. These predictions can then be tested experimentally. However, a polymer model that is able to explain chromatin folding spanning different length scales is still lacking.

Earlier studies have indicated that the structure of chromatin may be explained by a random walk (RW) model for distances up to 2 Mb, while on a larger scale there is a completely different behavior (4, 5). Folding at larger length scales has been explained using several models. One approach has been to model the fiber as a random walk in a confined geometry (6). Two polymer models have been proposed that introduce loops to explain chromatin folding. One is the random-walk/giant-loop (RWGL) model, which assumes a RW backbone to which loops of 3 Mb are attached (7). A second model, the multiloop-subcompartment (MLS) model, proposes rosette-like structures consisting of multiple 120-kb loops (5, 8). None of these models is able to describe the folding of chromatin at all relevant length scales. All predict that the physical distance between 2 FISH markers increases monotonically with the genomic distance. Clearly, this is incorrect at bigger length scales, since the chromatin fiber is geometrically confined by the dimensions of the cell nucleus. Moreover, individual chromosomes have been shown to occupy subnuclear domains that are much smaller than the nucleus itself, i.e., the chromosome territories with sizes in the range of 1 to a few micrometers (9). Evidently, an intrinsic property of the chromatin fiber inside the cell nucleus is that it assumes a compact state that cannot be described by classic polymer models. This raises the fundamental question of what physical principles make chromatin fold in a limited volume.

How can it be explained that a polymer folds such that, irrespective of the length of the polymer, its physical extent does not increase? We have shown that this can be achieved by bringing parts of the polymer together that are nonadjacent along the contour of the polymer, thus forming loops on all length scales (10). There is extensive experimental evidence that chromatin loops exist in the interphase nucleus. Various studies have indicated that the chromatin fiber forms loops that at their bases may be attached to a still poorly defined structure that is called nuclear scaffold/matrix (11). Recent investigations indicate that the formation of chromatin loops involves specific proteins, including SatB1 (12), CTCF and other insulator binding proteins

www.pnas.org/cgi/doi/10.1073/pnas.0809501106 PNAS Early Edition

Fig. 1. The random loop polymer model. (A) The diagram schematically shows a small part of the polymer, which is built up of loops with a broad range of sizes. The attachment points are marked by colored circles. (B) Molecular dynamics simulations of a polymer with randomly positioned loops. The relationship between the mean square displacement between 2 monomers and their contour distance is shown for different values of P. P denotes the probability that a pair of monomers interacts. Loop numbers range from 13 ($P = 3 \times 10^{-4}$) to 133 ($P = 3 \times 10^{-3}$) loops per chain. The chain length is $N = 300$ monomers. The increase in mean square displacement at $N_m > 250$ is due to an increased freedom of the chain ends. (C) Comparison of simulations of the RL model with experimental data. The polymer chain length is $N = 300$ monomers and a coarse-grained monomer is equivalent to 500 kb. At this scaling the RL model correctly predicts the leveling off at genomic distances above ≈10 Mb. Simulations are shown for 4 P values (range $5 \times 10^{-4}$ to $3 \times 10^{-3}$), corresponding to 1–9 loops per 10 Mb. The experimental data from Fig. 2 are shown.
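The loop counts quoted in the Fig. 1 caption can be checked with a short calculation. The sketch below assumes that every pair of non-adjacent monomers interacts independently with probability P, as in the homogeneous RL model, so the expected number of loops is simply P times the number of such pairs. The function name and the conversion constant are illustrative and do not come from the paper's code.

```python
from math import comb

def expected_loops(n_monomers: int, p: float) -> float:
    """Expected number of loops if each monomer pair with |i - j| > 1
    interacts independently with probability p (homogeneous RL model)."""
    # All pairs minus the (n - 1) backbone neighbours, which cannot form loops.
    n_pairs = comb(n_monomers, 2) - (n_monomers - 1)
    return n_pairs * p

N = 300               # chain length used in the simulations
MB_PER_MONOMER = 0.5  # 500 kb per coarse-grained monomer (scaling of Fig. 1C)

for p in (3e-4, 5e-4, 3e-3):
    loops = expected_loops(N, p)
    per_10_mb = loops / (N * MB_PER_MONOMER) * 10
    print(f"P = {p:.0e}: {loops:6.1f} loops per chain, {per_10_mb:4.1f} per 10 Mb")
```

Under these assumptions the expected counts are about 13.4 and 133.7 loops per chain for the two extreme values of P, in line with the 13 and 133 quoted in the caption, and roughly 1.5 to 9 loops per 10 Mb for the range shown in panel C.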

(13). Other studies show long-range chromatin-chromatin interactions due to transcription factories in which transcriptionally active genes at different positions on a chromosome and from different chromosomes come together (14).

The random loop (RL) polymer model offers a unified description of chromatin folding at different length scales (10). We show that the RL model adequately describes a large set of experimental data that systematically measure the in situ 3D distances between pairs of FISH probes that mark specific points on the chromatin fiber of the q-arms of chromosomes 1 and 11 in human primary fibroblasts. We show that the RL model presents a simple explanation of the spatial confinement of the chromatin fiber in chromosome territories. A heterogeneous extension of the model with respect to local transcriptional activity is presented, showing a good correlation with short distance measurements in different regions. Our results suggest that the formation of loops of a broad size range is a key determinant of chromatin folding at genomic length scales between 0.5 Mb and 75 Mb.

Results

Random Loop Polymer Model. Chromatin polymer models predict the relationship between mean square physical distance and genomic distance between 2 FISH markers on the same chromosome. Although parameters such as gene activity and epigenetic state probably influence local chromatin properties, we assume here as a first approximation that chromatin can be modeled as a homogeneous polymer. This simplifying assumption has been made for all polymer models on chromatin so far (5, 7, 15). However, below we extend the model to incorporate heterogeneity of the polymer. For a polymer with N monomers, classical models predict that the mean square displacement between the end points of the polymer scales like

$\langle R^2 \rangle \sim N^{2\nu}$, [1]

in which $\nu$ depends on the type of polymer model (see below). Unavoidably, Eq. 1 is in conflict with the confined geometry of chromosomes inside the interphase nucleus. The recently developed random loop (RL) polymer model overcomes this problem, as the mean square displacement becomes independent of the chain length at bigger length scales (10).

The RL model assumes that the polymer consists of a Gaussian chain backbone with N monomers (numbered by indices 1 to N), the spatial positions denoted by $r_1, \ldots, r_N$. Loops are introduced by assigning each pair of monomers {i, j} with $|i - j| > 1$ a probability P to interact and form a loop, i.e., 2 monomers that are not adjacent along the backbone interact with a probability P. As a consequence loops on all length scales are generated randomly as illustrated in Fig. 1A. Obviously, assuming random loop formation as we do in the RL model is an approximation, since in the living cell chromatin-chromatin fiber interactions will most likely depend on physical interactions between specific regulatory elements.

The RL model introduces 2 important features that have not been addressed by polymer models for chromatin up to now. First, it takes into account that intrapolymer interactions, i.e., loop-attachment points, vary from cell to cell and therefore measurements are an average over the ensemble that is represented in the model by assigning a probability for looping (disorder average). Second, it does not assume a fixed loop size, in contrast to the RWGL and MLS models. In the RWGL model, for example, the assumption of loops of a fixed size leads to a random walk behavior on a scale larger than the loop size, with the loops playing the role of "effective monomers."

In a first approach the RL model assumed that the probability P for 2 monomers to interact is the same for any pair of monomers (10). Such a model allows a semianalytical calculation of the mean square displacement, which rapidly becomes independent of polymer length. The RL model ignored excluded volume interactions for reasons of mathematical tractability. Because this may have a major impact on the behavior of the model, we have analyzed how the predictions of the model change if we lift this limitation. We have used molecular dynamics (MD) simulations to obtain chain conformations and to introduce excluded volume interactions in the model. Because 2 averaging processes have to be performed, i.e., over the thermal disorder and over the ensemble of loop configurations, simulations are very time-consuming. Since here we are only interested in large-scale behavior, a coarse-graining approach can be used. In our simulations we equilibrate polymers of length $N = 300$ (for details on the MD simulations see SI Appendix). Fig. 1B shows the results of simulations for different looping probabilities P. In contrast to classical polymer models, the mean square displacement becomes independent of the contour length at intermediate length scales, resulting in a spatially confined polymer structure. Interestingly, already a small number of loops results in an almost complete independence of the mean square displacement from the genomic distance, without any additional assumptions. It is stressed that loops on all length scales are necessary to make the mean square displacement independent of contour length (see also ref. 10). Looping probabilities P in Fig. 1B range from $3 \times 10^{-4}$ to $3 \times 10^{-3}$, corresponding to

13 up to 133 loops per $N = 300$ polymer. As expected, the plateau value of $\langle R^2 \rangle$ rapidly decreases as the number of loops increases and the polymer therefore becomes more compact. For P smaller than $10^{-4}$ leveling-off becomes less pronounced, and the model reduces to a normal SAW model as P approaches zero. Notably, qualitatively the same behavior is observed for the RL model ignoring excluded volume interactions (10). We therefore conclude that at bigger length scales excluded volume interactions contribute only to a limited extent to the behavior of the RL model.

Experimental Data to Test the Model. To explore whether the RL model is able to explain experimental data on chromatin folding in the interphase nucleus we have performed systematic measurements that relate the physical distance between pairs of genomic sequence elements to their genomic distance. To do so, we applied the FISH technique on primary human fibroblasts under conditions that preserve 3-dimensional (3D) nuclear structure, in combination with semiautomated 3D confocal microscopy and 3D image processing and analysis (16). We have concentrated on the q-arms of chromosomes 1 and 11, because the human transcriptome map shows that these chromosome arms contain pronounced gene-dense and transcriptionally highly active regions, and gene-poor low activity areas [Fig. 2A and (17)]. Such domains have been named ridges (regions of increased gene expression) and anti-ridges, respectively. Approximately 60 bacterial artificial chromosomes (BACs) were selected that recognize approximately evenly spaced genomic sequences, together spanning a large part of the q-arm of chromosome 1 and essentially the complete q-arm of chromosome 11 (see Table S1). For most 3D distance measurements 30–50 nuclei were imaged and quantitatively evaluated, resulting in practice in 45–75 measurements for each pair of BAC probes, i.e., each genomic distance, allowing statistical analysis of the datasets. The distribution of measured distances represents cell-to-cell variation, which relates to the conformational ensemble that the polymer model averages over. We have analyzed exclusively cells in G1 to reduce cell cycle effects on chromatin folding. Fig. 2A shows the transcriptome map of the 1q and 11q areas. The starting points of the arrows above the maps indicate the positions of the reference FISH probes. The arrowheads mark the locations of the FISH probes that have the largest genomic distance to the reference probe. All physical distances have been determined with respect to the reference probe. Green arrows and green data points refer to ridges, red ones to anti-ridges. Black arrows in Fig. 2A indicate long distance measurements beyond ridge and anti-ridge domains. Physical distances were measured in 3D space between the centers of gravity of the 3D FISH signals of the individual BAC probes.

Fig. 2. Experimental data. (A) Domains of different transcriptional activity and gene density (ridges and anti-ridges) are shown on the human transcriptome map of the q-arms of chromosomes 1 and 11. Each vertical line in the map represents a specific gene. The length of the line depicts its median transcription over a moving window of 49 genes. Ridges are indicated by green boxes, anti-ridges by red ones. The colored arrows above the map designate the ridge and anti-ridge regions where spatial distances between pairs of BAC probes were measured, using FISH. The tail of each arrow indicates the position of the reference BAC for that set of measurements. Physical distances of the reference BAC to loci at increasing genomic distances in the direction of the arrowhead were measured in 3D using confocal microscopy. (B) Plots show the mean square physical distances $\langle R^2 \rangle$ as a function of the genomic distance for ridges (green) and anti-ridges (red) on chromosomes 1 and 11 in the 0.5- to 10-Mb range. Data points in green and red correspond to the ridges and anti-ridges, respectively. Error bars represent standard errors. Measurements were made corresponding to the colored arrows shown in A. (C) The mean square displacement $\langle R^2 \rangle$ is shown as a function of genomic distance in the 25- to 75-Mb range. Measurements were made corresponding to the black arrows shown in A. Error bars represent standard errors.

Plots of the mean square distance as a function of the genomic distance, covering a large part of the q-arm of chromosome 1 (27 Mb) and essentially the complete q-arm of chromosome 11 (75 Mb), are shown in Fig. 2C. Results show that the average physical distance to the reference probe does not increase at genomic distances beyond 3–10 Mb. The maximal distances are in the 1.5- to 2.5-μm range, similar to the size range of chromosome territories and well below the diameter of the cell nucleus. The observed leveling off is most probably related to the limited space that chromosomes occupy in interphase, i.e., the chromosome territories (9). Fig. 2B shows how the mean square physical distance to the reference FISH probe depends on the genomic distance for the ridge and anti-ridge domains on chromosome 1q and the ridge and anti-ridge on chromosome 11q. Above ≈3 Mb genomic distance the measured physical distances level off, similar to what is seen for long genomic distances (Fig. 2C). Average physical distances for anti-ridges are smaller than observed for ridges, reflecting their different degrees of compaction, in agreement with earlier measurements (16, 18).

All measurements show considerable cell-to-cell variation for the physical distances. This is not due to errors in 3D measurements, since their precision is better than 100 nm (see Materials and Methods). Also, differences between cells due to different cell cycle stages are unlikely, because all analyzed nuclei were in G1. Apparently, cell-to-cell variation is an intrinsic property of chromatin folding, reflecting the thermal and conformational ensemble. These experimental results show that there are at least 2 regimes for chromatin folding: one at short genomic distances up to ≈2 Mb, at which the mean square distance increases with the genomic distance, and another at large genomic distances, beyond 10 Mb, where the mean square distance is independent of genomic length. Below we integrate these experimental results in the RL polymer model introduced above.

Integration of Short- and Long-Length Scale Experimental Data by the RL Model. The RL model proposes that large-scale chromatin folding is driven by chromatin looping. The prediction of a leveling-off in the mean square displacement is in agreement with the experimental data. How can we bring theory and experiment together? The simulations use a polymer with a length $N = 300$. By mapping a coarse-grained monomer to 500 kb of chromatin we obtain a chain of an effective length of 150 Mb, i.e., the size range of a human chromosome. In the model the mean square displacement is a complex function of the chain length N, separation between monomers $N_m$ and looping probability P: $\langle R^2 \rangle = f_N(N_m, P)$. In this context the single variable parameter is P, because N is fixed to 300. To compare our

simulation results to the experimental data we have to introduce a scaling factor for the $\langle R^2 \rangle$ axis. This factor is somewhat arbitrary and on this level of coarsening strongly depends on monomer geometry and does not reflect biological parameters in a simple manner (10). In Fig. 1C we have scaled the results of the simulations in Fig. 1B to the experimental data, using 320 nm per coarse-grained monomer. This number has been determined such that the model fits the plateau level of the experimental data. Fig. 1C shows that the RL model is able to qualitatively describe the large-scale genomic distance data quite well. This is remarkable because we do not include information about the positions along the chromatin fiber at which loops are formed.

At shorter genomic distances, i.e., on the length scale of ridges and anti-ridges (0.5–2 Mb), another folding regime dominates, because the measured mean square distances at this scale increase with genomic distance and as a first approximation Eq. 1 applies. To see whether one of the basic polymer models for which Eq. 1 holds true [the random walk, self-avoiding walk or globular state (19)] applies to our data, we conducted a sensitive comparison between these polymer models and the experimental dataset by dividing out the leading order term $N_m^{2\nu}$ of Eq. 1 and analyzing the ratio $\langle R^2 \rangle / N_m^{2\nu}$ as a function of the contour length for the measurements shown in Fig. 2B. If a model applies, the value of $\langle R^2 \rangle / N_m^{2\nu}$ should be independent of the contour length. We use data up to genomic distances of 2 Mb to keep away from distances at which leveling off begins. Fig. 3 A and B show that neither the RW nor the SAW model fulfills this criterion. Fig. 3C indicates that a scaling with $\nu = 1/3$, as defined for the globular state (GS) model, is more consistent with the experimental data, indicating a considerably more compact state than predicted by the RW and SAW models. We have incorporated data of Yokota et al. (20) in Fig. 3 (blue data points) in support of this conclusion.

Fig. 3. Comparison of the short distance (0.5–2 Mb) experimental data with the random walk, self-avoiding walk and globular state polymer models. The panels show the experimental short distance data for the ridge (green) and anti-ridge (red) on the q-arm of chromosome 1 and a dataset obtained for a ridge on human chromosome 4 published by Yokota et al. (20). The linear regression lines in the panels show the trend of the datasets, interpreted in terms of the different polymer models [random walk (RW) (A); self-avoiding walk (SAW) (B); globular state (GS) (C)], i.e., different values of $\nu$ that belong to the different polymer models. Each of the models predicts that the ratio $\langle R^2 \rangle / N_m^{2\nu}$ is independent of the genomic distance. The analysis shows that the GS model fulfills this prediction best.

Although the exponent $\nu = 1/3$ (Eq. 1) is true for the globular state polymer model, one should be aware of the fact that the model is only valid for end-to-end distances of a polymer, whereas we here deal with intrachain distances. Fitting the RL model to our experimental data shows that such a value of $\nu$ is only valid in a narrow range of genomic distances before a plateau level is reached. Finally, for other loci even higher levels of compaction with scaling exponent $\nu \approx 0.1$–$0.2$ have been observed (15). Thus, the interpretation of the data in terms of one of the classical polymer models would be an extreme oversimplification. In contrast to the RW and SAW models the RL model is based on intrachain attractive forces, i.e., chromatin loops. On short length scales the RL model also shows a power-law dependence of the mean square displacement on genomic distance. Fig. 4A shows that practically any value for the exponent $\nu$ (Eq. 1) below 0.5 (the RW value) can be obtained by choosing different looping probabilities P.

Therefore, we extended the original RL model, which assumes the same looping probability P for all pairs of monomers, to incorporate local differences in P values (thus making the polymer heterogeneous). We assign different looping probabilities for different regions based on the distribution of ridges and anti-ridges in the human transcriptome map as shown in Fig. 2A (17). As a first approximation we divide the polymer into ridge and anti-ridge regions and define 3 different looping probabilities, i.e., $P_R$, defining loop formation in ridge regions, $P_{AR}$ for anti-ridges and $P_{inter}$ for the interaction between such regions. Fig. 4B shows the result of a simulation for $P_R = 3 \times 10^{-5}$, $P_{AR} = 7 \times 10^{-5}$ and $P_{inter} = 1 \times 10^{-5}$. The RL model with these values describes the folding of the ridge and anti-ridge of chromosome 11 remarkably well. Details on the implementation of the RL model with different P values can be found in SI Appendix. A fit of the RL model for the same set of P values to ridge and anti-ridge data of chromosome 1 is shown in Fig. S1.

An alternative way to introduce heterogeneity in looping

Fig. 4. Incorporation of chromatin fiber heterogeneity into the RL model by assuming different looping probabilities. (A) Qualitative behavior of the random loop model. The relationship between the mean square displacement between 2 monomers and their contour distance is shown for different values of the looping probability P. In the short-length scale regime the mean square displacement follows a scaling law where $\langle R^2 \rangle \approx N^{2\nu}$. The scaling exponent $\nu$ varies over a broad range of values, depending on the looping probability P. The figure shows data from the model without excluded volume and for a chain length of $N = 600$. (B) Simulations of the RL model using different P values for ridges, anti-ridges and the interactions between these regions on the q-arm of chromosome 11, as shown in Fig. 2. The assigned P values are $p_R = 3 \times 10^{-5}$, $p_{AR} = 7 \times 10^{-5}$ and $p_{inter} = 1 \times 10^{-5}$, respectively. Calculations are without excluded volume; the coarse-grained monomer is set at 75 kb. (C) Simulation of the RL model with looping probability that depends on genomic distance according to the power law function $p(l) = a l^{-b} + c$, resulting in an increased number of small loops at short distances compared with large loops at long distance. Comparison to Fig. 1B shows that the qualitative behavior of the RL model is not affected by such a change in looping probability distribution.
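The two kinds of heterogeneity described in panels B and C of Fig. 4 can be written as pair-probability functions. The following is a minimal sketch, not the implementation from SI Appendix: the region labels, the function names, and the parameters a, b and c are illustrative assumptions, and two monomers carrying the same label are assumed to use the within-region probability. Only the three P values are taken from the caption.

```python
import random

# Looping probabilities quoted in the Fig. 4B caption.
P_R, P_AR, P_INTER = 3e-5, 7e-5, 1e-5

def region_p(i, j, labels):
    """Region-based looping probability (panel B).
    labels[k] is 'R' (ridge) or 'AR' (anti-ridge); this labeling is an assumption."""
    if labels[i] != labels[j]:
        return P_INTER
    return P_R if labels[i] == "R" else P_AR

def power_law_p(i, j, a, b, c):
    """Distance-dependent looping probability p(l) = a * l**(-b) + c (panel C),
    capped at 1. The values of a, b and c are hypothetical inputs."""
    l = abs(i - j)
    return min(1.0, a * l ** (-b) + c)

def expected_loops(n, pair_p):
    """Expected number of loops, summed over all pairs with |i - j| > 1."""
    return sum(pair_p(i, j) for i in range(n) for j in range(i + 2, n))

def sample_loops(n, pair_p, rng):
    """Draw one loop configuration: each non-adjacent pair is linked independently."""
    return [(i, j) for i in range(n) for j in range(i + 2, n)
            if rng.random() < pair_p(i, j)]

# Toy chain: 100 ridge monomers followed by 100 anti-ridge monomers.
labels = ["R"] * 100 + ["AR"] * 100
print(expected_loops(200, lambda i, j: region_p(i, j, labels)))
print(expected_loops(200, lambda i, j: power_law_p(i, j, a=0.01, b=1.0, c=1e-5)))
print(len(sample_loops(200, lambda i, j: power_law_p(i, j, 0.01, 1.0, 1e-5),
                       random.Random(0))))
```

Averaging an observable such as the mean square displacement over many configurations drawn with `sample_loops` corresponds to the disorder average over loop configurations described in the text; the thermal average over chain conformations would still have to be computed separately for each configuration.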

probability into the RL model is to assume that loops on short length scales are more abundant than loops on large scales. For the original RL model the loop-size distribution is $s(l) \sim 1/(N - l)$ (10). Heterogeneity in looping probability can be implemented by assuming that the probability p for a pair of monomers {i, j} to interact depends on their genomic distance $l = |i - j|$. This can be achieved by a power-law distribution $p(l) = a l^{-b} + c$. The reason for assuming this kind of distribution is that a power-law behavior arises naturally in the distribution of random contacts in random or self-avoiding walks. Fig. 4C shows that this does not change the qualitative behavior of the RL model, i.e., it still shows leveling-off, provided that there is a significant probability to form large loops. This indicates that the qualitative behavior of the RL model is not very sensitive to the distribution of loop sizes along the length of the polymer.

Discussion

In this study we present a polymer model that qualitatively explains the folding of a chromosome in a limited volume, e.g., a chromosome territory (9). This random loop (RL) model predicts that loop formation is the major driving force for chromatin compaction (10). The RL model assumes that the measured observables, e.g., the mean square displacement, are derived from an ensemble of loop configurations formed by interactions between different parts of the polymer with a certain probability P. A major characteristic of the RL model is that the mean square displacement becomes independent of the contour length at longer distances.

Here, we extend the original RL model beyond the limitations of its original formulation (10). We performed extensive MD simulations to establish the effect of excluded volume on the behavior of the RL model. It turns out that the introduction of excluded volume does not alter the model's main properties. We also explored the effect of heterogeneity along the contour length of the fiber, creating polymers with domains of different local looping probability and therefore different compaction. This for instance mimics the distribution of ridges and anti-ridges on chromosomes (Fig. 2A). Introducing such heterogeneity improves the prediction of the model with respect to the folding of ridges and anti-ridges at short length scales (≈1 Mb) (Fig. 4B), but does not alter the overall behavior of the model at bigger length scales (see Fig. S2).

We have performed systematic 3D-FISH measurements to validate the model. At genomic length scales >10 Mb, distances between pairs of FISH probes are shown to be independent of genomic distance for the q-arms of chromosomes 1 and 11 in G1 nuclei of human primary fibroblasts. This property is most likely due to the confinement of interphase chromosomes in chromosome territories. Measurements of Trask and coworkers (6, 7, 20, 21) did not show such leveling off of physical distances at large genomic distances. Rather, they reported a monotonic increase with increasing genomic distance up to 180 Mb and interpreted this as evidence that chromatin folding reflects a RW polymer model. At least in part, this discrepancy can be explained by the fact that these authors used different cell fixation and FISH labeling methods, which preserve the structure of the nucleus less well than those used here. Also, most measurements have been carried out 2-dimensionally. Together, this is likely to result in systematic distortions of their datasets. At short distances (<2 Mb) our experimental results are similar to those obtained by others (4); however, their interpretation in terms of a RW differs from ours. Shopland and coworkers also determined distances on the short length scale (22), suggesting probabilistic 3D folding states of chromatin. These probabilistic folding states can be explained by the probabilistic chromatin-chromatin interactions in the RL model.

The leveling-off of physical distances at large genomic distances that we observe (Fig. 2) is in good agreement with the RL model (Figs. 1C and 4B). This leveling-off is due to the presence of loops on all length scales and the averaging procedure over the ensemble of loop configurations. Although polymer models involving loops have been proposed before to explain chromatin folding (7, 8), these models cannot explain experimental results that show that the mean square displacement becomes independent of genomic distance above a few megabases. The RWGL model, which assumes fixed-size loops, results in 2 folding regimes at different length scales, in both of which the mean square displacement increases monotonically with genomic distance (7). The MLS model assumes rosette-like structures with multiple loops of fixed size (120 kb) and results in a power-law dependence of the mean square displacement on genomic distance, similar to Eq. 1 (8).

We have extended the RL model to take into account local differences in chromatin compaction, as for instance found in ridges and anti-ridges along the q-arms of chromosomes 1 and 11 (Fig. 2), by locally assigning different looping probabilities to the polymer. Although still highly simplifying, this explains remarkably well the difference in compaction of ridges and anti-ridges, assuming a 2.5-fold difference in looping probability for the studied region on human chromosome 11 (Fig. 4B). There is abundant experimental evidence for heterogeneous chromatin looping along the chromatin fiber. For instance, loops with sizes in the 10-kb range have been observed in the beta-globin locus, where gene activity is correlated with loop formation that brings together different regulatory elements of the locus (23). Other examples are loops between promoter and enhancer sequences, which span a broad genomic length scale in the 1- to 1,000-kb range (24). Even larger loops are associated with transcription factories, which bring together transcriptionally active genes from different parts of a chromosome, and from different chromosomes (25).

Thus, the RL model allows a unified description of the folding of the chromatin fiber inside the interphase nucleus over different length scales and explains different levels of compaction by assuming different looping probabilities, related for instance to local differences in transcription level and gene density. The RL model creates a basis for explaining the formation of chromosome territories, not requiring a scaffold or other physical confinement. While there is a lot of evidence that chromatin-chromatin interactions play a crucial role in genome function (e.g., see refs. 23 and 25), our study proposes that they also play an important role in chromatin organization inside the interphase nucleus on the scale of the whole chromosome (tens of megabases) and on that of subchromosomal domains in the size range of a few megabases. Importantly, various aspects of the RL model can be experimentally verified, e.g., by perturbing chromatin-chromatin interactions and analyzing the effect on chromatin folding. Although experimental data on loop distributions are not yet available, experimental techniques such as the 4C technology (26, 27) will allow the measurement of looping probabilities and loop size distribution along the length of complete chromosomes. These and other experimental parameters can be incorporated into the RL model, moving stepwise toward a more realistic polymer model for chromatin folding in higher eukaryotes.

Materials and Methods

Cell Culture and Fluorescence in Situ Hybridization. Human primary female fibroblasts (04–147) were cultured in DMEM containing 10% FCS, 20 mM glutamine, 60 μg/mL penicillin and 100 μg/mL streptomycin. Cells were used up to passage 25 to avoid effects related to senescence. BACs were selected from the BAC clones available in the RP11-collection at the Sanger Institute (Table S1). Genomic distances were defined as the distance between centers of the BACs. BAC DNA was isolated using the Qiagen REAL prep 96 kit (Qiagen)

and DOP-PCR amplified (16). Nick-translation was used to label the probes, either with digoxigenin or biotin (Roche Molecular Biochemicals). FISH was carried out as described in ref. 16.

Confocal Laser-Scanning Microscopy. For each experiment >45–75 nuclei were imaged. Twelve-bit 3D images were recorded in the multitrack mode to avoid cross-talk, using a LSM 510 confocal laser-scanning microscope (Carl Zeiss) equipped with a 63x/1.4 NA Apochromat objective, using an Ar-ion laser at 364 nm, an Ar laser at 488 nm and a He/Ne laser at 543 nm to excite DAPI, FITC and Cy3, respectively. Fluorescence was detected with the following bandpass filters: 385–470 nm (DAPI), 505–530 nm (FITC) and 560–615 nm (Cy3). Images were scanned with a voxel size of 50 × 50 × 100 nm.

Image Processing and Data Evaluation. Automated image analysis was carried out on raw datasets with the ARGOS software to identify nuclear sites labeled by BACs and to compute their 3D position in the nucleus as described in ref. 16. In short, chromatic aberration was measured via Tetraspeck Microspheres (Molecular Probes) and corrected for in the analysis. After background subtraction, images were treated with a bandpass filter to remove noise. Subsequently, images were segmented and ensembles of interconnected voxels were regarded as the site labeled by a BAC. The center of mass was calculated for each labeled site at subvoxel resolution and 3D distances between BACs were measured. To estimate the systematic measuring error we hybridized cells with a mixture of the same BAC marked with 2 different fluorophores and measured the distances between the 2 signals. Measurements resulted in accuracy better than 50 nm in all 3 dimensions: x = 7 ± 9 nm; y = 40 ± 11 nm; z = 22 ± 12 nm.

Random Loop Model. The chromatin fiber is modeled as a polymer consisting of N coarse-grained monomers. In a general approach the Hamiltonian can be written as

$$H = H_{\text{backbone}}(\{\mathbf{r}_i\}) + H_{\text{EV}}(\{\mathbf{r}_i\}) + \frac{1}{2} \sum_{|i-j|>1} \kappa_{ij} \, \|\mathbf{r}_i - \mathbf{r}_j\|^2$$

where the position vectors of the monomers are denoted as $\mathbf{r}_1, \ldots, \mathbf{r}_N$. The first term assures the connectivity of the chain, the second term accounts for excluded volume interactions. The third term accounts for the formation of loops and its disorder. The interaction constants $\kappa_{ij}$ are random variables with a specific probability distribution. Simulations were carried out using the ESPResSo package within the NVT-Ensemble and Langevin thermostat (28). Simulated chains have a length of N = 300 monomers. Details on the MD simulations can be found in SI Appendix. Simulations were performed on the HELICS2-cluster at the Interdisciplinary Center for Scientific Computing (IWR) in Heidelberg.

ACKNOWLEDGMENTS. We thank the Sanger Institute and Eric Schoenmakers (University Nijmegen, Nijmegen, The Netherlands) for providing BACs. We thank Jens Odenheimer for helpful comments concerning data analysis. This work was supported by European Commission (as part of the 3DGENOME program) Contract LSHG-CT-2003-503441. M.B. thanks the Heidelberg Graduate School of Mathematical and Computational Methods for the Sciences for partial support and the research training group "Simulational Methods in Physics" for funding.
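The distance measurement described under Image Processing and Data Evaluation can be illustrated with a short sketch: a center of mass is computed for each segmented site at subvoxel resolution, converted to nanometers with the 50 × 50 × 100 nm voxel size, and the 3D distance between two sites is taken. This is not the ARGOS software. The intensity weighting, the (x, y, z) axis order of the arrays, the function names and the toy image are all assumptions made for illustration.

```python
import numpy as np

# Voxel size (x, y, z) in nm, as stated in the microscopy section.
VOXEL_NM = np.array([50.0, 50.0, 100.0])


def center_of_mass_nm(image, mask):
    """Intensity-weighted center of mass of the voxels in `mask`, in nm.

    Assumption: the site is weighted by voxel intensity; the text only
    says that a center of mass is computed at subvoxel resolution.
    """
    coords = np.argwhere(mask)            # (n_voxels, 3) voxel indices
    weights = image[mask].astype(float)   # same row-major order as argwhere
    com_voxels = (coords * weights[:, None]).sum(axis=0) / weights.sum()
    return com_voxels * VOXEL_NM


def distance_nm(com_a, com_b):
    """Euclidean 3D distance between two centers of mass, in nm."""
    return float(np.linalg.norm(com_a - com_b))


# Toy image with two hypothetical labeled sites.
image = np.zeros((20, 20, 10))
image[2, 3, 4] = 10.0
image[3, 3, 4] = 30.0    # site A spans two adjacent voxels
image[12, 3, 4] = 5.0    # site B is a single voxel

mask_a = np.zeros(image.shape, dtype=bool)
mask_a[2:4, 3, 4] = True
mask_b = np.zeros(image.shape, dtype=bool)
mask_b[12, 3, 4] = True

com_a = center_of_mass_nm(image, mask_a)
com_b = center_of_mass_nm(image, mask_b)
print(distance_nm(com_a, com_b))
```

Scaling by the voxel size before taking the norm matters here because the z step (100 nm) is twice the x and y step, so distances computed in voxel units would be distorted along the optical axis.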

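The loop term of the Hamiltonian in the Random Loop Model section can be made concrete with a small numerical sketch. It draws the interaction constants κ_ij at random and evaluates (1/2) Σ κ_ij ‖r_i − r_j‖² for one conformation. Several points are assumptions rather than details taken from the paper: κ_ij is taken as Bernoulli disorder (a fixed spring constant with probability P, otherwise zero), the sum is read as running once over each pair i < j with |i − j| > 1, and the values of P and κ, the random-walk test conformation and the function names are hypothetical. The backbone and excluded-volume terms are omitted, and this is not the ESPResSo setup used for the simulations.

```python
import numpy as np


def sample_loop_constants(n, p_loop, kappa, rng):
    """Symmetric matrix of random interaction constants kappa_ij.

    Assumption: every pair with |i - j| > 1 is linked independently with
    probability p_loop and then carries the same spring constant kappa.
    """
    linked = np.triu(rng.random((n, n)) < p_loop, k=2)  # keep j >= i + 2 only
    k = np.where(linked, kappa, 0.0)
    return k + k.T


def loop_energy(positions, k):
    """Loop term: (1/2) * sum over pairs i < j of kappa_ij * |r_i - r_j|^2."""
    diff = positions[:, None, :] - positions[None, :, :]
    sq_dist = np.sum(diff**2, axis=-1)
    # k is symmetric, so the full double sum counts each pair twice;
    # the factor 0.25 therefore yields 1/2 per unordered pair.
    return 0.25 * np.sum(k * sq_dist)


rng = np.random.default_rng(0)
n_monomers = 300  # chain length quoted for the simulated chains

# Hypothetical test conformation: an ideal random walk in 3D.
positions = np.cumsum(rng.normal(size=(n_monomers, 3)), axis=0)

k = sample_loop_constants(n_monomers, p_loop=1e-3, kappa=1.0, rng=rng)
energy = loop_energy(positions, k)
n_loops = int(np.count_nonzero(k) // 2)
print(f"loops: {n_loops}, loop energy: {energy:.2f}")
```

Local differences in compaction, such as those between ridges and anti-ridges, could be mimicked in the same sketch by making `p_loop` depend on the positions i and j along the chain instead of using a single value.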
1. Finch JT, Klug A (1976) Solenoidal model for superstructure in chromatin. Proc Natl Acad Sci USA 73:1897–1901.
2. Woodcock C, Frado L, Rattner J (1984) The higher-order structure of chromatin: Evidence for a helical ribbon arrangement. J Cell Biol 99:42–52.
3. Ghirlando R, Felsenfeld G (2008) Hydrodynamic studies on defined heterochromatin fragments support a 30-nm fiber having six nucleosomes per turn. J Mol Biol 376:1417–1425.
4. van den Engh G, Sachs R, Trask BJ (1992) Estimating genomic distance from DNA sequence location in cell nuclei by a random walk model. Science 257:1410–1412.
5. Münkel C, Langowski J (1998) Chromosome structure predicted by a polymer model. Phys Rev E 57:5888–5896.
6. Hahnfeldt P, Hearst JE, Brenner DJ, Sachs RK, Hlatky LR (1993) Polymer models for interphase chromosomes. Proc Natl Acad Sci USA 90:7854–7858.
7. Sachs R, Engh G, Trask B, Yokota H, Hearst J (1995) A random-walk/giant-loop model for interphase chromosomes. Proc Natl Acad Sci USA 92:2710–2714.
8. Münkel C, et al. (1999) Compartmentalization of interphase chromosomes observed in simulation and experiment. J Mol Biol 285:1053–1065.
9. Cremer T, Cremer C (2001) Chromosome territories, nuclear architecture and gene regulation in mammalian cells. Nat Rev Genet 2:292–301.
10. Bohn M, Heermann DW, van Driel R (2007) Random loop model for long polymers. Phys Rev E 76:051805.
11. Bode J, Goetze S, Heng H, Krawetz S, Benham C (2003) From DNA structure to gene expression: Mediators of nuclear compartmentalization and dynamics. Chromosome Res 11:435–445.
12. Cai S, Lee CC, Kohwi-Shigematsu T (2006) SATB1 packages densely looped, transcriptionally active chromatin for coordinated expression of cytokine genes. Nat Genet 38:1278–1288.
13. Splinter E, et al. (2006) CTCF mediates long-range chromatin looping and local histone modification in the beta-globin locus. Genes Dev 20:2349–2354.
14. Fraser P (2006) Transcriptional control thrown for a loop. Curr Opin Genet Dev 16:490–495.
15. Jhunjhunwala S, et al. (2008) The 3D structure of the immunoglobulin heavy-chain locus: Implications for long-range genomic interactions. Cell 133:265–279.
16. Goetze S, et al. (2007) The three-dimensional structure of human interphase chromosomes is related to the transcriptome map. Mol Cell Biol 27:4475–4487.
17. Versteeg R, et al. (2003) The human transcriptome map reveals extremes in gene density, intron length, GC content, and repeat pattern for domains of highly and weakly expressed genes. Genome Res 13:1998–2004.
18. Gilbert N, et al. (2004) Chromatin architecture of the human genome: Gene-rich domains are enriched in open chromatin fibers. Cell 118:555–566.
19. Grosberg AY, Khokhlov AR (1994) Statistical Physics of Macromolecules (AIP, Woodbury, NY).
20. Yokota H, van den Engh G, Hearst J, Sachs R, Trask B (1995) Evidence for the organization of chromatin in megabase pair-sized loops arranged along a random walk path in the human G0/G1 interphase nucleus. J Cell Biol 130:1239–1249.
21. Liu B, Sachs R (1997) A two-backbone polymer model for interphase chromosome geometry. Bull Math Biol 59:325–337.
22. Shopland LS, et al. (2006) Folding and organization of a contiguous chromosome region according to the gene distribution pattern in primary genomic sequence. J Cell Biol 174:27–38.
23. Palstra RJ, et al. (2003) The beta-globin nuclear compartment in development and erythroid differentiation. Nat Genet 35:190–194.
24. Petrascheck M, et al. (2005) DNA looping induced by a transcriptional enhancer in vivo. Nucleic Acids Res 33:3743–3750.
25. Fraser P, Bickmore W (2007) Nuclear organization of the genome and the potential for gene regulation. Nature 447:413–417.
26. Simonis M, et al. (2006) Nuclear organization of active and inactive chromatin domains uncovered by chromosome conformation capture-on-chip (4C). Nat Genet 38:1348–1354.
27. Dostie J, et al. (2006) Chromosome Conformation Capture Carbon Copy (5C): A massively parallel solution for mapping interactions between genomic elements. Genome Res 16:1299–1309.
28. Limbach H, Arnold A, Mann B, Holm C (2006) Espresso - an extensible simulation package for research on soft matter systems. Comput Phys Commun 174:704–727.

www.pnas.org/cgi/doi/10.1073/pnas.0809501106 Mateos-Langerak et al.