
Network analysis identifies intermingling regions as regulatory hotspots for transcription

Anastasiya Belyaeva [a,b], Saradha Venkatachalapathy [c], Mallika Nagarajan [c], G. V. Shivashankar [c,d], and Caroline Uhler [a,b,1]

[a] Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, Cambridge, MA 02139; [b] Institute for Data, Systems, and Society, Massachusetts Institute of Technology, Cambridge, MA 02139; [c] Mechanobiology Institute, National University of Singapore, Singapore 117411; and [d] Institute of Molecular Oncology, Italian Foundation for Cancer Research, Milan 20139, Italy

Edited by David A. Weitz, Harvard University, Cambridge, MA, and approved November 13, 2017 (received for review May 15, 2017)

The 3D structure of the genome plays a key role in regulatory control of the cell. Experimental methods such as high-throughput chromosome conformation capture (Hi-C) have been developed to probe the 3D structure of the genome. However, it remains a challenge to deduce from these data chromosome regions that are colocalized and coregulated. Here, we present an integrative approach that leverages 1D functional genomic features (e.g., epigenetic marks) with 3D interactions from Hi-C data to identify functional interchromosomal interactions. We construct a weighted network with 250-kb genomic regions as nodes and Hi-C interactions as edges, where the edge weights are given by the correlation between 1D genomic features. Individual interacting clusters are determined using weighted correlation clustering on the network. We show that intermingling regions generally fall into either active or inactive clusters based on the enrichment for RNA polymerase II (RNAPII) and H3K9me3, respectively. We show that active clusters are hotspots for transcription factor binding sites. We also validate our predictions experimentally by 3D fluorescence in situ hybridization (FISH) experiments and show that active RNAPII is enriched in predicted active clusters. Our method provides a general quantitative framework that couples 1D genomic features with 3D interactions from Hi-C to probe the guiding principles that link the spatial organization of the genome with regulatory control.

chromosome intermingling | Hi-C | network and clustering analysis | 3D FISH

Significance

We develop a network analysis approach for identifying clusters of interactions between chromosomes, which we validate experimentally. Our method integrates 1D features of the genome, such as epigenetic marks, with 3D interactions, allowing us to study spatially colocalized regions between chromosomes that are functionally relevant. We observe that clusters of interchromosomal regions fall into active and inactive categories. We find that active clusters share transcription factors and are enriched for transcriptional machinery, suggesting that chromosome intermingling regions play a key role in genome regulation. Our method provides a unique quantitative framework that can be broadly applied to study the principles of genome organization and regulation during processes such as cell differentiation and reprogramming.

Author contributions: A.B., S.V., G.V.S., and C.U. designed research; A.B., S.V., M.N., G.V.S., and C.U. performed research; A.B., S.V., G.V.S., and C.U. contributed new reagents/analytic tools; A.B., S.V., G.V.S., and C.U. analyzed data; and A.B., S.V., G.V.S., and C.U. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

This open access article is distributed under Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND).

[1] To whom correspondence should be addressed. Email: [email protected].

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1708028115/-/DCSupplemental.

The 3D structure of the genome plays a key role in regulatory control of the cell. Historically, the spatial organization of the genetic material has been probed with fluorescence in situ hybridization (FISH), and it was shown that chromosome organization is nonrandom. Each chromosome occupies its own territory with gene-dense chromosomes more likely to be in the nuclear interior (1). As an addition to FISH, chromosome conformation capture methods (3C, 4C, 5C, and Hi-C) have been designed to probe the 3D organization of the genome by measuring the genome-wide contact frequencies over a population of cells (2–5). Computational and experimental efforts have largely focused on investigating intrachromosomal contacts. Studies where these interactions have been analyzed together with epigenetic modifications as measured by chromatin immunoprecipitation sequencing (ChIP-seq) showed that epigenetic marks are tightly linked to shaping the architecture of the genome (6, 7).

Few studies have considered interchromosomal interactions. It was shown that regions on neighboring chromosome territories may loop out and intermingle with each other in a transcription-dependent manner (8, 9). In addition, a recent study has revealed that intermingling regions are enriched in both active and repressive epigenetic marks, as well as the active form of RNA polymerase II (RNAPII) and transcription factors (10). Furthermore, it was identified that genes are spatially colocalized and coregulated by sharing common transcription factors (11, 12) and epigenetic machinery like the polycomb proteins (13). For example, TNFα-responsive genes (on the same and different chromosomes) have been shown to colocalize upon their stimulation. Their spatial clustering was found to be correlated with their temporal expression patterns (12). The clustering of genes, transcriptional machinery, and regulatory factors to coordinate expression, also known as transcription factories, has been proposed as a model for gene regulation (14–16). Collectively, these studies suggest that interchromosomal regions could harbor coregulated gene clusters. However, missing in this picture is a systematic analysis linking 1D epigenetic marks and 3D intermingling regions and their roles in transcription control.

Various methods have been developed to infer the spatial connectivity of the whole genome from Hi-C data. Restraint-based approaches transform Hi-C contact matrices into distances to deduce one consensus structure (17–21). However, it remains a challenge to map contact frequencies to spatial distances due to biases in Hi-C matrices (22). A different approach is to produce an ensemble of structures that could explain the experimental data (23, 24). Computational methods have largely focused on inferring the 3D genome structure based on Hi-C data alone without leveraging functional genomic data for studying its architecture. A recent study has explored this idea by superimposing ChIP-seq data of three transcription factors (TFs) on the 3D genome architecture inferred from Hi-C and determined functional hotspots in Saccharomyces cerevisiae (25). Another study used 1D epigenomic tracks to predict 3D interactions (26). But there remains a lack of a general quantitative framework that integrates 1D functional genomic features with 3D intermingling regions to determine a regulatory code for interchromosomal interactions.

In this paper, we take a unique approach by integrating Hi-C and functional genomic data to predict regions that are

colocalized and coregulated in 3D. The model for gene regulation that is captured by our analysis is the spatial clustering of genomic regions for their coregulation (27). This mode of gene regulation may enable the cell to coordinate gene expression and activate or repress pathways that are important for cell function in a coordinated manner. We focus on interchromosomal interactions to study chromosome intermingling regions. Using a network analysis approach, we construct a network of chromosomal interactions weighted by correlations in their genomic features at a 250-kb resolution. We find that intermingling regions can be divided into active and inactive clusters, where active clusters are hotspots for TF binding. We validate our predictions using FISH by comparing a predicted active cluster vs. a predicted negative control and also confirm that active RNAPII is significantly enriched in the predicted active cluster.

Results

Identification of Intermingling Domains. To identify interchromosomal regions that are spatially colocalized and coregulated, we leveraged both spatial information from Hi-C experiments and regulatory information, namely, epigenetic marks, TF ChIP-seq, DNase I hypersensitivity (DNase-seq), and RNA-seq. Our aim was to identify clusters of chromosome regions at the whole-genome scale that interact spatially due to similarities in their regulatory features and thus might be coregulated by shared regulatory factors and epigenetic marks. Our method consists of four steps outlined in Fig. 1: (i) identification of highly interacting domains by determining large average submatrices in interchromosomal Hi-C maps, (ii) superimposing regulatory marks on the interacting domains, (iii) construction of a network of interacting regions with edges weighted by the correlation of the superimposed marks as a measure of coregulation, and (iv) network clustering to obtain spatially colocalized and coregulated domains.

We analyzed Hi-C data from human lung fibroblast (IMR-90) cells at 250-kb resolution, obtained from ref. 28. After bias correction, filtering, and transforming the data (SI Appendix), we identified a stringent set of highly interacting interchromosomal regions by solving the following submatrix finding problem in Hi-C maps. We sought a contiguous submatrix U (k × l) that has a high average τ within the real-valued data matrix X (m × n), where each entry is an interchromosomal contact frequency between two 250-kb regions. We used the iterative large average submatrix (LAS) algorithm (29) that balances matrix size and average value, as outlined in SI Appendix, to discover highly interacting domains. Fig. 1A shows the identified domains in the Hi-C contact map for chromosomes 19 and 20. As shown in Fig. 1A, the LAS algorithm captures the regions with high intensity in the interchromosomal matrix. Applying this procedure to all pairwise interchromosomal maps yields the matrix in Fig. 1B, where each entry corresponds to the number of highly interacting 250-kb regions identified for the particular chromosome pair [false discovery rate (FDR) < 4.16 × 10⁻⁸, SI Appendix]. The total size of highly interacting domains across all chromosomes spanned 903.25 Mb (SI Appendix, Table S1). Consistent with previous observations (2, 30), Fig. 1B shows that gene-dense

13714–13719 | PNAS | December 26, 2017 | vol. 114 | no. 52 | www.pnas.org/cgi/doi/10.1073/pnas.1708028115

Fig. 1. Overview of the proposed quantitative framework for detecting intermingling regions. (A) Example of an observed interchromosomal Hi-C contact matrix at 250-kb resolution after preprocessing and transformation (standardized by mean and SD after log(1 + x) transformation) for chromosomes 19 and 20 (SI Appendix). Rectangular boxes represent interacting domains for this pair of chromosomes as detected by the LAS algorithm, which finds submatrices with high average. (B) Matrix containing the number of interacting 250-kb regions identified by the LAS algorithm for each pair of chromosomes. (C) Subnetwork of the chromosome interaction network corresponding to two distinct clusters. Nodes are colored by chromosome number. Each node in the network corresponds to a 250-kb region. Edges link nodes that are found together in a submatrix (box) as determined by the LAS algorithm. The edge weights are given by the strength of correlation between the genomic features (histone modifications, TF ChIP-seq, DNase-seq, and RNA-seq as listed in SI Appendix, Table S2) of adjacent 250-kb nodes. (D and E) Activity (normalized number of peaks in a 250-kb region) of the genomic features for the two clusters obtained by weighted correlation clustering on the subnetwork in C. Each ring corresponds to one genomic feature, listed from outer ring to inner ring in SI Appendix, Table S2. Features are grouped into active (outer rings: RNA-seq, RNAPII, H3K4me1, H3K4me2, H3K4me3, H3K36me3, H3K9ac), repressive (middle rings: H3K27me3 and H3K9me3), and other (inner rings) categories. (F) Fold enrichment of each genomic feature in the intermingling regions (SI Appendix).
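The submatrix search described under Identification of Intermingling Domains can be illustrated with a small brute-force sketch. This is not the LAS algorithm of ref. 29, whose details are in SI Appendix; it is a stand-in that scans every contiguous block of a tiny matrix. The score `tau * sqrt(k * l)` is an assumption chosen here to balance block size against block average (it is the z-score of the block mean if the entries were independent standard normal values), and the function names are hypothetical.

```python
import math


def prefix_sums(X):
    """2D prefix sums so that any block total can be read off in O(1)."""
    m, n = len(X), len(X[0])
    P = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            P[i + 1][j + 1] = X[i][j] + P[i][j + 1] + P[i + 1][j] - P[i][j]
    return P


def best_contiguous_submatrix(X):
    """Exhaustively score every contiguous k x l block of X.

    Returns (score, row_start, row_end, col_start, col_end, tau) with
    half-open index ranges. The score tau * sqrt(k * l) is an assumed
    size/average trade-off, not the LAS score from the paper.
    """
    m, n = len(X), len(X[0])
    P = prefix_sums(X)
    best = None
    for r0 in range(m):
        for r1 in range(r0 + 1, m + 1):
            for c0 in range(n):
                for c1 in range(c0 + 1, n + 1):
                    k, l = r1 - r0, c1 - c0
                    total = P[r1][c1] - P[r0][c1] - P[r1][c0] + P[r0][c0]
                    tau = total / (k * l)            # block average
                    score = tau * math.sqrt(k * l)   # reward large, bright blocks
                    if best is None or score > best[0]:
                        best = (score, r0, r1, c0, c1, tau)
    return best


# Hypothetical standardized contact matrix with one bright 2 x 3 block.
X = [[0.0] * 6 for _ in range(5)]
for i in (1, 2):
    for j in (2, 3, 4):
        X[i][j] = 3.0
score, r0, r1, c0, c1, tau = best_contiguous_submatrix(X)
```

The exhaustive scan costs O(m²n²) block evaluations, which is only workable for toy inputs; an iterative search such as LAS is what makes whole-genome 250-kb maps tractable.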

chromosomes such as 15–17 and 19–22 had a high number of intermingling 250-kb regions. In addition, as previously noted (31), we found a striking difference between chromosomes 18 and 19: although these two chromosomes are approximately equal in size, the gene-poor chromosome 18 has a low level of intermingling across most chromosomes, while the gene-rich chromosome 19 tends to intermingle more with other chromosomes.

Integration of Functional Genomic Data and Network Analysis. We obtained functional genomic data: TF ChIP-seq, histone modifications, DNase-seq, and RNA-seq data from ENCODE (32), Roadmap Epigenomics (33), and GEO databases (SI Appendix, Table S2). We used these experimental data as a regulatory profile for all 250-kb regions that lay within the intermingling domains.

Considering each selected 250-kb region as a node, a whole-genome network of chromosomal interactions was constructed as follows. Between chromosomes, the edges in the network were placed between pairs of 250-kb regions that lay within the same submatrix as identified by the LAS algorithm. Within chromosomes, edges were placed between loci that fall within the same intrachromosomal domain, as determined in ref. 28. After establishing the skeleton of the network, the edge weights were calculated as follows. Since our goal was to determine spatially coregulated regions, we weighted the edges by Spearman’s correlation between the genomic profiles of adjacent 250-kb regions. This combined approach can mitigate some of the noise associated with using Hi-C contact frequencies alone. In addition, it allows us to identify chromosome intermingling regions with coordinated activity, which might be controlled by the same set of TFs or epigenetic marks, as opposed to domains that interact in 3D by chance. A subnetwork containing six 250-kb regions from three distinct chromosomes is shown in Fig. 1C. The edge weights in this subnetwork suggest the presence of two separate clusters.

To retrieve intermingling regions that are coregulated, the weighted network of 250-kb regions was partitioned into clusters, using weighted correlation clustering (34) (see SI Appendix). This approach can for example identify regions that are brought together for transcription, since these would have high RNAPII and low repressive epigenetic marks. This approach indeed found two clusters in the subnetwork shown in Fig. 1C. The regulatory profiles of the six regions, separated into two clusters, are illustrated in Fig. 1 D and E. As a consequence of using weighted correlation clustering, the genomic features within a cluster are more similar than across clusters. Interestingly, the particular cluster in Fig. 1D is enhanced for active genomic features (we analyzed H3K9ac, H3K36me3, H3K4me3, H3K4me2, H3K4me1, RNAPII, and RNA-seq) and depleted for repressive features (we analyzed H3K27me3 and H3K9me3), while the cluster in Fig. 1E is depleted for active features. Using this method, 446 clusters (totaling 459.5 Mb; SI Appendix, Table S1) were identified (P value < 2.2 × 10⁻¹⁶ under a χ² test) that consist of at least two nodes and span multiple chromosomes (SI Appendix, Table S3 and Dataset S1). On average, 2.5 chromosomes interact within one cluster (SI Appendix, Fig. S1).

We analyzed the enrichment of regulatory marks in intermingling regions and found that these regions were most enriched for RNAPII, namely by a factor of 2.23 (Fig. 1F). We also found the active and repressive marks (e.g., H3K9ac, H3K4me3, and H3K9me3) to be enriched in intermingling clusters, which is consistent with a previous study (10).

Regulatory Features Are Predictive of Intermingling. To characterize intermingling regions as a whole and evaluate whether they are distinct from nonintermingling regions on a regulatory level, we built a classifier and determined the features that contribute the most to distinguishing between these two classes. These features may represent a mechanism to spatially cluster genes for their coregulation. We annotated 250-kb regions as intermingling or nonintermingling based on the results from our network analysis and clustering. We then performed classification based on the associated regulatory profiles (SI Appendix, Table S2). We used eXtreme gradient boosting trees with 10-fold cross-validation to train our classifier. Using all features, the classifier achieves an accuracy of 85% ± 5% and the corresponding receiver operating characteristic (ROC) curve in Fig. 2A has an area under the curve (AUC) of 0.77.

To quantify the importance of each feature by itself and in conjunction with all other features, we computed its univariate and multivariate rank based on its depth in the decision trees of the ensemble (Fig. 2B and SI Appendix, Fig. S2). The most important features determined by this analysis are lamin B1 (LMNB1), H3K9me3, H3K56ac, and H2A.Z. The importance of both repressive (H3K9me3, LMNB1) and active (H3K56ac, H2A.Z) marks ties with the observation that intermingling regions contain both active and repressed regions (35). Furthermore, previous mapping of LMNB1 in the genome revealed the presence of lamina-associated domains (LADs) that interact with the lamina on the nuclear envelope, spatially organize chromosomes by anchoring them to the lamina, and display coordinated gene repression (36–38). H3K9me3 is enriched in LADs and may facilitate gene silencing in LADs (37, 39). The context-dependent importance of this feature is in line with its low univariate, but high multivariate rank (SI Appendix, Fig. S2). H3K56ac is a known mark of transcriptionally active chromatin regions (40, 41). Finally, H2A.Z is enriched at transcription start sites (42), indicating its involvement in transcription initiation, and it appears to be a defining feature of intermingling on its own (SI Appendix, Fig. S2).

Performing stepwise feature elimination shows that ∼13 features are sufficient for achieving high AUC (Fig. 2C) and the corresponding features are annotated by asterisks in Fig. 2B.

Intermingling Clusters Are Divided into Active and Inactive Clusters. While it is interesting to evaluate intermingling regions altogether, studying these on a cluster-by-cluster level may give

BIOPHYSICS AND COMPUTATIONAL BIOLOGY

Fig. 2. Performance and feature importance for classifying intermingling regions. (A) ROC curve for eXtreme gradient boosting trees classifier that was trained on genomic features of intermingling vs. nonintermingling regions. This results in AUC of 0.77. (B) Features ranked in order of importance (relative depth of feature in the decision tree) for distinguishing intermingling domains. (C) AUC when recursively eliminating one feature at a time based on 10-fold cross-validation. Near-optimal performance is reached with 13 features, which are indicated by asterisks in B.
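The edge weighting described under Integration of Functional Genomic Data and Network Analysis (Spearman's correlation between the regulatory profiles of two linked 250-kb regions) can be sketched in a few lines. The region names and profile values below are hypothetical, and the helper names are illustrative only; this is a minimal sketch, not the code used in the paper.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def weight_edges(profiles, edges):
    """Attach a Spearman weight to each (u, v) edge of the network skeleton."""
    return {(u, v): spearman(profiles[u], profiles[v]) for u, v in edges}


# Hypothetical profiles: one value per genomic feature for each 250-kb node.
profiles = {
    "chr12:a": [5.0, 3.0, 0.5, 0.1],
    "chr17:b": [9.0, 4.0, 1.0, 0.2],
    "chr3:c": [0.1, 0.4, 2.0, 6.0],
}
edges = [("chr12:a", "chr17:b"), ("chr12:a", "chr3:c")]
weights = weight_edges(profiles, edges)
```

In this toy input the first edge links two regions whose features rise and fall together and the second links regions with opposite profiles, which is the kind of contrast that weighted correlation clustering uses to separate clusters.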

insights into the links between regulatory processes and spatial colocalization. Based on previous evidence (43) we hypothesized that active regions are clustered with other active regions and inactive regions with other inactive regions. To analyze the types of clusters we obtained, we computed the fold enrichment of each cluster for several regulatory features (SI Appendix, Table S3 and Dataset S1). We found that a high proportion of the clusters, 41.7% (186 clusters), was enriched for all active marks (RNAPII, H3K9ac, H3K4me1, H3K36me3, and H3K4me3), as shown in Fig. 3A (P value = 4.699 × 10⁻⁴ under a χ² test, SI Appendix). Notably, the majority of clusters were either enriched for all five active marks or not enriched for any active mark. The percentage of clusters enriched for the repressive/inactivating mark H3K9me3 was 38.3% (171 clusters). Interestingly, we observed a clear separation of the intermingling clusters into active and inactive, with only 4% of clusters (18 clusters) that were in both categories as shown in Fig. 3B (P value = 1.398 × 10⁻⁵ under a χ² test, SI Appendix). Active clusters were defined as clusters enriched for RNAPII (fold enrichment >1) but not for H3K9me3. Inactive clusters were defined as those enriched for H3K9me3 but not for RNAPII. Active clusters also had significantly higher gene expression (P value = 0.004 under a t test) in comparison with inactive clusters (SI Appendix, Fig. S3). In addition, high-occupancy target (HOT) regions, i.e., regions that are occupied by many TFs (44), were overrepresented in active clusters in comparison with low-occupancy target (LOT) regions, by a HOT:LOT ratio of 2.94 (SI Appendix, Table S4). These findings suggest that active clusters may be hotspots for TF binding.

Active Clusters Are Hotspots for TF Binding. We probed the active clusters for shared TFs that may be involved in colocalizing and coregulating regions in a cluster by analyzing TF binding sites (TFBS). We used the JASPAR 2016 database to obtain TFBS. These data were overlaid and then filtered using the ChIP-seq peaks from all human cell lines available from ENCODE (32) (SI Appendix). This resulted in TFBS for 52 TF motifs. We also performed an additional analysis to consider a larger set of TF motifs (386) by overlaying and filtering the JASPAR 2016 database with a robust set of CAGE peaks from ref. 45, collected across 353 human tissue samples as part of the FANTOM5 project (SI Appendix). This filtering step provided us with a list of potential transcription start sites that contain motifs for the TFs under consideration.

We compared the distributions of TFBS counts per 250-kb region for active clusters vs. the whole genome. Several factors, such as EGR1, YY1, CTCF, and the E2F family of proteins, showed a significant increase in TFBS counts under a Mann–Whitney U test (Fig. 4A). The majority of active clusters contained binding sites for TFs that are shared across regions spanning multiple chromosomes (SI Appendix, Fig. S4). For example, the cluster studied in Fig. 1D involving chromosomes 12 and 17 contains binding sites for the TFs USF1 and NRF1 on regions of both chromosomes (Fig. 4B). This cluster is formed by the colocalization between two adjacent 250-kb regions on chromosome 12 and one region on chromosome 17. Gene ontology (GO) term analysis of expressed genes (SI Appendix, Fig. S5) in this cluster revealed an enrichment for biological processes related to fibroblasts such as “cytoskeleton-dependent intracellular transport” (Fig. 4C). On the other hand, we found that inactive clusters contained a low number of TFBS (SI Appendix, Fig. S6 and Table S5), reaffirming the existence of two distinct types of cluster categories for intermingling regions.

Experimental Validation. We ranked the active clusters according to the presence of binding sites for TFs that were shared across multiple chromosomes, using a permutation test (SI Appendix). The top 15 active clusters are shown in SI Appendix, Table S6. Chromosomes 12 and 17 were consistently found together among the top highly ranked clusters and were thus chosen for experimental validation (SI Appendix, Fig. S7). We compared the amount of overlap between chromosomes 12 and 17 to a negative control that we obtained by analyzing the network of least-interacting chromosomes (SI Appendix, Fig. S8). The

Fig. 3. Classification of intermingling regions into active and inactive clusters. (A) Five-way Venn diagram representing the number of clusters enriched for each active epigenetic mark and RNAPII. Interestingly, many clusters (186 of 446) are enriched for all five active marks. (B) Venn diagram of the active clusters (the 186 clusters in the intersection of the five-way diagram in A) and clusters enriched for the silencing mark H3K9me3. Note that only 18 of the 446 clusters are both active and silenced, showing that the clusters separate into two categories of active and inactive clusters.

Fig. 4. TFBS and GO terms across active clusters. (A) Top 10 TFs with significantly overrepresented TFBS in active clusters compared with the whole-genome distribution (under a Mann–Whitney U test). (B) Matrix corresponding to a representative active cluster with the number of TFBS in each 250-kb region in the cluster. Only the TFs containing at least one nonzero entry for each column are shown. A TF shared among multiple regions in the cluster may indicate its role in colocalization and coregulation of the clustered regions. (C) Significantly enriched GO terms computed from the genes that are expressed and colocalized in the intermingling cluster shown in B [ranked by P value using DAVID, SI Appendix].
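The active/inactive labeling rule stated in the text (enriched for RNAPII with fold enrichment >1 but not for H3K9me3, and the reverse for inactive clusters) is simple enough to write down directly. In the sketch below, the definition of fold enrichment as cluster mean over genome-wide mean is an assumption, since the exact definition is given in SI Appendix; the function names and numbers are hypothetical.

```python
def fold_enrichment(cluster_values, genome_values):
    """Mean signal in the cluster divided by the genome-wide mean.

    Assumed definition for illustration; the paper's is in SI Appendix.
    """
    genome_mean = sum(genome_values) / len(genome_values)
    if genome_mean == 0:
        raise ValueError("genome-wide mean signal is zero")
    return (sum(cluster_values) / len(cluster_values)) / genome_mean


def classify_cluster(enrichment, threshold=1.0):
    """Label a cluster from its RNAPII and H3K9me3 fold enrichments.

    'active':   enriched for RNAPII but not for H3K9me3
    'inactive': enriched for H3K9me3 but not for RNAPII
    'both' / 'neither': the remaining two combinations
    """
    rnapii = enrichment["RNAPII"] > threshold
    h3k9me3 = enrichment["H3K9me3"] > threshold
    if rnapii and not h3k9me3:
        return "active"
    if h3k9me3 and not rnapii:
        return "inactive"
    return "both" if rnapii else "neither"


# Hypothetical per-region signal values for one cluster and the whole genome.
genome_rnapii = [1.0, 1.0, 2.0, 4.0]
cluster_rnapii = [4.0, 4.0]
example = {
    "RNAPII": fold_enrichment(cluster_rnapii, genome_rnapii),
    "H3K9me3": 0.6,
}
label = classify_cluster(example)
```

Keeping 'both' and 'neither' as explicit labels mirrors the observation that only a small share of clusters falls into both categories.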

chromosome territories were identified in human fibroblast (BJ) cells using DNA FISH and visualized using a laser scanning confocal microscope (Fig. 5 A–F). To obtain a representative sample of the population, we imaged at least 200 cells for each chromosome pair. We confirmed that chromosomes 12 and 17 consistently intermingle in a population of cells (Fig. 5C; SI Appendix, Fig. S9; and Movie S1), while the negative control chromosome pair does not (Fig. 5F; SI Appendix, Fig. S10; and Movie S2). To quantify our results, the intermingling degree, i.e., the amount of overlap between the two pairs of chromosome territories, was calculated as explained in SI Appendix. We found that the chromosome pair 12 and 17, which was predicted to interact, had a significantly higher intermingling degree than the negative control pair 3 and 20 (Fig. 5G, P value = 0.005 under a Welch two-sample t test). The percentage of nuclei that were intermingling (intermingling degree >0) was higher in the predicted pair of interacting chromosomes, 12 and 17, than in the negative control, 3 and 20 (SI Appendix, Fig. S11). In addition, we also calculated the enrichment of active RNAPII in the intermingling regions for the aforementioned pairs (SI Appendix). We found that the predicted chromosome pair, 12 and 17, which belongs to an active cluster, had significantly higher enrichment for active RNAPII in the intermingling regions compared with the negative control pair, 3 and 20 (Fig. 5H, P value = 7.125 × 10⁻⁵ under a Welch two-sample t test), showing that the chromosome pair 12 and 17 indeed contains an active mark at the site of intermingling.

Discussion

Understanding the spatial organization of the chromosomes within the nucleus has been a major question in cell biology. A number of studies have suggested that the packing of DNA plays a critical role in regulating genomic programs (3). Earlier experiments took advantage of chromosome painting methods and revealed that chromosomes are organized nonrandomly and in a cell type-specific manner (1, 8, 9). Analysis of gene positioning using FISH showed that coregulated genes were coclustered (11, 12). Such clusters of genes were also found to be colocalized with transcription-related machinery such as active RNAPII and TFs (11, 12). Recent developments in chromosome capture technologies further revealed that genome-wide chromosome contact maps are correlated with epigenetic marks (6, 7). The majority of studies using chromosome conformation capture focused on linking chromatin contacts with epigenetic modifications at the resolution of genes in intrachromosomal regions (6, 7). However, the coupling between the global organization of chromosomes with genome-wide epigenetic marks and the intermingling regions as an additional layer of transcriptional regulation has not been well studied.

In this paper, we developed a network analysis approach to reveal the principles of transcription-dependent chromosome intermingling by taking advantage of 3D contact maps obtained using Hi-C and 1D epigenetic marks, TF ChIP-seq, DNA accessibility, and RNA-seq. Our computational approach focuses on interchromosomal domains, since their organizational principles have been largely unknown. The proposed quantitative framework enables the prediction of chromosome intermingling regions at a genome-wide scale, thereby complementing experimental methods such as FISH that can be used to study specific clusters of interchromosomal interactions. The novelty of our method lies in leveraging 1D genomic features in combination with 3D interactions from Hi-C data. This allows us to study functionally colocalized regions: Since interactions can occur by chance in 3D, some intermingling regions may not be of biological relevance. By leveraging epigenetic marks and data from TF binding and DNA accessibility, as well as gene expression, we can determine interchromosomal regions that are colocalized and coregulated.

Our predictions reveal intriguing patterns of chromosome organization and have been validated by FISH experiments. Our findings recapitulate known principles of chromosome interactions, such as the tendency of gene-dense chromosomes to intermingle more frequently (2, 30) and the enrichment of RNAPII in intermingling regions (10), suggesting that RNAPII may play a crucial role in establishing and maintaining chromosome interactions. We observe that the clusters of interchromosomal regions fall broadly into two categories, active and inactive, where active clusters are enriched for active epigenetic marks and RNAPII and inactive clusters are enriched for H3K9me3. Interestingly, we found that active clusters are hotspots for TF binding sites, with several TFs being shared among multiple chromosomes within a cluster. These clusters contain genes with biologically relevant GO terms. We established the predictive power of our model through experimental validation. Using FISH experiments we showed that the predicted intermingling chromosomes interact consistently across a population of cells and that such intermingling regions are enriched for active RNAPII. Our quantitative analysis provides evidence that TF hotspots in active clusters are colocalized with active epigenetic modifications and with RNAPII and have a significantly higher gene expression than inactive clusters, suggesting that the relative positioning of the chromosomes in the cell nucleus is optimized to facilitate the clustering of coregulated genes, TFs, epigenetic modifications, and transcriptional machinery.

Collectively, these findings suggest that the spatial organization of the genomic material in the cell nucleus is optimized for

Fig. 5. Experimental validation. (A) Representative images of the maximum-intensity Z projections of the nucleus, active RNAPII, and chromosomes 17 and 12, from Left to Right, respectively. (B) Raw image resulting from merging the nuclear (blue) and the two chromosome channels depicting the overlap between chromosomes 17 (purple) and 12 (cyan). (C) Image in B after segmentation with nucleus (white), chromosome 17 (red), and chromosome 12 (green). Yellow regions are the overlapping or intermingling regions. (C, Right) Enlargement of the region in the dotted white boxes in C, Left. (D) Representative images of the maximum-intensity Z projections of the nucleus, active RNAPII, and chromosomes 20 and 3, from Left to Right, respectively. (E) Raw image resulting from merging the nuclear (blue) and the two chromosome channels depicting the overlap between chromosomes 20 (purple) and 3 (cyan). (F) Image in E after segmentation with nucleus (white), chromosome 20 (red), and chromosome 3 (green). (F, Right) Enlargement of the region in the dotted white boxes in F, Left. (G) Boxplot depicting intermingling degree between chromosomes 12 and 17 and chromosomes 3 and 20 (P value = 0.005 under a Welch two-sample t test). (H) Boxplot depicting the enrichment of active RNAPII between chromosomes 12 and 17 and chromosomes 3 and 20 (P value = 7.125 × 10⁻⁵ under a Welch two-sample t test). (All scale bars, 5 µm.)

13718 | www.pnas.org/cgi/doi/10.1073/pnas.1708028115 Belyaeva et al. Downloaded by guest on October 2, 2021 Downloaded by guest on October 2, 2021 6 he ,Siahna V(07 hoooeitrigig ehnclhtpt for hotspots Mechanical intermingling: Chromosome (2017) GV Shivashankar C, Uhler 16. distant between contacts regulatory Polycomb-dependent (2011) al. et F, Bantignies 13. TNFα (2012) genes al. co-regulated et A, between Papantonis associations Preferential 12. (2010) al. chromo- et of S, basis Schoenfelder physical intermingling-the 11. Chromosome (2016) al. et S, Maharana 10. 8 en ,Rps ,RgrP ora ,Mzioac 21)3 eoereconstruc- genome 3D (2014) J Mozziconacci A, Cournac P, Roger semi-definite J, with Riposo modeling A, chromosome Lesne 3D (2013) 18. WK Sung KC, Toh G, Li Z, Zhang 17. nucleome. 4D human the of organization Functional (2015) al. et H, Chen gene and 15. organization Genome factories: Transcription (2013) PR Cook A, Papantonis 14. 5 aus ,BntsnH ea R(06 icvrn ososi ucinlgenomic functional in hotspots Discovering (2016) MR Segal H, driving Bengtsson reveals D, analysis Capurso structure genome 25. 3D Population-based (2016) al. et H, Tjong 24. pretty Beyond chromosomes: Modeling (2015) LA and Mirny genomes G, of Fudenberg MV, modeling Imakaev three-dimensional 22. Restraint-based (2015) al. two- et a F, via Serra architecture genome 21. 3D the of Reconstruction inferring (2015) for HL approach Bengtsson statistical MR, A Segal (2014) 20. JP Vert WS, Noble F, Ay N, Varoquaux 19. 3 agS uJ egJ(05 neeta oeigo Dcrmtnstructure. chromatin 3D of modeling Inferential (2015) J Zeng J, Xu S, Wang 23. A loih o dniyn ihyitrcigrgos egtdcor- weighted nonintermingling and regions, intermingling the interacting into matrices, highly classification Hi-C raw clustering, identifying the relation for processing for algorithm used LAS methods the about Details Methods and Materials homeostasis. 
of reprogram- maintenance differentiation, the cell insights or as ming, gain reg- such to and processes reorganization during framework chromosome ulation quantita- useful between interplay our a the that provide into believe will We approach single-cell available. robust tive more the become as that increased data be anticipate genomic will iden- we method our to However, of information regions. power enough interacting carry highly data tify epigenetic genome-wide by and population-level showed using contact that We model methods type. our cell imaging from any single-cell predictions analyze the to gen- validating applied is experimentally be here can present and we eral framework The programs. transcription eyeae al. et Belyaeva .Bac R ob 20)Itrigigo hoooetrioisi interphase in territories chromosome of Intermingling (2006) A between Pombo link of MR, the Branco types probe to distinct methods 9. reveals experimental and data Modeling ChIP-seq (2012) al. and et KV, Hi-C Iyer of by 8. Integration identified (2012) genomes al. et mammalian X, in Lan domains Topological 7. (2012) al. communica- et chromosomal JR, of moderator Dixon as 6. genome 3D The (2016) L chromosome Mirny of J, analysis Dekker and mapping 5. Genome-wide (2016) B Ren M, of Hu organization AD, Domain Schmitt architecture: 4. Genome (2013) B Steensel Van WA, interactions Bickmore long-range of 3. mapping male Comprehensive human (2009) in al. chromosomes et E, all Lieberman-aiden of maps 2. Three-dimensional (2005) al. et A, Bolzer 1. eoeregulation. genome USA Sci Acad drosophila. in loci hox transcribed. are genes miRNA and coding sive cells. erythroid in interactome transcriptional a reveal cells. differentiated in organization some associations. transcription-dependent and 4:e138. translocations in role suggests chromosomes. of organization spatial and transcription global linkages. chromatin interactions. chromatin of analysis tion. architecture. chromosomes. interphase genome. 
human the of principles folding reveals rosettes. prometaphase and nuclei fibroblast info hoooa contacts. chromosomal from tion data. Hi-C and programming regulation. aasproe n3 hoai ofiuainreconstructions. configuration chromatin 44:2028–2035. 3D on superposed data organization. genome spatial in forces Res Acids pictures. domains. genomic algorithm. stage genome. the of structure 3D Cell ESLett FEBS 164:1110–1121. 43:e54. hmRev Chem a e o elBiol Cell Mol Rev Nat 112:8002–8007. M Bioinformatics BMC ESLett FEBS 589:3031–3036. uli cd Res Acids Nucleic rnsCl Biol Cell Trends 113:8683–8705. Cell Cell 144:214–226. 589:2987–2995. Bioinformatics optBiol Comput J 152:1270–1284. inl hog pcaie atre hr respon- where factories specialized through signals a Methods Nat Nature 17:743–755. 27:810–819. 16:373. 40:7690–7704. rcNt cdSiUSA Sci Acad Natl Proc uli cd Res Acids Nucleic 485:376–380. 20:831–846. 30:i26–i33. LSBiol PLoS MOJ EMBO 11:1141–1143. Science a Genet Nat 31:4404–4414. 3:e157. 326:289–293. 44:5148–5160. 113:E1663–E1672. LSOne PLoS 42:53–61. uli cd Res Acids Nucleic 7:e46628. LSBiol PLoS rcNatl Proc Nucleic 7 un W epciR,SemnB 20)Sseai n nertv analy- integrative and Systematic (2009) BT Sherman Ra, Lempicki DW, Huang 47. Paths tools: enrichment Bioinformatics (2009) RA Lempicki BT, Sherman DW, promoter-level A Huang (2014) (DGT) 46. CLST and PMI RIKEN the and Consortium characterisation FANTOM and The identification Genome-wide chromatin 45. (2016) W inactive Shu X, Bo and C, Ren F, active Liu H, Li of 44. organization Nuclear (2006) al. et M, Simonis 43. 2 asiA ta.(07 ihrslto rfiigo itn ehltosi h human the in methylations histone of profiling High-resolution (2007) al. et A, histone Barski of acetylation 42. CBP/p300-mediated (2009) JK Tyler of KC, cells. Hansen MS, human Identification Lucia in C, H3K56ac Das (2015) in changes 41. T cycle-dependent Cell Misteli (2015) al. et N, S, Stejskal Sciascia 40. 
G, Pegoraro TC, of expression Voss alter can S, periphery nuclear Shachar the 39. to Recruitment (2008) al. et LE, Finlan 38. 7 ulnL ta.(08 oanognzto fhmncrmsmsrvae by revealed chromosomes human of organization Domain (2008) nuclear al. of et maintenance the L, for B1 Guelen lamin of 37. role The (2015) T Ried MR, Erdos J, Camps mech- and 36. Players architecture: genome Three-dimensional (2015) N Dillon A, Pombo 35. cluster- correlation for methods comparing and Bounding (2009) W Schudy M, reference Elsner 111 the of 34. analysis in Integrative (2015) elements al. DNA et Consortium, of Epigenomics Roadmap encyclopedia 33. integrated An (2012) Consortium chromosomes ENCODE architectures of morphology 32. and Genome localization the (2012) in Differences (1999) L al. et Chen JA, Croft F, 31. Alber N, Jayathilaka H, Tjong subma- R, average large Kalhor Finding (2009) 30. AB reveals Nobel resolution CM, kilobase Perou at VJ, Weigman genome AA, human Shabalin the of 29. map 3D A (2014) al. et SSP, Rao 28. 7 ekrJ itl 21)Ln-ag hoai interactions. chromatin Long-range (2015) T Misteli J, epigenomes. Dekker 1D from 27. maps interaction 3D Constructing (2016) al. et Y, Zhu 26. rjcsAec W1N-6105) n h fc fNvlResearch Naval Research of Advanced Office Defense the the and par- (1651995), was (N00014-17-1-2147). (W911NF-16-1-0551), NSF C.U. Agency funding. the for Projects by Program Grant supported Tier-3 and tially Education Singapore, G.V.S. of of (NSF) and University Ministry M.N., National Foundation the S.V., Institute, Science 1122374. Mechanobiology the Grant National thank under the Fellowship Research and Graduate T32GM87232 Grant Training ACKNOWLEDGMENTS. 
protein-levels Chromsome-intermingling-region-indentifcation-and-characterisation-of- at available is analysis image the at available is clusters anastasiyabel/functional of analysis and identification the in provided are analysis used image settings and and imaging methods cell confocal the the for and features, protocols, FISH genomic chromosome of and enrichment culture fold of computation the domains, h oefritrhoooa ewr osrcinvaLSadfor and LAS via construction network interchromosomal for code The rnilso hoai looping. chromatin of principles Biol spect mun i flregn it sn AI iifraisresources. bioinformatics DAVID using lists gene 57. large of sis lists. gene large of analysis 37:1–13. functional comprehensive the toward atlas. expression mammalian genome. human the in regions HOT of (4C). capture-on-chip conformation 38:1348–1354. chromosome by uncovered domains genome. 56. lysine on H3 Cycle mapping. imaging high-throughput using 923. factors positioning gene cells. human in genes apn ula aiainteractions. lamina nuclear mapping function. and structure anisms. 2017. https://dl.acm.org/citation.cfm?id=1611638.1611641. 4, December at Accessed Available PA). Processing Stroudsburg, Language Natural and ming ILP. beyond ing epigenomes. human genome. human nucleus. human the in mod- population-based and eling. capture conformation chromosome tethered by revealed data. dimensional high in trices 7:10812. 14:3851–3863. a Biotechnol Nat a e o elBiol Cell Mol Rev Nat Cell 7:a019356. . 129:823–837. rceig fteNALHTWrso nItgrLna Program- Linear Integer on Workshop HLT NAACL the of Proceedings Nature PNAS Nature Nature 30:90–98. LSGenet PLoS elBiol Cell J Nucleus 459:113–117. 489:57–74. ..wsprilyspotdb I Predoctoral NIH by supported partially was A.B. | chromosome eebr2,2017 26, December 518:317–330. Nature 16:245–257. 6:8–14. n plStat Appl Ann Cell 145:1119–1131. https://github.com/SaradhaVenkatachalapathy/ 4:e1000039. 507:462–470. 159:1665–1680. 
Nature M Genomics BMC AscainfrCmuainlLinguistics, Computational for (Association h oefrperforming for code The interactions. 453:948–951. 3:985–1012. | o.114 vol. 17:733. https://github.com/ odSrn abPer- Harb Spring Cold IAppendix SI | o 52 no. a Protoc Nat uli cd Res Acids Nucleic Cell a Genet Nat a Com- Nat | 162:911– . 13719 4:44– Cell
