Downloaded by guest on September 26, 2021

Mesoscale modeling reveals formation of an epigenetically driven HOXC gene hub

Gavin D. Bascom(a), Christopher G. Myers(a), and Tamar Schlick(a,b,c,1)

(a) Department of Chemistry, New York University, New York, NY 10003; (b) Courant Institute of Mathematical Sciences, New York University, New York, NY 10012; and (c) New York University-East China Normal University Center for Computational Chemistry, New York University Shanghai, 200122 Shanghai, China

Edited by José N. Onuchic, Rice University, Houston, TX, and approved December 21, 2018 (received for review September 25, 2018)

Gene expression is orchestrated at the structural level by nucleosome positioning, histone tail acetylation, and linker histone (LH) binding. Here, we integrate available data on nucleosome positioning, nucleosome-free regions (NFRs), acetylation islands, and LH binding sites to “fold” in silico the 55-kb HOXC gene cluster and investigate the role of each feature on the gene’s folding. The gene cluster spontaneously forms a dynamic connection hub, characterized by hierarchical loops which accommodate multiple contacts simultaneously and decrease the average distance between promoters by ∼100 nm. Contact probability matrices exhibit “stripes” near promoter regions, a feature associated with transcriptional regulation. Interestingly, while LH proteins alone decrease long-range contacts and acetylation alone increases transient contacts, combined LH and acetylation produce long-range contacts. Thus, our work emphasizes how chromatin architecture is coordinated strongly by epigenetic factors and opens the way for nucleosome resolution models incorporating epigenetic modifications to understand and predict gene activity.

chromatin modeling | chromatin folding | gene structure | contact hub | chromatin loop

Significance
The precise role of epigenetic factors in controlling chromatin structure is not well understood. Here, we use publicly available data to build and “fold” a nucleosome-resolution mesoscale model of the developmentally regulated HOXC gene locus that incorporates histone tail acetylation, linker histone binding, and nucleosome occupancy. We discover a spontaneous contact hub that bridges promoters and exon/intron junctions. Our work emphasizes how epigenetic factors are coordinated to influence chromatin architecture and opens the way for nucleosome resolution modeling of epigenetically regulated genes.

Author contributions: G.D.B. and T.S. designed research; G.D.B. performed research; G.D.B. analyzed data; and G.D.B., C.G.M., and T.S. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
Published under the PNAS license.
See Commentary on page 4774.
1 To whom correspondence should be addressed. Email: [email protected].
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1816424116/-/DCSupplemental.
Published online February 4, 2019.

www.pnas.org/cgi/doi/10.1073/pnas.1816424116
PNAS | March 12, 2019 | vol. 116 | no. 11 | 4955–4962

Developmental regulation in eukaryotic organisms depends on chromatin fibers looping across large genomic distances spanning kilobases to megabases (1). Such looping depends on epigenetic factors that are tightly regulated during development and differentiation stages of the cell (2, 3). Chromatin fibers are made up of nucleosomes, where each nucleosome consists of ∼147 bp of DNA wrapped tightly around eight core histones (two copies each of H2A, H2B, H3, and H4) (4–6). Each histone has flexible N-terminal tail domains that extend away from the nucleosome into the surrounding solvent (7). Among many epigenetic modifications, acetylation of histone tails, nucleosome positioning, and linker histone (LH) binding are known to strongly affect chromatin structure.

Histone tail acetylation, the addition of acetyl groups to lysine residues within N-terminal tails, is one of many reversible, posttranslational modifications (PTMs) that modulate chromatin architecture and play key roles in regulating expression. The modified residues can participate in hydrogen bonding, promoting more stably folded secondary structures in the respective tails (9–11). Acetylation of lysine 16 on the H4 tail (H4K16ac) decreases kilobase-range contacts and is associated with linear fiber decondensation (i.e., fiber lengthening along its principal axis) (10, 12). Acetylation in living chromatin is often found in small islands that range in length from 1 to 5 kb (13).

Nucleosome positions, often measured as nucleosome repeat length (NRL), or the amount of DNA wrapped around the parent nucleosome (147 bp) plus the linker length, are nonuniform in eukaryotic fibers; in mouse embryonic stem (mES) cells, NRL follows a distribution that can be approximated by 10n + 5 (where n = 0, 1, 2, . . . , 9), with n = 4 or 5 most common (14). Regions near the 5′ end of each gene-encoding region, commonly referred to as transcription start sites (TSSs), are generally depleted of nucleosomes (also known as nucleosome-free regions, or NFRs) (15). Studies of in vitro reconstituted chromatin fibers and mesoscale modeling show that DNA linker lengths can strongly affect the structure of chromatin fibers, where short linkers (i.e., NRL 162–182 bp) are associated with stiff, straight fibers, and longer linkers are associated with more disordered globules (16, 17).

Besides tail acetylation and nonuniform nucleosome positions, the LH protein strongly affects fiber secondary structure at various concentrations, dynamically binding on or off DNA through the dyad axis formed by entering and exiting DNA (18–24), leading to significant linear compaction (i.e., fiber shortening along its principal axis) (25). Each LH contains a globular head (GH) that binds nucleosomal DNA (19), a small, uncharged N-terminal domain, and a long, positively charged and intrinsically disordered C-terminal domain (CTD) that binds to the entering or exiting linker DNA (26). Chromatin immunoprecipitation (ChIP) combined with sequencing assays (ChIP-Seq) shows that LH localizes to specific regions of chromatin fibers, often anticorrelated with specific PTMs such as acetylation and gene-encoding regions (27). LH knockout studies in mice have shown that while LH silences many genes, it is also necessary to maintain active transcription of some genes (28). LH also has roles in development and diseases (29, 30).

Changes in epigenetic features are part of the regulation of development (31), differentiation (32), and cancer progression (33). Information regarding epigenetic features as a function of genome position is publicly available for many organisms and cell lines [e.g., ENCODE (34)], but combining all these features into a unified structural model is not straightforward. Here, we integrate available data to build and “fold” a mesoscale

chromatin model of the 55-kb HOXC gene cluster and systematically investigate the effect of LH, acetylation, NFRs, and DNA linker length distributions on chromatin folding through reference systems that contain different subsets of the gene cluster’s epigenetic features. Our model of the HOXC gene locus includes five genes (containing 11 exons), 284 nucleosomes, five acetylation islands, and 55 kb of DNA.

We find that epigenetically modified chromatin (with LH, acetylation, and NFRs) in this context cooperates to compact a gene hub, forming a series of densely packaged hierarchical loops. This increased density is associated with a dynamic kilobase-range contact hub that bridges promoters within the locus, specifically bringing into contact an LH-rich and acetylation-rich region, as well as “stripe” regions in the nucleosome contact maps, indicating sampling around promoter regions. Our results indicate that nucleosome resolution models that explicitly incorporate epigenetic features derived from experimental data can explain and help predict the genetic regulation of various gene loci as a function of epigenetic modifications.

Results
In addition to the HOXC gene model shown in Fig. 1 B and C, which contains life-like linker length distribution with LH, acetylation islands, and NFRs, we define five other reference systems that contain a subset of the HOXC gene cluster’s factors: Uniform, Uniform+NFR, Life-Like, Life-Like+Ac, and Life-Like+LH (Materials and Methods and SI Appendix). The Uniform fiber has uniform linker lengths without NFRs. The Uniform+NFR fiber has uniform linker lengths and NFRs. Life-Like fibers have a distribution of linker lengths modeled after living systems. We use this term as in prior work (35) to distinguish these from fibers that have uniform or heterogeneous nucleosome positions. Life-Like+Ac fibers have the same life-like linker length distribution plus acetylation islands with acetylated histone tails. Finally, Life-Like+LH fibers contain a life-like linker length distribution with LH placed in intergenic regions.

Contact Probability Matrices Show Increased Contact Densities in HOXC Fibers. In Fig. 2, we show the 2D contact probability matrix for the HOXC system compared with Life-Like+Ac and Life-Like reference systems. HOXC contact maps show a significant increase in contact density and coverage, with two discernible contact domains: LH-rich (top left of contact map) and acetylation-rich region (bottom right of contact map). The latter region is more dense than the former. In fact, stripes or lines running horizontally or vertically, originating near NFRs, demarcate these two subdomains (also see SI Appendix, Fig. S1). Two stripes originating at the HOXC8 promoter indicate contact between this promoter and all other portions of the fiber. This is also evident in a fiber image shown in Fig. 2. The Life-Like+Ac (Fig. 2B) and Life-Like (Fig. 2C) reference fibers show homogeneously distributed contacts where acetylation leads to increased contacts of low intensity, but do not form stripes, nor do they show the separation of two domains. Life-Like fibers show the least amount of long-range contacts, with a uniformly distributed contact map.

The epigenetic features also emerge from a close-up of the 30- to 55-kb region (acetylation-rich) (SI Appendix, Fig. S2). This subregion shows increased contacts bridging HOXC5 and HOXC6 genes within the acetylated region. Contact matrices for

[Fig. 1 image. (A) Repeating Unit of the Mesoscale Model: DNA (9 bp/bead); wild-type and acetylated H2A, H2B, H3, and H4 tails (5 aa/bead); NCP (300 beads); linker histone (GH, 6 beads; CTD, 22 beads). (B) HOXC Model: HOXC10, HOXC9, HOXC8, HOXC6, and HOXC5, with exon, intron, and intergene DNA, LH, and NFRs labeled. (C) Annotated HOXC Model: linker length (bp) vs. nucleosome index, with linker histone, nucleosome-free region, and acetylation island tracks.]

Fig. 1. Mesoscale model and HOXC system setup. (A) The repeating unit of the mesoscale model consists of a rigid nucleosome core with wrapped DNA represented by 300 point charges and beads for linker DNA, LH, and the flexible histone tails. See details in ref. 8. NCP, nucleosome core particle. (B) A rendering of the HOXC gene cluster model demonstrates gene positions and epigenetic elements in a 3D model. Acetylated tails are red, and WT tails are blue. Intron DNA is drawn in dark blue, exon DNA is colored light blue, and intergenic DNA is dark red. (C) DNA linker lengths used in the HOXC model are plotted against the nucleosome index, along with annotated gene locations and epigenetic features, with acetylation islands in red, LH in teal, and NFRs in green.
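A linker-length profile like the one in Fig. 1C can be illustrated with a short sketch. It assumes that the 10n + 5 pattern cited for mES cells applies to the linker length in bp, and it uses made-up weights that merely favor n = 4 and n = 5; neither the weights nor the function names come from the paper, and the actual model places nucleosomes to match chemical-mapping data.

```python
import random

CORE_BP = 147  # DNA wrapped around one nucleosome core (from the text)


def linker_family(n_max=9):
    """Linker lengths of the form 10n + 5 bp for n = 0..n_max."""
    return [10 * n + 5 for n in range(n_max + 1)]


def sample_linkers(count, weights=None, seed=0):
    """Draw `count` linker lengths (bp) from the 10n + 5 family.

    The default weights are HYPOTHETICAL: they only favor n = 4 and
    n = 5 and are not the measured chemical-mapping distribution.
    """
    lengths = linker_family()
    if weights is None:
        weights = [1, 1, 2, 3, 6, 6, 3, 2, 1, 1]
    rng = random.Random(seed)
    return rng.choices(lengths, weights=weights, k=count)


def total_dna_bp(linkers):
    """Total DNA, counting one 147-bp core plus one linker per nucleosome."""
    return sum(CORE_BP + linker for linker in linkers)


# 284 nucleosomes, as in the HOXC model described in the text.
linkers = sample_linkers(284)
print(total_dna_bp(linkers), "bp of DNA in this illustrative fiber")
```

With weights centered on 45 and 55 bp, the total for 284 nucleosomes lands in the same tens-of-kilobases range as the 55-kb locus, which is the only point of the exercise.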

4956 | www.pnas.org/cgi/doi/10.1073/pnas.1816424116 Bascom et al. Downloaded by guest on September 26, 2021 Downloaded by guest on September 26, 2021 acme al. et Bascom prob- interaction total the analyzed we illustrate motifs, and folding various contacts the kilobase-range Bridges forming that in Hub acetylation and Dynamic a Form Promoters. to those Gene Cooperate of LH and volume Acetylation largest the produce 3). fibers (Fig. studied intra- These seg- mES in and fiber (38). observed those “clutches” stiff cells to produce similar nucleosome interactions bp), forming short-range clutch 173 NFRs, = near NRL ments (where linkers short than signifi- ume not that does with ules compared LH when but volume fibers. overall looping, the LH/NFR affect cantly and DNA regions of rich amount hier- the Life-Like solvent. extensive increase to with NFRs exposed fibers where looping, condensed archical similarly display fibers (8). looping hierarchical extensive (S (R ficients gyration of radii volumes, average life-like LH. of or effects NFRs, the acetylation, can included placement, not loops nucleosome had these we like that (8), space capture experiments showed conformation (3C) in we chromatin stack in While observed loops lead- 37). contacts reproduce where 36, loops, structure (8, hierarchical 3D flaking into rope compact fold a to to tend ing metaphase at Hierarchi- fibers Extensive with Globules Looping. Dense cal Forms System Gene HOXC 2. Fig. diinlrfrnessesaesonin S5 shown and are systems reference additional (C marks. acetylation and NFRs, positions, nucleosome NFRs. life-like and with system Reference (B) system. HOXC A

i.3sosfie edrnso l ytm tde,with studied, systems all of renderings fiber shows 3 Fig. HOXC5 HOXC6 HOXC8 HOXC9 HOXC10 Realistic NRLs, OC0HX9HX8HX6HOXC5 HOXC6 HOXC8 HOXC9 HOXC10 0 1020304050 . ie edrnsadcnatpoaiiymtie for matrices probability contact and renderings Fiber Life-Like+Ac Life-Like w br,wt oeicesdlna opcini LH- in compaction linear increased some with fibers, HOXC ,20 rvosy ehv hw htcmatchromatin compact that shown have we Previously, ). HOXC odmntaetecoeaientr fLH of nature cooperative the demonstrate To HOXC fibers. NFRs br,btocp infiatylre vol- larger significantly a occupy but fibers, kb Life-Like+LH br omsihl oecmatglob- compact more slightly form fibers , br omtems es lblswith globules dense most the form fibers Ac Life-Like , LH Acetylated Tails ; 50Samples Intergenic DNA Uniform br,wihcontain which fibers, g 2 br eaesmlryto similarly behave fibers ,adsdmnaincoef- sedimentation and ), is S4 Figs. Appendix, SI BC and Realistic NRLs, Uniform+NFR 0 1020304050 HOXC Life-Like+Ac Life-Like Wild Type Tails ∼20% n eeec br.Bakbr tpadsds hwtepstoso OCgns (A) genes. HOXC of positions the show sides) and (top bars Black fibers. reference and Exon DNA NFRs kb , Ac ainilnsaefudimdaeyo ihrsd fteLH. the of side either on immediately acety- found inter- where are the genes, islands HOXC6 in is lation and particularly HOXC8 formed between regions, region type LH genic and loop acetylation second-strongest between between loops The interactions forming epigenet- regions. regions, strongest LH-dense Among intergenic the and hub. LH however, the between are elements, within contacts modified of ically majority regions the acetylation acetylation/acetylation make up (WT) and (LH/LH), elements unmodified involving (Ac/LH), regions Interactions acety- (Ac/Ac). regions and dis- LH LH NFRs (NFR/Ac), six and (NFR/LH), 5A regions regions of Fig. LH in position. 
lation count and observed genomic count NFRs contact of contact between function the normalized a the plotted as shows types we contact maps, tinct contact in to Tendencies Aggregation Fiber. the Distinct Imparts Feature Epigenetic Each 1- the in contacts of amount range. least 20-kb the to show gray) acetylation; or loops. fibers. reference hierarchical the or for with elevated loops compared clearly large are from contacts arise All range 20-kb the in to loop- interactions local 10- EMANIC and from interactions, arise clutch by range near-neighbor 10-kb from and cells to ing arise 5- the living range in contacts 5-kb in Next, near- (36). to observed Several 1- (40). interactions, the data intraclutch in well Micro-C peaks via as nucleosome chromatin (39), neighbor yeast radiation situ in ionizing as with cells inter- interactions human short-range of nucleosomal in probing genome-wide situ and for (36) in capture (EMANIC) interaction evidence observed nucleosome microscopy-assisted Experimental been electron via folding. largest has The morphology fiber 4. zigzag zigzag Fig. of in at position nucleosomes indicative next-neighboring genome to of corresponds function peak a as ability ; 50Samples ofrhrdsigihtentr fcnat observed contacts of nature the distinguish further To Intron DNA Linker Histone eeec ytmwt ielk ulooepositions nucleosome life-like with system Reference ) PNAS 0 1020304050 Realistic NRLs, | ac 2 2019 12, March Life-Like Life-Like HOXC NFRs kb | ; 50Samples o.116 vol. br gencurve) (green fibers br wt oLH no (with fibers | HOXC o 11 no. 5 bp, ∼250 | fibers 10 10 10 10 10 0 4957 1 -4 -3 -2 -1 -5

[Fig. 3 image. (A) Fiber renderings of the HOXC, Uniform, Uniform+NFR, Life-Like+Ac, Life-Like+LH, and Life-Like systems (scale bar, 375 nm). (B) Bar plots of average volume (nm³), average Rg² (nm), and average Sw,20 (S) for the HOXC, LL+Ac, LL+LH, LL, Uniform, and Un+NFR systems.]

Fig. 3. Epigenetic factors control fiber morphology and density. (A) Fiber renderings of HOXC, Uniform, Uniform+NFR, Life-Like+Ac, Life-Like+LH, and Life-Like systems. (B) The volume, Rg², and Sw,20 measurements as averaged across all 50 trajectories.
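As a rough illustration of one quantity reported in Fig. 3B, the sketch below computes a squared radius of gyration for a set of bead coordinates and averages it over several configurations. The unweighted (equal-mass) definition, the coordinate format, and the function names are assumptions for illustration; the paper's own calculation is described in its SI Appendix.

```python
def radius_of_gyration_sq(points):
    """Squared radius of gyration of equally weighted 3D points.

    `points` is a list of (x, y, z) tuples, e.g. nucleosome core
    centers in nm. Equal weighting is an assumption of this sketch.
    """
    n = len(points)
    # Geometric center of the configuration.
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    cz = sum(p[2] for p in points) / n
    # Mean squared distance of the points from that center.
    return sum(
        (p[0] - cx) ** 2 + (p[1] - cy) ** 2 + (p[2] - cz) ** 2
        for p in points
    ) / n


def mean_rg_sq(configurations):
    """Average the squared radius of gyration over many configurations."""
    values = [radius_of_gyration_sq(c) for c in configurations]
    return sum(values) / len(values)


# Two toy "trajectories", each reduced to a single two-bead configuration.
toy = [[(0, 0, 0), (2, 0, 0)], [(0, 0, 0), (4, 0, 0)]]
print(mean_rg_sq(toy))
```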

There are also moderate interactions between acetylation regions and one NFR in the region between HOXC9 and HOXC10, where an LH-dense region is immediately surrounded by NFRs. Finally, acetylation islands form weakly interacting regions near most of the TSSs. Regions with high amounts of acetylation and nucleosome depletion show some weak Ac/NFR interactions, particularly along the HOXC6 exon region. NFRs interacting with NFRs define the weakest form of epigenetic interaction.

The Contact Hub Decreases Average Distance Between Promoters. To characterize the nature of connections between gene pairs, we calculated the Cartesian distance between pairs of +1 nucleosome cores flanking TSSs between genes in the system. Fig. 5B shows the average pairwise distance counts between all promoter pairs for HOXC fibers, Life-Like fibers, and Life-Like+Ac fibers, with a rendering of the DNA and promoter regions for each system. The average distance between gene promoters decreased from ∼150 to ∼50 nm, indicating the increased proximity between these sites and the formation of a contact hub.

Hierarchical Looping Allows for Many Dynamic Connections Within the Hub. Finally, we show in Fig. 6 that the strongly interacting regions of the connection hub within HOXC fibers are dynamic, where hierarchical looping leads to the formation of multiple contacts simultaneously. Fig. 6A shows an example where an individual folded HOXC gene fiber forms four simultaneous contacts, involving intragene, contiguous-gene, and noncontiguous gene contacts. The corresponding contact matrix is annotated by contact locations. Note how this folding motif allows for nonexclusive contacts between neighboring pairs, next-neighbor pairs, or nonadjacent pairs of genes without tangling the overall fiber.

These dynamic connections can also be seen from Fig. 6B, which shows the distance between gene-promoter pairs for HOXC10/HOXC5, HOXC10/HOXC9, HOXC9/HOXC8, and HOXC9/HOXC5 genes as a function of simulation step, with three accompanying fiber snapshots. All four gene pairs, which represent both neighboring gene segments and gene pairs on the opposite side of the gene neighborhood, transiently form contacts where their distance is close to 50 nm, but no single pair forms a contact that persists throughout the simulation.

Discussion
It is well known that epigenetic features lead to distinct chromatin morphologies both in vitro (41) and in vivo (42), suggesting that active, euchromatin states may spontaneously segregate from inactive, heterochromatin states when modified. Our mesoscale model allows for nucleosome resolution study of the folding and formation of one such activation hub, providing unique insight into the mechanics of gene regulation.

Nucleosome Resolution Epigenetic Models Are Now Possible. Previously, we have shown that chromatin fibers spanning 10 kb can form hierarchical loops while maintaining zigzag topology, in good agreement with EMANIC cross-linking data of interphase and metaphase fibers in HeLa cells (36). We also showed that this effect can accommodate looping at the 80-kb scale found by 3C experiments at developmentally regulated genes such as the GATA4 gene locus (8). Although sophisticated models have been developed that use epigenetic profiles to predict global chromatin-folding features (43), computational models that represent epigenetic factors such as acetylation and LH explicitly are not available, to our knowledge. Our results indicate that epigenetic factors affect the kilobase-scale chromatin folding through modulating the formation and dynamics of hierarchical loops, providing a structural interpretation of the role of epigenetic

4958 | www.pnas.org/cgi/doi/10.1073/pnas.1816424116 Bascom et al. Downloaded by guest on September 26, 2021 Downloaded by guest on September 26, 2021 oesmlaeu utlo otcsi igestructures single al. et in Bascom contacts observed we multiloop While scales. apparent simultaneous time become long only some or HOXC6 features ensembles these and aggregate However, in HOXC8, HOXC10 2). HOXC9, of (Fig. downstream with genes just contacts genes. region other strong NFR-rich all forming an with see contact also into We promoter HOXC8 the to brings 10- experimen- the in observed with contacts those of globules range. coverage 20-kb to dynamic increasing into similar significantly fiber tally, densities the and condense volumes fea- to these serve Together, tails. effectively acetylated tures acetylated less folded neighboring is strongly polyelectrolyte by NFRs DNA screened compact linker attracting long to the by where at acting regions, leading polymer contacts acetylation, total stiffness, transient with acety- the with this cooperates tail structure reduces LH Histone stochastic distances. fiber. linkers more dynamic short a less to such con- a kilobase-range near to decreasing lation leading cost, but a and clutches, at tacts individual sepa- comes stiffening stiffening to length by this serve exam- linker clutches further neighboring Life-like fiber For positions) rate sequence. nucleosome separates scales. (or genomic regions genomic distributions the specific all along at can- at clutches depletion that generalized nucleosome properties be ple, folding easily complex not imparts Hub feature Connection Promoters. genetic Dynamic a Local Form Bridging to Cooperate Features Epigenetic contacts looping Such show genes. our (45) in HOXC6 evident al. and are et HOXC9, Liang HOXC10, by among developed Linker- Bridge (BL-Hi-C), conformation variant, Hi-C Hi-C high-resolution a circular and with (4C), (44), ChIP capture data Additionally, tags morphology. 
contact paired-end loops contacts. fiber Chromatin 20-kb interactions. on to intraclutch 10- features from for arise account range loops 2-kb hierarchical to and 1- contacts, 10-kb the to in 5- Contacts for fibers. account zigzag clutches in neighboring common between interactions next-neighbor measure kb) (>1 4. Fig. trcinbtenteL-adaeyainrc regions acetylation-rich and LH- the between Attraction 5x10 5x10 5x10 5x10 5x10 nlsso nencesm neato otcsfor contacts interaction internucleosome of Analysis -5 -4 -3 -2 -1 HOXCLife-Like+AcLife-Like HOXC fibers. u eut ugs htec epi- each that suggest results Our i±2 Zigzag >1kb HOXC n eeec ytm.W noaesrcua ek sflos hr-ag contacts Short-range follows. as peaks structural annotate We systems. reference and i±3 hmclmdlcnfl ee epn oitrrtbiological interpret to physical– helping gene, a Here, a in system. fold based living can modeled model the features chemical epigenetic in to that variables important show physical we is pat- basic these from it Hi-C produce mod- terns that implicitly, in these models assumption seen nucleosome-resolution of develop this majority as the incorporate compartments, as els inactive function However, separa- from (47–49). a spontaneous experiments active as the states) involves of explaining gas-like tion structure transcription, to chromatin elevated liquid of for (i.e., model transitions phase attractive current A Conclusion long- extrusion), similarly loop play (and likely roles. factors, complex CTCF transcription as and factors RNAs, such other noncoding course, here, positioned Of modeled precisely genome. not to the probabil- opposed along aggregate elements as of boundary samples, function a dynamic as mechanism across arise a ity only such indicate, and from data dynamic resulting are Our boundaries the chromatin. that unmodified however, sep- nearby spontaneously it chromatin from Thus, modified arates hub. 
heavily connection that the plausible show with is interact HOXC assays the not of CHiP-Seq does downstream which region results. genes, the our in sparse by is acetylation supported that thus is lators additionally observed expression This gene (46). of hub. eukaryotes nature in the stochastic the while within explain diffusion hubs helps sampling between efficient help connections local and may strong maintaining loading forming maps by facilitates contact proteins, cell the of in the patterns how stripe explain Thus, regions 5A). other of and (Fig. effects promoters cumulative between the forming reflecting contacts transient level, promoters ensemble multiple the bridge on to arises propensity increased the 6A), (Fig. i±4 h yohssta pgntcmrscnata onayinsu- boundary as act can marks epigenetic that hypothesis The i±4,5,6 intra-clutch contacts (1-3kb) PNAS | ac 2 2019 12, March local loops (3-10kb) | o.116 vol. | o 11 no. (10-15kb) h-loops | 4959

[Fig. 5 image. (A) Total Interaction Count by Interaction Type (LH/LH, LH/Ac, Ac/Ac, LH/NFR, Ac/NFR, WT): normalized contact count vs. genomic position (0 to 50 kb), with the HOXC10, HOXC9, HOXC8, HOXC6, and HOXC5 gene positions marked. (B) Average TSS Pairwise Distances: count vs. distance (nm, 0 to 400) for the HOXC, Life-Like+Ac, and Life-Like systems, with fiber renderings showing the HOXC10, HOXC9, HOXC8, HOXC6, and HOXC5 promoters.]

Fig. 5. Formation of dynamic contact hub depends on epigenetic features. (A) Number of contacts observed in HOXC fiber system between LH/LH (teal), LH/Ac (pink), Ac/Ac (red), LH/NFR (blue), Ac/NFR (green), and any interaction that involves unmodified regions (WT; gray) as a function of genomic position. (B, Left) Plot of average pairwise distance count between gene promoters in the HOXC system vs. two reference systems. (B, Right) Renderings of chromatin configurations with promoter contacts (green/yellow highlight).
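The pairwise promoter distances summarized in Fig. 5B can be sketched as follows. This is a minimal illustration, assuming each promoter is reduced to a single 3D coordinate in nm (for example, the +1 nucleosome core); the coordinates, bin width, and function names are hypothetical and are not taken from the paper's analysis code.

```python
import math
from itertools import combinations


def pairwise_promoter_distances(promoters):
    """Cartesian distance (nm) for every unordered pair of promoters.

    `promoters` maps a gene name to an (x, y, z) coordinate in nm.
    """
    return {
        (a, b): math.dist(promoters[a], promoters[b])
        for a, b in combinations(sorted(promoters), 2)
    }


def mean_distance(distances):
    """Average of all pairwise distances."""
    return sum(distances.values()) / len(distances)


def distance_histogram(distances, bin_width=50.0):
    """Count pairs per distance bin; bin k covers [k*w, (k+1)*w)."""
    counts = {}
    for d in distances.values():
        k = int(d // bin_width)
        counts[k] = counts.get(k, 0) + 1
    return counts


# Toy coordinates, chosen only so the distances are easy to check by hand.
toy = {"A": (0, 0, 0), "B": (3, 4, 0), "C": (0, 0, 12)}
d = pairwise_promoter_distances(toy)
print(mean_distance(d), distance_histogram(d, bin_width=10.0))
```

In practice one would accumulate the histogram over all sampled configurations of a system before comparing systems, which is the sense in which Fig. 5B reports counts.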

function of the HOXC gene locus. Specifically, LH, histone tail acetylation, and nucleosome placement cooperate to control the dynamic folding landscape of chromatin. In particular, that LH and acetylation reverse the trend that each factor imparts alone to promote and target kilobase-range contacts is both surprising and indicative of the complex orchestration of many factors on the structural and biophysical levels of the genome.

While our HOXC fibers provide a model system for studying the structure of genes in mammals, many more structural epigenetic features need to be investigated. In particular, the DNA CpG methylation, binding, long noncoding RNAs, and CTCF factors missing from our current model likely play important roles in modulating structure. In addition, because epigenetic genomic data based on DNA sequencing generally represent ensembles which are averaged across heterogeneous cell populations, and some nucleosome positions can have poor resolution depending on the methodology and ensemble averaging involved (50), more work on proper interpretation of such data for incorporation into structural models is needed. Both improvements in modeling and experiments and in their joint interpretation will be important for moving toward higher-resolution views of genes and genome architecture. Nonetheless, our current mesoscale approach represents a link between structural experiments and the functions of genes in living tissues, while retaining predictive power of the underlying models. As we have shown here, incorporating publicly available epigenetic profiles into in silico structural models makes it feasible now to fold into 3D and investigate the structure and function of various genes in model organisms or cell lines and, ultimately, to obtain a precise description of the dynamic 3D topology of the genomic code.

Materials and Methods
Mesoscale Model. Our mesoscale chromatin model (16, 51, 52) is composed of four distinct components: a modified worm-like chain polymer to represent linker DNA as beads, partially charged beads representing the electrostatically charged surface of the nucleosome determined by our Discrete Surface Charge Optimization algorithm (53), coarse-grained beads for histone tails, and coarse-grained beads for LH residues (see ref. 8 and SI Appendix for full details of the model and parameters).

Building the HOXC Gene Locus. To obtain realistic nucleosome positions, acetylation island positions, and LH distributions, we used various experimental data to model each parameter and produce the model of nucleosome positions, acetylation islands, and LH binding regions shown in Fig. 1 B and C. The University of California, Santa Cruz (UCSC) genome browser was used to annotate gene locations and acetylation peak locations (by H3K27ac ChIP-Seq) (54). NFRs were modeled based on micrococcal nuclease sequencing (MNase-Seq) data of H9 embryonic stem cells from Yazdi et al. (55), downloaded from the Gene Expression Omnibus (accession code GSM1194221) (56) and visualized by using the Integrated Genome Browser (57). Using these data, we identified the locations of 17 NFRs, which we classified as being either “long” or “short.” The former group has 4 NFRs with 162-bp linkers, and the latter has 13 NFRs with 108-bp linkers. To model realistic nucleosome positions, we used the distribution of linker lengths from chemical mapping in mES cells (14), where we placed nucleosomes such that the overall distribution observed by chemical mapping matched that of our model (Fig. 1C). LHs were placed to emulate trends seen in mES cells (27), where they are found in intergenic and nonacetylated regions only.

Simulation Parameters and Data Analysis. In addition to the HOXC gene model shown in Fig. 1 B and C, we defined five other reference systems of ∼55 kb in length that we called Uniform, Uniform+NFR, Life-Like, Life-Like+LH, and Life-Like+Ac (SI Appendix). These systems were designed to test for the effect of each parameter in comparison with the real system.
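The logic of the reference systems, each carrying a subset of the HOXC model's features so that pairwise comparisons isolate one factor, can be written down as a small table of flags. The sketch below is an illustrative encoding, not the authors' configuration format: the class and field names are hypothetical, and the NFR flags for the Life-Like variants follow the Fig. 2 panel descriptions (life-like positions plus NFRs), with the Life-Like+LH entry assumed by analogy.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class FiberSystem:
    """Feature flags for one simulated fiber (hypothetical encoding)."""
    name: str
    life_like_linkers: bool  # False means uniform linker lengths
    nfrs: bool               # nucleosome-free regions present
    acetylation: bool        # acetylation islands present
    linker_histone: bool     # LH placed in intergenic regions


SYSTEMS = {
    s.name: s
    for s in [
        FiberSystem("Uniform", False, False, False, False),
        FiberSystem("Uniform+NFR", False, True, False, False),
        FiberSystem("Life-Like", True, True, False, False),
        FiberSystem("Life-Like+Ac", True, True, True, False),
        # NFR flag assumed by analogy with the other Life-Like systems.
        FiberSystem("Life-Like+LH", True, True, False, True),
        FiberSystem("HOXC", True, True, True, True),
    ]
}


def differing_features(a, b):
    """Names of the feature flags on which two systems differ."""
    return [
        f.name
        for f in fields(FiberSystem)
        if f.name != "name" and getattr(a, f.name) != getattr(b, f.name)
    ]


# Which single factor separates the full model from each comparison system?
print(differing_features(SYSTEMS["HOXC"], SYSTEMS["Life-Like+Ac"]))
print(differing_features(SYSTEMS["HOXC"], SYSTEMS["Life-Like+LH"]))
```

Encoding the systems this way makes it easy to check that a given comparison changes exactly one factor before attributing a structural difference to it.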

The Uniform fiber system has a uniform linker length of 53 bp without NFRs, whereas the Uniform+NFR fiber system has a uniform linker length of 53 bp interrupted by NFRs, modeled as either a 108- or 162-bp linker length. Life-Like fibers have a distribution of linker lengths (SI Appendix) modeled after living mammalian genome nucleosome positions by chemical mapping (14). The positions of these NFRs were placed as above, and the percentages of 108-bp vs. 162-bp NFRs were placed as above. Life-Like+Ac fibers contain a life-like linker length distribution with NFRs, with acetylation peaks placed according to data taken from the UCSC genome browser (58). Life-Like+LH fibers have a life-like linker length distribution with LH placed in intergenic regions. The positions of NFRs, acetylation, and LH for all reference systems are given in SI Appendix. An idealized zigzag starting structure was used for all simulations. See details in SI Appendix.

Our LH model was based on the rat H1d LH structure predicted by Bharath et al. through fold recognition and molecular modeling (59). LH was rigidly fixed and placed on the dyad axis separated by a distance of 2.6 nm. We showed previously that, despite LH positioning in the dyad axis, the asymmetric structure of the GH and nucleosome interaction leads to an asymmetric chromatosome organization (60).

The Uniform, Uniform+NFR, and Life-Like+LH systems were simulated with 10 independent trajectories, and the other systems were simulated with 50 independent trajectories. All systems were simulated for 60 million Monte Carlo steps or more. Contact maps were normalized across each trajectory for each system and summed for the final contact map. Volume measurements were calculated from a convex bounding surface enclosing all elements of the fiber. A sample surface is shown in SI Appendix, Fig. S3. For a detailed description of calculating fiber volume, radius of gyration (Rg), and sedimentation coefficient (S20,w), see SI Appendix.

[Figure 6. Panel A, "Multiple Gene Contacts in a Single Fiber": folded fiber with genes HOXC5, HOXC6, HOXC8, HOXC9, and HOXC10 marked, and a contact probability matrix with axes in kb. Panel B, "Pairwise Distances Between Selected Gene Promoters": distance (nm) vs. MC step (M) for the pairs HOXC10/HOXC5, HOXC10/HOXC9, HOXC9/HOXC8, and HOXC9/HOXC5.]

Fig. 6. Multiple dynamic contacts form within the hub. (A) Hierarchical looping accommodates four simultaneous loops in an individual fiber. One folded HOXC gene cluster fiber is shown with a sketch demonstrating the fiber folding and the genes and fiber-folding pattern and the corresponding contact probability matrix. (B) Cartesian distances between the promoter regions of four gene pairs are plotted as a function of simulation length in a single trajectory. Fiber snapshots, with the promoters highlighted in green and yellow, demonstrate the formation of transient connections between promoters. MC, Monte Carlo.

ACKNOWLEDGMENTS. This work was supported by National Institutes of Health, National Institute of General Medical Sciences Awards R01-GM055264 and R35-GM122562; and Phillip-Morris USA and Phillip-Morris International (T.S.). Computing was performed on the New York University high-performance computing cluster Prince and the private cluster Schulten.

PNAS | March 12, 2019 | vol. 116 | no. 11 | 4961

1. Vernimmen D, De Gobbi M, Sloane-Stanley JA, Wood WG, Higgs DR (2007) Long-range chromosomal interactions regulate the timing of the transition between poised and active gene expression. EMBO J 26:2041–2051.
2. Heintzman ND, et al. (2007) Distinct and predictive chromatin signatures of transcriptional promoters and enhancers in the human genome. Nat Genet 39:311–318.
3. Ong C-T, Corces VG (2011) Enhancer function: New insights into the regulation of tissue-specific gene expression. Nat Rev Genet 12:283–293.
4. Finch JT, et al. (1977) Structure of nucleosome core particles of chromatin. Nature 269:29–36.
5. Kornberg RD, Lorch Y (1999) Twenty-five years of the nucleosome, fundamental particle of the eukaryote chromosome. Cell 98:285–294.
6. Davey CA, Sargent DF, Luger K, Maeder AW, Richmond TJ (2002) Solvent mediated interactions in the structure of the nucleosome core particle at 1.9 Å resolution. J Mol Biol 319:1097–1113.
7. Luger K, Richmond TJ (1998) The histone tails of the nucleosome. Curr Opin Genet Dev 8:140–146.
8. Bascom GD, Sanbonmatsu KY, Schlick T (2016) Mesoscale modeling reveals hierarchical looping of chromatin fibers near gene regulatory elements. J Phys Chem B 120:8642–8653.
9. Potoyan DA, Papoian GA (2011) Energy landscape analyses of disordered histone tails reveal special organization of their conformational dynamics. J Am Chem Soc 133:7405–7415.
10. Collepardo-Guevara R, et al. (2015) Chromatin unfolding by epigenetic modifications explained by dramatic impairment of internucleosome interactions: A multiscale computational study. J Am Chem Soc 137:10205–10215.
11. Chen Q, Yang R, Korolev N, Liu CF, Nordenskiöld L (2017) Regulation of nucleosome stacking and chromatin compaction by the histone H4 N-terminal tail–H2A acidic patch interaction. J Mol Biol 14:2075–2096.
12. Zhang R, Erler J, Langowski J (2017) Histone acetylation regulates chromatin accessibility: Role of H4K16 in inter-nucleosome interaction. Biophys J 112:450–459.
13. Roh T, Wei G, Farrell CM, Zhao K (2007) Genome-wide prediction of conserved and nonconserved enhancers by histone acetylation patterns. Genome Res 17:74–81.
14. Voong LN, et al. (2016) Insights into nucleosome organization in mouse embryonic stem cells through chemical mapping. Cell 167:1555–1570.e15.
15. Lu B, Charvin G, Siggia ED, Cross FR (2010) Nucleosome-depleted regions in cell-cycle-regulated promoters ensure reliable gene expression in every cell cycle. Dev Cell 18:544–555.
16. Perišić O, Collepardo-Guevara R, Schlick T (2010) Modeling studies of chromatin fiber structure as a function of DNA linker length. J Mol Biol 403:777–802.

17. Correll SJ, Schubert MH, Grigoryev SA (2012) Short nucleosome repeats impose rotational modulations on chromatin fibre folding. EMBO J 31:2416–2426.
18. Thoma F, Koller T, Klug A (1979) Involvement of histone H1 in the organization of the nucleosome and of the salt-dependent superstructures of chromatin. J Cell Biol 83:403–427.
19. Zhou Y-B, Gerchman SE, Ramakrishnan V, Travers A, Muyldermans S (1998) Position and orientation of the globular domain of linker histone H5 on the nucleosome. Nature 395:402–405.
20. Thomas JO (1999) Histone H1: Location and role. Curr Opin Cell Biol 11:312–317.
21. Fan Y, et al. (2005) Histone H1 depletion in mammals alters global chromatin structure but causes specific changes in gene regulation. Cell 123:1199–1212.
22. Woodcock CL, Skoultchi AI, Fan Y (2006) Role of linker histone in chromatin structure and function: H1 stoichiometry and nucleosome repeat length. Chromosome Res 14:17–25.
23. Fyodorov DV, Zhou B-R, Skoultchi AI, Bai Y (2018) Emerging roles of linker histones in regulating chromatin structure and function. Nat Rev Mol Cell Biol 19:192–206.
24. Ali Öztürk M, Cojocaru V, Wade RC (2018) Toward an ensemble view of chromatosome structure: A paradigm shift from one to many. Structure 26:1050–1057.
25. Bednar J, et al. (1998) Nucleosomes, linker DNA, and linker histone form a unique structural motif that directs the higher-order folding and compaction of chromatin. Proc Natl Acad Sci USA 95:14173–14178.
26. Caterino TL, Hayes JJ (2010) Structure of the H1 C-terminal domain and function in chromatin condensation. Biochem Cell Biol 89:35–44.
27. Cao K, et al. (2013) High-resolution mapping of H1 linker histone variants in embryonic stem cells. PLoS Genet 9:e1003417.
28. Geeven G, et al. (2015) Local compartment changes and regulatory landscape alterations in histone H1-depleted cells. Genome Biol 16:289.
29. Hu J, et al. (2018) Dynamic placement of the linker histone H1 associated with nucleosome arrangement and gene transcription in early Drosophila embryonic development. Cell Death Dis 9:765.
30. Popova EY, et al. (2013) Developmentally regulated linker histone H1c promotes heterochromatin condensation and mediates structural integrity of rod photoreceptors in mouse retina. J Biol Chem 288:17895–17907.
31. Noordermeer D, et al. (2011) The dynamic architecture of Hox gene clusters. Science 334:222–225.
32. Bonev B, et al. (2017) Multiscale 3D genome rewiring during mouse neural development. Cell 171:557–572.e24.
33. Salzberg AC, et al. (2017) Genome-wide mapping of histone H3K9me2 in acute myeloid leukemia reveals large chromosomal domains associated with massive gene silencing and sites of genome instability. PLoS One 12:e0173723.
34. ENCODE Project Consortium (2004) The ENCODE (ENCyclopedia of DNA elements) project. Science 306:636–640.
35. Bascom GD, Kim T, Schlick T (2017) Kilobase pair chromatin fiber contacts promoted by living-system-like DNA linker length distributions and nucleosome depletion. J Phys Chem B 121:3882–3894.
36. Grigoryev SA, et al. (2016) Hierarchical looping of zigzag nucleosome chains in metaphase chromosomes. Proc Natl Acad Sci USA 113:1238–1243.
37. Gavin B, Schlick T (2017) Linking chromatin fibers to gene folding by hierarchical looping. Biophys J 112:434–445.
38. Ricci MA, Manzo C, García-Parajo MF, Lakadamyali M, Cosma MP (2015) Chromatin fibers are formed by heterogeneous groups of nucleosomes in vivo. Cell 160:1145–1158.
39. Risca VI, Denny SK, Straight AF, Greenleaf WJ (2017) Variable chromatin structure revealed by in situ spatially correlated DNA cleavage mapping. Nature 541:237–241.
40. Hsieh TH, et al. (2015) Mapping nucleosome resolution chromosome folding in yeast by Micro-C. Cell 162:108–119.
41. Li G, Reinberg D (2011) Chromatin higher-order structures and gene regulation. Curr Opin Genet Dev 21:175–186.
42. Boettiger AN, et al. (2016) Super-resolution imaging reveals distinct chromatin folding for different epigenetic states. Nature 529:418–422.
43. Di Pierro M, Cheng RR, Lieberman Aiden E, Wolynes PG, Onuchic JN (2017) De novo prediction of human chromosome structures: Epigenetic marking patterns encode genome architecture. Proc Natl Acad Sci USA 114:12126–12131.
44. Ernst J, Kellis M (2012) ChromHMM: Automating chromatin-state discovery and characterization. Nat Methods 9:215–216.
45. Liang Z, et al. (2017) BL-Hi-C is an efficient and sensitive approach for capturing structural and regulatory chromatin interactions. Nat Commun 8:1622.
46. Blake WJ, Kærn M, Cantor CR, Collins JJ (2003) Noise in eukaryotic gene expression. Nature 422:633–637.
47. Barbieri M, et al. (2012) Complexity of chromatin folding is captured by the strings and binders switch model. Proc Natl Acad Sci USA 109:16173–16178.
48. Jost D, Pascal C, Cavalli G, Vaillant C (2014) Modeling epigenome folding: Formation and dynamics of topologically associated chromatin domains. Nucleic Acids Res 42:9553–9561.
49. Di Pierro M, Zhang B, Aiden EL, Wolynes PG, Onuchic JN (2016) Transferable model for chromosome architecture. Proc Natl Acad Sci USA 113:12126–12131.
50. Schöpflin R, et al. (2013) Modeling nucleosome position distributions from experimental nucleosome positioning maps. Bioinformatics 29:2380–2386.
51. Collepardo-Guevara R, Schlick T (2014) Chromatin fiber polymorphism triggered by variations of DNA linker lengths. Proc Natl Acad Sci USA 111:8061–8066.
52. Arya G, Schlick T (2006) Role of histone tails in chromatin folding revealed by a mesoscopic oligonucleosome model. Proc Natl Acad Sci USA 103:16236–16241.
53. Beard DA, Schlick T (2001) Modeling salt-mediated electrostatics of macromolecules: The discrete surface charge optimization algorithm and its application to the nucleosome. Biopolymers 58:106–115.
54. Karolchik D, et al. (2003) The UCSC genome browser database. Nucleic Acids Res 31:51–54.
55. Yazdi PG, et al. (2015) Increasing nucleosome occupancy is correlated with an increasing mutation rate so long as DNA repair machinery is intact. PLoS One 10:e0136574.
56. Edgar R, Domrachev M, Lash AE (2002) Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res 30:207–210.
57. Nicol JW, Helt GA, Blanchard SG, Jr, Raja A, Loraine AE (2009) The Integrated Genome Browser: Free software for distribution and exploration of genome-scale datasets. Bioinformatics 25:2730–2731.
58. Kent WJ, et al. (2002) The human genome browser at UCSC. Genome Res 12:996–1006.
59. Bharath MMS, Chandra NR, Rao MRS (2003) Molecular modeling of the chromatosome particle. Nucleic Acids Res 31:4264–4274.
60. Luque A, Collepardo-Guevara R, Grigoryev S, Schlick T (2014) Dynamic condensation of linker histone C-terminal domain regulates chromatin structure. Nucleic Acids Res 42:7553–7560.
