Braiding topology and the energy landscape of chromosome organization

Dana Krepel (a,1), Aram Davtyan (a), Nicholas P. Schafer (a), Peter G. Wolynes (a,b,c,d), and José N. Onuchic (a,b,c,d,1)

(a) Center for Theoretical Biological Physics, Rice University, Houston, TX 77005; (b) Department of Chemistry, Rice University, Houston, TX 77005; (c) Department of Physics and Astronomy, Rice University, Houston, TX 77005; and (d) Department of Biosciences, Rice University, Houston, TX 77005

Contributed by José N. Onuchic, November 22, 2019 (sent for review October 10, 2019; reviewed by Ronald M. Levy and Jie Liang)

Assemblies of structural maintenance of chromosomes (SMC) proteins and kleisin subunits are essential to chromosome organization and segregation across all kingdoms of life. While structural data exist for parts of the SMC−kleisin complexes, complete structures of the entire complexes have yet to be determined, making mechanistic studies difficult. Using an integrative approach that combines crystallographic structural information about the globular subdomains with coevolutionary information and an energy landscape-optimized force field (AWSEM), we predict atomic-scale structures for several tripartite SMC−kleisin complexes, including prokaryotic condensin, eukaryotic cohesin, and eukaryotic condensin. Molecular dynamics simulations of the SMC−kleisin protein complexes suggest that these complexes exist as a broad conformational ensemble made up of different topological isomers. The simulations suggest a critical role for the SMC coiled-coil regions, where the coils intertwine with various linking numbers. The twist and writhe of these braided coils are coupled with the motion of the SMC head domains, suggesting that the complexes may function as topological motors. Opening, closing, and translation along the DNA of the SMC−kleisin protein complexes would allow these motors to couple to the topology of DNA when DNA is entwined with the braided coils.

SMC−kleisin complexes | chromosome organization | protein topology | DNA topology

Significance

SMC−kleisin protein complexes play an important role in establishing the 3-dimensional organization of DNA in cells. Leveraging coevolutionary sequence information and molecular simulations with an energy landscape-optimized force field, we developed complete structural models of several kleisin complexes and studied their motions. We find that the coiled-coil regions of the complexes braid together, and this braiding is coupled to the movements of the head domains, which are ATPase motors. By comparison with cross-linking experiments, these simulations suggest that the complexes exist in an ensemble of topologically distinct isomers with different patterns of braiding of the coils. Together, these observations raise the question of whether, in their function, SMC−kleisin complexes couple to the topology of DNA.

Author contributions: D.K., A.D., P.G.W., and J.N.O. designed research; D.K., A.D., and N.P.S. performed research; D.K., A.D., and N.P.S. analyzed data; and D.K., A.D., N.P.S., P.G.W., and J.N.O. wrote the paper.

Reviewers: R.M.L., Temple University; and J.L., University of Illinois at Chicago.

The authors declare no competing interest.

Published under the PNAS license.

(1) To whom correspondence may be addressed. Email: [email protected] or jonuchic@rice.edu.

This article contains supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1917750117/-/DCSupplemental.

First published December 30, 2019.

The molecular mechanisms that establish chromosome structure remain mysterious. One prominent hypothesis suggests that loop-extrusion motors dynamically bind and translocate segments of chromosomes to form structural loops. Critical players in these loop-extrusion complexes include proteins of the structural maintenance of chromosomes (SMC) family of ATPases. These SMC proteins form the core of an evolutionarily conserved complex that regulates genome dynamics and stability in all kingdoms of life (1–3). Each complex has 2 members of the SMC family of proteins along with a single member of the kleisin family, which associate into a tripartite structure that can bind and entrap chromosomes (4–7). The SMC proteins contain rather long coiled coils that appear to close back on themselves, meeting at a central globular domain called the hinge. The N-terminal and C-terminal regions fold together to form a second globular domain called the head. Both the hinge and the head domains have long been understood to play essential roles in the mechanisms of action of the SMC complexes (8–10). Between the head and the hinge domains, there are long α-helices that organize into coiled coils.

The coiled-coil regions of SMC complexes have generally been considered to be inert linkers between the functional head and hinge domains, but this view is changing. The coiled coils are now seen as central players in SMC activity (11–17). Cross-linking/mass spectrometry analyses of both prokaryotic and eukaryotic condensin (euk-condensin) suggest that, rather than forming only a single rod structure, the coiled coils of the 2 SMC monomers that make up the functional dimer form multiple, rather stable contacts (14–16). An analysis of cohesin revealed that association and dissociation of the extended SMC1 and SMC3 coiled coils can be modulated by acetylation of lysines in the coiled coils (13). Breaking the periodicity of the coiled coils leads to loss of function (17). Mutations in the coiled-coil region that should break the coil structure are associated with various human diseases (11).

The precise mechanisms of loading, entrapment, and release of DNA from SMC−kleisin complexes, as well as the role of the ATPase regions located in the head domains, are not yet known, owing to the paucity of crystallographic information on the intact complexes. While the structures of large parts of SMC complexes have been determined, the coiled coils, notably, have not yet been fully characterized crystallographically. Even less is known structurally about the ways in which the complexes interact with DNA. In this work, to begin to address these issues, we develop detailed models of the eukaryotic forms of cohesin (euk-cohesin) and condensin (euk-condensin) as well as of the prokaryotic condensin (prok-condensin) complex.

To do this, we start by constructing a global statistical model of the sequence data of the families of the complexes, employing a recently developed global statistical inference method called direct coupling analysis (DCA) (18). DCA yields structural information because residue pairs that evolve in a correlated way, as revealed by multiple sequence alignments, are likely to be spatially proximate in at least one structure of the conformational ensemble. DCA has already been used to obtain predictions of bacterial SMC protein structures (19, 20), but coevolutionary information about the coiled coils is somewhat sparse. To understand the coiled-coil structural dynamics, we therefore also employed an optimized transferable coarse-grained protein force
field, the associative memory, water-mediated, structure and energy model (AWSEM) (21). AWSEM is a solvent-averaged free energy function that represents proteins using only 3 beads per amino acid (corresponding to the Cα, Cβ, and O atoms). The efficiency of AWSEM's encoding makes it possible to perform molecular dynamics (MD) simulations on entire SMC−kleisin complexes (Fig. 1). The force field has been learned from a database of known protein monomer structures and protein complex structures using an optimized machine-learning scheme based on energy landscape theory (21−23).

The AWSEM simulations predict that the coiled coils of the 2 SMC monomers braid together. Taken together with cross-linking experiments (14, 15), these simulations suggest that the complexes exist as an ensemble of possible conformations, with different conformations having different topologies of braiding of the coils. For the prok-condensin, we validate the structures obtained from the AWSEM-MD simulations by comparing the local conformations of their coiled-coil regions with those inferred in a recently published reconstruction of the SMC rod model (14). For euk-cohesin, the simulations also show that the braiding of the SMC−SMC coiled coils leads to an ensemble of fluctuating topologies. The changes of topology are correlated with the rotation of the SMC head ATPase domains. In this way, the SMC protein complex may couple to the braiding topology of DNA if the coils become entwined with the DNA.

1468–1477 | PNAS | January 21, 2020 | vol. 117 | no. 3 www.pnas.org/cgi/doi/10.1073/pnas.1917750117 Downloaded by guest on September 26, 2021
BIOPHYSICS AND COMPUTATIONAL BIOLOGY

Fig. 1. Schematic of the integrative computational approach. Initial structures of SMC−kleisin protein complexes were constructed using, as input, 5 experimentally determined structures of parts of the molecule: a subdomain of the kleisin subunit (magenta), the head regions (PDB ID codes 5XEI for prok-condensin and 4UX3 and 1W1W for both euk-cohesin and euk-condensin), 2 coiled-coil segments folded on themselves (PDB ID codes 5NNV and 5XG2), and the central region called the hinge (PDB ID codes 1GXL, 2W5D, and 4RSI for prok-condensin, euk-cohesin, and euk-condensin, respectively). Using multiple sequence alignments for the SMC and kleisin proteins, DCA was used to infer both interdomain and intradomain contacts. We then used a structure-based model with these predicted coevolving contacts as input to produce the initial complete structures. Finally, we performed MD simulations of the entire SMC−kleisin complexes using AWSEM, a transferable coarse-grained model in which proteins are represented with 3 beads per amino acid (corresponding to the Cα, Cβ, and O atoms). The coiled coils braid together as the structures are annealed using AWSEM-MD.

Results

Simulations with an Energy Landscape-Optimized Coarse-Grained Model Predict a Braided Structural Ensemble for prok-Condensin. Starting from a structure for prok-condensin (20) that was previously obtained using the DCA methodology, we annealed the structure using AWSEM-MD simulations and checked the extent to which the resulting structures agree with newly acquired structural information. In Fig. 2A, we present the annealed structures of the prok-condensin complex. As can be seen, AWSEM-MD simulations of the SMC−kleisin protein complexes result in many close interactions between the coiled-coil regions of the pair of SMC monomers. We define a contact as being formed when a residue of the coiled coil of the first SMC monomer is no farther than 11 Å from a residue of the coiled coil of the second SMC monomer. We can then define a provisional order parameter, Q, for forming a completely braided structure: Q is the fraction of the contacts of the completely braided structure that are formed in a given conformation. Q equals 1 for the fully associated braided coiled coil, while Q equals 0 for completely separated coiled coils. Starting from an initial configuration that has an "O shape" (Fig. 2A, cluster 1), annealing with AWSEM-MD results in an ensemble of "Y-shape" conformations, in which the braided coiled-coil regions are associated along only part of their length (Fig. 2A, clusters 2 to 5). Upon further twisting, the Y-shape conformations can convert into "I-shape" conformations, in which the braided coiled-coil regions are associated along the entire length of the coiled coils (Fig. 2A, cluster 6). How the intercoil contacts form in the SMC−SMC braided coiled coil during the course of the simulation is shown in Fig. 2B. A discussion of the structural comparison of the annealed structures with existing experimental crystal structures can be found in SI Appendix, Figs. S5, S6, and S8 and Table S2.

Establishing the Initial Structures of euk-Cohesin and euk-Condensin Protein Complexes. In establishing the initial structures for the kleisin subunits of these assemblies, crystal structures of several fragments of the proteins that had already been obtained experimentally were used as input. These include the Rad21 N domain (yeast, Protein Data Bank [PDB] ID code 4UX3) (24), the Rad21 C domain (yeast, PDB ID code 1W1W) (25), Rad21 interaction domains [yeast and human, PDB ID codes 5F0O (26) and 4PK7 (27), respectively], and the Brn1 interaction domain (yeast, PDB ID code 5OQQ) (28). Multiple sequence alignments for each kleisin subunit allow us to infer probable contacts using DCA for the kleisin subunits of euk-cohesin and euk-condensin. A comparison of the contacts predicted by DCA with those found in the crystal structures of these domains is shown in SI Appendix, Fig. S1. The agreement between the 2 sets of contacts
is good. Comparisons between the DCA-derived residue contacts (shown in black) and the contacts that resulted from the AWSEM-MD structure prediction simulations (shown in color) for both the SMC1−Rad21−SMC3 and the SMC2−Brn1−SMC4 tripartite complexes can be found in SI Appendix, Fig. S2. For both euk-cohesin and euk-condensin, the DCA calculations indicated that there should be contacts between the coiled-coil regions of the 2 SMC monomers. Although inter-SMC contacts inferred from coevolution are predicted to exist along the entire coiled-coil regions, most of the strongly coevolving contacts appear near the hinge and at the bottom part of the coiled coil, near the kink of the SMC monomers (SI Appendix, Fig. S2, shown as red beads connected by dashed blue lines). We call this interruption of the SMC coiled coil the "kink" region, as it connects 2 straight parts of the SMC coiled coils. Crystal structures of both the prok-condensin head domain and the yeast cohesin SMC3 head domain (PDB ID codes 1XEX and 4UX3, respectively) show similar structural organizations (15, 24, 29, 30). The evolutionarily conserved nature of these interactions between the 2 SMC monomers suggests that they serve some functional role. Further information on the construction of euk-cohesin and euk-condensin can be found in SI Appendix, Figs. S3 and S4.

Fig. 2. The prok-condensin MD simulations result in an ensemble of braided coiled-coil structures. (A) The structural ensemble of the prok-condensin complex. SMC monomers are shown in gray; kleisin subunits are shown in magenta. (B) Corresponding AWSEM-MD contact maps, shown in gray. Clusters are ordered by increasing Q, where Q is the fraction of the fully braided coiled-coil contacts (from the hinge domain to the head domains) that are present in a cluster: from cluster 1, whose structures have small values of Q (Q < 0.1) and exhibit an O-shape ring structure, to cluster 6, whose structures have higher values of Q (Q > 0.6) and exhibit an I-shape braided structure. (C) AWSEM-MD contact map of the SMC−SMC coiled-coil region, computed with a contact threshold of 11 Å. The color indicates the frequency with which contacts were formed across the entire ensemble of annealed structures: frequent contacts are shown in dark red, and infrequent contacts in light red.

Variety of Braided Coiled-Coil Topologies for the euk-Cohesin Tripartite Complex. We now carry out the DCA methodology for the eukaryotic complexes. Starting from the initial structures for the euk-cohesin, we carried out MD simulations for 12 ns using the AWSEM-MD force field to equilibrate the structures and to explore the landscapes more broadly. These simulations yielded an ensemble of euk-cohesin protein complex structures of varying geometries. We then characterized the braiding geometry and the topology of the structures in this ensemble.

Many proteins have well-defined 3-dimensional (3D) structures that are tightly linked to their functional states. Nevertheless, in recent years, more and more attention has been drawn to proteins that are partially or intrinsically disordered. Characterizing such proteins and linking their structural ensembles to their (often multiple) functions present considerable challenges that are not fully resolved (31, 32). The cohesin complexes belong to an intermediate structural class whose members possess well-defined local structure but, at the same time, allow for substantial flexibility in their global geometry. This flexibility may be linked to protein function, making it essential to characterize the protein geometry and catalog its changes through possible functional states. SMC complexes consist of globular domains (the head and hinge domains) connected by coiled-coil regions, which, while somewhat flexible, have considerable bending rigidity. To characterize the geometry of such systems, we use the mathematical apparatus developed for another biopolymer, DNA (33, 34). It has long been known that the way the 2 strands of DNA twist and writhe about each other is important in expression and heredity. The geometry of coiling strands can be altered by changing the linking number or by applying external torques that modify how the strands twist about each other.

While the SMC dimer by itself is not covalently closed, once the head and the hinge domains bind together, the ring does become closed, with individual coils now resembling the DNA strands that, in circular bacterial genomes, twist around each other (even though the coils do this in a significantly less regular manner than DNA, which has more uniform elastic properties). To characterize the SMC geometry, we calculated 2 kinds of quantities. First, we calculated the twist angle Θ registered at various residues along the braided coiled-coil region. This angle is calculated by taking helical fragments from each chain that have approximately matching registers and calculating the twisting angle of the corresponding fragments around each other, using a third-order spline fit to each protein backbone path (Fig. 3A, orange box). We also calculated the twist number Tw for each of the SMC monomer coiled-coil regions. This number describes the local rotation of any pair of coiled coils around their central axis (Fig. 3A, blue box). For a given region, changing Tw by 2π corresponds to a single twist around this central axis. Twist is an additive quantity that can be calculated independently for each segment of a coiled-coil pair and summed to obtain the total twist number. In addition, the coupled coils distort. This distortion is characterized by the writhe number Wr. The writhe number of the braided conformation of the centerline of each SMC monomer
describes the nonlocal torsion of the system (Fig. 3A, red box). The writhe equals the signed number of crossings that a closed curve makes with itself, averaged over all possible viewing directions (that is, over all corresponding 2D projections). To calculate both the twist Tw and the writhe Wr for the SMC dimer, we fitted each helical strand with a third-order spline to find its local axis. The central axis of each coiled coil was constructed by taking the arithmetic mean of the coordinates of the 2 helical strands at corresponding contacting locations. The linking number, obtained as the sum of the twist and writhe numbers, Lk = Tw + Wr (see Materials and Methods for definitions and details on the topology calculations), should be conserved if the closed tripartite complex does not break in a topological sense. For a circular and strictly closed system, Lk is always conserved, but, of course, our approximation to it can fluctuate. Some of the fluctuations can be attributed to numerical errors due to curve fitting, to truncation of the SMC structure (only coiled coils were used for the calculations of Tw and Wr), and to the fact that the curves used in the numerical calculations were approximated by a discrete set of points. Large changes in the linking number, however, indicate discrete changes in the global topology of the system (35–37).

We first study how the braiding topology evolves during a simulation starting from an initial "V-shape" euk-cohesin. AWSEM-MD results in the SMC coiled coils becoming braided together, as shown in Fig. 3A, which displays the twist and writhe of all of the pairs of the 4 coils of SMC1 and SMC3. Each of the 4 coils has ∼350 residues. We refer to the first residue as the one nearest to the hinge domain and to the last residue as the one nearest to the head domain. We refer to the coiled coils of SMC1 as coils 1 and 2 and to those of SMC3 as coils 3 and 4. For this system, we can thus calculate a total of 6 topological variables as a function of time. These variables include the 2 twist values, one for coils 1 and 2, which belong to SMC1, and the other for coils 3 and 4, which belong to the SMC3 monomer (Fig. 3C). We also calculate the values of the writhe for the 4 intermonomer pairs, (1 and 3), (1 and 4), (2 and 3), and (2 and 4), as well as values for the individual coiled coils (Fig. 4). For each of these 6 topological variables, we present not only the overall value of each topological quantity as a function of time and residue position along the coiled coil but also the cumulative values registered every 50 residues along the sequences. The cumulative values indicate which regions along the coiled coil contribute the most to the overall geometry of the braided system.

Fig. 3. Braided coiled-coil structure exhibits various topologies. (A) Schematic representation of the calculations performed to characterize the euk-cohesin trimer complex. Coiled coils of SMC1 and SMC3 are represented in light and dark gray, respectively. Partitioning of the coiled coil into segments is shown by red and blue dots along the coiled coil. Θ, Tw, and Wr definitions are shown in orange, red, and blue boxes, respectively. (C) Topology-related variables of the euk-cohesin protein complex as a function of time. Twist number of each SMC monomer, total twist number, writhe number, and linking number are shown in gray, blue, red, and purple, respectively. The total linking number fluctuates between 10π and 12π. (B and D) Topology variables of the euk-cohesin protein complex as a function of both residue number along the coiled coil (y axis) and time (x axis): (B) twist angle of SMC3; (D) total writhe number, Wr, of the braided conformation of the centerline of each SMC monomer. Higher and lower values of topology variables are shown using a color gradient from red to blue.
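
The central-axis and twist calculations described above can be illustrated with a short sketch. It is not the authors' implementation (their definitions are given in Materials and Methods, outside this section). It assumes that the 2 helical strands of a coiled coil are already registered, so that point i of one strand pairs with point i of the other, and it omits the third-order spline smoothing. As in the text, the central axis is the arithmetic mean of paired points; the twist is accumulated as the signed rotation, about the local axis direction, of the unit vector pointing from the axis to one strand. The function names `central_axis` and `twist_angle` are hypothetical.

```python
import math

# Minimal 3D vector helpers on tuples (pure Python, no dependencies).
def add(a, b):
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def scale(a, s):
    return (a[0] * s, a[1] * s, a[2] * s)

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(a):
    return scale(a, 1.0 / math.sqrt(dot(a, a)))

def central_axis(strand_a, strand_b):
    """Central axis as the arithmetic mean of registered strand points.

    Assumption: point i of strand_a pairs with point i of strand_b.
    """
    return [scale(add(p, q), 0.5) for p, q in zip(strand_a, strand_b)]

def twist_angle(strand_a, strand_b):
    """Total twist (radians) of strand_a around the pair's central axis.

    One full turn contributes 2*pi, matching the convention in the text.
    """
    axis = central_axis(strand_a, strand_b)
    n = len(axis)
    # Axis tangents from finite differences (one-sided at the two ends).
    tangents = [unit(sub(axis[min(i + 1, n - 1)], axis[max(i - 1, 0)]))
                for i in range(n)]
    # Unit vectors from the axis to strand_a, with the tangential part removed.
    radial = []
    for p, c, t in zip(strand_a, axis, tangents):
        r = sub(p, c)
        radial.append(unit(sub(r, scale(t, dot(r, t)))))
    total = 0.0
    for i in range(n - 1):
        t_mid = unit(add(tangents[i], tangents[i + 1]))
        u, v = radial[i], radial[i + 1]
        # Signed rotation from u to v about the local axis direction.
        total += math.atan2(dot(cross(u, v), t_mid), dot(u, v))
    return total

# Usage: two strands on opposite sides of the z axis, wound 3 times around it.
turns, n = 3, 300
thetas = [2 * math.pi * turns * i / (n - 1) for i in range(n)]
strand_1 = [(math.cos(th), math.sin(th), 0.1 * i) for i, th in enumerate(thetas)]
strand_2 = [(-math.cos(th), -math.sin(th), 0.1 * i) for i, th in enumerate(thetas)]
tw = twist_angle(strand_1, strand_2)
print(round(tw / (2 * math.pi), 6))  # prints 3.0
```

Dividing the returned angle by 2π converts it to turns. The scheme is a first-order discretization; finer sampling, or a parallel-transported reference frame along a strongly curved axis, would be natural refinements.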

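The writhe, defined above as the signed self-crossing number averaged over all viewing directions, equals the Gauss double integral over the curve, which leads to the following sketch for a closed, discretized centerline. It is illustrative rather than the authors' procedure: it applies a simple midpoint rule to pairs of polygon segments, and it uses a standard trefoil-knot parametrization only as a convenient chiral test curve. The factor 2π applied at the end, which expresses the writhe in the π-valued units of the text so that it can be added to a twist angle to form Lk = Tw + Wr, is an assumption inferred from the statement that changing Tw by 2π corresponds to one full twist.

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def writhe(points):
    """Writhe of a closed polygon in conventional (dimensionless) units.

    Midpoint-rule version of the Gauss double integral:
    Wr = 1/(4*pi) * sum over i != j of (s_i x s_j) . (m_i - m_j) / |m_i - m_j|**3,
    where s_i is the i-th segment vector and m_i is its midpoint.
    """
    n = len(points)
    segs, mids = [], []
    for i in range(n):
        p, q = points[i], points[(i + 1) % n]  # the last segment closes the curve
        segs.append(sub(q, p))
        mids.append(((p[0] + q[0]) / 2, (p[1] + q[1]) / 2, (p[2] + q[2]) / 2))
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                r = sub(mids[i], mids[j])
                total += dot(cross(segs[i], segs[j]), r) / dot(r, r) ** 1.5
    return total / (4 * math.pi)

def trefoil(n=200):
    """A standard trefoil-knot parametrization, used only as a chiral test curve."""
    ts = [2 * math.pi * k / n for k in range(n)]
    return [(math.sin(t) + 2 * math.sin(2 * t),
             math.cos(t) - 2 * math.cos(2 * t),
             -math.sin(3 * t)) for t in ts]

circle = [(math.cos(2 * math.pi * k / 100), math.sin(2 * math.pi * k / 100), 0.0)
          for k in range(100)]
knot = trefoil()
mirror = [(x, y, -z) for x, y, z in knot]

wr_circle = writhe(circle)   # planar curve: every triple product vanishes
wr_knot = writhe(knot)       # chiral curve: clearly nonzero
wr_mirror = writhe(mirror)   # mirror image: same magnitude, opposite sign
# Assumed conversion to the text's convention (one full turn = 2*pi), so that
# the value can be added to a twist angle to form Lk = Tw + Wr.
wr_knot_pi_units = 2 * math.pi * wr_knot
```

A planar curve has zero writhe, and reflecting a curve flips the sign of its writhe; both properties make quick sanity checks for any writhe routine.
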
Fig. 4. Writhe values of SMC1−SMC3 braided coiled coils suggest various contributions to the overall geometry of the braided system. Writhe analysis as a function of time. In each panel, the writhe of each SMC1−SMC3 coiled-coil pair is shown as follows: coils 1 and 3 in the Upper Left (Wr13), coils 1 and 4 in the Upper Right (Wr14), coils 2 and 3 in the Lower Left (Wr23), and coils 2 and 4 in the Lower Right (Wr24). Writhe analyses are shown for the following regions: (A) lower region of the braided coiled-coil structures (residues 1 to 100); (B) middle region of the braided coiled-coil structures (residues 200 to 300); (C) top region of the braided coiled-coil structures (residues 280 to 380); and (D) the coiled-coil braiding event as obtained throughout the first 3 ns. In each panel, the cumulative writhe contributions along the coiled coils are plotted on top, and writhe values as a function of both residue number along the coiled coil (y axis) and time (x axis) are plotted at the bottom. Higher and lower values of topology variables are shown using a color gradient from red to blue.
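
The cumulative writhe profiles summarized in Fig. 4 can be mimicked by splitting the Gauss double sum between 2 coils into contributions from consecutive blocks of segments along one coil and then taking a running sum. The sketch below is a hypothetical illustration, not the authors' code: it assumes each coil is an ordered list of 3D points (one per residue), attributes every segment pair to the window containing its segment on the first coil, and uses 50-residue windows to match the text. For open curves the total is a mutual writhe rather than an integer; for 2 closed curves it reduces to the Gauss linking number, which gives a convenient check.

```python
import math
from itertools import accumulate

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def segments(points, closed):
    """Segment vectors and midpoints of a polyline (wraps around if closed)."""
    count = len(points) if closed else len(points) - 1
    segs, mids = [], []
    for i in range(count):
        p, q = points[i], points[(i + 1) % len(points)]
        segs.append(sub(q, p))
        mids.append(((p[0] + q[0]) / 2, (p[1] + q[1]) / 2, (p[2] + q[2]) / 2))
    return segs, mids

def windowed_pair_writhe(coil_a, coil_b, window=50, closed=False):
    """Pairwise (mutual) writhe of two coils, split into windows along coil_a.

    Returns (per-window contributions, running cumulative sum).
    """
    segs_a, mids_a = segments(coil_a, closed)
    segs_b, mids_b = segments(coil_b, closed)
    per_segment = []
    for sa, ma in zip(segs_a, mids_a):
        s = 0.0
        for sb, mb in zip(segs_b, mids_b):
            r = sub(ma, mb)
            s += dot(cross(sa, sb), r) / dot(r, r) ** 1.5
        per_segment.append(s / (4 * math.pi))
    windows = [sum(per_segment[k:k + window])
               for k in range(0, len(per_segment), window)]
    return windows, list(accumulate(windows))

# Check on a Hopf link (two singly linked circles): for closed curves the
# total is the Gauss linking number, so it should be close to +1 or -1.
n = 200
angles = [2 * math.pi * k / n for k in range(n)]
ring_1 = [(math.cos(t), math.sin(t), 0.0) for t in angles]
ring_2 = [(1 + math.cos(t), 0.0, math.sin(t)) for t in angles]
windows, cumulative = windowed_pair_writhe(ring_1, ring_2, window=50, closed=True)
print(len(windows), round(abs(cumulative[-1]), 2))
```

Because the window contributions are additive, the running sum reaches the total pairwise writhe at the last window, which is what lets a cumulative profile localize the regions that dominate the braiding.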

Following the cumulative sum of Tw along the chain, we can see that the twist propagates along the coiled coil as the braids are formed along the chains (Fig. 3 B and C). As the braiding progresses, the total writhe increases (Fig. 3C, red line) to a value of ∼4π at the expense of the total twist, which decreases by ∼3π (Fig. 3C, gray lines representing the twist of each SMC monomer, Tw12 and Tw34, and blue line representing the total twist, Tw12 + Tw34). The approximate linking number of the braided system fluctuates between 10π and 12π, since the system is not fully closed in a strict sense. The changes in the topology of the complex suggest that the braided structure is flexible and could take on varying topologies if DNA were to entwine with the braided coils.

We now turn to examine the contribution of the individual coils to the increase in writhe values (Fig. 4). Each of the
4 inter-SMC1−SMC3 coiled-coil pairs, (1 and 3), (1 and 4), (2 and 3), and (2 and 4), exhibits various writhe values and contributes differently to the overall writhe of the system. Each coiled-coil pair can be described as having 3 regions arranged sequentially along the braided structure:

1) The bottom region between the head domain and the kink, including the first 100 residues, exhibits a low writhe of Wr ≈ 0.4π. These low writhe values are obtained for residues above the kink; braiding does not occur below the kink region. As the system relaxes upon braiding at ∼3 ns, the strands of coiled-coil pair (1 and 4) first braid around each other and then entwine with pairs (1 and 3) and (2 and 3) at 9 ns into the AWSEM simulation (Fig. 4A).
2) The middle region (residues 100 to 250) along the braided coiled coils shows an increase in writhe. Larger values of the writhe are seen for the (1 and 3) and (1 and 4) coil pairs, which exhibit writhe values of ∼2.5π. A decrease in the writhe values is seen for residues 250 to 300 for all 4 coiled-coil pairs (Fig. 4B).
3) The top region, close to the hinge (residues 300 to 375), shows an increase in writhe values during the dynamics. The largest contributions to this increase come from the (1 and 4) and (2 and 4) coil pairs, which reach writhe values of ∼3.5π. The analysis of the cumulative writhe along the sequences in Fig. 4C indicates the braiding of the central part of these 2 coiled-coil pairs (residues 320 to 340). This contribution to the writhe fluctuates throughout the simulation by as much as 0.5π (from ∼0.9π to 0.4π).

Lastly, we focus on the coiled-coil braiding events, which occur during the first 3 ns. The braiding events occur in 2 main regions along the structure. The first is near the hinge, where the coils initially braid [residues 300 to 375 for all 4 pairs and residues 250 to 300 for pairs (1 and 4) and (2 and 4)]; these 2 pairs exhibit writhe values as high as Wr ≈ 1.2π. In the second, residues 150 to 200 of the coiled-coil pairs become braided together and contribute smaller values of writhe, with Wr ≈ 0.4π (Fig. 4D).

That highly flexible SMC dimer configurations form has been suggested previously by studies using high-speed liquid atomic force microscopy (12) and fluorescence imaging (29), in which SMC dimers were purified from budding yeast. Based on cross-linking experiments, it has been suggested that the coiled-coil regions of both euk-cohesin and euk-condensin are closely opposed to each other along their full length (Fig. 5A) (14). The linking number can change only through opening and closing of the complex. Opening and closing processes are slow on MD time scales and thus are hard to sample. We therefore explicitly constructed and explored the dynamics of 7 additional isomers of the euk-cohesin that differ in braiding topology. Together, these isomers are able to capture all of the experimentally inferred SMC1−SMC3 coiled-coil cross-links that were reported in ref. 14 (Fig. 5A). One of these isomers gives rise to an O-shape initial conformation in which the hinge domains of both SMC monomers are bound to each other and the kleisin subunit is bound to both SMC head domains, forming a partially closed ring structure. To construct topologically distinct isomers, we started from this initial dimer structure but then separated the SMC1 and SMC3 monomers from each other. We then performed a series of operations to generate distinct topological isomers. These isomers are constructed by 1) adding torques that twist the head domains of SMC1 and SMC3 by +2π, after which SMC1 and SMC3 were put back together by constraining the center of mass of the separated monomers (Fig. 5A). Additional isomers were constructed using a similar protocol by 2) applying a torque that increases the twist of SMC3 only by +2π before bringing the monomers together, 3) applying a torque only to SMC1, increasing its twist by +2π before assembly, 4) similarly preparing an isomer in which both the SMC1 and SMC3 head domains were twisted in the appropriate direction by −2π, 5) similarly preparing an isomer in which a torque leading to a −2π twist was applied only to SMC3, and 6) similarly preparing an isomer by applying a torque to SMC1, thus changing its twist by −2π. A summary of isomers 1 to 6 is shown in Fig. 5B. In all, we created 8 topological isomers. Comparisons of the contact maps and of the potential energies of the isomers can be found in SI Appendix, Fig. S9 and Table S1. The 6 twisted isomers result in buckled "C-shape" conformations, with the hinge domain inclining toward the head domain. In such C-shape conformations, buckling occurs in the regions of residues ∼260 to 310 along each of the 4 coils (Fig. 5A, Top, light gray). In all cases, the twisted isomers, when further relaxed under the AWSEM force field in the absence of any constraints, formed I-shape configurations (Fig. 5A, Bottom, dark gray).

Recently, the purified euk-cohesin complex was investigated by Barysz et al. (14) using a combination of amino acid-selective cross-linking and mass spectrometry. They identified a total of 15 SMC1−SMC3 cross-linking sites as being of high confidence. The ensemble of predicted structures in this paper captures all 15 of those cross-links and suggests many more Lys−Lys pairs that could possibly be identified in more extensive cross-linking experiments. The combination of these sites enabled the assembly of a single computational structure of cohesin in which the coiled coils approached each other only to a distance of 25 Å (14). In Fig. 5C, we present a summary of the inferred cross-links. We examined whether any single isomer from our simulations could explain all of the observed contacts and, if not, what minimal set of isomers would capture the reported set of contacts at a closer distance of 11 Å (correlation with the experimental structure can be found in SI Appendix, Fig. S7 and Table S3). As can be seen, no individual structure captures all of the reported cross-links between the 2 coiled coils. The O-shape and V-shape isomers form similar sets of contacts along the length of their coiled coils. The V-shape isomer does form some additional contact sites near the head domain, owing to its braided structure (Fig. 5C). A second set of contacts, located mostly at the bottom part of the coiled-coil region, closer to the head domain, is found for the isomers in which torque was applied to twist the SMC monomers relative to one another (Fig. 5A, isomers 1, 3, 5, and 6). These additional contacts arise from a change in the location of the kink (Fig. 5A and SI Appendix, Fig. S6). Upon simulating the motion of these twisted isomers with AWSEM-MD, even more contacts are formed at the center of the coiled-coil region (Fig. 5A, Bottom, shown in dark gray). Only the combination of these equilibrated isomers finally captures the full pattern of cross-linking reported in ref. 14. The results of our simulations, along with the cross-linking studies, suggest that euk-cohesin in vivo exists as an ensemble containing topologically distinct conformations.

The topology analysis of the 6 additional isomers obtained by AWSEM simulations (shown in Fig. 5A) indicates that all of the topologically distinct isomers form stable braided coiled-coil structures. For each topological isomer, we present the average values of twist, writhe, and linking numbers obtained upon relaxing the isomers under the AWSEM force field after removing both the torque and the center-of-mass constraints (shown in Fig. 5D). The lowest linking number is found for isomer 5, in which a twist of −2π was applied to SMC3. The V-shape and O-shape isomers are essentially topologically identical. They have similar writhe and linking number values (3.37π and 12.33π, respectively, for the V-shape, and 3.21π and 12.05π for the O-shape). These similarities are attributable to their similar initial semiclosed structures. Topological isomers in which the external torque twisted both coils by ±2π display the largest values of writhe (Wr > 4π) and of linking number (Lk > 14π) (Fig. 5D, red) compared with the other

Krepel et al. PNAS | January 21, 2020 | vol. 117 | no. 3 | 1473 Downloaded by guest on September 26, 2021 Fig. 5. The euk-cohesin SMC1−SMC3 exists as an ensemble of braided structures with different topologies. (A) Representation of twisted isomers 1 to 6 (Top), and relaxed isomers 1 to 6 (Bottom), are shown in light and dark gray, respectively. The experimentally inferred contacts captured by each isomer are shown in purple, blue, green, orange, and red, depending on the location along the braided structure. (B) Summary of SMC topological isomers. The topological isomers discussed in the manuscript are shown in blue. The x value represents the twist of the SMC3 monomer, and the y value represents the twist of the SMC1 monomer. (C) Comparison of the euk-cohesin SMC1−SMC3 monomers cross-linking contacts as reported in ref. 14, with 4 representative isomer structures that were obtained using AWSEM-MD simulations. Contacts are divided into 5 groups according to their location along the SMC coiled coil. Contacts near the head domain are shown in purple. Contacts in the middle of the coiled coils are indicated with blue, green, and orange. Contacts near the hinge are shown in red. Taken together, the set of topological isomers captures all contacts reported experimentally. (D) Summary of average topological

characteristics Tw, Wr, and Lk upon relaxation are shown in blue, red, and purple, respectively, as obtained for all 8 topological isomers. The average values of the linking numbers range between 11π and 14π for all isomers.

φ twisted structures. Upon relaxation of the initial structure for all and head−SMC3 show that the heads rotate at the same time but in isomers, no significant changes in topology were seen, obtaining opposite directions (Fig. 6B, dark red and blue). Upon braiding, this linking numbers ranging only between 11π and 14π. rotation continues in a correlated fashion. Two rotation events appear at ∼8and∼10 ns during the relaxation simulation (Fig. 6B). The Coiled-Coil Region Is Allosterically Coupled to the SMC Head We now discuss the structural changes of the SMC3 head ATPase Motor Domain. Both euk-condensin and euk-cohesin ex- domain. The available crystal structure for the SMC3 head do- trude DNA while consuming ATP (38–40), indicating an active main includes the ATPase domain in its ATP-γ-S state (24). It mechanism. In cohesin, the SMC hinge domain acts as an entry has been hypothesized that SMC head domains take on multiple gate for DNA, and the SMC−kleisin interaction regions function conformations and that the cohesin ring opening is triggered by as exit gates. It is interesting to examine how the motion of the sequential activation of the ATP sites (38, 39). The AWSEM- SMC head domain couples to the structural dynamics of the rest of MD simulations result in 3 distinct structural states differing in the complex. To quantify this coupling, we introduce an angle φhead. the vicinity of the ATPase domain (Fig. 6D, cyan beads). In Fig. This angle is defined by structurally aligning the extended coiled- 6C, we compare the contact maps for each structurally distinct coil region of each conformation (see Fig. 6A and Materials and state of the head domain. Frustration analysis of the 3 structures Methods for more details) and then calculating the rotation angle of of the head domain (see SI Appendix, Fig. 
S2 for more in- both SMC1 and SMC3 head domains in the plane defined by the formation on frustration analysis) reveals that, while contact coiled-coil extension and the center of mass of the SMC head do- pairs involving residues S2521 to E2621 are highly frustrated for φ main. Our results demonstrate that head−SMC3 increases by nearly all 3 states (shown in blue dashed box), only state 2 exhibits a π/2 as the SMC coiled coils become braided together (Fig. 6A, high frustration cluster near the ATPase region (Fig. 6D, cyan black). The cumulative twists of several regions along the extended beads). The structural changes in this region suggest a potential coiled coil, θbraid (Fig. 6A, blue, purple, and red), change sequen- mechanism by which the rotation of the ATPase head domain φ tially along the coiled coil in a correlated fashion. Both head−SMC1 can propagate throughout the entire SMC monomer.
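The twist, writhe, and linking numbers quoted for each isomer are related by Lk = Tw + Wr. The sketch below is a generic illustration and not the authors' implementation, which follows refs. 33 and 34. It discretizes the Gauss double integral over closed polylines using a midpoint rule. When given two curves, it estimates their linking number. When given a single curve, such as the centerline of a coiled-coil pair, it estimates the writhe; for a closed ribbon, the twist then follows as Tw = Lk − Wr. The function names, the discretization, and the test geometries are assumptions chosen for illustration. The result is expressed in full turns, whereas the values above are quoted in multiples of π, so a convention-dependent rescaling would be needed before comparing numbers.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _segments(curve: Sequence[Vec3]) -> Tuple[List[Vec3], List[Vec3]]:
    """Midpoints and edge vectors of a closed polyline (last point joins the first)."""
    n = len(curve)
    mids, edges = [], []
    for i in range(n):
        p, q = curve[i], curve[(i + 1) % n]
        mids.append(((p[0] + q[0]) / 2, (p[1] + q[1]) / 2, (p[2] + q[2]) / 2))
        edges.append(_sub(q, p))
    return mids, edges


def gauss_integral(curve1: Sequence[Vec3],
                   curve2: Optional[Sequence[Vec3]] = None) -> float:
    """Midpoint-rule estimate of (1/4pi) * sum (e1 x e2) . (r1 - r2) / |r1 - r2|^3.

    curve2 given -> linking number of two disjoint closed curves (in turns).
    curve2 None  -> writhe of curve1; only the i == j self-pairs are skipped.
    """
    m1, e1 = _segments(curve1)
    m2, e2 = (m1, e1) if curve2 is None else _segments(curve2)
    total = 0.0
    for i in range(len(m1)):
        for j in range(len(m2)):
            if curve2 is None and i == j:
                continue
            d = _sub(m1[i], m2[j])
            dist = math.sqrt(_dot(d, d))
            total += _dot(_cross(e1[i], e2[j]), d) / dist ** 3
    return total / (4 * math.pi)


if __name__ == "__main__":
    n = 200
    angles = [2 * math.pi * k / n for k in range(n)]
    ring_a = [(math.cos(t), math.sin(t), 0.0) for t in angles]
    # The second ring passes through the center of the first (a Hopf link).
    ring_b = [(1 + math.cos(t), 0.0, math.sin(t)) for t in angles]
    print(gauss_integral(ring_a, ring_b))  # close to +1 or -1
    print(gauss_integral(ring_a))          # planar curve, so the writhe is zero
```

It is safe to skip only the i == j self-pairs. For adjacent edges, the vector between the two midpoints lies in the plane spanned by the two edge vectors, so those terms vanish instead of diverging.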

1474 | www.pnas.org/cgi/doi/10.1073/pnas.1917750117 Krepel et al.

Fig. 6. The braided coiled-coil structure exhibits angular sliding dynamics. (A) φhead−SMC1, φhead−SMC3, and θbraid1−3 values as a function of time, shown from top to bottom in gray, black, blue, purple, and red. These results suggest a correlation among the twist angles of the SMC3 monomer. (B) The 3D twist angle along SMC1 and SMC3 as a function of both residue number along the coiled coil (y axis) and time (x axis); values are colored from red (high) to blue (low). The twist angles along the SMC1 and SMC3 coils are correlated with each other; that is, the twist propagates along the coiled coil. (C) Comparison of the head-region contact maps for the 3 SMC3 head structural clusters, which suggests that the twisting is accompanied by structural changes within this domain. (D) Configurational frustration analyses of the 3 SMC3 head structures. The predicted structures appear in gray, highly frustrated interactions are shown as red lines, and the ATP region is shown as cyan beads. The frustration analyses show that different regions are frustrated in each structure.
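Fig. 6B tracks twist angles along the coiled coils. The projection construction defined in Materials and Methods can be sketched as a small function. In that construction, $\vec A$ and $\vec B$ are the projections of $\vec a$ and $\vec b$ onto the plane normal to $\hat n$, and the sign comes from $S = (\vec a \times \vec b)\cdot\hat n$. The text does not say how the magnitude of the angle is obtained from $\vec A$ and $\vec B$, so this sketch assumes it is simply the angle between them. For a unit $\hat n$, $(\vec a\times\vec b)\cdot\hat n = (\vec A\times\vec B)\cdot\hat n$, so `atan2(S, A·B)` returns that angle with Sign[S] already attached. The name `signed_twist_angle` and its argument order are hypothetical.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def signed_twist_angle(a: Vec3, b: Vec3, n_hat: Vec3) -> float:
    """Signed angle (radians) between a and b about the axis n_hat.

    A = a - (a.n)n and B = b - (b.n)n are projections onto the plane normal
    to n; S = (a x b).n fixes the sign (counterclockwise about n is positive).
    Assumption: the magnitude is the ordinary angle between A and B.
    """
    norm = math.sqrt(_dot(n_hat, n_hat))
    n = (n_hat[0] / norm, n_hat[1] / norm, n_hat[2] / norm)  # enforce a unit axis
    a_proj = tuple(ai - _dot(a, n) * ni for ai, ni in zip(a, n))
    b_proj = tuple(bi - _dot(b, n) * ni for bi, ni in zip(b, n))
    s = _dot(_cross(a, b), n)
    # With a unit axis, S equals (A x B).n, so atan2 gives Sign[S] * angle(A, B).
    return math.atan2(s, _dot(a_proj, b_proj))


if __name__ == "__main__":
    z_axis = (0.0, 0.0, 1.0)
    print(signed_twist_angle((1.0, 0.0, 0.5), (0.0, 1.0, 2.0), z_axis))  # +pi/2
    print(signed_twist_angle((0.0, 1.0, 2.0), (1.0, 0.0, 0.5), z_axis))  # -pi/2
```

Summing such angles over consecutive fragment pairs would give one possible cumulative twist along a chain. The text above does not state whether θbraid is defined in exactly that way.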

Discussion

In this study, we used an integrative approach to predict the structural ensembles of prok-condensin and of 2 members of the eukaryotic SMC−kleisin protein complexes, cohesin and condensin. We combined fragmentary crystallographic data with coevolutionary information derived from multiple sequence alignments to obtain contact predictions for the interfaces between the subunits of each complex. Starting from these initial structures, we used AWSEM-MD simulations of the entire SMC−kleisin protein complexes to characterize an ensemble of distinct conformations, including several partially twisted Y-shape conformations and a fully twisted I-shape conformation. The coexistence of multiple metastable ring conformations appears to be necessary to rationalize experimental studies involving the selective cross-linking of pairs of residues (14–16, 40). The structure of euk-cohesin is often represented as a closed, rod-shaped conformation in which the coiled coils of SMC1 and SMC3 are closely juxtaposed, rather than clearly separated as in an open ring conformation (14, 16). Our analysis suggests that zipping up the SMC−SMC coiled-coil region leads to braiding of these coils, which may change the topology of the ring before it is closed. The twist of the braids is coupled with the overall writhe, Wr.

We simulated several topological isomers of the complex in which the SMC1 and SMC3 subunits of euk-cohesin had first been twisted relative to one another by applying torque. The combined ensemble of topological isomers agrees with the experimental cross-linking data, whereas no individual structure accounts for all of the cross-links inferred from the experiment. These results suggest that SMC complexes are flexible and that, in vivo, they exist as an ensemble of conformations with different topologies. If the braided coils become entwined with DNA, topological isomerism would allow the motor domains to couple the topology of DNA to the topology of the coiled coils.

How the SMC−kleisin complexes operate remains unclear. Although human condensin and cohesin are known to extrude DNA using ATP, it has also been suggested that an active molecular mechanism is needed to open the ring. Both ATP binding and ATP hydrolysis by SMC proteins have been found to be essential for DNA loading. ATP binding or hydrolysis can trigger distinct conformational changes within the 2 head regions, and these changes could act synergistically to open the tripartite ring for DNA entry. Our simulations show that the braiding of the coiled coils correlates with the angular rotation of the SMC head domain of the complex, φhead. The twist of the head domain is accompanied by structural changes,
demonstrating that the head domain can adopt several distinct structures and suggesting a functional coupling of the head domain with the coiled coils. It is likely not a coincidence that the step size inferred from single-molecule studies of DNA extrusion matches the length of the coiled coil (about 50 nm) in the cohesin complex.

Understanding precisely how SMC−kleisin complexes interact with DNA requires further investigation. Our results suggest a possible mechanism by which rotation, once initiated in the head domain, can propagate to the hinge region through the coiled-coil region. This propagation could allow the entrapment, release, and propagation of DNA strands inside the SMC ring complex.

Materials and Methods

MD Simulations of Protein Complexes Using AWSEM-MD. Following structure reconstruction with structure-based models built from the DCA contacts, we performed MD simulations of prok-condensin, euk-cohesin, and euk-condensin using AWSEM. AWSEM is a transferable coarse-grained model in which a protein is represented by 3 beads per amino acid, corresponding to the Cα, Cβ, and O atoms (21, 41, 42). The interactions between these beads are given by a nonadditive force field that incorporates physically motivated potentials and bioinformatic information (42). In this work, we additionally included long-range electrostatic interactions through a Debye−Hückel potential that accounts for both the solvent dielectric effect and the screening of charge−charge interactions (21):

$$V_{\mathrm{DH}} = K_{\mathrm{elect}} \sum_{i<j} \frac{q_i q_j}{\varepsilon_r\, r_{ij}}\, e^{-r_{ij}/l_D},$$

where $q_i$ and $q_j$ are the charges of beads $i$ and $j$, $r_{ij}$ is their separation, $\varepsilon_r$ is the dielectric constant of the solvent, $l_D$ is the Debye screening length, and $K_{\mathrm{elect}}$ is the electrostatic prefactor.

Calculating Twist and Writhe for the SMC Dimer. To characterize the conformations of the SMC complex, we employed the mathematical apparatus often used to characterize closed circular DNA. Although the SMC dimer is not strictly a closed system, it may be treated as one owing to the interactions between the heads and the hinges of its monomers. Such interactions often form even when the simulation starts from an open (e.g., O-shape) conformation, and they persist for most or all of the remaining simulation time. For SMC dimers, we defined a direction of travel that starts from the head of the first SMC toward its hinge and continues from the hinge of the second SMC toward its head. For such circular and semiclosed systems, we calculated the twist (Tw) and writhe (Wr) numbers. These describe, respectively, the local twisting of the coiled coil (or of the coils around the central axis) and the nonlocal torsional stress of the closed curve describing the system. The sum of Tw and Wr is the so-called linking number, $Lk = Tw + Wr$, which can independently be defined as one-half the number of times that 2 closed curves wind around each other. For a strictly closed system, Lk is a topological invariant. The definitions of Tw and Wr and the details of their numerical computation are given elsewhere (33, 34). Here, we briefly provide their formulations and describe the computations performed for the SMC dimer.

We are given 2 directed curves, $C_1$ and $C_2$, that roughly follow each other. They are described by series of vectors $\vec r_1(s)$ and $\vec r_2(s)$, respectively, as one proceeds along them. These curves can describe the strands of a circular DNA molecule or the individual protein chains in a closed coiled-coil system. We can also define the centerline $C$ of the curves as $\vec r(s) = (\vec r_1(s) + \vec r_2(s))/2$, along with the corresponding unit tangent and normal vectors to the centerline as […]

$$\vec A = \vec a - (\vec a\cdot\hat n)\,\hat n,\qquad \vec B = \vec b - (\vec b\cdot\hat n)\,\hat n,\qquad S = (\vec a\times\vec b)\cdot\hat n,$$

where $\hat n$ is the normal vector along the first fragment (which was chosen as the axis), $\vec a$ is the vector from the first point defining the first fragment to the first point defining the second fragment, and $\vec b$ is the vector from the first point defining the first fragment to the second point defining the second fragment. Sign[S] defines the directionality of the twist: counterclockwise twisting along the axis is defined as positive and clockwise twisting as negative.

Data Availability. The AWSEM code is available online at https://github.com/adavtyan/awsemmd (21). The structure of each subunit was processed by the SMOG server at http://smog-server.org/ (43). Frustration analysis was performed using the AWSEM Frustratometer server at http://frustratometer.qb.fcen.uba.ar/ (44).

ACKNOWLEDGMENTS. This work was supported by the Center for Theoretical Biological Physics sponsored by NSF Grant PHY-1427654. J.N.O. is a Cancer Prevention and Research Institute of Texas (CPRIT) Scholar in Cancer Research and was also supported by NSF Grant NSF-CHE-1614101 and by the Welch Foundation (Grant C-1792). P.G.W. gratefully acknowledges the support provided by the D. R. Bullard-Welch Chair at Rice University (Grant C-0016). D.K. acknowledges the Council for Higher Education of Israel for partial financial support.

1. T. Hirano, At the heart of the chromosome: SMC proteins in action. Nat. Rev. Mol. Cell Biol. 7, 311–322 (2006).
2. K. Nasmyth, C. H. Haering, Cohesin: Its roles and mechanisms. Annu. Rev. Genet. 43, 525–558 (2009).
3. J. D. P. Rhodes et al., Cohesin can remain associated with chromosomes during DNA replication. Cell Rep. 20, 2749–2755 (2017).
4. G. Gürsoy, Y. Xu, A. L. Kenter, J. Liang, Spatial confinement is a major determinant of the folding landscape of human chromosomes. Nucleic Acids Res. 42, 8223–8230 (2014).
5. E. Lieberman-Aiden et al., Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289–293 (2009).
6. A. L. Sanborn et al., Chromatin extrusion explains key features of loop and domain formation in wild-type and engineered genomes. Proc. Natl. Acad. Sci. U.S.A. 112, E6456–E6465 (2015).

7. G. Fudenberg et al., Formation of chromosomal domains by loop extrusion. Cell Rep. 15, 2038–2049 (2016).
8. C. H. Haering, J. Löwe, A. Hochwagen, K. Nasmyth, Molecular architecture of SMC proteins and the yeast cohesin complex. Mol. Cell 9, 773–788 (2002).
9. F. Bürmann et al., An asymmetric SMC-kleisin bridge in prokaryotic condensin. Nat. Struct. Mol. Biol. 20, 371–379 (2013).
10. M. Srinivasan et al., The cohesin ring uses its hinge to organize DNA using non-topological as well as topological mechanisms. Cell 173, 1508–1519.e18 (2018).
11. A. Matityahu, I. Onn, A new twist in the coil: Functions of the coiled-coil domain of structural maintenance of chromosome (SMC) proteins. Curr. Genet. 64, 109–116 (2018).
12. J. M. Eeftens et al., Condensin Smc2-Smc4 dimers are flexible and dynamic. Cell Rep. 14, 1813–1818 (2016).
13. I. Kulemzina et al., A reversible association between Smc coiled coils is regulated by lysine acetylation and is required for cohesin association with the DNA. Mol. Cell 63, 1044–1054 (2016).
14. H. Barysz et al., Three-dimensional topology of the SMC2/SMC4 subcomplex from chicken condensin I revealed by cross-linking and molecular modelling. Open Biol. 5, 150005 (2015).
15. M. L. Diebold-Durand et al., Structure of full-length SMC and rearrangements required for chromosome organization. Mol. Cell 67, 334–347.e5 (2017).
16. M. T. Hons et al., Topology and structure of an engineered human cohesin complex bound to Pds5B. Nat. Commun. 7, 12523 (2016).
17. F. Bürmann et al., Tuned SMC arms drive chromosomal loading of prokaryotic condensin. Mol. Cell 65, 861–872.e9 (2017).
18. M. Weigt, R. A. White, H. Szurmant, J. A. Hoch, T. Hwa, Identification of direct residue contacts in protein-protein interaction by message passing. Proc. Natl. Acad. Sci. U.S.A. 106, 67–72 (2009).
19. A. Haldane, W. F. Flynn, P. He, R. S. Vijayan, R. M. Levy, Structural propensities of kinase family proteins from a Potts model of residue co-variation. Protein Sci. 25, 1378–1384 (2016).
20. D. Krepel, R. R. Cheng, M. Di Pierro, J. N. Onuchic, Deciphering the structure of the condensin protein complex. Proc. Natl. Acad. Sci. U.S.A. 115, 11911–11916 (2018).
21. A. Davtyan et al., AWSEM-MD: Protein structure prediction using coarse-grained physical potentials and bioinformatically based local structure biasing. J. Phys. Chem. B 116, 8494–8503 (2012).
22. D. U. Ferreiro, J. A. Hegler, E. A. Komives, P. G. Wolynes, Localizing frustration in native proteins and protein assemblies. Proc. Natl. Acad. Sci. U.S.A. 104, 19819–19824 (2007).
23. D. U. Ferreiro, J. A. Hegler, E. A. Komives, P. G. Wolynes, On the role of frustration in the energy landscapes of allosteric proteins. Proc. Natl. Acad. Sci. U.S.A. 108, 3499–3503 (2011).
24. T. G. Gligoris et al., Closing the cohesin ring: Structure and function of its Smc3-kleisin interface. Science 346, 963–967 (2014).
25. C. H. Haering et al., Structure and stability of cohesin's Smc1-kleisin interaction. Mol. Cell 15, 951–964 (2004).
26. K. W. Muir et al., Structure of the Pds5-Scc1 complex and implications for cohesin function. Cell Rep. 14, 2116–2126 (2016).
27. K. Hara et al., Structure of cohesin subcomplex pinpoints direct shugoshin-Wapl antagonism in centromeric cohesion. Nat. Struct. Mol. Biol. 21, 864–870 (2014).
28. M. Kschonsak et al., Structural basis for a safety-belt mechanism that anchors condensin to chromosomes. Cell 171, 588–600.e24 (2017).
29. F. Bürmann et al., A folded conformation of MukBEF and cohesin. Nat. Struct. Mol. Biol. 26, 227–236 (2019).
30. E. Kim et al., DNA-loop extruding condensin complexes can traverse one another. bioRxiv:10.1101/682864 (26 June 2019).
31. M. Lundgren, A. Krokhotin, A. J. Niemi, Topology and structural self-organization in folded proteins. Phys. Rev. E Stat. Nonlin. Soft Matter Phys. 88, 042709 (2013).
32. D. Melnikov, A. J. Niemi, A. Sedrakyan, Topological indices of proteins. Sci. Rep. 9, 14641 (2019).
33. M. Sayar, B. Avşaroğlu, A. Kabakçıoğlu, Twist-writhe partitioning in a coarse-grained DNA minicircle model. Phys. Rev. E Stat. Nonlin. Soft Matter Phys. 81, 041916 (2010).
34. K. Klenin, J. Langowski, Computation of writhe in modeling of supercoiled DNA. Biopolymers 54, 307–317 (2000).
35. D. M. J. Lilley, The inverted repeat as a recognizable structural feature in supercoiled DNA molecules. Proc. Natl. Acad. Sci. U.S.A. 77, 6468–6472 (1980).
36. A. J. Spakowitz, Z. G. Wang, DNA packaging in bacteriophage: Is twist important? Biophys. J. 88, 3912–3923 (2005).
37. D. Norouzi, A. Katebi, F. Cui, V. B. Zhurkin, Topological diversity of chromatin fibers: Interplay between nucleosome repeat length, DNA linking number and the level of transcription. AIMS Biophys. 2, 613–629 (2015).
38. T. Terakawa et al., The condensin complex is a mechanochemical motor that translocates along DNA. Science 358, 672–676 (2017).
39. L. Vian et al., The energetics and physiological impact of cohesin extrusion. Cell 173, 1165–1178.e20 (2018).
40. K. Nasmyth, How are DNAs woven into chromosomes? Science 358, 589–590 (2017).
41. W. Zheng, N. P. Schafer, A. Davtyan, G. A. Papoian, P. G. Wolynes, Predictive energy landscapes for protein-protein association. Proc. Natl. Acad. Sci. U.S.A. 109, 19244–19249 (2012).
42. M. Y. Tsai et al., Electrostatics, structure prediction, and the energy landscapes for protein folding and binding. Protein Sci. 25, 255–269 (2016).
43. J. K. Noel et al., SMOG@ctbp: Simplified deployment of structure-based models in GROMACS. Nucleic Acids Res. 38, W657–W661 (2010).
44. R. G. Parra et al., Protein Frustratometer 2: A tool to localize energetic frustration in protein molecules, now with electrostatics. Nucleic Acids Res. 44, W356–W360 (2016).
