Wesleyan University Physics Department

Investigating DNA Junction Structure and Dynamics using a Coarse-grained Implicit Ion Model for DNA

by

Abraham Kipnis Class of 2019

An honors thesis submitted to the faculty of Wesleyan University in partial fulfillment of the requirements for the Degree of Bachelor of Arts with Departmental Honors in Physics

Middletown, Connecticut April, 2019

Abstract

Four-way Holliday junctions are cruciform-shaped DNA structures that play vital roles in biological processes. In this thesis, we validate the ability of an explicit ion coarse-grained model for DNA (3SPN.2) to accurately simulate Holliday junction dynamics above and below DNA junction melting temperatures. We analyze a variety of junction behaviors, including ion binding, junction conformations, junction melting, and branch migration, and compare our results with expected results from scientific literature. We discuss four different methods to determine the structure of the junction, evaluate the drawbacks and advantages of these methods by comparing them with each other and with data from previous studies, and show that the results qualitatively reflect our expectations of DNA junction structure at equilibrium. Then we use one of these methods to show that DNA junction dynamics as produced by the 3SPN.2 explicit ion model demonstrate the expected trends from literature. Next, we investigate melting by observing the dissociation of individual bases during our simulations and provide quantitative predictions for the dynamics of junction melting. Our results show that melting initiates at specific locations on a per-strand basis. These simulations helped inspire fluorescence melting experiments which strategically place nucleotide base analogs at several locations along junction strands and validate the primary predictions of the modeling. We offer a comprehensive overview of DNA junction research made possible using the 3SPN.2 explicit ion model for DNA and conclude that this model is a useful tool for exploring Holliday junction structure and dynamics in the presence of ions.

Acknowledgements

I would like to express my utmost gratitude to:

• My supervisor, Francis Starr, for giving me these projects, providing guidance and inspiration, patiently helping me hone my skills and find solutions to problems, and reminding me to not lose the forest for the trees.

• My second supervisor, Ishita Mukerji, for having an eye for details and an intuition that only an experimentalist can have.

• Other members of the Mukerji lab, including Rachel Savage, Julie McDonald, Nick Taylor, and Dacheng Zhao, for their contributions to experimental four-way junction research.

• Members of the Wesleyan physics department: Tsampikos Kottos, Greg Voth, Renee Sher, Fred Ellis, Tom Morgan and Yunseong Nam, for teaching me more than I could have imagined during my time here.

• Other current and former members of the Starr lab, including Wujie Wang, Xinyu Zhu, Natalie Strassheim, Abhishek Fakiraswami, Wenghang Zhang, Hamed Emamy, Amber Storey, Chloe Thorburn and Nathan Shankman, for laying the initial groundwork for this project, for sharing expertise and advice, and for being my personal heroes.

• The residents of 59 Warwick: Will Barr, Cail Daley, Ryan Adler-Levine, Gabe Weinreb, and Tony Strack, for being with me during the summer of 2017.

• My family, for providing love and care the whole way through.

Contents

1 Introduction
  1.1 Homologous Recombination
  1.2 DNA Junction Studies
  1.3 DNA Junction Melting
  1.4 DNA Nanotechnology
  1.5 Motivation and Objectives

2 Model and Simulations
  2.1 Coarse-grained DNA Model
  2.2 Molecular Simulations
  2.3 Hidden Markov models and our implementation

3 Equilibrium Structure and Dynamics
  3.1 Radial distribution function
  3.2 Methods to determine junction isoform
    3.2.1 Central base distances
    3.2.2 Consecutive phosphate angles
    3.2.3 RMSD between ideal isoform structures and current snapshot
    3.2.4 Markov model state classification
  3.3 Transitions between conformations

4 Melting Dynamics
  4.1 Method to determine base dissociation
  4.2 Preferential melting
  4.3 Isoform transitions during melting process

5 Branch Migration Dynamics
  5.1 Methods to determine junction location
  5.2 Junction migration probability

6 Conclusion
  6.1 Evaluation of ionic distributions around the junction
  6.2 Evaluation of junction isoform determination methods
  6.3 Evaluation of melting data
  6.4 Future directions

List of Figures

1.1 Meiotic recombination
1.2 Holliday model
1.3 Junction conformation and branch migration energy landscape
1.4 Holliday junction formation and branch migration
1.5 Holliday junction resolution
1.6 Products from cleavage of junction by restriction enzymes
1.7 Electrophoretic gel mobilities of restriction enzyme products
1.8 DNA junction conformer transitions
1.9 FRET experiment to determine conformer populations as a function of [Mg2+]
1.12 Simulations of B-DNA show increased flexibility at high salt concentrations
1.10 FRET experiment to determine conformer populations as a function of other ions
1.11 FRET experiments of mobile junctions to determine junction migration rates
1.13 Radial distribution function between Mg2+ ions and DNA junction phosphates
1.14 Junction melting curve from implicit ion model for DNA
1.15 Two-dimensional lattices made from DNA junction subunits
1.16 Nanotubes made from DNA junction subunits

2.1 Illustration of coarse-grained representation of B-DNA using 3SPN.2 model
2.2 Fraction of broken bases from melting simulations

3.1 Sodium radial distribution functions
3.2 Magnesium radial distribution functions
3.3 Schematic of isoform determination using base distances
3.4 Implicit ion model central base distance histograms and isoform population distributions
3.5 Explicit ion central base distance histogram
3.6 Isoform population distributions from base distances criterion using explicit ion model
3.7 Angle between consecutive phosphates
3.8 Explicit ion central phosphate angle histograms
3.9 Isoform population distributions from central phosphate angles criterion
3.10 Isoform population distribution by using RMSD from ideal isoform
3.11 Isoform population distributions from base distances Markov models
3.12 Isoform population distributions from phosphate angles Markov models
3.13 Isoform population distributions from base distances and angles Markov models
3.14 Transition matrices from phosphate angles criterion

4.1 Stages of the melting process
4.2 Example of distances and melting time
4.3 Number of base pairs and tm from one melting simulation
4.4 Mean base pair dissociation time tm at 360 K
4.5 Experimental fluorescence melting curves
4.6 Theoretical melting temperatures for individual strands that comprise the J3 junction
4.7 Isoform timeseries for a single junction melting simulation
4.8 Isoform population distribution during melting

5.1 Junction location timeseries, [Na+] 100 mM, [Mg2+] 0 mM
5.2 Junction location timeseries, [Na+] 100 mM, [Mg2+] 50 mM
5.3 Junction migration probability

Chapter 1

Introduction

“Science, for me, gives a partial explanation for life. In so far as it goes, it is based on fact, experience and experiment.”
Rosalind Franklin, a letter to Ellis Franklin, 1940

DNA is an information-encoding polymer responsible for carrying instructions for how organisms live, procreate, and die. For solving the most common structure of DNA, which suggested the method by which DNA replicates, Watson, Crick, and Wilkins won the Nobel Prize in Physiology or Medicine in 1962, with the help of X-ray diffraction data from Rosalind Franklin [1]. DNA is composed of two oppositely oriented strands of single-stranded DNA (ssDNA) in a right-handed helix, with bases on opposing strands connected through hydrogen bonding. Under physiological conditions, double-stranded DNA (dsDNA) exists in “B form” (B-DNA), with one turn of the helix approximately every 10 base pairs, or 34 Å. Most of the time, DNA is organized into condensed networks called chromatin. In cells, helicases and polymerases aid DNA replication by unzipping the strands and using each as a template for synthesis of a new complementary strand. Imperfect copying and mismatch repair result in approximately one point mutation per 10 billion bases replicated [2], which can alter structure and function. A diversity of phenotypes within a population contributes to a species’ ability to withstand environmental perturbations [2].
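As a quick check of the helical geometry quoted above, the following minimal Python sketch converts the stated pitch (about 10 base pairs, or 34 Å, per turn) into a rise and twist per base pair and an approximate axial length for a duplex. The constant and function names are illustrative assumptions, and the values describe an idealized B-form helix; real DNA parameters vary with sequence and solution conditions.

```python
# Idealized B-DNA geometry from the values quoted in the text:
# roughly 10 base pairs and 34 angstroms per helical turn.
# All names here are illustrative, not part of any simulation package.
BP_PER_TURN = 10.0
PITCH_ANGSTROM = 34.0


def rise_per_bp():
    """Axial rise per base pair, in angstroms."""
    return PITCH_ANGSTROM / BP_PER_TURN


def twist_per_bp_degrees():
    """Helical twist per base pair, in degrees."""
    return 360.0 / BP_PER_TURN


def duplex_length(n_bp):
    """Approximate axial length (angstroms) spanned by n_bp helical steps."""
    return n_bp * rise_per_bp()


if __name__ == "__main__":
    print("rise per bp (A):", rise_per_bp())
    print("twist per bp (deg):", twist_per_bp_degrees())
    print("length of 40 bp (A):", duplex_length(40))
```

Under these assumptions, a full turn of 10 base pairs recovers the 34 Å pitch, which is a useful sanity check when comparing simulated duplex arm lengths against ideal B-DNA.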

1.1 Homologous Recombination

Another important evolutionary tool that increases genetic diversity is meiotic recombination. During meiosis (the two successive divisions of a cell into four cells, each with half the number of chromosomes), duplicated homologous chromosomes align at the centers of cell nuclei before segregating and triggering cell division [See Figure 1.1]. Recombination begins between aligned homologous chromosomes at sites called chiasmata, when two “nicks” are created in similar regions of a single strand on each chromosome [3].

Figure 1.1: A biological pathway in which DNA junctions appear. In fungi, plants, and male mammals, meiosis begins with the duplication of the cell’s chromosomes (a). Meiotic recombination is then carried out by the formation and resolution of Holliday junctions (b). To complete meiosis, the cell divides into two cells with two sets of chromosomes each (c), and then division occurs again as each of those cells divides into two cells with one set of chromosomes each (d).

An enzyme complexes with the nicked strands, separating them from their intact antisense strands. This enzyme-strand complex then searches for a similar sequence on the opposing chromosome, and the nicked strands cross over and pair with the antisense strand of the homologous duplex. This cross-over region between duplexes is named the Holliday junction, proposed by Robin Holliday in 1964 [3] as a model for aberrant segregation in fungi [See Figure 1.2 for the recombination schematic from Holliday’s paper]. Holliday’s theory describes the process by which DNA strand exchange occurs, and predicts that junctions exist in a parallel conformation with opposing strands oriented in the same direction, as visualized by the arrows in Figure 1.2. However, structural data from in-vitro experiments show that four-way DNA junctions preferentially adopt stacked anti-parallel conformations [see the arrows in Figures 1.8, 1.9 and 1.10 for schematics of DNA junctions with the experimentally determined polarity]. If homologous chromosomes are aligned with the same polarity prior to strand invasion, yet the minimum energy structure of the junction as determined experimentally is anti-parallel, then chromosomes must undergo major conformational changes to adopt this structure during recombination. Uncovering the principle behind this disagreement between experimental and biological data is important for understanding how DNA junctions evolve dynamically in-vivo and for manipulating DNA for genetic and nanotechnological applications.

Figure 1.2: Holliday’s model from 1964 in which recombination happens between homologous chromosomes. Classically, DNA junctions were thought to form with strands parallel to each other (3). Structural data have shown this is not the minimum energy configuration. This figure has been reproduced from [3].

After strand invasion, junctions undergo branch migration, during which two opposite “arms” increase while the other two arms decrease in number of base pairs, moving the intersection point between the two strands. A schematic of this process is visualized in Figure 1.4. Proteins such as the E. coli RuvAB complex and eukaryotic Rad51 and Rad54 can accelerate branch migration [4], and the action of these proteins depends on the DNA sequence of the two strands at the branch point. Locally, DNA junctions exist in one of three major conformational classes: open and two stacked isoforms [See Figures 1.8, 1.9 and 1.10 for examples of these isoforms]. The two stacking conformers correspond to the stacking of different pairs of arms and share the open conformation as an intermediate state. This is observable using a multitude of experimental techniques [5–11]. In order for branch migration to occur, the junction must be in an open configuration, in which the arms are in a square-planar configuration. As Overmars et al. note, “The junction...crosses a kinetic barrier into the transient high-energy open form, which does allow for branch migration. After one or more migration steps the molecule crosses the barrier again, either in a forward direction to yield the second stacked X-conformer or backwards to the original arrangement of the stacked arms of the junction” [12]. A proposed free energy landscape for junction migration and conformer transitions is visible in Figure 1.3.

Figure 1.3: Junction conformer transition and branch migration energy landscape proposed by McKinney et al. [10]. The red and green lines are the free energies along the branch migration and conformer transition reaction coordinates. The lines intersect at the open conformation, which is believed to be an intermediate for both the stacked conformer transitions and for branch migration.

In order for chromosomes to separate, DNA junctions must be resolved into two separate duplexes. Depending on the directionality of this resolution, each resulting chromosome can have small or large regions of genetic material incorporated from the other, corresponding to two types of recombination intermediates outlined in Figure 1.5. As stated by McKinney et al. in 2003, “local DNA sequences can determine the outcome of genetic recombination via the bias in the stacking conformation” [10]. These recombinants are an important pathway through which organisms regulate the exchange of genetic material between chromosomes, increasing genetic diversity in offspring and species’ populations. How DNA sequence affects junction migration, binding specificity of proteins, and conformational specificity is an active area of research. The following chapter serves as an introduction to major discoveries about DNA junctions and the current state of the field.
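The picture described in this section, two stacked isoforms that interconvert only through the open state, with branch migration possible only while the junction is open, can be illustrated with a toy discrete-time model. The sketch below is a hedged illustration only: the state labels, the function `simulate_junction`, and all transition probabilities are hypothetical values chosen for demonstration, not rates measured in experiments or produced by the 3SPN.2 model.

```python
import random


def simulate_junction(n_steps, p_open=0.1, p_close=0.5, p_migrate=0.3, seed=0):
    """Toy model of junction conformer transitions and branch migration.

    States: "isoI" and "isoII" (stacked) and "open". Stacked isoforms
    interconvert only via the open state, and the branch point moves only
    while the junction is open. All probabilities are hypothetical.
    Returns a list of (state, branch_position) tuples.
    """
    rng = random.Random(seed)
    state, position = "isoI", 0
    trajectory = [(state, position)]
    for _ in range(n_steps):
        if state == "open":
            r = rng.random()
            if r < p_close:
                # Close into either stacked isoform with equal probability.
                state = rng.choice(("isoI", "isoII"))
            elif r < p_close + p_migrate:
                # Branch migration: one base-pair step in either direction.
                position += rng.choice((-1, 1))
            # Otherwise the junction stays open without migrating.
        elif rng.random() < p_open:
            # A stacked isoform crosses the barrier into the open form.
            state = "open"
        trajectory.append((state, position))
    return trajectory


if __name__ == "__main__":
    traj = simulate_junction(2000, seed=1)
    print("final state and branch position:", traj[-1])
```

In this sketch, lowering `p_open` (the assumed probability of leaving a stacked isoform) reduces how often the branch point can move, which mirrors the qualitative idea that the open state gates branch migration.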


Figure 1.4: This image depicts the junction formation process, from the creation of nicks in each of the homologous chromosomes, to strand invasion (b, c, and d), to branch migration (e and f). After branch migration, junction resolution occurs, depicted in Figure 1.5 below. Adapted from Voet et al. [13].

Figure 1.5: Following junction migration, the junction must be resolved into two continuous strands. Resolution of DNA junctions can occur in two different ways (j). In one way, a junction-resolving protein introduces incisions along the continuous strands (the vertical cut), and in the other way, the incisions are made along the invading strands (the horizontal cut). The product of these two different resolution processes is visible in (l). When the incision is made along the continuous strands, the resulting chromosomal products have traded arms, and the chromosomes might be more genetically different than when the incisions are made along the invading strands. Adapted from Voet et al. [13].

1.2 DNA Junction Studies

DNA junctions also form during other recombination events such as damage repair, replication, and chromosomal rearrangement. As the dynamics of these processes span a range of time and length scales, scientists use various techniques to study them in-vitro, including optical and electron microscopy, gel electrophoresis, Förster resonance energy transfer (FRET), nuclear magnetic resonance (NMR), chemical or enzyme probing, and molecular modeling. Below we review experiments and their results as they relate to the research carried out in this thesis.

In 1983, Seeman et al. [14] suggested that the sequence of the strands that comprise the junction could be engineered to ensure junction formation and to limit or enhance the resulting junction’s ability to undergo branch migration. This group developed algorithms to find sequences that form DNA junctions which do not permit branch point migration. These “immobile” junction sequences are used to study the thermodynamic and dynamic properties of DNA junctions in the absence of branch migration, whereas “mobile” junction sequences are used in studies of junction migration.

Duckett et al. and Diekmann and Lilley [5,6] ran experiments using four DNA fragments which anneal into stable junctions with one restriction enzyme (HindIII, EcoRI, XbaI, or BamHI) target sequence along each arm. This led to the standard nomenclature of DNA junction arms as H, R, X, and B. Their experiments studied the motion of DNA junction fragments through electrophoretic polyacrylamide gels after junctions were cleaved by the restriction enzymes, both in the presence and absence of magnesium cations. Fully intact junctions were observed to have slow mobility through gels, whereas junctions whose arms had been cleaved by restriction enzymes moved faster. Figure 1.6 shows two different products that result from cleavage of junction arms by restriction enzymes.

Figure 1.6: The locations of the restriction enzyme target sequences are visible on the left. Six cleavage products are possible. These six products have the same number of base pairs and molecular weight, but differ in their mobility through gels due to their geometries, i.e. linear structures have higher gel mobility than folded structures. See Figure 1.7 for all possible restriction products and their relative mobilities. Adapted from Duckett et al. [5].
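The count of six cleavage products follows from choosing two of the four arms to shorten. The short Python sketch below enumerates these pairwise digests using the arm labels B, H, R, and X from the text. The function name and the dictionary layout are illustrative assumptions; the keys list the cleaved arms and the values list the arms left at full length, and no claim is made here about which products migrate faster.

```python
from itertools import combinations

# Arm labels follow the restriction-site nomenclature in the text.
ARMS = ("B", "H", "R", "X")


def pairwise_cleavage_products(arms=ARMS):
    """Map each pair of cleaved arms to the pair of arms left at full length."""
    products = {}
    for cleaved in combinations(arms, 2):
        remaining = tuple(arm for arm in arms if arm not in cleaved)
        products["".join(cleaved)] = "".join(remaining)
    return products


if __name__ == "__main__":
    for cleaved, remaining in pairwise_cleavage_products().items():
        print(f"cleave {cleaved}: long arms {remaining}")
```

Because every product keeps two long arms and two shortened arms, all six share the same number of base pairs, so differences in gel mobility must reflect the angle between the remaining long arms rather than molecular weight.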

The slowest species possess complete H + R or electrostatic repulsion between the arms result in an B + X arms (see Figure 3); hence these define the acute opening of the angle (like opening a pair of scissors) to angles of the X. The fastest species possess R + X or B generate the proposed X structure. The exiperimental data + H arms, thus defining the colinear arms of the X; and allow us to distinguish between the alternative forms finally, the B + R and H + X species, which are of inter- shown in Figure 4A. The acute angles between B + X and R + H arms indicate that the strands running ,from arm Section 1.2 - DNA Junction Studies 7 B to arm X and from arm H to arm R do not cross, and thus B+Xor the junction is most closely related to the noncrossed type Ft+H slow depicted in Figure 4A. Thus the full assignment of arms 5 R B+Ror X+H is shown in Figure 4B, which is redrawn in the X-form in

Figure 1.7: Restriction enzyme products and their relative mobilities along a polyacrylamide gel. Since these restriction enzyme products have different mobilities but are the same size in number of base pairs, the authors infer that the slower products are more compact than the intermediate and fast products, and they propose that the junction exists in a stacked conformer prior to cleavage. Adapted from Duckett et al. [5].

They then performed the experiment omitting magnesium and observed results consistent with restriction enzymes cutting junctions in a square-planar structure. They conclude that without magnesium, “phosphate repulsion overcomes the free energy available from helix stacking and the junctions adopt a square-planar conformation with fourfold symmetry”. [See Figure 1.8 for a schematic of the transitions from open to stacked conformations].
They also observed an exchange in helix-stacking partners resulting from changing the terminal (closest to the junction) G-C base pair of arm H to a C-G and the terminal A-T of arm X to T-A. The authors’ major contributions are the findings that DNA junctions adopt a stacked conformation at high salt concentration and that the preference for helix-stacking partners depends on the sequence at the center of the junction. The phenomenon of local ion concentrations changing the free energy minimum conformation from square-planar to stacked is a universal observation in DNA junctions. Hinckley et al. [15] note, “ionic strength directly determines the flexibility of dsDNA via the degree of shielding of electrostatic repulsion between phosphate sites”. As Ortiz-Lombardia et al. [16] explain, “in the absence of cations the junction is believed to be extended, with its arms unstacked, and to have a four-fold planar conformation. However, in the presence of metal ions (as occurs in physiological conditions), the arms stack in pairs and form a two-fold nonplanar junction, known as the X-stacked model.” This consensus motivates further research on the structure and dynamics of junctions with mobile and immobile junction sequences and different stacking preferences in and around physiological salt concentrations.

Figure 1.8: Conformer transition schematic showing the two conformations with arms stacked in an antiparallel configuration, and the open, planar configuration as an intermediate state. Adapted from Yu et al. [7].

Förster resonance energy transfer (FRET) [17], sometimes described as the “molecular ruler”, is a spectroscopic technique which can resolve distances on the Å scale. This technique measures energy transferred via dipole-dipole interactions between a donor molecule, which absorbs probing light, and an acceptor molecule, which emits measured signals, as a proxy for the distance between donor and acceptor [18]. In time-resolved single-molecule FRET (smFRET) experiments that quantify the dynamics of Holliday junction state transitions, molecules with integrated donor and acceptor probes are bound to a substrate, and the acceptor signal for each molecule is recorded as a function of time [19]. Time-resolved signals from probes at the ends of two arms of the junction allow for real-time measurement of the distance between the arm ends. These distances are used to extract quantities such as the angle between the arms or, in experiments on junction sequences capable of branch migration, the location of the junction center. Classification of these data can resolve junction migration and isomerization transition rates. McKinney et al. [10] used smFRET to analyze transition rates as a function of DNA sequence, types and concentrations of counterions, and temperature, showing that the conformer transition and branch migration processes share the open structure as a common intermediate. In 2004, Joo et al. [19] used time-resolved smFRET to observe a stacking conformer bias in three different DNA junctions as a function of metal ions, observing that the junction arms stack preferentially upon the addition of divalent cations [See Figure 1.9]. Okamoto et al.
[11] used a variational Bayes hidden Markov model method [See Chapter 2.3 for an introduction to Markov models] applied to single-molecule FRET trajectories of mobile junctions to classify three junction locations. With their models they investigated branch migration kinetics and reproduced residence times, transition rates, and free energy differences between states. Experiments from 2016 by Litke et al. [20] show how the FRET efficiency of labels placed at the ends of arms of DNA junctions varies as a function of ionic concentration. In Figure 1.10, FRET efficiency increases with increasing salt concentration, plateauing at a saturation value, corresponding to two adjacent arms of the junction stacking. In Figure 1.11, we see a timeseries of FRET efficiency as reported by Karymov et al. [21] using mobile DNA junctions with FRET labels at the ends of two of the arms. Since the FRET labels get closer or farther apart when the junction migrates, the FRET efficiency is directly related to the exact location of the crossover. At low salt concentrations, junction migration is more likely and the residence time at any particular location is short. At higher salt concentrations, migration is less likely and the FRET efficiency plateaus for longer timescales.

Figure 1.9: Data from time-resolved FRET experiments to determine junction isoform populations as a function of [Mg2+]. Each point is averaged over 15-30 seconds of recorded fluorescence signals from 20 molecules. The time-averaged apparent energy transfer efficiency (Eapp) for the XB vector increases with increasing Mg2+, whereas Eapp for the XR vector remains constant, corresponding to an increase in transition rate to the stacked Iso2 conformation. From Joo et al. 2004 [19].

NMR, x-ray diffraction of crystallized DNA junctions, and molecular dynamics (MD) simulations have been used in addition to FRET to elucidate DNA junction structure and dynamics. For example, in 1997, Overmars et al. [12] used NMR to observe the dynamics of conversion between the two stacked conformers and found a 71/29 population ratio between them. Scientists have been particularly successful at simulating DNA junctions to compare with experimental data and predict structural and dynamic quantities. MD studies by Wheatley et al., Yu et al., and Wang et al. [7–9] report observations of spontaneous transitions from the open to stacked conformation of DNA junctions at various salt conditions. Wheatley et al. [9] used the all-atom AMBER 9 model potential for DNA to study 1DCW, a DNA junction with 10 base pairs per strand. They evaluate the stability of the AMBER 9 model concerning this junction and check agreement between experimental and MD data while predicting structural quantities such as the distribution of water and ions around the junction.

In 2004, Yu et al. [7] carried out all-atom MD simulations at various ionic concentrations to investigate the thermodynamics of junction conformer transitions. The method they employed measures free energy over hundreds of MD simulations. They used two geometric coordinates to describe the global conformation of the junction and observe a local free energy minimum near the tetrahedral form of 30 kcal/mol in simulations lacking Mg2+ ions. After adding Mg2+ ions, they observe no minimum near the folded form and locate one Mg2+ ion bound at the junction exchange point. The authors posit that “magnesium ions stabilize the stacked-X form and destabilize the open and tetrahedral intermediates”, a conclusion consistent with other literature in the field. They suggest that the disappearance of a local minimum indicates that Mg2+ ions destabilize the tetrahedral form, which is a possible intermediate in the conformer transition.

Intracellular cations can regulate stages of the cell cycle, help maintain genomic stability, and are an “essential cofactor in almost all enzymatic system involved in DNA processing” [22]. For example, cations aid in stabilizing G-quadruplexes, which are DNA structures important in chromosomal maintenance and restructuring. J. Lee et al. [23] used total internal reflection fluorescence (TIRF) to observe that the ratio of folded to unfolded states in G-quadruplexes correlates with K+ ion concentration. The propensity for DNA structures to undergo conformational changes upon addition of cations due to counterion shielding is also evident in double-stranded DNA. To observe this, MD studies carried out by Bai et al. [24] simulated two 12-bp DNA duplexes joined by a flexible polyethylene glycol (PEG) tether at various ionic conditions. [See Figure 1.12.]
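The FRET measurements described earlier in this section rely on the steep distance dependence of energy transfer. As a point of orientation, the following is a minimal Python sketch of the standard Förster relation, E = 1/(1 + (r/R0)^6), and its inverse. The function names and the Förster radius of 55 Å are illustrative assumptions and are not taken from any of the cited studies.

```python
def fret_efficiency(r, r0):
    """Förster relation: transfer efficiency for donor-acceptor distance r.

    r and r0 (the Förster radius, where E = 0.5) must share the same units.
    """
    if r < 0 or r0 <= 0:
        raise ValueError("r must be non-negative and r0 must be positive")
    return 1.0 / (1.0 + (r / r0) ** 6)


def distance_from_efficiency(e, r0):
    """Invert the Förster relation to recover the distance from E (0 < E < 1)."""
    if not 0.0 < e < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return r0 * (1.0 / e - 1.0) ** (1.0 / 6.0)


if __name__ == "__main__":
    R0 = 55.0  # Å; hypothetical Förster radius chosen only for illustration
    for r in (30.0, 55.0, 80.0):
        print(f"r = {r:5.1f} Å  ->  E = {fret_efficiency(r, R0):.3f}")
```

Because of the sixth-power dependence, the efficiency changes most rapidly for distances near R0, which is why arm-end labels report sensitively on whether two junction arms are stacked or extended.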

Figure 1.12: Data from Monte Carlo simulations of two dsDNA strands linked by a PEG tether show increased flexibility at high salt concentrations. The left image is a schematic and sequence of the tethered duplex system. The images on the right show data, from left to right and top to bottom, of 0.02, 0.06, 0.17, 0.3, and 2.0 M concentrations of monovalent ions. The colored dots represent the location of the distal end of the other DNA duplex, and the color represents the energetic difference between the conformer and the minimum-energy conformer observed in the ensemble. Adapted from Herschlag et al. [24].

Ions can assist in the function of junction-interacting proteins by inducing conformational changes in both proteins and DNA. Some proteins bind to junctions only when the junction is in certain conformations, and local salt concentrations and DNA sequences can drive the DNA to adopt those conformations. Thus, DNA-ion-protein interactions are a primary motivation for studies of the dynamics of ion binding to DNA. Having an in-depth understanding of ionic interactions with DNA junctions in the absence of proteins is important when beginning to understand protein-junction interactions in the presence of ions. References [16, 21, 25–29] describe studies on the binding specificity of junction-binding proteins.

Figure 1.10: Each conformation has an expected FRET efficiency, either low or high (a). FRET efficiency plotted as a function of ion concentration for magnesium, calcium, cobalt hexammine, terbium, europium and neodymium ions shows that increasing ion concentration leads to higher FRET efficiency (b,c). The points are experimental measurements, and the black lines are curves fit to the experimental data to calculate ion-junction binding dissociation constants for each ionic species. Energy transfer efficiency increases as Mg2+ concentrations increase from 0 to 400M. Adapted from Litke et al. 2016 [20].

(a) Low [Mg2+]. This timeseries implies branch migration is more frequent at lower salt concentration. The authors note that individual junction steps happen so fast they cannot be fully resolved.

(b) High [Mg2+]. The authors explain that the plateau of FRET values at the black arrow is a period during which the junction is energetically trapped and unable to migrate.

Figure 1.11: FRET efficiency from a mobile junction, where FRET efficiency is a proxy for the distance between the ends of the arms. The upper blue curves are FRET efficiency sampled every 6 ms, and the lower red curve is the blue curve after processing through a noise removal algorithm. These data show that increased Mg2+ concentrations serve to limit branch migration processes. Adapted from Karymov et al. [21].

DNA junction ion binding locations and mechanisms have been proposed upon interpretation of experimental and computational results. Litke et al. [20] interpret results from FRET and isothermal titration calorimetry experiments of DNA junctions as evidence that the junction center most likely has two Mg2+ binding sites. Wheatley et al. [9] also predicted the distribution of counter-ions and solvating water in a DNA junction using MD simulations. Hyeon et al. [25], through simulations and smFRET experiments, observe dynamical heterogeneity of DNA junctions as a function of solvated Mg2+ ions. [See Figure 1.13 for their calculations and Chapter 3.1 for our analogous results.] They interpret their data as showing that “the secondary structure of the Holliday junction, particularly at the internal multiloop (which mediates the conformational transition between the two isoforms), is pinned by Mg2+ions”, and that various open-state topologies with different internal multiloops act as transition states between stacked isoforms. Lipfert et al. [26] provide a summary of current research and theoretical understanding of DNA interactions with ionic solvents, but DNA-ion binding processes such as junction-ion interactions, and their physical principles, implications and interpretations are still not completely understood.
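The curves in Figure 1.10 are fits used to extract ion-junction dissociation constants. The fitting function used by Litke et al. is not stated here, so the sketch below assumes the simplest possibility: a one-site binding isotherm in which the bound fraction is c/(Kd + c). It recovers Kd from synthetic data by a least-squares grid search. All function names, concentrations, and efficiency values are hypothetical, and a model with two binding sites (as those authors propose for the junction center) would need a different functional form.

```python
def one_site_fret(conc, kd, e_free, e_bound):
    """FRET efficiency under an assumed one-site binding model.

    conc and kd share the same concentration units; e_free and e_bound are
    the efficiencies of the ion-free and ion-bound junction.
    """
    bound_fraction = conc / (kd + conc)
    return e_free + (e_bound - e_free) * bound_fraction


def fit_kd(concs, effs, e_free, e_bound, kd_grid):
    """Return the Kd on kd_grid that minimizes the sum of squared residuals."""
    def sum_sq(kd):
        return sum((one_site_fret(c, kd, e_free, e_bound) - e) ** 2
                   for c, e in zip(concs, effs))
    return min(kd_grid, key=sum_sq)


if __name__ == "__main__":
    # Synthetic titration generated from the model itself (not measured data).
    concs = [1.0, 5.0, 10.0, 50.0, 100.0, 400.0]
    effs = [one_site_fret(c, 50.0, 0.2, 0.6) for c in concs]
    grid = [float(k) for k in range(1, 201)]
    print("recovered Kd:", fit_kd(concs, effs, 0.2, 0.6, grid))
```

A grid search is used only to keep the example dependency-free; a nonlinear least-squares routine would be the usual choice for real titration data.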

Figure 1.13: Radial distribution functions g(r) between Mg2+ and phosphates in a DNA junction with 10 base pairs per arm, calculated separately for phosphates at the terminal, middle, and junction locations of the strands, show preferential binding of Mg2+ at the junction. Data are from a 100 ns MD simulation at 310 K, adapted from Hyeon et al. [25].
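To make the quantity plotted in Figure 1.13 concrete, the following is a minimal sketch of how a radial distribution function between two sets of particles (for example ions and phosphate sites) can be computed from a single configuration. It assumes a cubic periodic box with the minimum-image convention and normalizes by the ideal-gas expectation. The function name and these conventions are illustrative assumptions, not the analysis code of Hyeon et al.

```python
import math


def radial_distribution(ions, sites, box, r_max, n_bins):
    """Return (bin centers, g(r)) for ion-site pairs in a cubic periodic box.

    ions, sites: lists of (x, y, z) coordinates; box: box edge length.
    r_max must not exceed box / 2 so the minimum-image convention is valid.
    """
    if r_max > box / 2.0:
        raise ValueError("r_max must be at most half the box length")
    dr = r_max / n_bins
    counts = [0] * n_bins
    for ion in ions:
        for site in sites:
            r_sq = 0.0
            for a, b in zip(ion, site):
                d = a - b
                d -= box * round(d / box)  # wrap to the nearest periodic image
                r_sq += d * d
            r = math.sqrt(r_sq)
            if r < r_max:
                counts[min(int(r / dr), n_bins - 1)] += 1
    ion_density = len(ions) / box ** 3
    centers, g = [], []
    for k, count in enumerate(counts):
        shell_volume = 4.0 / 3.0 * math.pi * ((k + 1) ** 3 - k ** 3) * dr ** 3
        ideal_pairs = ion_density * shell_volume * len(sites)
        centers.append((k + 0.5) * dr)
        g.append(count / ideal_pairs)
    return centers, g


if __name__ == "__main__":
    import random
    rng = random.Random(0)
    pts = lambda n: [tuple(rng.uniform(0.0, 20.0) for _ in range(3)) for _ in range(n)]
    centers, g = radial_distribution(pts(200), pts(200), box=20.0, r_max=10.0, n_bins=10)
    for r, value in zip(centers, g):
        print(f"r = {r:4.1f}  g(r) = {value:.2f}")
```

For uncorrelated particles g(r) fluctuates around 1, so a peak well above 1 at short distances, like the junction curve in Figure 1.13, signals preferential association. In practice the histogram would be averaged over many trajectory frames and computed separately for each group of phosphates.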

1.3 DNA Junction Melting

Theoretical understanding of DNA junction assembly and melting has evolved significantly with the advent of more powerful computational resources and experimental techniques. In 1977, Kallenbach and Berman posited that “short duplex oligonucleotides tend to denature in all-or-none fashion, and significant populations of intermediates arise only in longer chains” [14]. In 1996, Peyrard and Dauxois [27] used

The energy scale for the potential is in kBT/e units. Middle, right: 100 ns molecular dynamics simulation at T 310 K (Supplementary Section S7), showing 2 ¼ that Mg þ ions are localized more at the junction and grooves than at the terminus. b,Hollidayjunctionswithvarioustopologiesofinternalmultiloops,which are the putative QTSs connecting one state in iso-I and another in iso-II. c,ModelforthedynamicsofaHollidayjunctionconstructedbasedonsmFRET experiments and simulations. On the left are the free energy contours for various states. Two isoforms in each state are connected by a distinct open square form, the structures of which are shown in b.Theensemble-averageddistributionoftheFRETefficiencies,Pens(E), is shown at the bottom. On the right, schematics of the free energy profiles are shown with cartoons of Holliday junction structures. The symbols (star, pentagon and so on) at the junction j j h emphasize that the junction structure is intact during the isomerization process. Hence, tI II j a, b, ... obs tconv↔ j, h a, b, g, ... with j= h ↔ ( = ) ≪ T ≪ ( = ) is established.

2 2 1 by a factor of 1,000 in the presence of Mg þ ions, because this Mg pulse annealing experiments. An immediate prediction process requires! the rupture and formation of base pairs. of our model (Fig. 5c) is that interconversion between states a The calculations summarized in Fig. 5 explain our experimental and b should be facilitated by an annealing protocol, enabling the 2 findings. In the folding landscape of the Holliday junction emerging release of Mg þ ions from frozen internal multiloop structures. from our analyses (Fig. 5c), transitions are only allowed between To validate this prediction we performed single-molecule 2 2 iso-I and iso-II via a band of QTSs within which the free energy experiments using a Mg þ pulse sequence [Mg þ] 50 mM ¼ $ gap is small enough to allow interconversion on obs. A lack of tran- 0mM 50 mM to induce transitions between multiple states sitions between two different states (say a andTb) within a given (Supplementary$ Fig. S2). The annealing experiments confirmed a b 2 isoform ( obs tconv↔ ) is explained by noting that the rupture of that washing Mg þ ions from the Holliday junction molecules 2 T ≪ Mg þ-stabilized base pairs is required for rearrangements from indeed facilitates interconversion between trajectories with distinct one multiloop topology to another. The conformational space con- patterns (compare the trajectories or two ps(E;i)s shown on the necting iso-I and iso-II is partitioned into a number of kinetically side of each panel in Fig. 6a, calculated from the blue and red disjoint states (j a, b, g, ...), reflecting the band structure of intervals of the trajectories corresponding to the moment ¼ 2 the QTS ensemble. In this sense, the persistent pattern of an before and after the Mg þ pulse). We also calculated the smFRET trajectory is an imprint of specific disjoint states in the Euclidean distance of ps(E;i) to the centroid of the five clusters 2 rugged folding landscape. in Fig. 
4a before and after the Mg þ pulse annealing

912 NATURE CHEMISTRY | VOL 4 | NOVEMBER 2012 | www.nature.com/naturechemistry

© 2012 Macmillan Publishers Limited. All rights reserved. Section 1.3 - DNA Junction Melting 13 computing resources at the Advanced Computing Laboratory of Los Alamos National Laboratory to simulate a 1-D system of particles coupled by nearest-neighbor anharmonic interactions and subjected to an on-site Morse potential. They show that “one-dimensional phase transitions can exist for mechanical systems of particles with a positions characterized by a continuous variable”, a theory which they describe as applicable to the DNA melting phase transion. In 2018, Ouldridge et al. used a minimal one-site-per- nucleotide coarse-grained model for DNA to study DNA junction self-assembly and melting. They find that junction assembly is “most successful in the temperature window below the melting temperatures of the target structure and above the melting temperature of the misbonded aggregates.” [28] In grounding work for our study, Wang et al. [8] used an implicit ion model of DNA to simulate junction melting at various salt concentrations from 10 to 500mM [Na+]. They compared melting temperatures TM calculated from replica exchange molecular dynamics (REMD) simulations with TM inferred from absorption experiments on junctions of the same sequence, and found that their model + closely mimics the experimental dependence of TM on Na concentration. Figure 1.14 shows the agreement between the experimental and simulated melting temperatures.

Figure 1.14: Data from absorption experiments on DNA junctions (b) closely match data from the implicit ion model of 3SPN.2 (a). For the theoretical calculations, the authors carried out 17-20 REMD simulations for 6s each, with exchanges between replicas attempted every 200ps. The melting temperature TM increases monotonically as a function of salt concentration for both the absorption experiments and the simulations. TM differs by < 3% between implicit ion 3SPN.2 simulations and absorption experiments for all salt concentrations. Adapted from Wang et al. [8].

1.4 DNA Nanotechnology

DNA junctions have found novel applications in the field of nanotechnology. A popular method of developing nanotechnology is self-assembly, by which nanoparticles with binding affinity are brought together to form nanostructures. Because of DNA’s ‘sticky-ended cohesion’ binding mechanism, it has proved useful in building theoretical and experimental frameworks for self-organizing DNA-based materials such as lattices, nanotubes, and programmable chemical reaction networks. In 1998, Winfree et al. [29] fabricated 2D DNA junction arrays using two-unit and four-unit polymers, characterizing their constructs using gel electrophoresis and observing the patterned crystals using atomic force microscopy (AFM). In 1999, Mao et al. [30] synthesized 1D and 2D DNA lattices using rhombus-like molecules consisting of four junctions, and subsequently visualized these arrays by AFM. Benson et al. [31] demonstrated a method to generate arbitrarily shaped scaffolded DNA nano-structures, and visualized these 2D sheets in a 10mM MgCl2 buffer using AFM [See Figure 1.15 for depictions of 2D DNA junction nanolattices from these studies].

Figure 1.15: Two-dimensional DNA nanolattices made up of four-way junction monomers, visible through AFM. These lattices can be used as supports for nanostructures such as water purification systems [32], controllable membrane channels [33], and molecular electronic or plasmonic circuits [34]. The first diagram, from [31], shows a schematic of DNA tile assembly. The following two pictures, from [30] and [29], respectively, are AFM images of constructed DNA nanolattices.

DNA junctions can also be used in nanotiles which can self-assemble into tubes with lengths on the order of micrometers. In 2013, Zhang et al. [37] explored approaches to trigger DNA nanotube growth made of DNA double-crossover tiles containing a rigid core of two parallel DNA duplex helices, verifying their predictions on self-assembly and exploring kinetic models of catalyzed tube formation through time-lapse TIRF (total internal reflection fluorescence) microscopy and AFM. In 2017, Jorgenson et al. [38] used AFM, transmission electron microscopy (TEM), and fluorescence microscopy to visualize the architecture of various assembled DNA nanotube structures. Figure 1.16 shows schematics used to develop DNA nanotubes created from tiles consisting of two DNA junctions with sticky ends.

Figure 1.16: DNA nanotubes made up of monomers which include four-way junctions to increase the nanotube’s stability. The first two pictures are from Zhang et al. 2013 [35] and the third picture is from Jorgenson et al. 2017 [36].

It has been verified experimentally that DNA junctions can be integrated into chemical reaction networks which perform various computational and nanomechanical functions. In 2018, Cherry et al. [39] used branch-migration domains as subunits in a winner-takes-all neural network that identifies the number represented by a hand-written digit with high precision. A summary of the development of DNA nanotechnology can be found in [40] and references within. Understanding the mechanisms by which DNA junctions melt and undergo conformational transitions, and the environmental conditions which determine their conformational specificity, melting temperature, and dynamics, is important for pushing this burgeoning field forward.

1.5 Motivation and Objectives

DNA junction dynamics and junction-protein interactions have been thoroughly investigated using experimental methods as well as all-atom and coarse-grained MD simulations of junctions of various sizes and mobilities in various solvent environments. In this thesis, we investigate the dynamics of a mobile and an immobile DNA junction using a coarse-grained explicit ion model for DNA by simulating DNA junctions at a range of temperatures and in various ionic concentrations on timescales large enough to gather statistics on equilibrium fluctuations, melting dynamics, and junction migration. In Chapters 3.1 and 3.3 we analyze the equilibrium structure at Na+ and Mg2+ concentrations around physiological levels. We detect preferential ion binding at the junction core and calculate junction isoform population distributions and transition rates to compare with ion binding and isoform population distribution data present in the literature. In a multidisciplinary collaboration, researchers use experimental techniques to probe the melting process in parallel with our simulations. Chapter 4 elucidates the dynamic process of junction melting by comparing fluorescence melting data on the stability of specific bases with data from our simulations. Finally, we simulate junction migration to show that the explicit ion 3SPN.2 model returns qualitatively consistent results on the salt-concentration dependence of migration probability. From our data, we highlight the advantages and potential shortcomings of the 3SPN.2 coarse-grained explicit ion model for simulating DNA junctions and offer useful insights for future computational and experimental studies of DNA junctions.

Chapter 2

Model and Simulations

“The notion that a numerical result should depend on the relation of object to observer is in the spirit of physics in this century and is even an exemplary illustration of it.” Benoit Mandelbrot, The Fractal Geometry of Nature, 1982

In 1953, Metropolis et al. performed the first simulation of a liquid using the Metropolis Monte Carlo technique at Los Alamos National Laboratory, on what was then one of the most powerful computers available [37]. In 1957, Alder and Wainwright simulated a system of hard spheres by integrating classical equations of motion [38], and in 1964, Rahman solved the equations of motion for a set of Lennard-Jones particles [39]. Since this groundwork, biologists’ interest in MD has surged due to its usefulness in explaining links between biomolecular structure and function. By simulating over relevant timescales, researchers explore how macroscopic and microscopic structure contribute to functional mechanisms. MD simulations integrate the equations of motion of a configuration of particles according to forces calculated from locations, masses, charges, and interaction potential gradients. Simulating many-particle systems is expensive because the cost of evaluating all pairwise interactions scales quadratically with the number of particles. Coarse-graining, by modeling groups of atoms as one particle, makes it possible to simulate large systems for timescales that are intractable with atomistic models. Coarse-grained models have drawbacks: they cannot resolve details smaller than the particle discretization, and they must be parameterized to match system characteristics that can be arduous to determine experimentally. Biological systems are usually immersed in solvents, most commonly water and ions, which can further increase simulation time and space requirements.
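As a minimal illustration of what integrating the equations of motion involves, the following Python sketch advances a set of particles with the velocity-Verlet scheme. It is not the integrator configuration used in this thesis; the harmonic test force, the time step, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def velocity_verlet(pos, vel, mass, force, dt, n_steps):
    """Advance positions and velocities with the velocity-Verlet scheme.

    pos, vel : (N, 3) float arrays; mass : (N,) array;
    force    : callable mapping positions to an (N, 3) array of forces.
    """
    pos = pos.copy()
    vel = vel.copy()
    f = force(pos)
    for _ in range(n_steps):
        vel += 0.5 * dt * f / mass[:, None]   # first half kick
        pos += dt * vel                        # drift
        f = force(pos)                         # forces at the new positions
        vel += 0.5 * dt * f / mass[:, None]   # second half kick
    return pos, vel

# Hypothetical test system: independent particles in a harmonic well.
def harmonic_force(pos, k=1.0):
    return -k * pos

def total_energy(pos, vel, mass, k=1.0):
    kinetic = 0.5 * np.sum(mass[:, None] * vel**2)
    potential = 0.5 * k * np.sum(pos**2)
    return kinetic + potential
```

Each step requires one force evaluation, which is where the quadratic cost of pairwise interactions enters for a many-particle system.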

2.1 Coarse-grained DNA Model

To simulate DNA junctions we use 3SPN.2, a coarse-grained three-site per nucleotide model for DNA developed by the de Pablo group at University of Chicago [15, 40, 41]. The 3SPN.2 model represents each nucleotide with three particles: one particle represents the phosphate group (PO4^3−), another represents the backbone sugar (C5H10O4), and another represents the nucleoside, each placed at the center of mass of the corresponding moiety [See Figure 2.1]. Using a top-down parameterization relying on experimentally measured duplex and hairpin melting temperatures, base-step and base-stacking free energies, and equilibrium bond lengths, bend angles, and dihedral angles, this model captures properties of B-DNA such as sequence and salt-concentration dependent duplex and hairpin melting temperatures, major and minor groove widths, and persistence length of single-stranded and double-stranded DNA. With anisotropic potentials between force sites, this model can accurately simulate DNA to study DNA hybridization, DNA-protein binding, and nano-engineered DNA-hybrid materials such as DNA origami and DNA liquid crystals [15].

Figure 2.1: All atom (left) to coarse grained (right) representation of a 10 base pair long strand of B-DNA using the 3SPN.2 model. The middle image shows the mapping of the coarse-grained model on top of the atomistic model, with the coarse-grained force sites placed at the center of mass of the corresponding moiety. Gold particles represent phosphate group sites, yellow particles represent sugar sites, and purple and blue particles represent nucleoside sites. Adapted from [42].
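The center-of-mass mapping from atoms to force sites can be sketched in a few lines of Python. This is an illustrative sketch only: the grouping of atom indices into sites is supplied by the caller, and the function names and the site labels in the usage note are hypothetical rather than part of the 3SPN.2 distribution.

```python
import numpy as np

def center_of_mass(coords, masses):
    """Mass-weighted mean position of one group of atoms."""
    coords = np.asarray(coords, dtype=float)
    masses = np.asarray(masses, dtype=float)
    return (masses[:, None] * coords).sum(axis=0) / masses.sum()

def coarse_grain(coords, masses, groups):
    """Map atoms to coarse-grained sites.

    groups : dict mapping a site label to the list of atom indices it replaces.
    Returns a dict mapping each site label to its center-of-mass position.
    """
    coords = np.asarray(coords, dtype=float)
    masses = np.asarray(masses, dtype=float)
    return {label: center_of_mass(coords[idx], masses[idx])
            for label, idx in groups.items()}
```

For one nucleotide, `groups` would hold three entries, for example `{"P": [...], "S": [...], "B": [...]}`, with the atom indices of the phosphate, sugar, and nucleoside moieties.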

Potentials, parameterized from experimental data, are categorized as either bonded or non-bonded. Bonded potentials include harmonic and anharmonic linear bond potentials, harmonic angle potentials, and Gaussian well dihedral potentials. Melting temperatures Tm for various B-DNA sequences were obtained via UV absorbance measurements, and inter-strand non-bonded interactions were adjusted until the free energies of hybridized and dehybridized states were equal at Tm. Intra-strand base stacking free energies were measured by nicking DNA duplexes and examining their relative electrophoretic mobility, and stacking potential strengths were adjusted until the simulated free energy of stacking agreed with the experimental data. Non-bonded potentials consist of excluded volume contributions, intra-strand base stacking, inter-strand cross-stacking, base pairing interactions, and the electrostatic potential. Equilibrium bond lengths, bend angles, and dihedral angles are obtained from the B-DNA fiber crystal structure [41]. Relative entropy coarse graining, in which “reference all-atom simulations are targeted in an iterative approach to find the effective coarse-grained potential that preserves the information in the all-atom ensemble” [15], was used to model the pair correction and the bond and bend angle potentials. Excluded volume is modeled by a repulsive Lennard-Jones potential of the form

$$ U_{exc} = \sum_{i<j} \begin{cases} \epsilon_r \left[ \left( \dfrac{\sigma_{ij}}{r_{ij}} \right)^{12} - 2 \left( \dfrac{\sigma_{ij}}{r_{ij}} \right)^{6} \right] + \epsilon_r, & r_{ij} < r_c \\ 0, & r_{ij} \geq r_c \end{cases} \tag{2.1} $$

The ion-ion interaction potential is

$$ U_{ion-ion}(r_{ij}) = \frac{q_i q_j}{4 \pi \epsilon_0 \epsilon(T) r_{ij}} + U_{corr}(r_{ij}) \tag{2.2} $$

“where $q_i$ and $q_j$ are the charges of the i-th and j-th ions, $\epsilon_0$ is the dielectric permittivity of vacuum, $\epsilon(T)$ is the solution dielectric, and $r_{ij}$ is the intersite separation” [41].

This model was previously used by Prytkova et al. [43] to simulate DNA melting in small-molecule-DNA-hybrid dimer structures (SMDHs). This model has also been coupled with the mW-ion model by Demille et al. [42] to model DNA with explicit solvation by water and ions. A previous version of the 3SPN.2 model, which included ions implicitly in the potentials rather than representing them explicitly, was used by Wang et al. [8], who validated it as a model that accurately reproduces experimental data. Their paper confirms the implicit ion model’s ability to reproduce experimental results about DNA junctions, including the preference for a stacked isoform at high salt, the existence of a square-planar intermediate between stacked states, and salt concentration dependent melting temperatures. Figure 1.14 shows junction melting data from the implicit ion 3SPN.2 model at various salt concentrations and from absorption experiments. Part of the work described in this thesis verifies that the 3SPN.2 explicit ion model returns melting dynamics and isoform distributions for DNA junctions similar to those of the 3SPN.2 implicit ion model.

All simulations were performed with serial computation using LAMMPS (Large-scale Atomic/Massively Parallel Simulator), an open-source, general purpose MD simulation engine developed and distributed by Sandia National Laboratories. We input the initial configuration, temperature, a thermostat random seed to ensure trajectories do not replicate data, 3SPN.2 pair coefficients and potentials, neighbor list parameters, initial velocity distributions, time step, simulation length, output file formats, and output rate.
Output data include thermodynamic information such as temperature, kinetic, potential and total energy, pressure, and volume. For our analyses we investigate the trajectories, which capture the positions of all particles in the system at each snapshot.
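To make the two pair potentials above concrete, the following Python sketch evaluates the excluded volume term of Equation 2.1 and the Coulomb part of Equation 2.2 for a single pair. It is a sketch under stated assumptions: energies are in kcal/mol and distances in Å, the correction term U_corr is omitted, the dielectric is passed in as a plain number rather than computed from temperature, and the function names are hypothetical.

```python
import numpy as np

# Assumed unit convention: e^2 / (4 pi eps0) expressed in kcal mol^-1 Angstrom.
COULOMB_KCAL = 332.0637

def excluded_volume(r, sigma, eps_r, r_cut):
    """Repulsive Lennard-Jones pair energy of Equation 2.1 for one pair."""
    r = np.asarray(r, dtype=float)
    x6 = (sigma / r) ** 6
    energy = eps_r * (x6**2 - 2.0 * x6) + eps_r
    return np.where(r < r_cut, energy, 0.0)   # zero at and beyond the cutoff

def coulomb(r, q_i, q_j, dielectric):
    """Coulomb term of Equation 2.2, without the correction U_corr."""
    return COULOMB_KCAL * q_i * q_j / (dielectric * np.asarray(r, dtype=float))
```

With `r_cut` equal to `sigma`, the shifted form goes to zero continuously at the cutoff, since the bracketed term equals −1 at r = σ.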

2.2 Molecular Simulations

We simulated three different ensembles of DNA junctions:

1. An immobile junction ensemble at constant temperature, to determine junction isoform population distributions at various salt concentrations.
2. An immobile junction ensemble at physiological salt concentration above the melting temperature, to describe the dynamics of the melting process and compare with fluorescence melting data.
3. A mobile junction ensemble to investigate the explicit ion 3SPN.2 model’s capacity to predict the dependence of junction mobility on salt concentration.

For the equilibrium and melting ensembles, we simulated a 34 base pair junction J3 previously studied with the 3SPN.2 implicit ion model by Wang et al. [8] and in fluorescence melting experiments by Savage, McDonald and Litke [4,44,45]. The designation J3 refers to the choice of the base sequence at the core of the junction, which determines the tendency for the junction to stack. The immobile junction comprises four 34-base strands, for a total of 136 bases and 404 force sites.

Immobile junction sequence

Strand   Sequence (5’-3’)
B        CCTCCGTCCTAGCAAGGGGCTGCTACCGGAAGGG
H        CCCTTCCGGTAGCAGCCTGAGCGGTGGTTGAAGG
R        CCTTCAACCACCGCTCAACTCAACTGCAGTCTGG
X        CCAGACTGCAGTTGAGTCCTTGCTAGGACGGAGG

Table 2.1: Sequences of strands that comprise the immobile junction.
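The strand sequences in Table 2.1 can be checked for the pairing pattern expected of a four-way junction, in which the 3’ half of each strand is the reverse complement of the 5’ half of the next strand. The Python sketch below performs this check and counts force sites. The strand order B, H, R, X and the site-count rule (three sites per nucleotide, minus one terminal phosphate per strand) are assumptions that happen to reproduce the 404 sites quoted above; they are not taken from the simulation input files.

```python
STRANDS = {
    "B": "CCTCCGTCCTAGCAAGGGGCTGCTACCGGAAGGG",
    "H": "CCCTTCCGGTAGCAGCCTGAGCGGTGGTTGAAGG",
    "R": "CCTTCAACCACCGCTCAACTCAACTGCAGTCTGG",
    "X": "CCAGACTGCAGTTGAGTCCTTGCTAGGACGGAGG",
}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def arms_are_paired(strands, order=("B", "H", "R", "X")):
    """Check that the 3' half of each strand pairs with the 5' half of the next."""
    for i, name in enumerate(order):
        nxt = order[(i + 1) % len(order)]
        half = len(strands[name]) // 2
        if strands[name][half:] != reverse_complement(strands[nxt][:half]):
            return False
    return True

n_bases = sum(len(seq) for seq in STRANDS.values())
# Assumption: three sites per nucleotide, minus one terminal phosphate per strand.
n_sites = 3 * n_bases - len(STRANDS)
```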

For the equilibrium ensemble, we solvated the junction in a 200 Å box with periodic boundary conditions at 16 different combinations of Mg2+ and Na+ concentrations. The salt concentrations and the number of solvating Mg2+, Na+, and Cl− ions at each of these concentrations are given in Table 2.2. At each of these 16 salt concentrations we simulated 10 different trajectories for 1 µs, starting from the open configuration.

Number of Mg2+, Na+ and Cl− particles at each salt concentration

[Mg2+] (mM)   [Na+] = 37.3 mM   [Na+] = 127.2 mM   [Na+] = 227.2 mM   [Na+] = 327.3 mM
0             0, 180, 48        0, 613, 481        0, 1095, 963       0, 1577, 1445
0.8           4, 180, 56        4, 613, 489        4, 1095, 971       4, 1577, 1453
9.9           48, 180, 144      48, 613, 577       48, 1095, 1059     48, 1577, 1541
49.8          240, 180, 528     240, 613, 961      240, 1095, 1443    240, 1577, 1925

Table 2.2: Salt concentrations and number of solvating particles at each concentration for the equilibrium ensemble. Each entry lists the number of Mg2+, Na+, and Cl− particles, in that order.

Particle velocities were thermostatted at each timestep using a Langevin thermostat at 310 K with a damping parameter of 400 time units, in the Gronbech-Jensen/Farago formulation, so that the total force on each atom has the form

$$ F = F_c + F_f + F_r, \qquad F_f = -\frac{m}{\mathrm{damp}}\, v, \qquad F_r \propto \sqrt{\frac{k_B T m}{dt \cdot \mathrm{damp}}} \tag{2.3} $$

where $F_c$ is the conservative force computed from the interaction potentials, $F_f$ is a frictional drag proportional to the particle velocity, and $F_r$ is a random force. Langevin dynamics mimics the quasi-random movement of particles within a fluid due to collisions. Since our simulation box contains no water particles, the Langevin thermostat supplies solvent-induced diffusion dynamics, i.e. a thermal noise that accounts for the missing solvent. All force site coordinates were unwrapped, so if a particle passed through a periodic boundary, its position was logged as if it had not been wrapped back into the periodic box. Particle positions were saved every 20,000 timesteps, corresponding to 0.2 ns of simulated time between trajectory snapshots.

For the melting ensemble, we first simulated the same junction as in the equilibrium ensemble at a range of temperatures below and above the known melting point to determine the best temperatures at which to analyze melting dynamics. For each temperature from 320 K to 380 K in steps of 5 K we equilibrated 20 replicas of the open conformation of J34 in a 200 Å box with periodic boundary conditions at [Na+] = 177.2 mM (854 Na+ and 722 Cl− sites) and [Mg2+] = 0 mM for 20 ns. Then we heated the system by increasing the thermostat temperature and simulated an additional 1 µs. At each of these temperatures we calculated the mean fraction of broken bases and plotted it versus temperature [See Figure 2.2]. From this curve we determined that temperatures above 355 K have a mean fraction of broken bases above 0.5.
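The ion counts quoted above and in Table 2.2 can be cross-checked against the stated concentrations and against charge neutrality. The Python sketch below converts a particle count in a cubic box to a millimolar concentration and sums the ionic charge. The function names are hypothetical, and the interpretation of the net charge is an assumption: every entry in Table 2.2 carries a net ionic charge of +132, which would balance one negative charge on each of 132 phosphate sites (136 nucleotides minus one terminal phosphate per strand).

```python
AVOGADRO = 6.02214076e23        # particles per mole
LITERS_PER_A3 = 1.0e-27         # one cubic Angstrom in liters

def millimolar(n_ions, box_length_angstrom=200.0):
    """Concentration in mM of n_ions particles in a cubic box."""
    volume_liters = box_length_angstrom**3 * LITERS_PER_A3
    return 1000.0 * n_ions / (AVOGADRO * volume_liters)

def net_ion_charge(n_mg, n_na, n_cl):
    """Total charge of the ions in units of the elementary charge."""
    return 2 * n_mg + n_na - n_cl
```

For example, `millimolar(240)` gives roughly 49.8 mM, matching the highest Mg2+ concentration in Table 2.2, and `net_ion_charge(0, 854, 722)` gives 132 for the melting ensemble.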
Then we launched 100 simulations at each of 4 different temperatures (360, 365, 370, and 375K) above the melting point, using the same simulation protocol as previously; the simulations begin with a 20ns equilibration at 310K, followed by 0.5µs at the higher temperature. Data from melting simulations was compared with data from fluorescence melts of DNA junctions formed by strands with the same sequences we simulated.

Using fluorescent nucleotide analogs inserted in place of regular nucleotides at specific bases of interest along the junction, fluorescence melt data shows the relative stability of specific bases [See Chapter 4 and figures therein].

Figure 2.2: Melting simulation protocol and preliminary data to determine optimal temperatures to simulate melting dynamics. For the blue dotted curve, 20 replicas were launched at temperatures from 320 to 380K in steps of 5K. For the orange curve, we simulated 100 replicas at temperatures from 360 to 380K in steps of 5K. Our analysis of melting dynamics is carried out using these higher temperatures.
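The construction of a melting curve like the one in Figure 2.2 can be sketched as follows. This is an illustrative sketch, not the analysis script used for the figure: the distance-cutoff criterion for a broken base pair, the cutoff value passed by the caller, and the function names are all assumptions.

```python
import numpy as np

def fraction_broken(pair_distances, cutoff):
    """Fraction of base pairs whose separation exceeds a cutoff, per frame.

    pair_distances : (n_frames, n_pairs) array of base-pair distances.
    """
    broken = np.asarray(pair_distances, dtype=float) > cutoff
    return broken.mean(axis=1)

def melting_temperature(temperatures, mean_fraction_broken, threshold=0.5):
    """Linearly interpolate the temperature where the curve crosses threshold."""
    t = np.asarray(temperatures, dtype=float)
    f = np.asarray(mean_fraction_broken, dtype=float)
    above = np.nonzero(f >= threshold)[0]
    if above.size == 0 or above[0] == 0:
        raise ValueError("curve does not cross the threshold inside the range")
    hi = above[0]
    lo = hi - 1
    return t[lo] + (threshold - f[lo]) * (t[hi] - t[lo]) / (f[hi] - f[lo])
```

Averaging `fraction_broken` over frames and replicas at each temperature gives one point of the curve; `melting_temperature` then locates the crossing of 0.5 between neighboring temperatures.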

For the mobile junction ensemble, we use 44 base pair DNA sequences [See Table 2.3] developed by Wujie Wang to study extended branch migration through polyAT regions at the junction center, as described by Karymov et al. [21]. At each of the salt concentrations at which we simulated the immobile junction equilibrium ensemble [See Table 2.2], we ran 5 independent simulations which start with 0.05ns of equilibration followed by 1.5 microseconds of production. Trajectory data was dumped every 20,000 timesteps, or 0.2 nanoseconds.

Mobile junction sequence

Strand   Sequence (5’-3’)
B        TAAGCTTGCAAGCATATATATATATATCTCGTAATTTCCGGTTA
H        TAACCGGAAATTACGAGATATATATATAGATGCATGCAAGCTTC
R        GAAGCTTGCATGCATCTATATATATATAATACGTGAGGCCTAGG
X        CCTAGGCCTCACGTATTATATATATATATATGCTTGCAAGCTTA

Table 2.3: Sequences of strands that comprise the mobile junction. The poly-AT region at the center of each strand allows this junction to exchange base pairs along strands more easily than the immobile junction, which contains both purine (A and G) and pyrimidine (C and T) base pairs at the junction center.

2.3 Hidden Markov models and our implementation

Hidden Markov models describe systems with internal states governed by the “limited horizon assumption” that the following state depends only on the previous state. A simple illustration is a coarse model for weather, which can have one of three possible states: rainy, cloudy, or sunny. Tomorrow’s weather depends on today’s; if today is cloudy we have a 50% chance of rain tomorrow, a 25% chance of sun and a 25% chance of another cloudy day. These transition probabilities, together with the probabilities for tomorrow’s weather given that today is rainy or sunny, compose a transition matrix T which determines weather evolution. A cloudy day today, represented by the unit vector [0 1 0], multiplied by T returns a vector [0.5 0.25 0.25] representing tomorrow’s weather possibilities. Continued matrix multiplication operations yield the steady state describing the normalized probability of each weather condition after infinite time. The steady state can also be calculated directly from the transition matrix with the equation

$$ \left( T^{\top} - I + \mathbf{1} \right)^{-1} \cdot v \tag{2.4} $$

where I is the identity matrix, 1 is the matrix of all ones, and v is a vector of ones with the same dimension as T.

These models are useful for classification of molecular structures because they can “automatically identify key conformational states in a way that is unbiased, human-readable, convenient, and rigorous” [46]. A literature review of hidden Markov model applications to biophysics is available from Shukla et al. [46]. In 2006, McKinney et al. [47] used Markov models to analyze FRET signals from fluorescent labels placed at the ends of the junction’s arms, showing DNA junction isomerization. These models were trained to recognize a two-state system with Gaussian emissions, and identified state probabilities, transition rates, and emission spectra for each isomerization state. Okamoto et al. [11] also used Markov models applied to single-molecule FRET trajectories of mobile junctions to study junction migration dynamics. In 2002, Thayer et al. [48] implemented Markov models “based on probabilistic roll/tilt dinucleotide models of sequence-dependent DNA structure” to classify protein binding sites. They then tested the trained models’ ability to predict the potential for binding sites in unknowns. As Shukla et al. note, “there are still challenges with identifying the best decomposition of conformational space...and connecting [Markov models] to experimental data” [46]. In our discussion of the data output by our Markov models, we consider the limitations and errors imposed by using this method to analyze our results.
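The weather example and the direct steady-state calculation can be checked numerically. The Python sketch below solves for the stationary distribution in the form (Tᵀ − I + 1)⁻¹ v, with 1 the matrix of all ones and v a vector of ones, and compares it with repeated matrix multiplication. Only the cloudy row of the transition matrix comes from the text; the rainy and sunny rows are hypothetical values chosen for illustration, and the function name is an assumption.

```python
import numpy as np

def stationary_distribution(T):
    """Stationary row vector pi with pi @ T = pi and sum(pi) = 1."""
    T = np.asarray(T, dtype=float)
    n = T.shape[0]
    A = T.T - np.eye(n) + np.ones((n, n))
    return np.linalg.solve(A, np.ones(n))

# States ordered (rainy, cloudy, sunny); each row sums to one.
T = np.array([
    [0.50, 0.30, 0.20],   # rainy  -> hypothetical values
    [0.50, 0.25, 0.25],   # cloudy -> from the weather example
    [0.20, 0.30, 0.50],   # sunny  -> hypothetical values
])

tomorrow = np.array([0.0, 1.0, 0.0]) @ T       # one step from a cloudy day
pi = stationary_distribution(T)
pi_power = np.linalg.matrix_power(T, 200)[1]   # repeated multiplication
```

The formula works because the stationary vector satisfies (Tᵀ − I)π = 0 while the all-ones matrix maps any normalized π to the vector of ones, so the sum of the two conditions is a single solvable linear system.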

For our implementation of hidden Markov state models, we used the Mathematica function EstimatedProcess, which accepts two inputs: the time series of measurements (angle data, distance data, or both [See Chapters 3.2.2 and 3.2.1 for how we take these measurements]) from every simulation at a specific salt concentration, supplied as a table of TemporalData, and a HiddenMarkovProcess. Prior to running EstimatedProcess, we specify the number of states and the functional form of the emission probability densities in the HiddenMarkovProcess. These emission probability densities can take any functional form, the most common being exponential and Gaussian. Our emission probability densities are encoded as a ProductDistribution composed of either four (when we analyze angle data or distance data) or eight (when we analyze angle and distance data together) NormalDistributions. Thus each “state” [see Figure 3.3 and Chapter 3.2.4 for the explanation of how we label and classify these states] has either four or eight Gaussian emission probability densities. The task of the training script is to learn the emission probability density parameters by maximizing the log-likelihood of the input sequence of observations. After training, the code returns the transition matrix between states and the mean and standard deviation of each Gaussian. From the transition matrices and Equation 2.4 we derive the stationary state of the system, which constitutes the isoform population distributions at each salt concentration. A previous application of Markov models to DNA junction MD simulations, described by Hyeon et al. [25], uses the number of states as a variable parameter; here, we classify three states (Iso1, Open, Iso2) as in Wang et al. [8], for simplicity of analysis and to compare this analysis with the other classification methods outlined in Chapters 3.2.2 and 3.2.1.

Chapter 3

Equilibrium Structure and Dynamics

“Under normal conditions the research scientist is not an innovator but a solver of puzzles, and the puzzles upon which he concentrates are just those which he believes can be both stated and solved within the existing scientific tradition.” Thomas S. Kuhn, The Structure of Scientific Revolutions, 1962

3.1 Radial distribution function

The radial distribution function, or density-density correlation function, shows the probability that a particle from one chemical species lies a distance r from a particle of another species, and is calculated by:

$$ g(r) = \frac{V}{2 \pi r^2 N^2} \left\langle \sum_i \sum_j \delta(r - r_{ij}) \right\rangle \tag{3.1} $$

where V is the simulation box volume, N is the number of atom pairs, the sums are over all the atoms in each species, and $r_{ij}$ is the distance between the i-th atom in the first species and the j-th atom in the second species. We split the phosphates into two “species” to investigate preferential ion binding to the junction center. Figures 3.1 and 3.2 show g(r) for Na+ and phosphates and for Mg2+ and phosphates at physiologically relevant salt concentrations ([Na+] ≥ 100 mM).
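In practice the delta functions in the definition of g(r) are replaced by a histogram of pair distances. The Python sketch below is one conventional way to do this for a single snapshot of two species in a cubic periodic box. It is not the analysis code used for the figures in this chapter: it uses the minimum-image convention and the standard shell-volume normalization, for which g(r) tends to 1 for uncorrelated particles, whereas our figures report g(r) as a molar density. The function name and arguments are assumptions.

```python
import numpy as np

def radial_distribution(pos_a, pos_b, box_length, r_max, n_bins):
    """Histogram estimate of g(r) between two species in a cubic periodic box.

    Assumes r_max <= box_length / 2 so the minimum-image convention is valid,
    and that the two species are distinct (no self pairs).
    """
    pos_a = np.asarray(pos_a, dtype=float)
    pos_b = np.asarray(pos_b, dtype=float)
    delta = pos_a[:, None, :] - pos_b[None, :, :]
    delta -= box_length * np.round(delta / box_length)   # minimum image
    dist = np.linalg.norm(delta, axis=-1).ravel()

    counts, edges = np.histogram(dist, bins=n_bins, range=(0.0, r_max))
    shell_volume = 4.0 / 3.0 * np.pi * (edges[1:]**3 - edges[:-1]**3)
    pair_density = len(pos_a) * len(pos_b) / box_length**3
    r_mid = 0.5 * (edges[1:] + edges[:-1])
    return r_mid, counts / (shell_volume * pair_density)
```

Averaging the returned histogram over trajectory snapshots gives the ensemble estimate; multiplying by the bulk ion density converts it to a local density profile.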


Figure 3.1: As r approaches the box size (200 Å), g(r) approaches the ionic concentration. For √3 × 200 Å > r > 200 Å, g(r) approaches zero because no ions exist outside the corner-to-corner distance. The radii of the peaks of g(r) remain constant across salt concentrations and pair selections, while the peak heights vary. g(r) decreases as Mg2+ concentration increases as a consequence of Mg2+ ions displacing Na+ ions from phosphate sites. The first peak in g(r) is higher for phosphates at the center of the junction, consistent with g(r) calculations from Hyeon et al. [25] [See Figure 1.13].

Figure 3.2: The first peak in the Mg2+ g(r) begins at a larger radius than the first peak of the Na+ g(r), most likely because the Lennard-Jones radius of Na+ in 3SPN.2 is 2.494 Å, as opposed to 4.0 Å for Mg2+.

For our calculations we convert from a probability density into a molar density of ions to make a one-to-one comparison between the radial distribution of ions and the total box ion concentration. By observing the radius at the largest peak in g(r), we obtain the most likely distance an ion resides from any given phosphate in the junction, and by integrating the distribution function over the first peak, we obtain the mean number of nearest neighbor ions.

To predict nearest-neighbor solvation numbers, we use the following integration:

$$ N = \int_0^{r_0} 4 \pi \rho r^2 g(r) \, dr \tag{3.2} $$

where g(r) is in units of number density, ρ is the number density of ions in the box, and $r_0$ is the radial distance to the end of the first peak. The mean numbers of ions in the first solvation shell of phosphates at the center of the junction and of phosphates not at the center of the junction are given in Tables 3.1 and 3.2.
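The integral for the first-shell occupancy can be evaluated numerically from a tabulated g(r). The Python sketch below uses the trapezoid rule and treats g(r) as a dimensionless profile multiplied by a number density ρ in Å⁻³. That unit convention, the function name, and the choice of integrating up to the last grid point not exceeding r0 are assumptions made for illustration.

```python
import numpy as np

def first_shell_occupancy(r, g, rho, r0):
    """Trapezoid-rule estimate of N = integral_0^r0 of 4 pi rho r^2 g(r) dr."""
    r = np.asarray(r, dtype=float)
    g = np.asarray(g, dtype=float)
    mask = r <= r0                      # keep grid points inside the first shell
    integrand = 4.0 * np.pi * rho * r[mask]**2 * g[mask]
    widths = np.diff(r[mask])
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * widths))
```

For a uniform profile, g(r) = 1, the result reduces to ρ times the volume of a sphere of radius r0, which gives a quick sanity check on units and grid spacing.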

First solvation shell occupancy for Na+ ions

                 At junction center               Not at junction center
[Mg2+] (mM)      0       1       10      50       0       1       10      50
[Na+] = 10 mM    1.19    1.04    0.34    0.12     0.83    0.76    0.29    0.11
[Na+] = 100 mM   4.74    4.41    1.56    1.38     3.37    3.19    1.23    1.17
[Na+] = 200 mM   7.82    7.56    6.20    4.15     5.30    5.21    4.67    3.44
[Na+] = 300 mM   13.61   13.4    11.27   7.89     9.79    9.64    8.56    6.51

Table 3.1: Number of first nearest Na+ neighbors. g(r) was integrated to 4.3 Å for each salt concentration. More ions reside near phosphates at the center of the junction (roughly 1.1 to 1.5 times as many across the conditions in this table) than near other phosphates located farther from the junction center. As the magnesium concentration is increased, sodium ions are displaced from the center of the junction, and as the sodium concentration is increased, magnesium ions are displaced from the center of the junction.

First solvation shell occupancy for Mg2+ ions

              At junction center         Not at junction center
              [Mg2+] (mM)                [Mg2+] (mM)
[Na+] (mM)    1        10       50       1        10       50
10            0.004    0.324    2.078    0.002    0.206    1.423
100           0.002    0.063    1.801    0.001    0.041    1.195
200           0.001    0.121    1.499    0.000    0.065    1.000
300           0.001    0.099    1.223    0.000    0.053    0.823

Table 3.2: First solvation shell occupancy for Mg2+ ions. g(r) was integrated to 5.3 Å for each salt concentration. As the sodium concentration is increased, magnesium ions are displaced from the first solvation shell at the center of the junction, and as magnesium concentrations increase, the first solvation shell fills up with magnesium ions. At [Na+] = 10 mM and [Mg2+] = 50 mM, the occupancy of the first solvation shell is around 2, which agrees with data from Litke et al. [20] that suggest two divalent cations at the center of the junction are necessary for specific folding of the junction.

3.2 Methods to determine junction isoform

Our goal is to classify microscopic coordinates into macroscopic junction conformations, which have implications for DNA junction biology, biophysics, and nanotechnology, using analyses robust enough to measure the isoform during junction melting. To this end, we tried four criteria to determine junction conformations in thermodynamic equilibrium:
1. Using distances between base pairs on opposing strands at the center of the junction.
2. Using angles between three successive phosphates in the backbone at the center of the junction.
3. Using the RMSD of configurations from an ideal isoform configuration.
4. Using data from the first two methods as input for Markov models to automatically classify conformations.

Figure 3.3: In the Iso1 conformation (a), the distances dTT and dCC are small (< 12 Å), the two dAG distances are large (> 12 Å), B (yellow) and R (green) strands are linear, and H (red) and X (blue) strands are stacked. In the Iso2 conformation (c), the two dAG distances are small, dTT and dCC are large, B and R strands are stacked, and H and X strands are linear. In open conformations, all distances are large and angles between adjacent arms are ≈ 90°. Adapted from Wang et al. [8]

3.2.1 Central base distances

Simulating DNA junctions with the implicit ion 3SPN.2 model, Wang et al. [8] used distances between bases at the junction center as a metric to determine isoform [see above for an outline of this process and Figure 3.4 for a histogram of central base distances and isoform population distributions calculated from this method]. Figure 3.4 shows that the implicit ion 3SPN.2 model predicts the junction preferentially adopts the stacked conformations at high salt, with a 36/58 ratio between Iso1 and Iso2. Their calculations agree with results presented by McKinney et al. [10], who find a 33/77 ratio between stacked isoforms. Transition probabilities calculated from this model show that the open state acts as an intermediate between stacked isoforms. The authors note that this ratio underpredicts the bias toward Iso2 by about 15% when compared to experiments by Ha et al. [10]. They also note that these distances could be accessible experimentally using fluorescent nucleotide base analogs at the junction center.

Since these data were calculated from the implicit ion version of 3SPN.2 on an identical junction sequence, we perform the same analysis on explicit ion model simulation trajectories to see if results are consistent across model versions. Figure 3.5 shows a normalized histogram of the central base distances from our simulations, plotted for each salt concentration. Qualitatively, we observe that at low salt concentration the distributions reflect mostly open junctions, and at higher salt concentrations the peaks reflect stacked junctions becoming more populated.
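The base-distance criterion can be illustrated with a short sketch. It follows the rules summarized in the caption of Figure 3.3 and the 12 Å cut-off; the function names are hypothetical, and assigning every frame that is neither Iso1 nor Iso2 to the open state is an assumption made here for illustration rather than a documented detail of the analysis.

```python
CUTOFF = 12.0  # angstroms, the cut-off distance quoted in the text

def classify_by_distance(d_ag1, d_ag2, d_tt, d_cc, cutoff=CUTOFF):
    """Label one snapshot from its four central base distances."""
    ag_small = d_ag1 < cutoff and d_ag2 < cutoff
    ag_large = d_ag1 >= cutoff and d_ag2 >= cutoff
    ttcc_small = d_tt < cutoff and d_cc < cutoff
    ttcc_large = d_tt >= cutoff and d_cc >= cutoff
    if ttcc_small and ag_large:
        return "Iso1"
    if ag_small and ttcc_large:
        return "Iso2"
    return "Open"  # assumption: everything else counts as open

def isoform_populations(frames):
    """Fraction of snapshots assigned to each isoform."""
    labels = [classify_by_distance(*frame) for frame in frames]
    return {name: labels.count(name) / len(labels)
            for name in ("Iso1", "Open", "Iso2")}

# Toy frames (AG, AG, TT, CC), in angstroms.
frames = [(8, 8, 23, 23), (22, 22, 9, 9), (19, 20, 20, 20)]
populations = isoform_populations(frames)
```

Applied to every snapshot of a trajectory, the same counting yields the population fractions reported for each salt concentration.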

(a) Each successive salt concentration is shifted 0.05 up the y-axis. At the lowest salt concentration, the peak at large d is due to open conformations being dominant. The vertical dotted line indicates the 12 Å cut-off used to differentiate states.

(b) The authors performed 100 additional simulations launched from initial configurations with ratios conforming to the original isoform populations to check if populations are sensitive to initial configurations and found population estimates remained stable within statistical uncertainty.

Figure 3.4: Normalized histograms of central base distances and isoform distributions for junctions simulated using the implicit ion 3SPN.2 model. Data were calculated from 200 µs of simulations at each salt concentration, with simulations launched from the open configuration. Adapted from Wang et al. [8] (Scientific Reports 6:22863, DOI: 10.1038/srep22863).

Captions embedded in the adapted panels:

Figure 5. Criterion for distinguishing junction conformations. Distribution of inter-base separation at the middle of the junction for (a) the AG bases, where a small separation identifies the iso-II conformer, and (b) TT or CC pairs, where a small separation identifies the iso-I conformer. The longer distance peak at low salt concentration is due to open conformations; at higher salt, it arises primarily from the complementary stacked form. The vertical dotted line indicates the cutoff criterion we use to subsequently identify conformational states of individual configurations.

Figure 6. Molecule-to-molecule variations in conformational sampling and conformational transition probabilities. (a) Example time series of junction conformations for five of the 100 ensemble members at [Na+] = 300 mM. The open isoform is short lived, and acts as a transition state between iso-I and iso-II. (b) Matrix of the transition probabilities from a given starting state to final state at [Na+] = 300 mM. The transition probabilities to the same state (diagonal elements) are not shown, since the tendency to remain in the current state dominates the scale of other transition probabilities. Note that the transition probabilities for iso I→II (and vice versa) are nearly zero. (c) Salt concentration dependence of the four key transition probabilities.

Figure 3.5: Normalized histogram of central base distances for all salt concentrations. We plot data from each Mg2+ concentration, with different Na+ concentrations shifted vertically by 0.1, and label the peaks of each graph as Iso1, Iso2, or Open according to rules prescribed in Figure 3.3. The dotted vertical line is the estimated cut-off distance of 12 Å.

[Figure 3.6 plot: color scale from 0 to 1.0; axes [Na+] (mM) and [Mg2+] (mM).]

Figure 3.6: Isoform population distributions from base distances criterion. The shade of red and the inset number correspond to the isoform population calculated using this method at each respective salt concentration.

3.2.2 Consecutive phosphate angles

Observations of higher than expected Iso1 probabilities at high [Na+] led us to double check our results using alternative methods to measure isoform. Since junctions are described by which strands are linear versus stacked, we calculate angles between the three successive phosphates at the center of each strand as an alternative criterion to determine the isoform [See Figure 3.7]. We label the angles by their strands (i.e., H, B, X, R) and plot their normalized distributions in Figure 3.8. The bimodal angle distributions led us to use a cutoff angle method, as previously used with the base distance criterion.

Figure 3.7: Schematic of the angles between successive phosphates measured for this analysis. Phosphate, sugar, and base sites, and the linear bonds between them are visualized. Measuring this angle helps us to visualize junction configurations in a way that we cannot with only the distances between bases at the junction center.

At lower salt concentrations, where stacked isoforms do not dominate, these histograms have a less clear cut-off angle between the two peaks. To validate this method's ability to classify isoforms despite the peaks not being as clear-cut as the peaks in the distance histograms, and to check if the choice of cut-off angle alters isoform population distributions, we measure the isoform population distributions using 5 different angle cutoffs from 110° to 130° in steps of 5°. We observe no qualitative difference in isoform distributions for these different angle cut-offs. These population distributions are similar to those calculated from the base distances criterion, which gives us confidence in the results returned by the base distances criterion. Even though the angles from simulation trajectory data do not have a direct analogy to measurements that can be easily made experimentally, we learn about the junction structure from these data. At high salt concentrations, the distributions reflect a stacked junction, and at low salt concentrations, the distributions reflect a more planar conformation. In our analysis of these data using hidden Markov models, we posit that a tetrahedral structure is a more likely intermediate between stacked conformations than the square planar conformation.
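The angle criterion and the cut-off sweep can be sketched as follows. This is an illustrative sketch with hypothetical function names. The mapping from angle pattern to isoform label (small H and X angles with large B and R angles labeled Iso2, and the reverse labeled Iso1) follows the column labels of Table 3.4 and is an assumption, as is assigning all remaining frames to the open state.

```python
import numpy as np

def bend_angle(p_prev, p_mid, p_next):
    """Angle in degrees at p_mid formed by three successive phosphate positions."""
    a = np.asarray(p_prev, dtype=float) - np.asarray(p_mid, dtype=float)
    b = np.asarray(p_next, dtype=float) - np.asarray(p_mid, dtype=float)
    cos_t = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.degrees(np.arccos(np.clip(cos_t, -1.0, 1.0))))

def classify_by_angle(h, b, x, r, cutoff):
    """Label one snapshot from its four central phosphate angles (degrees)."""
    if h < cutoff and x < cutoff and b >= cutoff and r >= cutoff:
        return "Iso2"  # assumed mapping, following Table 3.4
    if b < cutoff and r < cutoff and h >= cutoff and x >= cutoff:
        return "Iso1"
    return "Open"

# Sweep the five cut-offs used in the text: 110, 115, 120, 125, 130 degrees.
cutoffs = list(range(110, 131, 5))
frames = [(67, 138, 68, 137), (139, 68, 139, 67), (120, 123, 128, 113)]
sweep = {c: [classify_by_angle(*f, cutoff=c) for f in frames] for c in cutoffs}
right_angle = bend_angle((1, 0, 0), (0, 0, 0), (0, 1, 0))
```

For these toy frames the labels do not change across the five cut-offs, which is the kind of insensitivity the sweep is meant to check.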

Figure 3.8: Angles from opposing strands are binned together. In Iso2, B and R arms are stacked and H and X arms are quasi-linear. In Iso1, B and R angles are quasi-linear (> 90◦) and H and X angles are small (< 90◦). Open and tetrahedral configurations exist as intermediates between the stacked states.

[Figure 3.9 plot: color scale from 0 to 1.0; axes [Na+] (mM) and [Mg2+] (mM).]

Figure 3.9: Isoform population distributions from central phosphate angles criterion with a cut-off angle of 130°. These distributions are qualitatively similar to those calculated from the base distances criterion.

3.2.3 RMSD between ideal isoform structures and current snapshot

As a third method to determine the junction configuration, we compare the structure of the junction at each trajectory snapshot to the idealized structure of each isoform. Root-mean-square deviation (RMSD) calculations quantify how much a molecule deviates from a given structure, and are numerically evaluated using

\[ \mathrm{RMSD}(v, w) = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \lVert v_i - w_i \rVert^{2}} \tag{3.3} \]

where v_i is the location of the i-th atom relative to the molecule's origin at time v, w_i is the i-th atom's location at time w, and the sum is over all n simulated particles in the structure. Since molecules rotate and translate during simulation, we transform the junction coordinates onto the idealized isoform structure to minimize the RMSD with respect to the ideal structures. Using the Python package 'RMSD' [49], which implements the Kabsch and quaternion algorithms [50, 51] to find the best transformation to relate two sets of vectors, we calculate the RMSD between the current snapshot and each ideal structure, classifying the isoform at that snapshot as the one with the smallest RMSD value. Isoform distributions calculated from this classification method are shown in Figure 3.10.

This method struggles to produce results that match the literature; open populations are small at low [Na+] and [Mg2+], and stacked Iso2 populations dominate. Some results are consistent with other methods, however; there is an increase in the Iso1 state at [Na+] = 327 mM, which is also visible in population distributions from the angle and base distances methods. We considered center-averaging the RMSD timeseries to remove noise. However, given the success of the previous two methods relative to this one, we did not continue this exploration.
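The alignment and classification step can be sketched with NumPy alone. The sketch below implements the Kabsch algorithm directly instead of calling the 'RMSD' package, so it should not be read as that package's API. The function names and the toy coordinates are hypothetical.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between point sets P and Q (n x 3) after optimal superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    Pc = P - P.mean(axis=0)                 # remove translation
    Qc = Q - Q.mean(axis=0)
    H = Pc.T @ Qc                           # 3 x 3 covariance matrix
    U, S, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T # proper rotation taking Pc onto Qc
    P_rot = Pc @ R.T
    return float(np.sqrt(np.mean(np.sum((P_rot - Qc) ** 2, axis=1))))

def classify_snapshot(snapshot, references):
    """Return the name of the reference structure with the smallest RMSD."""
    return min(references, key=lambda name: kabsch_rmsd(snapshot, references[name]))

# Toy example: a rotated and translated copy of a structure has RMSD near zero.
rng = np.random.default_rng(0)
P = rng.normal(size=(10, 3))
theta = np.radians(30.0)
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
Q = P @ Rz.T + np.array([1.0, -2.0, 0.5])
references = {"Iso1": P, "Iso2": P * np.array([1.0, 2.0, 3.0])}  # toy references
label = classify_snapshot(Q, references)
```

The determinant check guards against the SVD returning a reflection, which would otherwise give an artificially low RMSD for mirror-image structures.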

[Figure 3.10 plot: color scale from 0 to 1.0; axes [Na+] (mM) and [Mg2+] (mM).]

Figure 3.10: Isoform distribution by using RMSD from ideal isoform.

3.2.4 Markov model state classification

In this section, we present the results obtained from training the Markov state models on our angle and distance timeseries. Tables 3.3, 3.4, and 3.5 show the means of the four Gaussian emission probability densities for the Markov models trained on the distance data from bases at the center of the junction, the phosphate angles at the center of the junction, and the combination of the base distances and the phosphate angles. To create these tables, we order states by the first value in the list of Gaussian emission means (AG for the base distances table, H for the angles and distances tables). For the base distances Gaussian emission probabilities, standard deviations were on the order of 1-10 Å, and for the angle Gaussian emission probabilities, standard deviations were on the order of 10-35°. Since each trained model returns a transition matrix, leading to 16 independent transition matrices, we do not plot them here [refer to Chapter 3.3 for our analyses of equilibrium dynamics from transition matrices].

Figures 3.11, 3.12, and 3.13 show isoform population distributions for the base distances, angles, and base distances and angles Markov models, calculated from the transition matrices by Equation 2.4. As a double check that Equation 2.4 works, we also empirically derive the stationary state by feeding the temporal data back into the trained model to classify the isoforms at each timestep, and find that the analytical solution and the empirical evaluation of isoform population distributions agree closely.

There are advantages and drawbacks to using these automated classification methods. First, the states labeled as Open are classified with angles larger than 90°, which points to a tetrahedral form of the junction being a probable transition state between stacked isoforms. This is an advantage of this method; we recover details of the equilibrium structure that we could not have recovered with simple cutoff methods.

A drawback is that some states were "miscategorized" at low salt. At low salt concentrations, when the stacked isoforms do not dominate, we still used the model to classify three states. At [Na+] = 27 mM and 127 mM with [Mg2+] = 0 mM and 1 mM, where we do not expect the Iso1 population to be present, the isoform population distribution predicts a large Iso1 population [See Figures 3.11, 3.12, and 3.13]. Referencing the tables at the concentrations where the population is larger than expected, e.g. [Na+] = 27 mM, [Mg2+] = 0 mM, the mean angles (138, 111, 119, 106) in Table 3.4 and the mean distances (26, 26, 25, 26) in Table 3.3 are indicative of a tetrahedral or square planar conformation rather than a stacked one. Thus we determined that these states were miscategorized; the Markov model determined these as separate states from the open configuration because we told the model to classify three states, and we mislabelled the states as Iso1 because we order and then label the states based on the first of these mean Gaussian emission values.
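The comparison between the analytical and empirical isoform populations can be sketched as follows. Equation 2.4 is not reproduced in this section, so the sketch assumes it is the usual stationarity condition π = πT for a row-stochastic transition matrix T. The matrix below is a hypothetical three-state example, not one of the trained models, and the function names are illustrative.

```python
import numpy as np

def stationary_distribution(T):
    """Left eigenvector of a row-stochastic matrix T with eigenvalue 1."""
    T = np.asarray(T, dtype=float)
    T = T / T.sum(axis=1, keepdims=True)      # guard against rounding in the rows
    evals, evecs = np.linalg.eig(T.T)
    idx = np.argmin(np.abs(evals - 1.0))      # eigenvalue closest to 1
    pi = np.real(evecs[:, idx])
    return pi / pi.sum()                      # normalize (also fixes the sign)

def empirical_populations(state_indices, n_states):
    """Fraction of timesteps spent in each state of a classified trajectory."""
    counts = np.bincount(np.asarray(state_indices), minlength=n_states)
    return counts / counts.sum()

# Hypothetical transition matrix, states ordered (Iso1, Open, Iso2).
T = np.array([[0.90, 0.10, 0.00],
              [0.05, 0.90, 0.05],
              [0.00, 0.10, 0.90]])
pi = stationary_distribution(T)
fractions = empirical_populations([0, 0, 1, 1, 1, 1, 2, 2], n_states=3)
```

For this toy matrix the stationary populations are 0.25, 0.5, and 0.25, and a classified trajectory sampled from the same dynamics would be expected to approach these fractions as it gets longer.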

Markov model mean base distances for trained classes

                       "Iso2"            "Open"            "Iso1"
[Na+] mM, [Mg2+] mM    AG, AG, TT, CC    AG, AG, TT, CC    AG, AG, TT, CC
                       Distance (Å)      Distance (Å)      Distance (Å)
27, 0                  12, 13, 23, 23    19, 20, 20, 20    26, 26, 25, 26
127, 0                 8, 9, 23, 23      18, 19, 22, 23    20, 21, 16, 16
227, 0                 8, 8, 23, 23      17, 18, 21, 21    22, 22, 10, 10
327, 0                 8, 8, 23, 23      17, 18, 20, 21    22, 22, 9, 9
27, 1                  11, 12, 23, 23    19, 20, 19, 20    25, 25, 24, 25
127, 1                 8, 9, 23, 23      18, 19, 21, 22    21, 21, 14, 14
227, 1                 8, 8, 23, 23      17, 18, 20, 21    22, 22, 9, 10
327, 1                 8, 8, 23, 23      16, 17, 21, 22    21, 22, 11, 12
27, 10                 8, 8, 23, 23      18, 19, 19, 20    22, 22, 8, 8
127, 10                8, 8, 23, 23      17, 17, 18, 19    22, 21, 7, 8
227, 10                8, 8, 23, 23      16, 18, 20, 21    22, 22, 9, 9
327, 10                8, 8, 23, 24      15, 16, 21, 21    22, 22, 9, 10
27, 50                 8, 8, 23, 23      17, 18, 19, 19    22, 22, 8, 8
127, 50                8, 8, 23, 23      16, 17, 20, 20    22, 22, 8, 8
227, 50                8, 8, 23, 24      17, 18, 18, 19    22, 22, 8, 8
327, 50                8, 8, 23, 23      17, 18, 18, 19    22, 22, 8, 8

Table 3.3: Means of Gaussian emissions from models trained on distances. States are ordered from left to right by the value of the first mean (the first AG distance).

[Figure 3.11 plot: color scale from 0 to 1.0; axes [Na+] (mM) and [Mg2+] (mM).]

Figure 3.11: Isoform population distributions from base distances Markov models. Take care in reading these; in the table above, states are labeled from left to right as Iso2, Open, Iso1, whereas this plot is in the reverse order.

Markov model mean phosphate angles for trained classes

                       "Iso2"               "Open"                "Iso1"
[Na+] mM, [Mg2+] mM    H, B, X, R           H, B, X, R            H, B, X, R
                       Angle (°)            Angle (°)             Angle (°)
27, 0                  76, 126, 77, 123     83, 115, 135, 112     138, 111, 119, 106
127, 0                 68, 138, 68, 135     83, 122, 122, 121     139, 109, 125, 105
227, 0                 68, 138, 69, 136     108, 125, 115, 127    137, 93, 137, 75
327, 0                 67, 138, 68, 137     113, 125, 119, 117    139, 71, 139, 69
27, 1                  83, 117, 131, 113    93, 132, 73, 126      138, 110, 127, 104
127, 1                 67, 138, 67, 136     111, 129, 121, 116    136, 81, 135, 91
227, 1                 68, 138, 69, 136     112, 123, 121, 120    138, 72, 139, 70
327, 1                 67, 139, 68, 137     105, 128, 116, 128    136, 90, 138, 73
27, 10                 67, 138, 68, 136     118, 122, 124, 115    139, 69, 139, 68
127, 10                69, 138, 70, 136     120, 122, 125, 117    138, 70, 138, 69
227, 10                67, 138, 68, 136     113, 125, 122, 118    139, 72, 139, 70
327, 10                67, 139, 68, 137     107, 132, 118, 123    139, 68, 139, 67
27, 50                 67, 138, 67, 137     120, 124, 124, 116    139, 68, 139, 67
127, 50                67, 138, 67, 137     112, 127, 123, 120    138, 68, 139, 67
227, 50                67, 138, 67, 137     120, 122, 124, 118    139, 67, 139, 66
327, 50                67, 139, 67, 137     120, 123, 128, 113    139, 68, 139, 66

Table 3.4: Mean of Gaussian emission probability distribution from Markov models trained on phosphate angles data. States are ordered from left to right by the value of the first mean (the H angle).

Figure 3.12: Phosphate angles Markov model isoform population distribution. When we take into account the misclassification of states, the model trained on phosphate angles data does capture expected equilibrium structures; open state population decreases with increasing salt concentration.

Markov model mean angles and distances for trained classes

Each entry lists H, B, X, R, AG, AG, TT, CC.

Na, Mg     "Iso2"                           "Open"                               "Iso1"
27, 0      68,137,69,134,10,10,23,23        80,117,117,114,18,19,21,21           133,111,121,106,23,23,22,22
127, 0     67,138,68,135,8,9,23,23          96,127,111,127,16,17,22,22           133,105,132,99,22,22,19,19
227, 0     68,138,69,136,8,8,23,23          110,125,116,122,17,18,21,21          139,77,139,70,22,22,10,10
327, 0     67,139,68,137,8,8,23,23          112,124,118,117,17,18,20,21          139,69,139,68,22,22,9,9
27, 1      69,137,69,134,9,10,23,23         81,119,119,115,18,19,21,21           135,113,121,106,22,22,21,21
127, 1     66,138,67,136,8,9,23,23          110,123,119,119,18,19,21,22          136,90,136,85,21,21,14,15
227, 1     68,138,69,136,8,8,23,23          112,122,120,118,17,18,20,20          138,70,139,68,22,22,9,9
327, 1     67,139,68,137,8,8,23,23          103,130,113,126,15,16,21,22          136,87,138,79,21,21,12,12
27, 10     67,138,68,136,8,8,23,23          117,120,124,113,18,18,19,20          139,68,139,67,22,22,8,9
127, 10    69,138,70,136,8,8,23,23          120,121,124,116,17,17,18,19          138,69,138,68,22,21,8,8
227, 10    67,138,68,136,8,8,23,23          113,124,121,117,17,18,20,20          139,71,139,69,22,22,8,9
327, 10    67,139,67,137,8,8,23,24          105,132,116,123,16,16,21,21          139,68,139,67,22,22,8,9
27, 50     67,138,67,137,8,8,23,23          119,122,123,114,17,18,19,19          139,68,139,66,22,22,8,8
127, 50    67,138,67,137,8,8,23,24          112,126,122,119,16,17,20,20          138,68,139,66,22,22,8,8
227, 50    67,138,67,137,8,8,23,24          120,119,124,115,17,18,19,19          139,67,139,66,22,22,8,8
327, 50    67,139,67,137,8,8,23,23          120,121,127,111,17,18,18,19          139,67,139,66,22,22,8,8

Table 3.5: Mean of Gaussian emission probability distribution from Markov models trained on distances and angles. States are ordered from left to right by the value of the first mean (the H angle).

Figure 3.13: Base distances and angles Markov model isoform population distribution. This model returns the expected result that stacked states are more probable at higher salt concentrations.

3.3 Transitions between conformations

In order to confirm that, as in experiments, the open configuration acts as an intermediate state between the two stacking conformers in our model, we calculate transition rates between conformations for states identified using the phosphate angles isoform criterion and show the transition matrices for high and low salt concentrations [See Figure 3.14]. From these we determine that the addition of ions serves to stabilize the stacked isoforms and that the junction must transition through the open configuration to exchange stacking strands. Our results for transition probabilities between stacked and open states are consistent with results from the implicit ion 3SPN.2 study by Wang et al. [8], who report a ≈ 0.1 transition rate between Iso2 and Open and a ≲ 0.1 transition rate for the Iso1-Open transition at 300 mM [Na+]. Moreover, the transition rates directly between Iso1 and Iso2 are essentially zero, demonstrating that the open state provides the only pathway to move between stacked isoforms. This result also holds for transition matrices derived from isoform timeseries where isoforms were classified using the base distances criterion and the Markov models. Those figures are not shown here for brevity.
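A transition matrix of this kind can be estimated from a classified isoform timeseries by counting transitions between consecutive snapshots and normalizing each row. The sketch below is a minimal illustration under that assumption; the function name and the toy label sequence are hypothetical, and the thesis analysis may differ in details such as the lag between snapshots.

```python
import numpy as np

STATES = ("Iso1", "Open", "Iso2")

def transition_matrix(labels, states=STATES):
    """Row-normalized counts of transitions between consecutive labels."""
    index = {s: i for i, s in enumerate(states)}
    counts = np.zeros((len(states), len(states)))
    for a, b in zip(labels[:-1], labels[1:]):
        counts[index[a], index[b]] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows with no observed transitions are left as zeros.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

# Toy timeseries in which the stacked states only exchange through Open.
labels = ["Iso1", "Iso1", "Open", "Iso2", "Iso2", "Open", "Iso1"]
T = transition_matrix(labels)
```

In this toy sequence the Iso1 to Iso2 and Iso2 to Iso1 entries are exactly zero, mirroring the observation that direct exchange between stacked isoforms is essentially absent.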

Isoform transition matrix, angle criterion, [Na+] 37.3 mM, [Mg2+] 0.0 mM (color scale 0.0 to 1.0)

Initial configuration    Final configuration
                         Iso1     Open     Iso2
Iso2                     0.001    0.385    0.613
Open                     0.056    0.905    0.039
Iso1                     0.518    0.482    0.0

(a) Transition matrix for phosphate angles criterion averaged over all simulations at [Na+] = 27mM, [Mg2+] = 0mM. In order for the junction to exchange stacking partners, the junction has to transition through the open state.

Isoform transition matrix, angle criterion, [Na+] 37.3 mM, [Mg2+] 49.8 mM (color scale 0.0 to 1.0)

Initial configuration    Final configuration
                         Iso1     Open     Iso2
Iso2                     0.0      0.231    0.769
Open                     0.126    0.706    0.168
Iso1                     0.599    0.401    0.0

(b) Transition matrix for phosphate angles criterion averaged over all simulations at [Na+] = 27mM, [Mg2+] = 49mM. At this higher Mg2+ concentration (49mM), the Iso2 state is more stable due to a higher probability of remaining in Iso2 (the Iso2 to Iso2 transition rate).

Figure 3.14: Transition matrices from phosphate angles criteria for low and high salt concentrations. Transition rates between the two stacked isoforms (the upper left and lower right values) are vanishingly small, whereas the values representing transition rates between stacked and open isoforms are nonzero. As Mg2+ concentrations increase, stacked states become more stable.

Chapter 4

Melting Dynamics

"The foundation stones of the material universe remain unbroken and unworn. They continue this day as they were created - perfect in number and measure and weight." Clerk Maxwell, Treatise on Electricity and Magnetism, 1873

Figure 4.1: Five frames from the first 200ns of a single melting simulation, depicting the stages of the melting process. Melting initiates with a bubble of broken bases forming at the center of the junction (1), which then grows outward (2,3), resulting in two arms dissociating from the junction (4). Melting proceeds by the final two intact strands dissociating from each other (5). The phosphates are visualized as one continuous strand. Strand colors are red (H), blue (X), yellow (B) and green (R). This chapter presents data which provide quantitative and qualitative analyses of the junction melting process. Frames were rendered in Visual Molecular Dynamics (VMD) [52].

In this chapter we communicate our findings on the melting dynamics of the J3 junction as observed in our simulations and in experimental measurements carried out by Ishita Mukerji et al. First, we discuss the method by which we determine the dissociation of individual base pairs [See Figure 4.2]. We use the relative "time-to-melt" tm as a proxy for base stability in our simulations; if a base pair takes a longer time to dissociate than other base pairs, that base pair is relatively more stable. As an example of the dynamic process of melting and an introduction to our analysis of the relative stability of whole strands, we plot the times at which individual arms of the junction have dissociated on top of the number of intact base pairs versus time for one simulation [See Figure 4.3]. To investigate the relative stability of strands in the junction, we plot the mean time-to-melt for all base pairs along two opposing strands (H and X). Next, we compare our measurements of the stability of these strands with measurements of melting temperature obtained from fluorescent nucleoside analogs placed at specific locations in the junction and show that our results are consistent with those from the experiment. As a last check that our analysis is consistent, and to provide another structural viewpoint of junction melting, we measure the isoforms of our melting ensemble and show that junctions adopt a stacked conformation prior to melting, with isoform population distributions comparable to those in our equilibrium ensemble at similar salt concentrations. These data show that the 3SPN.2 explicit ion model can replicate experimental data on DNA junction melting and provide insights into DNA junction melting dynamics.

4.1 Method to determine base dissociation

To determine melting times of individual base pairs, with the goal of outlining regions of instability within the junction, we measure the distance between the canonical base pairs on each strand. Using an analysis method developed by Wujie Wang, we denote a canonical base pair as melted if the distance between bases is larger than 50 Å for more than 50 simulation snapshots (6 nanoseconds). The 50 Å distance and 6 nanosecond time are large enough values to ensure that base pairs are not classified as melted during thermal fluctuations. Figure 4.2 shows the distance between three canonical base pairs for the first 500 ns of one simulation in our melting ensemble at 360 K; see the bottom panel for an example of thermal distance fluctuations which are not falsely classified as full base pair dissociation.

Figure 4.3 shows the number of intact base pairs and the dissociation time of each of the arms for one specific simulation at 360 K. In this run, the melting process occurs relatively slowly and on a per-strand basis, whereas in other simulations at higher temperatures, melting occurs rapidly and the dynamics are not as easily visible. Melting begins in this simulation with the dissociation of almost half the base pairs in the junction between 0 and roughly 50 ns, followed by dissociation of the XR arm. From roughly 50 ns to 300 ns, the number of base pairs remains approximately constant. At roughly 300 ns, the BH strand dissociates. In the next 50 ns, the number of intact base pairs drops from roughly 20 to 15, after which the BX strand dissociates. The next 400 ns period ends with dissociation of the RH strand, and the number of intact base pairs drops to zero. This graph and others like it offer insights into the melting dynamics of individual junctions that are not as readily accessible through experimental methods.

Figure 4.2: Left: timeseries of distances from 3 base pairs, with the determined melting time in orange. Right: the locations of the base pairs in J3 which are plotted on the left. At the start of the simulation, the distances are small, corresponding to intact base pairs with fluctuations due to thermal noise. After some time the distance gets large and stays large, corresponding to a dissociated base pair. The analyzed base locations correspond to locations of fluorescent nucleoside analogs used for fluorescence melting experiments.

[Figure 4.3 plot: "360 K, run 6: # of intact base pairs and arm tm"; y-axis: # of intact base pairs (0 to 60); x-axis: time (ns), 0 to 1000; legend: tm of RH, tm of XR, tm of BX, tm of BH.]

Figure 4.3: Number of base pairs and tm by arm for 1 simulation. The solid vertical lines correspond to the time at which the last base pair in each arm has dissociated, and the dotted vertical lines depict the time at which 50% of the base pairs in that arm have dissociated. Initially, the number of base pairs is 64, representing a fully intact junction. After around 1000 ns, the number of base pairs has dropped to zero, with all strands dissociated. We see that the melting of individual arms occurs after a sharp decrease in the number of bonded base pairs (the noisy line going from green to red), which matches our physical intuition.
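The dissociation criterion of Section 4.1 (a base-pair distance above 50 Å for more than 50 consecutive snapshots) can be sketched as follows. The function name is hypothetical. The snapshot spacing of 0.12 ns is inferred from 50 snapshots spanning 6 ns, and reporting the start of the qualifying run as the melting time is an assumption. Returning the simulation length for base pairs that never melt follows the behavior described in Section 4.2.

```python
import numpy as np

def time_to_melt(distances, dt_ns=0.12, d_cut=50.0, min_frames=50):
    """Time (ns) at which a base pair first stays beyond d_cut for > min_frames."""
    d = np.asarray(distances, dtype=float)
    run = 0                                    # length of the current run above d_cut
    for i, above in enumerate(d > d_cut):
        run = run + 1 if above else 0
        if run > min_frames:
            return (i - run + 1) * dt_ns       # assumed: report the start of the run
    return len(d) * dt_ns                      # never melted: simulation length

# Toy trace: intact, a brief thermal excursion, intact again, then dissociated.
trace = np.concatenate([
    np.full(100, 10.0),   # intact base pair
    np.full(5, 60.0),     # short fluctuation, too brief to count as melting
    np.full(20, 10.0),
    np.full(200, 80.0),   # dissociated for good, starting at snapshot 125
])
tm = time_to_melt(trace)
tm_intact = time_to_melt(np.full(100, 10.0))
```

The short excursion above 50 Å is ignored because it lasts only 5 snapshots, which is the role the persistence requirement plays in avoiding false positives from thermal fluctuations.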

4.2 Preferential melting

To investigate the dynamics of melting and the stability of the strands and arms of the junction, we plot the mean base pair dissociation time for junction simulations at 360 K, which we use as a metric for junction base pair stability [See Figure 4.4]. Because the X strand has two fewer hydrogen bonds (88) than the H strand (90) when all canonical Watson-Crick base pairs are intact, and based on theoretical calculations of strand melting temperatures [See Figure 4.6], we expect the X strand to be less stable than the H strand, and calculations from our models show this to be the case. We also calculated the average distance between base pairs before each base pair has melted at the four highest temperatures we simulated, and see that the ends and center of each arm have higher base distance fluctuations prior to melting than the middle of each arm. These results on the instability of the junction center and the ends of the arms are consistent with experimental data from Wujie Wang, Rachel Savage, and Julie McDonald [44, 45], whose data are not shown here but show that melting occurs either from the outside of the strands inwards or from the center of the junction outwards. Our images of mean base pair melting times show that the center of the junction dissociates, on average, before the ends of the arms, and also that the X strand is less stable (i.e. has a lower melting temperature) than the H strand. Because not all simulations at 360 K fully melted, and our base pair melting time calculation returns the length of the simulation if the base pair is not dissociated by the end of the simulation, the standard deviation of the average base pair melting time is on the order of the scale of the y-axis and is not shown.
Experiments that were motivated by our results on preferential melting of individual strands used fluorescent nucleotide analogues in place of bases on individual strands to obtain base pair resolution mapping of junction stability. These experiments proceeded as follows: equal concentrations of three DNA strands, corresponding to three of the four strands in the junction, are placed in a buffer solution with the final DNA strand, which has a fluorescent probe in place of one nucleotide. Annealing of this solution creates fully intact DNA junctions, which are then heated while fluorescence is measured as a function of temperature. The fluorescence of the probe is modulated by its distance to the base pair on the antisense strand. Initially, the junction is intact, the probe forms a base pair with the other strand, and minimal fluorescence is observed. As the junction melts, the probe dissociates from its base pair and begins to fluoresce. By tracking fluorescence as a function of temperature we can obtain a melting temperature, and by comparing melting temperatures across probes placed on different strands, we elucidate the relative stability of those strands.

Figure 4.4: Mean base pair dissociation time for all melting simulations at 360 K, with base pairs separated by their corresponding strands, or pseudoduplexes. We see the X strand dissociates on average around 200 ns before the H strand. This points to the X strand being a less stable strand than the H strand.

Figure 4.5 shows data from fluorescence melt experiments on DNA junctions of the same sequence as in our simulations. The different curves correspond to fluorescence melts taken for each of four fluorescent probes placed along two strands in the junction. Red curves represent the fluorescence melt curves from probes placed on the X pseudo-duplex, and black curves represent those on the H pseudo-duplex. The temperature required for the normalized fluorescence to reach 50% corresponds to the melting temperature of that base pair. We see that the melting temperatures of the probes along the H strand are higher than the melting temperatures of the probes along the X strand, which is consistent with the base pair dissociation times from our simulations [see Figure 4.4]. These data represent the first demonstration of junction melting on a preferential arm-by-arm basis and demonstrate that an alternate model of junction melting which requires the collective melting of the entire junction is not valid.

As a further comparison with previous results on melting dynamics, we look at theoretical melting temperatures of the DNA sequences that comprise the strands of interest in this DNA junction.

These theoretical melting temperatures are reported by two different methods. The first is an online tool published by Integrated DNA Technologies (IDT), which calculates the melting temperatures of duplex strands based on sequence and on sodium and magnesium concentration, using experimental DNA enthalpy and entropy values defined by Allawi and SantaLucia [53]. The second method also uses enthalpy and entropy values from SantaLucia’s experiments, input into a Mathematica script. These two calculations differ in their prediction of absolute melting temperatures due to differences in the implementation of salt concentration dependence, but the relative melting temperatures between strands are the most important to consider and are consistent across methods. These theoretical models predict that, if the H strand and the X strand existed in the absence of a DNA junction, the X strand would be less stable than the H strand. This agrees with data from our simulations and also with experimental fluorescence measurements of DNA junctions by Savage and McDonald [44, 45].
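As a rough illustration of how a duplex melting temperature follows from enthalpy and entropy values, the sketch below uses the standard two-state expression for a non-self-complementary duplex with both strands at equal concentration. It is not the IDT tool or the Mathematica script: it omits any salt correction, and the function name, units, and numerical inputs are hypothetical choices made only for this example.

```python
import math

R_CAL = 1.987  # gas constant in cal/(mol K)


def two_state_tm_celsius(dh_cal_per_mol, ds_cal_per_mol_k, total_strand_molar):
    """Two-state melting temperature (in Celsius) of a duplex.

    Assumes a non-self-complementary duplex whose two strands are present
    at equal concentration, so the concentration term is ln(C_T / 4).
    No salt correction is applied in this sketch.
    """
    denominator = ds_cal_per_mol_k + R_CAL * math.log(total_strand_molar / 4.0)
    tm_kelvin = dh_cal_per_mol / denominator
    return tm_kelvin - 273.15


# Hypothetical thermodynamic values, chosen only to exercise the formula.
tm_example = two_state_tm_celsius(-100000.0, -270.0, 1e-6)
print(round(tm_example, 1))
```

Because the two methods in the text differ mainly in how they treat salt dependence, a sketch like this would be expected to reproduce relative ordering between sequences more reliably than absolute values.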

Figure 4.5: Normalized fluorescence intensity versus temperature for four different probe locations. These experimental data show that the X pseudo-duplex is less stable than the H pseudo-duplex, which agrees with the results from our measurements of the mean base pair melting time for bases along these strands.
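The 50% criterion used to read a melting temperature from a fluorescence melt curve can be sketched as follows. This is a minimal illustration and not the analysis code used in the experiments: the min-max normalization, the linear interpolation between neighboring temperature points, and the function name are all assumptions, and the data in the usage line are invented.

```python
def melting_temperature(temps, fluorescence, threshold=0.5):
    """Temperature at which min-max normalized fluorescence first reaches threshold.

    temps and fluorescence are equal-length sequences ordered by increasing
    temperature. Returns None if the threshold is never crossed.
    """
    lowest, highest = min(fluorescence), max(fluorescence)
    if highest == lowest:
        raise ValueError("fluorescence curve is flat; cannot normalize")
    norm = [(f - lowest) / (highest - lowest) for f in fluorescence]
    for i in range(1, len(temps)):
        f0, f1 = norm[i - 1], norm[i]
        if f0 < threshold <= f1:
            # Linear interpolation between the two bracketing points.
            frac = (threshold - f0) / (f1 - f0)
            return temps[i - 1] + frac * (temps[i] - temps[i - 1])
    return None


# Invented example curve: the crossing lies between 70 and 80 degrees.
tm = melting_temperature([60.0, 70.0, 80.0, 90.0], [0.0, 0.2, 0.8, 1.0])
print(tm)
```

Comparing the value returned for probes on different strands gives the relative stability ordering discussed above.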

4.3 Isoform transitions during melting process

By probing the structure of DNA junctions during the melting process, we shed light on how melting occurs and offer another way to validate our predictions of preferential melting from the mean base pair dissociation times. For each melting simulation we calculate the isoform as predicted by our base-distances cut-off method. Figure 4.7 shows an example isoform timeseries (as calculated by the distances method) for the first 100ns of a single simulation. In Figure 4.8 we have plotted the isoform population distributions as a function of time before and after melting. We see that the junction population distribution during melting mirrors the isoform population during equilibrium. These data are compatible with our analysis of strand stability and offer an insight into the dynamics of junction melting. As we consider both the preferential melting of individual strands and the most probable stacked isoforms during melting, we paint a picture of the dynamics of strand dissociation and offer an explanation for junction melting not previously explored in literature.

Table 4.1.1 Theoretical melting temperatures of junction pseudo-duplexes

         Integrated DNA Technologies        Starr/SantaLucia Mathematica
         Tm (˚C)    Relative Stability      Tm (˚C)    Relative Stability
X34      78.53      2                       86.03      2
H34      83.88      1                       90.90      1
X17      59.29      4                       67.71      4
H17      69.35      1                       76.39      1
B17      62.59      3                       70.12      3
R17      62.97      2                       71.09      2

[Text reproduced within the adapted figure:] As expected, the relative stability of the duplexes does not differ between the two modes of calculations while the absolute melting temperature does. Also important to note is that the Integrated DNA Technologies melting temperature, which takes into account the DNA and NaCl concentration, is only 1 ˚C higher than both experimental absorbance and fluorescence melts of the same X34 duplex. The average absorption melting temperature for D3X is 76.6 ± 0.1 ˚C and the average fluorescence melting temperatures for 6MI-labeled D3X is 77.0 ± 0.8 ˚C, while Integrated DNA Technologies predicted a melting temperature of 78.5 ˚C. Such accuracy between experimental and theoretical melting temperatures is a promising result that shows that the experimental techniques of quantifying the melting temperature of DNA were precise. Although the melting temperatures of the duplexes were consistent with the model, these melting temperatures were still an average of 12 ˚C higher than the [text cut off in the figure]

Figure 4.6: Theoretical melting temperatures of the DNA sequences that comprise the junction. The two columns correspond to different ways in which these theoretical melting temperatures were calculated. The blue rows correspond to theoretical melting temperatures of the 34 base pair DNA sequences that comprise the X and H strands. The last four rows are theoretical melting temperatures for the 17 base pair DNA sequences that make up the junction arms. Adapted from Savage, 2017 [44].
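The alignment behind Figure 4.8, in which each isoform timeseries is re-zeroed at the snapshot where all strands have dissociated and the states are then tallied across simulations, can be sketched as below. This is an illustrative reconstruction, not the analysis script used for the thesis: the function name, the representation of a timeseries as a list of state labels, and the use of snapshot indices rather than nanoseconds are assumptions.

```python
from collections import Counter


def aligned_state_counts(timeseries, melt_indices, window):
    """Tally isoform states across simulations, aligned on the melting snapshot.

    timeseries   : list of per-simulation lists of state labels ("Iso1", "Iso2", "Open")
    melt_indices : snapshot index in each simulation at which all strands have dissociated
    window       : number of snapshots to tally before and after that index
    Returns a dict mapping offset (in snapshots) to a Counter of states.
    """
    counts = {offset: Counter() for offset in range(-window, window + 1)}
    for states, melt_idx in zip(timeseries, melt_indices):
        for offset in range(-window, window + 1):
            idx = melt_idx + offset
            # Skip offsets that fall outside this simulation's trajectory.
            if 0 <= idx < len(states):
                counts[offset][states[idx]] += 1
    return counts


# Invented toy data: two short runs that fully melt at snapshots 2 and 3.
runs = [["Iso1", "Iso2", "Open", "Open"], ["Iso2", "Iso2", "Iso2", "Open"]]
counts = aligned_state_counts(runs, [2, 3], window=1)
print(counts[-1], counts[0], counts[1])
```

Simulations that end shortly after melting contribute fewer counts at positive offsets, so the total at each offset can differ across the window.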

[Figure 4.7 plot: isoform state (Iso2, Open, Iso1) as a function of time]

Figure 4.7: Isoform timeseries calculated from the base-distances criterion for the first 100ns of run 8 at 360K. During equilibration, when the temperature is 310K, the junction does not transition between isoforms quickly. After equilibration, the temperature increases and faster dynamics lead to frequent isoform transitions. After 40ns, the distances between bases at the junction center lead to an open classification. This corresponds to dissociation of opposing strands. Junctions at 360K do not usually melt this fast, nor does this timeseries paint the full picture of melting dynamics from the context of junction isomerization. This plot is shown simply as an example of an isoform timeseries for the melting system.

[Figure 4.8 plot: "370 K, isoform populations before and after melting". Vertical axis: # of simulations in ensemble with conformation as classified by distances at junction center. Horizontal axis: t (ns) before and after junction fully melts. Legend: Iso1, Open, Iso2.]

[Plot annotation:] Prior to melting, the junction isoform population reflects that of the equilibrium ensemble. As melting proceeds, more junctions populate the Iso2 state. This agrees with our measurements of strand stability; if the X strand dissociates first, while others are intact, small dAG dAG [See Figure 3.3] measurements can still classify the junction as Iso2. As melting proceeds, all junctions are classified as open.

Figure 4.8: Isoform population distribution for all simulations at 370K. We made this plot by creating an isoform timeseries (as in Figure 4.7) for each melting simulation. Then, for each timeseries, we set as “zero” the time at which all strands have completely dissociated during that simulation. Finally, we “stack” the isoform timeseries on top of each other centered at their new zeroes and tally the number of simulations that are in each state 250ns before and after melting occurs.

Chapter 5

Branch Migration Dynamics

“Just because we don’t understand doesn’t mean that the explanation doesn’t exist.” Madeleine L’Engle, A Wrinkle in Time, 1962

Branch migration occurs when the junction is in the open state. Cations stack the junction arms by shielding the negatively charged phosphates in the backbone of DNA. Thus branch migration is reduced in the presence of high salt concentrations and occurs on much slower timescales than the exchange of stacking arms. The correlation between junction isomerization and branch migration should be observable in our simulations, given the data presented in this chapter and in Chapter 3.2, but we do not take junction isoform measurements in our branch migration simulations, so we cannot carry out this analysis. Nonetheless, we show that the explicit ion model of 3SPN.2 can be used for studies of DNA junction migration in the presence of Na+ and Mg2+ ions. We calculate timeseries of the location of the crossover between the strands, and from those calculate a “transition probability” for each salt concentration. This reproduces the expected result that the addition of cations decreases junction migration. Since the version of 3SPN.2 we use includes explicit ions and can recreate junction migration dynamics, it might be useful in future studies of the effects of ion binding and unbinding at the heart of the junction on junction isomerization and branch migration. In principle, our simulations can generate statistics from such measurements. A comparison of branch migration dynamics from the 3SPN.2 explicit ion model with data collected by Xinyu Zhu using the implicit ion model remains to be done.

5.1 Methods to determine junction location

To determine the junction location in our migrating junction simulations, we use code developed by Xinyu Zhu for simulations of junctions in the implicit ion model, adapted for our specific system. For each simulation snapshot, the code walks from the end of each arm towards the center of the junction, measuring the distance between canonical base pairs. The junction location for that arm is recorded as the base pair at which the distance becomes larger than a certain cut-off distance. The “true” junction location for that snapshot is calculated as the mean of the junction locations from the four arms. The developers of this algorithm tested different cut-off distances to determine the optimal value; because of time constraints, we did not perform this same analysis.
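The walk described above can be sketched in a few lines. This is a simplified illustration and not Xinyu Zhu's code: it assumes the caller has already computed, for each arm, a list of (position, distance) pairs ordered from the arm end toward the junction center, where position is a hypothetical branch-point coordinate shared by all arms. The cut-off value in the usage example is invented.

```python
def arm_junction_position(pairs, cutoff):
    """Position of the first base pair, walking inward, whose distance exceeds cutoff.

    pairs is a list of (position, distance) tuples ordered from the arm end
    toward the junction center. Returns None if no distance exceeds cutoff.
    """
    for position, distance in pairs:
        if distance > cutoff:
            return position
    return None


def junction_location(arms, cutoff):
    """Mean of the per-arm junction positions for one simulation snapshot."""
    positions = [arm_junction_position(pairs, cutoff) for pairs in arms]
    found = [p for p in positions if p is not None]
    if not found:
        return None
    return sum(found) / len(found)


# Invented distances for four arms; cutoff of 8.0 is a placeholder value.
arms = [
    [(0, 6.0), (1, 6.1), (2, 9.5)],
    [(0, 6.0), (1, 6.2), (2, 6.1), (3, 10.0)],
    [(0, 6.0), (1, 6.0), (2, 11.0)],
    [(0, 6.1), (1, 6.0), (2, 6.2), (3, 12.0)],
]
print(junction_location(arms, cutoff=8.0))
```

Because the result depends directly on the cut-off, repeating the calculation over a range of cut-off values is one way to check how sensitive the junction location is to that choice.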

5.2 Junction migration probability

In Figures 5.1 and 5.2, we plot the junction location as a function of simulated time for all simulations of the migrating junction at low and high magnesium concentrations. We do not filter these data, so some noise in the signal may falsely represent branch migration. We see that at low magnesium concentration, the junction location fluctuates on shorter timescales because the junction is in the open configuration more often, where junction migration is possible [see Figure 1.3]. At higher magnesium concentration, the junction adopts a stacked conformation and is less likely to migrate. We see similar trends in the FRET timeseries of migrating junctions from Karymov et al. in Figure 1.11. In Figure 5.3 we show the total number of simulation snapshots in which the junction location differs from that of the previous snapshot, divided by the total number of snapshots. This is a crude approximation of the probability of the junction undergoing a branch migration step, especially since our junction location data were not filtered to remove noise [see Figure 1.11 for how filtering branch migration signals can help visualize data and improve analysis], but it offers a qualitative picture of how different ionic concentrations affect the dynamics of DNA junction branch migration as simulated in the 3SPN.2 explicit ion model. We average these migration probabilities over the 5 simulations of the migrating junction at each concentration to get the probability that a junction will migrate on the timescale between snapshots. As expected from experimental FRET data on migrating junctions from Karymov et al. [21], increasing salt concentration decreases junction migration probability.
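The migration probability described above amounts to counting snapshot-to-snapshot changes in junction location and averaging over runs. A minimal sketch, assuming each run is stored as a plain list of junction locations (one per snapshot) and using hypothetical function names and invented data:

```python
def migration_probability(locations):
    """Fraction of snapshots whose junction location differs from the previous one.

    Following the definition in the text, the count of changes is divided by
    the total number of snapshots. No noise filtering is applied.
    """
    changes = sum(1 for prev, cur in zip(locations, locations[1:]) if cur != prev)
    return changes / len(locations)


def mean_migration_probability(runs):
    """Average the per-run migration probability over all runs at one salt condition."""
    return sum(migration_probability(run) for run in runs) / len(runs)


# Invented junction-location timeseries for two short runs.
runs = [[10, 10, 11, 11, 10], [10, 10, 10, 10, 10]]
print(mean_migration_probability(runs))
```

Since every unfiltered fluctuation counts as a step, this estimate is best read as an upper bound on the true step probability at a given snapshot spacing.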

Figure 5.1: Timeseries of junction location in simulations with no Mg2+. At this concentration, junctions are often in the open state, where branch migration is possible. Here the branch point of the junction can shift by more than 5 base pairs during the course of the 1.5µs simulation.

Figure 5.2: Timeseries of junction location at high magnesium concentration. At high salt concentrations, the junction is more likely to be in one of the stacked isoforms, where branch migration is not possible. Here, the branch point shifts less frequently and makes smaller jumps (around 2 base pairs) than in simulations of junctions with no magnesium. Since these data were not filtered to reduce noise, some artifacts exist in the timeseries and are most likely not “real” branch migration steps.

Figure 5.3: Migration probability for the mobile junction at various salt concentrations. As salt concentration is increased, the probability of junction migration decreases. This decrease with increasing [Mg2+] is more pronounced at low Na+ concentrations ([Na+] = 127mM) than at higher Na+ concentrations. This shows that Mg2+ ions are more effective at pinning the junction location when Na+ concentrations are low. More detailed descriptions of how Mg2+ ions interact with the junction center are available in a publication by Hyeon et al. [25], but an in-depth analysis of the dynamics of junction isomerization, branch migration, and ion binding remains for future studies of this system.

Chapter 6

Conclusion

“...there are two moments that are important. There’s the moment when you know you can find out the answer and that’s the period you are sleepless before you know what it is. When you’ve got it and know what it is, then you can rest easy.” Dorothy Hodgkin, For our daughters: how outstanding women worldwide have balanced home and career, 1996

We used the 3SPN.2 explicit ion model to examine DNA junctions in equilibrium, during melting, and undergoing branch migration. We confirmed our ability to reproduce, to a first approximation, the structural and dynamic aspects of DNA junctions through simulations using this model. Here we discuss the strengths and weaknesses of our methods and evaluate the degree of success with which the explicit ion model of 3SPN.2 captures the structural and dynamic properties of DNA junctions in the context of our analyses.

6.1 Evaluation of ionic distributions around the junction

We compare with work by Litke et al. [4, 20], who predict two magnesium binding sites at the center of the junction, and find that our results reflect their predictions for Mg2+ nearest neighbors at the junction center. We also compare with theoretical calculations of radial distribution functions of ions around the junction from Hyeon et al. [25] and find that our calculations match, showing preferential ion binding at the junction center. These calculations provide motivation to use the explicit ion 3SPN.2 model to further investigate ion binding at the junction center and its effects on dynamical junction processes such as isomerization, melting, and branch migration.

6.2 Evaluation of junction isoform determination methods

Central to our discussion is our model’s bias toward stacked conformations at lower salt concentrations. Our analyses show that at low salt concentrations ([Na+] = 27mM and 127mM and [Mg2+] = 0mM and 1mM), the open state dominates and stacked states have probabilities close to zero. As salt concentration increases, the fraction of junctions in the open state decreases while the fraction of junctions in the Iso2 state increases, but this transition of stacking partners is sensitive to which ion concentration changes. For example, at [Na+] = 27mM along the increasing [Mg2+] axis, the Iso2 probability increases whereas the Iso1 probability does not. In comparison, at [Mg2+] = 50mM along the increasing [Na+] axis, the Iso1 probability increases and the Iso2 probability decreases. These results are contrary to the expected result that salt concentration does not affect the junction bias for which arms stack. This could be a statistical error attributed to the number and length of simulations or a systematic error attributed to the model’s limited ability to capture dynamic properties of DNA at high ionic strength. The developers of the 3SPN.2 model note that “it is unclear how appropriate the model is for calculations of dynamic properties at moderate to high ionic strengths.” [41] The dissonance between our observations and expectations may be a consequence of several shortcomings in our study. First, simulations may not be extensive enough to calculate a representative ensemble average (i.e., a single trajectory may not sample the entire junction configurational space). Short equilibration times and simulations beginning from the open state may have led to an overclassification of the open configuration. Our model also shows a larger than previously measured ratio of Iso1 and Iso2 populations, which could be due to similar reasons as for the oversampling of the Open state.
Since simulation time is finite, those simulations that transition from Open to Iso1 might stay in Iso1 for the duration of the simulation, especially at high salt concentrations. In the implicit ion model study by Wang et al. [8], 10 times as many simulations were run for twice as long at each salt concentration to gather equilibrium statistics on isoform population distributions. Then, distributions were double-checked by running extra simulations starting from different initial configurations to see whether that had any effect on the isoform population distributions. In future work using this model, efforts might be better concentrated on simulating longer runs for fewer salt concentrations to generate more in-depth statistics on a few relevant salt concentrations. Second, the 3SPN.2 model might simply be more prone to predicting open configurations of DNA junctions. The contributions of ions might dampen the dynamics in the system and make the junction less likely to adopt stacked conformations or transition between stacked conformations on our simulated timescales at high salt; we might see an exchange of stacking partners at higher salt concentrations because of a crowding effect of ions around the junction center and arms. The 3SPN.2 explicit ion model might fail to return the expected result at such high ion concentrations since it is only meant to handle low ion concentrations. Hence we also recommend simulating only lower salt concentrations in future research using this model.
We are not entirely convinced of the validity and utility of our Markov model classification method. Here we discuss this method’s results and significance. We see that the trained Markov models return drastically different classifications for different salt concentrations; the means and standard deviations of the Gaussian emissions differ between salt concentrations. This causes confusion when analyzing the Markov model’s population distributions.
As a potential improvement upon this analysis, we hypothesize that providing more data by inputting every trajectory snapshot from all salt concentrations into the Markov model training script might remove the problem in which the Markov model miscategorizes an open state that we labeled Iso1. Although this method would not return transition matrices and isoform population distributions for each concentration (we would have to measure these populations for each salt concentration empirically, after training), we postulate that more data over a range of salt concentrations will lead to better-fitting classifications of isoform geometries. Since these models take hours to train, as compared to minutes when simply tallying states using a cut-off distance or angle criterion, we are unsure whether they are as useful as our other analyses using cutoffs. However, the Markov models do return important details, such as showing that a tetrahedral form of the junction is more likely than a square planar form. Despite the Markov model’s inherent usefulness in describing conformational states, questions about the replicability of our data, the time the analyses take compared to simpler methods, and time constraints lead us to recommend critically evaluating the benefit of using these types of models for analysis in future studies.

6.3 Evaluation of melting data

We expect that using the base pairs at the center of the junction is not a good method to determine the isoform during melting, especially since we see that the junction preferentially melts from the center; regardless, we were able to give a dynamical picture of melting by measuring the isoform population distributions for junctions before and after all strands were completely dissociated. For our analysis of melting simulations, data could have been biased by the fact that not all junctions melted during the simulation time, even though simulations were carried out above the melting temperature. Because the base dissociation time code returns the length of the simulation if the base pair has not melted by the end, our data on the average base distance prior to melting and the mean base dissociation time for each strand [see Figure 4.4] may be biased to lower values. In the future we might consider launching more simulations for longer timescales to gather more statistics, although it is unclear if that would remove or lessen this bias caused by unmelted junctions.
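The source of this bias can be made concrete with a short sketch. It mimics the behavior described above, returning the simulation length when a base pair never dissociates, and additionally reports how many values were capped in this way. The function names, the distance cut-off criterion, and the toy data are assumptions made for illustration and do not reproduce the thesis code.

```python
def dissociation_time(distances, cutoff, dt_ns):
    """Time at which a base pair distance first exceeds cutoff.

    distances holds one distance per snapshot and dt_ns is the snapshot spacing.
    If the pair never dissociates, the simulation length is returned together
    with melted=False, so the value is only a lower bound on the true time.
    """
    for i, d in enumerate(distances):
        if d > cutoff:
            return i * dt_ns, True
    return len(distances) * dt_ns, False


def mean_dissociation_time(series, cutoff, dt_ns):
    """Mean dissociation time and the number of capped (unmelted) entries."""
    results = [dissociation_time(s, cutoff, dt_ns) for s in series]
    mean_time = sum(t for t, _ in results) / len(results)
    n_capped = sum(1 for _, melted in results if not melted)
    return mean_time, n_capped


# Invented data: the first pair melts at snapshot 2, the second never melts.
mean_time, n_capped = mean_dissociation_time(
    [[5.0, 5.0, 9.0, 9.0], [5.0, 5.0, 5.0, 5.0]], cutoff=8.0, dt_ns=10.0
)
print(mean_time, n_capped)
```

Reporting the number of capped entries alongside the mean makes it clear when the mean should be read as a lower bound rather than an estimate.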

6.4 Future directions

This project has been pushed in many directions since its inception, and we are still left with questions to answer and data to analyze. We propose the following recommendations for further research and analyses that could be carried out using our current simulation trajectories and thermodynamic data, as well as recommendations for carrying out further simulations using the 3SPN.2 explicit ion model:

1. Compare data on branch migration collected by Xinyu Zhu from the implicit ion model with data from our simulations, specifically, histograms of preferred locations in the branch migration process, probable branch migration base step sizes, and the probability of the different junction isomers when the junction branch point is in different locations.

2. Simulate longer trajectories at lower salt concentrations to gather higher quality statistics on the isoform distributions.

3. Simulate longer trajectories of melting to remove some of the bias caused by junctions not melting within the simulation time.

4. Use isoform characterization algorithms to determine the isoform of the mobile junction before, during, and after junction migration steps, to test whether the Open state is in fact an intermediate for both junction migration and crossover isomerization.

5. Analyze thermodynamic data to investigate free energies of junction conformer transitions, branch migration transitions, ion binding, and the dynamics of these processes.

In conclusion, the explicit ion 3SPN.2 model provides a robust framework with which to study the dynamics of DNA junction isomerization, melting and branch migration. Our results offer a portrayal of ion binding consistent with the literature, multiple ways in which to classify junction conformations and infer isoform population distributions at various salt concentrations, a dynamic picture of junction melting which indicates that junction melting occurs on a per-strand basis, and branch migration statistics consistent with the familiar trends. In addition, we provide suggestions for future computational research of DNA junctions.

Bibliography

[1] James Watson and Francis Crick. Molecular Structure of Nucleic Acids: A Structure for Deoxyribose Nucleic Acid. Nature, 91(6):1372–1379, 1953.

[2] Bruce Alberts, Dennis Bray, Karen Hopkin, Alexander Johnson, Julian Lewis, Martin Raff, Keith Roberts, and Peter Walter. Essential Cell Biology. Garland Science, 4 edition, 2014.

[3] Robin Holliday. A mechanism for gene conversion in fungi. Genet. Res., 5(2):282–304, 1964.

[4] Jacob Litke. Probing the Ion-Binding Site of the Holliday Junction: A Spectroscopic and Calorimetric Approach. 2011.

[5] Derek R. Duckett, Alastair I.H. Murchie, Stephan Diekmann, Eberhard von Kitzing, Börries Kemper, and David M.J. Lilley. The structure of the Holliday junction, and its resolution. Cell, 55(1):79–89, 1988.

[6] Stephan Diekmann and David M. Lilley. The anomalous gel migration of a stable cruciform: Temperature and salt dependence, and some comparisons with curved DNA. Nucleic Acids Res., 15(14):5765–5774, 1987.

[7] Jin Yu, Taekjip Ha, and Klaus Schulten. Conformational model of the Holliday junction transition deduced from molecular dynamics simulations. Nucleic Acids Res., 32(22):6683–6695, 2004.

[8] Wujie Wang, Laura M. Nocka, Brianne Z. Wiemann, Daniel M. Hinckley, Ishita Mukerji, and Francis W. Starr. Holliday Junction Thermodynamics and Structure: Coarse-Grained Simulations and Experiments. Sci. Rep., 6(October 2015):1–13, 2016.


[9] Elizabeth G. Wheatley, Susan N. Pieniazek, Ishita Mukerji, and D. L. Beveridge. Molecular dynamics of a DNA Holliday junction: The inverted repeat sequence d(CCGGTACCGG) 4. Biophys. J., 102(3):552–560, 2012.

[10] Sean A. McKinney, Anne-Cécile Déclais, David M.J. Lilley, and Taekjip Ha. Structural dynamics of individual Holliday junctions. Nat. Struct. Biol., 10(2):93–97, 2003.

[11] Kenji Okamoto and Yasushi Sako. State transition analysis of spontaneous branch migration of the Holliday junction by photon-based single-molecule fluorescence resonance energy transfer. Biophys. Chem., 209:21–27, 2016.

[12] Franc J J Overmars and Cornelis Altona. NMR Study of the Exchange Rate Between Two Stacked Conformers of a Model Holliday Junction. pages 519–524, 1997.

[13] Donald Voet, Judith Voet, and Charlotte Pratt. Fundamentals of Biochemistry: Life at the Molecular Level. Wiley, Hoboken, NJ, 2008.

[14] Nadrian C Seeman. Design of Immobile Nucleic Acid Junctions. Biophys. J., 44(November):201–209, 1983.

[15] Daniel M. Hinckley, Gordon S. Freeman, Jonathan K. Whitmer, and Juan J. De Pablo. An experimentally-informed coarse-grained 3-site-per-nucleotide model of DNA: Structure, thermodynamics, and dynamics of hybridization. J. Chem. Phys., 139(14):1–16, 2013.

[16] Miquel Coll, Miguel Ortiz-Lombardía, Ana González, Ramón Eritja, Joan Aymamí, and Fernando Azorín. Crystal structure of a DNA Holliday junction. Nat. Struct. Biol., 6(10):913–917, 1999.

[17] Th. Förster. Transfer Mechanisms of Electronic Excitation Energy. Radiat. Res. Suppl. Vol. 2, Bioenerg. Considerations Process. Absorption, Stab. Transf. Util., 2, 1960.

[18] Robert M. Clegg. The History of FRET: from conception through the labors of birth. In Rev. Fluoresc. 2006. Springer US, Boston, MA, 2006.

[19] Chirlmin Joo, Sean A. McKinney, David M.J. Lilley, and Taekjip Ha. Exploring rare conformational species and ionic effects in DNA Holliday junctions using single-molecule spectroscopy. J. Mol. Biol., 341(3):739–751, 2004.

[20] Jacob L. Litke, Yan Li, Laura M. Nocka, and Ishita Mukerji. Probing the ion binding site in a DNA Holliday junction using Förster resonance energy transfer (FRET). Int. J. Mol. Sci., 17(3), 2016.

[21] Mikhail Karymov, Douglas Daniel, Otto F. Sankey, and Yuri L. Lyubchenko. Holliday junction dynamics and branch migration: Single-molecule analysis. Proc. Natl. Acad. Sci. U. S. A., 96(7):3670–5, mar 2005.

[22] Andrea Hartwig. Role of magnesium in genomic stability. Micronutr. Genomic Stab., 475(1–2):113–121, 2001.

[23] J Y Lee, Burak Okumus, D S Kim, and Taekjip Ha. Extreme conformational diversity in human telomeric DNA. Proc. Natl. Acad. Sci. U. S. A., 102(52):18938–18943, 2005.

[24] D Herschlag, Y Bai, V B Chu, J Lipfert, V S Pande, and S Doniach. Critical assessment of nucleic acid electrostatics via experimental and computational investigation of an unfolded state ensemble. J. Am. Chem. Soc., 130(37):12334–12341, 2008.

[25] Changbong Hyeon, Jinwoo Lee, Jeseong Yoon, Sungchul Hohng, and D. Thirumalai. Hidden complexity in the isomerization dynamics of Holliday junctions. Nat. Chem., 4(11):907–914, 2012.

[26] Jan Lipfert, Sebastian Doniach, Rhiju Das, and Daniel Herschlag. Understanding Nucleic Acid–Ion Interactions, volume 83. 2014.

[27] Michel Peyrard and Thierry Dauxois. DNA melting: A phase transition in one dimension. Math. Comput. Simul., 40(3):305–318, 1996.

[28] Thomas E Ouldridge, Iain G Johnston, and Ard A Louis. The self-assembly of DNA Holliday junctions studied with a minimal model. pages 1–13, 2018.

[29] Erik Winfree, Furong Liu, Lisa A. Wenzler, and Nadrian C. Seeman. Design and self-assembly of two-dimensional DNA crystals. Nature, 1998.

[30] Chengde Mao, Weiqiong Sun, and Nadrian C. Seeman. Designed two-dimensional DNA holliday junction arrays visualized by atomic force microscopy. J. Am. Chem. Soc., 121(23):5437–5443, 1999.

[31] Erik Benson, Abdulmelik Mohammed, Alessandro Bosco, Ana I. Teixeira, Pekka Orponen, and Björn Högberg. Computer-Aided Production of Scaffolded DNA Nanostructures from Flat Sheet Meshes. Angew. Chemie - Int. Ed., 55(31):8869–8872, 2016.

[32] Meng Wang, Sanjay K. Mohanty, and Shaily Mahendra. Nanomaterial-Supported Enzymes for Water Purification and Monitoring in Point-of-Use Water Supply Systems. Acc. Chem. Res.

[33] Jonathan R. Burns, Astrid Seifert, Niels Fertig, and Stefan Howorka. A biomimetic DNA-based channel for the ligand-controlled transport of charged molecular cargo across a biological membrane. Nat. Nanotechnol., 11(2):152–156, 2016.

[34] T. Andrew Taton, Karin Musier-Forsyth, Richard A. Kiehl, Nadrian C. Seeman, John D. Le, and Yariv Pinto. DNA-Templated Self-Assembly of Metallic Nanocomponent Arrays on a Surface. Nano Lett., 4(12):2343–2347, 2004.

[35] David Yu Zhang, Rizal F. Hariadi, Harry M T Choi, and Erik Winfree. Integrating DNA strand-displacement circuitry with DNA tile self-assembly. Nat. Commun., 4(May):1–10, 2013.

[36] Tyler D. Jorgenson, Abdul M. Mohammed, Deepak K. Agrawal, and Rebecca Schulman. Self-Assembly of Hierarchical DNA Nanotube Architectures with Well-Defined Geometries. ACS Nano, 11(2):1927–1936, 2017.

[37] Nicholas Metropolis, Arianna W. Rosenbluth, Marshall N. Rosenbluth, Augusta H. Teller, and Edward Teller. Equation of state calculations by fast computing machines. J. Chem. Phys., 21(6):1087–1092, 1953.

[38] B. J. Alder and T. E. Wainwright. Phase Transition for a Hard Sphere System. J. Chem. Phys., 27(5):1208–1209, 1957.

[39] A Rahman. Correlations in the motion of atoms in liquid silicon. Phys. Rev. A, 44(2):1401–1404, 1964.

[40] Thomas A. Knotts, Nitin Rathore, David C. Schwartz, and Juan J. De Pablo. A coarse grain model for DNA. J. Chem. Phys., 126(8), 2007.

[41] Daniel M. Hinckley and Juan J. De Pablo. Coarse-grained ions for nucleic acid modeling. J. Chem. Theory Comput., 11(11):5436–5446, 2015.

[42] Robert C Demille, Thomas E Cheatham, and Valeria Molinero. A Coarse-Grained Model For Explicit Solvation Of DNA By Water And Ions. J. Phys. Chem. B, 115:132–142, 2011.

[43] Tatiana R. Prytkova, Ibrahim Eryazici, Brian Stepp, Son Binh Nguyen, and George C. Schatz. DNA melting in small-molecule-DNA-hybrid dimer structures: Experimental characterization and coarse-grained molecular dynamics simulations. J. Phys. Chem. B, 114(8):2627–2634, 2010.

[44] Rachel Savage. Investigating the Thermodynamic Stability of DNA Four-Way Junctions using Fluorescent Nucleoside Analogues. 2017.

[45] Julie McDonald. The Thermodynamic Characterization of the DNA Four-Way Junction Melting Process. 2018.

[46] Diwakar Shukla, Carlos X. Hernández, Jeffrey K. Weber, and Vijay S. Pande. Markov state models provide insights into dynamic modulation of protein function. Acc. Chem. Res., 48(2):414–422, 2015.

[47] Sean A. McKinney, Chirlmin Joo, and Taekjip Ha. Analysis of single-molecule FRET trajectories using hidden Markov modeling. Biophys. J., 91(5):1941–1951, 2006.

[48] Kelly M Thayer and D L Beveridge. Hidden Markov models from molecular dynamics simulations on DNA. Proc. Natl. Acad. Sci., 99(13):8642–8647, 2002.

[49] Charnley. Calculate Root-mean-square (RMSD) of Two Molecules Using Rotation, 2019.

[50] W. Kabsch. A solution for the best rotation to relate two sets of vectors. Acta Crystallogr. Sect. A, 32:922–923, 1976.

[51] Michael Walker, Lejun Shao, and Richard A Volz. Estimating 3-D Location Parameters Using Dual Number Quaternions 3-D Location Quaternions. Computer (Long. Beach. Calif)., 54:358–367, 2010.

[52] William Humphrey, Andrew Dalke, and Klaus Schulten. VMD – Visual Molecular Dynamics. J. Mol. Graph., 14:33–38, 1996.

[53] H Allawi and John SantaLucia. Thermodynamics and NMR of Internal G·T Mismatches in DNA. Biochemistry, 1997.