<<

A Computational Study of DNA Four-way Junctions and Their Significance to DNA by Matthew R. Adendorff

BSc. (Hons), MSc. Rhodes University, 2010

SUBMITTED TO THE DEPARTMENT OF BIOLOGICAL ENGINEERING IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY IN BIOLOGICAL ENGINEERING AT THE MASSACHUSETTS INSTITUTE OF FEBRUARY 2016

© Massachusetts Institute of Technology, 2016. All rights reserved.

Signature of Author: ______Department of Biological Engineering December 18, 2015

Certified by: ______Mark Bathe Associate Professor of Biological Engineering Thesis Supervisor

Accepted by: ______Forest White Professor of Biological Engineering Chairman, Graduate Program Committee

Thesis Committee:

Professor Bruce Tidor Professor of Biological Engineering and Science Massachusetts Institute of Technology Thesis Committee Chair

Professor William Shih Associate Professor of Biological Chemistry and Molecular Pharmacology Harvard University Thesis Committee Member

2

A Computational Study of DNA Four-way Junctions and Their Significance to DNA Nanotechnology by Matthew R. Adendorff

Submitted to the Department of Biological Engineering on December 18, 2015, in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Biological Engineering

ABSTRACT

The field of DNA nanotechnology has rapidly evolved over the past three decades, reaching a point where researchers can conceive of and implement both bioinspired and biomimetic devices using the programmed self-assembly of DNA . The sophisticated natural systems that these devices seek to interrogate and to imitate have Angstrom-level organizational precision, however, and the nanotechnology community faces the challenge of fine-tuning their design principles to match. A necessity for achieving this level of spatial control is an understanding of the atomic-level physico-chemical interactions and temporal dynamics inherent to fundamental structural motifs used for nanodevice design. The stacked configurational isomers of four-way junctions, the motif on which DNA nanotechnology was founded, are the focus of this work; initially in isolation and then as part of larger DNA nano-assemblies. The first study presented here investigates the impact of sequence on the structure, stability, and flexibility of these junction isomers, along with their canonical B-form duplex, nicked-duplex and single cross-over topological variants. Using explicit solvent and counterion molecular dynamics simulations, the base-pair level interactions that influence experimentally-observed conformational state preferences are interrogated and free- calculations provide a detailed theoretical picture of isomerization thermodynamics. Next, the synergy of single imaging, computational modelling, and a novel enzymatic assay is exploited to characterize the three-dimensional structure and catalytic function of a DNA tweezer-actuated . The analyses presented here show that rational redesign of the four-way junctions in the device enables the tweezers to be more completely and uniformly closed, while the sequence-level design strategies explored in this study provide guidelines for improving the performance of DNA-based structures. Finally, MD simulations are used to inform finite-element method coarse-grained models for the ground-state structure determination and equilibrium Brownian Dynamics of large-scale DNA origamis. Together, this thesis presents a set of guidelines for the rational design of nanodevices comprising arrays of constrained four-way junctions.

Thesis Supervisor: Mark Bathe Title: Associate Professor of Biological Engineering

3

4

ACKNOWLEDGEMENTS

I would first like to thank my advisor Prof. Mark Bathe for the years of guidance and mentorship that he has given to me. In his lab I have grown into a competent scientist and have learned from him how to identify and to tackle research problems with a systematic rigor that is built on a foundation of sound literature study. I thank my thesis committee chair Prof. Bruce Tidor for his invaluable input over the years and for helping me to ask critical, focused research questions. I thank my committee member Prof. William Shih for his knowledgeable guidance and for helping me to identify significant research questions that can benefit the DNA nanotechnology community. I am also grateful to the Army Research Office, the Office for Naval Research, and the International Science and Technology Fulbright Program for funding over the course of my PhD. Significant thanks are due to Keyao Pan who has been both a teacher and a friend to me during the many projects we have worked on together. I also thank members of the Bathe Lab, both former and current, for being such excellent and knowledgeable colleagues: Zach Barry, Remi Veneziano, Sakul Ratanalert, Aprotim Mazumder, Will Bricker, Tyson Shepherd, Stavros Gaitanaros, Etienne Boulais, Do-Nyun Kim, Reza Sharifi Sedeh, Philip Bransford, Syuan-Ming Guo, Simon Gordonov, Philipp Diesinger, Hyungmin Jun, Yera Hakobyan, Nilah Monnier, Larson Hogstrom, and Darlene Ray for always taking such good care of us. Special thanks must go to Prof. Doug Lauffenburger and the Biological Engineering department, especially: Anthony “Toro” Soltis, Chris “Richard” Ng, Jorge “George” Valdez, Shannon Hughes, the BE PhD class of 2015, Susan Jaskela, Aran Parillo, Dan Darling, and Dalia Fares. I thank the Simmons Hall team, in particular: Tyler and Dana Susko, Tim Milakovich, Hans Rinderknicht and Mariah Steele, Josh Gonzalez, Nika Hollingsworth, Steve Hall, and John and Ellen Essigmann. It has been an honour to be a part of the MIT Rugby family and to play with such fantastic gentlemen during my tenure at MIT, thank you and go Tech! Significant personal thanks are due to Adam Labadorf, Zach Boswell, Jarrad Langley, my nemesis Nick Allard, Tom Nyilis, Patrick Goulet, Liz Plageman and Dave Chaet, Lauren Melby and Zach Gerson-Nieder, Meaghan Texeira, Ognjen and Nicole Ilic, Dave and Natalia Worth, Phil and Aimee Brierley, Richard Gevers, and Heiko Heilgendorff. Thank you to my loving family: Emily Adendorff, Ralph and Trudi Adendorff, Shara and David Matlin, and Rox and Stu Lasky. Finally, to my wonderful wife Ally, thank you for all your immense love and support; this thesis is dedicated to you

5

"Because learning does not consist only of knowing what we must or we can do, but also of knowing what we could do and perhaps should not do."

- Umberto Eco

6

TABLE OF CONTENTS

A. GENERAL INTRODUCTION ...... 11 1.1. Fundamentals of DNA nanotechnology ...... 11 1.1.1. Origins ...... 11 1.1.2. Milestones in structural DNA nanotechnology ...... 12 1.1.3. Computational design of DNA nanostructures ...... 13 1.1.4. Improving spatial control ...... 14 1.2. Modelling approaches to DNA ...... 15 1.2.1. Statistical models ...... 16 1.2.2. Thermodynamic Models ...... 17 1.2.3. Atomistic models ...... 17 1.2.4. Continuum Models ...... 19 1.2.5. Unbiased MD formulation ...... 20 1.2.6. Essential dynamics analysis ...... 21 1.2.7. Collective variable analysis ...... 23 1.2.8. Helical parameter analysis ...... 25 1.2.9. Free-energy calculations ...... 26 1.2.9.1. Replica-based methods ...... 27 1.2.9.2. Biasing potential methods ...... 28 1.2.9.3. The MM-PBSA method ...... 30 1.3. Overview of thesis contents ...... 31

2. SEQUENCE EFFECTS IN IMMOBILE DNA FOUR-WAY JUNCTIONS: A COMPUTATIONAL STUDY ...... 33 2.1. Introduction ...... 34 2.2. Materials and Methods ...... 36 2.2.1. Junction nomenclature ...... 36 2.2.2. Construction of all-atom systems ...... 37 2.2.3. Molecular dynamics simulations ...... 37 2.2.4. Base-pair parameter analysis...... 38 2.2.5. MM-PBSA free-energy analysis ...... 38 2.2.6. Essential dynamics analysis ...... 40 2.2.7. Enhanced sampling with the Adaptive Biasing method ...... 41 2.3. Results and Discussion ...... 42 2.3.1. Immobile 4WJs deviate from canonical B-form structure ...... 42 2.3.2. Flexibility changes propagate to several steps from the junction core ...... 45 2.3.3. Base-pair step deformability is a consequence of both topology and local sequence ...... 47 2.3.4. Free of isomerization ...... 49 2.3.5. Core sequences exhibit unique junction dynamics ...... 50 2.4. Conclusions ...... 51

7

3. RATIONAL DESIGN OF DNA-ACTUATED GUIDED BY SINGLE MOLECULE ANALYSIS ...... 53 3.1. Introduction ...... 54 3.2. Results and Discussion ...... 55 3.3. Experimental ...... 65 3.3.1. Materials ...... 65 3.3.1.1. Enzyme ...... 65 3.3.1.2. DNA ...... 65 3.3.1.3. Crosslinking reagents ...... 65 3.3.1.4. Substrates and activity assay reagents ...... 65 3.3.2. Design, assembly, and bulk characterization of DNA tweezers ...... 66 3.3.2.1. DNA tweezer design ...... 66 3.3.2.2. Denaturing PAGE purification of oligonucleotides ...... 66 3.3.2.3. DNA tweezer assembly ...... 66 3.3.2.4. Isolated (HJ) assembly ...... 67 3.3.2.5. Preparation, purification, and characterization of -DNA conjugates ...... 67 3.3.2.6. Preparation, purification and characterization of NAD+-DNA conjugates ...... 67 3.3.2.7. Protein-DNA tweezer assembly and purification ...... 67 3.3.2.8. Native gel characterization of the purified assembly ...... 67 3.3.3. Single molecule FRET (smFRET) characterization of DNA tweezers ...... 68 3.3.3.1. Preparation of tweezers for smFRET ...... 68 3.3.3.2. Preparation of slides and execution of smFRET experiments ...... 68 3.3.3.3. smFRET analysis ...... 68 3.3.4. (AFM) imaging of DNA tweezers ...... 70 3.3.4.1. AFM imaging protocol...... 70 3.3.5. Molecular dynamics (MD) simulations of DNA tweezers ...... 70 3.3.5.1. Preparation of all-atom tweezer models ...... 70 3.3.5.2. Explicit solvent MD ...... 71 3.3.5.3. Atomic distance calculations...... 71 3.3.5.4. Calculation of HJ Jtwist values from MD simulations ...... 72 3.3.6. Inter-arm distance (IAD) estimations ...... 72

3.3.6.1. IAD from smFRET (IADFRET) ...... 72 3.3.6.2. IAD from AFM images ...... 73

3.3.6.3. IAD from MD simulation (IADMD) ...... 73 3.3.7. Bulk measurement of tweezer-scaffolded G6pDH activity ...... 73 3.3.8. Single molecule measurement of tweezer-scaffolded G6pDH activity ...... 74 3.3.8.1. Using microscope slides with fluorescent beads as fiduciary markers ...... 74 3.3.8.2. Imaging single enzyme activity ...... 74 3.3.8.3. Estimation of diffusion constant and elapsed time of resorufin in 10% (w/v) PEG 8000 ...... 75

8

3.3.8.4. Relating spike intensity and catalytic turnover ...... 76 3.3.8.5. Spike train analysis ...... 76 3.4. Conclusions ...... 77

4. IMPROVING FINITE ELEMENT METHOD MODELS WITH MD SIMULATIONS ...... 79 4.1. Lattice-free prediction of three-dimensional structure of programmed DNA assemblies ...... 80 4.2. nonequilibrium conformational dynamics of structured assemblies ...... 81

5. GENERAL CONCLUSION ...... 83 5.1. Summary ...... 83 5.2. Guidelines for DNA nanostructure design and simulation...... 84 5.3. Limitations and future work...... 86 5.4. Closing statement ...... 87

A. Supplement to – Sequence effects in immobile DNA four-way junctions: A computational study ...... 89

B. Supplement to – Rational design of DNA-actuated enzyme nanoreactors guided by single molecule analysis ...... 101

C. Manuscript for – Lattice-free prediction of three-dimensional structure of programmed DNA assemblies...... 139

D. Manuscript for – Computing nonequilibrium conformational dynamics of structured nucleic acid assemblies ...... 147

BIBLIOGRAPHY ...... 161

9

LIST OF FIGURES

2.1 Modelling DNA 4WJ isomers ...... 35

2.2 The DNA pseudo-duplexes within 4WJs can deviate significantly from canonical B-form structure ...... 44

2.3 The effect of junction topology on base-pair step flexibility ...... 46

2.4 Junction base sequences confer unique free energies and base-pair step deformability ...... 48

2.5 Junction stacking interactions confer unique global system dynamics ...... 51

3.1 Single molecule characterization of DNA tweezer conformation ...... 57

3.2 Optimization of the hairpin actuator element improves tweezer closure ...... 59

3.3 Redesigned Holliday junctions improve tweezer closure ...... 61

3.4 Characterization of enzyme nanoreactor activity in bulk and at the single molecule level using a G6pDH/NAD+ enzyme/cofactor pair ...... 64

10

CHAPTER 1

GENERAL INTRODUCTION

The purpose of this chapter is three-fold; to present the field of DNA nanotechnology, to define a computational protocol for the analysis of nucleic acid constructs built using this methodology, and to define the goals of this thesis. Section 1.1 introduces the immobile junction origins of the nucleic acid nanotechnology field, reviews some of the key milestones that have been reached along the way to the current state of the technique and discusses the challenges the nucleic acid nanotechnology field faces in its mission to meet the Angstrom-level precision of natural systems. Section 1.2 discusses computational methods for the theoretical analysis of DNA structures and presents a protocol for running explicit solvent molecular dynamics (MD) simulations to unpack experimental phenomena. Methods for extracting information from all- atom trajectories and for the calculation of thermodynamic properties are also presented. Finally, the experimental aims of this thesis are stated and an overview of the presented studies is provided.

1.1 FUNDAMENTALS OF DNA NANOTECHNOLOGY

Nucleic acid nanotechnology has developed into a mature synthetic technique for the design of hierarchical molecular assemblies (1-7) and these constructs have found application to a number of contemporary research endeavours, including, but not limited to: biomimicry (8), macromolecular sensing (9), energy transfer / photonics (10,11), drug delivery (12,13), and actuated nanoreactor design (14). In this section, a review of the fundamental structural components, most notably the DNA four-way junction, as well as the design principles of this synthetic technique are presented. Some of the challenges facing the field as it approaches the middle of its fourth decade are also discussed.

1.1.1 Origins

In 1982, the field of DNA nanotechnology began with ’s design of predictably immobile DNA four-way junctions, herein abbreviated to 4WJs (1). Due to their biological significance, 4WJs had already been studied for nearly 20 years by this time (15); but, it was

11 this new ability to control their formation (16,17) that sparked the period of intense investigation into their structure and physico-chemical properties that followed (18), Structurally, these chemical entities are defined as:

“…branch-points where the double-helical segments intersect with axial discontinuities, such that strands are exchanged between the different helical sections. Thus the integrity of junctions is maintained by the covalent continuity of the component strands” – David Lilley in (18)

They can be perfectly base-paired, such that every base is paired with its Watson-Crick complement, or they can contain mismatches and unpaired bases. In , 4WJs act as intermediates for both homologous (19) and site-specific recombination events (20); in DNA nanotechnology, they are a fundamental building block of many nanostructures, providing the “glue” that holds duplex segments together (2,4,21,22). A more detailed description of the structural properties of these motifs will be provided in Section 1.1.4, though it is important to first discuss the systems that they compose and in which these properties become particularly significant.

1.1.2 Milestones in structural DNA nanotechnology

Several key advances in the manipulation of DNA at the nanometer scale have occurred in the past three decades, beginning with the first example of a construct with complex connectivity, the Chen and Seeman (23). This was followed by the presentation of the double-crossover (DX) tile (24), a structural framework that allowed Winfree et al. to later design the first self- assembling, two-dimensional (2D) DNA crystal (25). The first large assembly built using the single-stranded scaffold technique, later named “DNA origami”, was presented by Shih et al. where a 1.7 kb single-stranded DNA scaffold was folded into an (26). In 2006, Paul Rothemund published a seminal paper wherein he defined the fundamental design procedure for DNA origami whereby a long single-stranded DNA scaffold is “folded” into a predesigned shape using short single-stranded “staples” (2). Though the initial origami structures were 2D objects, the technique was soon extended to the design and synthesis of three-dimensional (3D) DNA structures (3,27-31) that could exhibit complex curvatures, induced with programmed over- and under-twisting of the DNA duplexes (4). Woo and Rothemund later showed that duplex end-stacking interactions at the interfaces of DNA subunits could facilitate programmed recognition of DNA building blocks (32) and, in that same year, Han et al. revealed hollow 3D

12 origami structures (5). Up until this point, the fundamental motif (aside from the B-form DNA duplex) for origami design was the antiparallel 4WJ, though parallel crossover (PX) systems had been extensively studied and applied by the Seeman lab in smaller constructs (33,34). In 2012, Wei and co-workers demonstrated the efficacy of single-crossover (SX) motifs for low-cost generation of arbitrary DNA shapes by means of single-stranded tiles (35), a method later extended to brick-based assembly (36). In 2014, Shi et al. reported a new DNA strategy for the design of two-dimensional DNA structures, using sub-tile constructs, that consists of a hierarchical self-assembly process involving sets of three single-stranded DNA molecules with two levels of sticky ends (37). Finally, this year (2015), Gerling et al. demonstrated that three- dimensional DNA shapes can self-assemble without base pairing but rather on the basis of geometric arrangement or shape-complementarity (38). Arguably, the most taxing step in conventional DNA nanotechnology is the design phase as the synthetic step, while certainly requiring careful implementation, simply involves a temperature ramp followed by slow cooling. Several computational tools have been published that assist with the design of the necessary sequences to generate DNA nano-assemblies and these are discussed in Section 1.1.3.

1.1.3 Computational design of DNA nanostructures

Paul Rothemund’s design strategy for DNA origami represented the first “algorithm” for large- scale nanostructure design, though the first algorithmic procedure in the DNA nanotechnology field was the junction sequence optimizer of Seeman and Kallenbach that was published in 1983. Soon after Rothemund’s scaffolded origami paper, the program GIDEON was published by Birac et al. (39) that defined structural connectivity in the form of linked arrays. The program contains a relaxation algorithm that can rearrange the structure to minimize mechanical, planar, and torsional strains in a geometric, but non-energy-based, model. Andersen et al. automated Rothemund’s design rules in the 2008 Perl-based program, SARSE, allowing for the sequences of scaffold and staple strands to be modified so as to fit a target shape, e.g. their well-known “dolphin” (40).The first 3D editing tool for complex nanostructures, Tiamat, was presented the following year (41) and implemented a novel algorithm for sequence design to avoid unwanted secondary structures in designed sequences; however, no meaningful structural relaxation was included. Also in 2009, Douglas et al. designed and developed the computer-aided design software, caDNAno (42), to design lattice-based DNA origamis constrained to either a honeycomb or square helical lattice.

13

The first true mechanical model for the prediction of DNA origami solution structures, CanDo, was developed in the Bathe lab at MIT (21,43). CanDo models base-pairs as two-node beam finite elements representing an elastic rod with experimentally determined geometric and material parameters (44). In the original implementation, double helices were arranged linearly in space, followed by a loading step where the duplexes were compressed or stretched to conform to lattice-specific crossover position constraints. The degrees of freedom (DOFs) of the crossover positions were then clamped and the structure was allowed to relax, thereby generating the solution structure. In 2014, Pan et al. (22) introduced a lattice-free modelling approach whereby multiway junctions and topologically closed structures, e.g. ring-like origami objects, could be described using a novel “alignment element” representing the 4WJs composing the object. The parameterization of these junction elements was performed using the all-atom molecular dynamics simulation protocol detailed in this thesis and is described in Section 4.1. With this synthetic technique now reaching maturity, researchers are looking to probe, to manipulate and even to mimic natural systems; goals that come with the requirement of precise spatial control.

1.1.4 Improving spatial control

The continual desire for improved spatial precision of nanostructures, facilitating Nature-like accuracy over the systems being manipulated, comes with the requirement of Angstrom-level control (45-48). To achieve this, an understanding of the atomic physico-chemical interactions and temporal dynamics inherent to fundamental structural motifs, e.g. duplexes (49), nicked duplexes (50-52), single-crossover (35) and particularly 4WJs (22,53,54), is essential. As mentioned earlier, the immobile 4WJ has been the central component of structural DNA nanotechnology since Seeman’s foundational work in the early 1980’s (1,16) and it continues to be the motif that holds constructs together. The 3D structure of these DNA junctions in isolation has been extensively studied experimentally, with a rich literature stretching back several decades (18,54,55). The 4WJ can exist in one of three global conformations, including two stacked antiparallel junction isomers and a cross-shaped open-X conformation. The stacked forms are only adopted in the presence of polyvalent or in a high salt environment (56-58) as without screening of the charges concentrated at the junction center, the system will adopt the open-X form (59) to minimize repulsions. The relative populations of these stacked

14 configurational isomers have been shown to be sensitive to both the Mg2+ concentration (60) and to the base-pair sequences near the site of strand crossover (56,61-65). The conformational heterogeneity of the junction has also been shown to be sequence dependent through NMR analysis of different junction core arrangements (62). While the primary topological difference between these isomers is a change in the base-pair stacking at the junction core, the difference in duplex stacking energies (54,66,67) for these changes is insufficient to solely account for the experimentally obtained isomer ratios. A contribution of geometric constraints at the junction core to the free-energy has been considered (55) and the effect of a sequence-dependent electrostatic potential on junction stacking preferences has been suggested by experiment through selective base charge neutralization (68); however, a rigorous investigation into the topological and energetic contributions to junction isomerization bias is still lacking. Furthermore, the impact these contributions have on global junction dynamics, a potentially significant effect in arrays of interconnected junctions, is not known. Modern implementations of explicit solvent all-atom MD have proven to be highly for elucidating the physical interactions responsible for experimental phenomena (69). The application of these methodologies to the study of nucleic acid structures is no exception and valuable insights have been gained into B-from structure (70-74), sequence-level interactions (75-78), duplex free-energy contributions (79-81), large origami relaxation (82), and junction dynamics (82-86). While several investigations into 4WJ structure and dynamics have been performed with MD, no dedicated analysis of sequence-level structure and energetics in isolated asymmetrical and immobile junction isomers has been performed. In the following section, MD is reviewed, along with several other theoretical methods, for its applicability to the problems being tackled in this thesis.

1.2 MODELLING APPROACHES TO DNA

The application of theoretical methods to define and to discuss the structure of DNA has been occurring since the double-helical structure of our genetic material was first discovered (87). Many different methods have been used over the past six decades to develop our current detailed understanding of the behaviour of DNA molecules and brief descriptions of several historically-relevant classes applicable to DNA nanotechnology are provided in this section. Furthermore, the ability of all-atom simulations to describe observed experimental phenomena

15 and their significance to the development of improved continuum and coarse-grained (CG) representations for the modelling of nano-assemblies are emphasized.

1.2.1 Statistical models

Simple statistical models represent the earliest class of techniques applied to DNA, in fact, Paul Flory’s foundational work on establishing a framework for chemistry, detailed in his 1953 book Principles of Polymer Chemistry, enabled the immediate use of the freely jointed chain (FJC) and in turn the more advanced worm-like chain (WLC) (88-90) models for the analysis of DNA flexibility (91) upon the determination of its structure. The WLC provides a mathematical description of a thin and inextensible elastic filament where a single parameter, the persistence length, defines the thermal bendability. It is the small-angle limit of the FJC model, where the latter has fixed bond lengths and angles, with any remaining DOFs set as random; the “orientation memory” of the chain is therefore solely determined through correlations between adjoining segments. Using a parametric description of the chain, given by the contour position vector 풓(푠), one can show that the correlation of the tangent vectors

(풕(푠) = 푑풓(푠)/푑푠) along the chain decays with a rate set by the persistence length, 푙푝 (92,93). The Hamiltonian of the WLC consists of one term, accounting for the only degree freedom in the system at each point along the chain, namely, the bending fluctuations:

2 ℋ κ 퐿 푑2풓(푠) 푏 (1.1) = ∫ 푑풔 ( 2 ) 푘퐵푇 2 0 푑푠

The bending constant κ푏 represents the susceptibility of the chain to bend rather than maintain a straight configuration and thus encapsulates the persistence length. In fact, in the WLC model the two are related by the expression, 푙푝 = κ푏/푘퐵푇. Despite the simplicity of the model, which ignores the molecular features of a chain, the WLC Hamiltonian provides a surprisingly good description to this day (94) for relatively long segments of double-stranded DNA where the mode of flexibility is predominantly contour fluctuations, rather than the dihedral angle rotations in . In particular, the WLC model provides a highly accurate description of extensions of single ds-DNA molecules for to up to 10 pN (95). Recently, there has been controversy about the validity of WLC model for describing the bending of the short chains of DNA on the sub-persistence length scales (44,96-98), with arguments on both sides. These models

16 are historically significant and have found recent application to DNA nanostructure analysis (99,100); however, for the intended analyses of this thesis their neglect of both polymer and solvent/counterion chemical structure makes them unsuitable.

1.2.2 Thermodynamic Models

Thermodynamic modelling approaches were also taken in the early days of DNA analysis to develop descriptions of the hybridization of duplex molecules in terms of statistical mechanics (101), melting theory (102,103), and one-dimensional Ising models (104). The modern incarnations of these methods have largely focused on predicting the secondary structure and hybridization of dynamic DNA nanostructures without explicit concern for their three-dimensional structure, though with impressive accuracy (105-118). At the core of these methods are two components, a dynamic programming algorithm (119-121), and a parameter database of enthalpy/ thermodynamic values for essential structural motifs including Watson-Crick base pairs, internal mismatches, terminal mismatches, terminal dangling ends, hairpins, bulges, internal loops, and multibranched loops (122). Empirical equations can be employed to modulate these parameters for application to systems at different temperatures under a variety of salt conditions. The parameter databases for duplex (123-129) and nicked-duplex (66,67,130) systems have been rigorously derived through a number of experimental methods; however, crossover-junction motifs, which are examples of pseudoknotted DNA, have yet to be properly parameterized in this framework and cannot be adequately represented through combinations of existing parameters (55).

1.2.3 Atomistic models

Slower to develop was the powerful technique of all-atom MD, a method by which investigators can track the path of every atom in a system under interrogation, with femtosecond temporal detail. Using descriptions of inter-atomic interactions, in the form of atomic force fields derived from quantum mechanical calculations (131,132), the MD method enables the explicit prediction of the physical behavior of a chemical or biological system. The greatest challenge facing MD, however, is that accurate force fields and sufficient computational power are needed to ensure that such predictions are meaningful. A synopsis of the development of contemporary MD, its potential short-falls and its suitability to the problems investigated in this thesis are given in this

17 section. A mathematical description of the formulation of modern MD, along with specific parameter details is provided in Section 1.2.5. The first MD simulation of a DNA duplex in water with counterions was performed in 1985 by the group of the late Peter Kollman (133), with a then newly-derived force field (134), and building on the works of Levitt (135) and Tidor et al (136). The following decade saw a steady increase in simulation time, approaching the nanosecond timescale, albeit using implicit solvent and hard cut-off electrostatic calculations (137). The demonstration in 1995 that simulations of DNA using the mesh Ewald summation (PME) method, an efficient long- range electrostatic interaction scheme formulated in Fourier space, were significantly more accurate than truncation schemes (138) presented a difficult decision for the community due to its high additional computing overhead (139). Thankfully this was also the beginning of the era of and the major simulation codes could be accelerated via the newly developed message-passing interface (MPI) (140). Results of these significant advances in implementation were the first multi-nanosecond simulations of DNA in explicit solvent, showing the A-form to B-form transition (141) and the substantial fluctuation of the B-form structure (142). These simulations were significant for the field as they showed, along with other works, that MD could accurately depict X-ray and nuclear magnetic resonance (NMR) determined structures (139). At these timescales, the sensitivities of DNA structure to sequence (141,143-148), water density (149-151), and ions (152-155) could be elucidated, revealing a detailed dynamic structure. This period of quantitative MD simulations of DNA, assisted by improved force fields, also saw the first applications of free-energy methods to DNA structure (79,156), base pairing (80), stacking (157-163), and DNA elongation (164). Recently, the focus of the DNA MD community has been on increasing both the size of simulated systems (82) and their duration (76,165,166). The ABC Consortium (167) has spearheaded the push to long simulations for the development of a balanced structural database at the tetranucleotide level, providing detailed insight into sequence effects; both on the structure and on the dynamic properties of the B-DNA double helix. The modern capacity to observe long time-scale structural statistics of DNA models, often longer than the parameterization runs for the “second generation” force fields (168,169), has necessitated the release of the “third generation” of force fields (170,171). A revision of explicit parameters (172,173), most developed over 20 years ago, has also been required. The primary issue was that the original parameters were developed to reproduce the ions’ solvation energies; however, they overestimate the interaction energy of cation–anion and ion–charge interactions. Refined parameters were determined (174-176), yet artifacts remained for densely packed DNA systems

18 that are central to structural nanotechnology. These deficiencies were discovered during simulations performed to determine the structure, dynamics, ionic atmosphere and intermolecular forces of a DNA bundle system, previously explored experimentally (177). New methods were subsequently introduced to the community to improve simulation accuracy (178,179). The development of the DNA nanotechnology community has provided large-scale atomic DNA structures (82) and stimulated an interest in their structural properties and dynamics (180), an area well suited to MD analysis. While due care is needed to ensure that MD results are indeed accurately emulating a system of interest, the “computational microscope” that the method provides can deliver detailed insight into phenomena not easily decipherable through experimental means alone. While the quality of the results obtained is dependent on the accuracy of the force fields and interaction potentials used, the current state- of-the-art in this regard has proven itself capable of delivering valuable insights into the physico- chemical properties inherent to DNA systems. Furthermore, it is also capable of parameterizing less detailed modelling approaches to enable equilibrium sampling of large constructs that would not be possible with current computing hardware. Coarse-grained atomic modelling (CG), where whole groups of atoms, e.g. the sugar atoms in a DNA , are represented by a single CG particle, has developed in tandem with all-atom MD. The number of atoms represented by the CG varies from a few (181) to many (182), though in all cases the reduction in the number of DOFs within the CG system makes longer timescales accessible at the same computational cost (183,184). Parameters describing the interactions of these CG particles are traditionally obtained by fitting against experimental data, such as the nearest neighbor DNA melting parameters (185), and/or all-atom simulations (186,187). The ability of all-atom MD to elucidate the origins of DNA motif structure and function, as well as its capacity to parameterize simpler models to explore prohibitively large nano-assemblies make it the principal investigative tool for the studies contained in this thesis.

1.2.4 Continuum Models

The last class of models discussed in this section is that of continuum modelling (CM), one that is similar in principal to the CG methods mentioned in 1.2.3, i.e. a more simplistic representation of molecules than an all-atom model is used, yet distinct in formulation and implementation. Like their atomistic CG counterparts, CM methods are formulated to go beyond the temporal and spatial constraints of MD, allowing for the simulation of events occurring on the millisecond and

19 micron scales. CM representations of nucleic acids are based on Kirchoff’s theory of elastic rods and have been successfully applied to studying DNA molecules (188-196). For DNA nanostructures, the finite element-based (197) computational modelling framework CanDo has been the most successful (21,22,43). In its original implementation (21) DNA origami nanostructures were modeled as bundles of isotropic elastic rods that were rigidly constrained to their nearest neighbors at specific crossover positions. Later modifications included the introduction of nicked-duplex and single-stranded DNA modelling (43,103), and extension to lattice-free designs via parameterization of junction alignment elements (22). The introduction of alignment elements required parameterization of the mechanical junction models by all-atom simulations (see Section 4.1), an example of the predictive power of that method. Very recently, a Brownian Dynamics (BD) model has been added to the CanDo framework to simulate the equilibrium dynamics of large nanostructures in solution (see Section 4.2), providing a powerful predictive tool for understanding the dynamic behaviour of these assemblies. In Section 1.2.5, a description of the MD formulation used for the unbiased simulations performed in subsequent chapters is given. Descriptions of the various trajectory analysis methods used for the post-processing or biasing of these simulations are then provided in Sections 1.2.6 through 1.2.9.

1.2.5 Unbiased MD formulation

The simulations in this thesis were chosen to be complementary to experimental measurements and were therefore performed in the isothermal-isobaric ensemble. Langevin dynamics (198) and the Langevin piston method (199,200) were used to maintain the temperature and pressure respectively; with the governing set of equations:

−1 d퐱푡 = 퐌 퐩푡d푡 + 푄퐱푡 (1.2) −1 −1 d퐩푡 = −∇푉(퐱푡)d푡 − 훾퐌 퐩푡d푡 + √2훾훽 d푊푡 − 푄퐩푡 (1.3)

Where (퐱푡, 퐩푡) denotes the positions and momenta of the particles at time 푡, 퐌 is the mass 3푁 tensor, 푉 ∶ ℝ → ℝ is the potential energy function, 훾 is the friction coefficient, 푊푡 is the Wiener −1 process that defines the random forces (white noise), 훽 = 푘퐵푇, where 푘퐵 is the Boltzmann constant, 푇 is the temperature, and 푄 accounts for the barostat momentum. The potential energy is calculated using the CHARMM27 force field (168,201) for nucleic acids (Eq. 1.4 – 1.6,

20 the CMAP term for proteins is not shown). The internal / bonded potential energy terms are defined as:

2 2 푉퐼푁푇 = ∑ 푘푏(푏 − 푏0) + ∑ 푘휃(휃 − 휃0) 푏표푛푑푠 푎푛푔푙푒푠

+ ∑ 푘휑(1 + cos(푛휑 − 훿)) 푑푖ℎ푒푑푟푎푙푠 (1.4) 2 + ∑ 푘푢(푢 − 푢0) 푈푟푒푦−퐵푟푎푑푙푒푦

2 + ∑ 푘(휔 − 휔0) 푖푚푝푟표푝푒푟푠

The non-bonded terms are calculated as:

12 6 푅푚푖푛푖푗 푅푚푖푛푖푗 푞푖푞푗 푉푁퐵 = ∑ 휀 [( ) + ( ) ] + (1.5) 푟푖푗 푟푖푗 휀푟푖푗 푛표푛푏표푛푑푒푑

The combination of these two terms provides the total potential energy function:

푉퐶퐻퐴푅푀푀 = 푉퐼푁푇 + 푉푁퐵 (1.6)

The CHARMM27 field was used in place of the newer CHARMM36 field (170) as the Allnér non- bonded potentials were parameterized to the former and because the latter only became available once several simulations in this thesis had already been performed. All systems were simulated with explicit TIP3P water and counterions in orthogonal periodic cells with long-range electrostatics accounted for with the PME method (138). Specific cut-off, time-stepping, ensemble, simulation , and programmatic details of the various simulations are given within their corresponding chapters.

1.2.6 Essential dynamics analysis

Due the dimensional complexity of biomolecular systems, it can be difficult to unpack the important contained in their MD trajectories. A principal components analysis (PCA) (202-204) provides a means of alleviating this problem in a similar fashion to normal

21 mode analysis (NMA) (136,205), by assuming that the major collective modes of fluctuation dominate the general dynamics. It has been found that the vast majority of DNA dynamics can be described by a surprisingly low number of collective DOFs (206). As the bulk of biologically important interactions occur as a function of a subset of the largest variance modes, the dynamics in the low-dimensional subspace spanned by these modes is termed “essential dynamics” (ED) and the coordinate subspace spanned by them is referred to as the “essential subspace” (203). In contrast to NMA, PCA of a molecular dynamics simulation trajectory does not require the assumption of a harmonic potential and can be used to study the degree of anharmonicity in a system. It has actually been shown that the major modes of collective fluctuation are dominated by largely anharmonic diffusion between multiple wells at physiological temperatures (207). In NMA the modes of greatest fluctuation are those with the lowest frequencies, however, PCA modes are sorted by variance rather than frequency; the largest-amplitude modes of an ED analysis (EDA) usually also represent the slowest dynamics though. PCA of a MD trajectory comprises three steps, the mathematical details of which are given in the Methods section of Chapter 2. Qualitatively, configurations from the ensemble under analysis are superposed using a least-squares fit to reference/average coordinates, thereby removing global rotations and translations. The Cartesian coordinates of the ensemble frames are then used to construct a variance–covariance matrix that is subsequently diagonalized. This matrix is symmetric and contains, as elements, the covariances of the atomic displacements relative to their respective averages for each pair of atoms for the off-diagonal elements, and the variances of each atom along the diagonal. Atoms that move in concert have positive covariances, anti-correlated motions have negative values, and non-correlated displacements have (near-) zero values. Diagonalization of this covariance matrix gives a set of eigenvectors and eigenvalues that are sorted by eigenvalue magnitude. The eigenvalues represent the variances along each of the corresponding collective modes (eigenvectors) and usually a small number of modes suffice to describe the majority of the total fluctuation of the molecule. Finally, the original trajectory can be projected onto each of the principal modes to yield the time behavior and distribution of each of the principal coordinates (204). In contrast to standard NMA, EDA can be carried out on any subset of atoms and thus the diagonalization of the covariance matrix is less demanding for larger systems yet the derived modes are very similar to an all-atom analysis (203,208). Essential modes derived from different simulations or simulation parts allow for comparison of the sampled regions, thereby providing a means to judge similarity and convergence. Provided that a trajectory is sufficiently converged,

22 thermodynamic properties can be derived as ensemble averages and readily mapped onto the principal coordinates to yield free-energy landscapes; for large motions of big molecules, however, unbiased MD may not be capable of sufficient sampling in reasonable time to achieve this. Differences between various configurational ensembles of a structure, e.g. the backbone atoms of different DNA 4WJs (see Chapter 2), can be explored by concatenating multiple sub- ensembles into one meta-ensemble on which the PCA is performed. The individual sub- ensembles are then separately projected onto this combined set of modes, allowing for their direct comparison. This method has the advantage that differences between the various sub- ensembles are frequently visible along one of the combined principal modes (209). Finally, an eigenvector definition of a system’s major degrees of freedom provides a powerful means of applying enhanced sampling techniques to better explore its conformational space (210-213). In the following section, the use of collective variables to analyze and manipulate atomic simulation systems is discussed.

1.2.7 Collective variable analysis

For all but the simplest atomic systems, it is impractical to work with all the degrees of freedom in a MD simulation, instead, it is useful to reduce the system to a few parameters which can be either analyzed individually, or manipulated in order to alter the dynamics in a controlled manner. Several names exist for these reduced DOFs in the literature, i.e. order parameters, collective variables, and (surrogate) reaction coordinates (214). Throughout this thesis, the term `collective variable' (abbreviated to colvar) will be used, and it indicates any differentiable function of atomic Cartesian coordinates. The most general definition of a colvar 휉 is as a differentiable function of the vector of 3N atomic Cartesian coordinates 퐗:

휉(퐗) = ξ(퐱1, 퐱1, … 퐱푁) (1.7)

Depending on the structure of the system, 휉(푿) is often a function of much fewer arguments than 3N, or can be expressed as a combination of such functions:

휉(퐗) = ξ(푧1(퐗), 푧2(퐗), … 푧훼(퐗)) (1.8) with the number of basis functions 푧(훼) much smaller than the number of atoms. 푧훼(퐗) is referred to as a colvar component, though in most scenarios a single component 푧 is identified

23 with the colvar ξ. Historically, these elementary functions have been distances, angles or dihedrals, that have effectively mimicked the interaction terms of many Hamiltonians (169,215- 220). In recent applications to complex macromolecules, components have been used to mimic many-body interactions or to track collective motions, such as moments of , and principal components from covariance analysis (221-223). Such combinations may be used, for example, after an initial mapping of the colvar space has been performed by any method for an efficient calculation of a potential of mean force (PMF). The Cartesian gradient of the colvar is written as a function of those of the individual components using the chain rule:

∇퐱휉(퐗) = (∇퐱1휉(퐗), ∇퐱2휉(퐗), … ∇퐱푁휉(퐗)) (1.9)

In most applications, and indeed in Chapter 2, the gradient is needed to apply biasing forces to the colvars. Another relevant physical quantity in this work is the ‘inverse gradient’:

휕퐗 휕퐱1 휕퐱2 휕퐱푁 = ( , , … ) (1.10) 휕휉 휕휉 휕휉 휕휉 which is needed in thermodynamic integration (TI) to evaluate the thermodynamic force exerted on the colvars by the entire system (224); this is particularly important in implementations of the Adaptive Biasing Force (ABF) algorithm (222,225,226) that is discussed in Section 1.2.9. When the inverse gradients are integrated over an ensemble of states, a Jacobian derivative is introduced to preserve the density of phase space as a function of 휉:

휕ln|퐽(퐗)| 휕퐱 = ∑ ∇ ∙ 푖 휕휉 퐱푖 휕휉 (1.11) 푖

Each colvar may be harmonically coupled to an extended degree of freedom, which is set to undergo Hamiltonian or Langevin dynamics and this extended coordinate acts as a numerically convenient proxy for the actual variable. Six classes (I–VI) of colvars exist, differentiated by the treatment of their rotational and translational invariance, and on other more specific criteria (214):

24

I. Class I components describe the relative arrangement of geometric centres (individual atoms or groups of atoms) and include the commonly used distance, angle and dihedral colvars.

II. Class II components describe the distribution of atomic positions around their centre-of- geometry or centre-of-mass, via physically measurable quantities such as the radius of gyration or the moment of inertia.

III. Class III components are defined based on the connectivity within a macromolecule, e.g. a DNA molecule, and include projections of a set of dihedral angles onto dihedral principal components (227).

IV. Class IV components are based on distances between all pairs of atoms within two groups, e.g. coordination numbers (228), hydrogen bonds and nuclear Overhauser effects (229).

V. Class V components describe the quasi-rigid orientation of a group of atoms, computed by least-squares fitting against a set of reference coordinates where the full orientation is described by a unit quaternion, from which other variables can be extracted, e.g. relative twist and tilt. A common use of this class of colvar is to prevent a long macromolecule from rotating, thereby allowing for a smaller solvent box, without restricting internal DOFs.

VI. Class VI components use the same approach as Class V components to express the positions of a set of atoms in a rotated frame: within that frame, the RMSD from the reference positions and the projection on a 3N-dimensional displacement vector (eigenvector) can be calculated. The eigenvector colvar in this class is used to calculate the PMF along a principal component, derived from unbiased MD, in Chapter 2.

The ability to interact with MD simulations and to reduce their complexity in an intuitive way provides researchers with a powerful tool for interrogating biomolecular systems. When coupled with the enhanced sampling methodologies discussed in Section 1.2.9, these colvars allow for effective measurements of highly informative energy values.

25

1.2.8 Helical parameter analysis

Extensive work has been done in the DNA MD community to define a reduced-dimensional description of duplex structure and deformability so as to help with deciphering all-atom simulations (70,230-237). This pseudo-harmonic mesoscopic approach assumes that global DNA deformability can be described as a combination of six local harmonic deformabilities (three translational and three rotational) at the base-pair step level. Though this model neglects non-harmonic effects, which can be important to describe large localized structural changes, e.g. kinks (59), it provides a reasonable and well-used representation of DNA structure and flexibility (74,76-78,238,239). Two established programs enable the calculation of these parameters, namely, 3DNA (71,72) and Curves+ (240); both of which are freely available. Once these values have been obtained for a given trajectory the known relation between correlations of quantities with a multidimensional Gaussian distribution and the stiffness matrix (241) allows for their force constants at a specific temperature to be calculated (see Section 3.2). The determinant of the covariance matrix in helicoidal space can also be used to define a “configurational volume” for a base-pair step that accounts for off-diagonal coupling terms (70,232). This simple representation of base-pair level structure and deformability provides an effective way of investigating both sequence and topological effects in double-stranded DNA structures.

1.2.9 Free-energy calculations

For a typical potential 푉, the associated Boltzmann measure 휇 is multimodal; where high- probability regions are separated by low-probability regions. The former correspond to the most likely conformations of a molecule, which are typically separated by transition regions of very low probability. A consequence of this is that estimating averages with respect to the probability measure 휇 is often a difficult task. In particular, the ergodic properties of the dynamics of Eq. 1.2 and 1.3 are not sufficient to devise reliable numerical methods as the stochastic process often remains trapped in high-probability regions. In order to derive accurate free-energy values for large geometric changes in chemical systems, one of the goals of Chapter 2, the quasi- nonergodicity of unbiased MD (in reasonable time) needs to be overcome. Two classes of approaches to improve geometric sampling are described, namely, replica-based methods and biasing potential methods. The optimality of the Adaptive Biasing Force method (ABF) for the

26 free-energy calculations in Chapter 2 is also discussed. Finally, the end-state MM-PBSA method for the calculation of free energies from unbiased structural ensembles is discussed.

1.2.9.1 Replica-based methods

One of the most widely used methods for improved sampling is the replica exchange / parallel tempering MD approach (abbreviated to REMD), where several non-interacting copies of a system are independently and simultaneously simulated at different temperatures (242-245). These temperatures are chosen to span a range from the temperature of interest to much higher values where rapid traversal of potential energy barriers can occur. At intervals during the otherwise standard MD runs, conformations of the system being sampled at different temperatures can be exchanged based on a Metropolis-type acceptance equation:

[+(훽 −훽 )(푈(푟푁)−푈(푟푁))] 퐴 = 푚𝑖푛 {1, 푒푥푝 푖 푗 푖 푗 } (1.12)

where 푈 is the potential energy, i.e. non-momentum energy. Eq. 1.12 considers the probability of sampling a given conformation at an alternate temperature and the transition probabilities are constructed such that canonical ensemble properties are maintained during a simulation. In order to extend this balance-preserving statistical mechanics technique to MD one must also take into account the momenta of the particles. After an exchange, the new momenta for replica 𝑖, 퐩(푖)′, are determined as:

푇 퐩(푖)′ = √ 푛푒푤 퐩(푖) (1.13) 푇표푙푑

3 This procedure ensures that the average kinetic energy remains equal to 푁푘 푇. The 2 퐵 reconstructed ensembles can be used to investigate equilibrium thermodynamic quantities and to identify key transition pathways that can then be explored using additional enhanced dynamical methods (246-250). The choice of the number of replicas and their respective temperatures are important to ensure efficient and effective sampling. A method commonly used for selecting temperatures that result in a uniform acceptance ratio across the replicates, where 23% - 30% is usually optimal (251,252), is:

27

∆퐸 ∆퐸 | = [ ] 휎 휎 (1.15) 푚 푇푖

This approach does require a rough prior knowledge of the average energy and standard deviation dependencies on temperature; which can be achieved by a cursory pre-run. A geometric progression provides an approximation to the method above, without any prior information being needed:

푇 푖 [푙표푔( 푚푎푥)( )] 푇 푁 (1.16) 푇푖 = 푇0푒푥푝 푚푖푛

where 푇0 is the desired final temperature and 푁 is the number of replicates. A significant challenge faced by REMD is the matter of solvent contributions to the energy, where in explicitly solvated systems the water contributions dominate the Hamiltonian. A consequence of this is the need for very small temperature differences between replicates; and to achieve a useful improvement in sampling performance many replicates are needed. Several methods have attempted to alleviate this problem (253), namely: adaptive tempering (254), locally enhanced sampling (LES) (255), and local-/partial- replica exchange MD (LREMD/PREMD) (256). Test runs of these methods were performed on a well-studied protein energy surface, that of N- acetyl-N′-methylalaninamide (NANMA) (222) followed by comparison to two biasing potential methods (metadynamics and ABF) discussed in Section 1.2.9.2. The biasing potential methods were found to be more computationally efficient and had fewer artifacts than the REMD methods. A discussion of these biasing potential methods and the final choice of ABF over metadynamics for the enhanced sampling in Chapter 2 are provided in Section 1.2.9.2.

1.2.9.2 Biasing potential methods

A number of techniques have been developed to improve the efficiency of free-energy calculations associated with geometrical transformations along a colvar (210,224,257-268). The central idea of these techniques is to not sample from the Boltzmann distribution defined by the original Hamiltonian, but rather to sample from a biased distribution that favors important but infrequently visited regions of phase space. Probably the most popular (and oldest) of these enhanced-sampling techniques is umbrella sampling (257), a method that relies on introducing a bias in simulations that favors states corresponding to large values of the free-energy along

28 the colvar. In its original form, umbrella sampling referred to incorporating an external biasing potential in the simulation, ideally the negative of the free-energy, which would yield a broad, if not uniform exploration of the colvar. In other words, in ideal circumstances, the system would evolve in the colvar space on a flat free-energy hyperplane. Under most circumstances, however, the form of the optimal umbrella is unknown. Thus, for any qualitatively new problem, the end-user must resort to an educated guess regarding the shape of the biasing umbrella potential, usually on the basis of prior knowledge of this and related problems. This may constitute a challenging task and poorly predicted biasing potentials can give non-uniform probability distributions across the colvar (269). Adaptive umbrella sampling is one member of a broader family of techniques called adaptive biasing potential methods. Local elevation (260), conformational flooding (210), its more recent avatar metadynamics (264-268,270,271), and the Wang–Landau algorithm for Monte-Carlo (MC) simulations (272,273) are other members of this family. In these methods, Eq. 1.3 becomes:

−1 −1 d퐩푡 = −∇(푉 − 퐴푡 ∘ 휉)(퐱푡)d푡 − 훾퐌 퐩푡d푡 + √2훾훽 d푊푡 − 푄퐩푡 (1.17)

where the idea is to penalize the already visited states by changing the potential 푉 to 푉 – 퐴푇 ∘ 휉,

퐴푡(푧) being related to the occupation time of the value 푧 of the colvar up to time 푡. The longer the time spent in a bin, the larger the biasing potential 퐴푡(푧) where 푧 ∈ [푧푎, 푧푏]. The local elevation technique and metadynamics are based on a similar idea. The biasing potential 퐴푡 is built as a sum of Gaussian kernels that are periodically added to the Hamiltonian along the ξ variable. This pushes the system away from states that have already been visited and, by doing so, improves sampling. It ought to be noted that the potential 퐴푡, rather than its derivative, is computed in these two cases, hence, the name adaptive biasing potential methods. While these methods have been successfully applied to many systems (274-280), they have some technical challenges associated with them. Firstly, the widths, weights, and frequency of the added Gaussian functions require careful tuning, often needing considerable experience. Secondly, these methods will continue to bias the potential unless careful stop conditions are implemented. Finally, if the derivative of the biasing potential is needed the evaluated biasing potential 퐴푡 must itself be differentiated which may lead to noisy results due to 퐴푡 being estimated along a stochastic trajectory In the ABF algorithm (222,225,281), the gradient of the biasing potential is determined from the average force acting within a bin 푧 of a striated colvar 휉:

29

′ A푡(푧) = 피[퐹휉(퐱푡)|휉(퐱푡) = 푧] (1.18)

The estimation of this gradient happens automatically and if the correct free-energy is given as an initial guess, i.e. if V is replaced by V – A◦ξ in Eq. 17, then the biasing force will not be ′ updated and the gradient of the biasing potential, 퐴푡 in Eq. 18, will be constant over time. ′ Another advantage is that 퐴푡 is directly computed and there are no noise issues as is the case with the adaptive biasing potential methods. This ease of application has seen the ABF method applied to free-energy analysis along many different classes of colvar (282-301) and was the basis for its selection for the calculations performed in Chapter 2.

1.2.9.3 The MM-PBSA method

The free-energy methods discussed previously rely upon the existence of a well-defined geometric coordinate (or set of coordinates) that links the states of interest. In some cases such a coordinate is poorly defined, at least at the level of atomic colvars, e.g. the isomerization of DNA 4WJs (302); and the aforementioned enhanced sampling methods are ill-suited to compare the states of interest. An approach that can effectively calculate these types of energy differences is the Molecular Mechanics with Poisson-Boltzmann and Surface Area (MM-PBSA) method that combines force field mechanical calculations with continuum solvent models and solute entropy values (303-306). The method is a post-processing technique that is performed on ensembles of structures from MD simulations that are representative of the different configurational states of interest. The governing equations of this analysis are:

∆퐺푠표푙푣푎푡푒푑 = 퐸푔푎푠 + ∆퐺푠표푙푣푎푡푖표푛 − 푇푆푠표푙푢푡푒 (1.19)

퐸푔푎푠 = 퐸푖푛푡푒푟푛푎푙 + 퐸푒푙푒푐푡푟표푠푡푎푡푖푐 + 퐸푣푑푊 (1.20)

∆퐺푠표푙푣푎푡푖표푛 = ∆퐺푃퐵 + ∆퐺푆퐴 (1.21)

where 퐸푔푎푠 is the phase energy described for a molecule’s conformation by the mechanical force field, ∆퐺푠표푙푣푎푡푖표푛 is the solvation free-energy, 푇 is the temperature, and 푆푠표푙푢푡푒 is the solute entropy . 퐸푔푎푠 comprises the internal energy (퐸푖푛푡푒푟푛푎푙), i.e. contributions from bond, angle and dihedral energies, and energies due to non-bonded electrostatic (퐸푒푙푒푐푡푟표푠푡푎푡푖푐) and van der

Waal’s (퐸푣푑푊) interactions. ∆퐺푠표푙푣푎푡푖표푛 comprises the polar solvation (∆퐺푃퐵) and the non-polar

30 solvation (∆퐺푆퐴) free energies calculated with the Poisson-Boltzmann equation (PBE). Specific details of the parameters used for calculation of the solvation terms using the PBE are available in the Methods section of Chapter 2. The solute entropy contributions are usually approximated using either Andricioaei and Karplus’s or Schlitter’s methods (307,308), where the mass-weighted covariance matrix 푪′ is determined from the covariance matrix 푪 and the mass matrix 푴:

푪′ = 푴1/2푪′푴1/2 (1.22)

푴 contains the atomic masses along the diagonal and is zero elsewhere. Diagonalization of 푪′ by an orthogonal transformation returns the classical variances of the new coordinates 푞푖 whose fluctuations are linearly uncorrelated. The entropy is then estimated as a sum of the contributions ( 푆푖) from all individual coordinate components:

1 푆 < 푆′ = 푘 ∑ 푙푛[1 + (푘푇푒2/ℏ2)〈푞2〉] (1.23) 2 푖 where the calculated estimate 푆′ is an upper bound for the real entropy 푆. It should be stated that accurate values of these are exceptionally difficult to obtain for large molecules and several correction terms needed to be added to the slowly converging and often highly overestimated quasiharmonic (QH) entropy estimate in Eq. 23 (309-314). The fact that these correction factors often require almost an order of magnitude longer sampling times than the QH entropies to converge means that accurate values are hard to achieve in reasonable time, particularly for comparatively rigid molecules like DNA. Even with the challenges associated with calculating entropy values, the MM-PBSA method has found extensive application to nucleic acids (79,157,305), protein-protein interactions (315-317), and protein-ligand binding (318-320)

1.3 OVERVIEW OF THESIS CONTENTS

The overarching goal of this thesis is to better inform current design principles, thereby facilitating the improved function of DNA nano-assemblies and imparting a finer degree of spatial control than before. The principal investigative tool used to achieve this is that of explicit solvent and counterion MD simulations, the capabilities of which have been expanded upon in

31 the preceding section. Coupled with well-established post-processing and free-energy methodologies, these simulations are used to unpack experimentally observed phenomena and to highlight the impact of sequence and topology on the functioning of DNA nanostructures. Chapter 2 describes an investigation into the atomic-level interactions responsible for the sequence dependence of 4WJ stacked-isomer bias and the effect of topology on motif deformability. Various topological and sequence variants of the canonical Seeman J1 junction are analyzed structurally and energetically, providing a detailed description of the effect these variations have on stability. B-form duplexes, nicked-duplexes, single-crossovers and 4WJs are compared and contrasted, for the first time, in terms of base-pair level mean structure and deformability. This study also presents the first theoretical support of a previously proposed sequence-dependent electrostatic potential contribution to junction isomer bias and reveals that the junction core sequence confers a unique handedness to the system. This study is presented as a manuscript being prepared for submission to Nucleic Acids Research and supporting information is included in Appendix A. Chapter 3 details a collaborative investigation into the impact of junction sequence and hairpin motif design on the structure and function of a tweezer-actuated nanoreactor. Single molecule imaging, computational modelling, and enzymatic assaying techniques are used to redesign a published reactor system for improved enhancement of enzymatic turn-over rate. The MD simulations performed in this work reveal how subtle base-level interactions can have a profound impact on the 3D structure of a nanodevice and this work is currently under review at Nanoscale. The supporting information for this project can be found in Appendix B. Chapter 4 highlights two published applications of all-atom MD to the parameterization and validation of finite-element method (FEM) models for DNA origami ground-state structure determination and equilibrium Brownian Dynamics (BD) sampling. The manuscripts for these publications are included as Appendix C and Appendix D respectively. The studies performed in this thesis provide a detailed description of the effect that sequence and topology can have on junction structure and, by extension, on nanodevice function. They also showcase the capabilities of atomistic simulation to both elucidate the physico-chemical factors responsible for observed experimental phenomena and to inform robust CG models for the efficient simulation of large origami systems.

32

CHAPTER 2

SEQUENCE EFFECTS IN IMMOBILE DNA FOUR-WAY JUNCTIONS: A COMPUTATIONAL STUDY

Immobile 4WJs are a fundamental nucleic acid important in and DNA nanotechnology. A deeper, sequence-level understanding of the structure, stability, and flexibility of this motif is essential for achieving Angstrom-level spatial control over nucleic acid nanodevices. Under high salt conditions, or in the presence of polyvalent ions, 4WJs exist in one of two stacked isomers when free in solution but are forced into one or the other configuration when part of programmed nanostructure arrays. Experiments have previously shown that local sequence around the site of strand crossover has a substantial impact on the conformational bias for one or both isomers. Here, we use explicit solvent and counterion MD simulations to interrogate the base-pair level interactions that influence conformational state preference, using the classic Seeman J1 junction system as a reference. A pseudo-alchemical path approach is used to investigate how altering the strand topology to form nicked-duplexes and single-crossovers at the junction core impacts B-form DNA and crossover geometry and stability. Furthermore, the effect of local sequence is investigated through the simulation of a second junction core sequence that possesses the same GC-content as J1 but differs in base stacking. Analysis of the base-pair degrees of freedom for each step along this path, in addition to free-energy calculations and reduced-coordinate sampling of the different 4WJ isomers, reveal the importance of base sequence on local structure and dynamics, and suggest how these impact isomer bias and global junction behaviour.

†Supplementary information is located in Appendix A: Mean step parameter deviations for topological variants, comparison of duplex mean step parameter values to literature values, step parameter stiffness constant deviations for topological variants, comparison of duplex step parameter stiffness constants to literature values, deformation volumes for all topological variants, junction configurational entropy plots, fractional variance plots for the junction backbone meta-ensemble, blocking curves of the second principal component for all junction replicates.

This chapter is being prepared for submission to Nucleic Acids Research as: Matthew R. Adendorff and Mark Bathe, “Sequence effects in immobile DNA four-way junctions: A computational study”

33

2.1 INTRODUCTION

Nucleic acid nanotechnology has developed into a mature synthetic technique for the programmed design of hierarchical molecular assemblies (1-7). These constructs have found applications in a number of research endeavours, including biomimicry (8), energy transfer / photonics (10,11), drug delivery (12,13), and acuated nanoreactor design (14). The need for improved spatial precision of nanostructures, with the goal Nature-like accuracy over the engineered systems, comes with the requirement of Angstrom-level control (45-48). To achieve this, an understanding of the atomic physico-chemical interactions and temporal dynamics inherent to fundamental structural motifs, e.g. duplexes (49), nicked duplexes (50), single- crossover junctions (35) and particularly four-way junctions (4WJs) (22,53,54), is essential. The immobile 4WJ has been the central component of structural DNA nanotechnology since Seeman’s foundational work (1,16) and it continues to be the ‘glue’ that holds contructs together. The three-dimensional structure of these DNA junctions in isolation has been extensively studied experimentally, with literature stretching back several decades (18,54,55). The 4WJ can exist in one of three global conformations, inlcuding two stacked antiparallel junction isomers and a cross-shaped Open-X conformation (Figure 2.1A). The stacked forms are only adopted in the presence of polyvalent ions or in a high salt environment (56-58) since without screening of the phosphate charges concentrated at the junction center, the system will adopt the open-X form (59) to minimize repulsions. The relative populations of these stacked configurational isomers have been shown to be sensitive to both the Mg2+ concentration (60) and to the sequences near the site of strand crossover (56,61-65). The conformational heterogeneity of the junction has also been shown to be sequence dependent through NMR analysis of different junction core arrangements (62). While the primary topological difference between these isomers is a change in the base-pair stacking sequence at the junction core, the difference in stacking energies (54,66,67) for these changes is insufficient to solely account for the experimentally obtained isomer ratios. A contribution of geometric constraints at the junction core to the free-energy has been considered (55) and the effect of a sequence-dependent electrostatic potential on junction stacking preferences has been suggested by experiment through selective base charge neutralization (68); however, a rigorous investigation into the topological and energetic contributions to junction isomerization bias is still lacking. Furthermore, the impact these contributions have on junction dynamics, a consideration in designed arrays of interconnected junctions, is not known.

34

Modern implementations of explicit solvent all-atom molecular dynamics (MD) have proven to be highly effective for elucidating the physical interactions responsible for experimental phenomena (69). Applying these methods to nucleic acid structures has given valuable insights into B-form structure (70-74), sequence-level interactions (75-78), duplex free- energy contributions (79-81), large origami relaxation (82), and junction dynamics (82-86). While several investigations into 4WJ structure and dynamics have been performed with MD, no dedicated analysis of sequence-level structure and energetics in isolated asymmetrical and immobile junction isomers has been performed. In this work we use all-atom MD to simulate multiple replicates of both stacked isomers for the Seeman J1 sequence and for a variant sequence (Figure 2.1B) that posesses the same GC content as J1 but differs in its core base stacking arrangement. Calculations of the base-pair level structure are performed on the resulting trajectories using established helical parameter analysis methods and the distributions of these parameters, as well as their inherent flexibilities, are compared to reference B-form duplex, nicked duplex and single crossover topological variants of the J1 junction. The per-base free energies of isomerization are then determined from structural ensembles of each junction sequence using end-state free-energy calculations. Finally, the essential dynamics of 4WJs are derived from meta-ensemble principal component analysis (PCA) of the junction trajectories and the potentials of mean force (PMFs) along the principal junction twist coordinate are calculated using enhanced sampling with a biasing potential.

Figure 2.1. Modelling DNA 4WJ isomers. (A) The three major isomers of DNA 4WJs. Switching between the Iso I and Iso II junction forms alters the base stacking at the site of strand crossover, with the X3/X8

35 and X7/X4 pairs being replaced with X5/X2 and X1/X6. (B) The junction systems used in this study, shown in the Open-X configuration. Core base-pair sequences, denoted X1-8, are assigned to sequences of J1 (top) and J24 (bottom).

2.2 MATERIALS AND METHODS

2.2.1 Junction nomenclature

A shorthand notation is introduced in this work for referencing junction, isomer, helical arm and base-pair step identities. A particular base-pair step (Figure 2.2A), defined between two 푖 adjacent base-pairs, is denoted using the form 푥 (ℎ,푏), where 푥 is the junction core sequence, i.e. J1 or J24, 𝑖 is the isomer number, ℎ is the helix number for the pseudo-duplexes and 푏 is the base-pair step index. The helices are defined such that helix 1 includes strand I in isomer 1 and strand II in isomer 2; helix 2 includes strand III in isomer 1 and strand IV in isomer 2. The base- pair step sequence and numbering follows the 5’ to 3’ sequence of strand I in isomer 1 helix 1, strand III in isomer 1 helix 2, strand II in isomer 2 helix 1, and strand IV in isomer 2 helix 2. The terminal three base-pair steps on either end of the helical arms are not included in helical parameter analyses and so the numbering starts at the fourth base-pair step in the helix and runs up to and includes the twelfth step. Several topological variants of the J1 junction system are also investigated in this work. The first two of these variants are the single-crossovers, motifs that only have one crossing strand at the junction site. When the crossing occurs on the left-hand side of the junction frame shown in Figure 2.2A the variant is termed the SXB construct and if the crossing occurs on the right it is termed the SXD construct, for a given isomer. An extension of the previously introduced junction nomenclature is used for indexing, e.g. the single-crossover equivalent of J11, with strand II performing the crossing action, is denoted SXB1 and the equivalent of J12, with strand I performing the crossing action, is SXD2. The second type of variant is the nicked-duplex, which is a B-form duplex, with the same base-pair sequence as a junction arm pseudo-duplex but with a phosphate backbone break at the junction crossover 1 site, e.g. between X3 and X8 in J1(1). The indexing for the nicked-duplex uses “N” as the 2 2 nomenclature prefix, e.g. the nicked-duplex equivalent of J1(1) is written as N1(1).

36

2.2.2 Construction of all-atom systems

The junction strand sequences used were identical to previously used experimental systems used for J1 (16) and J2 (62), up to and including the fifth neighbour from the junction core. All- atom PDB files for J11, J12 J241 and J242 were generated using the atomic structure generator in the lattice-free implementation of CanDo (22), with a starting Jtwist (321) value of 60˚ (right handed junction). The system was immersed in TIP3P water (322), explicit Na+, Mg2+, and Cl- ions were added to neutralize DNA charges and to set the simulation cell Na+, and Mg2+ ion concentrations to 50 mM and 5 mM, respectively. Duplex, nicked-duplex and single crossover (B-strand and D-strand) equivalents of the J1 junction isomers were generated from the initial junction structures using DS Visualizer (323) and then solvated / ionized in the same manner as the junction systems.

2.2.3 Molecular dynamics simulations

All simulations were performed using the program NAMD2.10 (324) with the CHARMM27 force field (168,201) and Allnér Mg2+ (325) parameters with an integration time step of 2 fs and periodic boundary conditions applied in an orthogonal simulation cell. Van der Waals energies were calculated using a 1.2 nm cut-off with a switching function applied from 1.0 to 1.2 nm and the pair list distance at 1.4 nm. The Particle Mesh Ewald (PME) method (326) was used to calculate full electrostatics with a minimum grid point spacing of 0.1 nm. Full electrostatic forces were computed every two time steps (every 4 fs) and non-bonded forces were calculated at each time step (2 fs). Simulations were performed in the NpT ensemble using the Nosé-Hoover Langevin piston method (199,327) for pressure control with an oscillation period of 200 fs and a damping time of 100 fs. Langevin forces (198) were applied to all heavy atoms for temperature control (298 K) with coupling coefficients of 5 ps-1. All hydrogens were constrained to their equilibrium lengths during the simulation and system configurations were recorded every 1 ps for downstream analysis of coordinates. Energy minimization was always performed on the orthogonal simulation cell prior to dynamics using the conjugate-gradient and line search minimizer implemented in NAMD2.10, first on the solvent and ions alone for 10,000 steps with all nucleic acid atoms spatially constrained, followed by an additional 10,000 steps with all atoms unconstrained. All parameters for the minimization were identical to those used for dynamics. The system was then slowly heated (1 K per 10 ps) to 300 K and the pressure was allowed to equilibrate to 1 atm prior to production run MD. All simulated systems (whole

37 junction, and duplex, nicked-duplex, and single crossover equivalents) were run for 300 ns, in triplicate (900 ns total production time per system). The first 60 ns of each simulation was considered as equilibration time and was not used for subsequent analysis.

2.2.4 Base-pair parameter analysis

DNA base-pair parameters were calculated at 10 ps intervals for all trajectories using the 3DNA program (71,72) with the MDAnalysis x3DNA interface (328). The stiffness matrix in base-pair step helicoidal space (rise, shift, slide, roll, tilt, twist) was calculated from the simulation- obtained helicoidal covariance matrix, 푪ℎ, as:

−1 푪ℎ = 푘퐵푇푭ℎ (2.1)

−1 where 푭ℎ is the helicoidal stiffness matrix, 푭ℎ is its inverse, 푘 is Boltzmann’s constant and 푇 is the temperature. The determinant of 푪ℎ (analogous, in this case, to the product of the eigenvalues of the matrix) was used to calculate the conformational volume of a base-pair step which takes into account the off-diagonal stiffness terms in a general description of base-pair step flexibility (70,232).

2.2.5 MM-PBSA free-energy analysis

The end-state MM-PBSA method (79) was used to estimate the free-energy of the J1 and J24 isomer systems using the following equations:

퐸푠표푙푣푎푡푒푑 = 퐸푔푎푠 + 퐸푠표푙푣푎푡푖표푛 − 푇푆푠표푙푢푡푒 (2.2)

퐸푔푎푠 = 퐸푖푛푡푒푟푛푎푙 + 퐸푒푙푒푐푡푟표푠푡푎푡푖푐 + 퐸푣푑푊 (2.3)

퐸푠표푙푣푎푡푖표푛 = 퐸푃퐵 + 퐸푆퐴 (2.4)

where 퐸푔푎푠 is the gas phase energy described for a molecule’s conformation by the mechanical force field, 퐸푠표푙푣푎푡푖표푛 is the solvation free-energy, 푇 is the temperature, and 푆푠표푙푢푡푒 is the solute entropy . 퐸푔푎푠 comprises the internal energy (퐸푖푛푡푒푟푛푎푙), i.e. contributions from bond, angle and dihedral energies, and energies due to non-bonded electrostatic (퐸푒푙푒푐푡푟표푠푡푎푡푖푐) and van der

38

Waal’s (퐸푣푑푊) interactions. 퐸푠표푙푣푎푡푖표푛 comprises the polar solvation (퐸푃퐵) and the non-polar solvation (퐸푆퐴) free energies. The MMPBSA.py module (329) of AmberTools14 (330) was used to run all calculations with CHAMBER-generated topologies (331). 24,000 frames (10 ps between frames) were used from each trajectory to generate the energy averages with 72,000 frames analysed per isomer.

Gas phase energies (퐸푔푎푠) were calculated from the CHARMM27 force field with 1-4 non- bonded interaction energies summed into their corresponding van der Waal’s and electrostatic terms.

Both solvation terms (퐸푃퐵 and 퐸푆퐴) were calculated using the pbsa module in sander, included in AmberTools14. The electrostatic field, including the solvent reaction field and the Coulombic field, was described with the Poisson-Boltzmann equation (PBE) (332,333):

푧푖휙(풓) ∇ ∙ (휀(풓)∇휙(풓)) = −4휋휌(풓) − 4휋휆(풓) ∑ 푧푖푐푖exp(− ) (2.5) 푘퐵푇 푖 where 휀(풓) is the dielectric constant, 휙(풓) is the electrostatic potential, 휌(풓) is the solute charge, 휆(풓) is the ion exclusion or Stern layer masking function describing ion accessibility of position 풓, 푧푖 is the charge of ion type 𝑖, 푐푖 is the bulk number density of ion type 𝑖 far from the solute, 푘퐵 is the Boltzmann constant, and 푇 is the temperature; the summation is over all ion types. The non-linear PBE (334) was solved on a 0.5 Å spacing finite difference grid with 5,000 iterations per frame. The solute and solvent dielectrics were set to 1.0 and 80.0 respectively, with the weighted harmonic average level set function being used for boundary grid edges. The solvent excluded surface (335) was used to define the dielectric boundary and was generated using a 1.4 Å solvent probe radius and CHARMM atomic radii. Per-base decomposition was performed in AmberTools to obtain the interaction energies of each base with all other bases. Decomposition of the per-base surface area contributions was performed using the Linear Combination of Pairwise Overlaps (LCPO) method (336). Full molecule surface area contributions were calculated with pbsa. The solute entropy was approximated from the covariance matrix obtained in each simulation using Schlitter’s method (308), where the mass-weighted covariance matrix 푪′ is determined from the covariance matrix 푪 and the mass matrix 푴:

푪′ = 푴1/2푪′푴1/2 (2.6)

39

The matrix 푴 contains the atomic masses along the diagonal and is zero elsewhere. Diagonalization of 푪′ by an orthogonal transformation returns the classical variances of the new coordinates 푞푖 whose fluctuations are linearly uncorrelated. The entropy is then estimated as a sum of the contributions ( 푆푖) from all individual coordinate components:

1 푆 < 푆′ = 푘 ∑ 푙푛[1 + (푘푇푒2/ℏ2)〈푞2〉] (2.7) 2 푖 where the calculated estimate 푆′ is an upper bound for the real entropy 푆.

2.2.6 Essential dynamics analysis

An ensemble of junction coordinates obtained from MD simulation was subjected to principal component analysis so as to extract the essential modes (203,206), the coordinate fluctuations responsible for most of the variance observed in the system. A meta-ensemble (204) was generated from the backbone atoms (CHARMM codes: P, O1P, O2P, C4', O4', C1', C2', O5', C5', C3', and O3') of all four isomer simulation sets comprising 2.88 μs of total dynamics time. The terminal three base-pairs at both the 5’ and 3’ ends of the junction arm pseudo-duplexes were excluded from this ensemble to avoid the influence of end-effects. The average structural coordinate set for a given ensemble was generated through iterative superposition of all trajectory frames. Initially, all frames were superposed onto the first trajectory frame, followed by calculation of the mean coordinates and the setting of these as the new reference set. This process was repeated until the total RMSD was lower than 0.0001 Å. Once all frames had been aligned to the average structure, the covariance matrix 푪 was constructed:

푪 = 〈(풙(푡) − 〈풙〉)(풙(푡) − 〈풙〉)푇〉 (2.8) where 〈∙〉 denotes the ensemble average and 푪 is a symmetric matrix. Diagonalization of 푪 was then performed by an orthogonal transformation 푻:

푪 = 푻Ʌ푻푻 (2.9)

40 where Ʌ is the eigenvalue matrix and 푻 contains the eigenvectors of 푪. The eigenvalues (휆) relate to the mean square fluctuations (in Å2) along the eigenvector coordinates and represent the contribution of each component to the total system fluctuation. The six zero eigenvalues, corresponding to global rotation and translation (removed through the superposition step), and their eigenvectors were discarded and the remaining vectors were sorted in descending order by their eigenvalues. The original molecular configurations were projected onto the first three principal components to yield their principal coordinates 푝푖(푡):

푝푖(푡) = 휈푖 ∙ (풙(푡) − 〈풙〉 (2.10)

푡ℎ 2 where 휈푖 is the 𝑖 eigenvector of 푪 and 〈푝푖 〉 = 휆푖. For visualization purposes, projections along eigenvectors were transformed back to Cartesian coordinates with:

∗ 풙푖 (푡) = 푝푖(푡) ∙ 휈푖 + 〈풙〉 (2.11)

All essential mode analyses were performed with the ProDy package (337).

2.2.7 Enhanced sampling with the Adaptive Biasing Force method

The free-energy profile, 퐴(휉), along the largest variance essential mode of the junction backbone meta-ensemble was calculated as a function of the colvar ξ, here representing Jtwist; defined as the projection of a junction system’s backbone atom deviations from the meta- ensemble’s average structure coordinates onto a vector in 푅3푁, where 푁 is the number of atoms. The computed quantity is:

푁 푟푒푓 푟푒푓 푟푒푓 푝({풙푖(푡)}, {풙푖 }) = ∑ 풗푖 ∙ (푈(풙푖(푡) − 풙푐표푔(푡) − (풙푖 − 풙푐표푔)) (2.12) 푖=1

푟푒푓 where 푈 is the optimal rotation matrix, 풙푐표푔(푡) and 풙푐표푔 are the centers of geometry of the current and reference positions and 풗푖 are the per-atom components of the vector (214). 퐴(휉) was obtained using the NAMD2.10 implementation of the Adaptive Biasing Force (ABF) method of thermal integration (222,338). The algorithm comprises two steps: (1) the

41 thermodynamic force acting along 휉 is extracted from the unbiased simulation and (2) the position-dependent average force is then subtracted from the instantaneous force, adapting over sampling time. After sufficient sampling the energy surface along ξ is effectively flattened, removing energy barriers along the colvar, and the system will diffuse as a result of the random fluctuating force acting on it along the other degrees of freedom. Free-energy changes are then calculated by integrating the average forces along 휉. Free-energy calculations on the junction systems were divided into five windows, sampling from -5 Å to 10 Å RMSD (-90˚ to +90˚) along ξ. The windows covered the intervals [-5, -1], [-2, -3], [1, 6], [4, 8] and [6, 10], and each window was subdivided into 15 bins for the force calculations; generating the PMF at 46 discrete points. Harmonic wall restraints were applied at the edges of the windows to keep the solute within the desired region. Initial ABF simulations of 100 ns per window were performed for each of the four junction systems, followed by additional runs in under-sampled bins to ensure a minimum per- bin sampling of 5 ns. Blocking analysis, using the method of Flyvbjerg and Petersen (339), was performed on projections of the unbiased MD 4WJ trajectories onto the second slowest essential mode (Jroll) to obtain a minimum number of steps of ~4 ns. The average sampling times per bin were 10.2 ns (J11), 10.4 ns (J12), 10.1 ns (J241), and 10.6 ns (J242). An upper limit for the standard error in the ABF calculations, 푆퐷[∆퐴퐴퐵퐹] (340), for a free-energy difference, 퐴퐵퐹 ∆퐴 , between points 휉푎 and 휉푏 along the colvar, 휉, was estimated as:

휎 푆퐷[∆퐴퐴퐵퐹] ≈ (휉 − 휉 ) (1 + 2휏)1/2 (2.13) 푏 푎 퐾1/2 where 휎2 is the variance of the force along 휉, 퐾 is the total number of force evaluations and 휏 is the correlation length for the calculated force.

2.3 RESULTS AND DISCUSSION

2.3.1 Immobile 4WJs deviate from canonical B-form structure

The six base-pair step helical parameters provide a useful, widely-adopted coarse-grained description of DNA (76). In the of known sequence level effects on DNA structure, we sought to extend these six helical parameters to the 4WJ motif to highlight regions of interest. The deviations from reference canonical B-form duplex simulations of the junction pseudo- duplex arms were calculated for both isomers of J1 as well as for several topological variants of

42 the J1 sequence, as detailed in the Methods section. A hexagonal plot representation (Figure 2.2A) is used to show the deviations in all six parameters for each given step in a particular junction/variant system. Two sets of scale bars are shown as rings within each plot, the inner representing a deviation of 0.5 Å for translational parameters (shift, slide, and rise) and a deviation of 5˚ for rotational parameters (tilt, roll, and twist); the outer representing deviations of 1.0 Å and 10˚ respectively.. The J1 system exhibits pronounced negative mean parameter changes at the core step (step 5) and smaller deviations are present up to two neighbouring steps away (Figure 2.2B). Negative shift is observed in all core steps along with large untwisting (negative twist) of the pseudo-duplexes on one helix per isomer at step 5. Rupturing one crossover link removes most of these deviations with the SXB (Figure 2.2C) and SXD (Figure A1†) systems showing only slight relative changes compared to the full junction, and the systems (Figure A2†) behave almost exactly like B-form DNA in terms of mean step parameter values. The core steps of the J24 junction system (Figure A3†) show significant untwisting 1 2 2 at J24(1,5), J24(1,5) and J24(2,5) and line plots of all absolute mean values are included in the supplement (Figures A4-7†). The results presented here indicate a local-sequence effect is present as deviations are occurring at base-pair steps of known high flexibility, i.e. untwisting of 1 2 2 the pyrimidine-purine TG steps at J1(1,6) and J1(2,5) and the TA step at J24(1,5). Changes in mean 1 2 values are also apparent at the comparatively stable AG and CC steps at J1(1,5) and J24(2,5) respectively, indicating a global topological effect. Step 5 in isomer 1, helix 1 and isomer 2, helix 2 in both junction systems is deformed, irrespective of relative flexibility, suggesting that the geometric constraints of the 4WJ motif are contributing to the local sequence effects. It should be noted that as J24 has some local sequence differences to J1 around the core, reference B- form values were taken from literature simulations (238) for these steps. A comparison of the B- form simulation-obtained values in this work with these literature values (Figures A8-11†) shows that the slight differences in solvent and temperature conditions between the two sets of CHARMM simulations has a negligible effect on the mean values and hence the comparisons are reasonable.

43

Figure 2.2. The DNA pseudo-duplexes within 4WJs can deviate significantly from canonical B-form structure. (A) Six helical step parameters are calculated for each base-pair step comprising the two junction helix arms (H1 and H2). An atomic representation is shown (middle, left) with step 5 of both helices shown in purple. Deviations in mean parameter values from reference B-form simulations for each step are shown as wedges in a hexagon (representative plot at lower left). The inner gray line represents a deviation of 0.5 Å for translational parameters (shift, slide, and rise) and a deviation of 5˚ for rotational parameters (tilt, roll, and twist); the outer black line represents deviations of 1.0 Å and 10˚ respectively. Negative deviations are coloured red, positive deviations are coloured blue, and the absence of a wedge indicates a deviation of less than 0.2 Å or 2˚. Within the base-pair step coordinate frame, the x-axis points away from the minor groove edge of a base-pair, the y-axis points towards the complementary base, and the z-axis points in the direction of helical rise. (B) Deviations in step parameter mean values for the J1 junction systems (right) and their topologies (left). Isomer 1 is shown on top (first two helices) with isomer 2 below (third and fourth helices) for all systems. Base-pair step sequences are shown for strands I and III (red, orange) for isomer 1; and strands II and IV (blue, green) for isomer 2; the 5’ to 3’ step sequence is read right to left for helix 1 and left to right for helix 2 for all systems. (C) Deviations in step parameter mean values for the SXB junction systems (right) and their topologies (left).

44

2.3.2 Flexibility changes propagate to several steps from the junction core

Changes in the stiffness parameters along the helical step degrees of freedom can indicate whether any de-/stabilizations of the base-pair steps are occurring as a consequence of the 4WJ motif. The deviations from the diagonal stiffness matrix terms of reference canonical B- form duplex simulations of the junction pseudo-duplex arms were calculated for all systems. A hexagonal plot representation (Figure 2.3) is used to show the stiffness deviations in all six parameters for each given step in a particular junction/variant system. Two sets of scale bars are shown as rings within each plot, the inner representing a deviation of 1 kcal/mol.Å2 for translational parameters (shift, slide, and rise) and a deviation of 0.005 kcal/mol.degree2 for rotational parameters (tilt, roll, and twist); the outer representing deviations of 2.0 kcal/mol.Å2 and 0.01 kcal/mol.deg2 respectively. These scales were chosen in accordance with literature values (75), where the lowest translational and rotational stiffness constants were 1.21 kcal/mol.Å2 and 0.015 kcal/mol.deg2 when simulated with the CHARMM field. The data shows that flexibility changes are more abundant and propagate further along the junction arms than mean-value deviations, with J1 exhibiting only reductions in stiffness. Significant changes occur at the junction core and the softening of specific parameters corresponds to the mean-value 1 deviations observed in Figure 2.2B, i.e. the high flexibility of steps J1(1,4−7) and the higher twist 2 2 flexibility of J1(2,5) compared to J1(1,5) at the core. Although the nicked-duplex systems show minimal mean value changes from B-form, they have increased rise and twist flexibility, both at the core and at the first neighbour. This change at the core is consistent with the structural change of breaking the rotationally-confining helical backbone at step 5 and the extension of effects to the first neighbour is in agreement with simulations of DNA tetranucleotide sequence effects (76). The SXB and SXD systems show reduced flexibility changes when compared to J1, particularly at non-core steps, with minimal propagation along the arms. The two systems show different distributions of flexibility changes, however, indicating the effect of topology. J24 shows less flexibility propagation out from step 5 than J1, with most changes occurring at the core, (Figure A12†) and the softening of specific parameters also corresponds to mean-value † 1 2 1 deviations observed in Figure A1 , i.e. softer twist in J24(1,5) and J24(1,5) than J24(2,5) and 2 2 1 J24(2,5). The general flexibilities of the TA and TG steps at J24(1,5) and J24(1,5) are also consistent with the known deformabilities of these two sequences. As in the previous section, line plots for all absolute stiffness value measurements are included (Figures A13-16†) and comparisons of the stiffness values from the B-form simulations with literature values are

45 included (Figures A17-20†) to support the use of these values for the J24 calculations. The results presented here support the previously reported mean value data and indicate that an increase in base-pair step flexibility is occurring both as a consequence of topology and as a consequence of local sequence.

Figure 2.3. The effect of junction topology on base-pair step flexibility. Deviations in the helical step parameter stiffness constants from those of canonical B-form DNA are shown for the J1 junction systems (left, center), the nicked duplex systems (right, top), the SXB junction systems (right, middle), and the SXD junction systems (right, bottom). Base-pair step sequences are shown for strands A (red in isomer 1, blue in isomer 2) and C (orange in isomer 1, green in isomer 2); the 5’ to 3’ sequence runs right to left for helix 1 and left to right for helix 2 for both isomers. Topology schematics for J11 and J12 are shown top left and bottom left respectively. Complementary schematics for the nicked-duplex, SXB and SXD systems are shown top right, middle right, and bottom right respectively. The inner gray line in each hexagon represents a deviation of 1 kcal/mol.Å2 for the translational parameters and of 0.005 kcal/mol.deg2 for the rotational parameters; the outer black line represents deviations of 2 kcal/mol.Å2 and 0.01 kcal/mol.deg2 respectively.

46

2.3.3 Base-pair step deformability is a consequence of both topology and local sequence

In order to obtain a full description of base-pair step deformability, the off-diagonal coupling terms of the stiffness matrix need to be considered (232). The configurational volume metric, calculated as the product of the eigenvalues of the covariance matric in helical parameter space, has been used to characterize both X-ray crystallography and MD data (70,232) and is used here to summarize the relative deformabilities of all the analyzed systems. A modification of the hexagonal plots in Figure 2.2 and Figure 2.3 is used to compare the per-step configurational volumes of the different topological (duplex, nicked-duplex, and single- crossovers) and sequence (J24) variants in Figure 2.4A. The two log10 scale bar rings represent volumes of 101 and 102 Å3deg3 respectively. The sequence-level heterogeneity in step flexibility is immediately apparent, i.e. the flexible TA step at step 8 in isomer 1, helix 2 compared to the rigid AT step at step 8 in isomer 2, helix 2 in all systems. The effect of changes in local 1 sequence can be seen in how the TG step at J1(1,6) is significantly more flexible in this location 2 than when moved to J1(2,4) where both have the same proximity to the junction. The increased 1 flexibility at J1(1,6) compared to other topological variants also indicates the significance of 1 topology on local DNA structure. J1(1−2,5) steps have notably lower flexibilities, by approximately 2 an order of magnitude, than their J1(1−2,5) equivalents with the same relative relationship present 1 2 between their N1(1−2,5) and N1(1−2,5) counterparts, though at slight lower volume values. As the J11 isomer is known to be significantly favoured over J12, this result is significant as it suggests a relative energetic stability at the core in J11 and this will be further investigated in the following section. Also, as the nicked-duplex systems show a similar pattern to the J1 isomers, this effect is likely due to both the local sequence at the junction core and the geometric constraints applied by the 4WJ motif. The SX systems exhibit generally smaller deformabilities at the core than their junction and nicked-duplex equivalents, except at step 5 in isomer 2, helix 2 where they are as flexible as their counterparts. The two systems do show distinct deformability distributions, however, in agreement with their mean value deviations and this suggests that their topology has an effect on local structure. The relative stability of single-crossovers, at least in isolation, is in agreement with their extensive use in tile-based applications (36). The J24 sequence shows increased deformability in both isomers at the junction core, with little propagation to neighbouring steps, in contrast to J1 which exhibits increased deformability up to two neighbours from the core. The effect of local sequence can also be seen in the major

47

1 1 reduction in configurational volume when the TG step J1(1,6) is changed to an AG in J24(1,6). Line plots of the volume values are included in Figure A20†.

Figure 2.4. Junction base sequences confer unique free energies and base-pair step deformability. (A) The configurational volumes for all six topological variants are shown as wedges in a hexagonal plot (left). The inner gray line in each hexagon represents a volume of 10 Å3.deg3; the outer black line represents deviations of 100 Å3.deg3. Topologies of the J1 and J24 junction systems are shown to the left and right of the dotted line. Isomer 1 is shown on the top and isomer 2 on the bottom for both core sequences. Base- pair step sequences are shown for strands A (red in isomer 1, blue in isomer 2) and C (orange in isomer 1, green in isomer 2); the 5’ to 3’ sequence runs right to left for helix 1 and left to right for helix 2 for both isomers. Base-pair step sequences unique to J24 are given in parentheses. (B) Per-base internal (∆퐸퐼푁푇), van der Waals (∆퐸푉퐷푊), gas-phase electrostatic (∆퐸퐸퐿퐸), polar solvation (∆퐸푃), non-polar solvation (∆퐸푁푃), and total (∆퐸푇푂푇) free-energy differences for each chain in the J1 systems and (C) J24 systems. Chains are colored according to the topology schematics and shaded regions represent the standard deviations across replicas.

48

2.3.4 Free energies of isomerization

The significant differences in configurational volumes between the isomers of J1 and the comparable values for those of J24 suggest that unique atomic-level interactions are occurring at or near the 4WJ cores. To unpack the various contributions to the junction energy of isomerization, ∆퐸푖푠표 = 퐸퐼푠표1 − 퐸퐼푠표2, the MM-PBSA end-state free-energy analysis method was applied to structural ensembles of each isomer. Figure 2.4B shows plots of the various per-base energy contributions to the total isomerization energy for each chain in J1. J11 is more stable 2 than J1 in all energy terms (∆퐸퐼푁푇, ∆퐸푉퐷푊, ∆퐸푁푃, ∆퐸푃) except gas-phase electrostatics (∆퐸퐸퐿퐸), though this is offset by the large polar-solvation energy difference to give a total energy difference of -8.2 kcal/mol-1. It should be noted that all energy totals presented do not include the solute entropy contribution as the quasiharmonic approximation has not converged (Figure A22†), though these contributions would most likely favour the more deformable systems in

Figure 2.4A. The plots of ∆퐸푇푂푇 in Figure 2.4B show that strands I, III, and IV are all more stable in isomer 1 at the core bases, with only strand II favouring isomer 2 at bases 7 and 8. These two bases, along with the non-core base 6, correspond to the region of high deformability 1 at J1(1,5−7) around the flexible TG step in Figure 2.4A. The largest energy differences occur at the core bases (indices 7-10) where bases are being changed from a linear configuration (chains I + III in isomer 1, II + IV in isomer 2) into a “bent” configuration (II + IV in isomer 1, I + III in isomer 2) though energy changes are also seen at neighbouring bases. In the J24 system the 1 isomers are energetically similar, with J24 being favoured in the ∆퐸푉퐷푊, ∆퐸푁푃 and ∆퐸푃 terms 2 and J24 in ∆퐸퐼푁푇 and ∆퐸퐸퐿퐸 for a total energy difference of 0.4 kcal/mol. Looking at the ∆퐸푇푂푇 plots for J24, strands I + III favour isomer 1 at the core, while II + IV favour isomer 2. The J24 energy changes are more focused at the core than J1, consistent with the reduced deformability of non-core bases in Figure 2.4A relative to their J1 counterparts. The data confirms that stacking energies alone, represented by the ∆퐸푉퐷푊 term, are not the only interactions that change upon isomerization as both geometric (∆퐸퐼푁푇) and electrostatic (∆퐸퐸퐿퐸 + ∆퐸푃) factors contribute to the total energy difference. Furthermore, the presence of electrostatic interactions is in agreement with the previously suggested hypothesis of a sequence-dependent electrostatic potential contributing to the isomerization energy (55,68) and is consistent with the results of Mg2+ pulsing experiments (60).

49

2.3.5 Core sequences exhibit unique junction dynamics

As DNA nanostructures are often comprised of arrays of interconnected 4WJs, knowledge of sequence effects on global dynamics is necessary to exert a fine control over a nanodevice. PCA was performed on the Cartesian coordinates of all the equilibrated junction data, ~2.9 μs in total, to obtain the essential dynamical modes of an averaged 4WJ. The highest variance modes were found to be the in-plane scissor-like , Jtwist (Figure 2.5A), and the rolling of the junction arms, Jroll (Figure 2.5B). Definitions of these motions have both been reported in crystallographic studies (22,321) and their contributions to the total variance are shown in † 2 1 2 1 Figure A23 . While J1 , J24 and J24 all show predominant motion along Jtwist, J1 shows predominant motion along Jroll. In addition, projections of the unbiased trajectories onto the two principal modes reveal all junctions except J11 show movement to left-handed configurations (negative twist) during the various simulation replicates (Figure 2.5C). This sequence- dependent configurational heterogeneity is consistent with previous NMR studies performed on various permutations of junction core sequence (62). Enhanced sampling along the Jtwist eigenvector, obtained from PCA, was performed using the ABF algorithm to determine the PMFs along this reduced coordinate in the antiparallel regime interval of [-90˚, +90˚] (55) in a new set of biased simulations. A minimum sampling of 5 ns was performed in each bin as this ensures that the unbiased degrees of freedom have de-correlated, where the slowest relaxation † † time, that of Jroll, is ~4 ns (Figures A24 and A25 ). The PMFs show that each junction has a 1 2 unique flexibility and handedness along Jtwist where J1 prefers to be right-handed, J1 shows equal preference for right- and left-handedness, J241 prefers being left-handed, and J242 prefers being right-handed (Figure 2.5D). The J1 isomers have very different average rolls along the

Jtwist coordinate, while the J24 isomers have fairly similar roll profiles. This result is significant as even in the presence of equal energy isomers (J24), there can be unique sequence-dependent global dynamics.

50

Figure 2.5. Junction stacking interactions confer unique global system dynamics. (A) The eigenvector components of the highest variance mode (Mode 1) are shown as gray arrows on a rendering of the average meta-ensemble backbone structure, colored for isomer 1 (top) and isomer 2 (bottom). (B) The eigenvector components of the second highest variance mode (Mode 2) are shown as gray arrows on a rendering of the average meta-ensemble backbone structure, colored for isomer 1 (top) and isomer 2 (bottom). (C) Projections of the unbiased simulations onto these eigenvectors are shown for J1 (top row) and J24 (bottom row). For each system projection isomer 1 is shown on the left and isomer 2 is shown on the right. Contours are shaded according to free-energy values calculated relative to the most populated 2 bin for each system; bin areas are 0.04 Å each. Corresponding Jtwist values are shown in orange for Mode 1. (D) Potentials of mean force, from independent ABF simulations to (C), calculated along Mode 1 for isomer 1 (black) and isomer 2 (orange) are shown for J1 (top, left) and for J24 (bottom, left); the shaded regions represent the error calculated with Equation (2.13). The average value of Mode 2 in each sampling bin for isomer 1 (black) and isomer 2 (orange) are shown for J1 (top, right) and for J24 (bottom, right); the shaded regions represent the per-bin standard deviations.

2.4 CONCLUSIONS

Analyses of explicit solvent MD simulations of a reference J1 junction sequence, as well as duplex, nicked-duplex, single-crossover, and junction-core sequence analogues, were used to

51 demonstrate that asymmetric immobile 4WJs deviate from canonical B-form DNA, both in structure and, to a larger degree, flexibility. The chemical topology and local base-pair sequence were shown to contribute to these structural changes both in the 4WJ motif and in its topological variants. Free-energy analyses showed that stacking, geometric, and electrostatic factors all contribute to the energy difference between stacked junction isomers and that these contributions can come from both core and non-core bases. The contribution of electrostatic factors to the free-energy differences helps to explain the sensitivity of isomer bias to Mg2+ concentration observed by Hyeon and co-workers during ionic pulsing experiments. Finally, enhanced sampling MD revealed that junction core base stacking sequences exhibit unique and significant global junction dynamics that are independent of isomerization energy differences. Of 1 particular interest was the observation that Jtwist is not the fundamental motion of J1 , with Jroll being the major source of variance in the simulations. This may very well be the source of its significant improvement of the tweezer nanoreactor’s planarity and function in Chapter 3. Several tentative sequence design rules for larger DNA assemblies can be garnered from the results contained here. Firstly, the natural deformabilities of the various B-form base- pair steps are exacerbated at or near nanotechnology-relevant structural motifs. As such, if rigidity of a nanostructure is desired, the flanking sequences of these motifs should be selected from the more rigid purine-pyrimidine tetranucleotide sequences; should flexibility be desired, the pyrimidine-purine tetranucleotide sequences might be preferable. At the core, certain steps appear to be made more deformable as a consequence of geometry, irrespective of sequence, and so energetically it may be best to select relatively flexible bases for these locations, i.e. the pyrimidine-purine steps, with rigid bases at the other core locations; this hypothesis will require further testing, ideally with free energy perturbation (FEP) methodologies. In Chapter 3, the impact sequence effects have on the behaviour of actuated nanodevices will be explored.

52

CHAPTER 3

RATIONAL DESIGN OF DNA-ACTUATED ENZYME NANOREACTORS GUIDED BY SINGLE MOLECULE ANALYSIS

The control of enzymatic reactions using nanoscale DNA devices offers a powerful application of DNA nanotechnology uniquely derived from actuation. However, previous characterization of enzymatic reaction rates using bulk biochemical assays reported suboptimal function of DNA devices such as tweezers. To gain mechanistic insight into this deficiency and to identify design rules to improve their function, here we exploit the synergy of single molecule imaging and computational modelling to characterize the three-dimensional structures and catalytic functions of DNA tweezer-actuated nanoreactors. Our analysis revealed two important deficiencies – incomplete closure upon actuation and conformational heterogeneity. Upon rational redesign of the Holliday junctions located at their hinge and arms, we found that the DNA tweezers could be more completely and uniformly closed. A novel single molecule enzyme assay was developed to demonstrate that our design improvements yield significant, independent enhancements in the fraction of active enzyme nanoreactors and their individual substrate turnover frequencies. The sequence-level design strategies explored here may aid more broadly in improving the performance of DNA-based nanodevices including biological and chemical sensors.

†Supplementary information is located in Appendix B: DNA sequences, assembly, and bulk characterization of all tweezers; details of smFRET characterization of tweezers; additional AFM images; detailed results of MD simulations; bulk measurement of tweezer-scaffolded G6pDH activity; and single molecule measurements of tweezer-scaffolded G6pDH activity.

The work in this chapter is currently under review at Nanoscale as: Soma Dhakal, Matthew R. Adendorff, Minghui Liu, Hao Yan, Mark Bathe, Nils G. Walter, “Rational design of DNA-actuated enzyme nanoreactors guided by single molecule analysis”

Author contributions S.D. designed, performed and analyzed the smFRET and single molecule enzyme assays, M.R.A. designed, performed and analyzed the MD simulations. M.L. performed and analyzed the enzyme-tweezer assembly, bulk enzyme assays, and AFM imaging. S.D., N.G.W., M.R.A. and M.B. conceived of redesigning the tweezer HJs. All authors wrote the manuscript.

53

3.1 INTRODUCTION

Supramolecular DNA nanodevices driven by external signals such as strand displacement or environmental chemical cues enable actuation of nanoscale reactions with precise spatial control (45,105,106,341-346). DNA tweezers are such prototypical nanomachines that enable actuation in response to their environment for applications ranging from DNA-templated chemical coupling reactions (347), pH-sensing to monitor cellular trafficking (348), actuated protein capture (349), and enzyme nanoreactors (14). While these nanodevices have potentially broad utility, their performance is often found to be suboptimal when compared with idealized expectations. However, the field of DNA nanotechnology is largely driven by the need for improved engineering rather than molecular understanding to inform device design. For example, while a glucose-6-phosphate dehydrogenase (G6pDH)/NAD+ enzyme/cofactor pair at the ends of the two arms of the DNA tweezer-actuated enzyme nanoreactor can be activated and inhibited repeatedly, the relative enhancement between the two states was previously found to be under 6-fold (14). In an effort to understand and resolve the basis of this relatively modest enhancement observed in bulk solution, we turned to single molecule imaging and analysis as a powerful means for studying nanoscale structure-function relationships in the underlying DNA tweezer (350,351). The field of DNA nanotechnology broadly utilizes DNA nanodevices comprised of single and double crossover motifs composed of Holliday junctions (HJs), for which a systematic study of structure-function relationships is currently lacking. The DNA tweezer is archetypal in that its two arms are each composed of two HJs that are connected by a single HJ hinge (Figure 3.1a). Combining single molecule fluorescence resonance energy transfer (smFRET), atomic force microscopy (AFM), and MD simulations, we here have dissected the sequence-level origins of the sub-optimal performance of our DNA tweezer-actuated enzyme nanoreactors. Introducing a structure-guided rational design strategy, we improved tweezer closure and tweezer-actuated enzyme function by implementing modifications to both the hairpin actuator and architectural HJ elements. Most significantly, we found that the specific sequences of the HJs significantly modulate the inter-arm distance and conformational heterogeneity of the tweezer, which both can be reduced by a simple sequence redesign. These rational design strategies provide general guidelines for improving DNA nanodevices for an enhancement of their performance in future applications.

54

3.2 RESULTS AND DISCUSSION

The original DNA tweezer (14,349) is composed of nine DNA oligonucleotides (Figure 3.1a, Tables S1-S6† in Appendix B.1) that self-assemble to form two tweezer “arms” composed of DX motifs connected by a single immobile HJ ‘hinge’. A single-stranded actuator element bridges the middle of the two arms and cycles between double-helix and hairpin conformations upon addition of fuel and antifuel strands that open and close the tweezer, respectively (Figure 3.1a). To characterize tweezer actuation and quantify its closed-state conformation, we used smFRET to monitor surface-immobilized single tweezers by total internal reflection fluorescence microscopy (TIRFM) upon attachment of a Cy3 and Cy5 fluorophore pair to the ends of the tweezer arms via deoxy(d)T3 and dT4 tails, respectively (Figure 3.1a, see Experimental and Appendix B.2 for smFRET experiments). The FRET pair was thus situated near the attachment sites of enzyme and cofactor to estimate their distance via the FRET efficiency. The initial construct contained an actuator element with three- (3T) spacers flanking each side of a 3-base pair (3bp) stem capped by a 13-nt loop(14,349) (Figure 3.1a and S1†). While the opened-state showed a single, zero-FRET conformation (Figure 3.1b), closure of the 3T3bp tweezer typically yielded two distinct FRET conformations of approximately 0 and 0.2, † corresponding to inter-arm distances (IADFRET) of >10 and ~6.3 ± 0.2 nm (Figure B7 , see Experimental for IAD estimation), respectively, that only slowly interconverted (Figure 3.1c and S8†). The zero-FRET sub-population diminished upon increase in antifuel strand concentration, identifying it as tweezers remaining unclosed despite the presence of excess antifuel strand (Figure B9†). AFM quantitatively supported the notion that the 3T3bp tweezer can be closed only to an IAD of ~6.3 nm (Experimental, Figure B18† in Appendix B.3). To gain sequence-level insight into possible structural origins for the large IAD in the closed-state, we performed all- atom MD simulations, as described in Experimental and Section B.4, of a model 3T3bp tweezer in solution under experimental conditions. The 13-nt closing loop of the hairpin actuator was excluded in all MD simulations due to its long equilibration time. However, we observed that the included hairpin stem remains fully formed throughout our simulations, indicating that excluding its closing loop does not make a significant difference in the IADMD observed. Results revealed that the tweezer arms splay in a three-dimensional manner that is facilitated by the long and flexible three-thymine spacers that bridge the actuating hairpin, resulting in a predicted mean † IADMD of 5.3 nm (Figure 3.1d, Movie SM1 ). This structural origin of the observed incomplete closure offers a straightforward explanation for the limited bulk activity enhancement observed for an enzyme juxtaposed with its cofactor at the ends of the two arms of the nanoreactor (14)

55

(Figure B25†, S26† and Table B10†).To force improved closure of the tweezer, we redesigned the 3T3bp architecture by eliminating the thymine spacers and hairpin entirely in a 0T0bp construct. While this redesign was anticipated to bring the tweezer arms into the closest possible proximity upon closure, smFRET characterization revealed even poorer closure † (IADFRET = 6.8 ± 0.1 nm, Figure B10 ) that MD simulations indicated is due to inter-arm twisting † from crossover induced in this square-lattice design (IADMD = 4.7 nm, Figure B21 , Movie SM2†). Similar results of DNA distortion in DNA nanodevices designed on a cubic lattice that does not match the regular helical pitch of 10.5 bp/turn of B-form DNA have been observed in other contexts (2,352), but to the best of our knowledge never at the single HJ level, as observed here.

56

Figure 3.1. Single molecule characterization of DNA tweezer conformation. (a) Experimental strategy for smFRET. Top panel depicts the surface immobilization of biotin-modified DNA tweezer on a PEG/ coated quartz slide. The hairpin actuator is boxed and other sturctureal elements are highlighted. The bottom panel depicts the opening and closing cycles of a DNA tweezer actuated by fuel and antifuel strands, respectively. The sequence details of the hairpin actuator ‘3T3bp’ in the bottom panel summarize our nomenclature of the tweezer used in this study. (b) Typical smFRET time traces for the opened-state tweezer. The overlaid black and cyan traces are nfilter and HMM idealized data, respectively. (c) Typical smFRET time traces for static (top panel, 72% molecules) and dynamic (bottom panel, 28% molecules) closed-state tweezers, determined by HMM analysis over 100 s total observation time. Only molecules with active FRET acceptor (confirmed at the end of each experiment by direct excitation with a 640-nm red laser) were used for analysis. The right-hand panels in b and c depict the corresponding FRET probability distributions for the molecules shown. Imaging was performed at 100 ms

57 camera integration time. (d) Two representative orthogonal views of a typical molecular conformation from

MD, corresponding to an IADMD of 5.3 nm (left), with the probability distribution of mean, µ3T3bp (right).

The IADMD of 5.3 nm is highlighted by double-sided arrows in the orthogonal views of the tweezer.

To relieve this lattice strain we instead introduced single-thymine spacers intervening each tweezer arm of the actuating hairpin (Figure B1†), which has previously been suggested to mitigate inter-duplex strain.(31,353) MD simulation confirmed improved closure of a tweezer containing single-thymine spacers (1T0bp) that relieved inter-arm distortion present in the 0T0bp construct (Figure 3.2a, S21† and S22†, Movies SM2† and SM3†). Experimental implementation using a 1T3bp tweezer redesign confirmed substantial improvement of the closed-state, as evidenced by 30% higher mean FRET (Figure 3.2b). However, a significant low-FRET population in the closed-state remained together with conformational heterogeneity and slow interconversion dynamics (Figure B11†). We interpret the observed slow FRET fluctuations as interconversion dynamics of the tweezer, rather than effects on the dye photophysics, since they varied with the tweezer design (discussed later; compare, for example, the FRET dynamics in Figure B13† and S15†) without any obvious change in the local dye environment. Because DNA hairpins are known to exhibit heterogeneity in their conformational dynamics, next we tested whether ineffective closing and actuation of the tweezer could be improved by redesign of the actuating hairpin itself. In particular, we hypothesized that the 3bp duplex stem may be insufficient to retain the tweezer in the closed-state, so we extended it to a 4bp or 5bp stem to enhance its predicted thermodynamic stability (Figure 3.2b, c and S1†). Both the 1T4bp and 1T5bp architectures indeed resulted in improved closure, manifest in shifts towards higher FRET populations in the closed-state that were well supported by AFM (Figure 3.2b-d). However, the 1T5bp design exhibited considerably higher heterogeneity than the other constructs (Figure B13†) and did not fully open upon fuel-strand addition (Figure 3.2b), presumably because the fuel-strand less efficiently competes with the thermodynamically most stable 5bp hairpin. Thus, the 1T4bp design was optimal amongst the set explored, although conformational heterogeneity was not eliminated by this actuator element (Figure 3.4b and S12†).

58

Figure 3.2. Optimization of the hairpin actuator element improves tweezer closure. (a) Two orthogonal views of a typical conformation corresponding to the mean IADMD of 3.3 nm of the lower subpopulation

(highlighted by black double-arrows) and the full IADMD distribution (right), predicted using MD of the tweezer with only a single-thymine spacer and no hairpin (1T0bp), showing a substantially reduced IADMD compared with the three-thymine spacer (3T3bp) and zero-thymine (0T0bp) spacer. (b) FRET probability distributions of closed-state and reopened tweezers containing different hairpin linkers―‘3T3bp’, ‘1T3bp’, ‘1T4bp’, and ‘1T5bp’. (c) AFM visualization of opened-state (top panel) and closed-state (bottom panel) tweezers. Scale bar is 100 nm. The zoom-in images of the tweezers (42 nm × 42 nm are shown) are placed in the same order as they (highlighted by boxes) appear from left to right in the field of view. Please see Figure B18† and S19† for an expanded field of view. (d) IAD distributions of both closed-state

(solid lines) and opened-state (dotted lines) tweezers as measured from AFM images (IADAFM, top panel) and closed-state tweezers measured by smFRET (IADFRET, bottom panel). The gray background guides the eye for comparing IADs of closed tweezers measured by AFM and smFRET. The populations to the right of the dotted line represent opened-state tweezers observed by AFM (top panel), whereas the corresponding IADFRET is beyond the sensitivity of smFRET (and observed as zero-FRET; bottom panel).

Conformational heterogeneity and long-timescale interconversion dynamics have been characterized previously for immobile HJs (62). Specifically, single HJs were observed by smFRET to exhibit differential thermodynamic preference for their two distinct isomeric states, iso-I and iso-II, with slow interconversion rates that depend strongly on ionic conditions

59

(60,354). Indeed, smFRET of HJ1 isolated from our tweezer and fluorophore labeled to distinguish the isomers (Figure 3.1a and B14†) showed a preference for adopting iso-II that would be detrimental to the desired closing of the tweezer arms (Figure B14†). Immobile HJ sequences employed in structural DNA nanotechnology have previously been enumerated (53), with the J1 sequence of Seeman and co-workers exhibiting a strong preference for the preferable iso-I conformation (355). While numerous core sequences of HJs have been investigated conformationally in isolation (60,354,356), as done here for HJ1 (Figure B14†), the impact of these dynamics on higher-order programmed DNA assemblies composed of multiple such junctions have not been analyzed systematically. In fact, while the original tweezer design consisted of five immobile HJs (Figure 3.1a), none were chosen to be the J1 sequence in its stable isomer. We therefore hypothesized that replacing the HJs with the stable J1 junction designed to be in its preferred proximal iso-I state may further improve tweezer closing. Because HJ1 serves as the tweezer hinge, we first redesigned it to be in the iso-I conformation, which significantly reduced the heterogeneity in the tweezer population (Figure 3.3a). Subsequent redesign of all five HJs resulted in an additional substantial shift towards † † higher mean FRET (the IADFRET decreased to 4.4 ± 0.2 nm; Figure 3.3a, S15 and S16 ). As a negative control, we redesigned HJ1 to adopt its iso-II isomer, which resulted in a shift to a low- FRET region with increased conformational heterogeneity (Figure 3.3a), as anticipated. We next imaged both the opened- and closed-state of the fully redesigned tweezer with AFM (Figure

3.3b) and found that the mean IADAFM of the closed tweezers decreased to 4.1 ± 0.3 nm (Figure B20†). MD simulation indicated that redesign of all five HJs resulted in a planar tweezer conformation that also brings the arms into closer proximity (Figure 3.3c, S23† and S24†, Table † † B9, Movies SM4 and SM5 ). This flattening correlates with Jtwist values for the junctions adopting distributions that are on average closer to zero (Figure 3.3d). MD simulations additionally suggested that the dye-linked dT tails play an important role in stabilizing the high- FRET conformation via an inter-tail interaction across the arms (Figure B24†). This interaction was enhanced in the construct with the redesigned HJs, suggesting cooperativity with flattening of the overall tweezer conformation. Addition of excess of dA5 oligonucleotides in solution resulted in a shift towards lower FRET (Figure B17†), consistent with this hypothetical inter-tail stabilization. To investigate how the preceding architectural redesign improves catalytic function, we coupled the two arms of three representative tweezer designs – 3T3bp (original), 1T4bp (actuator-optimized), and 1T4bp-5HJs-iso-I (1T4bp with all five HJs redesigned to adopt iso-I) – with the enzyme G6pDH and its cofactor NAD+ to generate DNA-tweezer actuated nanoreactors

60

(14). Utilizing a PMS/resazurin coupled reaction we monitored the G6pDH activity in bulk solution by fluorescence (Figure 3.4a, Appendix B.5). Compared to the original 3T3bp nanoreactor, 1T4bp enhanced the reaction rate

Figure 3.3. Redesigned Holliday junctions improve tweezer closure. (a) FRET probability distribution of redesigned 1T4bp tweezers. The HJ hinge was redesigned to favor iso-I (1T4bp-HJ1-iso-I)), iso-II (1T4bp-HJ1-iso-II), or all five HJs were redesigned to favor iso-I (1T4bp-5HJs-iso-I). (b) AFM visualization of opened- and closed-state tweezers with all five HJs redesigned. Scale bar is 100 nm. The zoom-in images of the tweezers (42 nm × 42 nm are shown) are placed in the same order as they (highlighted by boxes) appear from left to right in the field of view. (c) Two orthogonal views of a typical conformation corresponding to the mean IADMD of 1.1 nm of the lower subpopulation (highlighted by black double- arrows) of the ‘1T3bp-5HJs-iso-I’ tweezer (left) and the corresponding IADMD distribution compared with the three earlier designs as indicated (right; dotted lines represent their means, µ). (d) Reduction in individual Jtwist angles in tweezer 1T3bp-5HJs-iso-I (red) compared with the original junction design 1T3bp

61

(black; dotted lines represent the means, µ, and solid, thick black lines highlight a Jtwist angle of 0˚, which correspond to the planar conformation). Compared with the original design, the mean Jtwist angles of all but HJ2 are shifted (arrows) closer to 0˚ in the redesigned tweezer. by ~2-fold, whereas 1T4bp-5HJs-iso-I enhanced the rate by ~3-fold (Figure 3.4b, Figure B25† and S26†). To probe the mechanism underlying these enhancements, we optimized our fluorescent assay to reach single molecule sensitivity when detected by TIRFM (Figure B27† in Appendix B.6). Since the diffusion of resorufin is rather fast, 10% (w/v) PEG8000 in the imaging buffer (see Experimental for an estimation of the diffusion coefficient and elapsed time of resorufin) was used to detect single catalytic turnovers at 35 ms camera integration time in the form of single fluorescence spikes (357) (Figure 3.4c and S27†). We note that due to the low concentration of resazurin (50 nM, Table B11†) used to avoid a high background, a slow turnover rate and low, easily detected spike frequency are expected (Figure B27† and S28†). Compared to a control without substrate, the fluorescence spikes were noticeably more frequent (Figure 3.4c). After applying an empirical intensity threshold to distinguish from noise (Figure B27†), we found that only a subset (~20%) of 3T3bp nanoreactors were active (Figure 3.4d). This observation is likely due to heterogeneity in enzyme orientation on the tweezer arm due to non-specific lysine-DNA conjugation. We note that controlling the enzyme orientation on the nanodevice is not trivial and will have to await future developments. In addition, due to the low concentration of resazurin used (50 nM, Table B11†) and our empirically informed intensity threshold to determine the fraction of active molecules, it is likely that we have excluded some minimally active nanoreactors. However, our relative comparison of the fraction of active among different tweezer designs is independent of such technical limitations. Accordingly, increasing numbers of nanoreactors became active with improved closure: 28% for 1T4bp and 32% for 1T4bp-5HJs-iso-I (Figure 3.4d). The average spike frequency per molecule was similar (0.021 s-1) for the 3T3bp and 1T4bp tweezers and increased to 0.024 s-1 for 1T4bp- 5HJs-iso-I (Figure B29†). We further discovered bursting among these spikes that we dissected by spike train analysis (358) (Figure B29†), plotting the cumulative population distribution of inter-spike intervals (ISIs) for each tweezer design normalized to the total fraction of active nanoreactors (Figure 3.4e). These data were best fitted with double-exponentials, revealing (at least) two populations: one with a relatively short average ISI (τ1), the other with a relatively longer average ISI (τ2). In accord with previous single molecule studies of enzymes (357,359,360), we interpret these two populations as distinct enzyme conformational sub-states. The population with short ISIs corresponds to enzymes in an optimal catalytic conformation

62

(Figure 3.4f), whereas the population of longer ISIs corresponds to enzymes in non-optimal, † catalytically less active conformations (Figure B30 ). Strikingly, the mean short ISI (τ1), or wait time between catalytic events of highly active enzymes, did not change from the 3T3bp to the 1T4bp tweezer, but decreased (from 2.7 ± 0.2 s for 1T4bp to 2.0 ± 0.2 s for 1T4bp-5HJs-iso-I) with the redesigned HJs of 1T4bp (Figure 3.4f). Our single molecule observations thus suggest that we have discovered two independent mechanisms for improving nanoreactor performance: (i) a greater number of nanoreactors reach the critical IAD distance needed for activating their coupled enzyme with its cofactor due to improved closure upon optimized design of the actuator hairpin; and (ii) more frequent substrate turnover is achieved when stabilizing interarm tail-tail interactions are implemented that lead to straighter tweezer arms resulting from improved HJ sequence design.

63

Figure 3.4. Characterization of enzyme nanoreactor activity in bulk and at the single molecule level using a G6pDH/NAD+ enzyme/cofactor pair. (a) Detection of enzyme activity is enabled by a PMS/resazurin coupled assay in which the NAD+ is reduced to NADH by G6pDH, followed by PMS-catalyzed electron transfer from NADH to resazurin to produce the fluorescent resorufin. (b) Relative bulk activity of the G6pDH/NAD+-coupled original (3T3bp), linker-optimized (1T4bp), and junction-redesigned (1T4bp-5HJs- iso-I) tweezers. (c) Time traces for single nanoreactors with different tweezer designs. Each resorufin conversion leads to a single fluorescent spike. Five random 5-min-long traces from each tweezers were concatenated. The alternating gray and light-gray pattern highlights individual molecules in the concatenated traces. The horizontal dotted-lines represent an intensity threshold (chosen as mean + 8×standard deviation, SD). (d) Fraction of all enzyme nanoreactors of a particular tweezer design

64 producing fluorescent spikes above the threshold (≥4 spikes above the dotted line in panel c). For the “Neg. Control” lacking G6p, an average from all three tweezer designs is reported. (e) Cumulative frequency of the inter-spike interval (ISI). These data are best fitted with double-exponential increase functions, yielding two time constants (τ1 and τ2). (f) Comparison of the short ISIs (τ1) extracted from the plot in panel e, reflecting the most active enzyme sub-states. Please note that the negative control shows a ~2.5-fold longer ISI from spontaneously generated resorufin. The 1T4bp tweezers with all HJs redesigned (1T4bp-5HJs-iso-I) demonstrated the shortest ISI (*P < 0.05 and ** P < 0.05).

3.3 EXPERIMENTAL

3.3.1 Materials

3.3.1.1 Enzyme

Glucose-6-phosphate dehydrogenase (G6pDH, Leuconostoc mesenteroides) was purchased from Sigma (St. Louis, MO, USA).

3.3.1.2 DNA oligonucleotides

All modified and unmodified DNA oligonucleotides were ordered from Integrated DNA (IDT, Coralville, IA, USA). Cy3- and Cy5-labeled oligonucleotides were HPLC- purified by the manufacturer.

3.3.1.3 Crosslinking reagents

N-Succinimidyl 3-(2-pyridyldithio)-propionate (SPDP) was ordered from Pierce. Disuccinimidyl suberate (DSS), N,N-Diisopropylethylamine (DIPEA), and Dimethyl sulfoxide (DMSO) were purchased from Sigma. NAD+ β-Nicotinamide-N6-(2-aminoethyl) dinucleotide (6AE- NAD+ or AE-NAD+) was ordered from BIOLOG (Bremen, Germany). Unmodified NAD+ was ordered from Sigma.

3.3.1.4 Substrates and activity assay reagents

Glucose-6-phosphate (G6p), resazurin (RESA), and phenazine methosulfate (PMS) were purchased from Sigma.

65

3.3.2 Design, assembly, and bulk characterization of DNA tweezers

3.3.2.1 DNA tweezer design

The detailed sequences of the DNA tweezers are shown in Tables B1-S5†. The computer program of Tiamat (http://yanlab.asu.edu/Resources.html) was used to facilitate the structure design. The design is based on one previously described (14) except that TB1 is labeled with biotin at 3' end.

3.3.2.2 Denaturing PAGE purification of oligonucleotides

Oligonucleotides purchased from IDT were purified using the same method as described (361).

3.3.2.3 DNA tweezer assembly

The DNA strands constituting each DNA structure were mixed in 1×TAE-Mg2+ buffer (40 mM Tris, 20 mM acetic acid, 2 mM EDTA and 12.5 mM magnesium acetate, pH 8.0) to reach a final concentration of 0.5 µM per strand, except for the NAD+-conjugated DNA strands, which were added to the mixture at a final concentration of 0.75 µM, and the fuel strands, which were added at a final concentration of 1 µM. All samples were annealed in an Eppendorf Mastercycler using an annealing protocol as described (14). The formation of the DNA structures was characterized by native polyacrylamide (PAGE). For single molecule measurements, tweezer samples were further purified by running a 5% native PAGE for 2 h at a constant voltage of 200 V; the major bands representing assembled DNA tweezers were visualized by UV-shadowing and cut from the gel, chopped into small pieces, and incubated for 1 h in 1×TAE- Mg2+ buffer. The DNA nanostructures were extracted from the gel pieces by centrifugation at 8,000 rpm for 10 min using a Costar Spin X filtration device (Corning, cellulose acetate membrane with 0.22 µm size). DNA nanostructures were collected in 1×TAE-Mg2+ buffer and concentrations were measured by UV absorbance at 260 nm using the extinction coefficient (Table B6†) estimated by the IDT analyzer (http://biophysics.idtdna.com). Samples for bulk measurements were assembled with regular T3 and TP8 strands without fluorescent dyes. Samples for single molecule measurements were assembled with Cy3/ Cy5- labeled strands.

66

3.3.2.4 Isolated Holliday junction (HJ) assembly

The isolated Holliday junction “hinge” (HJ1) was assembled similarly to the DNA tweezers. The detailed sequence is provided in Table B5†. The T5 strand was modified with biotin to enable surface capture for single molecule experiments. In Labeling Scheme-I, the T3 and T6 strands were modified with Cy3 and Cy5, respectively. In Labeling Scheme-II, the TP2 and T6 strands were modified with Cy3 and Cy5, respectively.

3.3.2.5 Preparation, purification, and characterization of protein-DNA conjugates

Preparation, SPDP crosslinking conjugation, FPLC purification and characterization steps of protein-DNA conjugates were the same as described (361). The protocol was optimized to purify protein with two conjugated DNA oligonucleotides.

3.3.2.6 Preparation, purification and characterization of NAD+-DNA conjugates

The method for the conjugation and HPLC purification was similar to that reported in previous publications (14,361).

3.3.2.7 Protein-DNA tweezer assembly and purification

As described in a previous publication (14), a 3-fold molar excess of -conjugated G6pDH was added to the pre-annealed tweezer structures and mixed well. Proteins were assembled by using a 1-h annealing protocol: the temperature was decreased from 37 ℃ to 10 ℃ and held at 4 ℃ using the established protocol (14). Excess G6pDH-oligonucleotide was removed using monomeric avidin resin (Pierce) and biotin-labeled tweezers; the protein was eluted with 2 mM biotin at a recovery yield of ~30%.

3.3.2.8 Native gel characterization of the purified assembly

3% non-denaturing PAGE was performed at room temperature over 2-2.5 h at a constant voltage of 200 V, and the gel subsequently stained with ethidium bromide. The purified protein- DNA tweezers were quantified by absorbance at 260 nm and concentrations were calculated with estimated extinction coefficients (Table B6†).

67

3.3.3 Single molecule FRET (smFRET) characterization of DNA tweezers

3.3.3.1 Preparation of tweezers for smFRET

DNA tweezers containing Cy3 and Cy5 fluorophores on separate arms were prepared to monitor the opening and closing by smFRET. The tweezer was biotinylated on the TB1 strand to allow immobilization on streptavidin-coated microscope slides for single molecule study. Please see above for the tweezer assembly protocol.

3.3.3.2 Preparation of microscope slides and execution of smFRET experiments

PEGylated quartz slides were prepared using previously published procedures (362). Briefly, the slides were reacted with aminopropyltriethoxysilane (APTES) in acetone for 30 min to generate an amino-functionalized surface followed by a 3-h reaction with a 10:1 mixture of succinimidyl ester–functionalized O-methyl-PEG and biotin-PEG. The remaining unreacted amines on the quartz slides were removed by adding sulfo-disuccinimidyl tartrate for 30 min. A single flow-channel per slide was assembled using double-sided sticky tape and a glass coverslip. We added a solution of 0.2 mg ml-1 streptavidin in 1×T50 buffer (50 mM Tris-HCl, pH 7.5, 50 mM NaCl) to the channel and incubated for ~3 min at room temperature to capture DNA tweezers via the biotin/streptavidin interaction.

3.3.3.3 smFRET analysis

Unless otherwise noted, the doubly labeled DNA tweezers (~5 nM) were incubated with ~0.5-1 µM antifuel or fuel strand in 5 µL 1×TAE-Mg2+ buffer for ~5 min at room temperature to close/reopen the tweezer. The sample was further diluted to a concentration of 20-100 pM in imaging buffer (1×TAE-Mg2+), flowed into the channel and incubated for 2-3 min. Excess tweezer and fuel/antifuel strands were removed by flowing 300-400 µL imaging buffer through the channel. For the polyA experiment, the FRET movie of the closed tweezer was recorded with 1 µM polyA oligonucleotide added to the imaging buffer. The imaging buffer containing oxygen scavenger system (OSS) composed of 50 nM protocatechuate dioxygenase (PCD, Sigma), 5 mM protocatechuate (PCA, Sigma) and 2 mM Trolox (Acros) to retard photobleaching.(361) In the total internal reflection microscope (TIRFM), Cy3 was directly excited using a 532 nm laser (CrystaLaser CL532-050-L, 50 mW), and emission from Cy3 and

68

Cy5 fluorophores was simultaneously recorded using an intensified CCD camera (iPentamax, Princeton Instruments) at 100 ms time resolution. The presence of an active FRET acceptor was confirmed at the end of each experiment by the excitation with a 640-nm red laser (Coherent CUBE 635-25C, 25 mW) as described (361). The smFRET analysis of isolated Holliday junction was performed similarly except that a smaller field of view (80 × 256 pixels) was imaged to achieve 16 ms time resolution. Figure B6† shows representative fields of view from the smFRET experiments of different tweezers. smFRET time traces were extracted from the raw movie files using custom- written IDL (Research Systems) and analyzed using Matlab (The Math Works) scripts, as previously described (361). The smFRET trajectories were screened for subsequent analysis based on the following expected features (361): single-step photobleaching; total fluorescence of Cy3 and Cy5 exceeding 300 counts/frame; and evidence of both Cy3 and Cy5 signal. A

FRET value was calculated as IA/(IA+ID), where IA and ID represent the background corrected fluorescence intensities of the acceptor (Cy5) and donor (Cy3) fluorophores, respectively (362). The raw fluorescence intensity traces were processed using a non-linear filtering program adapted from published approaches (363,364). Such nfilter analysis is useful to resolve underlying populations in broadly distributed FRET data that may otherwise be masked by noise. A set of casual and anti-casual predictors [2 4 6 8] running forward and backward in time, respectively, were used to smooth the donor and acceptor traces. A window size (M) of 20 was used for the analysis (363). Smoothed FRET traces were then calculated from smoothed donor and acceptor intensities as described above. FRET histograms were constructed from the first 1,000 frames of the nfiltered data of each molecule. The mean FRET values were obtained by Gaussian fitting of the FRET histogram. Hidden Markov Modelling (HMM) of smFRET traces was performed for the first 100 s observation time as described (362). We used a two-state model for all tweezers except for 1T5bp (due to its inter-conversion to three FRET states, a three-state model was used instead) to idealize the FRET data. Transition Occupancy Density Plots (TODPs) were then obtained from the idealized data using custom written Matlab scripts (362). The HMM analysis was also used to count static/dynamic molecules.

For estimation of IAD from smFRET data (IADFRET), please see ‘Inter-arm distance (IAD) estimations’ below.

69

3.3.4 Atomic force microscopy (AFM) imaging of DNA tweezers

3.3.4.1 AFM imaging protocol

2 µL samples were deposited onto a freshly cleaved mica surface (Ted Pella, Inc.) and left to adsorb for 2 min. 50 µL of 1× TAE-Mg2+ buffer was added to the sample, followed by 0.5 µL 100 mM Ni2+ to enhance DNA adsorption onto the mica. The samples were scanned using a SCANASYST-FLUID+ probe (Bruker, Inc.) in “Scanasyst in fluid” mode on a Multimode 5 AFM with Nanoscope V controller (Bruker Corporation). All AFM imaging was performed in solution at room temperature. For estimation of IAD from AFM images (IADAFM), please see ‘Inter-arm distance (IAD) estimations’ below.

3.3.5 Molecular dynamics (MD) simulations of DNA tweezers

3.3.5.1 Preparation of all-atom tweezer models

The sequence topology of the tweezer base-plate, comprising all bps not in the hairpin or dye linker motifs, was defined using the software caDNAno (42). The three-dimensional mechanical ground-state conformation was subsequently calculated and atomic structure generated using the finite element based modelling framework CanDo (21,43), assuming anti-parallel Holliday junctions (HJ). DNA duplexes were assumed to have geometrical and mechanical properties consistent with B-form DNA. Axial rise per bp was equal to 0.34 nm, duplex diameter 1.85 nm in order to ensure crossovers were consistent with known HJ atomic structure13, and 10.5 bp/turn was assumed. The axial stretching, bending, and torsional stiffnesses were assumed to be equal to 1100 pN, 230 pNnm2 and 460 pNnm2, respectively (43). The all-atom PDB file generated using the atomic structure generation option in CanDo (22) was subsequently ionized and solvated in preparation for molecular dynamics. Thymine linkers, when present, were added by extending the pertinent strands using DS Visualizer’s Macromolecules Toolkit (365). Details of the modifications to attach the different hairpin motifs are described in Appendix B.4. The all- atom system was immersed in TIP3P (322) water and explicit Mg2+ and Cl- ions were added to neutralize DNA charges and set the simulation cell Mg2+ ion concentration to 12.5 mM, consistent with experimental conditions.

70

3.3.5.2 Explicit solvent MD

All simulations were performed using the program NAMD2 (324) with CHARMM27 force field (168,201) and Allnér Mg2+ parameters (325) with an integration time step of 2 fs and periodic boundary conditions applied in an orthogonal simulation cell. Van der Waals energies were calculated using a 1.2 nm cut-off with a switching function applied from 1.0 to 1.2 nm. The Particle Mesh Ewald (PME) method (326) was used to calculate full electrostatics with a minimum grid point density of one per 0.1 nm. Full electrostatic forces were computed every two time steps (equal to 4 fs) and non-bonded forces were calculated at each time step (2 fs). Simulations were performed in the NpT ensemble using the Nosé-Hoover Langevin piston method (327,366) for pressure control with an oscillation period of 200 fs and a damping time of 100 fs. Langevin forces (198) were applied to all heavy atoms for temperature control (300 K) with coupling coefficients of five per picosecond. All hydrogens were constrained to their equilibrium lengths during the simulation and system configurations were recorded every 1 ps for downstream analysis of coordinates. Energy minimization was always performed on the orthogonal simulation cell prior to dynamics using the conjugate-gradient and line minimizer implemented in NAMD2, first on the solvent and ions alone for 10,000 steps with all nucleic acid atoms spatially constrained, followed by an additional 10,000 steps with all atoms unconstrained. All parameters for the minimization were identical to those used for dynamics. The system was then slowly heated (1 K per 10 ps) to 300 K and the pressure was allowed to equilibrate to 1 atm prior to production run MD.

3.3.5.3 Atomic distance calculations

The distance metric used to monitor temporal tweezer arm fluctuations (IADMD, see ‘Inter-arm distance (IAD) estimations’ below for further details) is defined as the distance between the O5' atoms of the 5' end bases of the duplex segments of the T3 and T8 strands. Distances were calculated using the ProDy (337) suite of trajectory analysis functions. A Gaussian Mixture Model (GMM) was used to fit Gaussian distributions to each subpopulation in the data and the Bayesian Information Criterion (BIC) (367) was used to select the best-fitting number of Gaussians.

71

3.3.5.4 Calculation of HJ Jtwist values from MD simulations

Inter-duplex crossing angles, Jtwist (321) corresponding to the five in-plane scissor motions of each HJ were calculated at each trajectory frame for each tweezer. For each junction, the duplex arms were defined using the five bps flanking the strand crossover site on either side of the junction, for a total duplex arm length of 10 bps. Local base-pair reference frames were fit to every bp of each arm using the program Curves+ (73) to generate a helical axis for each arm. Two linear axis vectors were then generated by fitting a three-dimensional vector through the helical axis nodes. The dihedral angle between the resulting two linear vectors was used to define Jtwist, where a positive value denotes a right-handed conformation and a negative value denotes a left-handed conformation. A GMM and the BIC were again used to fit multiple Gaussians to the subpopulations in the angle distributions from simulation.

3.3.6 Inter-arm distance (IAD) estimations smFRET, AFM and MD each lead to slightly distinct estimates of the IAD due to the inherent differences in the nature of the underlying data (see our description below). Our aim was to make the IADs as comparable with one another as possible, by calculating them as follows. Nevertheless, a small variation (~0.5-1 nm) of the estimated IAD is expected between three different datasets.

3.3.6.1 IAD from smFRET (IADFRET)

The IADFRET (equivalent to the inter-dye distance, R) and the apparent FRET efficiency, Eapp, were calculated as described (368,369) from the equations:

퐼퐶푦5 퐸푎푝푝 = (∅퐶푦5 × 휂퐶푦5) (3.1) 퐼퐶푦5 + 퐼퐶푦3 × (∅퐶푦3 × 휂퐶푦3)

6 −1 퐸푎푝푝 = 푐[1 + (푅/푅0) ] (3.2)

where c = 0.69, R0 = 54 Å, and φ and η signify the fluorophores quantum yields and detector channel efficiencies (368,369) respectively. The donor and acceptor intensities ICy3 and ICy5,

72 respectively, were corrected for leakage of 20% of donor photons into the acceptor channel. Since we compared FRET values among different tweezer designs without altering local environment of the dyes, use of a literature value of R0 is justified.

3.3.6.2 IAD from AFM images

AFM images were analyzed with Veeco NanoScope Analysis software 1.20 (Build R1Sr3.64571). Original images were first flattened to fit each line individually to center data (0th order), remove tilt (1st order) and remove bow (2nd, 3rd order). The Measure tool built in Image

Window was utilized to manually analyze the IADAFM between the inner ends of two arms. Smaller view areas were zoomed in with Data Zoom tool and moved to all scanning areas with the Pan tool. The conformational sub-states of the closed tweezers observed by smFRET (Figure † 3.2b) were not very obvious in the IADAFM histograms (Figure 3.2d and S20 ). This is probably due to surface perturbation of the tweezer conformation due to the tight mica interaction.

However, for a given tweezer, the overall distribution of IADAFM and IADFRET were comparable when the distances were binned equally to 0.5 nm (Figure 3.2d).

3.3.6.3 IAD from MD simulation (IADMD)

The IAD in MD simulation was defined as the distance between the O5' atoms of the 5' end bases on the T3 and T8 strands. The IADMD for each sub-population in a given construct was again determined using GMM and BIC fitting.

In general, we observed shorter a IADMD of tweezers compared to IADFRET and IADAFM, which may be due to the short timescales simulated (~100-200 ns) compared with the smFRET acquisition time (~100 ms), or the presence of poly-thymine tails to tether dyes to the tweezer arms. Notwithstanding, the relative pattern of the IADs among the different tweezers is in agreement with trends observed experimentally.

3.3.7 Bulk measurement of tweezer-scaffolded G6pDH activity

Bulk enzyme activity was measured for both opened- and closed-state tweezers. 100 nM DNA tweezers assembled with G6pDH/NAD+ were assayed with 100 µL of 1mM glucose-6- phosphate, 1 mM phenazine methosulfate and 500 µM resazurin in 1 × Tris-buffered saline

73

(TBS) buffer (50 mM Tris-HCl, pH 7.5, 150 mM NaCl) with no magnesium ions added. The activity was measured by monitoring the fluorescence increase at 590 nm as described(14) and reported in Table B11†. The native gel characterization is shown in Figure B4†.

3.3.8 Single molecule measurement of tweezer-scaffolded G6pDH activity

3.3.8.1 Using microscope slides with fluorescent beads as fiduciary markers

For the single molecule enzyme assay, streptavidin modified PEG slides (prepared as described above) were incubated for ~2 min with neutravidin coated fluorescent beads (FluoSpheres, specifications: 0.04 µm, excitation/emission; 550/605, 1% solids, Invitrogen) at a 1:1 million dilution in 1×T50 buffer, after which excess was flushed away with 400 μL of the same buffer. These fluorescent beads (~5-8 per field of view) were used as fiduciary markers (Figure B27†) to correct for stage/slide drift over time. DNA tweezers carrying the enzyme/cofactor pair at the ends of the arms was injected for surface binding and a FRET movie recorded as described above. During analysis of each movie, the photobleaching donor (Cy3) channel of the initial movie was mapped based on a custom-written MATLAB script to the subsequent enzyme assay movie using the fiduciary beads as reference. This approach allowed us to keep track the locations (x- and y-coordinates) of all individual tweezers in a field of view even after photobleaching the FRET fluorophores, to follow the enzyme turnover over time at the predefined tweezer locations.

3.3.8.2 Imaging single enzyme activity

To measure the activity of tweezer-scaffolded G6pDH, the enzyme assay was performed on the same field of view after photobleaching the fluorophores. A substrate solution was prepared in 1×TBS buffer (pH 7.5) containing 1 mM Mg2+ and 10% (w/v) PEG 8000 (Table B11†). The movie was recorded (35 ms camera acquisition time) immediately after injecting a substrate solution that lacked one of the substrates, glucose-6-phosphate (G6p). To compare the spiking behavior on the same field of view, a complete substrate solution was injected and a movie was subsequently recorded for ~5 min (9,091 frames) using TIRFM. In this single molecule assay, the activity of individual enzymes is observed as fluorescent spikes in the fluorescence-time traces due to formation of fluorescent resorufin through a catalytic cascade reaction (Figure B27†). Fluorescence fluctuations over time were analyzed using a custom-written MATLAB

74 script. The script allowed us to measure the background intensity of single molecule traces and set an intensity threshold to subtract from the raw traces. We used a high threshold (mean + 8×standard deviation, SD) to minimize the chance of detecting a freely diffusing resorufin molecule. Since we typically observed 1-3 spikes above this cut-off intensity in the control experiments ((-)G6p), probably due to non-specific (perhaps photoinduced) formation of resorufin, only molecules with ≥4 spikes were counted as active enzymes and considered for subsequent spike train analysis. Note that a low frequency of fluorescent spikes is expected due to our low concentration of substrate (50 nM resazurin, Table B11†), used to maintain a low background.

3.3.8.3 Estimation of diffusion constant and elapsed time of resorufin in 10% (w/v) PEG 8000

The diffusion coefficient of resorufin in the imaging buffer containing 10% (w/v) PEG 8000 is estimated using the Stokes-Einstein Equation:

푘퐵푇 퐷 = (3.3) 6휋휂푟

-23 2 2 Where the Boltzmann constant kB = 1.3806488 × 10 m kg/s K; absolute temperature T = 298 K; of 10% (w/v) PEG 8000 휂 = 0.0089 kg/ms; and molecular radius r= 0.3×10-9 m . Using these parameters, the diffusion coefficient D is estimated to be 8.18 × 10-11 m2/s. The elapsed time of the resorufin diffusion (푡) is estimated using the following equation:

푥2 푡 = (3.4) 2퐷

Using the mean distance traveled x = 300 nm (the upper bound of TIRF detection distance from the surface); it takes approximately 0.6 ms for a resorufin molecule to diffuse out of the probe volume. However, this calculation is a lower estimate of the elapsed time because the molecular radius was used instead of the hydrodynamic radius, and the influence of other reagents on viscosity such as glucose-6-phosphate (G6p), resazurin and Mg2+ was not considered in the calculation for the sake of simplicity. This dwell time suffices to render the fluorescence signal detectable. By contrast, the diffusion coefficient of resorufin in water (휂 = 0.00089 kg/ms (370)) at 298 K is 8.74 × 10-10 m2/s and the corresponding elapsed time is 0.06 ms. This calculation

75 suggests that the use of 10% (w/v) PEG 8000 slows down the diffusion of resorufin by at least 10-fold, critical for its single molecule detection.

3.3.8.4 Relating spike intensity and catalytic turnover

To identify the relationship between individual fluorescence spikes and the catalytic turnover number, we monitored the spike intensity and fraction of molecules with fluorescence spikes over a wide range of resazurin concentrations (0.5―100 nM, Figure B28†). Fluorescence intensity time traces were recorded on TIRF microscope as described above except for the following changes. The Cy3 labeled DNA modified G6pDH molecules (G6pDH-(5'- TTTTTCCCTCCCTCC)-Cy3) were captured on a PEG-streptavidin coated microscope slide using complementary capture oligonucleotides. For this, the slide was first incubated with 10 nM biotin-modified complementary DNA oligonucleotide (5'-biotin-TTTTTGGAGGGAGGG) for 3 min, followed by 10 min incubation with 20-50 pM protein sample in 1×TAE-Mg buffer. After flushing out the excess protein, a movie was recorded under 532 nm illumination to locate Cy3 (~protein) molecules followed by the photobleaching of the fluorophores. A series of movies were recorded on the same field of view after injecting a substrate solution (Table B11†) supplemented with 3 mM NAD+ and a variable concentration of resazurin (Figure B28†). For each resazurin concentration, the average intensity of the spikes above the threshold was compared (Figure B27†). While the fraction of molecules with spikes increases with the increase in resazurin concentration, the average intensity does not change with the resazurin concentration, suggesting that each spike corresponds to a single substrate turnover (357).

3.3.8.5 Spike train analysis

Spike train analysis was carried out using a modified Rank Surprise (RS) method (358). Briefly, inter-spike intervals (ISIs) were determined by calculating the time in between individual fluorescent spikes (Figure B27†) for each molecule. The RS method was used to demarcate the start and end points of bursts after collecting ISIs for all molecules. A maximum expected ISI length of 5 s was used as a threshold to be included in a burst (Figure B29†). Any sporadically appearing spikes outside a burst event are considered to be non-bursts.

76

3.4 CONCLUSIONS

In summary, we have used quantitative single molecule characterization and simulation to examine the sequence-level origins of sub-optimal performance of DNA-actuated enzyme nanoreactors. We discovered several specific yet generalizable features of the original sequence design that resulted in sub-optimal tweezer performance. Introducing a structure- guided rational design strategy, we were able to systematically eliminate each of these defects by implementing modifications for both the actuator and structural HJ elements. Our novel single molecule enzyme assay confirmed two specific, independent pathways to increased activity of the tweezer-actuated enzyme nanoreactor in response to our design improvements that resulted in proper closure and elimination of heterogeneity among individual nanodevices. Because the design features identified here are general to many other DNA nanostructures, our work establishes guidelines for rationally improving the performance of nanodevices including reactors, chemical and biological sensors, as well as programmable . The effect of junction core sequence on a multi-junction nano-assembly has been shown in this work to be significant. While the neighbouring-base sequences around the junction cores were not the same as in Chapter 2, the effect of the core changes was never-the-less profound. J11 proved to dramatically improve the construct’s planarity, while simulation of the J12 construct showed a highly flexible and non-planar structure. The free-energy measurements in Chapter 2 revealed that the core bases are of particular significance and the results here confirm that. The 1 fact that Jtwist is not the major contributor of variance in the dynamics of J1 is most likely the reason that this core sequence had such a profound effect on the planarity of the larger construct.

77

78

CHAPTER 4

IMPROVING FINITE ELEMENT METHOD MODELS WITH MD SIMULATIONS

As discussed in Section 1.2.4, continuum models provide an efficient framework for simulating large DNA assemblies on timescales that are beyond the scope of all-atom models. These atomistic models are important; however, as they provide a means of parameterizing and validating these lower dimensional ones. The state-of-the art in terms of this class of models being applied to DNA is the CanDo suite of programs, developed in the Laboratory for Computational Biology and Biophysics at MIT, which uses an FEM representation at the base- pair level to calculate the solution structure and thermal fluctuations of nanostructures. The original implementation of this framework (21,43) used a “clamped” 4WJ approach, where the DOFs of the nodes at the crossover positions were fixed, and could only solve on-lattice and topologically-open constructs. The fluctuations reported by the software were also super- positions of normal modes and not temporal dynamics. Both of these limitations have recently been addressed by two publications that this author contributed to; providing MD data for the parameterization of 4WJ alignment elements in a paper by Pan et al. (22), and of a DNA tweezer construct to validate a BD model in a paper by Sedeh and Pan et al. (371). The abstracts for these two studies are provided here and the published manuscripts can be found in Appendices C and D.

79

4.1 LATTICE-FREE PREDICTION OF THREE-DIMENSIONAL STRUCTURE OF PROGRAMMED DNA ASSEMBLIES

DNA can be programmed to self-assemble into high molecular weight 3D assemblies with precise nanometer-scale structural features. While numerous sequence design strategies exist to realize these assemblies in solution, there is currently no computational framework to predict their 3D structures based on programmed underlying multi-way junction topologies constrained by DNA duplexes. Here, we introduce such an approach and apply it to assemblies designed using the canonical immobile four-way junction. The procedure is used to predict the 3D structure of high molecular weight planar and spherical ring-like origami objects, a tile-based sheet-like ribbon, and a 3D crystalline motif, in quantitative agreement with experiments. Our framework provides a new approach to predicting programmed nucleic acid 3D structure based on prescribed secondary structure motifs, with application to the design of such assemblies for use in biomolecular and materials science.

This manuscript was accepted for publication in Nature Communications as: Keyao Pan, Do-Nyun Kim, Fei Zhang, Matthew R. Adendorff, Hao Yan and Mark Bathe, “Lattice-free prediction of three-dimensional structure of programmed DNA assemblies”

Author contributions

K.P, D-N.K., and M.B. conceived of the finite element modelling and structure-prediction approach. M.R. A. performed molecular dynamics simulations and analyses. K.P. implemented the primary and secondary structure parsing algorithm. K.P. and D-N.K. implemented the finite element model and structure-prediction procedure. K.P. applied the model to compute the modelling results and processed the results to make the figures. F.Z. and H.Y. conceived of the experimental assay. F.Z. implemented the experimental assay, generated the experimental data, and processed the results to make the experimental figures. All authors discussed the results and their presentation. K.P., D-N.K., and M.B. wrote the manuscript. All authors commented on and edited the manuscript.

80

4.2 COMPUTING NONEQUILIBRIUM CONFORMATIONAL DYNAMICS OF STRUCTURED NUCLEIC ACID ASSEMBLIES

Synthetic nucleic acids can be programmed to form precise three-dimensional structures on the nanometer-scale. These thermodynamically stable complexes can serve as structural scaffolds to spatially organize functional molecules including multiple enzymes, chromophores, and force- sensing elements with internal dynamics that include substrate reaction-diffusion, excitonic energy transfer, and force-displacement response that often depend critically on both the local and global conformational dynamics of the nucleic acid assembly. However, high molecular weight assemblies exhibit long time-scale and large length-scale motions that cannot easily be sampled using all-atom computational procedures such as molecular dynamics. As an alternative, here we present a computational framework to compute the over-damped conformational dynamics of structured nucleic acid assemblies and apply it to a DNA-based tweezer, a nine-layer ring origami object, and a pointer-shaped DNA origami object, which consist of 204, 3,600, and over seven thousand base-pairs, respectively. The framework employs a mechanical finite element model for the DNA nanostructure combined with an implicit solvent model to either simulate the Brownian dynamics of the assembly or alternatively compute its Brownian modes. Computational results are compared with an all-atom molecular dynamics simulation of the DNA-based tweezer. Several hundred microseconds of Brownian dynamics are simulated for the nine-layer ring origami object to reveal its long time-scale conformational dynamics, and the first ten Brownian modes of the pointer-shaped structure are predicted. The computational procedure is available online at http://cando-dna-origami.org.

This manuscript was accepted for publication in the Journal of Chemical Theory and Computation as: *Reza Sharifi Sedeh, *Keyao Pan, Matthew R. Adendorff, Oskar Hallatschek, Klaus-Jürgen Bathe, and Mark Bathe, “Computing Nonequilibrium Conformational Dynamics of Structured Nucleic Acid Assemblies”

Author contributions

R.S.S., O.H., K.J.B., and M.B. conceived of the coarse-grained BD approach for DNA assemblies. R.S.S. developed the BD simulation framework and verified its performance against analytical and other numerical solutions. R.S.S. implemented the pre-simulation and BD

81 simulation scripts. R.S.S. and K.P. implemented the scripts for post-simulation data analysis. K.P. built the FE models and atomic models for the DNA assemblies, performed the final BD simulations, and processed the simulation results to generate the figures and movies. M.R.A. performed and analyzed the MD simulation for the DNA tweezer. All authors discussed the results and their presentation. R.S.S., K.P., M.R.A., and M.B. wrote the manuscript text. All authors commented on and edited the manuscript.

*These authors contributed equally.

82

CHAPTER 5

GENERAL CONCLUSION

5.1 Summary

This thesis has focused on the effect of sequence changes at the design motif level of DNA nanodevices, as well as on the downstream influence these motifs have on both the structure and function of a device. Throughout, all-atom MD simulations have served as the principal investigative tool for unpacking observed experimental phenomena and for informing experiment design. Chapter 1 introduced the field of DNA nanotechnology, described some of the key milestones reached, and discussed the challenge of changing nanotechnology to Angstrom- technology (Section 1.1). This was followed by a review of various methods for the theoretical treatment of DNA molecules, with a particular focus on all-atom MD simulations and the evolution of the current state-of-the-art. Techniques for effective analysis of MD trajectories, enhanced sampling of potential-energy landscapes, and the calculation of thermodynamic free energies were then discussed (Section 1.2). Chapter 2 investigated the subtle atomic-level interactions that influence motif structure, stacked-isomer bias, and junction handedness. Analyses of explicit solvent MD simulations of a reference J1 junction sequence, as well as duplex, nicked-duplex, single-crossover, and junction-core analogues, were used to demonstrate that asymmetric immobile 4WJs deviate from canonical B-form DNA, both in structure and, to a larger degree, flexibility. Chemical topology and local base-pair sequence were shown to contribute to these structural changes both in the 4WJ motif and in its topological variants. Free-energy analyses showed that stacking, geometric, and electrostatic factors all contribute to the energy difference between stacked junction isomers and that these contributions can come from both core and non-core bases. Furthermore, enhanced sampling MD revealed that junction core base stacking sequences exhibit unique and significant global junction dynamics that are independent of isomerization energy differences. Chapter 3 looked into how the aforementioned sequence effects impact larger structures, where motifs were combined into constrained arrays to form a DNA tweezer- actuated nanoreactor. Quantitative single molecule characterization and simulation found several specific, yet generalizable, features of sequence design that had manifested as sub-

83 optimal performance of the initial device. A simulation-informed systematic elimination of these defects was performed through modifications to both the actuator hairpin size and the 4WJ element sequences. A novel single-molecule enzyme assay was then used to confirm that modification of these two motifs enhanced the activity of the tweezer-actuated enzyme nanoreactor through proper arm closure and the elimination of heterogeneity among individual nanodevices. Finally, Chapter 4 presented an MD-parameterized junction-focused approach to the modelling of off-lattice and topologically closed DNA constructs (Section 4.1). This framework was then extended to allow for the calculation of BD trajectories and harmonic modes, facilitating simulations of large DNA origamis on previously infeasible time scales Section 4.2).

5.2 Guidelines for DNA nanostructure design and simulation

Based on the insights gained from the studies performed in this thesis, and in the light of the extensive structural DNA literature, some guidelines are presented for the improved design of DNA motifs, their simulation, and the structures they will be used to build. Firstly, DNA should not be treated with a two-state paired-unpaired model when used in a structural context as many potential interactions, be they specific or non-specific, should be considered. While the programming of Watson-Crick pairing is critical to the paradigm of nanodevice self-assembly, the significance of non-specific DNA interactions is often understated. The phosphate backbones of DNA helices are rich with hydrogen bonding sites that single-stranded DNA strands can interact with, a factor that played a significant role in Chapter 3, where the poly-thymine linkers that attached the fluorescent dyes were found to cross-link the tweezer arms in certain systems, thereby stabilizing a low IAD population. DNA sequence is structurally significant and this is especially true close to the site of strand crossover in 4WJs and SX systems, and at the backbone discontinuity in nicked- duplexes. Recent computational studies have shown the diversity of base-pair step preferred geometries and stiffnesses (76), indicating a structural heterogeneity that is also dependent on nearest-neighbour sequence. The natural deformabilities of the various B-form base-pair steps are exacerbated at or near nanotechnology-relevant structural motifs. As such, if rigidity of a nanostructure is desired, the flanking sequences of these motifs should be selected from the more rigid purine-pyrimidine tetranucleotide sequences; should flexibility be desired, the pyrimidine-purine tetranucleotide sequences might be preferable. At the core, certain steps appear to be made more deformable as a consequence of geometry, irrespective of sequence,

84 and so energetically it may be best to select relatively flexible bases for these locations, i.e. the pyrimidine-purine steps, with rigid bases at the other core locations; this hypothesis will require further testing, ideally with free energy perturbation (FEP) methodologies. These aforementioned sequence effects were explored in Chapter 2, within the context of isolated junction / variant systems, yet, their effects can be seen in the structural heterogeneity in the tweezer constructs analyzed in Chapter 3. Even after all 4WJs were set to the same sequence, there was still heterogeneity in their Jtwist angles; indicating that additional interactions are contributing to the structure and dynamics of the device. It should be noted that 4WJ core sequence heterogeneity alone was not enough to account for the improved functioning of the DNA nanoreactor as even when the tweezer junctions were all set to the core sequence of isomer 2 of J1 there was considerable structural aplanarity when compared to the all-isomer 1 systems. When looking to design nanostructures with specific global structures, e.g. induced twist or bend; one should carefully consider the sequences used in these regions as certain tetranucleotides are more amenable to deformation along different DOFs. The range of preferred values for each of the 6 major base-pair step parameters is very large and clever sequence choices may dramatically improve performance. Preferred junction isomer handedness and stability are effects that deserve further study. Researchers have long considered 4WJs as exclusively preferring a structure exhibiting a right-handed twist (18), however, recent findings in origami structures (82,372) and in this thesis suggest that immobile junctions may behave differently than their mobile counterparts. The work in Chapter 2 is, to the best of this author’s knowledge, the first calculation of 4WJ twist energies and suggests that knowledge of the flexibility of a specific core sequence may be needed to ensure subtle spatial control of a device where these junctions are interconnected, constraining junctions into one isomer or the other, e.g. the work in Chapter 3. The fact that J11 does not explore Jtwist as its highest variance mode is very significant, and this may be the reason why the redesigned tweezer construct with this core sequence showed such improved planarity and performance. The energy difference between junction isomers should also be considered in the design of these arrays as the results in Chapter 2 indicate that significant isomerization energies may exist between the two stacked forms for a given sequence. On a more personal note, a lesson I only learned several years into my PhD, and one that would have made it considerably more productive, is that one month of focused literature study can save a year of poorly-directed effort. When starting a project, immerse yourself in the literature; read a paper and then read all the papers that paper references. Suddenly, key

85 research questions that are worthy of your focused study become apparent and this leads to well-defined thesis goals. When thinking about using a new method, spend as much time on learning what it is poorly suited for as what it is well suited for and do every tutorial you can find before implementing it on your own work. Above all, know the origins of your simulation and force-field parameters; read the papers about them and know their assumptions and limitations. If you are planning on running MD simulations, learn Python...this cannot be stressed enough. Packages like ProDy, MDAnalysis, pyEMMA, and MMPBSA.py are extremely powerful and the language in and of itself makes analysis so much more convenient. Know what your downstream post-processing methods will be before you start your simulations. Certain programs require certain trajectory formats, force-field parameters, or rely on specific simulation conditions. You may find that certain analyses are off-limits to you simply because you used CHARMM over AMBER, or vice-versa. Klaus Schulten once said, “Always run the longest simulations that you possibly can”. This is a solid piece of advice and it is worth considering continuing a simulation after you have started processing what you think is enough. Just leave it running in the background; you may realize that after a few weeks of finally getting your analysis scripts working that you wish you had just a little bit more data or another replicate. A challenge though is to know when to stop, and this is why a good understanding of your convergence criteria is essential. Above all, ensure that every piece of focused work you do is rigorously vetted and publishable before you move on. Before you run a simulation, know how you will confirm that your results are in fact correct and meaningful. Learn good methods of convergence testing and never be afraid to throw out a simulation and do it better a second time. Failure to do so can result in a shaky tower of poorly validated research than can than fall down later, bringing a year of subsequent effort tumbling around you; forcing a restart and a lot of unhappiness.

5.3 Limitations and future work

This section summarizes the limitations of the work performed and suggests studies that would further the advances made here. In Chapter 2, an effective framework for quantifying junction stability and flexibility was developed, though a larger set of junction core sequences should be analyzed computationally and characterized experimentally. The junction isomerization ratios have been measured for all immobile junction core sequences by a collaborator, David Millar; and while our computational analyses agree qualitatively with his values for the J1 and J24 sequences, additional sequences

86 should be investigated using the presented framework and compared to this experimental data, particularly J2. A more thorough investigation into the proposed sequence-dependent electrostatic potential would be beneficial and an improved map of core and nearest-neighbour sequence effects on free energies could also be generated by augmenting the MM-PBSA calculations used in Chapter 2 with Free-energy Perturbation (FEP) runs, whereby bases are computationally mutated into different ones. The sequences flanking the crossover site of the standard junction model, i.e. Figure 2.1, should be made symmetrical for this analysis and the bases of interest then systematically shifted to obtain well-converged energy values. The added advantage of this method is that it allows for the stability of different junction core sequences to be compared with one another, where presently only the isomerization energies are known; experimentally, these inter-junction energies could be quantified through melting curve analyses The results of Chapter 3 revealed that junction sequence is significant, however, these effects should be better quantified through a more systematic junction modification scheme that accounts for nearest-neighbour effects. The tweezer presents a useful assaying technique to measure the effect of junction core and nearest-neighbour sequence modifications, with the inter-arm distance and the heterogeneity of its distributions as the readout. Alternatively, a competition assay could be devised whereby only one alternative structure will form in the presence of a difference in core sequence energies, or both will form in the case of equi- energetic systems. Finally, the updated models in Chapter 4 can be used to analyze the effect of junction twist and / or roll on the global structure and dynamics of the tweezer constructs, allowing for an investigation into the origins of the specific IAD sub-populations observed in Chapter 3. In a design capacity, junction sequences could be chosen to confer certain characteristics, e.g. a specfic 3D shape, to nanostructures; provided that a sufficient knowledge of junction core sequence flexibilities has been gathered.

5.4 Closing statement

DNA is a molecule with immense potential that can only be achieved if its subtle behaviour is understood. In order for DNA nanotechnology to truly achieve the Nature-like precision it seeks, an understanding of the motif-level physico-chemical effects needs to advance in tandem with that of large-scale self-assembly practices. This thesis provides a contribution to the former.

87

88

APPENDIX A

Supplement to - SEQUENCE EFFECTS IN IMMOBILE DNA FOUR-WAY JUNCTIONS: A COMPUTATIONAL STUDY

Figure A1: Deviations in step parameter mean values for the SXD junction topological variants of J1 (left) and their topologies (right).

Figure A2: Deviations in step parameter mean values for the nicked duplex topological variants of J1 (left) and their topologies (right).

Figure A3: Deviations in step parameter mean values for the J24 junction systems (left) and their topologies (right).

89

Figure A4: Mean values of the six base-step degrees of freedom for the inner nine base steps of helix 1 are shown for each of the different topologies of J1, as well as for the 4WJ topology of J24, in their isomer 1 forms. The various topologies are color coded as: red (duplex), dark blue (nick), orange (SXB), green (SXD), red-purple (J1 4WJ), and light blue (J24 4WJ).

Figure A5: Mean values of the six base-step degrees of freedom for the inner nine base steps of helix 2 are shown for each of the different topologies of J1, as well as for the 4WJ topology of J24, in their isomer 1 forms. The various topologies are color coded as: red (duplex), dark blue (nick), orange (SXB), green (SXD), red-purple (J1 4WJ), and light blue (J24 4WJ).

90

Figure A6: Mean values of the six base-step degrees of freedom for the inner nine base steps of helix 1 are shown for each of the different topologies of J1, as well as for the 4WJ topology of J24, in their isomer 2 forms. The various topologies are color coded as: red (duplex), dark blue (nick), orange (SXB), green (SXD), red-purple (J1 4WJ), and light blue (J24 4WJ).

Figure A7: Mean values of the six base-step degrees of freedom for the inner nine base steps of helix 2 are shown for each of the different topologies of J1, as well as for the 4WJ topology of J24, in their isomer 2 forms. The various topologies are color coded as: red (duplex), dark blue (nick), orange (SXB), green (SXD), red-purple (J1 4WJ), and light blue (J24 4WJ).

91

1 Figure A8: Comparison of the J1(1) duplex base-pair step parameter mean values with literature. The mean values from the present work’s simulations (red) are plotted with those from Dans et al, 2012 (gray). Shaded areas represent standard deviations. An absence of the reference line indicates complete overlap.

1 Figure A9: Comparison of the J1(2) duplex base-pair step parameter mean values with literature. The mean values from the present work’s simulations (red) are plotted with those from Dans et al, 2012 (gray). Shaded areas represent standard deviations. An absence of the reference line indicates complete overlap.

92

2 Figure A10: Comparison of the J1(1) duplex base-pair step parameter mean values with literature. The mean values from the present work’s simulations (red) are plotted with those from Dans et al, 2012 (gray). Shaded areas represent standard deviations. An absence of the reference line indicates complete overlap.

2 Figure A11: Comparison of the J1(2) duplex base-pair step parameter mean values with literature. The mean values from the present work’s simulations (red) are plotted with those from Dans et al, 2012 (gray). Shaded areas represent standard deviations. An absence of the reference line indicates complete overlap.

93

Figure A12:.Deviations in the helical step parameter stiffness constants from those of canonical B-form are shown for the J24 junction systems (right). Topology schematics for J241 and J242 are shown top left and bottom left respectively.

Figure A13: Stiffness constants along the six base-step degrees of freedom for the inner nine base steps of helix 1 are shown for each of the different topologies of J1, as well as for the 4WJ topology of J24, in their isomer 1 forms. The various topologies are color coded as: red (duplex), dark blue (nick), orange (SXB), green (SXD), red-purple (J1 4WJ), and light blue (J24 4WJ).

94

Figure A14: Stiffness constants along the six base-step degrees of freedom for the inner nine base steps of helix 2 are shown for each of the different topologies of J1, as well as for the 4WJ topology of J24, in their isomer 1 forms. The various topologies are color coded as: red (duplex), dark blue (nick), orange (SXB), green (SXD), red-purple (J1 4WJ), and light blue (J24 4WJ).

Figure A15: Stiffness constants along the six base-step degrees of freedom for the inner nine base steps of helix 1 are shown for each of the different topologies of J1, as well as for the 4WJ topology of J24, in their isomer 2 forms. The various topologies are color coded as: red (duplex), dark blue (nick), orange (SXB), green (SXD), red-purple (J1 4WJ), and light blue (J24 4WJ).

95

Figure A16: Stiffness constants along the six base-step degrees of freedom for the inner nine base steps of helix 2 are shown for each of the different topologies of J1, as well as for the 4WJ topology of J24, in their isomer 2 forms. The various topologies are color coded as: red (duplex), dark blue (nick), orange (SXB), green (SXD), red-purple (J1 4WJ), and light blue (J24 4WJ).

1 Figure A17: Comparison of the J1(1) duplex base-pair step diagonal stiffness constants with literature. The mean values from the present work’s simulations (red) are plotted with those from Pérez et al, 2008 (black). Shaded areas represent standard deviations. An absence of the reference line indicates complete overlap.

96

1 Figure A18: Comparison of the J1(2) duplex base-pair step diagonal stiffness constants with literature. The mean values from the present work’s simulations (red) are plotted with those from Pérez et al, 2008 (black). Shaded areas represent standard deviations. An absence of the reference line indicates complete overlap.

2 Figure A19: Comparison of the J1(1) duplex base-pair step diagonal stiffness constants with literature. The mean values from the present work’s simulations (red) are plotted with those from Pérez et al, 2008 (black). Shaded areas represent standard deviations. An absence of the reference line indicates complete overlap.

97

2 Figure A20: Comparison of the J1(2) duplex base-pair step diagonal stiffness constants with literature. The mean values from the present work’s simulations (red) are plotted with those from Pérez et al, 2008 (black). Shaded areas represent standard deviations. An absence of the reference line indicates complete overlap.

Figure A21: Base-pair step configurational volumes for the different topology systems. The various topologies are color coded as: red (duplex), dark blue (nick), orange (SXB), green (SXD), red-purple (J1 4WJ), and light blue (J24 4WJ).

98

Figure A22: Solute entropies for the J1 and J24 junction systems. The entropies for replica 1 (left), replica 2 (middle), and replica 3 (right) are shown for J11 (solid black), J12 (solid red), J241 (dashed black), and J242 (dashed red) are shown at various trajectory lengths. Each nanosecond corresponds to 1,000 structures.

Figure A23: The fractional variance (left) and cumulative fractional variance (right) for the 20 highest variance meta-ensemble modes, ranked by eigenvalue.

Figure A24: Blocking curves for the J1 system replicates. Blocking curves are shown for each of the three replicates of J11 (top row) and J12 (bottom row), performed on projections of the individual trajectories

99

푛 onto the second junction essential mode (Jroll). The block size is determined as 2 ps, where 푛 is the number of blocking operations and each.

Figure A25: Blocking curves for the J24 system replicates. Blocking curves are shown for each of the three replicates of J241 (top row) and J242 (bottom row), performed on projections of the individual 푛 trajectories onto the second junction essential mode (Jroll). The block size is determined as 2 ps, where 푛 is the number of blocking operations.

100

APPENDIX B

Supplement to – RATIONAL DESIGN OF DNA-ACTUATED ENZYME NANOREACTORS GUIDED BY SINGLE MOLECULE ANALYSIS

B.1 DNA sequences, assembly, and bulk characterization of DNA tweezers

Table B1. Sequences of original hairpin (3T3bp), hairpin removed (0T0bp and 1T0bp), and hairpin redesigned (1T3bp, 1T4bp and 1T5bp) DNA tweezers. Each tweezer design contains one of the T7 strands (shaded), which dictates the name of the tweezer (3T3bp tweezer has a 3-nt spacer (red) and a stem region of 3bp length (blue), for example). Strand TB1 is labeled with biotin. Tweezers for bulk measurements were assembled with regular T3 and TP8 strands without fluorescent dyes, whereas these strands were fluorophore labeled for single molecule experiments. The 3' end of strand T9 is conjugated to the NAD+ molecule via a 20T linker. Protruding complementary sequences used to capture oligonucleotide-modified protein on the tweezer are highlighted in cyan and green.

Name Sequence TB1 TTTTCGACCGAGCGTGAATTAGTGATCCGGAACTCGCGCAATGAACCTTTT/3BioTEG/ TTTTTCAGCTGGCCTATCTAAGACTGAACTCGCACCGCCGGCATAAGCTATGCGCTCTGCCGCTTTGGAGGG TP2 AGGG T3 TTTTTAGGAGATGGCACGTTAATGAATAGTCTCCACTTGCATCCGAGATCCGAACTGCTGCC T3-Cy3 Cy3-TTTTTAGGAGATGGCACGTTAATGAATAGTCTCCACTTGCATCCGAGATCCGAACTGCTGCC TB4 TTTTCGAGAGAAGGCTTGCCAGGTTACGTTCGTACATCGTCTGAGTTTTTT T5 TTTTGGCAGCAGTTCAGGCCAGCTGATTTT T6 TTTTGGTTCATTGCGGAGTTCAGTCTTAGATGGATCTCGGATGCAAGGCCTTCTCTCGTTTT GGTGCCGAGTTCCGGATCACTAATTCCATAGCTTATGCCGGCTTTGCGTAAGACCCACAATCGCTTTACTA T7 (3T3bp) TTCATTAACGTGTGTACGAACGTAACCTGGCAATGGAG GGTGCCGAGTTCCGGATCACTAATTCCATAGCTTATGCCGGCACTATTCATTAACGTGTGTACGAACGTAAC T7 (0T0bp) CTGGCAATGGAG GGTGCCGAGTTCCGGATCACTAATTCCATAGCTTATGCCGGCTTACTATTCATTAACGTGTGTACGAACGTA T7 (1T0bp) ACCTGGCAATGGAG GGTGCCGAGTTCCGGATCACTAATTCCATAGCTTATGCCGGCTGCGTAAGACCCACAATCGCTACTATTCA T7 (1T3bp) TTAACGTGTGTACGAACGTAACCTGGCAATGGAG GGTGCCGAGTTCCGGATCACTAATTCCATAGCTTATGCCGGCTGCGGTAAGACCCACAATCCGCTACTATT T7 (1T4bp) CATTAACGTGTGTACGAACGTAACCTGGCAATGGAG GGTGCCGAGTTCCGGATCACTAATTCCATAGCTTATGCCGGCTGCGGGTAAGACCGACAATCCCGCTACT T7 (1T5bp) ATTCATTAACGTGTGTACGAACGTAACCTGGCAATGGAG TP8 TTTTGCGGCAGAGCGACGCTCGGTCGTTTGGAGGGAGGG TP8-Cy5 Cy5-TTTTGCGGCAGAGCGACGCTCGGTCGTTTGGAGGGAGGG T9-20T AACTCAGACGACCATCTCCTAATTTTTTTTTTTTTTTTTTTT/3AmMO/

101

Table B2. Sequences of the 1T4bp tweezer with redesigned HJ1 using iso-I (1T4bp-HJ1-iso-I) of Seeman’s HJ J1.(16,27,355) Redesigned inner-core sequences are bolded and underlined.

Name Sequence TB1 TTTTCGACCGAGCGTGAATTAGTGATCCGGAACTCGCGCAATGAACCTTTT/3BioTEG/ TTTTTCAGCTGGCCGATCTAAGACTGAACTCGCACCGCCGGCATAAGCTATGCGCTCTGCCGCTTT TP2 GGAGGGAGGG T3-Cy3 Cy3-TTTTTAGGAGATGGCACGTTAATGAATAGTCTCCACTTGCATCCGAGATCCTAACTGCTGCC T4 TTTTCGAGAGAAGGCTTGCCAGGTTACGTTCGTACATCGTCTGAGTTTTTT T5 TTTTGGCAGCAGTTACGGCCAGCTGATTTT T6 TTTTGGTTCATTGCGGAGTTCAGTCTTAGATGGATCTCGGATGCAAGGCCTTCTCTCGTTTT GGTGCCGAGTTCCGGATCACTAATTCCATAGCTTATGCCGGCTGCGGTAAGACCCACAATCCGCTA T7 (1T4bp) CTATTCATTAACGTGTGTACGAACGTAACCTGGCAATGGAG TP8-Cy5 Cy5-TTTTGCGGCAGAGCGACGCTCGGTCGTTTGGAGGGAGGG Set CGTGTGGTTGAGCGGATTGTGGGTCTTACCGCA Fuel TGCGGTAAGACCCACAATCCGCTCAACCACACG T9-20T AACTCAGACGACCATCTCCTAATTTTTTTTTTTTTTTTTTTT/3AmMO/

Table B3. Sequences of the 1T4bp tweezer with all five Holliday junctions (HJ1 to HJ5) redesigned based on iso-I of Seeman’s HJ J1 (16,27,355) (1T4bp-5HJs-iso-I), matching the 5'-3' direction of the core sequence with HJ1. Redesigned inner-core sequences are bolded and underlined.

Name Sequence TB1 TTTTCGACCGAGCGGAAATTAGTGATCCGGAACTCGAGCAATGAACCTTTT/3BioTEG/ TTTTTCAGCTGGCCGATCTAAGACTGAACTCTCACCGCCGGCATAAGCTATCTGCTCTGCCGCTTT TP2 GGAGGGAGGG T3-Cy3 Cy3-TTTTTAGGAGATGGAACGTTAATGAATAGTCTCCGATTGCATCCGAGATCCTAACTGCTGCC T4 TTTTCGAGAGAAGGCTTGCCAGGTTACGTTCGTACCTCGTCTGAGTTTTTT T5 TTTTGGCAGCAGTTACGGCCAGCTGATTTT T6 TTTTGGTTCATTGCTGAGTTCAGTCTTAGATGGATCTCGGATGCAATGCCTTCTCTCGTTTT GGTGACGAGTTCCGGATCACTAATTTGATAGCTTATGCCGGCTGCGGTAAGACCCACAATCCGCTA T7 (1T4bp) CTATTCATTAACGTTGGTACGAACGTAACCTGGCAACGGAG TP8-Cy5 Cy5-TTTTGCGGCAGAGCACCGCTCGGTCGTTTGGAGGGAGGG Set CGTGTGGTTGAGCGGATTGTGGGTCTTACCGCA Fuel TGCGGTAAGACCCACAATCCGCTCAACCACACG C3-T9-20T AACTCAGACGACCATCTCCTAATTTTTTTTTTTTTTTTTTTT/3AmMO/ T3 TTTTTAGGAGATGGAACGTTAATGAATAGTCTCCGATTGCATCCGAGATCCTAACTGCTGCC TP8 TTTTGCGGCAGAGCACCGCTCGGTCGTTTGGAGGGAGGG

102

Table B4. Sequences of the 1T4bp tweezer with redesigned HJ1 using iso-II of Seeman’s HJ J1 (16,27,355) (1T4bp-HJ1-iso-II), a control design. Redesigned inner-core sequences are bolded and underlined.

Name Sequence TB1 TTTTCGACCGAGCGTGAATTAGTGATCCGGAACTCGCGCAATGAACCTTTT/3BioTEG/ TTTTTCAGCTGGCCTGTCTAAGACTGAACTCGCACCGCCGGCATAAGCTATGCGCTCTGCCGCTTT TP2 GGAGGGAGGG T3-Cy3 Cy3-TTTTTAGGAGATGGCACGTTAATGAATAGTCTCCACTTGCATCCGAGATCACAACTGCTGCC T4 TTTTCGAGAGAAGGCTTGCCAGGTTACGTTCGTACATCGTCTGAGTTTTTT T5 TTTTGGCAGCAGTTGAGGCCAGCTGATTTT T6 TTTTGGTTCATTGCGGAGTTCAGTCTTAGACTGATCTCGGATGCAAGGCCTTCTCTCGTTTT GGTGCCGAGTTCCGGATCACTAATTCCATAGCTTATGCCGGCTGCGGTAAGACCCACAATCCGCTA T7 (1T4bp) CTATTCATTAACGTGTGTACGAACGTAACCTGGCAATGGAG TP8-Cy5 Cy5-TTTTGCGGCAGAGCGACGCTCGGTCGTTTGGAGGGAGGG Set CGTGTGGTTGAGCGGATTGTGGGTCTTACCGCA Fuel TGCGGTAAGACCCACAATCCGCTCAACCACACG T9-20T AACTCAGACGACCATCTCCTAATTTTTTTTTTTTTTTTTTTT/3AmMO/

Table B5. Sequences of the isolated HJ1 (original 3T3bp tweezer) with a regular and a flipped labeling scheme for the FRET pair (Cy3/Cy5).

Name Sequence (Labeling Scheme-I) TP2 TTTTTCAGCTGGCCTATCTAAGACTG T3Cy3 /5Cy3/ATCCGAGATCCGAACTGCTGCC T5Biotin /5BiotinTEG/TTTTGGCAGCAGTTCAGGCCAGCTGATTTT T6Cy5 /5Cy5/CAGTCTTAGATGGATCTCGGAT Name Sequence (Labeling Scheme-II) TP2Cy3 /5Cy3/TCAGCTGGCCTATCTAAGACTG T3 ATCCGAGATCCGAACTGCTGCC T5Biotin /5BiotinTEG/TTTTGGCAGCAGTTCAGGCCAGCTGATTTT T6Cy5 /5Cy5/CAGTCTTAGATGGATCTCGGAT

103

Table B6. Estimated extinction coefficients of all tweezers used in our measurements. The theoretical extinction coefficients were obtained by summing the extinction coefficients of all DNA strands (both + -1 - double- and single-stranded ) and protein/NAD components involved. G6pDH: ε260 ~ 61594 M cm 1 + -1 -1 ; 6AE-NAD : ε260 ~ 21000 M cm (provided by Biolog, Hayward, CA, USA).

-1 -1 DNA Tweezer Theoretical ε260 (M cm ) 3T3bp (open) 4162384.9 1T3bp (open) 4105635.3 1T4bp (open) 4138847.2 1T5bp (open) 4174447.5 1T4bp-HJ1-iso-I (open) 4135293.4 1T4bp-5HJs-iso-I (open) 4139136.2 1T4bp-HJ1-iso-II (open) 4136108.6 3T3bp (close) 4003121.4 1T3bp (close) 3954125.0 1T4bp (close) 3971369.5 1T5bp (close) 3992879.8 1T4bp-HJ1-iso-I (close) 3967815.7 1T4bp-5HJs-iso-I (close) 3971658.5 1T4bp-HJ1-iso-II (close) 3968630.9 3T3bp (open)-G6pDH 4338484.7 1T3bp (open)-G6pDH 4281735.1 1T4bp (open)-G6pDH 4314947.0 1T5bp (open)-G6pDH 4350547.3 1T4bp-5HJs-iso-I (open)-G6pDH 4315236.0 3T3bp (close)-G6pDH 4179221.2 1T3bp (close)-G6pDH 4130224.8 1T4bp (close)-G6pDH 4147469.3 1T5bp (close)-G6pDH 4168979.6 1T4bp-5HJs-iso-I (close)-G6pDH 4147758.3

104

Figure B1. mFold prediction of the hairpin-actuator secondary structure of the different tweezers. (a) Hairpin sequences used as inter-arm linkers in different tweezers. The single-stranded flanking spacer, the hairpin stem, and loop sequences are highlighted in red, blue and black, respectively. (b) mFold (120) predicted folding of the DNA linker sequences (UNAFold Tool; Integrated DNA Technologies, Inc., Coralville, IA, USA).

105

Figure B2. HPLC purification (top) and MS characterization (bottom) of NAD+-modified oligonucleotides (T9-20T) that are designed to be compatible with the 3T3bp, 1T3bp, 1T4bp and 1T5bp tweezers. The calculated molecular weight of T9-20T is 12,927.5 Da (provided by Integrated DNA Technologies, Inc., Coralville, IA, USA). Since the sample concentration was at the scale of nanomolar, it was purified using a C18 analytical column. Due to the overlap of the two peaks in the top panel, we chose the purer portion of each peak with little overlap. Although the purity may not be 100%, the enzyme activities of opened- and closed-state tweezers were compared for the same samples, and therefore the relative activity of the tweezers should not be affected by the conjugation yield and purity of the NAD+-DNA conjugates. The peak at ~7000 (m/z) in the bottom panel is a doubly-charged complex.

106

Figure B3. HPLC purification (top) and MS characterization (bottom) of NAD+-modified oligonucleotides (T9-20T) that are designed to be anchored to the 1T4bp-5HJs-iso-I tweezer. The calculated molecular weight of T9-20T is 12,927.5 Da (provided by Integrated DNA Technologies, IDT). Please see legend of Figure B2 for purification details.

107

Figure B4. Native 3% PAGE characterization of the various assembled DNA tweezers used in bulk measurements (summarized in Table B9) with variant actuator elements as indicated. The native polyacrylamide gel was stained with ethidium bromide. Both closed- and opened-state tweezers with 3T/1T-spacers and 3bp/4bp/5bp-stems were analyzed (G: G6pDH enzyme).

108

Figure B5. Native 3% PAGE characterization of the various assembled DNA tweezers used in bulk measurements (shown in Figure B26) with variant actuator elements as indicated. The native polyacrylamide gel was stained with ethidium bromide. Both closed- and opened-state tweezers were tested (G: G6pDH enzyme).

109

B.2 Single molecule FRET characterization of DNA tweezers

Figure B6. Representative fields of view of different tweezer designs from single molecule TIRFM observation. The field of view was excited with a green laser (532 nm) and the emission intensity monitored for both the Cy3 and Cy5 channels (as indicated).

110

Figure B7. Determining the IADFRET of the closed-state 3T3bp tweezer from smFRET data. The FRET data for the right peak (~0.2) were first converted into an IADFRET distribution (as indicated by an arrow) and the mean IADFRET determined by Gaussian fitting (cyan) of the IADFRET histogram (right). The zero-

FRET population represents a tweezer conformation with an IADFRET >10 nm, which cannot be accurately calculated from the data.

111

Figure B8. Representative smFRET traces (gray) of the closed-state 3T3bp tweezer and the corresponding histograms of the nfiltered data (black overlay). A majority of molecules (72%) show static FRET states; although possibly not all underlying sub-states were resolved since the tweezer molecules only span the low FRET regime where the sensitivity to distance changes is low.

112

Figure B9. smFRET probability distributions of the closed-state 3T3bp tweezer at higher concentrations of antifuel strand used to close them. These FRET experiments were conducted in the presence of 1 or 5 µM antifuel strand in the imaging buffer. Compared to our standard conditions of 0.5-1 µM antifuel strand during incubation that was later diluted out and led to the data in Figure 3.2b (top panel), the zero-FRET population significantly decreased at a 1 µM antifuel strand concentration and completely disappeared at 5 µM antifuel strand, strongly suggesting that any zero-FRET state molecules represent unclosed tweezers.

113

Figure B10. smFRET probability distribution (left) and corresponding IADFRET distribution (right) of the 0T0bp tweezer.

114

Figure B11. Representative smFRET traces (gray) of the closed-state 1T3bp tweezer and the corresponding histograms of the nfiltered data (black). A vast majority of the molecules (84%) show dynamic behavior.

115

Figure B12. Representative smFRET traces (gray) of the closed-state 1T4bp tweezer and the corresponding histograms of the nfiltered data (black). A vast majority of the molecules (89%) show dynamic behavior.

116

Figure B13. Representative smFRET traces (gray) of the closed-state 1T5bp tweezer and the corresponding histograms of the nfiltered data (black). A vast majority of the molecules (75%) show dynamic behavior.

117

Figure B14. smFRET measurement of the isolated Holliday Junction (HJ) ‘hinge’ at 16 ms time resolution. The time resolution of 16 ms was achieved by scanning a very small slice (80 × 512 pixel) at the middle of the chip yielding only few (~3-10) molecules with active FRET pair per field of view. (a) The isolated HJ (blue box, left panel, see Table B5 for detailed sequences) and two complementary labeling schemes (right panel) used in this study. (b) Typical donor (green), acceptor (red), and FRET (black) traces for labeling scheme-I at 12.5 (top panel) and 100 mM Mg2+ (bottom panel). The corresponding FRET probability distributions are shown on the right. (c) Typical donor (green), acceptor (red), and FRET (black) traces for labeling scheme-II at 12.5 (top panel) and 100 mM Mg2+ (bottom panel). The corresponding FRET probability distributions are shown on the right.

118

Figure B15. Representative smFRET traces (gray) of the redesigned closed-state tweezer (1T4bp-5HJs- iso-I) and the corresponding histograms of the nfiltered data (black). Contrary to tweezers with the original HJs, a vast majority of the molecules (90%) in this tweezer show a static (stable) FRET state.

119

Figure B16. IADFRET probability distribution of 1T4bp-5HJs-iso-I tweezer. The number of molecules analyzed is denoted by n.

Figure B17. Comparing the smFRET probability distributions of the 1T4bp-5HJs-iso-I tweezer with and without externally added dA5 DNA oligonucleotide. The addition of the dA5 oligonucleotide shifted the probability distribution to lower FRET and the FRET ≈ 1.0 population (shaded) disappeared. The FRET probability distribution in the absence of the dA5 oligonucleotide was taken from Figure 3.3a for comparison.

120

B.3 ATOMIC FORCE MICROSCOPY IMAGING OF DNA TWEEZERS

Figure B18. AFM characterization of the 3T3bp tweezer. (a) Closed- and opened-state tweezers are shown in the left and right panel, respectively. The zoomed-in images of the tweezers (42 nm × 42 nm are shown) are placed in the same order as they appear (highlighted by boxes) from left to right in the field of

121 view. (b) IADAFM population distribution for the closed- (black) and opened-state (red) tweezers, where n depicts the number of tweezers analyzed.

Continued….

122

Figure B19. Expanded field of view of the 1T3bp, 1T4bp and 1T5bp tweezers shown in Figure 3.2c. Closed- and opened-state tweezers are shown in the left and right panel, respectively. The zoomed-in images of the tweezers (42 nm × 42 nm are shown) are placed in the same order as they (highlighted by boxes) appear from left to right in the field of view.

123

Figure B20. AFM characterization of the 1T4bp-5HJs-iso-I tweezer. IADAFM population distribution for the closed- (solid bars) and opened-state (dotted bars) tweezers, where n depicts the number of tweezers analyzed.

B.4 MOLECULAR DYNAMICS SIMULATION OF DNA TWEEZERS

Table B7. MD simulations reported in this work. Reported are the design of the DNA tweezera, the junction sequenceb, where 5HJs-iso-I refers to the case where all five HJs are redesigned to Seeman’s junction (J1) in their favored isomer(16,27,355), the inclusion of explicit poly-thymine dye linkersc, the concentration of Mg2+d, and the duration of each simulatione. The hairpin was eliminated entirely in the 0T0bp and 1T0bp designs.

Tweezera Junctionsb Tailsc [Mg2+]d Duratione 0T0bp Original No 12.5 mM 340 ns 1T0bp Original No 12.5 mM 100 ns 1T3bp┼ Original No 12.5 mM 100 ns 1T3bp┼ 5HJs-iso-I No 12.5 mM 100 ns 1T0bp 5HJs-iso-I Yes 12.5 mM 200 ns 1T3bp┼ Original Yes 12.5 mM 200 ns 1T3bp┼ 5HJs-iso-I Yes 12.5 mM 200 ns 1T3bp┼ 5HJs-iso-II Yes 12.5 mM 200 ns 3T3bp┼ Original Yes 12.5 mM 300 ns

124

B.4.1 MD simulation of the 3T3bp system

The hairpin motif was built using DS Visualizer’s Macromolecules Toolkit. A B-form GCG duplex was modified with three thymine bases per strand and then attached to the base-plate model, followed by energy minimization in vacuum using the same parameters detailed in the general MD protocol. Only the hairpin and spacer atoms were allowed to move during this initial minimization. There was no 13-nt hairpin loop present in this simulation.

B.4.2 MD simulations of the 0T0bp and 1T0bp systems with original junction sequences

The single crossover construct (0T0bp) was generated using CanDo(22). The double thymine crossover (1T0bp) construct was generated from the 0T0bp model by adding two thymine bases to the crossover site of the T7 strand using DS Visualizer’s Macromolecules Toolkit. Energy minimization was subsequently performed in vacuum using the same parameters as those detailed in the general MD protocol. All atoms except those corresponding to the hairpin and spacer atoms were constrained during this initial minimization. The 0T0bp construct exhibits significant structural out-of-plane distortion (Figure B21) with a single mean IADMD of 4.7 nm, whereas the 1T0bp system exhibited two IADMD sub-populations with mean values of 3.3 nm and 4.3 nm (Figure 3.2a and Figure B22).

B.4.3 MD simulations of the 1T3bp system with original and redesigned junction sequences

In order to elucidate the origin of the distinct states observed using smFRET, we investigated the conformational dynamics of the 1T3bp tweezer (Figure B24) as a model system in which we excluded the hairpin itself. Justification for this choice is provided by the fact that full equilibration of hairpin conformational dynamics could exceed by orders of magnitude the 100 ns simulations in explicit solvent performed here (373). Moreover, because the hairpin itself is likely to remain in solvent due both to solvation energy preference and also entropic considerations, it is unlikely to directly influence the tweezer conformation on the time-scales simulated. Results of MD simulation indicate that the hairpin stem remains hybridized on the approximately 200 ns timescale simulated (Table B8). Moreover, the duplex stem does not interdigitate between the tweezer’s arms on this timescale, consistent with the discussion

125 above. Importantly, the inclusion of explicit thymine dye linker tails had a noticeable effect on the IADMD due to their non-specific interactions involving hydrogen bonding and partial base pairing and stacking, resulting in two sub-populations centered at 2.7 nm and 3.6 nm respectively (Figure 3.3c). Simulation of the same construct excluding these tails resulted in

IADMD sub-populations with mean values of 3.6 nm and 4.3 nm (Figure 3.3c), supporting the hypothesis that interactions involving these linkers might stabilize conformational sub-states observed experimentally (Figure B24a and b). MD shows them to be stabilized by short-lived, non-specific hydrogen bonding interactions between the poly-thymine linkers tethering the dyes to the tweezer arms (Figure 24b, top panel). However, out-of-plane distortions of the tweezer arms result due to twisting that is incurred by the square lattice.(31) In contrast to this minimal inter-tail interaction (Figure B24b, top), the interaction was more prevalent for the same construct with redesigned junctions (Figure B23 and S24b, bottom panel). For the 1T3bp system with redesigned junction sequences, the effect of including explicit tails significantly reduced IADMD values when compared with a complementary simulation that does not include the tails (Figure 3.3c). Inspection of the redesigned structure reveals a more planar conformation (Figure B23, Table B9), with significant stabilization of the lower (1.1 nm) distance sub-population by inter-tail interactions (Figure B24b).

Figure B21. Three-dimensional configuration of the 0T0bp DNA tweezer. (a) Three orthogonal views of a representative conformation of the tweezer corresponding to mean IADMD of 4.7 nm. (b) The distribution of IADMD values from MD simulation and the fitted single Gaussian population, with μ0T0bp of 4.7 nm, is shown.

126

Figure B22. Three-dimensional configurations of the 1T0bp DNA tweezer. Three orthogonal views of a representative conformation of the tweezer corresponding to mean IADMD values of (a) 3.3 nm and (b) 4.7 nm. (c) The distributions of IADMD values from MD simulation and the fitted Gaussian populations.

Figure B23. Three-dimensional configurations of the 1T3bp-5HJs-iso-I and 1T3bp-5HJs-iso-2 DNA tweezer. (a) Three orthogonal views of a representative conformation of the tweezer corresponding to mean IADMD values of the junctions in their iso-1 forms at separations of 1.1 nm and (top) and 1.8 nm

(bottom). (b) The distributions of IADMD values from the 1T3bp-5HJs-iso-I MD simulation and the fitted Gaussian populations. (c) Three orthogonal views of a representative conformation of the tweezer

127 corresponding to mean IADMD values of the junctions in their iso-1 forms at separations of 1.1 nm and

(top) and 1.8 nm (bottom). (d) The distributions of IADMD values from the 1T3bp-5HJs-iso-2 MD simulation and the fitted Gaussian populations.

Figure B24. MD simulation of the 1T3bp tweezer. (a) smFRET derived IADs (IADFRET) for the non-zero FRET populations in 1T3bp, 1T4bp and 1T5bp tweezers. The shaded areas highlight the common sub- states with nearly the same IADFRET. (b) Two typical orthogonal views from MD simulation of the 1T3bp construct (top) and junction redesigned 1T3bp-5HJs-iso-I (bottom) tweezers, corresponding to their mean low IADMD. The tweezer adopted a flatter conformation after redesigning the junctions (dotted box); allowing more prevalent inter-tail interaction (highlighted with dotted circles on the rendered views on the left). This observation was consistent with the conclusion that cooperativity exists between structural flattening and the inter-tail interaction.

MD Movies are available at (INSERT DROPBOX LINK):

. Movie SM1. Movie of the 300 ns MD trajectory for the 3T3bp construct with explicit tails at 300 K and 12.5 mM Mg2+ sampled at 7 ns per frame. . Movie SM2. Movie of the 330 ns MD trajectory for the 0T0bp construct without explicit tails at 300 K and 12.5 mM Mg2+ sampled at 7 ns per frame. . Movie SM3. Movie of the 100 ns MD trajectory for the 1T0bp construct without explicit tails at 300 K and 12.5 mM Mg2+ sampled at 5 ns per frame. . Movie SM4. Movie of the 200 ns MD trajectory for the 1T3bp construct with explicit tails at 300 K and 12.5 mM Mg2+ sampled at 7 ns per frame. . Movie SM5. Movie of the 200 ns MD trajectory for the 1T3bp-5HJs-iso-I construct with explicit tails at 300 K and 12.5 mM Mg2+ sampled at 7 ns per frame.

128

B.4.4 Monitoring hairpin stem integrity:

B.4.4.1 Methods To ensure that the hairpin stem motif remained intact over the course of the simulations, the number of hydrogen bonds present in the short duplex was monitored at each trajectory step for all 1T3bp systems. The calculations were performed using the MDAnalysis toolkit,(328) with a donor-acceptor distance of 3.0 Å and an angle cutoff of 120˚.

B.4.4.2 Supporting results

All stem motifs remained intact over the course of the production runs for the 1T3bp systems, with the tail-free constructs exhibiting a lower mean number of hydrogen bonds in the stem than their explicit tail counterparts (Table B8). The observation that the tweezer arms are closer in the explicit tail simulations implies that there is probably less tension on the hairpin stem, resulting in a higher number of stable hydrogen bonds.

Table B8: Mean number of hydrogen bonds in stem during production runs.

Tails Mean Hydrogen Bonds In 1T3bp Tweezer Hairpin Stem Original Sequence No 3 Original Sequence Yes 5 5HJs-iso-I No 2 5HJs-iso-I Yes 7

B.4.4.3 Calculation of base pair structural parameters

The inter-bp structural step parameters were calculated using 3DNA (71,72) via the x3DNA interface within the MDAnalysis toolkit for every simulated structure. Parameter values were calculated for each point of the minimization and equilibration trajectories to ensure that a correct initial structure was being used for production runs.

129

B.4.4.4 Planarity metric calculations (not included in original manuscript)

In order to compare the planarities of the different tweezer systems, a metric is introduced that describes the planarity of a tweezer construct in terms of displacements of the tweezer arms in a two-dimensional plane, where the long axis of the tweezer construct is the normal vector. The locations of the phosphate atoms for each junction arm (duplexes containing bases A1-32, B28- 59, and their Watson-Crick complements comprise arm 1; those containing strands C1-32, D1- 32, and their Watson-Crick complements comprise arm 2) are used to create point clouds for the arms, where the base numbering excludes poly-T bases when present. The largest variance principle component eigenvectors of these point clouds form the junction arm vectors, onto which the locations of the O5’ atoms of bases A1, B59, C1, and D32 are projected. These points are in turn projected onto the 푥푧 plane of a coordinate system where the 푦-axis runs along the construct’s duplexes and the 푥-axis passes through all duplexes. These projected points are calculated for every frame in the dynamics trajectory and their mean values are used to define two deformation values. X and Z deformations are calculated as the sums of the absolute deviations for the arm mean positions from their ground-state tweezer model (solved by CanDo) counterparts, along the 푥 and 푧 axes respectively. The values of these deformations are provided in Table B9.

Table B9: Planarity values for all systems.

System X Deformation [nm] Z Deformation [nm] 3T3 3.4 4.0 0T0 1.5 3.7 1T0 0.3 2.8 1T3-TF 5.3 2.9 1T3-Tails 2.0 3.0 1T3-Redesign-Iso1-TF 6.8 2.3 1T3-Redesign-Iso1-Tails 2.0 2.2 1T3-Redesign-Iso2-Tails 5.8 2.8

130

B.5 BULK MEASUREMENT OF TWEEZER-SCAFFOLDED G6PDH ACTIVITY

Figure B25. Raw activity data of opened- and closed-state tweezers with the G6pDH/NAD+ pair. Assay conditions: 100 nM G6pDH-NAD+ assembly, 1 mM G6p/PMS, 500 µM resazurin in 1 × TBS buffer (pH 7.5). A corresponding native gel characterization of the tweezers is shown in Figure B4. The relative fold enhancement is summarized in Table B10.

Table B10. Bulk measurements of enzyme activity of opened- and closed-state tweezers with the G6pDH/NAD+ pair. The relative fold enhancement in activity from opened- to closed-state was calculated by taking the ratio of the activity in closed tweezer (Activity―Closed) to open tweezer (Activity―Open). *1T4bp tweezer demonstrates the highest fold enhancement.

Tweezers Activity―Open Activity―Closed Fold Enhancement ±SD (n=3)

3T3bp 0.060 0.300 5.0 ± 1.0 1T3bp 0.048 0.257 5.4 ± 0.2 1T4bp 0.031 0.277 8.8 ± 0.3* 1T5bp 0.036 0.270 7.5 ± 0.2

131

Figure B26. Bulk measurements of enzyme activity of three design-specific tweezers. Raw data (a) and normalized activity (b) of open and closed 3T3bp/ 1T4bp/ 1T4bp-5HJs-iso-I tweezers with the G6pDH/ NAD+ pairs. Activity is normalized with respect to original 3T3bp open tweezers. The disturbance in the fluorescent signal in (a) during the first 500 s may be due to instability of the SoftMax plate reader after inserting the sample plate. This region was excluded for calculating the initial . Assay conditions: 100 nM G6pDH-NAD+ assembly, 1 mM G6p/PMS, 500 µM resazurin in 1 × TBS buffer (pH 7.5). The corresponding native gel characterization of the tweezers is shown in Figure B5.

132

B.6 Single molecule measurement of tweezer-scaffolded G6pDH activity

Table B11. Reagents used for the single molecule enzyme assay of enzyme/cofactor coupled tweezers. *The concentration of resazurin was optimized to maintain a tolerable background for a reliable TIRFM measurement of enzyme turnover.

Solution Concentration 10× TBS, pH 7.5 1× Resazurin 50 nM* Glucose-6-phosphate (G6p) 1 mM Phenazine Methosulfate (PMS) 12.5 µM 2+ Mg (MgCl2) 1 mM PEG 8000 10% (w/v)

133

Figure B27. Experimental workflow of the single molecule enzyme assay. Fluorescent beads (indicated by arrows) were used as fiduciary markers to correct for microscope slide/stage drift during the course of experiment. Enzyme activity of the G6pDH scaffolded tweezers (circles) were recorded in the same field of view after photobleaching the fluorophores. The enzyme activity movie (Movie 2) was registered with the initial movie (Movie 1) and the fluorescence-time traces were analyzed for inter-spike intervals (ISIs) at each tweezer location (bottom panel) after background correction (dotted line ≈ mean intensity+8×standard deviation, SD) using spike train analysis.

134

Figure B28. Monitoring individual enzyme behavior at different concentrations of resazurin. (a) Fraction of molecules showing fluorescent spikes over the concentration range of 0.5-100 nM resazurin. As expected, a higher number of enzymes crosses our empirical intensity threshold so that the fraction of active enzymes increases with increasing concentration of resazurin (see Experimental). A decrease in ISI may also be expected at increasing concentrations of resorufin, however, the comparison of ISIs was not statistically reliable in this experiment as only few molecules were found to be active at low resazurin concentrations. (b) Spike intensity for the concentration range studied. The average intensity remains constant (within error) over the concentration range studied, suggesting that each spike represents a single catalytic turnover. The concentration of resazurin (50 nM) used for imaging single enzyme activities in this study is highlighted by an asterisk (*).

135

Figure B29. Spike train analysis. (a) Representative fluorescence-time traces of three tweezer designs: 3T3bp, 1T4bp, and 1T4bp-5HJs-iso-I. The intensity-time traces (black) of tweezer-scaffolded G6pDH/NAD+ before and after background correction and Hidden Markov Model (HMM) idealization to a two-state model (red) are shown. The fluorescence intensity of enzyme reaction on the microscope slide was recorded for ~5 min at 35 ms time resolution. (b) Comparison of average spikes per molecule, where ‘n’ depicts the number of molecules analyzed (molecules with spikes above background, Figure B27). The tweezer with redesigned HJs showed the highest spiking rate. All experiments were carried out at room temperature in 1× TBS buffer, pH 7.5 (Table B11).

136

Figure B30. Analysis of the long ISI (τ2). According to our analysis, τ2 reflects a less active conformational sub-state of the enzyme G6pDH. τ2 values were obtained from the double-exponential fitting of the ISI data in Figure 3.4e. While the average ISI values (gray, left axis) are only slightly shorter in the presence of substrate, the fraction of molecules showing spikes above background (black, right axis) is significantly higher in the presence of substrate compared to the negative control ((-)G6p) experiment.

137

138

APPENDIX C

139

140

141

142

143

144

145

146

APPENDIX D

147

148

149

150

151

152

153

154

155

156

157

158

159

160

BIBLIOGRAPHY

1. Seeman, N.C. and Kallenbach, N.R. (1982) OPTIMAL-DESIGN OF IMMOBILE AND SEMI-MOBILE NUCLEIC-ACID JUNCTIONS. Biophysical Journal, 37, A93-A93. 2. Rothemund, P.W.K. (2006) Folding DNA to create nanoscale shapes and patterns. Nature, 440, 297-302. 3. Douglas, S.M., Dietz, H., Liedl, T., Hoegberg, B., Graf, F. and Shih, W.M. (2009) Self-assembly of DNA into nanoscale three-dimensional shapes. Nature, 459, 414-418. 4. Dietz, H., Douglas, S.M. and Shih, W.M. (2009) Folding DNA into Twisted and Curved Nanoscale Shapes. Science, 325, 725-730. 5. Han, D., Pal, S., Nangreave, J., Deng, Z., Liu, Y. and Yan, H. (2011) DNA Origami with Complex Curvatures in Three-Dimensional Space. Science, 332, 342-346. 6. Zhang, F., Jiang, S., Wu, S., Li, Y., Mao, C., Liu, Y. and Yan, H. (2015) Complex wireframe DNA origami nanostructures with multi-arm junction vertices. Nature nanotechnology, 10, 779-784. 7. Geary, C., Rothemund, P.W.K. and Andersen, E.S. (2014) A single-stranded architecture for cotranscriptional folding of RNA nanostructures. Science, 345, 799-804. 8. Delebecque, C.J., Lindner, A.B., Silver, P.A. and Aldaye, F.A. (2011) Organization of Intracellular Reactions with Rationally Designed RNA Assemblies. Science, 333, 470-474. 9. Wei, R., Martin, T.G., Rant, U. and Dietz, H. (2012) DNA Origami Gatekeepers for Solid-State Nanopores. Angewandte Chemie-International Edition, 51, 4864-4867. 10. Dutta, P.K., Varghese, R., Nangreave, J., Lin, S., Yan, H. and Liu, Y. (2011) DNA-Directed Artificial Light-Harvesting Antenna. Journal of the American Chemical Society, 133, 11985-11993. 11. Pan, K., Boulais, E., Yang, L. and Bathe, M. (2014) Structure-based model for light-harvesting properties of nucleic acid nanostructures. Nucleic Acids Research, 42, 2159-2170. 12. Mohri, K., Nishikawa, M., Takahashi, N., Shiomi, T., Matsuoka, N., Ogawa, K., Endo, M., Hidaka, K., Sugiyama, H., Takahashi, Y. et al. (2012) Design and Development of Nanosized DNA Assemblies in Polypod-like Structures as Efficient Vehicles for Immunostimulatory CpG Motifs to Immune Cells. Acs Nano, 6, 5931-5940. 13. Davis, M.E., Chen, Z. and Shin, D.M. (2008) therapeutics: an emerging treatment modality for cancer. Nature Reviews Drug Discovery, 7, 771-782. 14. Liu, M., Fu, J., Hejesen, C., Yang, Y., Woodbury, N.W., Gothelf, K., Liu, Y. and Yan, H. (2013) A DNA tweezer-actuated enzyme nanoreactor. Nature Communications, 4. 15. Holliday, R. (1964) A mechanism for in fungi. Genetics Research, 5, 282-304.

16. Seeman, N.C. and Kallenbach, N.R. (1983) DESIGN OF IMMOBILE NUCLEIC-ACID JUNCTIONS. Biophysical Journal, 44, 201-209. 17. Gough, G.W. and Lilley, D.M.J. (1985) DNA BENDING INDUCED BY CRUCIFORM FORMATION. Nature, 313, 154-156. 18. Lilley, D.M.J. (2000) Structures of helical junctions in nucleic acids. Quarterly Reviews of Biophysics, 33, 109-159. 19. Broker, T.R. and Lehman, I.R. (1971) BRANCHED DNA MOLECULES - INTERMEDIATES IN T4 RECOMBINATION. Journal of , 60, 131-&. 20. Hoess, R., Wierzbicki, A. and Abremski, K. (1987) ISOLATION AND CHARACTERIZATION OF INTERMEDIATES IN SITE-SPECIFIC RECOMBINATION. Proceedings of the National Academy of Sciences of the United States of America, 84, 6840-6844. 21. Castro, C.E., Kilchherr, F., Kim, D.-N., Shiao, E.L., Wauer, T., Wortmann, P., Bathe, M. and Dietz, H. (2011) A primer to scaffolded DNA origami. Nature Methods, 8, 221-229.

161

22. Pan, K., Kim, D.-N., Zhang, F., Adendorff, M.R., Yan, H. and Bathe, M. (2014) Lattice-free prediction of three-dimensional structure of programmed DNA assemblies. Nature Communications, 5. 23. Chen, J.H. and Seeman, N.C. (1991) SYNTHESIS FROM DNA OF A MOLECULE WITH THE CONNECTIVITY OF A CUBE. Nature, 350, 631-633. 24. Fu, T.J. and Seeman, N.C. (1993) DNA DOUBLE-CROSSOVER MOLECULES. , 32, 3211- 3220. 25. Winfree, E., Liu, F.R., Wenzler, L.A. and Seeman, N.C. (1998) Design and self-assembly of two- dimensional DNA crystals. Nature, 394, 539-544. 26. Shih, W.M., Quispe, J.D. and Joyce, G.F. (2004) A 1.7-kilobase single-stranded DNA that folds into a nanoscale octahedron. Nature, 427, 618-621. 27. Zheng, J., Birktoft, J.J., Chen, Y., Wang, T., Sha, R., Constantinou, P.E., Ginell, S.L., Mao, C. and Seeman, N.C. (2009) From molecular to macroscopic via the rational design of a self-assembled 3D DNA crystal. Nature, 461, 74-77. 28. Ke, Y., Sharma, J., Liu, M., Jahn, K., Liu, Y. and Yan, H. (2009) Scaffolded DNA Origami of a DNA Molecular Container. Nano Letters, 9, 2445-2447. 29. Kuzuya, A. and Komiyama, M. (2009) Design and construction of a box-shaped 3D-DNA origami. Chemical Communications, 4182-4184. 30. Andersen, E.S., Dong, M., Nielsen, M.M., Jahn, K., Subramani, R., Mamdouh, W., Golas, M.M., Sander, B., Stark, H., Oliveira, C.L.P. et al. (2009) Self-assembly of a nanoscale DNA box with a controllable lid. Nature, 459, 73-U75. 31. Ke, Y., Douglas, S.M., Liu, M., Sharma, J., Cheng, A., Leung, A., Liu, Y., Shih, W.M. and Yan, H. (2009) Multilayer DNA Origami Packed on a Square Lattice. Journal of the American Chemical Society, 131, 15903-15908. 32. Woo, S. and Rothemund, P.W.K. (2011) Programmable molecular recognition based on the geometry of DNA nanostructures. Nature Chemistry, 3, 620-627. 33. Lu, M., Guo, Q., Seeman, N.C. and Kallenbach, N.R. (1991) PARALLEL AND ANTIPARALLEL HOLLIDAY JUNCTIONS DIFFER IN STRUCTURE AND STABILITY. Journal of Molecular Biology, 221, 1419-1432. 34. Li, X.J., Yang, X.P., Qi, J. and Seeman, N.C. (1996) Antiparallel DNA double crossover molecules as components for nanoconstruction. Journal of the American Chemical Society, 118, 6131-6140. 35. Wei, B., Dai, M. and Yin, P. (2012) Complex shapes self-assembled from single-stranded DNA tiles. Nature, 485, 623-+. 36. Ke, Y., Ong, L.L., Sun, W., Song, J., Dong, M., Shih, W.M. and Yin, P. (2014) DNA brick crystals with prescribed depths. Nature Chemistry, 6, 994-1002. 37. Shi, X., Lu, W., Wang, Z., Pan, L., Cui, G., Xu, J. and LaBean, T.H. (2014) Programmable DNA tile self-assembly using a hierarchical sub-tile strategy. Nanotechnology, 25. 38. Gerling, T., Wagenbauer, K.F., Neuner, A.M. and Dietz, H. (2015) Dynamic DNA devices and assemblies formed by shape-complementary, non-base pairing 3D components. Science, 347, 1446-1452. 39. Birac, J.J., Sherman, W.B., Kopatsch, J., Constantinou, P.E. and Seeman, N.C. (2006) Architecture with GIDEON, a program for design in structural DNA nanotechnology. Journal of Molecular Graphics & Modelling, 25, 470-480. 40. Andersen, E.S., Dong, M., Nielsen, M.M., Jahn, K., Lind-Thomsen, A., Mamdouh, W., Gothelf, K.V., Besenbacher, F. and Kjems, J. (2008) DNA origami design of dolphin-shaped structures with flexible tails. Acs Nano, 2, 1213-1218.

162

41. Williams, S., Lund, K., Lin, C., Wonka, P., Lindsay, S. and Yan, H. (2009) In Goel, A., Simmel, F. C. and Sos’ık, P. (eds.), DNA Computing. Springer Berlin Heidelberg, Berlin, Heidelberg, Vol. Vol. 5347, pp. 90-101. 42. Douglas, S.M., Marblestone, A.H., Teerapittayanon, S., Vazquez, A., Church, G.M. and Shih, W.M. (2009) Rapid prototyping of 3D DNA-origami shapes with caDNAno. Nucleic Acids Research, 37, 5001-5006. 43. Kim, D.-N., Kilchherr, F., Dietz, H. and Bathe, M. (2012) Quantitative prediction of 3D solution shape and flexibility of nucleic acid nanostructures. Nucleic Acids Research, 40, 2862-2868. 44. Peters, J.P. and Maher, L.J., III. (2010) DNA curvature and flexibility in vitro and in vivo. Quarterly Reviews of Biophysics, 43, 23-63. 45. Pinheiro, A.V., Han, D., Shih, W.M. and Yan, H. (2011) Challenges and opportunities for structural DNA nanotechnology. Nature Nanotechnology, 6, 763-772. 46. Zhang, F., Nangreave, J., Liu, Y. and Yan, H. (2014) Structural DNA Nanotechnology: State of the Art and Future Perspective. Journal of the American Chemical Society, 136, 11198-11211. 47. Chen, Y.-J., Groves, B., Muscat, R.A. and Seelig, G. (2015) DNA nanotechnology from the test tube to the cell. Nature Nanotechnology, 10, 748-760. 48. Han, D., Pal, S., Yang, Y., Jiang, S., Nangreave, J., Liu, Y. and Yan, H. (2013) DNA Gridiron Nanostructures Based on Four-Arm Junctions. Science, 339, 1412-1415. 49. Qiu, H.X., Dewan, J.C. and Seeman, N.C. (1997) A DNA decamer with a sticky end: The of d-CGACGATCGT. Journal of Molecular Biology, 267, 881-898. 50. Greene, D.G., Keum, J.-W. and Bermudez, H. (2012) The Role of Defects on the Assembly and Stability of DNA Nanostructures. Small, 8, 1320-1325. 51. Martin, T.G. and Dietz, H. (2012) Magnesium-free self-assembly of multi-layer DNA objects. Nature Communications, 3. 52. Sobczak, J.-P.J., Martin, T.G., Gerling, T. and Dietz, H. (2012) Rapid Folding of DNA into Nanoscale Shapes at Constant Temperature. Science, 338, 1458-1461. 53. Altona, C. (1996) Classification of nucleic acid junctions. Journal of Molecular Biology, 263, 568- 581. 54. Lilley, D.M.J. (2009) FOUR-WAY HELICAL JUNCTIONS IN DNA MOLECULES. Mathematics of DNA Structure, Function and Interactions, 150, 213-224. 55. Lilley, D.M.J. and Clegg, R.M. (1993) THE STRUCTURE OF THE 4-WAY JUNCTION IN DNA. Annual Review of Biophysics and Biomolecular Structure, 22, 299-328. 56. Duckett, D.R., Murchie, A.I.H., Diekmann, S., Vonkitzing, E., Kemper, B. and Lilley, D.M.J. (1988) THE STRUCTURE OF THE HOLLIDAY JUNCTION, AND ITS RESOLUTION. Cell, 55, 79-89. 57. Churchill, M.E.A., Tullius, T.D., Kallenbach, N.R. and Seeman, N.C. (1988) A HOLLIDAY RECOMBINATION INTERMEDIATE IS TWOFOLD SYMMETRIC. Proceedings of the National Academy of Sciences of the United States of America, 85, 4653-4656. 58. Cooper, J.P. and Hagerman, P.J. (1989) GEOMETRY OF A BRANCHED DNA-STRUCTURE IN SOLUTION. Proceedings of the National Academy of Sciences of the United States of America, 86, 7336-7340. 59. Clegg, R.M., Murchie, A.I.H. and Lilley, D.M.J. (1994) THE SOLUTION STRUCTURE OF THE 4-WAY DNA JUNCTION AT LOW-SALT CONDITIONS - A FLUORESCENCE RESONANCE ENERGY-TRANSFER ANALYSIS. Biophysical Journal, 66, 99-109. 60. Hyeon, C., Lee, J., Yoon, J., Hohng, S. and Thirumalai, D. (2012) Hidden complexity in the isomerization dynamics of Holliday junctions. Nature Chemistry, 4, 907-914. 61. McKinney, S.A., Declais, A.C., Lilley, D.M.J. and Ha, T. (2003) Structural dynamics of individual Holliday junctions. Nature , 10, 93-97.

163

62. Miick, S.M., Fee, R.S., Millar, D.P. and Chazin, W.J. (1997) Crossover isomer bias is the primary sequence-dependent property of immobilized Holliday junctions. Proceedings of the National Academy of Sciences of the United States of America, 94, 9080-9084. 63. Carlstrom, G. and Chazin, W.J. (1996) Sequence dependence and direct measurement of crossover isomer distribution in model holliday junctions using NMR spectroscopy. Biochemistry, 35, 3534-3544. 64. Grainger, R.J., Murchie, A.I.H. and Lilley, D.M.J. (1998) Exchange between stacking conformers in a four-way DNA junction. Biochemistry, 37, 23-32. 65. Overmars, F.J.J. and Altona, C. (1997) NMR study of the exchange rate between two stacked conformers of a model holliday junction. Journal of Molecular Biology, 273, 519-524. 66. Protozanova, E., Yakovchuk, P. and Frank-Kamenetskii, M.D. (2004) Stacked-unstacked equilibrium at the nick site of DNA. Journal of Molecular Biology, 342, 775-785. 67. Yakovchuk, P., Protozanova, E. and Frank-Kamenetskii, M.D. (2006) Base-stacking and base- pairing contributions into thermal stability of the DNA double helix. Nucleic Acids Research, 34, 564-574. 68. Duckett, D.R., Murchie, A.I.H. and Lilley, D.M.J. (1990) THE ROLE OF METAL-IONS IN THE CONFORMATION OF THE 4-WAY DNA JUNCTION. Embo Journal, 9, 583-590. 69. Maffeo, C., Yoo, J., Comer, J., Wells, D.B., Luan, B. and Aksimentiev, A. (2014) Close encounters with DNA. Journal of -Condensed Matter, 26. 70. Olson, W.K., Gorin, A.A., Lu, X.J., Hock, L.M. and Zhurkin, V.B. (1998) DNA sequence-dependent deformability deduced from protein-DNA crystal complexes. Proceedings of the National Academy of Sciences of the United States of America, 95, 11163-11168. 71. Lu, X.J. and Olson, W.K. (2003) 3DNA: a software package for the analysis, rebuilding and visualization of three-dimensional nucleic acid structures. Nucleic Acids Research, 31, 5108- 5121. 72. Lu, X.-J. and Olson, W.K. (2008) 3DNA: a versatile, integrated software system for the analysis, rebuilding and visualization of three-dimensional nucleic-acid structures. Nature Protocols, 3, 1213-1227. 73. Lavery, R., Moakher, M., Maddocks, J.H., Petkeviciute, D. and Zakrzewska, K. (2009) Conformational analysis of nucleic acids revisited: Curves. Nucleic Acids Research, 37, 5917- 5929. 74. Meyer, S., Jost, D., Theodorakopoulos, N., Peyrard, M., Lavery, R. and Everaers, R. (2013) Temperature Dependence of the DNA Double Helix at the Nanoscale: Structure, Elasticity, and Fluctuations. Biophysical Journal, 105, 1904-1914. 75. Perez, A., Lankas, F., Luque, F.J. and Orozco, M. (2008) Towards a molecular dynamics consensus view of B-DNA flexibility. Nucleic Acids Research, 36, 2379-2394. 76. Pasi, M., Maddocks, J.H., Beveridge, D., Bishop, T.C., Case, D.A., Cheatham, T., III, Dans, P.D., Jayaram, B., Lankas, F., Laughton, C. et al. (2014) mu ABC: a systematic microsecond molecular dynamics study of tetranucleotide sequence effects in B-DNA. Nucleic Acids Research, 42, 12272-12283. 77. Beveridge, D.L., Barreiro, G., Byun, K.S., Case, D.A., Cheatham, T.E., Dixit, S.B., Giudice, E., Lankas, F., Lavery, R., Maddocks, J.H. et al. (2004) Molecular dynamics simulations of the 136 unique tetranucleotide sequences of DNA oligonucleotides. I. Research design and results on d(C(p)G) steps. Biophysical Journal, 87, 3799-3813. 78. Dixit, S.B., Beveridge, D.L., Case, D.A., Cheatham, T.E., Giudice, E., Lankas, F., Lavery, R., Maddocks, J.H., Osman, R., Sklenar, H. et al. (2005) Molecular dynamics simulations of the 136 unique tetranucleotide sequences of DNA oligonucleotides. II: Sequence context effects on the dynamical structures of the 10 unique dinucleotide steps. Biophysical Journal, 89, 3721-3740.

164

79. Srinivasan, J., Cheatham, T.E., Cieplak, P., Kollman, P.A. and Case, D.A. (1998) Continuum solvent studies of the stability of DNA, RNA, and phosphoramidate - DNA helices. Journal of the American Chemical Society, 120, 9401-9409. 80. Stofer, E., Chipot, C. and Lavery, R. (1999) Free energy calculations of Watson-Crick base pairing in aqueous solution. Journal of the American Chemical Society, 121, 9503-9508. 81. Florian, J., Goodman, M.F. and Warshel, A. (2000) Free-energy perturbation calculations of DNA destabilization by base substitutions: The effect of neutral thymine, adenine and adenine difluorotoluene mismatches. Journal of Physical Chemistry B, 104, 10092-10099. 82. Yoo, J. and Aksimentiev, A. (2013) In situ structure and dynamics of DNA origami determined through molecular dynamics simulations. Proceedings of the National Academy of Sciences of the United States of America, 110, 20099-20104. 83. Yu, J., Ha, T.J. and Schulten, K. (2004) Conformational model of the Holliday junction transition deduced from molecular dynamics simulations. Nucleic Acids Research, 32, 6683-6695. 84. Wheatley, E.G., Pieniazek, S.N., Mukerji, I. and Beveridge, D.L. (2012) Molecular Dynamics of a DNA Holliday Junction: The Inverted Repeat Sequence d(CCGGTACCGG)(4). Biophysical Journal, 102, 552-560. 85. Varnai, P. and Timsit, Y. (2010) Differential stability of DNA crossovers in solution mediated by divalent cations. Nucleic Acids Research, 38, 4163-4172. 86. Wheatley, E.G., Pieniazek, S.N., Vitoc, I., Mukerji, I. and Beveridge, D.L. (2012) Molecular Dynamics Structure Prediction of a Novel Protein-DNA Complex: Two HU Proteins with a DNA Four-way Junction. Innovations in Biomolecular Modeling and Simulations, Vol 2, 111-128. 87. Watson, J.D. and Crick, F.H.C. (1953) MOLECULAR STRUCTURE OF NUCLEIC ACIDS - A STRUCTURE FOR DEOXYRIBOSE NUCLEIC ACID. Nature, 171, 737-738. 88. Kuhn, W. (1936) Relationship between molecular size, static molecular shape and elastic characteristics of high polymer materials. Kolloid-Zeitschrift, 76, 258-271. 89. Kramers, H.A. (1946) THE BEHAVIOR OF MACROMOLECULES IN INHOMOGENEOUS FLOW. Journal of Chemical Physics, 14, 415-424. 90. Kratky, O. and Porod, G. (1949) DIFFUSE SMALL-ANGLE SCATTERING OF X-RAYS IN COLLOID SYSTEMS. Journal of Colloid Science, 4, 35-70. 91. Doty, P. (1957) The physical chemistry of deoxyribonucleic acids. Journal of cellular physiology. Supplement, 49, 27-57. 92. de Gennes, P.G. (1990) Introduction to Polymer Dynamics. Cornell University Press, Ithaca. 93. Fredrickson, G. (2007) The Equilibrium Theory of Inhomogeneous . Oxford University Press, New York. 94. Chen, H., Meisburger, S.P., Pabit, S.A., Sutton, J.L., Webb, W.W. and Pollack, L. (2012) Ionic strength-dependent persistence lengths of single-stranded RNA and DNA. Proceedings of the National Academy of Sciences of the United States of America, 109, 799-804. 95. Bustamante, C., Smith, S.B., Liphardt, J. and Smith, D. (2000) Single-molecule studies of DNA mechanics. Current Opinion in Structural Biology, 10, 279-285. 96. Yuan, C., Chen, H., Lou, X.W. and Archer, L.A. (2008) DNA bending stiffness on small length scales. Physical Review Letters, 100. 97. Mastroianni, A.J., Sivak, D.A., Geissler, P.L. and Alivisatos, A.P. (2009) Probing the Conformational Distributions of Subpersistence Length DNA. Biophysical Journal, 97, 1408-1417. 98. Mazur, A.K. (2007) Wormlike chain theory and bending of short DNA. Physical Review Letters, 98. 99. Liedl, T., Hogberg, B., Tytell, J., Ingber, D.E. and Shih, W.M. (2010) Self-assembly of three- dimensional prestressed tensegrity structures from DNA. Nature Nanotechnology, 5, 520-524.

165

100. Schiffels, D., Liedl, T. and Fygenson, D.K. (2013) Nanoscale Structure and Microscale Stiffness of DNA Nanotubes. Acs Nano, 7, 6700-6710. 101. Gibbs, J.H. and Dimarzio, E.A. (1959) STATISTICAL MECHANICS OF HELIX-COIL TRANSITIONS IN BIOLOGICAL MACROMOLECULES. Journal of Chemical Physics, 30, 271-282. 102. Zimm, B.H. (1960) THEORY OF MELTING OF THE HELICAL FORM IN DOUBLE CHAINS OF THE DNA TYPE. Journal of Chemical Physics, 33, 1349-1356. 103. Hays, J.B. and Zimm, B.H. (1970) FLEXIBILITY AND STIFFNESS IN NICKED DNA. Journal of Molecular Biology, 48, 297-&. 104. Hill, T.L. (1959) GENERALIZATION OF THE ONE-DIMENSIONAL ISING MODEL APPLICABLE TO HELIX TRANSITIONS IN NUCLEIC ACIDS AND PROTEINS. Journal of Chemical Physics, 30, 383-387. 105. Zhang, D.Y. and Seelig, G. (2011) Dynamic DNA nanotechnology using strand-displacement reactions. Nature Chemistry, 3, 103-113. 106. Bath, J. and Turberfield, A.J. (2007) DNA nanomachines. Nature Nanotechnology, 2, 275-284. 107. Yurke, B., Turberfield, A.J., Mills, A.P., Jr., Simmel, F.C. and Neumann, J.L. (2000) A DNA-fuelled made of DNA. Nature, 406, 605-608. 108. Yan, H., Zhang, X.P., Shen, Z.Y. and Seeman, N.C. (2002) A robust DNA mechanical device controlled by hybridization topology. Nature, 415, 62-65. 109. Seelig, G., Soloveichik, D., Zhang, D.Y. and Winfree, E. (2006) Enzyme-free nucleic acid logic circuits. Science, 314, 1585-1588. 110. Zhang, D.Y., Turberfield, A.J., Yurke, B. and Winfree, E. (2007) Engineering entropy-driven reactions and networks catalyzed by DNA. Science, 318, 1121-1125. 111. Yin, P., Choi, H.M.T., Calvert, C.R. and Pierce, N.A. (2008) Programming biomolecular self- assembly pathways. Nature, 451, 318-U314. 112. Qian, L. and Winfree, E. (2011) Scaling Up Digital Circuit Computation with DNA Strand Displacement Cascades. Science, 332, 1196-1201. 113. Dirks, R.M. and Pierce, N.A. (2004) Triggered amplification by hybridization chain reaction. Proceedings of the National Academy of Sciences of the United States of America, 101, 15275- 15278. 114. Venkataraman, S., Dirks, R.M., Rothemund, P.W.K., Winfree, E. and Pierce, N.A. (2007) An autonomous polymerization motor powered by DNA hybridization. Nature Nanotechnology, 2, 490-494. 115. Sherman, W.B. and Seeman, N.C. (2004) A precisely controlled DNA biped walking device. Nano Letters, 4, 1203-1207. 116. Shin, J.S. and Pierce, N.A. (2004) A synthetic DNA walker for molecular transport. Journal of the American Chemical Society, 126, 10834-10835. 117. Yin, P., Yan, H., Daniell, X.G., Turberfield, A.J. and Reif, J.H. (2004) A unidirectional DNA walker that moves autonomously along a track. Angewandte Chemie-International Edition, 43, 4906- 4911. 118. Omabegho, T., Sha, R. and Seeman, N.C. (2009) A Bipedal DNA Brownian Motor with Coordinated Legs. Science, 324, 67-71. 119. Zuker, M. (2000) Calculating nucleic acid secondary structure. Current Opinion in Structural Biology, 10, 303-310. 120. Zuker, M. (2003) Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Research, 31, 3406-3415. 121. Dirks, R.M., Bois, J.S., Schaeffer, J.M., Winfree, E. and Pierce, N.A. (2007) Thermodynamic analysis of interacting nucleic acid strands. Siam Review, 49, 65-88. 122. SantaLucia, J. and Hicks, D. (2004) The thermodynamics of DNA structural motifs. Annual Review of Biophysics and Biomolecular Structure, 33, 415-440.

166

123. Gotoh, O. and Tagashira, Y. (1981) STABILITIES OF NEAREST-NEIGHBOR DOUBLETS IN DOUBLE- HELICAL DNA DETERMINED BY FITTING CALCULATED MELTING PROFILES TO OBSERVED PROFILES. Biopolymers, 20, 1033-1042. 124. Vologodskii, A.V., Amirikyan, B.R., Lyubchenko, Y.L. and Frankkamenetskii, M.D. (1984) ALLOWANCE FOR HETEROGENEOUS STACKING IN THE DNA HELIX-COIL TRANSITION THEORY. Journal of Biomolecular Structure & Dynamics, 2, 131-148. 125. Breslauer, K.J., Frank, R., Blocker, H. and Marky, L.A. (1986) PREDICTING DNA DUPLEX STABILITY FROM THE BASE SEQUENCE. Proceedings of the National Academy of Sciences of the United States of America, 83, 3746-3750. 126. Blake, R.D. and Delcourt, S.G. (1998) Thermal stability of DNA. Nucleic Acids Research, 26, 3323- 3332. 127. SantaLucia, J., Allawi, H.T. and Seneviratne, A. (1996) Improved nearest-neighbor parameters for predicting DNA duplex stability. Biochemistry, 35, 3555-3562. 128. Sugimoto, N., Nakano, S., Yoneyama, M. and Honda, K. (1996) Improved thermodynamic parameters and helix initiation factor to predict stability of DNA duplexes. Nucleic Acids Research, 24, 4501-4505. 129. Doktycz, M.J., Goldstein, R.F., Paner, T.M., Gallo, F.J. and Benight, A.S. (1992) STUDIES OF DNA DUMBBELLS .1. MELTING CURVES OF 17 DNA DUMBBELLS WITH DIFFERENT DUPLEX STEM SEQUENCES LINKED BY T4 ENDLOOPS - EVALUATION OF THE NEAREST-NEIGHBOR STACKING INTERACTIONS IN DNA. Biopolymers, 32, 849-864. 130. Krueger, A., Protozanova, E. and Frank-Kamenetskii, M.D. (2006) Sequence-dependent basepair opening in DNA double helix. Biophysical Journal, 90, 3091-3099. 131. Perez, A., Javier Luque, F. and Orozco, M. (2012) Frontiers in Molecular Dynamics Simulations of DNA. Accounts of Chemical Research, 45, 196-205. 132. Cheatham, T.E., III and Case, D.A. (2013) Twenty-Five Years of Nucleic Acid Simulations. Biopolymers, 99, 969-977. 133. Seibel, G.L., Singh, U.C. and Kollman, P.A. (1985) A MOLECULAR-DYNAMICS SIMULATION OF DOUBLE-HELICAL B-DNA INCLUDING COUNTERIONS AND WATER. Proceedings of the National Academy of Sciences of the United States of America, 82, 6537-6540. 134. Weiner, S.J., Kollman, P.A., Case, D.A., Singh, U.C., Ghio, C., Alagona, G., Profeta, S. and Weiner, P. (1984) A NEW FORCE-FIELD FOR MOLECULAR MECHANICAL SIMULATION OF NUCLEIC-ACIDS AND PROTEINS. Journal of the American Chemical Society, 106, 765-784. 135. Levitt, M. (1983) Computer simulation of DNA double-helix dynamics. Cold Spring Harbor symposia on quantitative biology, 47 Pt 1, 251-262. 136. Tidor, B., Irikura, K.K., Brooks, B.R. and Karplus, M. (1983) DYNAMICS OF DNA OLIGOMERS. Journal of Biomolecular Structure & Dynamics, 1, 231-252. 137. Jayaram, B. and Beveridge, D.L. (1996) Modeling DNA in aqueous solutions: Theoretical and computer simulation studies on the ion atmosphere of DNA. Annual Review of Biophysics and Biomolecular Structure, 25, 367-394. 138. Cheatham, T.E., Miller, J.L., Fox, T., Darden, T.A. and Kollman, P.A. (1995) MOLECULAR- DYNAMICS SIMULATIONS ON SOLVATED BIOMOLECULAR SYSTEMS - THE PARTICLE MESH EWALD METHOD LEADS TO STABLE TRAJECTORIES OF DNA, RNA, AND PROTEINS. Journal of the American Chemical Society, 117, 4193-4194. 139. Cheatham, T.E. and Kollman, P.A. (2000) Molecular dynamics simulation of nucleic acids. Annual Review of Physical Chemistry, 51, 435-471. 140. Dongarra, J., Walker, D., Lusk, E., Knighten, B., Snir, M., Geist, A., Otto, S., Hempel, R., Gropp, W., Cownie, J. et al. (1994) SPECIAL ISSUE - MPI - A MESSAGE-PASSING INTERFACE STANDARD.

167

International Journal of Supercomputer Applications and High Performance Computing, 8, 165- &. 141. Cheatham, T.E. and Kollman, P.A. (1996) Observation of the A-DNA to B-DNA transition during unrestrained molecular dynamics in aqueous solution. Journal of Molecular Biology, 259, 434- 444. 142. Young, M.A., Ravishanker, G. and Beveridge, D.L. (1997) A 5-nanosecond molecular dynamics trajectory for B-DNA: Analysis of structure, motions, and solvation. Biophysical Journal, 73, 2313-2336. 143. Cheatham, T.E., 3rd and Young, M.A. (2000) Molecular dynamics simulation of nucleic acids: successes, limitations, and promise. Biopolymers, 56, 232-256. 144. Ulyanov, N.B. and James, T.L. (1995) Statistical analysis of DNA duplex structural features. Nuclear Magnetic Resonance and Nucleic Acids, 261, 90-120. 145. Zhurkin, V.B., Ulyanov, N.B., Gorin, A.A. and Jernigan, R.L. (1991) STATIC AND STATISTICAL BENDING OF DNA EVALUATED BY MONTE-CARLO SIMULATIONS. Proceedings of the National Academy of Sciences of the United States of America, 88, 7046-7050. 146. Poncin, M., Hartmann, B. and Lavery, R. (1992) CONFORMATIONAL SUB-STATES IN B-DNA. Journal of Molecular Biology, 226, 775-794. 147. Cheatham, T.E. and Kollman, P.A. (1997) Molecular dynamics simulations highlight the structural differences among DNA:DNA, RNA:RNA, and DNA:RNA hybrid duplexes. Journal of the American Chemical Society, 119, 4805-4825. 148. Gorin, A.A., Zhurkin, V.B. and Olson, W.K. (1995) B-DNA TWISTING CORRELATES WITH BASE-PAIR MORPHOLOGY. Journal of Molecular Biology, 247, 34-48. 149. Feig, M. and Pettitt, B.M. (1998) A molecular simulation picture of DNA hydration around A- and B-DNA. Biopolymers, 48, 199-209. 150. Shields, G.C., Laughton, C.A. and Orozco, M. (1997) Molecular dynamics simulations of the d(T center dot A center dot T) triple helix. Journal of the American Chemical Society, 119, 7463- 7469. 151. Kosztin, D., Gumport, R.I. and Schulten, K. (1999) Probing the role of structural water in a duplex oligodeoxyribonucleotide containing a water-mimicking base analog. Nucleic Acids Research, 27, 3550-3556. 152. Young, M.A., Jayaram, B. and Beveridge, D.L. (1997) Intrusion of counterions into the spine of hydration in the minor groove of B-DNA: Fractional occupancy of electronegative pockets. Journal of the American Chemical Society, 119, 59-69. 153. Feig, M. and Pettitt, B.M. (1999) Sodium and chlorine ions as part of the DNA solvation shell. Biophysical Journal, 77, 1769-1781. 154. Ponomarev, S.Y., Thayer, K.M. and Beveridge, D.L. (2004) Ion motions in molecular dynamics simulations on DNA. Proceedings of the National Academy of Sciences of the United States of America, 101, 14771-14775. 155. MacKerell, A.D. (1997) Influence of magnesium ions on duplex DNA structural, dynamic, and solvation properties. Journal of Physical Chemistry B, 101, 646-650. 156. Jayaram, B., Sprous, D., Young, M.A. and Beveridge, D.L. (1998) Free energy analysis of the conformational preferences of A and B forms of DNA in solution. Journal of the American Chemical Society, 120, 10629-10633. 157. Cheatham, T.E., Srinivasan, J., Case, D.A. and Kollman, P.A. (1998) Molecular dynamics and continuum solvent studies of the stability of polyG-polyC and polyA-polyT DNA duplexes in solution. Journal of Biomolecular Structure & Dynamics, 16, 265-280. 158. Kollman, P. (1993) FREE-ENERGY CALCULATIONS - APPLICATIONS TO CHEMICAL AND BIOCHEMICAL PHENOMENA. Chemical Reviews, 93, 2395-2417.

168

159. Norberg, J. and Nilsson, L. (1994) STACKING-UNSTACKING OF THE DINUCLEOSIDE MONOPHOSPHATE GUANYLYL-3',5'-URIDINE STUDIED WITH MOLECULAR-DYNAMICS. Biophysical Journal, 67, 812-824. 160. Norberg, J. and Nilsson, L. (1995) Potential of mean force calculations of the stacking unstacking process in single-stranded deoxyribodinucleoside monophosphates. Biophysical Journal, 69, 2277-2285. 161. Norberg, J. and Nilsson, L. (1996) Conformational free energy landscape of ApApA from molecular dynamics simulations. Journal of Physical Chemistry, 100, 2550-2554. 162. Norberg, J. and Nilsson, L. (1996) Influence of adjacent bases on the stacking-unstacking process of single-stranded oligonucleotides. Biopolymers, 39, 765-768. 163. Norberg, J. and Nilsson, L. (1998) Solvent influence on base stacking. Biophysical Journal, 74, 394-402. 164. MacKerell, A.D. and Lee, G.U. (1999) Structure, force, and energy of a double-stranded DNA oligonucleotide under tensile loads. European Biophysics Journal with Biophysics Letters, 28, 415-426. 165. Galindo-Murillo, R., Roe, D.R. and Cheatham, T.E., III. (2014) On the absence of intrahelical DNA dynamics on the mu s to ms timescale. Nature Communications, 5. 166. Perez, A., Luque, F.J. and Orozco, M. (2007) Dynamics of B-DNA on the microsecond time scale. Journal of the American Chemical Society, 129, 14739-14745. 167. Beveridge, D.L., Cheatham, T.E., III and Mezei, M. (2012) The ABCs of molecular dynamics simulations on B-DNA, circa 2012. Journal of Biosciences, 37, 379-397. 168. Foloppe, N. and MacKerell, A.D. (2000) All-atom empirical force field for nucleic acids: I. Parameter optimization based on small molecule and condensed phase macromolecular target data. Journal of Computational Chemistry, 21, 86-104. 169. Cornell, W.D., Cieplak, P., Bayly, C.I., Gould, I.R., Merz, K.M., Ferguson, D.M., Spellmeyer, D.C., Fox, T., Caldwell, J.W. and Kollman, P.A. (1996) A second generation force field for the simulation of proteins, nucleic acids, and organic molecules (vol 117, pg 5179, 1995). Journal of the American Chemical Society, 118, 2309-2309. 170. Hart, K., Foloppe, N., Baker, C.M., Denning, E.J., Nilsson, L. and MacKerell, A.D., Jr. (2012) Optimization of the CHARMM Additive Force Field for DNA: Improved Treatment of the BI/BII Conformational Equilibrium. Journal of Chemical Theory and Computation, 8, 348-362. 171. Perez, A., Marchan, I., Svozil, D., Sponer, J., Cheatham, T.E., III, Laughton, C.A. and Orozco, M. (2007) Refinenement of the AMBER force field for nucleic acids: Improving the description of alpha/gamma conformers. Biophysical Journal, 92, 3817-3829. 172. Aqvist, J. (1990) ION WATER INTERACTION POTENTIALS DERIVED FROM FREE-ENERGY PERTURBATION SIMULATIONS. Journal of Physical Chemistry, 94, 8021-8024. 173. Beglov, D. and Roux, B. (1994) FINITE REPRESENTATION OF AN INFINITE BULK SYSTEM - SOLVENT BOUNDARY POTENTIAL FOR COMPUTER-SIMULATIONS. Journal of Chemical Physics, 100, 9050-9063. 174. Joung, I.S. and Cheatham, T.E., III. (2008) Determination of alkali and halide monovalent ion parameters for use in explicitly solvated biomolecular simulations. Journal of Physical Chemistry B, 112, 9020-9041. 175. Joung, I.S. and Cheatham, T.E., III. (2009) Molecular Dynamics Simulations of the Dynamic and Energetic Properties of Alkali and Halide Ions Using Water-Model-Specific Ion Parameters. Journal of Physical Chemistry B, 113, 13279-13290. 176. Luo, Y. and Roux, B. (2010) Simulation of Osmotic Pressure in Concentrated Aqueous Salt Solutions. Journal of Physical Chemistry Letters, 1, 183-189.

169

177. Rau, D.C., Lee, B. and Parsegian, V.A. (1984) MEASUREMENT OF THE REPULSIVE FORCE BETWEEN POLY-ELECTROLYTE MOLECULES IN IONIC SOLUTION - HYDRATION FORCES BETWEEN PARALLEL DNA DOUBLE HELICES. Proceedings of the National Academy of Sciences of the United States of America-Biological Sciences, 81, 2621-2625. 178. Yoo, J. and Aksimentiev, A. (2012) Improved Parametrization of Li+, Na+, K+, and Mg2+ Ions for All-Atom Molecular Dynamics Simulations of Nucleic Acid Systems. Journal of Physical Chemistry Letters, 3, 45-50. 179. Allnér, O., Nilsson, L. and Villa, A. (2012) Magnesium ion–water coordination and exchange in biomolecular simulations. J. Chem.Theory Comput., 8, 1493-1502. 180. Jabbari, H., Aminpour, M. and Montemagno, C. (2015) Computational Approaches to Nucleic Acid Origami. ACS combinatorial science, 17, 535-547. 181. Gopal, S.M., Mukherjee, S., Cheng, Y.-M. and Feig, M. (2010) PRIMO/PRIMONA: A coarse- grained model for proteins and nucleic acids that preserves near-atomistic accuracy. Proteins- Structure Function and Bioinformatics, 78, 1266-1281. 182. Rollins, G.C., Petrov, A.S. and Harvey, S.C. (2008) The role of DNA twist in the packaging of viral genomes. Biophysical Journal, 94, L38-L40. 183. Doye, J.P.K., Ouldridge, T.E., Louis, A.A., Romano, F., Sulc, P., Matek, C., Snodin, B.E.K., Rovigatti, L., Schreck, J.S., Harrison, R.M. et al. (2013) Coarse-graining DNA for simulations of DNA nanotechnology. Physical Chemistry Chemical Physics, 15, 20395-20414. 184. Saunders, M.G. and Voth, G.A. (2013) Coarse-Graining Methods for Computational Biology. Annual Review of Biophysics, Vol 42, 42, 73-93. 185. Sulc, P., Romano, F., Ouldridge, T.E., Rovigatti, L., Doye, J.P.K. and Louis, A.A. (2012) Sequence- dependent thermodynamics of a coarse-grained DNA model. Journal of Chemical Physics, 137. 186. Maffeo, C., Ngo, T.T.M., Ha, T. and Aksimentiev, A. (2014) A Coarse-Grained Model of Unstructured Single-Stranded DNA Derived from Atomistic Simulation and Single-Molecule Experiment. Journal of Chemical Theory and Computation, 10, 2891-2896. 187. Liu, P., Shi, Q., Daume, H., III and Voth, G.A. (2008) A Bayesian statistics approach to multiscale coarse graining. Journal of Chemical Physics, 129. 188. Wadati, M. and Tsuru, H. (1986) ELASTIC MODEL OF LOOPED DNA. Physica D, 21, 213-226. 189. Westcott, T.P., Tobias, I. and Olson, W.K. (1995) Elasticity theory and numerical analysis of DNA supercoiling: An application to DNA looping. Journal of Physical Chemistry, 99, 17926-17935. 190. Purohit, P.K. and Nelson, P.C. (2006) Effect of supercoiling on formation of protein-mediated DNA loops. Physical Review E, 74. 191. Balaeff, A., Mahadevan, L. and Schulten, K. (1999) Elastic rod model of a DNA loop in the lac operon. Physical Review Letters, 83, 4900-4903. 192. Balaeff, A., Koudella, C.R., Mahadevan, L. and Schulten, K. (2004) Modelling DNA loops using continuum and statistical mechanics. Philosophical Transactions of the Royal Society of London Series a-Mathematical Physical and Engineering Sciences, 362, 1355-1371. 193. Balaeff, A., Mahadevan, L. and Schulten, K. (2006) Modeling DNA loops using the theory of elasticity. Physical review. E, Statistical, nonlinear, and soft matter physics, 73, 031919-031919. 194. Villa, E., Balaeff, A., Mahadevan, L. and Schulten, K. (2004) Multiscale method for simulating protein-DNA complexes. Multiscale Modeling & Simulation, 2, 527-553. 195. Starostin, E.L. (2004) Symmetric equilibria of a thin elastic rod with self-contacts. Philosophical Transactions of the Royal Society of London Series a-Mathematical Physical and Engineering Sciences, 362, 1317-1334. 196. Coleman, B.D. and Swigon, D. (2004) Theory of self-contact in Kirchhoff rods with applications to supercoiling of knotted and unknotted DNA . Philosophical Transactions of the Royal Society of London Series a-Mathematical Physical and Engineering Sciences, 362, 1281-1299.

170

197. Bathe, K.-J. (1996) Finite Element Procedures. Prentice Hall Inc., Upper Saddle River, New Jersey. 198. Brunger, A.T. (1992), X-PLOR, Version 3.1, Department of Molecular Biophysics and Biochemistry, Howard Hughes Medical Institute, Yale University, CT. 199. Martyna, G.J., Tobias, D.J. and Klein, M.L. (1994) Constant pressure molecular dynamics algorithms. J. Chem. Phys., 101, 4177-4189. 200. Feller, S.E., Zhang, Y., Pastor, R.W. and Brooks, B.R. (1995) Constant pressure molecular dynamics simulation: The Langevin piston method. J. Chem. Phys., 103, 4613-4621. 201. MacKerell, A.D. and Banavali, N.K. (2000) All-atom empirical force field for nucleic acids: II. Application to molecular dynamics simulations of DNA and RNA in solution. Journal of Computational Chemistry, 21, 105-120. 202. Garcia, A.E. (1992) LARGE-AMPLITUDE NONLINEAR MOTIONS IN PROTEINS. Physical Review Letters, 68, 2696-2699. 203. Amadei, A., Linssen, A.B.M. and Berendsen, H.J.C. (1993) ESSENTIAL DYNAMICS OF PROTEINS. Proteins-Structure Function and Genetics, 17, 412-425. 204. Hayward, S. and de Groot, B.L. (2008) Normal modes and essential dynamics. Methods in molecular biology (Clifton, N.J.), 443, 89-106. 205. Irikura, K.K., Tidor, B., Brooks, B.R. and Karplus, M. (1985) TRANSITION FROM B-DNA TO Z-DNA - CONTRIBUTION OF INTERNAL FLUCTUATIONS TO THE CONFIGURATIONAL ENTROPY DIFFERENCE. Science, 229, 571-572. 206. Perez, A., Blas, J.R., Rueda, M., Lopez-Bes, J.M., de la Cruz, X. and Orozco, M. (2005) Exploring the essential dynamics of B-DNA. Journal of Chemical Theory and Computation, 1, 790-800. 207. Fresch, B. and Remacle, F. (2015) In Joachim, C. and Rapenne, G. (eds.), Single Molecule Machines and Motors. Springer International Publishing, Switzerland. 208. VanAalten, D.M.F., DeGroot, B.L., Findlay, J.B.C., Berendsen, H.J.C. and Amadei, A. (1997) A comparison of techniques for calculating protein essential dynamics. Journal of Computational Chemistry, 18, 169-181. 209. vanAalten, D.M.F., Findlay, J.B.C., Amadei, A. and Berendsen, H.J.C. (1995) Essential dynamics of the cellular retinol-binding protein - Evidence for ligand-induced conformational changes. Protein Engineering, 8, 1129-1135. 210. Grubmuller, H. (1995) PREDICTING SLOW STRUCTURAL TRANSITIONS IN MACROMOLECULAR SYSTEMS - CONFORMATIONAL FLOODING. Physical Review E, 52, 2893-2906. 211. Amadei, A., Linssen, A.B.M., deGroot, B.L., vanAalten, D.M.F. and Berendsen, H.J.C. (1996) An efficient method for sampling the essential subspace of proteins. Journal of Biomolecular Structure & Dynamics, 13, 615-625. 212. deGroot, B.L., Amadei, A., vanAalten, D.M.F. and Berendsen, H.J.C. (1996) Towards an exhaustive sampling of the configurational spaces of the two forms of the peptide hormone guanylin. Journal of Biomolecular Structure & Dynamics, 13, 741-751. 213. deGroot, B.L., Amadei, A., Scheek, R.M., vanNuland, N.A.J. and Berendsen, H.J.C. (1996) An extended sampling of the configurational space of HPr from E-coli. Proteins-Structure Function and Genetics, 26, 314-322. 214. Fiorin, G., Klein, M.L. and Henin, J. (2013) Using collective variables to drive molecular dynamics simulations. Molecular Physics, 111, 3345-3362. 215. Jorgensen, W.L., Chandrasekhar, J., Madura, J.D., Impey, R.W. and Klein, M.L. (1983) Comparison of simple potential functions for simulating water. J. Chem. Phys., 79, 926-935. 216. MacKerell, A.D., Bashford, D., Bellott, M., Dunbrack, R.L., Evanseck, J.D., Field, M.J., Fischer, S., Gao, J., Guo, H., Ha, S. et al. (1998) All-atom empirical potential for molecular modeling and dynamics studies of proteins. Journal of Physical Chemistry B, 102, 3586-3616.

171

217. Jorgensen, W.L., Maxwell, D.S. and TiradoRives, J. (1996) Development and testing of the OPLS all-atom force field on conformational energetics and properties of organic . Journal of the American Chemical Society, 118, 11225-11236. 218. Schuler, L.D., Daura, X. and Van Gunsteren, W.F. (2001) An improved GROMOS96 force field for aliphatic hydrocarbons in the condensed phase. Journal of Computational Chemistry, 22, 1205- 1218. 219. Rappe, A.K., Casewit, C.J., Colwell, K.S., Goddard, W.A. and Skiff, W.M. (1992) UFF, A FULL PERIODIC-TABLE FORCE-FIELD FOR MOLECULAR MECHANICS AND MOLECULAR-DYNAMICS SIMULATIONS. Journal of the American Chemical Society, 114, 10024-10035. 220. Shinoda, W., Devane, R. and Klein, M.L. (2007) Multi-property fitting and parameterization of a coarse grained model for aqueous surfactants. Molecular Simulation, 33, 27-36. 221. Daidone, I. and Amadei, A. (2012) Essential dynamics: foundation and applications. Wiley Interdisciplinary Reviews-Computational Molecular Science, 2, 762-770. 222. Henin, J., Fiorin, G., Chipot, C. and Klein, M.L. (2010) Exploring Multidimensional Free Energy Landscapes Using Time-Dependent Biases on Collective Variables. Journal of Chemical Theory and Computation, 6, 35-47. 223. Gaborek, T.J., Chipot, C. and Madura, J.D. (2012) Conformational Free-Energy Landscapes for a Peptide in Saline Environments. Biophysical Journal, 103, 2513-2520. 224. Carter, E.A., Ciccotti, G., Hynes, J.T. and Kapral, R. (1989) CONSTRAINED REACTION COORDINATE DYNAMICS FOR THE SIMULATION OF RARE EVENTS. Chemical Physics Letters, 156, 472-477. 225. Henin, J. and Chipot, C. (2004) Overcoming free energy barriers using unconstrained molecular dynamics simulations. Journal of Chemical Physics, 121, 2904-2914. 226. Chipot, C. and Pohorille, A. (2007) Free Energy Calculations. Springer, New York. 227. Glykos, N.M. (2006) Software news and updates - Carma: A molecular dynamics analysis program. Journal of Computational Chemistry, 27, 1765-1768. 228. Brannigan, G., LeBard, D.N., Henin, J., Eckenhoff, R.G. and Klein, M.L. (2010) Multiple binding sites for the general anesthetic isoflurane identified in the nicotinic acetylcholine receptor transmembrane domain. Proceedings of the National Academy of Sciences of the United States of America, 107, 14122-14127. 229. Wang, J., Wu, Y., Ma, C., Fiorin, G., Wang, J., Pinto, L.H., Lamb, R.A., Klein, M.L. and DeGrado, W.F. (2013) Structure and inhibition of the drug-resistant S31N mutant of the M2 of A virus. Proceedings of the National Academy of Sciences of the United States of America, 110, 1315-1320. 230. Lankas, F., Sponer, J., Hobza, P. and Langowski, J. (2000) Sequence-dependent elastic properties of DNA. Journal of Molecular Biology, 299, 695-709. 231. Lankas, F., Cheatham, T.E., Spackova, N., Hobza, P., Langowski, J. and Sponer, J. (2002) Critical effect of the N2 amino group on structure, dynamics, and elasticity of DNA polypurine tracts. Biophysical Journal, 82, 2592-2609. 232. Lankas, F., Sponer, J., Langowski, J. and Cheatham, T.E. (2003) DNA basepair step deformability inferred from molecular dynamics simulations. Biophysical Journal, 85, 2872-2883. 233. Gonzalez, O. and Maddocks, J.H. (2001) Extracting parameters for base-pair level models of DNA from molecular dynamics simulations. Theoretical Chemistry Accounts, 106, 76-82. 234. Lankas, F., Gonzalez, O., Heffler, L.M., Stoll, G., Moakher, M. and Maddocks, J.H. (2009) On the parameterization of rigid base and basepair models of DNA from molecular dynamics simulations. Physical Chemistry Chemical Physics, 11, 10565-10588. 235. Coleman, B.D., Olson, W.K. and Swigon, D. (2003) Theory of sequence-dependent DNA elasticity. Journal of Chemical Physics, 118, 7127-7140.

172

236. Orozco, M., Perez, A., Noy, A. and Luque, F.J. (2003) Theoretical methods for the simulation of nucleic acids. Chemical Society Reviews, 32, 350-364. 237. Olson, W.K., Bansal, M., Burley, S.K., Dickerson, R.E., Gerstein, M., Harvey, S.C., Heinemann, U., Lu, X.J., Neidle, S., Shakked, Z. et al. (2001) A standard reference frame for the description of nucleic acid base-pair geometry. Journal of Molecular Biology, 313, 229-237. 238. Dans, P.D., Perez, A., Faustino, I., Lavery, R. and Orozco, M. (2012) Exploring polymorphisms in B-DNA helical conformations. Nucleic Acids Research, 40, 10668-10678. 239. Wilhelm, M., Mukherjee, A., Bouvier, B., Zakrzewska, K., Hynes, J.T. and Lavery, R. (2012) Multistep Drug Intercalation: Molecular Dynamics and Free Energy Studies of the Binding of Daunomycin to DNA. Journal of the American Chemical Society, 134, 8588-8596. 240. Lavery, R., Moakher, M., Maddocks, J.H., Petkeviciute, D. and Zakrzewska, K. (2009) Conformational analysis of nucleic acids revisited: Curves+. Nucleic Acids Res., 37, 5917-5929. 241. Landau, L.D. and Lifshitz, E.M. (1980) Statistical Physics, Part 1. Butterworth-Heinemann, Oxford, UK. 242. Neal, R.M. (1996) Sampling from multimodal distributions using tempered transitions. Statistics and Computing, 6, 353-366. 243. Sugita, Y. and Okamoto, Y. (1999) Replica-exchange molecular dynamics method for . Chemical Physics Letters, 314, 141-151. 244. Sugita, Y., Kitao, A. and Okamoto, Y. (2000) Multidimensional replica-exchange method for free- energy calculations. Journal of Chemical Physics, 113, 6042-6051. 245. Earl, D.J. and Deem, M.W. (2005) Parallel tempering: Theory, applications, and new perspectives. Physical Chemistry Chemical Physics, 7, 3910-3916. 246. Murata, K., Sugita, Y. and Okamoto, Y. (2004) Free energy calculations for DNA base stacking by replica-exchange umbrella sampling. Chemical Physics Letters, 385, 1-7. 247. Seibert, M.M., Patriksson, A., Hess, B. and van der Spoel, D. (2005) Reproducible polypeptide folding and structure prediction using molecular dynamics simulations. Journal of Molecular Biology, 354, 173-183. 248. Johnson, R.R., Kohlmeyer, A., Johnson, A.T.C. and Klein, M.L. (2009) Free Energy Landscape of a DNA- Hybrid Using Replica Exchange Molecular Dynamics. Nano Letters, 9, 537-541. 249. Kannan, S. and Zacharias, M. (2009) Simulation of DNA double-strand dissociation and formation during replica-exchange molecular dynamics simulations. Physical Chemistry Chemical Physics, 11, 10589-10595. 250. Phelps, C., Lee, W., Jose, D., von Hippel, P.H. and Marcus, A.H. (2013) Single-molecule FRET and linear dichroism studies of DNA breathing and helicase binding at replication fork junctions. Proceedings of the National Academy of Sciences of the United States of America, 110, 17320- 17325. 251. Sindhikara, D., Meng, Y. and Roitberg, A.E. (2008) Exchange frequency in replica exchange molecular dynamics. Journal of Chemical Physics, 128. 252. Rathore, N., Chopra, M. and de Pablo, J.J. (2005) Optimal allocation of replicas in parallel tempering simulations. Journal of Chemical Physics, 122. 253. Kelso, C. and Simmerling, C. (2006) In Sponer, J. and Lankas, F. (eds.), Computational Studies of RNA and DNA. Springer, Dordrecht, Vol. 2. 254. Zhang, C. and Ma, J. (2010) Enhanced sampling and applications in protein folding in explicit solvent. Journal of Chemical Physics, 132. 255. Simmerling, C., Fox, T. and Kollman, P.A. (1998) Use of locally enhanced sampling in free energy calculations: Testing and application to the alpha ->beta anomerization of glucose. Journal of the American Chemical Society, 120, 5771-5782.

173

256. Cheng, X.L., Cui, G.L., Hornak, V. and Sinnnerling, C. (2005) Modified replica exchange simulation methods for local structure refinement. Journal of Physical Chemistry B, 109, 8220-8230. 257. Torrie, G.M. and Valleau, J.P. (1977) NON-PHYSICAL SAMPLING DISTRIBUTIONS IN MONTE- CARLO FREE-ENERGY ESTIMATION - UMBRELLA SAMPLING. Journal of Computational Physics, 23, 187-199. 258. Straatsma, T.P., Zacharias, M. and McCammon, J.A. (1992) HOLONOMIC CONSTRAINT CONTRIBUTIONS TO FREE-ENERGY DIFFERENCES FROM THERMODYNAMIC INTEGRATION MOLECULAR-DYNAMICS SIMULATIONS. Chemical Physics Letters, 196, 297-302. 259. Pearlman, D.A. (1993) DETERMINING THE CONTRIBUTIONS OF CONSTRAINTS IN FREE-ENERGY CALCULATIONS - DEVELOPMENT, CHARACTERIZATION, AND RECOMMENDATIONS. Journal of Chemical Physics, 98, 8946-8957. 260. Huber, T., Torda, A.E. and Vangunsteren, W.F. (1994) LOCAL ELEVATION - A METHOD FOR IMPROVING THE SEARCHING PROPERTIES OF MOLECULAR-DYNAMICS SIMULATION. Journal of Computer-Aided Molecular Design, 8, 695-708. 261. Engkvist, O. and Karlstrom, G. (1996) A method to calculate the probability distribution for systems with large energy barriers. Chemical Physics, 213, 63-76. 262. Bartels, C., Schaefer, M. and Karplus, M. (1999) Determination of equilibrium properties of biomolecular systems using multidimensional adaptive umbrella sampling. Journal of Chemical Physics, 111, 8048-8067. 263. Darve, E. and Pohorille, A. (2001) Calculating free energies using average force. Journal of Chemical Physics, 115, 9169-9183. 264. Laio, A. and Parrinello, M. (2002) Escaping free-energy minima. Proceedings of the National Academy of Sciences of the United States of America, 99, 12562-12566. 265. Park, S., Khalili-Araghi, F., Tajkhorshid, E. and Schulten, K. (2003) Free energy calculation from steered molecular dynamics simulations using Jarzynski's equality. Journal of Chemical Physics, 119, 3559-3566. 266. Maragakis, P., Spichty, M. and Karplus, M. (2006) Optimal estimates of free energies from multistate nonequilibrium work data. Physical Review Letters, 96. 267. Babin, V., Roland, C. and Sagui, C. (2008) Adaptively biased molecular dynamics for free energy calculations. Journal of Chemical Physics, 128. 268. Barducci, A., Bussi, G. and Parrinello, M. (2008) Well-tempered metadynamics: A smoothly converging and tunable free-energy method. Physical Review Letters, 100. 269. Chipot, C. and Pohorille, A. (1997) Structure and dynamics of small peptides at aqueous interfaces - A multi-nanosecond molecular dynamics study. Journal of Molecular Structure- Theochem, 398, 529-535. 270. Iannuzzi, M., Laio, A. and Parrinello, M. (2003) Efficient exploration of reactive potential energy surfaces using Car-Parrinello molecular dynamics. Physical Review Letters, 90. 271. Bussi, G., Laio, A. and Parrinello, M. (2006) Equilibrium free energies from nonequilibrium metadynamics. Physical Review Letters, 96. 272. Wang, F.G. and Landau, D.P. (2001) Determining the density of states for classical statistical models: A random walk algorithm to produce a flat histogram. Physical Review E, 64. 273. Wang, F.G. and Landau, D.P. (2001) Efficient, multiple-range random walk algorithm to calculate the density of states. Physical Review Letters, 86, 2050-2053. 274. Martonak, R., Laio, A. and Parrinello, M. (2003) Predicting crystal structures: The Parrinello- Rahman method revisited. Physical Review Letters, 90. 275. Piana, S. and Laio, A. (2007) A bias-exchange approach to protein folding. Journal of Physical Chemistry B, 111, 4553-4559.

174

276. Bussi, G., Gervasio, F.L., Laio, A. and Parrinello, M. (2006) Free-energy landscape for beta hairpin folding from combined parallel tempering and metadynamics. Journal of the American Chemical Society, 128, 13435-13441. 277. Marinelli, F., Pietrucci, F., Laio, A. and Piana, S. (2009) A Kinetic Model of Trp-Cage Folding from Multiple Biased Molecular Dynamics Simulations. Plos Computational Biology, 5. 278. Granata, D., Camilloni, C., Vendruscolo, M. and Laio, A. (2013) Characterization of the free- energy landscapes of proteins by NMR-guided metadynamics. Proceedings of the National Academy of Sciences of the United States of America, 110, 6817-6822. 279. Baftizadeh, F., Pietrucci, F., Biarnes, X. and Laio, A. (2013) Nucleation Process of a Fibril Precursor in the C-Terminal Segment of Amyloid-beta. Physical Review Letters, 110. 280. Baftizadeh, F., Biarnes, X., Pietrucci, F., Affinito, F. and Laio, A. (2012) Multidimensional View of Amyloid Fibril Nucleation in Atomistic Detail. Journal of the American Chemical Society, 134, 3886-3894. 281. Comer, J., Gumbart, J.C., Henin, J., Lelievre, T., Pohorille, A. and Chipot, C. (2015) The Adaptive Biasing Force Method: Everything You Always Wanted To Know but Were Afraid To Ask. Journal of Physical Chemistry B, 119, 1129-1151. 282. Wei, C. and Pohorille, A. (2009) Permeation of Membranes by Ribose and Its Diastereomers. Journal of the American Chemical Society, 131, 10237-10245. 283. Xu, J., Crowley, M.F. and Smith, J.C. (2009) Building a foundation for structure-based cellulosome design for cellulosic ethanol: Insight into cohesin-dockerin complexation from computer simulation. Protein Science, 18, 949-959. 284. Cai, W., Sun, T., Liu, P., Chipot, C. and Shao, X. (2009) Inclusion Mechanism of Steroid Drugs into beta-Cyclodextrins. Insights from Free Energy Calculations. Journal of Physical Chemistry B, 113, 7836-7843. 285. Rodriguez, J., Semino, R. and Laria, D. (2009) Building up Nanotubes: Docking of "Janus" Cyclodextrins in Solution. Journal of Physical Chemistry B, 113, 1241-1244. 286. Vaitheeswaran, S. and Thirumalai, D. (2008) Interactions between amino acid side chains in cylindrical hydrophobic nanopores with applications to peptide stability. Proceedings of the National Academy of Sciences of the United States of America, 105, 17636-17641. 287. Dehez, F., Pebay-Peyroula, E. and Chipot, C. (2008) Binding of ADP in the mitochondrial ADP/ATP carrier is driven by an electrostatic funnel. Journal of the American Chemical Society, 130, 12725-12733. 288. Gorfe, A.A., Babakhani, A. and McCammon, J.A. (2007) Free energy profile of h-ras membrane anchor upon membrane insertion. Angewandte Chemie-International Edition, 46, 8234-8237. 289. Weronski, P., Jiang, Y. and Rasmussen, S. (2007) Molecular dynamics study of small PNA molecules in -water system. Biophysical Journal, 92, 3081-3091. 290. Spiegel, K., Magistrato, A., Carloni, P., Reedijk, J. and Klein, M.L. (2007) Azole-bridged diplatinum anticancer compounds. Modulating DNA flexibility to escape repair mechanism and avoid cross resistance. Journal of Physical Chemistry B, 111, 11873-11876. 291. Blumberger, J., Lamoureux, G. and Klein, M.L. (2007) Peptide hydrolysis in thermolysin: Ab initio QM/MM investigation of the Glu143-assisted water addition mechanism. Journal of Chemical Theory and Computation, 3, 1837-1850. 292. Henin, J., Tajkhorshid, E., Schulten, K. and Chipot, C. (2008) Diffusion of glycerol through Escherichia coli aquaglyceroporin GlpF. Biophysical Journal, 94, 832-839. 293. Lee, E.H., Hsin, J., Mayans, O. and Schulten, K. (2007) Secondary and tertiary structure elasticity of titin Z1Z2 and a titin chain model. Biophysical Journal, 93, 1719-1735. 294. Dehez, F., Tarek, M. and Chipot, C. (2007) Energetics of ion transport in a peptide nanotube. Journal of Physical Chemistry B, 111, 10633-10635.

175

295. Lamoureux, G., Klein, M.L. and Berneche, S. (2007) A stable water chain in the hydrophobic pore of the AmtB ammonium transporter. Biophysical Journal, 92, L82-L84. 296. Kuang, Z., Mahankali, U. and Beck, T.L. (2007) Proton pathways and H+/Cl- stoichiometry in bacterial chloride transporters. Proteins-Structure Function and Bioinformatics, 68, 26-33. 297. Ivanov, I., Cheng, X., Sine, S.M. and McCammon, J.A. (2007) Barriers to ion translocation in cationic and anionic receptors from the Cys-loop family. Journal of the American Chemical Society, 129, 8217-8224. 298. Yu, Y.M., Chipot, C., Cai, W.S. and Shao, X.G. (2006) Molecular dynamics study of the inclusion of into cyclodextrins. Journal of Physical Chemistry B, 110, 6372-6378. 299. Treptow, W. and Tarek, M. (2006) Molecular restraints in the permeation pathway of ion channels. Biophysical Journal, 91, L26-L28. 300. Henin, J., Schulten, K. and Chipot, C. (2006) Conformational equilibrium in alanine-rich peptides probed by reversible stretching simulations. Journal of Physical Chemistry B, 110, 16718-16723. 301. Henin, J., Pohorille, A. and Chipot, C. (2005) Insights into the recognition and association of transmembrane alpha-helices. The free energy of alpha-helix dimerization in glycophorin A. Journal of the American Chemical Society, 127, 8478-8484. 302. Hohng, S., Zhou, R., Nahas, M.K., Yu, J., Schulten, K., Lilley, D.M.J. and Ha, T. (2007) Fluorescence-force spectroscopy maps two-dimensional reaction landscape of the Holliday junction. Science, 318, 279-283. 303. Wang, J., Hou, T. and Xu, X. (2006) Recent Advances in Free Energy Calculations with a Combination of Molecular Mechanics and Continuum Models. Current Computer-Aided Drug Design, 2, 287-306. 304. Wang, W., Donini, O., Reyes, C.M. and Kollman, P.A. (2001) Biomolecular simulations: Recent developments in force fields, simulations of enzyme , protein-ligand, protein-protein, and protein-nucleic acid noncovalent interactions. Annual Review of Biophysics and Biomolecular Structure, 30, 211-243. 305. Kollman, P.A., Massova, I., Reyes, C., Kuhn, B., Huo, S.H., Chong, L., Lee, M., Lee, T., Duan, Y., Wang, W. et al. (2000) Calculating structures and free energies of complex molecules: Combining molecular mechanics and continuum models. Accounts of Chemical Research, 33, 889-897. 306. Beveridge, D.L. and Dicapua, F.M. (1989) FREE-ENERGY VIA MOLECULAR SIMULATION - APPLICATIONS TO CHEMICAL AND BIOMOLECULAR SYSTEMS. Annual Review of Biophysics and Biophysical Chemistry, 18, 431-492. 307. Andricioaei, I. and Karplus, M. (2001) On the calculation of entropy from covariance matrices of the atomic fluctuations. Journal of Chemical Physics, 115, 6289-6292. 308. Schlitter, J. (1993) ESTIMATION OF ABSOLUTE AND RELATIVE ENTROPIES OF MACROMOLECULES USING THE COVARIANCE-MATRIX. Chemical Physics Letters, 215, 617-621. 309. Hnizdo, V., Darian, E., Fedorowicz, A., Demchuk, E., Li, S. and Singh, H. (2007) Nearest-neighbor nonparametric method for estimating the configurational entropy of complex molecules. Journal of Computational Chemistry, 28, 655-668. 310. Killian, B.J., Kravitz, J.Y. and Gilson, M.K. (2007) Extraction of configurational entropy from molecular simulations via an expansion approximation. Journal of Chemical Physics, 127. 311. Numata, J., Wan, M. and Knapp, E.-W. (2007) Conformational entropy of : Beyond the quasi-harmonic approximation. 2007, Vol 18, 18, 192-205. 312. Hnizdo, V., Tan, J., Killian, B.J. and Gilson, M.K. (2008) Efficient calculation of configurational entropy from molecular simulations by combining the mutual-information expansion and nearest-neighbor methods. Journal of Computational Chemistry, 29, 1605-1614.

176

313. Baron, R., Huenenberger, P.H. and McCammon, J.A. (2009) Absolute Single-Molecule Entropies from Quasi-Harmonic Analysis of Microsecond Molecular Dynamics: Correction Terms and Convergence Properties. Journal of Chemical Theory and Computation, 5, 3150-3160. 314. Numata, J. and Knapp, E.-W. (2012) Balanced and Bias-Corrected Computation of Conformational Entropy Differences for Molecular Trajectories. Journal of Chemical Theory and Computation, 8, 1235-1245. 315. Wang, W. and Kollman, P.A. (2000) Free energy calculations on dimer stability of the HIV protease using molecular dynamics and a continuum solvent model. Journal of Molecular Biology, 303, 567-582. 316. Hou, T., Chen, K., McLaughlin, W.A., Lu, B. and Wang, W. (2006) Computational analysis and prediction of the binding motif and protein interacting partners of the Abl SH3 domain. Plos Computational Biology, 2, 46-55. 317. Gohlke, H. and Case, D.A. (2004) Converging free energy estimates: MM-PB(GB)SA studies on the protein-protein complex Ras-Raf. Journal of Computational Chemistry, 25, 238-250. 318. Hou, T.J., Guo, S.L. and Xu, X.J. (2002) Predictions of binding of a diverse set of ligands to gelatinase-A by a combination of molecular dynamics and continuum solvent models. Journal of Physical Chemistry B, 106, 5527-5535. 319. Wang, W. and Kollman, P.A. (2001) Computational study of protein specificity: The molecular basis of HIV-1 protease drug resistance. Proceedings of the National Academy of Sciences of the United States of America, 98, 14937-14942. 320. Huo, S.H., Wang, J.M., Cieplak, P., Kollman, P.A. and Kuntz, I.D. (2002) Molecular dynamics and free energy analyses of cathepsin D-inhibitor interactions: Insight into structure-based ligand design. Journal of Medicinal Chemistry, 45, 1412-1419. 321. Watson, J., Hays, F.A. and Ho, P.S. (2004) Definitions and analysis of DNA Holliday junction geometry. Nucleic Acids Research, 32, 3017-3027. 322. Jorgensen, W.L., Chandrasekhar, J., Madura, J.D., Impey, R.W. and Klein, M.L. (1983) COMPARISON OF SIMPLE POTENTIAL FUNCTIONS FOR SIMULATING LIQUID WATER. Journal of Chemical Physics, 79, 926-935. 323. BIOVIA, D.S. (2015). San Diego: Dassault Systèmes, Release 4.5. 324. Phillips, J.C., Braun, R., Wang, W., Gumbart, J., Tajkhorshid, E., Villa, E., Chipot, C., Skeel, R.D., Kale, L. and Schulten, K. (2005) Scalable molecular dynamics with NAMD. Journal of Computational Chemistry, 26, 1781-1802. 325. Allner, O., Nilsson, L. and Villa, A. (2012) Magnesium Ion-Water Coordination and Exchange in Biomolecular Simulations. Journal of Chemical Theory and Computation, 8, 1493-1502. 326. Batcho, P.F., Case, D.A. and Schlick, T. (2001) Optimized particle-mesh Ewald/multiple-time step integration for molecular dynamics simulations. Journal of Chemical Physics, 115, 4003-4018. 327. Feller, S.E., Zhang, Y.H., Pastor, R.W. and Brooks, B.R. (1995) CONSTANT-PRESSURE MOLECULAR- DYNAMICS SIMULATION - THE LANGEVIN PISTON METHOD. Journal of Chemical Physics, 103, 4613-4621. 328. Michaud-Agrawal, N., Denning, E.J., Woolf, T.B. and Beckstein, O. (2011) Software News and Updates MDAnalysis: A Toolkit for the Analysis of Molecular Dynamics Simulations. Journal of Computational Chemistry, 32, 2319-2327. 329. Miller, B.R., III, McGee, T.D., Jr., Swails, J.M., Homeyer, N., Gohlke, H. and Roitberg, A.E. (2012) MMPBSA.py: An Efficient Program for End-State Free Energy Calculations. Journal of Chemical Theory and Computation, 8, 3314-3321. 330. D.A. Case, J.T.B., R.M. Betz, D.S. Cerutti, T.E. Cheatham, III, T.A. Darden, R.E. Duke, T.J. Giese, H. Gohlke, A.W. Goetz, N. Homeyer, S. Izadi, P. Janowski, J. Kaus, A. Kovalenko, T.S. Lee, S. LeGrand, P. Li, T. Luchko, R. Luo, B. Madej, K.M. Merz, G. Monard, P. Needham, H. Nguyen, H.T. Nguyen, I.

177

Omelyan, A. Onufriev, D.R. Roe, A. Roitberg, R. Salomon-Ferrer, C.L. Simmerling, W. Smith, J. Swails, R.C. Walker, J. Wang, R.M. Wolf, X. Wu, D.M. York and P.A. Kollman. (2014), University of California, San Francisco. 331. Crowley, M.F., Williamson, M.J. and Walker, R.C. (2009) CHAMBER: Comprehensive Support for CHARMM Force Fields Within the AMBER Software. International Journal of Quantum Chemistry, 109, 3767-3772. 332. Warwicker, J. and Watson, H.C. (1982) CALCULATION OF THE ELECTRIC-POTENTIAL IN THE ACTIVE-SITE CLEFT DUE TO ALPHA-HELIX DIPOLES. Journal of Molecular Biology, 157, 671-679. 333. Klapper, I., Hagstrom, R., Fine, R., Sharp, K. and Honig, B. (1986) Focusing of electric fields in the active site of Cu-Zn superoxide dismutase: effects of ionic strength and amino-acid modification. Proteins, 1, 47-59. 334. Cai, Q., Hsieh, M.-J., Wang, J. and Luo, R. (2010) Performance of Nonlinear Finite-Difference Poisson-Boltzmann Solvers. Journal of Chemical Theory and Computation, 6, 203-211. 335. Tan, C., Tan, Y.-H. and Luo, R. (2007) Implicit nonpolar solvent models. Journal of Physical Chemistry B, 111, 12263-12274. 336. Weiser, J., Shenkin, P.S. and Still, W.C. (1999) Approximate atomic surfaces from linear combinations of pairwise overlaps (LCPO). Journal of Computational Chemistry, 20, 217-230. 337. Bakan, A., Meireles, L.M. and Bahar, I. (2011) ProDy: Protein Dynamics Inferred from Theory and Experiments. Bioinformatics, 27, 1575-1577. 338. Darve, E., Rodriguez-Gomez, D. and Pohorille, A. (2008) Adaptive biasing force method for scalar and vector free energy calculations. Journal of Chemical Physics, 128. 339. Flyvbjerg, H. and Petersen, H.G. (1989) ERROR-ESTIMATES ON AVERAGES OF CORRELATED DATA. Journal of Chemical Physics, 91, 461-466. 340. Rodriguez-Gomez, D., Darve, E. and Pohorille, A. (2004) Assessing the efficiency of free energy calculation methods. Journal of Chemical Physics, 120, 3563-3578. 341. Michelotti, N., Johnson-Buck, A., Manzo, A.J. and Walter, N.G. (2012) Beyond DNA origami: the unfolding prospects of nucleic acid nanotechnology. Wiley Interdisciplinary Reviews- and , 4, 139-152. 342. Fu, J., Liu, M., Liu, Y. and Yan, H. (2012) Spatially-Interactive Biomolecular Networks Organized by Nucleic Acid Nanostructures. Accounts of Chemical Research, 45, 1215-1226. 343. Krishnan, Y. and Bathe, M. (2012) Designer nucleic acids to probe and program the cell. Trends in Cell Biology, 22, 624-633. 344. Seeman, N.C. (2005) From genes to machines: DNA nanomechanical devices. Trends in Biochemical Sciences, 30, 119-125. 345. He, Y. and Liu, D.R. (2010) Autonomous multistep organic synthesis in a single isothermal solution mediated by a DNA walker. Nature Nanotechnology, 5, 778-782. 346. Rosen, C.B., Kodal, A.L.B., Nielsen, J.S., Schaffert, D.H., Scavenius, C., Okholm, A.H., Voigt, N.V., Enghild, J.J., Kjems, J., Torring, T. et al. (2014) Template-directed covalent conjugation of DNA to native antibodies, transferrin and other metal-binding proteins. Nature Chemistry, 6, 804-809. 347. Chhabra, R., Sharma, J., Liu, Y. and Yan, H. (2006) Addressable molecular tweezers for DNA- templated coupling reactions. Nano Letters, 6, 978-983. 348. Modi, S., Swetha, M.G., Goswami, D., Gupta, G.D., Mayor, S. and Krishnan, Y. (2009) A DNA nanomachine that maps spatial and temporal pH changes inside living cells. Nature Nanotechnology, 4, 325-330. 349. Zhou, C., Yang, Z. and Liu, D. (2012) Reversible Regulation of Protein Binding Affinity by a DNA Machine. Journal of the American Chemical Society, 134, 1416-1418. 350. Rueda, D. and Walter, N.G. (2005) Single molecule fluorescence control for nanotechnology. Journal of Nanoscience and Nanotechnology, 5, 1990-2000.

178

351. Widom, J.R., Dhakal, S., Heinicke, L.A. and Walter, N.G. (2014) Single-molecule tools for enzymology, structural biology, systems biology and nanotechnology: an update. Archives of Toxicology, 88, 1965-1985. 352. Johnson-Buck, A., Nangreave, J., Kim, D.-N., Bathe, M., Yan, H. and Walter, N.G. (2013) Super- Resolution Fingerprinting Detects Chemical Reactions and Idiosyncrasies of Single DNA Pegboards. Nano Letters, 13, 728-733. 353. Wei, B., Dai, M., Myhrvold, C., Ke, Y., Jungmann, R. and Yin, P. (2013) Design Space for Complex DNA Structures. Journal of the American Chemical Society, 135, 18080-18088. 354. Joo, C., McKinney, S.A., Lilley, D.M.J. and Ha, T. (2004) Exploring rare conformational species and ionic effects in DNA Holliday junctions using single-molecule spectroscopy. Journal of Molecular Biology, 341, 739-751. 355. Nam, N., Birktoft, J.J., Sha, R., Wang, T., Zheng, J., Constantinou, P.E., Ginell, S.L., Chen, Y., Mao, C. and Seeman, N.C. (2012) The absence of tertiary interactions in a self-assembled DNA crystal structure. Journal of Molecular Recognition, 25, 234-237. 356. Eichman, B.F., Vargason, J.M., Mooers, B.H.M. and Ho, P.S. (2000) The Holliday junction in an inverted repeat DNA sequence: Sequence effects on the structure of four-way junctions. Proceedings of the National Academy of Sciences of the United States of America, 97, 3971- 3976. 357. English, B.P., Min, W., van Oijen, A.M., Lee, K.T., Luo, G.B., Sun, H.Y., Cherayil, B.J., Kou, S.C. and Xie, X.S. (2006) Ever-fluctuating single enzyme molecules: Michaelis-Menten equation revisited. Nature Chemical Biology, 2, 87-94. 358. Gourevitch, B. and Eggermont, J.J. (2007) A nonparametric approach for detection of bursts in spike trains. Journal of Neuroscience Methods, 160, 349-358. 359. Hammes, G.G., Benkovic, S.J. and Hammes-Schiffer, S. (2011) Flexibility, Diversity, and Cooperativity: Pillars of Enzyme Catalysis. Biochemistry, 50, 10422-10430. 360. Liu, B., Baskin, R.J. and Kowalczykowski, S.C. (2013) DNA unwinding heterogeneity by RecBCD results from static molecules able to equilibrate. Nature, 500, 482-+. 361. Fu, J., Yang, Y.R., Johnson-Buck, A., Liu, M., Liu, Y., Walter, N.G., Woodbury, N.W. and Yan, H. (2014) Multi-enzyme complexes on DNA scaffolds capable of substrate channelling with an artificial swinging arm. Nature Nanotechnology, 9, 531-536. 362. Abelson, J., Blanco, M., Ditzler, M.A., Fuller, F., Aravamudhan, P., Wood, M., Villa, T., Ryan, D.E., Pleiss, J.A., Maeder, C. et al. (2010) Conformational dynamics of single pre-mRNA molecules during in vitro splicing. Nature Structural & Molecular Biology, 17, 504-U156. 363. Chung, S.H. and Kennedy, R.A. (1991) FORWARD-BACKWARD NONLINEAR FILTERING TECHNIQUE FOR EXTRACTING SMALL BIOLOGICAL SIGNALS FROM NOISE. Journal of Neuroscience Methods, 40, 71-86. 364. Haran, G. (2004) Noise reduction in single-molecule fluorescence trajectories of folding proteins. Chemical Physics, 307, 137-145. 365. (2013). Accelrys Software Inc., San Diego. 366. Martyna, G.J., Tobias, D.J. and Klein, M.L. (1994) CONSTANT-PRESSURE MOLECULAR-DYNAMICS ALGORITHMS. Journal of Chemical Physics, 101, 4177-4189. 367. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V. et al. (2011) Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12, 2825-2830. 368. Pereira, M.J.B., Nikolova, E.N., Hiley, S.L., Jaikaran, D., Collins, R.A. and Walter, N.G. (2008) Single VS molecules reveal dynamic and hierarchical folding toward catalysis. Journal of Molecular Biology, 382, 496-509.

179

369. Krishnan, R., Blanco, M.R., Kahlscheuer, M.L., Abelson, J., Guthrie, C. and Walter, N.G. (2013) Biased Brownian ratcheting leads to pre-mRNA remodeling and capture prior to first-step splicing. Nature Structural & Molecular Biology, 20, 1450-1457. 370. Korosi, A. and Fabuss, B.M. (1968) VISCOSITY OF LIQUID WATER FROM 25 DEGREES TO 150 DEGREES C MEASUREMENTS IN PRESSURIZED GLASS CAPILLARY VISCOMETER. Analytical Chemistry, 40, 157-&. 371. Sedeh, R.S., Pan, K., Adendorff, M.R., Hallatschek, O., Bathe, K.-J. and Bathe, M. (2016) Computing Nonequilibrium Conformational Dynamics of Structured Nucleic Acid Assemblies. Journal of Chemical Theory and Computation. 372. Bai, X.-c., Martin, T.G., Scheres, S.H.W. and Dietz, H. (2012) Cryo-EM structure of a 3D DNA- origami object. Proceedings of the National Academy of Sciences of the United States of America, 109, 20012-20017. 373. Wallace, M.I., Ying, L.M., Balasubramanian, S. and Klenerman, D. (2001) Non-Arrhenius kinetics for the loop closure of a DNA hairpin. Proceedings of the National Academy of Sciences of the United States of America, 98, 5584-5589.

180