Coarse-Grained Nucleic Acid - Model for Hybrid Nanotechnology

Jonah Procyk1, Erik Poppleton1, and Petr Sulcˇ 1∗ 1School of Molecular Sciences and Center for Molecular Design and Biomimetics, The Biodesign Institute, Arizona State University, 1001 South McAllister Avenue, Tempe, Arizona 85281, USA

(Dated: September 22, 2020) The emerging field of hybrid DNA - protein nanotechnology brings with it the potential for many novel materials which combine the addressability of DNA nanotechnology with versatility of protein interactions. However, the design and computational study of these hybrid structures is difficult due to the system sizes involved. To aid in the design and in silico analysis process, we introduce here a coarse-grained DNA/RNA-protein model that extends the oxDNA/oxRNA models of DNA/RNA with a coarse-grained model of based on an anisotropic network model representation. Fully equipped with analysis scripts and , our model aims to facilitate hybrid nanomaterial design towards eventual experimental realization, as well as enabling study of biological complexes. We further demonstrate its usage by simulating DNA-protein nanocage, DNA wrapped around histones, and a nascent RNA in polymerase.

INTRODUCTION come increasingly relevant as size and complexity of nanostructures grow. Design tools such as Adenita [14] Molecular nanotechnology designs biomolecular inter- MagicDNA [15], CadNano [16], and Tiamat [17] are es- actions to assemble nanoscale devices and structures. sential for the structural design of DNA origamis. New DNA nanotechnology, in particular, has attracted lots coarse-grained models have been introduced to study of attention and experienced rapid growth over the past DNA nanostructures, as the sizes (thousands or more) three decades. While originally envisioned as a method as well as rare events (formation or breaking of large of developing a DNA lattice for crystallizing proteins for sections of base pairs) involved in study of these sys- structure determination [1], DNA nanotechnology is see- tems make atomistic-resolution modeling more difficult. ing promising applications in e.g. biomaterial assembly Among the available tools, the oxDNA and oxRNAs [2], biocatalysis [3], therapeutics [4], and diagnostics [5]. models [18–21] have been among the most popular over The programmability of DNA allows for the rapid design the past few years and have been used by dozens of re- and experimental realization of complex shapes, yield- search groups in over one hundred articles to study vari- ing an unprecedented level of control and functionality ous aspects of DNA and RNA nanosystems as well as bio- at the nanoscale. As DNA nanotechology has devel- physical properties of DNA and RNA [22–27]. Each nu- oped, so have parallel technologies with other familiar cleotide is represented as a rigid body in the simulation, biomolecules such as RNA [6], and, to some extent, pro- with interactions between different sites parametrized teins [7, 8]. While DNA nanostructures and devices have to reproduce mechanical, structural and thermodynamic been unequivocally successful in realizing more complex properties of single-stranded and double-stranded DNA and larger constructs, they are inherently limited in func- and RNA respectively. tion by their available chemistry, with a possible solu- However, the oxDNA/oxRNA models only allow for tion using functionalized DNA nanostructures [9]. Of representation of nucleic acids alone, limiting their scope particular interest is hybrid DNA-protein nanotechnol- of usability. While there have been examples of coarse- ogy, which can combine the already well developed de- grained protein-DNA modeling specifically applied to sign strategies of DNA nanotechnology and cross-linking modeling of nucleosomes [28, 29], we currently do not them with functional proteins. The combination of the have an efficient tool at the level of oxDNA coarse- two molecules in nanotechnology will open new applica- graining that would allow for efficient study of arbitrary tions, such as diganostics, therapeutics, molecular ”fac- protein-DNA complexes. tories” and new biomimetic materials [10]. Examples of Here, we introduce such a coarse-grained model that arXiv:2009.09589v1 [q-bio.BM] 21 Sep 2020 successfully realized hybrid nanostructures include DNA- uses an Anisotropic Network Model (ANM) to represent protein cages [11], a DNA nanorobot with nucleolin ap- proteins alongside the oxDNA or oxRNA model. The tamer for cancer therapy [12] and peptide-directed as- ANM is a form of elastic network model used to probe the sembly of large nanostructures [13]. dynamics of biomolecules fluctuating around their native At the same time, computational tools for the study state. Originally formulated by Atilgan et. al. [30], the and design of DNA and RNA nanostructures have be- ANM has become fundamental tool in probing protein dynamics, often closely matching residue-residue fluctu- ations and normal modes of fully atomistic simulations ∗ Corresponding author: [email protected] [31–33]. Here we use the ANM to approximately capture 2 native state protein dynamics. The ANM representation of proteins interact with just an excluded volume interac- TABLE I. Excluded volume parameters used in Eq. 2 for (a) protein-protein, (b) protein-nucleic base and (c) protein- tion with the oxDNA / oxRNA representation, but spe- nucleic backbone non-bonded interactions in simulation units. cific attractive or repulsive interactions can be added as Parameter (a) (b) (c) well. We provide parametrization of common linkers that σ 0.350 0.360 0.570 are used to conjugate proteins to DNA in typical hybrid rc 0.353 0.363 0.573 nanotechnology applications. r∗ 0.349 0.359 0.569 The ANM-oxDNA/oxRNA hybrid models are intended b 30.7 × 107 29.6 × 107 17.9 × 107 to help design and probe function of large nucleic-acid protein hybrid nanostructures, but also to be used to Protein Model study biological complexes and processes which can be captured within the approximations employed by the models. As an example of the model’s use, we show simu- In the ANM representation, each protein residue is lations of DNA-protein hybrid nanocage, DNA wrapped represented solely by its α-carbon position. All residues around histone, and a nascent RNA strand inside poly- within a specified cutoff distance rmax from one another merase. are considered ’bonded’. Please see Ref. [30] for a more detailed introduction. Each bond between residues i and j in the ANM is represented as a harmonic potential that ij fluctuates around the equilibrium length r0 : 1  2 V rij = γ rij − rij (1) ij 2 0

The total bonded interaction potential Vbonded−anm is the sum of terms Eq. (1) for all pairs i, j of aminoacids at distance smaller than rmax in the resolved protein struc- ij ture, as schematically illustrated in Fig. 2. We set r0 to the the distance between α-carbons of the residues i and j in the PDB file. Free parameter γ is set uniformly on each bond in the ANM and and is chosen to best fit the Debye-Waller factors of the original PDB struc- ture. Debye-Waller factors (or B-factors when applied FIG. 1. A schematic overview of the oxDNA2 model and its specifically to proteins) describe the thermal motions of interactions. Each nucleotide is represented as a single rigid each resolved atom in a protein given by their respective body with backbone and base interaction sites (shown here X-ray scattering assay. As previously done [30], we use schematically as a sphere and an ellipsoid) with their effective the B-factor of the α-carbon to approximately capture interactions designed to reproduce basic properties of DNA. the fluctuations of the protein backbone. Since an ANM is typically an analytical technique, it has no excluded volume effects. Hence we here extend the model to use a repulsive part Lennard-Jones potential between both bonded and non-bonded particles (Eq. 2) to model the MODEL DESCRIPTION excluded volume at a per particle excluded volume diam- eter of 2.5 A.˚ For any two particles (either protein/protein or protein- Implemented in the oxDNA simulation package [34], DNA/RNA) that are at distance r, we define the ex- our model allows for a coarse-grained simulation of large cluded volume interaction in Eq. 2: hybrid nanostructures. It consists of two coarse-grained  6 12 4(− σ + σ ) r < r∗ particle representations, the already existing oxDNA2 or  r6 r12 4 ∗ oxRNA model for their respective nucleic acids and an Vexc(r) = b(r − rc) r < r < rc (2) Anistropic Network Model (ANM) for proteins [35]. The  0 r ≥ rc detailed description of the oxDNA2/oxRNA models is ∗ available in Refs. [19, 20]. A DNA duplex with a nicked where we choose rc. Parameters b and r were calculated strand is schematically illustrated in Fig. 1. The ANM so that Vexc is a differentiable function. The constant  pN allows us to represent a protein with a known structure sets the strength of the potential and we use  = 82 nm . as beads connected by springs. We chose to use ANM to represent proteins for its efficiency and relative simplicity, while still providing reasonably accurate representations Parameterization of proteins crosslinked to DNA nanostructures. Further- more, it can be implemented using only pairwise interac- In parameterizing our model for simulation, the goal tion potentials, the same as oxDNA/oxRNA models. is to mimic the dynamics of the protein in the native 3 state. Though not without their setbacks [36, 37] we se- our model and compare to the analytical calculation. lected B-factors for their widespread availability in PDB We show the comparison in Fig. 3 for ribonuclease T1 structures and history of being used to fit elastic net- and green fluoresecent proteins simulated with the ANM work models of proteins [36]. Our model contains two model and our ANMT model, to be introduced later. free parameters, the cutoff distance rmax and the spring While the simulation and analytical prediction of the constant γ. classic ANM agree well with each other, as expected, we note that the model still does not fully reproduce the measured B-factors as reported in the experimen- tal structures. ANM models are not able to fully re- produce the measured B-factors [30], and are known to have peaks in the mean square displacement profiles that have not been observed in the measured B-factors [36]. The model nevertheless provides semi-quantitative agree- ment with the measured data, and hence represents an accurate enough representation of a protein to model its mechanical properties under small perturbations, as re- quired for DNA-hybrid nanotechnology systems.

FIG. 2. Illustration of ANM using GFP protein (PDB code: 1W7S) from (a) starting PDB structure to (b) ANM represen- EXPANSION OF THE ANM MODEL tation at rmax of 8 A,˚ (c) bonding criteria per residue: all par- ticles within distance rmax (bounds depicted by blue sphere) of center particle (black circle) are considered ’bonded’ (blue In addition to the classic ANM model, our model can squares) while those further (outside of sphere) are considered also optionally use unique γij for each bonded pair of ’nonbonded’ (red squares). residues, which allows for implementation of other ana- lytical models, such as the heterogeneous ANM (HANM) The cutoff distance rmax can be varied, but typically [38] and multiscale ANM (mANM) [39] that can gener- authors use a value of 13 A˚ [30] with strong long-range ate better fits to experimental B-factors using the γij interactions being a key feature of the classic ANM [37]. values. The HANM iteratively fits a normal ANM net- For each protein (consisting of N aminoacids) repre- work to given experimental B-factors with variable real- sented by ANM, we linearly fit the analytically computed istic force parameters γij. While unquestionably useful, B-factors to their experimental counterpart with γ as a the inaccuracy of B-factor data particularly in large or free parameter. To solve for the B-factors analytically, high resolution structures limits its application. In the we first calculate the 3N × 3N Hessian matrix of the mANM model, our conversion from the PDB structure spring potential Vspring, a task made simple by the har- to ANM representation also allows the fitting of multiple monic potential energy function [30]. After constructing networks with varying γij values tuned by scale param- the Hessian H for the system at a specified cutoff rmax, eters [39] (similar to rmax). A linear combination of the the mean squared deviation from the mean position for networks is then solved to minimize the difference be- each residue i can be calculated from the equipartition tween the ANM network’s predicted and experimental theorem: B-factors. The original formulation of the mANM [39] is limited in computational application as it has no cut- 2 kbT −1 ∆Ri = tr Hi,i (3) off value (rmax); a protein of size N residues would have γ N(N − 1)/2 connections, significantly more than the av- The B-factor B of the residue i can be directly computed erage ANM. For the proteins studied in this work, neither from our previous result as [30]: HANM nor mANM provided a significant advantage, so we decide to use the simple ANM with fixed rmax and the 8π2 same γ for all spring interactions. A Cα coarse-grained B = h∆R i2 . (4) i 3 i HANM and a mANM with an additional cutoff value parameter are however implemented in our conversion The experimental B-factors are provided along with re- scripts and can be optionally used to represent proteins solved structures of proteins, and we can hence in our model. use Eqs. (3) and (4) to obtain N equations. We then fit One major obstacle in using an ANM is known as the γ parameter to minimize tip effect [40]. The result is an extremely large spike in the B-factors due to a residue being under-constrained. N  2 2 X 8π 2 f(γ) = Bexp. − h∆R i (5) Often this can be solved by raising the cutoff value in i 3 i ANM construction; however, doing so raises the compu- i=1 tational requirements of our simulations. Furthermore, for a selected rmax. We can further measure the mean we found the ANM model to be not able to accurately square deviation of residue positions in a simulation of represent short peptides, as the spring network does not 4

(a)

FIG. 4. Depiction of (a) bending and (b) torsional potential terms on a pair of particles i and j. The angles depicted as dot products correspond to the cosine of that angle. Equilib- rium values (in red) correspond to (the cosine of) initial angle displacements derived from coordinates in the PDB file.

tion on top of the ANM model with bonded and excluded volume potentials. Each protein residue corresponds to a spherical particle, with associated orientation given by (b) its orthonormal axes ˆi1, ˆi2, ˆi3 (Fig. 4a). Harmonic terms control the angle between the normalized interparticle distance vectorr ˆij and the normal vector of each particle ˆi1, ˆj1 to control bond bending. The angles between two sets of orientation vectors, ˆi1, ˆj1 and ˆi3, ˆj3, are controlled as well allowing for modulation of the torsion based on the particles relative orientations. The full pairwise po- tential is given by Eq. 6:

k  2  2 V B&T = b ˆr · ˆi − aij + −ˆr · ˆj − bij + ij 2 ij 1 0 ij 1 0 k  2  2 t ˆi · ˆj − cij + ˆi · ˆj − dij (6) 2 1 1 0 3 3 0

B&T The function Vij is defined for all pairs of residues that are neighbors along the protein backbone. We set the energy minimum values aij, bij, cij, dij to correspond FIG. 3. Analytical, classic ANM simulation, ANMT simula- 0 0 0 0 tion, and experimentally determined B-factors calculated in to the cosines of respective angles in between residues 2 in the PDB file for the protein structure. The terms A˚ per residue for (a) ribonuclease T1 (PDB code 1BU4) at ◦ kb and kt are two new global parameters that control 25 C (rmax = 15, ks = 42.2pN/A˚, kb = kt = 171.3pN/A)˚ and (b) green fluorescent protein (PDB code 1W7S) at 25◦C the strength of the bending and torsion potential respec- (rmax = 13, ks = 33.2pN/A˚, kb = kt = 171.3pN/A)˚ tively. Currently, we set their values empirically, though pair specific terms could lead to further agreement with experimental data. Fig. 3 shows the effect of the torsional provide enough constraints to reproduce their end-to-end and bonding modulation on the same set of proteins used distance as seen when simulated with more detailed mod- prior. As intended, a noticeable decrease in high peak B- els like AWSEM-MD [41]. factors is observed using a modest kb and kt value. Fig. 4 To overcome this obstacle, we implemented harmonic illustrates the potential in a two particle system. Here- pairwise bending and torsional modulation forces into after, we will refer to the ANM model with torsional and the existing simulation model. These new constraints bending modulation as the ANMT model. allow for reduced rmax values, and also can more accu- rately represent shorter peptides, which are often used Protein-Nucleic Acid Interactions in DNA-hybrid nanostructures. We introduce these op- tional modulation forces below. In our current implementation of the model, protein residues and nucleotides have no interaction except for excluded volume and optional explicitly specified spring Bending and Torsional Modulation potentials between user-designated protein residues and nucleotides: We introduce the torsional and bending potential as 2 optional interaction potentials in our protein representa- Vspring(r) = k (r − r0) (7) 5

TABLE II. Average and standard deviation of end-to-end distance of linkers in fully atomistic Gromacs simulation and fit spring constant k Linker hri (A)˚ hr2i (A)˚ k (pN/A)˚ LC-SPDP 1.71 0.32 3.99 FIG. 5. 2D molecular structures of common bioconjugate DBCO-triazole 3.64 2.87 0.14 linkers dubbed (a) LC-SPDP and (b) DBCO-triazole; both can be used to conjugate proteins to amine-modified nu- cleotides EXAMPLES

Our model is fully functional with the latest version of where r is the distance between the centers of mass of the the visualization tool oxView [50] for both the design of hybrid nanomaterials as well as the viewing of simulation respective particles and k and r0 and external parame- ters. trajectories. The one caveat is that protein topologies are non-editable. Instead each protein starts from their The excluded volume interaction potential between PDB crystal structure and is converted into oxDNA for- protein and DNA/RNA residues has the same form as mat while the ANM spring constant is set to best match defined in Eq. (2), with the respective interaction pa- the experimental B-factors via our provided scripts. The rameters given in Table I. In the oxDNA/oxRNA models, output files can then be loaded into oxView as well as each nucleotide has two distinct interaction sites (back- used for simulation in our model. bone and base), each of which is interacting with the The model is theoretically able to represent any pro- protein residue using separate excluded volume parame- tein or protein complex that the ANM model can rep- ters. Future expansion of the model will include an ap- resent. Not beyond the scope of our model, biologically proximate treatment of electrostatic interaction between relevant multi-chain proteins such as nucleosomes, RNA protein and nucleic acids based on Debye-H¨uckel theory polymerases, and viral assemblies can be also simulated, as implemented in oxDNA [19], as well as coarse-grained allowing for the nucleic acid behavior present in each of protein model AWSEM [41]. Many non-specific DNA- these systems to be modeled, studied, and compared to protein interactions make use of the electrostatic interac- experimental data. While the detailed study of these sys- tions between the DNA backbone and positively charged tems is beyond the scope of this article, we show exam- portions of the protein [42]. Sensitive to salt concentra- ples of both biological systems and designed nanosystems tion, these electrostatic contributions have been previ- as represented by our ANM-oxDNA or ANM-oxRNA ously modeled using Debye-H¨uckel theory [43] to inves- model. tigate the role of protein frustration in regulating DNA Two prominent cases of nucleic acid - protein inter- binding kinetics. Similarly an extension of our model actions, RNA polymerases and nucleosomes, were con- with an appropriate Debye-H¨uckel potential can capture structed and simulated using the ANMT model for fu- and enable study of non-specific DNA-binding protein ture study. As many PDB files are missing residues, we systems. first reconstruct each individual chain using the best scor- Since we are interested in exploring conjugated hybrid ing of ten models generated by the Modeller tool [51]. systems, it is necessary to have an approximation for The reconstructed RNA polymerase was converted into the covalent linkers bridging the nucleic acid base and oxDNA format from its PDB entry (6ASX) using an rmax protein residue. We model the two bioconjugate linkers, of 15 A.˚ A fragment of the RNA was reinserted into the LC-SPDP and DBCO-triazole, (Fig. 5) that are typically exit channel and the subsequent MD simulation was al- used in protein-DNA hybrid nanotechnology [44, 45] us- lowed to sample the RNA’s escape from the exit chan- ing a spring potential as defined in Eq. (7) with parame- nel. The reconstructed nucleosome was converted into ters k and r0 parametrized to mimic the end-to-end aver- oxDNA simulation format from its PDB entry (3LEL) age distance and standard deviation of each linker at tem- using an rmax of 12 A.˚ Attractive spring potentials be- perature 300K. LC-SPDP links the thiol group of a mod- tween randomly chosen DNA and protein residues that ified cysteine residue to an amine-modified nucleotide. were in close proximity in the PDB structure were added DBCO-trizaole is the product of a copper-free click re- at the histone/DNA interface. Example snapshots from action involving a DBCO-modified residue to link to an the MD simulations of these simulated biological systems azide-modified nucleotide. Each of the linkers (Fig. 5) are shown in Fig. 6a,b. was first drawn in MolView and then converted into While no process was explicitly modeled, our new OPLS/AA forcefield format via LibParGen [46–48]. In model can be used to explore behavior of large scale sim- GROMACS [49], each linker was first equilibrated and ulation of DNA and histones, as at the latest version of then simulated with OPLS/AA forcefield in SPCE wa- GPU cards, the oxDNA model has been shown to be ter molecules at 300K for 3 trials of 1 nanosecond each. able to equilibrate systems consisting of over 1 million The obtained averaged end-to-end distance and standard nucleotides. deviation during each trial are shown in Table II. More pertinent to our goal of aiding in the design of 6

(a) (b)

FIG. 6. OxView visualization of simulated biological assemblies (a) RNA in exit channel of paused RNA polymerase (PDB code: 6ASX) and (b) Human nucleosome made up of histone octamer and DNA (PDB code: 3LEL), (c) mean structure from MD simulation of KDPG aldolase (PDB code: 1WA3) conjugated to a DNA cage hybrid nanostructures, our model supports conversion of showed much higher rigidity with no stretched confor- CadNano, Tiamat, and other popular DNA origami de- mations when compared to the ANM model alone. Final sign tools into the oxDNA format [52] where they can end-to-end distances and standard deviation are shown easily be edited in oxView to include linked proteins of in Table III. interest. Since an ANM is a highly simplified model of protein dynamics, the predictive power of our model lies TABLE III. Average and standard deviation of end-to-end not in prediction of protein structure but rather the col- distance of hemagglutinin peptides between coarse-grained lection of statistical data of the protein’s effect on the models nucleic acid component of the system. Available and Model AWSEM ANM ANMT compatible with this model is also the suite of oxDNA Peptide 125 analysis scripts [50] allowing for a detailed exploration of hri (A)˚ 12.02 12.9 12.09 system specific effects. hr2i (A)˚ 4.9 4.51 4.34 Synthetic peptides are used in many chemistry applica- Peptide 149 tions. Since these peptides are often very small and lack hri (A)˚ 12.9 12.9 12.9 long-distance contacts that enforce specific 3D confor- hr2i (A)˚ 6.6 4.6 4.6 mations, we wanted to explore how our models perform Peptide 227 on these small structures. We compared the end-to-end hri (A)˚ 14.5 16.2 14.7 distance of 3 hemagglutinin binding peptides [53] simu- hr2i (A)˚ 7.4 5.4 5.1 lated in our ANM model, the ANMT model, and another Peptide 125 - CSGHNIYAQYGYPYDHMYEG popular coarse-grained protein model, AWSEM-MD [54]. Peptide 149 - CSGKSQEIGDPDDIWNQMKW For AWSEM-MD simulations, initial structure predic- Peptide 227 - CSGSGNQEYFPYPMIDYLKK tions were generated from sequence using I-TASSER [55]. A secondary structure weight (ssweight) file was gener- ated using jpred [56], and the structure and weight files Hybrid DNA-protein nanostructure constructs such as were converted to the appropriate formats for AWSEM- those developed by the Stepahanopoulos Lab are of par- MD simulation in LAMMPS [57] using tools provided ticular interest. The Stephanopoulos group has experi- with AWSEM-MD. Simulations were run for 109 steps mentally realized their size-tunable DNA cage attached with end-to-end distance printed every 105 steps. to homotrimeric protein KDPG aldolase making use of Using the classic ANM, each peptide was built us- a LC-SPDP linker (Fig. 5) to join the DNA and protein ing strong backbone connections and significantly weaker components [11]. The DNA cage was converted from Tia- long-range connections to empirically match the AWSEM mat format into oxDNA format and the protein was con- mean and standard deviation of the end-to-end dis- verted from it’s PDB structure. The linker between the tance. The resulting simulation of each peptide; how- components was modeled as a spring potential (Eq. (7)) ever, showed the trajectory to include a large amount of using the parameters from Table II. We conducted a short stretched, nonphysical conformations. The subsequent MD simulation of the full system corresponding to time inclusion of the bending and torsion modulation using the of about 30 ns. The mean structure from simulation of ANMT model allowed for the same level of accuracy using the experimental cage was calculated using our analysis only strong short-range connections. The ANMT model scripts [50] and is displayed in Fig. 6c. 7

CONCLUSIONS tructures. The subsequent analysis of the designs can be used to optimize nanostructure parameters, such as placement of the linkers and lengths of duplex segments We present a coarse-grained protein model, based on in order to achieve desired geometry. elastic network representation of proteins, for use in con- The simulation code is freely available on github.com/ junction with existing coarse-grained nucleic acid models sulcgroup/anm-oxdna and will also be incorporated in capable of simulating large hybrid nanostructures. Im- the future release of the oxDNA simulation package. plemented on GPU as well as CPU, our model allows for The visualization of protein-hybrid systems has been in- simulations of large systems based on nanotechnology de- corporated into our previously developed oxView tool signs as well as large biological complexes. [50]. The aforementioned analysis scripts and visualizer Looking forward, both the paused RNA polymerase are available in git repositories github.com/sulcgroup/ and histone are biological systems we plan to study us- oxdna_analysis_tools and github.com/sulcgroup/ ing this model. In addition, experimental systems such oxdna-viewer respectively. as the hybrid cage in Fig. 6 can be simulated and directly compared to available experimental data. While widely available, B-factors are severely limited particularly in ACKNOWLEDGEMENTS terms of accuracy. However, our model can be parame- terized to approximate any available fluctuation data in- We thank all members of the Sulcˇ group for their sup- cluding but not limited to fully atomistic simulation and port and helpful discussions, in particular to H. Liu and solution NMR data. In addition to the model, we also M. Matthies. We thank Dr. Stephanopoulos for helpful extended a nanotechnology design and simulation anal- comments and feedback about simulation study of his ysis tool, oxView, to include a protein representation to DNA-protein hybrid system. We acknowledge support aid computer design of DNA/RNA-protein hybrid nanos- from the NSF grant no. 1931487.

[1] N. C. Seeman, Journal of Theoretical Biology 99, 237 [15] C. M. Huang, A. Kucinic, J. A. Johnson, H. Su, and (1982). C. E. Castro, bioRxiv (2020). [2] W. Liu, M. Tagawa, H. L. Xin, T. Wang, H. Emamy, [16] S. M. Douglas, A. H. Marblestone, S. Teerapittayanon, H. Li, K. G. Yager, F. W. Starr, A. V. Tkachenko, and A. Vazquez, G. M. Church, and W. M. Shih, Nucleic O. Gang, Science 351, 582 (2016). Acids Research 37, 5001 (2009). [3] C. Geng and P. J. Paukstelis, Journal of the American [17] S. Williams, K. Lund, C. Lin, P. Wonka, S. Lindsay, and Chemical Society 136, 7817 (2014). H. Yan, in International Workshop on DNA-Based Com- [4] S. Li, Q. Jiang, S. Liu, Y. Zhang, Y. Tian, C. Song, puters (Springer, 2008) pp. 90–101. J. Wang, Y. Zou, G. J. Anderson, J.-Y. Han, et al., Na- [18] T. E. Ouldridge, A. A. Louis, and J. P. Doye, The Jour- ture biotechnology 36, 258 (2018). nal of chemical physics 134, 02B627 (2011). [5] F. Zhang, J. Nangreave, Y. Liu, and H. Yan, Journal of [19] B. E. Snodin, F. Randisi, M. Mosayebi, P. Sulc,ˇ J. S. the American Chemical Society 136, 11198 (2014). Schreck, F. Romano, T. E. Ouldridge, R. Tsukanov, [6] P. Guo, Nature nanotechnology 5, 833 (2010). E. Nir, A. A. Louis, et al., The Journal of chemical [7] R. V. Ulijn and R. Jerala, Chemical Society Reviews 47, physics 142, 06B613 1 (2015). 3391 (2018). [20] P. Sulc,ˇ F. Romano, T. E. Ouldridge, J. P. Doye, and [8] N. P. King, J. B. Bale, W. Sheffler, D. E. McNamara, A. A. Louis, The Journal of chemical physics 140, 235102 S. Gonen, T. Gonen, T. O. Yeates, and D. Baker, Nature (2014). 510, 103 (2014). [21] P. Sulc,ˇ F. Romano, T. E. Ouldridge, L. Rovigatti, [9] M. Madsen and K. V. Gothelf, Chemical reviews 119, J. P. K. Doye, and A. A. Louis, Journal of Chemical 6384 (2019). Physics 137, 5101 (2012). [10] N. Stephanopoulos, Chem 6, 364 (2020). [22] R. Sharma, J. S. Schreck, F. Romano, A. A. Louis, and [11] Y. Xu, S. Jiang, C. R. Simmons, R. P. Narayanan, J. P. Doye, ACS Nano 11, 12426 (2017). F. Zhang, A. M. Aziz, H. Yan, and N. Stephanopou- [23] M. C. Engel, D. M. Smith, M. A. Jobst, M. Sajfutdinow, los, ACS Nano 13, 3545 (2019). T. Liedl, F. Romano, L. Rovigatti, A. A. Louis, and J. P. [12] S. Li, Q. Jiang, S. Liu, Y. Zhang, Y. Tian, C. Song, Doye, ACS Nano 12, 6734 (2018). J. Wang, Y. Zou, G. J. Anderson, J. Y. Han, Y. Chang, [24] A. Suma, E. Poppleton, M. Matthies, P. Sulc,ˇ F. Ro- Y. Liu, C. Zhang, L. Chen, G. Zhou, G. Nie, H. Yan, mano, A. A. Louis, J. P. Doye, C. Micheletti, and B. Ding, and Y. Zhao, Nature Biotechnology 36, 258 L. Rovigatti, Journal of 40, (2018). 2586 (2019). [13] J. Jin, E. G. Baker, C. W. Wood, J. Bath, D. N. Woolf- [25] F. Hong, S. Jiang, X. Lan, R. P. Narayanan, P. Sulc,ˇ son, and A. J. Turberfield, ACS nano 13, 9927 (2019). F. Zhang, Y. Liu, and H. Yan, Journal of the American [14] E. deLlano, H. Miao, Y. Ahmadi, A. J. Wilson, M. Beeby, Chemical Society 140, 14670 (2018). I. Viola, and I. Barisic, Nucleic Acids Research (2020), [26] J. P. Doye, T. E. Ouldridge, A. A. Louis, F. Romano, 10.1093/nar/gkaa593. P. Sulc,ˇ C. Matek, B. E. Snodin, L. Rovigatti, J. S. 8

Schreck, R. M. Harrison, and W. P. Smith, Physical [44] A. Buchberger, C. R. Simmons, N. E. Fahmi, R. Free- Chemistry Chemical Physics 15, 20395 (2013). man, and N. Stephanopoulos, Journal of the American [27] M. Matthies, N. P. Agarwal, E. Poppleton, F. M. Joshi, Chemical Society 142, 1406 (2019). P. Sulc,ˇ and T. L. Schmidt, ACS nano 13, 1839 (2019). [45] Y. Xu, S. Jiang, C. R. Simmons, R. P. Narayanan, [28] B. Zhang, W. Zheng, G. A. Papoian, and P. G. Wolynes, F. Zhang, A.-M. Aziz, H. Yan, and N. Stephanopoulos, Journal of the American Chemical Society 138, 8126 ACS nano (2019). (2016). [46] L. S. Dodda, I. C. De Vaca, J. Tirado-Rives, and W. L. [29] R. V. Honorato, J. Roel-Touris, and A. M. Bonvin, Fron- Jorgensen, Nucleic Acids Research 45, W331 (2017). tiers in molecular biosciences 6, 102 (2019). [47] L. S. Dodda, J. Z. Vilseck, J. Tirado-Rives, and W. L. [30] A. R. Atilgan, S. R. Durell, R. L. Jernigan, M. C. Jorgensen, Journal of Physical Chemistry B 121, 3864 Demirel, O. Keskin, and I. Bahar, Biophysical Journal (2017). 80, 505 (2001). [48] W. L. Jorgensen and J. Tirado-Rives, Proceedings of the [31] S. K. Mishra and R. L. Jernigan, PLoS ONE 13 (2018), National Academy of Sciences of the United States of 10.1371/journal.pone.0199225. America, Tech. Rep. 19 (2005). [32] M. Gur, E. Zomot, and I. Bahar, Journal of Chemical [49] H. J. Berendsen, D. van der Spoel, and R. van Physics 139, 121912 (2013). Drunen, Computer Physics Communications, Tech. Rep. [33] L. Yang, G. Song, A. Carriquiry, and R. L. Jernigan, 1-3 (1995). Structure 16, 321 (2008). [50] E. Poppleton, J. Bohlin, M. Matthies, S. Sharma, [34] L. Rovigatti, P. Sulc,ˇ I. Z. Reguly, and F. Romano, Jour- F. Zhang, and P. Sulc,ˇ Nucleic acids research 48, e72 nal of computational chemistry 36, 1 (2015). (2020). [35] A. R. Atilgan, S. Durell, R. L. Jernigan, M. Demirel, [51]A. Saliˇ and T. L. Blundell, Journal of Molecular Biology O. Keskin, and I. Bahar, Biophysical journal 80, 505 (1993), 10.1006/jmbi.1993.1626. (2001). [52] A. Suma, E. Poppleton, M. Matthies, P. ulc, F. Romano, [36] Z. Sun, Q. Liu, G. Qu, Y. Feng, and M. T. Reetz, Chem- A. A. Louis, J. P. K. Doye, C. Micheletti, and L. Rovi- ical Reviews (2019), 10.1021/acs.chemrev.8b00290. gatti, J. Comput. Chem. 9999, 1 (2019). [37] E. Fuglebakk, N. Reuter, and K. Hinsen, Journal of [53] S. A. Johnston, V. Domenyuk, N. Gupta, M. T. Batista, Chemical Theory and Computation 9, 5618 (2013). J. C. Lainson, Z.-G. Zhao, J. F. Lusk, A. Loskutov, Z. Ci- [38] F. Xia, D. Tong, and L. Lu, Journal of Chemical Theory chacz, P. Stafford, et al., Scientific reports 7, 1 (2017). and Computation 9, 3704 (2013). [54] A. Davtyan, N. P. Schafer, W. Zheng, C. Clementi, P. G. [39] K. Xia, Physical Chemistry Chemical Physics 20, 658 Wolynes, and G. A. Papoian, The Journal of Physical (2017). Chemistry B 116, 8494 (2012). [40] M. Lu, B. Poon, and J. Ma, Journal of Chemical Theory [55] J. Yang and Y. Zhang, Nucleic acids research 43, W174 and Computation 2, 464 (2006). (2015). [41] M. Y. Tsai, W. Zheng, D. Balamurugan, N. P. Schafer, [56] A. Drozdetskiy, C. Cole, J. Procter, and G. J. Barton, B. L. Kim, M. S. Cheung, and P. G. Wolynes, Protein Nucleic acids research 43, W389 (2015). Science 25, 255 (2016). [57] S. Plimpton, Journal of computational physics 117, 1 [42] V. K. Misra, J. L. Hecht, A. S. Yang, and B. Honig, (1995). Biophysical Journal, Tech. Rep. 5 (1998). [43] A. Marcovitz and Y. Levy, Journal of Physical Chemistry B 117, 13005 (2013).