Design of Nucleic Acid Strands with Long Low-Barrier Folding Pathways

Nat Comput DOI 10.1007/s11047-016-9587-9 Design of nucleic acid strands with long low-barrier folding pathways Anne Condon1 · Bonnie Kirkpatrick2 · Ján Maˇnuch1 © The Author(s) 2017. This article is published with open access at Springerlink.com Abstract A major goal of natural computing is to design fully developed, that exploit base pairing interactions of biomolecules, such as nucleic acid sequences, that can be nucleic acids. Prominent examples include DNA strand dis- used to perform computations. We design sequences of placement systems (DSDs) (Seelig et al. 2006) and RNA nucleic acids that are “guaranteed” to have long folding origami systems (Geary and Andersen 2014). Our work here pathways relative to their length. This particular sequences is motivated by the goal of computing with a single RNA with high probability follow low-barrier folding pathways sequence as the nucleic acids of the sequence interact with that visit a large number of distinct structures. Long fold- each other. ing pathways are interesting, because they demonstrate that RNA sequences form folded structures in which pairs of natural computing can potentially support long and com- nucleic acids biochemically bond to each other. These bonds plex computations. Formally, we provide the first scalable change the physical energy of the sequence, and a given designs of molecules whose low-barrier folding pathways, sequence prefers to assume low-energy folded structures. with respect to a simple, stacked pair energy model, grow Folding is a dynamic process, constrained by kinetics, dur- superlinearly with the molecule length, but for which all sig- ing which an RNA sequence will move through a sequence nificantly shorter alternative folding pathways have an energy of structures with each differing from the previous one by barrier that is 2 − times that of the low-barrier pathway for the addition or removal of a single base pair. The process any >0 and a sufficiently long sequence. may reach a low-energy structure from a high-energy structure or simply maintain low energy (a process referred to Keywords Nucleic acid strands · Low-barrier pathways · as the natural “breathing” of the molecule). Folding path- Sequence design · Folding pathways ways will tend to meander along low-energy “valleys” in the landscape of secondary structures, rather than scaling high-energy “barriers”, even if the low-barrier valleys are 1 Introduction longer. Here, we focus on design of an RNA sequence that tra- Novel means for performing computations or designing verses a low-energy pathway as the molecule breathes. We nanostructures at the molecular level have been success- imagine molecules suspended in a solution, where each B molecule interacts only with itself. Our goal is to design Ján Maˇnuch nucleic acid strands that, because of kinetic folding con- [email protected] straints, are fated to follow inordinately long low-barrier Anne Condon pathways, relative to the strand length, from some initial to [email protected] target structure. We seek a scalable design that, for any n, Bonnie Kirkpatrick yields a strand of length Θ(n) such that all low-barrier path- [email protected] ways from initial to target visit a number of distinct structures 1 Department of Computer Science, University of British that grows superlinearly with n, while any shorter pathway Columbia, Vancouver, Canada has a significantly higher barrier and is unfavourable kineti- 2 Intrepid Net Computing, Dillon, MT, USA cally. 123 A. Condon et al. 1.1 Motivation and related work erties can contribute to fundamental scientific understanding of the diversity of folding pathways that are possible with the The motivation for our goal stems in part from the strengths basic building blocks of nature and, ultimately, applications and weaknesses of multi-stranded nucleic acid systems, such of this diversity. There has been much interest in inverse RNA as DNA strand displacement systems (DSDs), as a means secondary structure prediction, that is, computational design of molecular programming and design. Toehold-mediated of RNA sequences that fold into given (typically pseudoknot DNA strand displacement systems support circuit and arti- free) secondary structures (Andronescu et al. 2004; Busch ficial neural network computations via folding pathways and Backofen 2006; Dirks et al. 2004; Haleš et al. 2015; (Qian and Winfree 2011; Qian et al. 2011; Seelig et al. Jaeger et al. 2001; Leea et al. 2014; Schuster et al. 1994; Zhou 2006), and can in principle support general Turing machine et al. 2013). Leea et al. (2014) have developed eteRNA, a computations (Qian et al. 2011). Earlier designs of multi- crowd-sourcing approach to RNA secondary structure design state DNA machines also relied on multi-stranded folding and inference of design rules. Mathieson and Condon (2015) pathways involving the formation and breakage of hairpins provide designs of RNA sequences with folding pathway (Hagiya et al. 2006; Uejima and Hagiya 2004). Moreover, whose minimum barrier pathways from an initial to target multi-stranded DNA folding pathways are the means for real- structure are necessarily indirect, that is, involve base pairs izing molecular tweezers (Yurke et al. 2000), autonomous that are neither in the initial or target structure, and in addi- locomotors (Yin et al. 2008), and many other nano-scale tion may contain “repeats” where a base pair is removed and mechanical devices (Simmel and Dittmer 2005). In these later added back in again. The folding pathways introduced in examples, correct steps in a computation or execution of a this paper contain structures with both indirect repeated base device correspond to low-barrier pathways of multi-stranded pairs, with the repetitions occurring many times, in contrast pseudoknot-free structures; incorrect steps are unfavourable to just one repeat in the designs of Mathieson and Condon. because of high energy barriers. However, all of these com- Yet other related work pertains to the design of bistable putational or mechanical processes use a number of strands or multistable DNA or RNA molecules, inspired by biologi- that is proportional to the number of steps of the process. cal molecular switches. Molecular riboswitches are bistable Consider, for example, computations involving strand dis- RNA molecules in nature that are capable of changing struc- placement, in which so-called signal strands serve as the ture, and thus function, in changing environments; there is memory of the computation. Each signal strand is a reac- evidence that molecular switches facilitate processes such as tant in only one strand displacement step, becoming part viroid replication (Gultyaev et al. 1998) and gene expres- of a waste complex that is one of the products of the step. sion (Babitzke and Yanofsky 1993). Goals in the field Thus, DNA strand displacement processes use DNA strands of synthetic biology and its applications have motivated as a sort of write-once, read-once memory. If such a process rational computational design of synthetic riboswitches— occurs in a closed volume, that volume must be at least pro- subsequences of mRNA’s that regulate gene expression via portional to the number of steps of the process in order to structural changes—sometimes guided by properties of the accommodate all of the needed strands. This is very differ- RNA’s folding pathways (Beisel and Smolke 2009; Isaacs ent from typical silicon-based computations, where memory et al. 2006). Soukup and Breaker (1999) designed an RNA can be re-used. switch that changes its structure in the presence of certain lig- DSDs can in principle simulate volume-efficient compu- ands. Schultes and Bartel (2000) designed an RNA sequence tations, i.e., computations where the total length of strands whose bistable structures are motifs of two functionally dif- involved is polynomial in the input size, via multi-stranded ferent ribozymes, even though the two structures have no base folding pathways that have length exponential in the number pairs in common. There has also been work on design (or of strands involved (Thachuk and Condon 2012). However, redesign) of protein folding pathways (Kuhlman et al. 2002; such volume-efficient DSD computations would be difficult Nauli et al. 2001), motivated both by improving fundamental to carry out experimentally, in part because single copies of understanding of protein folding pathway processes and also some participating strands are needed. To avoid this diffi- by the goal of designing proteins that fold into biologically- culty and other practical limitations of multi-stranded DSDs, relevant structures with faster folding rates than wild-type it would be interesting to find a way to compute in a volume- protein sequences. efficient way within a single strand. A computation would Flamm et al. (2000) show how the design of multi-stable correspond to a folding pathway of the strand from some nucleic acid sequences can be cast as an optimization prob- input structure to a solution structure; the longer the path- lem with constraints, and have developed computational way, the longer the computation. methods to design sequences that satisfy the constraints. Apart from this computational motivation, developing Their methods can be used, for example, to design a sequence principles for design of nucleic acids whose structure or with two prescribed low-energy structures and a high energy kinetically-preferred folding pathways have unusual prop- barrier between these structures. The design goal that we 123 Design of nucleic acid strands with long low-barrier folding pathways consider here, namely to design sequences with long folding ideas in Sects. 2 and 3, we have put some technical details pathways, is quite different than the design goals for multi- in an “Appendix”. In Sect. 4, we argue heuristically that our stable sequences or switches, but our design incorporates designed pathway will be followed with high probability. both of these elements in more general ways than previ- We work with a simple energy model because we want to ous work.

Design of Nucleic Acid Strands with Long Low-Barrier Folding Pathways

Algorithms for Nucleic Acid Sequence Design

A Global Sampling Approach to Designing and Reengineering RNA

(12) United States Patent (10) Patent No.: US 7.058,515 B1 Selifonov Et Al

Nanoparticle‐Mediated DNA and Mrna Delivery

A Rule-Based Modelling and Simulation Tool for DNA Strand Displacement Systems

Design of Nucleic Acid Sequences for DNA Computing Based on a Thermodynamic Approach

2013 CIRC Poster Session Abstracts

Computer-Aided Design Software for Custom Nucleic Acid Nanostructures

Principles and Applications of Nucleic Acid Strand Displacement Reactions † ‡ † Friedrich C

Control of RNA Function by Conformational Design

Biomolecular Modeling and Simulation: a Prospering Multidisciplinary Field

The Design Charrette Ways to Envision Sustainable Futures the Design Charrette