DNA as a universal substrate for chemical kinetics

David Soloveichika,1, Georg Seeliga,b,1, and Erik Winfreec,1 aDepartment of Computer Science and Engineering, University of Washington, Seattle, WA 98195; bDepartment of Electrical Engineering, University of Washington, Seattle, WA 98195; and cDepartments of Computer Science, Computation and Neural Systems, and Bioengineering, California Institute of Technology, Pasadena, CA 91125

Edited by José N. Onuchic, University of California San Diego, La Jolla, CA, and approved January 29, 2010 (received for review August 18, 2009)

Molecular programming aims to systematically engineer molecular wire the components to achieve particular functions. Attempts to and chemical systems of autonomous function and ever-increasing systematically understand what functional behaviors can be ob- complexity. A key goal is to develop embedded control circuitry tained by using such components have targeted connections to within a chemical system to direct molecular events. Here we show analog and digital electronic circuits (10, 18, 19), neural networks that systems of DNA molecules can be constructed that closely ap- (20–22), and computing machines (15, 20, 23, 24); in each case, proximate the dynamic behavior of arbitrary systems of coupled complex systems are theoretically constructed by composing chemical reactions. By using strand displacement reactions as a modular chemical subsystems that carry out key functions, such primitive, we construct reaction cascades with effectively unimole- as boolean logic gates, binary memories, or neural computing cular and bimolecular kinetics. Our construction allows individual units. Despite its apparent difficulty, we directly targeted CRNs reactions to be coupled in arbitrary ways such that reactants can for three reasons. First, shoehorning the design of synthetic participate in multiple reactions simultaneously, reproducing the chemical circuits into familiar but possibly inappropriate comput- desired dynamical properties. Thus arbitrary systems of chemical ing models may not capture the natural potential and limitations equations can be compiled into real chemical systems. We illustrate our method on the Lotka–Volterra oscillator, a limit-cycle oscillator, of the chemical substrate. Second, there is a vast literature on a chaotic system, and systems implementing feedback digital logic the theory of CRNs (25, 26) and even on general methods to im- and algorithmic behavior. plement arbitrary polynomial ordinary differential equations as CRNs (27, 28). Third, as a fundamental model that captures

molecular programming ∣ mass-action kinetics ∣ strand displacement the essential formal structure of chemistry, implementation of BIOPHYSICS AND cascades ∣ chemical reaction networks ∣ nonlinear chemical dynamics CRNs could provide a useful programming paradigm for mole- COMPUTATIONAL BIOLOGY cular systems. hemical reaction equations and mass-action kinetics provide Here we propose a method for compiling an arbitrary CRN Ca powerful mathematical language to describe and analyze into nucleic-acid-based chemistry. Given a formal specification chemical systems. For well over a century, mass-action kinetics of coupled chemical kinetics, we systematically design DNA mo- has been used to model chemical experiments and to predict lecules implementing an approximation of the system scaled to an appropriate temporal and concentration regime. Formal species and explain their dynamical properties. Both biological and non- CHEMISTRY biological chemical systems can exhibit complex behaviors such as are identified with certain DNA strands, whose interactions are oscillations, memory, logic and feedback control, chaos, and pat- mediated by a set of auxiliary DNA complexes. Nonconserving tern formation—all of which can be explained by the correspond- CRNs can be implemented because the auxiliary species implic- ing systems of coupled chemical reactions (1–4). Whereas the use itly supply energy and mass. of mass-action kinetics to describe existing chemical systems is Conveniently, the base sequence of nucleic acids can deter- well established, the inverse problem of experimentally imple- mine reactivity not only through direct hybridization of single- menting a given set of chemical reactions has not been considered stranded species (29) but also through branch migration and in full generality. Here, we ask: Given a set of formal chemical strand displacement reaction pathways (30–32). These powerful X ;X ; …;X reaction equations, involving formal species 1 2 n, can reaction primitives have been used previously for designing M ;M ; …;M we find a set of actual molecules 1 2 m that interact nucleic-acid-based molecular machines with complex behaviors, ’ in an appropriate buffer to approximate the formal system s such as motors, logic gates, and amplifiers (33–37). Here we use mass-action kinetics? If this were possible, the formalism of these reaction mechanisms as the basis for the implementation of chemical reaction networks (CRNs) could be treated as an effec- arbitrary CRNs. Our work advances a systematic approach that tive programming language for the design of complex network aims to provide a general mechanism for implementing a well- behavior (5–9). specified class of behaviors. Unfortunately, a formally expressed system of coupled chemi- cal equations may not have an obvious realization in known Molecular Primitive: Strand Displacement Cascades chemistry. In a formal system of chemical reactions, a species Because simple hybridization reactions cannot be cascaded, we may participate in multiple reactions, both as a reactant and/or use the more flexible strand displacement reaction as a molec- as a product, and these reactions progress at relative rates deter- ular primitive. [We use “strand displacement” as a shorthand for mined by the corresponding rate constants, all of which imposes toehold-mediated branch migration and strand displacement formidable constraints on the chemical properties of the species participating in the reactions. For example, it is likely hard to find a physical implementation of arbitrary chemical reaction equa- A preliminary version of this work appeared as ref. 52. tions using small molecules, because small molecules have a lim- Author contributions: D.S., G.S., and E.W. designed research, performed research, ited set of reactivities. and wrote the paper. Thus, formal CRNs may appear to be an unforgiving target for The authors declare no conflict of interest. general implementation strategies. Indeed, most experimental This article is a PNAS Direct Submission. work in chemical and biological engineering has started with Freely available online through the PNAS open access option. — particular molecular systems genetic regulatory networks (10), 1To whom correspondence may be addressed. E-mail: [email protected], gseelig@ RNA folding and processing (11), metabolic pathways (12), signal u.washington.edu, or [email protected]. transduction pathways (13), cell-free enzyme systems (14, 15), This article contains supporting information online at www.pnas.org/cgi/content/full/ and small molecules (16, 17)—and found ways to modify or re- 0909380107/DCSupplemental. www.pnas.org/cgi/doi/10.1073/pnas.0909380107 PNAS Early Edition ∣ 1of6 12 (1) 2 (2) 2 (3) 23 1 23 12 3 Implementing Networks We implement CRNs in DNA by using three types of modules, 1* 2* 3* 1* 2* 3* which specify the DNA molecules to create for each target reac- 23 12 single-reaction model tion (either unimolecular or bimolecular) and for each target spe- 1* 2* 3* 1* 2* 3* cies. Initially, we suppose that our target CRN uses rate constants and concentrations that are realistic for aqueous-phase nucleic- Fig. 1. Strand displacement molecular primitive. Domains are labeled by acid strand displacement reactions. Additionally, the signal spe- numbers, with * denoting Watson–Crick complementarity. Multiple elemen- cies must always be at much lower concentrations than the tary steps are indicated: (1) binding of toeholds 1 and 1; (2) branch migra- auxiliary species. Thus, the feasibility of a CRN depends upon tion, a random walk process where domain 2 of strand 2–3 is partially its behavior; for example, we can implement explosive reactions displaced by domain 2 of strand 1–2; (3) the separation of toeholds 3 and like X1 → X 1 þ X 1 for only a limited time. If the target CRN op- 3 .(Inset) The single-reaction model of strand displacement. erates outside of a physically realistic regime, we will need to re- lax the goal of implementing the exact target system and instead reactions (33, 38), combined with the principles of toehold allow a uniform scaling down of concentrations and/or slowing sequestering (34) and toehold exchange (35).] down of the kinetics. We will show that such scaling can also ar- Intuitively, in a strand displacement reaction (Fig. 1) a strand bitrarily increase the accuracy of our construction. (X) displaces another strand (Y) from a complex (G). In our dia- grams, each subsequence that acts as a single functional unit Unimolecular Reactions (a domain) is labeled with a unique number (e.g., the two func- DNA Implementation. Standardizing the molecular equivalents of tionally distinct domains of strand X are 1 and 2). We assume the formal species facilitates their assignment to roles as reactants strong sequence design such that domain x is complementary or products in multiple reactions. We implement a formal species to domain x and interacts with no other, allowing us to analyze as a single-stranded DNA molecule (signal strand) consisting of DNA systems entirely at the domain level. Short (dashed) do- an inert history domain and a unique species identifier (a long mains are called toeholds. A strand displacement reaction starts domain flanked by two toehold domains; Fig. 2, orange boxes) when two single-stranded toehold domains (e.g., 1 and 1) bind that regulates its interactions with other molecules. The history each other. Then, a random walk process (branch migration) fol- domain depends on the formal reaction producing the species: If lows, where two domains with identical sequences compete for X j is a product of multiple reactions, signal strands with differing the same complementary domain. Domain 2 of strand 2–3 com- history domains but the same species identifier would represent petes with and is partially displaced by domain 2 of strand 1–2. the same formal species X j. In our scheme, the signal strand is Finally, the initially bound strand 2–3 is released when toeholds 3 active when fully single-stranded and inactive when bound to an- and 3 separate. Toobtain a fast and reliable reaction, the toehold other DNA molecule. Signal strands react exclusively by strand domains must be short (e.g., 6 nt) whereas the branch migration displacement reactions initiated at the left toehold of their spe- domains must be long (e.g., 20 nt). Despite the complex mecha- cies identifier; thus, sequestering the left toehold into a double nism, a strand displacement reaction is well modeled by a single helix is sufficient to inactivate them. The sum of the concentra- reversible reaction (38, 39) under a large range of experimental tions of the active signal strands for X j gives ½Xj, and the changes conditions where toehold binding is rate limiting (Fig. 1, Inset). in ½X j are effected either by the inactivation of a signal strand Removing toehold 3 slows the reverse reaction enough that we when it binds to a complementary DNA molecule or by the re- can consider the forward reaction effectively irreversible (38, 40). lease of a previously bound strand. Strand displacement reactions are programmable by the design Being composed of entirely distinct domains, DNA signal of sequences. A single base mismatch significantly impedes strands do not interact with each other directly, but rather a branch migration (32, 41). By varying the binding strength of set of auxiliary DNA complexes present at large concentrations the initiating toeholds, the reaction rate constant can be con- mediates exactly the desired reactions. Suppose the ith formal trolled over 6 orders of magnitude (38, 39). In turn, the length reaction is unimolecular, as in Fig. 2. We implement reaction and sequence composition of the toeholds controls their binding equation 1 as inactivation of a signal strand with species identifier strength. The forward and reverse rate constants q and r increase X 1 coupled with activation of strands with identifiers X 2 and X 3. with the hybridization energy of their respective toehold domains These chemical steps are done through two strand displacement (1 to 1 and 3 to 3 ). The same strand (X) can react at different reactions involving two auxiliary complexes Gi and Ti (for “gate” rates to different complexes, e.g., G and G0. To attain smaller and “translater,” respectively) as shown in Fig. 2. A two-step cas- q0 q G0 1 < for , we can use a toehold domain q0 that is not a full cade ensures that arbitrary species identifiers can be connected in 1 1 complement of 1. For example, q0 can be a truncation of (39) this reaction because there is no sequence dependence between (Fig. S1). X 1, X 2, and X 3. Toimplement other effective net reactions—even In the following, we construct systems of molecules whose in- catalytic or autocatalytic reactions such as X 1 → X 1 þ X 2 or teractions are mediated by strand displacement reactions. To en- X 1 → X1 þ X1—only the auxiliary complexes need to be modi- sure desired reactions and exclude undesired reactions, we use fied. A system of multiple coupled unimolecular reactions is modular components that incorporate three key design princi- implemented simply by starting with the auxiliary complexes ples. First, we design the system so that all long domains are for all the desired reactions, along with appropriate initial con- always double-stranded, thus ensuring that toeholds and their centrations of the signal strands. Implementing unimolecular re- complements are the only single-stranded domains capable of hy- actions with differing numbers of products requires extending or bridizing together. Second, we require that toeholds are short en- shrinking the intermediate output Oi. Removing auxiliary com- ough that this interaction is fleeting unless it triggers strand plex Ti altogether results in unimolecular decay reactions like displacement. Thus we need to consider only strand displace- X 1 → ∅, where ∅ indicates the absence of products. ment, and not hybridization, of long domains. Third, only the de- sired target must have the correct combination of toehold and Kinetic Analysis. To approximate ideal unimolecular mass-action displacement regions, preventing any undesired strand displace- kinetics, we require the dynamics of the target reaction network ment reactions. Satisfying these three design principles, which to be slow relative to the fastest strand displacement steps and the can be verified by inspection of the modules, guarantees that ar- concentration of auxiliary DNA species to be much larger than bitrarily complex systems will function as intended for ideal the signal concentrations. Here and in the next section, we strand displacement reactions. present an informal argument that the desired kinetics will be

2of6 ∣ www.pnas.org/cgi/doi/10.1073/pnas.0909380107 Soloveichik et al. A 7 11 reaction [1] species 4 identifier ? 10 q 123 2 3 104 11 7 implement ?123 23 i

1*q 2* 3* 1*q 2* 3* [2] X1 i i Oi Gi waste [3] B simplify 6 9 species species 5 8 2 identifier identifier 7 3 104 11 7 [4] 2 3 104 11 7 10 4 11 10 4 5 6 11 7 8 9 3* 10* 4* 11*7* 3* 10* 4* 11* 7* [5] Oi X2 X3 Ti waste

Fig. 2. Unimolecular module: DNA implementation of the formal unimolecular reaction X1 → X2 þ X3 with reaction index i. Orange boxes highlight the DNA species that correspond to the formal species X1 (species identifier 1-2-3), X2 (4-5-6), and X3 (7-8-9). Domains identical or complementary to species identifiers for X1, X2, and X3 are colored red, green, and blue, respectively. Black domains (10 and 11) are unique to this formal reaction. To reduce rate constant qi , 1 G X X toehold domain qi may not be a full complement of domain 1. (A) Complex i undergoes a strand displacement reaction with strand 1, with 1 displacing O O X X T q k C−1 q γ−1k C−1 strand i .(B) i displaces 2 and 3 from complex i . Without buffer cancellation, i ¼ i max; with buffer cancellation, i ¼ i max. Reaction equations of type 2–3 are used in simulations (Figs. 5 and 6); simplified reaction equations 4–5 are useful for analysis.

SI Text B T C q k attained under these conditions. In we prove the conver- i, and i to max and partial-toehold rate constant i ¼ i. τk X X ≪ C X ≪ gence to the desired kinetics in the limit of high concentration of By assuming i maxð½ 1Þ maxð½ 2Þ max and maxð½ 1Þ C L B T appropriate auxiliary species relative to the concentrations of the max, the concentrations ½ i, ½ i, and ½ i remain effectively con- signal species. stant throughout the duration of the experiment, and reaction 8 is For simplicity, we assume all full toehold domains have equal the rate-limiting step of the pair 8–9. This system of reactions can binding strength yielding a maximum strand displacement rate then be approximated by reaction equations 10 and 11. Further, q C X constant max.Let max be the starting concentration of auxiliary (informally) if in the target reaction network ½ 1 varies on a complexes Gi and Ti. The strand displacement cascade of Fig. 2 slower time scale than that of the equilibration of reaction 10, follows reaction equations 2 and 3, where qi ≤ q is a partial- we can assume that X1 and Hi attain instant equilibrium through

max BIOPHYSICS AND 10 X ∕ H q ∕k toehold strand displacement rate constant controlled by the bind- reaction with ½ 1 ½ i¼ max i. Then the instantaneous rate 1 1 11 q X H k X X COMPUTATIONAL BIOLOGY ing energy of domains qi and 1, and qi is chosen to obtain the of reaction is max½ 2½ i¼ i½ 1½ 2. At first glance this ap- desired rate constant for formal reaction i. Letting τ be the ex- pears to give the correct kinetics for the production of X 3; how- periment’s duration, we assume a regime where τki maxð½X1Þ ≪ ever, X 3 is buffered: Some fraction of it will quickly equilibrate C k ≪ q C H X max, and i max max, and set the partial-toehold strand dis- with i0 of other reactions in which 3 is the first reactant, result- q k ∕C X X X placement rate constant i ¼ i max. Then over a sufficiently ing in a lower effective rate of production of 3. 1 and 2 are short duration τ, ½Gi and ½Ti remain effectively constant at their likewise buffered. Let frðX 1Þ be the set of bimolecular reactions C X 10 CHEMISTRY initial concentration max, and the cascade becomes equivalent to in which 1 is the first reactant. Reactions in the different re- a pair of unimolecular reactions (Fig. 2, Eqs. 4 and 5, where action modules create instant equilibrium between all species in q C k 4 X X X H i max ¼ i). Further, reaction is the rate-limiting step, and the equilibrium set of 1,esð 1Þ, consisting of 1 and every i, we obtain the desired unimolecular kinetics for formal reaction 1: i ∈ frðX 1Þ. Producing or consuming a unit of any species in the The instantaneous rate of the consumption of X 1 and the produc- equilibrium set of X1 results in the fraction γ1 ≡ ½X 1∕½esðX1Þ tion of X 2 and X 3 in this module is ki½X 1. of a unit change in ½X 1. Thus the effective instantaneous rates of the consumption of X 1, consumption of X 2, and produc- Adding Bimolecular Reactions tion of X3 because of reaction 11 in the above module are DNA Implementation. We now extend the construction to realize γ1ki½X 1½X 2, γ2ki½X1½X 2, and γ3ki½X 1½X2, respectively. k ≪ q bimolecular in addition to unimolecular reactions. Suppose the Making the additional assumption i max shifts equilibria of ith formal reaction is bimolecular, as in Fig. 3. Implementing re- reactions 10 to the left such that γj approaches 1. Then the in- action equation 6, X 1 must not be consumed in the absence of X 2 stantaneous rates of variation in X 1, X 2, and X 3 because of this and vice versa, which seems difficult because X 1 and X 2 cannot reaction module approach the desired bimolecular kinetics value react directly, and therefore one of them (say, X1) must react with of ki½X 1½X 2. an auxiliary complex first without knowing whether the other is present. The challenge here is to minimize the X 2-independent Canceling out the Buffering Effect. We can exactly cancel out the k ≪ q “drain” on X 1; later we will show how we can exactly compensate buffering effect, thereby easing the requirement i max for bi- for it by rescaling rate constants. molecular reactions and allowing the implementation of much Our construction (Fig. 3) solves this problem by introducing a larger rate constants. Intuitively, because the buffering effect sys- reversible first step. Strand X 1 reversibly displaces Bi (the “back- tematically counteracts desired concentration changes of X j when ward” strand) from auxiliary complex Li, producing activated in- γj < 1, we should be able to compensate by increasing the rate termediate Hi.IfnoX2 is present, Bi can reverse the reaction and constants of all reactions involving X j. Because a reaction be- release X 1 back into solution. But if X 2 is present, it can react tween species with different γj will behave as if it has incorrect with intermediate Hi to release intermediate output Oi.Asin stoichiometry, we want all γj to be equal. However, even with −1 the unimolecular module, this in turn releases the final output equal γj, we cannot simply multiply all ki by γj because that X 3 and makes the overall reaction irreversible. The species iden- would in turn change γj. σ Σ k tifiers of the reactants and all products can be arbitrarily specified Let j ¼ i∈frðXjÞ i be the sum of formal rate constants of as in the unimolecular modules. Not all Bi are necessarily unique bimolecular reactions with Xj as the first reactant, and let 0 DNA species: If reactions i and i have the same reactants, then Bi σ ¼ maxjfσjg. We ensure that the fraction γj is equal for all X j would have the same sequence as Bi0 . by using a new buffering module (Fig. 4) for each X j for which σj < σ. Each buffering module is modeled as reaction equation 12 LS BS C Kinetic Analysis. The kinetics of the module of Fig. 3 is modeled . Initially ½ j¼½ j¼ max, and as for the bimolecular mod- using reaction equations 7–9. Set the initial concentration of Li, ule we can reduce 12 to 13. The equilibrium set of X j now

Soloveichik et al. PNAS Early Edition ∣ 3of6 species A identifier 7 7 12 ? 12 243 56 qi 1 2563 [6] ? 123 + 243 reaction * 2*3*4* 5* 6* * 2*3*4* 5* 6* + implement 1qi 1qi X1 Bi Li Hi [7] species B identifier 7 [8] 12 ? 1 2563 ? 1 25? 46 ?456 5 6 12 7 [9] + * 2*3*4* 5* 6* * 2*3*4* 5* 6* + 1qi 1qi X2 waste Oi Hi simplify C species 9 identifier 8 5 [10] 5612 7 12 7 6712 12 7 8 9 + + [11] 6* 12* 7* 6* 12* 7* Oi X3 Ti waste

Fig. 3. Bimolecular module: DNA implementation of the formal bimolecular reaction X1 þ X2 → X3 with reaction index i. The black domain (12) is unique to this formal reaction. (A) X1 reversibly displaces Bi from complex Li producing complex Hi .(B) X2 displaces Oi from complex Hi . Occurrence of reaction B pre- −1 cludes the backward reaction of A.(C) Oi displaces X3 from complex T i . Without buffer cancellation, qi ¼ ki ; with buffer cancellation, qi ¼ γ ki. Reaction equations of type 7–9 are used in simulations (Figs. 5 and 6); simplified reaction equations 10–11 are useful for analysis.

includes HSj. It is not hard to show that with the buffering-scaling all formal unimolecular rate constants by a factor of δ, and in- γ−1 q q − σ −1 qs γ−1 σ − σ τ δ factor ¼ maxð max Þ , setting j ¼ ð jÞ, and repla- creasing by a factor of to simulate the same behavior. 0 −1 cing ki by ki ¼ γ ki for the formal rate constants when setting qi in reactions 2 and 7, we obtain a common buffer fraction γj ¼ γ, Examples −1 – which is exactly compensated for by the γ scaling of formal rate We illustrate our method on the Lotka Volterra chemical oscil- A constants. For initial concentrations cj of formal species X j,we lator shown in Fig. 5 . The concentration oscillations are in the −1 – start with initial concentrations γ cj of DNA strands X j. Then range of about 0 2. Under typical nucleic-acid experimental con- all species in esðXjÞ quickly equilibrate yielding ½Xj¼cj. ditions, maximal second-order rate constants for strand displace- SI Text ment reactions are about 106∕M∕s, and maximum concentrations See for a summary of our algorithm for compiling an −5 arbitrary CRN into DNA-based chemistry. are on the order of 10 M. Tofit into this regime with reasonable simulation accuracy and time span, we scale the original system by Rescaling for Feasibility and Accuracy a time-scaling factor of α ¼ 300, and a concentration-scaling fac- The above procedure will not yield a functional DNA system if tor β ¼ 10−8 employing units of seconds and molar, and use C 10−5 q 106∕ ∕ the original formal CRN has infeasibly high reaction rates or con- max ¼ M and max ¼ M s. The DNA species for centrations. In that case, we scale the system to use lower rate our implementation, including buffering modules, are shown in constants and concentrations while maintaining the same, albeit Fig. S2 and the possible strand displacement reactions in scaled, behavior. If ½XjðtÞ are solutions to differential equations Fig. S3. The equations governing our DNA implementation as arising from a set of unimolecular and bimolecular reactions, derived by using the transforms described in Figs. 2–4 are shown then β · ½Xjðt∕αÞ are solutions to the same set of reactions but in Fig. 5B. Simulations shown in Fig. 5C confirm that the DNA in which we multiply all unimolecular rate constants by 1∕α implementation nicely approximates the ideal formal chemical 1∕ α β C ∞ and all bimolecular rate constants by ð · Þ. We introduce a system. Because max < , deviations between the DNA imple- mixed concentration-time-scaling parameter δ with α ¼ δ and mentation and the target system gradually develop, as the deple- β ¼ 1∕δ, which scales down the concentration and slows down tion of complexes L1, G1, and G2 and the buildup of strands B1 the dynamics by a factor of δ without increasing the largest rate alter the effective rate constants. constant. We next apply our construction to more complex systems We justified the accuracy of our construction by assuming the (Fig. 6). For fastest behavior, all systems are scaled so that the q qs q 106∕ ∕ C 10−5 target system operates in a regime with concentrations suffi- largest i or j is max ¼ M s, and max ¼ M, leaving C δ ciently smaller than max, a physically determined parameter. only one free scaling parameter , which determines both imple- This may not hold without rescaling, but thankfully, arbitrarily mentation accuracy and the speed of the dynamics. The Orego- high accuracy for arbitrarily large duration of interest τ can still nator limit-cycle oscillator (Fig. 6A) is a simplified model of the be attained in a regime of smaller concentrations of formal spe- Belousov–Zhabotinsky reaction (43). The DNA implementation cies and slowed down dynamics. We prove the convergence of the of this system is relatively slow because of the wide range of rate DNA-based kinetics with buffer cancellation to the target CRN in constants. The chaotic system due to Willamowski and Rössler C → ∞ the limit max by using singular perturbation theory (27, 42) (44) exhibits complex concentration fluctuations and is particu- SI Text C → ∞ B (see ). Whereas taking the limit max is more math- larly sensitive to perturbations at long time scales (Fig. 6 ). C δ ematically convenient, increasing max by a factor of is equiva- The DNA implementation follows the ideal system relatively well lent, up to scaling in concentration of the experimental for a few revolutions around the strange attractor. Fig. 6C implementation, to decreasing all ½X j by a factor of δ, decreasing demonstrates the efficacy of our construction for implementing

species identifier ? qs simplify ? 456 5136 qs2 4 5 6 j 5136 [12] [13] * 5* 6*13* * 5* 6*13* 4qs 4qs2 X2 2 BS2 LS2 HS2

Fig. 4. Buffering module: DNA implementation of the buffering module used to cancel out the buffering effect. A buffering module is needed for each formal species Xj for which σj < σ. The buffering module for species X2 is shown. X2 reversibly displaces BS2 from complex LS2 to produce complex HS2 similarly to the −1 first reaction of the bimolecular module. The black domain (13) is unique to this buffering module for species X2.Setqsj ¼ γ ðσ − σjÞ. Reaction equations 12 are used in simulations (Figs. 5 and 6); simplified reaction equations 13 are useful for analysis.

4of6 ∣ www.pnas.org/cgi/doi/10.1073/pnas.0909380107 Soloveichik et al. ABIdeal chemical reactions DNA reaction modules C Simulation of ideal and DNA reactions 20 1: buffering module ideal DNA 2: 1 15

3: 10

unscaled scaled 1.5 5.105 /M/s 2 5

1 1/300 /s concentration (nM) 1 1/300 /s 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 3 time (hrs)

Fig. 5. Lotka–Volterra chemical oscillator example. (A) The formal chemical reaction system to be implemented with original (unscaled) and scaled rate con- stants. Desired initial concentrations of X1 and X2 are 2 and 1 unscaled and 20 and 10 nM scaled. (B) Reactions modeling our DNA implementation. Each formal reaction corresponds to a set of DNA reactions as indicated. Species X2 requires a buffering module because σ2 < σ (σ ¼ σ1 ¼ k1 and σ2 ¼ 0). Maximum strand q 106 −1 −1 G T L B LS BS C 10 μ displacement rate constant max ¼ M s and initial concentration of auxiliary species i, i , i , i, j, and j is max ¼ M. Buffering-scaling factor γ−1 q q − σ −1 2 X X γ−120 40 γ−110 20 ¼ maxð max Þ ¼ . The initial concentrations of strands 1 and 2 introduced into the system is nM ¼ nM and nM ¼ nM. (C) Plot of the concentrations of X1 (Red curve) and X2 (Green curve) for the ideal system (Dashed line) and the corresponding DNA species (Solid line). a chemical logic circuit responding to an external input signal This limit also constrains choices for reaction rate constants but (black trace). The 2-bit counter is a classic example of a digital can be countered by adjustment of auxiliary complex concentra- circuit with feedback (45); the high or low values of the red and tions. More serious issues are presented by leak reactions in green output species give the binary count of the number of input which an output is produced even if no input is present. Although pulses, 0–3. This feedback circuit contrasts with the use-once cir- experimentally characterized strand displacement systems exhibit cuits of ref. 34. Fig. 6D shows a different style of algorithmic be- leak rate constants up to a million times slower than the fastest havior: a state machine (5, 6). This state machine increments the desired reactions (35, 39), a leak could pose a problem for some number of green spikes between consecutive red spikes by 1 target CRNs. One way to ameliorate a leak, while also allowing every time. for unbounded running times, would be to provide auxiliary com- plexes at low concentrations in a continuous-flow stirred-tank re- Experimental Considerations actor (1). These issues are discussed further in SI Text. BIOPHYSICS AND

The correctness of our systematic construction was predicated on COMPUTATIONAL BIOLOGY several idealizations of DNA behavior, and it is worth considering Conclusions the deviations that we would expect in practice. A good approxi- With a rich history and extensive theoretical and software tools, mation to strong sequence design (domain x binds exclusively formal CRNs are a powerful descriptive language for modeling to x) should be possible for several thousand long domains by chemical reaction kinetics. By providing a systematic method using existing techniques developed for strand displacement sys- for compiling formal CRNs into DNA molecules, our work tems (46). There are a limited number of short toehold domain suggests that CRNs can also be regarded as an effective program- CHEMISTRY sequences available, but it is straightforward to modify our con- ming language and used prescriptively for the synthesis of unique struction to reuse toehold sequences without introducing errors. molecular systems. This view is bolstered by the fact that CRNs

A Oregonator (limit cycle oscillator) B Rössler (chaotic) nM nM 5 5 4 4 3 3 2 2 1 1

0 50 100 150 200 250 hr 0 10 20 30 40 hr 2-bit pulse counter Incrementer state machine (algorithmic) nM v:=0 C (digital circuit) D 60 nM w:=0 20 30 10 v > 0? 0 0 where no yes 60 20 w:=w+1 v:=v-1 30 v:=w 10 0 0 0 20 40 60 80 100 hr 0 20 40 60 80 100 hr

where x z clock made catalyzed by clk1 catalyzed by clk2 catalyzed by clk3 y logic thresholding from chemical oscillator: v1 v2 + v3 r1 r2+ r3 v2 v3 v1 x(0)+y(1) onw w(1)+w(1) onz dual rail clk1 c1 c2 + c3 c2 c3 c1 onw+w(0) onw+w(1) w(0)+w(0) off representation: z w2 w1 i r3 r1 species value x(1)+w(1) x(1)+w(0) on +z(0) on +z(1) x(0) x(1) x z z y(0)+w(1) y(0)+w(0) off +z(1) off +z(0) high low 0 z z clk3 clk2 v2 + c2 r2+ v2 d+ v2 c2 + r 2 c2 + i + w1 low high 1 d+v1 i+ w1 i+ v1+ w2 anytime q 106 −1 −1 Fig. 6. Examples showing more complex behavior. In all the maximum strand displacement rate constant max ¼ M s and the initial concentration of C 10 μ – auxiliary species max ¼ M. See Figs. S4 and S5 and SI Text for the rate constants used in A D, as well as the details of C and D. Plots show the ideal CRN (Dashed lines) and the DNA reactions (Solid lines).

Soloveichik et al. PNAS Early Edition ∣ 5of6 can implement arbitrary ordinary differential equations, analog specific strand displacement reactions, suggesting that it could and digital circuits, Turing-universal behavior, and—in the limit be adapted for other molecular substrates, such as RNA (11) of small signal species concentrations and finite reaction volume or even peptides (17). —stochastic CRNs (47, 48). It is perhaps surprising that a simple Many important chemical behaviors, such as pattern formation primitive such as sequence-specific strand displacement proves and self-assembly, involve geometric aspects that are beyond the sufficient for the implementation of arbitrarily complex chemical scope of well-stirred reactions considered here. However, with a reaction kinetics, somewhat akin to how a few basic electronic slight extension of our approach—controlling the relative diffu- elements such as transistors and wires are sufficient for the con- — struction of arbitrarily complex logic circuits. sion rates of signal species it would be possible to implement Our construction has several notable features for future mo- arbitrary reaction-diffusion systems (1). Though more challen- lecular programming efforts. First, the signal species, entirely un- ging, regulating molecular assembly and disassembly steps with reactive by themselves, react only when immersed in a complex DNA signal species would allow for qualitatively more powerful buffer of auxiliary complexes. Thus, when implementing a system, computational systems (50) and systems capable of movement one focusses on designing the buffer rather than the signal and construction of molecular-scale objects. We hope that the species, which stay the same regardless of the system. Our con- methods presented here provide a framework for successfully struction also highlights the rich dynamical behavior possible in generalizing and unifying the existing experimental achievements multistranded nucleic-acid systems even without enzymes and in this direction (36, 37, 51). covalent bond modification. Furthermore, established techniques for interfacing DNA to other chemistries (e.g., ref. 49) can enable ACKNOWLEDGMENTS. We thank L. Cardelli, D. Dotty, L. Qian, D. Zhang, programmable DNA reaction networks to respond to or control J. Schaeffer, and M. Magnasco for useful discussions. This work was supported non-nucleic-acid systems, such as complex chemical synthesis by National Science Foundation Grants EMT-0728703 and CCF-0832824 and pathways, biomedical diagnostics in vitro, or cellular functions Human Frontier Science Program Award RGY0074/2006-C.D.S. D.S. was sup- in vivo. Finally, our construction and variants thereof (48) rely ported by the CIFellows project. G.S. was supported by the Swiss National exclusively on a very simple molecular primitive, sequence- Science Foundation and a Burroughs Wellcome Fund CASI award.

1. Epstein IR, Pojman JA (1998) An Introduction to Nonlinear Chemical Dynamics: Oscilla- 28. Korzuhin M (1967) Oscillatory Processes in Biological and Chemical Systems (Nauka, tions, Waves, Patterns, and Chaos (Oxford Univ Press, London). Moscow), (in Russian), pp 231–251. 2. Arkin A, Ross J (1994) Computational functions in biochemical reaction networks. Bio- 29. Morrison L, Stols L (1993) Sensitive fluorescence-based thermodynamic and kinetic phys J 67:560–578. measurements of DNA hybridization in solution. Biochemistry 32:3095–3104. 3. Tyson J, Chen K, Novak B (2003) Sniffers, buzzers, toggles and blinkers: Dynamics of 30. Green C, Tibbetts C (1981) Reassociation rate limited displacement of DNA strands by regulatory and signaling pathways in the cell. Curr Opin Cell Biol 15:221–231. branch migration. Nucleic Acids Res 9:1905–1918. 4. Wolf D, Arkin A (2003) Motifs, modules and games in bacteria. Curr Opin Microbiol 31. Weinstock PH, Wetmur JG (1990) Branch capture reactions: Effect of recipient struc- 6:125–134. ture. Nucleic Acids Res 18:4207–4213. 5. Liekens AML, Fernando CT (2007) Turing Complete Catalytic Particle Computers, 32. Panyutin I, Hsieh P (1993) Formation of a single base mismatch impedes spontaneous Lecture Notes in Computer Science (Springer, Berlin), Vol 4648, pp 1202–1211. DNA branch migration. J Mol Biol 230:413–424. 6. Soloveichik D, Cook M, Winfree E, Bruck J (2008) Computation with finite stochastic 33. Yurke B, Turberfield AJ, Mills AP, Jr, Simmel FC, Neumann J (2000) A DNA-fuelled mo- – chemical reaction networks. Nat Comp 7:615 633. lecular machine made of DNA. Nature 406:605–608. 7. Cook M, Soloveichik D, Winfree E, Bruck J (2009) Algorithmic Bioprocesses (Springer, 34. Seelig G, Soloveichik D, Zhang DY, Winfree E (2006) Enzyme-free nucleic acid logic – Berlin), pp 543 584. circuits. Science 314:1585–1588. 8. Fett B, Bruck J, Riedel M (2007) Synthesizing Stochasticity in Biochemical Systems (As- 35. Zhang DY, Turberfield AJ, Yurke B, Winfree E (2007) Engineering entropy-driven re- – sociation for Computing Machinery, New York), pp 640 645. actions and networks catalyzed by DNA. Science 318:1121–1125. 9. Cardelli L (2008) From processes to ODEs by chemistry. TCS 2008, IFIP International 36. Yin P, Choi H, Calvert C, Pierce N (2008) Programming biomolecular self-assembly path- – Federation for Information Processing, (Springer, Boston), Vol 273, pp 261 281. ways. Nature 451:318–322. 10. Andrianantoandro E, Basu S, Karig DK, Weiss R (2006) Synthetic biology: New engi- 37. Bath J, Turberfield A (2007) DNA nanomachines. Nat Nanotechnol 2:275–284. neering rules for an emerging discipline. Mol Syst Biol 2:2006.0028. 38. Yurke B, Mills AP (2003) Using DNA to power nanostructures. Genet Program Evol M 11. Win M, Liang J, Smolke C (2009) Frameworks for programming biological function 4:111–122. through RNA parts and devices. Chem Biol 16:298–310. 39. Zhang DY, Winfree E (2009) Control of DNA strand displacement kinetics using toehold 12. Keasling J (2008) Synthetic biology for synthetic chemistry. ACS Chem Biol 3:64–76. exchange. J Am Chem Soc 131:17303–17314. 13. Levskaya A, et al. (2005) Synthetic biology: Engineering Escherichia coli to see light. 40. Reynaldo LP, Vologodskii AV, Neri BP, Lyamichev VI (2000) The kinetics of oligonucleo- Nature 438:441–442. tide replacements. J Mol Biol 297:511–520. 14. Noireaux V, Bar-Ziv R, Libchaber A (2003) Principles of cell-free genetic circuit assembly. 41. Panyutin I, Hsieh P (1994) The kinetics of spontaneous DNA branch migration. Proc Proc Natl Acad Sci USA 100:12672–12677. Natl Acad Sci USA 91:2021–2025. 15. Ran T, Kaplan S, Shapiro E (2009) Molecular implementation of simple logic programs. 42. Khalil HK (1996) Nonlinear Systems (Prentice Hall, Englewood Cliffs, NJ). Nat Nanotechnol 4:642–648. 43. Field RJ, Noyes RM (1974) Oscillations in chemical systems. IV. Limit cycle behavior in a 16. Benner S, Sismour A (2005) Synthetic biology. Nat Rev Genet 6:533–543. model of a real chemical reaction. J Chem Phys 60:1877–1884. 17. Ashkenasy G, Ghadiri M (2004) Boolean logic functions of a synthetic peptide network. 44. Willamowski KD, Rössler OE (1980) Irregular oscillations in a realistic abstract quadratic J Am Chem Soc 126:11140–11141. – 18. Magnasco MO (1997) Chemical kinetics is Turing universal. Phys Rev Lett 78:1190–1193. mass action system. Z Naturforsch A 35:317 318. – 19. Simpson Z, Tsai T, Nguyen N, Chen X, Ellington A (2009) Modelling amorphous com- 45. Marcovitz A (2004) Introduction to Logic Design (McGraw Hill, New York), 2nd Ed, pp – putations with transcription networks. J R Soc Interface 6:S523–S533. 369 441. 20. Hjelmfelt A, Weinberger ED, Ross J (1991) Chemical implementation of neural net- 46. Qian L, Winfree E (2009) A simple DNA gate motif for synthesizing large-scale circuits. works and Turing machines. Proc Natl Acad Sci USA 88:10983–10987. DNA Computing 14, Lecture Notes in Computer Science (Springer, Berlin), Vol 5347, pp – 21. Buchler NE, Gerland U, Hwa T (2003) On schemes of combinatorial transcription logic. 70 89. Proc Natl Acad Sci USA 100:5136–5141. 47. Gillespie D, et al. (1977) Exact stochastic simulation of coupled chemical reactions. – 22. Kim J, Hopfield JJ, Winfree E (2004) Neural Network Computation by in Vitro Tran- J Phys Chem 81:2340 2361. scriptional Circuits. Neural Information Processing Systems, ed Saul LK (MIT Press, Cam- 48. Cardelli L (2009) Strand algebras for DNA computing. DNA Computing 15, Lecture bridge, MA), Vol 17, pp 681–688. Notes in Computer Science (Springer, Berlin), Vol 5877, pp 12–24. 23. Okamoto M, Sakai T, Hayashi K (1987) Switching mechanism of a cyclic enzyme system: 49. Gartner Z, et al. (2004) DNA-templated organic synthesis and selection of a library of Role as a “chemical diode”. Biosystems 21:1–11. macrocycles. Science 305:1601–1605. 24. Hjelmfelt A, Weinberger ED, Ross J (1992) Chemical implementation of finite-state 50. Bennett C (1982) The thermodynamics of computation—A review. Int J Theor Phys machines. Proc Natl Acad Sci USA 89:383–387. 21:905–940. 25. Horn F, Jackson R (1972) General mass action kinetics. Arch Ration Mech An 47:81–116. 51. Rothemund P, Papadakis N, Winfree E (2004) Algorithmic self-assembly of DNA 26. Érdi P, Tóth J (1989) Mathematical Models of Chemical Reactions: Theory and Applica- sierpinski triangles. PLoS Biol 2:e424. tions of Deterministic and Stochastic Models (Manchester Univ Press, Manchester). 52. Soloveichik D, Seelig G, Winfree E (2009) DNA as a Universal Substrate for Chemical 27. Klonowski W (1983) Simplifying principles for chemical and enzyme reaction kinetics. Kinetics (Extended Abstract). DNA Computing 14, Lecture Notes in Computer Science Biophys Chem 18:73–87. (Springer, Berlin), Vol 5347, pp 57–69.

6of6 ∣ www.pnas.org/cgi/doi/10.1073/pnas.0909380107 Soloveichik et al.