DNA Chemical Reaction Network Design Synthesis and Compilation
University of New Mexico
UNM Digital Repository

Computer Science ETDs
Engineering ETDs

12-1-2014

DNA Chemical Reaction Network Design Synthesis and Compilation

M. Leigh Fanning

Follow this and additional works at: https://digitalrepository.unm.edu/cs_etds

Recommended Citation
Fanning, M. Leigh. "DNA Chemical Reaction Network Design Synthesis and Compilation." (2014). https://digitalrepository.unm.edu/cs_etds/48

This Dissertation is brought to you for free and open access by the Engineering ETDs at UNM Digital Repository. It has been accepted for inclusion in Computer Science ETDs by an authorized administrator of UNM Digital Repository. For more information, please contact [email protected].

Candidate

Department

This dissertation is approved, and it is acceptable in quality and form for publication:

Approved by the Dissertation Committee:

, Chairperson

DNA Chemical Reaction Network Design Synthesis and Compilation

by

M. Leigh Fanning

B.S., Engineering Physics, University of Colorado
M.S., Computer Science, University of New Mexico

DISSERTATION

Submitted in Partial Fulfillment of the Requirements for the Degree of

Doctor of Philosophy
Computer Science

The University of New Mexico
Albuquerque, New Mexico

December, 2014

Acknowledgments

Advisor and Dissertation Supervisor
Darko Stefanovic, Associate Professor of Computer Science, University of New Mexico

Committee Member
Shuang Luan, Associate Professor of Computer Science, University of New Mexico

Committee Member
George Luger, Professor of Computer Science, University of New Mexico

Committee Member
Christof Teuscher, Professor of Computer Science, Portland State University

This work was supported by National Science Foundation grants EMT-0829881, CCF-0829793, 1027877, and 1028238, scholarships from Intel Corporation, and the UNM Charlotte and William Kraft graduate fellowship.

DNA Chemical Reaction Network Design Synthesis and Compilation

by

M. Leigh Fanning

B.S., Engineering Physics, University of Colorado
M.S., Computer Science, University of New Mexico
Ph.D., Computer Science, University of New Mexico

Abstract

The advantages of biomolecular computing include 1) the ability to interface with, monitor, and intelligently protect and maintain the functionality of living systems, 2) the ability to create computational devices with minimal energy needs and hazardous waste production during manufacture and lifecycle, 3) the ability to store large amounts of information for extremely long time periods, and 4) the ability to create computation analogous to human brain function. To realize these advantages over electronics, biomolecular computing is at a watershed moment in its evolution. Computing with entire molecules presents different challenges and requirements than computing just with electric charge. These challenges have led to ad-hoc design and programming methods with high development costs and limited device performance. At the present time, device building entails complete low-level detail immersion. We address these shortcomings by creation of a systems engineering process for building and programming DNA-based computing devices.

Contributions of this thesis include numeric abstractions for nucleic acid sequence and secondary structure, and a set of algorithms which employ these abstractions. The abstractions and algorithms have been implemented into three artifacts: DNADL, a design description language; Pyxis, a molecular compiler and design toolset; and KCA, a simulation of DNA kinetics using a cellular automaton discretization. Our methods are applicable to other DNA nanotechnology constructions and may serve in the development of a full DNA computing model.

Contents

1 Thesis Introduction and Central Question
  1.1 The Central Challenge for Molecular Computing
  1.2 Thesis Question
  1.3 Platform, Programming, Achievable Computation
    1.3.1 Deoxyribozyme Computing Platform
    1.3.2 Deoxyribozyme Computing Components
    1.3.3 Platform Programming and Instantiation
    1.3.4 Contrasts with electronic computing
  1.4 Related Work

2 Abstractions Development
  2.1 DNA Computing Model Foundations
  2.2 Physical-Level Entities and Interactions
    2.2.1 Entities
    2.2.2 Interactions
  2.3 Abstractions
    2.3.1 Sequence Abstractions
    2.3.2 Structure Abstraction
    2.3.3 Abstracting Reactions

3 Algorithms
  3.1 Sequence Algorithms
    3.1.1 FindAllHybrids
    3.1.2 Generate Separated DNA Oligonucleotides
  3.2 Structure Algorithms
    3.2.1 ISO/Dot-Parenthesis Conversion
    3.2.2 Shape Inference
  3.3 Binding Characterization
    3.3.1 Exhaustive Generation Method For All Possible Structures

4 Implementation
  4.1 DNADL: DNA Description Language
    4.1.1 Description Levels
    4.1.2 Types and Identifiers
    4.1.3 Four Layer Cascade Example
    4.1.4 Deoxyribozyme Logic Gates
    4.1.5 MAYAII revisited
  4.2 Examining Cross-Talk Using The Kinetic Cellular Automaton (KCA) Simulation
    4.2.1 DNA Chemistry
    4.2.2 Simulation and Modeling Approaches
    4.2.3 KCA Implementation
  4.3 Pyxis
    4.3.1 Compiling DNA Systems
    4.3.2 Pyxis Features

5 Conclusion
  5.1 Contributions of Thesis

Appendices

A Deoxyribozyme Gate Catalog
  A.1 Gate Catalog
    A.1.1 Gate Schematics
    A.1.2 Gate Sequence Specifications
    A.1.3 Gate Structure Specification
    A.1.4 Gate Stem-Loop Specification
    A.1.5 Gate-Input Binding Structure Specification
    A.1.6 Gate-Input Binding Stem-Loop Specification
    A.1.7 Gate-Substrate Structure Specification
  A.2 Reactions
    A.2.1 Reactions for gates with only positive inputs.
    A.2.2 Reactions for a gate with a single negative input.
    A.2.3 Reactions for gates with positive inputs and a single negative input.
  A.3 Logic Examples
    A.3.1 Adders
    A.3.2 MAYA2
    A.3.3 Sensor Platform

B Structure Shape and Binding Inference Report
  B.1 Inference report for multibranch structure.

C Four Layer Cascade DNADL File
  C.1 Four Layer Cascade DDL File
  C.2 Four Layer Cascade Diagrams
  C.3 Maya II DDL File

D Deoxyribozyme Gate Evaluation Rules

References

Chapter 1

Thesis Introduction and Central Question

Engineering is the creative application of scientific principles to design or develop structures, machines, apparatus, or manufacturing processes, or works utilizing them singly or in combination; or to construct or operate the same with full cognizance of their design; or to forecast their behavior under specific operating conditions; all as respects an intended function, economics or operation and safety to life and property.
– American Engineer’s Council for Professional Development

Whereas the physical sciences have benefitted from the reductionist approach in understanding phenomena, the life sciences are distinctly more complex and challenging to distill. It is therefore not surprising that computing technologies built using reductionist physical principles occurred first. Babbage’s Difference and Analytical Engines were advanced in the age of the steam engine [24]. Turing and Newman’s Colossus and ENIAC were advanced at the inception of the age of electronics, an age we have not yet left. We are just now arriving at the biological age, where we can manipulate the building blocks of life to compute. This age, anticipated by Turing as unorganized systems and Feynman as infinitesimal machines, will irrevocably integrate computing into the fabric of humanity.

Molecular computing as a field has achieved demonstration of solving small instances of computationally hard problems.
For a final design, materials costs are low, yet up-front development and testing costs are high. Worse, the spectacular advantage held by electronic computing suggests that performance will never catch up and justify current research dollar expenditures. At the current level of technology, computing with nucleic acids, DNA or RNA, involves low-level detail iteration over long timelines. Designing individual components and then getting them to work together cohesively defies engineering approaches that have sufficed for electronic computers. Additionally, although various architectures have been shown to work for certain problems, problem coverage remains sparse. Designers of necessity have chosen problems that fit their specific architectures, and the field lacks the general ability to solve an arbitrary problem on a molecular substrate. Missing are design standards, benchmarks, commonly recognized formalisms, and well-understood abstractions in the manner of the stored program computer model put forth by Eckert, Mauchly, and von Neumann [55].

These observations, however, are not made dissuasively. Truly new technology always suffers in comparisons, yet these comparisons dampen extreme responses. Neither wildly unrealistic promises on the part of proponents, nor instant dismissal on the part of opponents who may fear change and eventual loss in market dominance, serves a purpose in engineering. Within the current context, comparison to electronic computing and critical observation of existing development processes help us practically identify the main difficulties, and which challenges can bear fruit if solved.

1.1 The Central Challenge for Molecular Computing

Electronic computing achieved dominance through