Synthetic Biology: Secure Digital Storage, DNA-Based Computation and the Organic Computer
Total Page:16
File Type:pdf, Size:1020Kb
Synthetic Biology: Secure Digital Storage, DNA-based Computation and the Organic Computer Alexander R Widdel Division of Science and Mathematics University of Minnesota, Morris Morris, Minnesota, USA 56267 [email protected] ABSTRACT uses of DNA to securely encrypt data. Finally, section 5 will This paper addresses the role of synthetic biology in com- focus primarily on computing with biological components. puter science. Some applications of synthetic biology in computing include the storage of digital information in or- 2. BACKGROUND ganic material, cryptographic security via DNA-based en- To understand the fundamentals of using biology as a cryption, and biological computation. Furthermore, devel- technology, a basic comprehension of the structure and be- opment of a universal operating system for the biological havior of Deoxyribonucleic acid (DNA) and ribonucleic acid cell, meaning to simplify the process of coding for organic (RNA) is required. It is also important to note the dis- systems or bio-computers is being attempted by the Univer- tinction between genetic modification and the practice of sity of Nottingham's AUdACiOuS project. Though mean- synthetic biology. Genetic modification - splicing alternate ingful progress has been made in the field, much of the work genes into living embryos to produce organisms with novel of synthetic biology is still precursory. properties - is performed in vivo. In vivo operations are per- formed within and between complete and living organisms. 1. INTRODUCTION This differs from the practice of synthesizing DNA in vitro (in a controlled environment outside of a living organism) In the future, the hardware-software paradigm as we know for use in living organisms or as a biological component [12]. it may be complemented, and perhaps in some places sub- This section will also cover necessary background in crypto- stituted, by biological alternatives. Advancement in syn- graphic methods and systems architecture. thetic biology can only be accomplished through collabora- tive work between biologists and computer scientists. Mu- 2.1 Biology tual comprehension of the dynamic behavior of cells is also required. Throughout this paper, the practical uses of cellu- DNA is a complex molecule comprised of four elementary lar or molecular functions and procedures in storage, infor- units that can be seen as \encoding" the information con- mation security, computation and the construction of bio- tained within. Structurally, it resembles a \double-helix," computers are addressed. Molecules known as DNA and with two strands of these units (called nucleotides, or bases) RNA serve as the basis for the majority of these processes. linking together and spiraling around one another. The four The advantages, drawbacks and future prospects of the use bases are Adenine (A), Thymine (T), Cytosine (C) and Gua- of DNA for these processes will be discussed. A traditional nine (G). A links with T and C links with G in the spiral. computer must perform operations like reading or writing Ribonucleic acid or RNA is very similar to DNA, differing bits and executing logical and arithmetic operations. DNA in that it is commonly single-strand, and the base Uracil can encode bits in a way that is compatible with the way replaces Thymine, binding to Adenine. DNA can also exist computers store information. The equivalent to writing bits as a single strand. The main role of DNA molecules is the in a biological system is DNA synthesis, while the equivalent long-term storage of information. Unlike typical electronic of reading bits is DNA sequencing. storage, data that is deleted cannot be reread [6]. In order Computation with biological components will be addressed to create RNA, DNA undergoes transcription. in greater detail in this paper. Background will first be pro- 2.1.1 Transcription vided for the biological, cryptographic, and systems mate- rial. Section 3 provides a comparison between traditional During transcription, a part of the cell's\machinery"called and DNA-based storage. Section 4 addresses the potential RNA polymerase moves along the DNA strand. The appa- ratus unzips the two halves of the strand, and reads one of them, one base at a time, as it progresses. As it reads the DNA, it adds the opposite base (also supplanting Uracil for Thymine) to a \messenger" RNA (mRNA) strand. This mRNA strand with opposing bases is said to be comple- This work is licensed under the Creative Commons Attribution- mentary to the original template DNA. In order to create a Noncommercial-Share Alike 3.0 United States License. To view a copy protein, mRNA undergoes translation. of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Fran- cisco, California, 94105, USA. UMM CSci Senior Seminar Conference, December 2014 Morris, MN. A B A ⊕ B DNA CCT TCA CAT TCT AGC TGG 0 0 0 Transcription 0 1 1 RNA CCU UCA CAU UCU AGC UGG 1 0 1 1 1 0 Translation PROTEIN Pro Ser Ala His Ser Trp Table 1: ⊕ Truth Table Figure 1: Information Flow: DNA to RNA to Consider plaintext A, OTP B, and cipher-text C. Proteins C = A ⊕ B 2.1.2 Translation Notice that the cipher-text can be inverted again to retrieve the original plaintext: In the process of translation, mRNA travels to another molecular machine known as a \ribosome" which reads its C ⊕ B = A bases in groups of three. These base triplets - called codons The resulting cipher-text will have a uniform frequency dis- - determine the addition of one of 20 protein subunits called tribution, meaning that it is just as likely as any of the other amino acids to a chain that is crafted by the ribosome. A possible 26n encipherings. The pad is shared secretly with completed chain of these amino acids constitutes a complete the recipient, and is XOR'ed again with the cipher-text to protein. Once formed, the protein undergoes a folding pro- decrypt the message. It has been mathematically proven cess, forming itself into a shape which will define its func- that the OTP produces a theoretically unbreakable crypto- tion. Proteins are responsible for nearly all chemical reac- system for encrypting and decrypting data [17]. The OTP tions within cells. The information flow in biological systems achieves perfect secrecy in the sense that it is information- of \DNA to RNA to proteins" is known as the central dogma theoretically secure, meaning that the encrypted message or of genetics. cipher-text provides no information about the original plain- text. The security of OTP relies on the pad being discarded 2.1.3 Copying DNA after use. DNA is a favorable medium for application of the OTP as a small amount of DNA can store enormous pads. Duplication - also called amplification - of DNA is ac- complished using polymerase chain reaction, or PCR. This 2.3 Systems process is distinct from the operation of copying data on a This section will review the basic classical computer ar- hard drive, as potentially billions of copies of the original chitecture. Applications of systemics to synthetic biology DNA are being generated. In PCR, DNA is heated in order have led to the conception of silicon computers and their to be denatured or \unzipped" into two strands. Then, rel- engineering as a blueprint for the development of a similar atively short \oligos" which match the beginning and end of apparatus made up of biological components. The current the DNA template are attached to their matching sequence. model of the bio-computer follows the Von Neumann com- An oligo is a short strand of DNA. These oligos are called puter architecture with four units: an input/output device, primers. Finally, after being heated again, a cellular appara- an arithmetic logic unit, a control unit and wires (bus) to tus called DNA polymerase binds to each primer and works interconnect these components. Such a bio-computer could its way along its strand, adding complementary bases until operate within a living organism, observe its environment replication is complete. The newly formed strands are then and serve to control biological systems [13]. denatured and replicated repeatedly, yielding roughly one A simple arithmetic logic unit (as seen in Figure 2) can million copies after 20 cycles, and one billion after 30. PCR take in two control bits, S and S , which can encode four op- takes approximately 2 hours to complete. Approximately 1 1 2 erations for the unit to perform. The input S S will deter- in 10,000 bases are duplicated incorrectly per cycle. Errors 1 2 mine which operation will be performed on two 8-bit inputs. which occur early in the process will accumulate, negatively Consider control bits S S = 10 and input bits 10111010 and impacting its fidelity. For comparison, data transfer in a 1 2 00010110. Control bits 10 code for the not operation, which typical 7200 RPM desktop HDD, as of 2010, rates up to will consider only the first input bitstream. The not opera- 1030 Mbit/s. tion simply flips every bit in the bitstream. Here, the final output would be 01000101. This ALU would be sufficient 2.2 The One Time Pad for performing addition and subtraction on integers. Section 5 will include details on building a biological ALU. Cryptography is concerned with the secure encryption of data into a form unreadable to any party without access to the key which decrypts it. Encryption is the process of 3. TRADITIONAL VS ORGANIC STORAGE scrambling the original message (plain-text) message into an A lot of attention has been drawn in recent research ef- encrypted message (cipher-text). One method of encryption forts to next-generation storage media [5][6]. Efficient long is called the One Time Pad or OTP.