A Sticker Based Model for DNA Computation

Sam Roweis, Erik Winfree, Richard Burgoyne, Nickolas V. Chelyapov, Myron F. Goodman, Paul W. K. Rothemund†, Leonard M. Adleman

Laboratory for Molecular Science, University of Southern California, and Computation and Neural Systems Option, California Institute of Technology
Department of Biomedical Engineering, University of Southern California
Department of Computer Science, University of Southern California
Department of Biological Sciences, University of Southern California

May

Abstract

We introduce a new model of molecular computation that we call the sticker model. Like many previous proposals, it uses DNA strands as the physical substrate in which information is represented, and it relies on separation by hybridization as a central mechanism. However, unlike previous models, the stickers model has a random access memory that requires no strand extension and uses no enzymes. At least in theory, its materials are also reusable. The paper describes computation under the stickers model and discusses possible means for physically implementing each operation. We go on to propose a specific machine architecture for implementing the stickers model as a microprocessor-controlled parallel robotic workstation. Finally, we discuss several methods for achieving acceptable overall error rates for a computation built from error-prone basic operations.

In the course of this development, we address a number of previous general concerns about molecular computation (Smith; Hartmanis; Letters to Science). First, it is clear that general-purpose algorithms can be implemented by DNA-based computers, potentially solving a wide class of search problems. Second, we find that there are challenging problems for which only modest volumes of DNA should suffice. Third, we demonstrate that the formation and breaking of covalent bonds is not intrinsic to DNA-based computation. This means that costly and short-lived materials such as enzymes are not necessary, nor are energetically costly processes such as PCR. Fourth, we show that a single essential biotechnology, sequence-specific separation, suffices for constructing a general-purpose molecular computer. Fifth, we illustrate that separation errors can theoretically be reduced to tolerable levels by invoking a tradeoff between time, space, and error rates at the level of algorithm design. We also outline several specific ways in which this can be done and present numerical calculations of their performance. Despite these encouraging theoretical advances, we emphasize that substantial engineering challenges remain at almost all stages. The ultimate success or failure of DNA computing will certainly depend on whether these challenges can be met in laboratory investigations.

Reprint requests to roweis@cns.caltech.edu. The MATLAB code used to generate all of the figures in Section … of this paper is also available by request from roweis@cns.caltech.edu. Roweis is supported in part by the Center for Neuromorphic Systems Engineering as a part of the National Science Foundation Engineering Research Center Program under grant EEC…. He is also supported by the Natural Sciences and Engineering Research Council of Canada. Winfree is supported in part by National Institute for Mental Health (NIMH) Training Grant T… MH… and also by General Motors' Technology Research Partnerships program. Adleman, Chelyapov, and Rothemund are supported in part by grants from the National Science Foundation (CCR…) and the Sloan Foundation.

† To whom correspondence should be addressed.

Introduction

Much of the recent interest in molecular computation has been fueled by the hope that it might someday provide the means for constructing a massively parallel computational platform. Such a platform could attack problems that have resisted solution with conventional architectures. Model architectures have been proposed which suggest that DNA-based computers may be flexible enough
to tackle a wide range of problems (Adleman; Adleman; Amos; Lipton; Boneh; Beaver; Rothemund). However, fundamental issues such as the volumetric scale of materials and the fidelity of various laboratory procedures remain largely unanswered.

In this paper we introduce a new model of molecular computation that we call the sticker model. Like many previous proposals, it uses DNA strands as the physical substrate in which information is represented, and it relies on separation by hybridization as a central mechanism. However, unlike previous models, the stickers model has a random access memory that requires no strand extension and uses no enzymes. At least in theory, its materials are also reusable. The paper begins by introducing a new way of representing information in DNA. This is followed by an abstract description of the basic operations possible under this representation. Next, we discuss possible means for physically implementing each operation. We go on to propose a specific machine architecture for implementing the stickers model as a microprocessor-controlled parallel robotic workstation that employs only technologies which exist today. Finally, we discuss methods for achieving acceptable error rates from imperfect separation units.

The Stickers Model

Representation of Information

The stickers model represents a bit string using two basic groups of single-stranded DNA molecules. Consider a memory strand N bases in length, subdivided into K nonoverlapping regions that are each M bases long (thus N ≥ MK). Each region is identified with exactly one bit position during the course of the computation. Equivalently, each region corresponds to one boolean variable. We also design K different sticker strands, or simply stickers. Each sticker is M bases long and is complementary to one and only one of the K memory regions. If a sticker is annealed to its matching region on a given memory strand, then the bit corresponding to that region is on for that strand.
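The following is a minimal in-silico sketch of this encoding in Python, intended only to make the bookkeeping concrete. It rests on several assumptions that are not part of the model's definition:

- strands are strings over ACGT written 5' to 3';
- the K regions are contiguous and start at the first base, and any extra bases (when N > MK) are ignored;
- a sticker is the exact Watson-Crick reverse complement of its region;
- annealing is treated as exact string matching, with no chemistry.

The names `reverse_complement`, `make_stickers`, and `MemoryComplex` are hypothetical.

```python
# Hypothetical in-silico sketch of the sticker representation.
# Assumptions (not from the paper): strands are Python strings over "ACGT"
# written 5'->3'; the K regions are contiguous and start at index 0; a
# sticker is the exact reverse complement of its region; annealing is
# modeled as exact string matching, ignoring all physical chemistry.

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Watson-Crick reverse complement of a 5'->3' sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def make_stickers(memory_strand, k, m):
    """Return the K sticker sequences, one per M-base region."""
    if len(memory_strand) < m * k:  # the layout requires N >= MK
        raise ValueError("memory strand is shorter than M*K bases")
    stickers = [reverse_complement(memory_strand[i * m:(i + 1) * m])
                for i in range(k)]
    # Each sticker must match one and only one region.
    if len(set(stickers)) != k:
        raise ValueError("regions are not pairwise distinct")
    return stickers


class MemoryComplex:
    """A memory strand plus the regions that carry an annealed sticker."""

    def __init__(self, memory_strand, k, m):
        self.strand = memory_strand
        self.k = k
        self.stickers = make_stickers(memory_strand, k, m)
        self.annealed = set()  # indices of regions with a sticker bound

    def anneal(self, sticker):
        """Bind a sticker to the unique region it is complementary to."""
        if sticker not in self.stickers:
            raise ValueError("sticker matches no region")
        self.annealed.add(self.stickers.index(sticker))

    def bits(self):
        """Bit i is on ('1') iff region i has an annealed sticker."""
        return "".join("1" if i in self.annealed else "0"
                       for i in range(self.k))


# Usage: a 15-base strand with K = 3 regions of M = 5 bases.
complex_ = MemoryComplex("ACGTAGGCATTTCCA", k=3, m=5)
print(complex_.stickers)  # ['TACGT', 'ATGCC', 'TGGAA']
print(complex_.bits())    # 000
complex_.anneal("TACGT")
complex_.anneal("TGGAA")
print(complex_.bits())    # 101
```

After the two stickers are annealed, the complex encodes a bit string with its first and third bits on. Because annealing is stored as a set of region indices, annealing a sticker that is already bound leaves the bit string unchanged.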
If no sticker is annealed to a region, then that region's bit is off. The figure below illustrates this representation scheme.

[Figure: memory strands divided into M-base regions for bit i, bit i+1, bit i+2, and so on up to bit K, with complementary stickers. The upper complex carries no stickers (bits 0 0 0); the lower complex has stickers annealed to the first and third regions (bits 1 0 1).]

Figure: A memory strand and its associated stickers (together called a memory complex) represent a bit string. The top complex on the left has all three bits off; the bottom complex has two annealed stickers and thus two bits on.

Each memory strand, along with its annealed stickers (if any), represents one bit string. Such partial duplexes are called memory complexes. A large set of bit strings is represented by a large number of identical memory strands, each of which has stickers annealed only at the required bit positions. We call such a collection of memory complexes a tube. This differs from previous representations of information using DNA, in which the presence or absence of a particular subsequence in a strand corresponded to a particular bit being on or off (e.g., see Adleman; Lipton). In this new model, each possible bit string is represented by a unique association of memory strands and stickers. Previously, each bit string was represented by a unique molecule.

To give a feel for the numbers involved, consider breaking DES as discussed in Adleman, which is a reasonably sized problem. It might use memory strands of roughly … bases (N = …), representing … binary variables (K = …) with …-base regions (M = …). The information density in this storage scheme is 1/M bits/base, which is directly comparable to the density of previous schemes (Adleman; Boneh; Lipton). Information storage in DNA has a theoretical maximum of 2 bits/base. However, exploiting such high values in a separation-based molecular computer would require the ability to reliably separate strands using only single-base mismatches. Instead, we choose to sacrifice
information density in order to make the experimental difficulties less severe.

Operations on Sets of Strings

We now introduce several possible operations on sets of bit strings. Together, these operations turn out to be quite flexible for implementing general algorithms. The four principal operations are:

- combination of two sets of strings into one new set;
- separation of one set of strings into two new sets;
- setting the kth bit of every string in a set;
- clearing the kth bit of every string in a set.

Each of these logical set operations has a corresponding interpretation in terms of the DNA representation introduced above. Figure … summarizes these required DNA interactions.

The most basic operation is to combine two sets of bit strings into one. This produces a new set containing the multiset union of all the strings in the two input sets. In DNA, this corresponds to producing a new tube containing all the memory complexes from both input tubes, with their annealed stickers undisturbed.

A set of strings may be separated into two new sets. One contains all the original strings that have a particular bit on, and the other contains all those with that bit off. This corresponds to isolating, from the set's tube, exactly those complexes with a sticker annealed to the given bit's region. The original input set (tube) is destroyed.

To set (turn on) a particular bit in every string of a set, the sticker for that bit is annealed to the appropriate region on every complex in the set's tube. If the sticker is already annealed, it is left in place. Finally, to clear (turn off) a bit in every string of a set, the sticker
