IEEE SENSORS JOURNAL, VOL. 8, NO. 6, JUNE 2008 1011 Towards De Novo Design of Deoxyribozyme Biosensors for GMO Detection Elebeoba E. May, Member, IEEE, Patricia L. Dolan, Paul S. Crozier, Susan Brozik, and Monica Manginell

Abstract—Hybrid systems that provide a seamless interface be- properties make molecular beacons effective platforms for tween nanoscale molecular events and microsystem technologies detecting genetically modified targets in biodefense systems. enable the development of complex biological sensor systems that not only detect biomolecular threats, but also are able to determine A. Deoxyribozyme Molecular Beacons and execute a programmed response to such threats. The chal- lenge is to move beyond the current paradigm of compartmental- Molecular beacons are single-stranded izing detection, analysis, and interpretation into separate steps. We probes that form stem-loop structures. The loop contains a present methods that will enable the de novo design and develop- probe sequence that is complementary to a target sequence. ment of customizable biosensors that can exploit deoxyribozyme Traditional molecular beacons contain a fluorophore and computing (Stojanovic and Stefanovic, 2003) to concurrently per- quencher on each arm of the stem of the beacon. Fluorescence form in vitro target detection, genetically modified organism detec- is achieved by separation of the fluorophore and quencher due tion, and classification. to a conformation change that takes place following target Index Terms—Avian influenza, biosensor, deoxyribozyme, error hybridization to the loop structure [5]. However, traditional control codes, hybridization thermodynamics, molecular beacons, beacons are limited by a 1:1 (target: signal) stoichiometry, and single polymorphism. the sensitivity of the detection is linked to the amount of target present. Target amplification (PCR) is required to increase the level of sensitivity. I. INTRODUCTION Though DNA serves primarily as a carrier of the genetic YBRIDIZATION-BASED target recognition and dis- code and no made of DNA have been found in nature, Hcrimination is central to a wide variety of applications: molecular beacons comprised of single-stranded can high throughput screening, distinguishing genetically modified be engineered to perform catalytic reactions similar to those organisms (GMOs), molecular computing, differentiating bi- of and RNA. Catalytic DNAs or deoxyribozymes are synthesized in the laboratory via an in vitro iterative selection ological markers, fingerprinting a specific sensor response for process. complex systems, etc. The recognition substrate can exist in Like traditional molecular beacons, catalytic molecular solution or be immobilized onto a transducer and hybridiza- beacons also contain a stem-loop structure that undergoes a tion events can be detected optically, electrochemically, or conformational change following target hybridization to the via a mass-sensitive device [2]. The bioreceptor or probe is loop region. Unlike traditional molecular beacons, catalytic critical to the specificity of the biosensor. Although several molecular beacons are modular molecules, containing a de- single-stranded DNA sensor technologies, such as DNA mi- oxyribozyme appended to the stem-loop structure [6]. Catalytic croarrays, are widely used, molecular beacon probes are highly activity is initiated by a target DNA sequence binding to the sensitive and specific bioreceptors [2]–[4]. They can detect loop region that is distinct from the enzymatic active site. This mutations in target sequences and can be multiplexed [3]; these allosterically activates the deoxyribozyme complex to bind and cleave a labeled substrate oligonucleotide molecule, producing Manuscript received March 20, 2008; accepted March 21, 2008. Sandia is a a detectable signal. Once activated, the catalytic molecular multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin beacon will continuously cleave labeled substrate molecules. Company for the United States Department of Energys National Nuclear Secu- rity Administration under contract DEAC0494AL85000. This work was sup- Target amplification is not required since signal amplification ported by Sandia National Laboratories’ Laboratory Directed Research and De- is obtained through repeated processing of excess substrate velopment Program. E.E.M. and P.L.D. contributed equally to this work. The molecules due to the recognition and binding of a single target. associate editor coordinating the review of this paper and approving it for pub- lication was Dr. Dennis Polla. Single and multi-receptor site configurations enable the de- E. E. May is with the Department of Computational Biology, Sandia oxyribozyme to function as YES, NOT, or multi-input AND National Laboratories, Albuquerque, NM 87185-1316 USA (e-mail: gates [1]. [email protected]). P. L. Dolan, S. Brozik, and M. Manginell are with the Department of Current high-throughput DNA sensing systems rely heavily Biosensors and Nanomaterials, Sandia National Laboratories, Albuquerque, on silicon-based computing for interpretation of molecular NM 87185-0892 USA (e-mail: [email protected]; [email protected]; recognition events. Complex bioinformatics algorithms are [email protected]). P. S. Crozier is with the Department of Multiscale Computational Materials tasked with de-noising and processing of sensor output sig- and Methods, Sandia National Laboratories, Albuquerque, NM 87185-1322 nals. This approach is error prone and not easily integrated USA (e-mail: [email protected]). into emerging nanoscale sensor architectures for lab-on-chip Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. systems. Molecular computation, although impractical for real Digital Object Identifier 10.1109/JSEN.2008.923945 time computing needs, offers the ability for massively parallel

1530-437X/$25.00 © 2008 IEEE

Authorized licensed use limited to: IEEE Xplore. Downloaded on January 12, 2009 at 15:09 from IEEE Xplore. Restrictions apply. 1012 IEEE SENSORS JOURNAL, VOL. 8, NO. 6, JUNE 2008

Fig. 2. Schematic of deoxyribozyme gate .

Fig. 1. ‰iƒ (E6) Gate (DNA mfold server used) http://frontend.bioinfo.rpi. Fluorogenic assays to determine the detection and single nu- edu/applications/mfold/cgi-bin/-form1.cgi. cleotide polymorphism (SNP) differentiation capability of the gate were conducted with the FluoDia T70 Microplate Reader using black 384-well microplates. For the detection computation and detection, and is a promising technology limit experiments, a 55 detection volume containing the base for the development of next generation hybrid molecular gate (100 nM), the TAMRA-labeled substrate (2.5 ), sensor systems. In vitro computation using deoxyribozyme and varying concentrations of input DNA in reaction buffer gates enables the design of complex biosensor systems and (HEPES pH 7.0, 1 M NaCl, 2 mM ) was used. For reduces reliance on classical computing for interpretation of the SNP experiments, a 55 detection volume containing sensor response. the gate (100 nM), the TAMRA-labeled substrate In this work, we present methods for the development of a (1.0 ), and input DNA (0.2 ) in reaction buffer was used. deoxyribozyme-based computational biosensor. In Section II, The estimation of substrate turnover assays were conducted in we demonstrate the sensing capability of deoxyribozymes and 55 reaction buffer with the: 1) E6 deoxyribozyme (100 nM) present quantitative methods for predicting the response of and TAMRA-labeled substrate (2.5 )or2) gate the deoxyribozyme to fully complementary and genetically (100 nM), the TAMRA-labeled substrate (1.0 ), and input modified target probes. Results are presented and analyzed in DNA (0.2 ). Section III. We conclude by demonstrating the use of our sensor 1) Equipment, , Chemicals: All experi- platform in the de novo design and development of biosensor ments were conducted with the FluoDia T70 Microplate Reader systems for detecting genetic modifications in avian influenza (Photon Technology International, Inc.) using black 384-well followed by a summary discussion. microplates (OptiPlate-384 F, Perkin-Elmer). Oligonucleotides were custom synthesized by Integrated DNA Technologies and purified and reconstituted as follows. II. MATERIALS AND METHODS 1) The E6 deoxyribozyme was HPLC purified and stored as 100 stock solutions in RNase/DNase/Protease-free A. Deoxyribozyme Construction Assays water (Acros Organics) at . E6 sequence [8]: We constructed the (E6) deoxyribozyme gate (Fig. 1; 5-CTCTTCAGCGATGGCGAAGCCCACCCATGTTAG [6]) based upon a previously reported modular design [5] that TGA -3. appended a 15 nucleotide molecular recognition loop to the E6 2) The deoxyribozyme gate was purified by denaturing deoxyribozyme [7]. This one-input gate contains the E6 de- PAGE and stored as 100 stock solutions in oxyribozyme, a ribonucleic acid phosphodiesterase, and a stem- RNase/DNase/Protease-free water or reaction buffer loop structure attached to the 5 end of the enzymatic domain. (HEPES pH 7.0, 1 M NaCl, 2 mM )at . When the structure is in its inactive form, the stem is comple- (E6) gate [8]: 5 -CTGAAGAGTATCCGAT- mentary-bound to the substrate recognition region and inhibits TATAAGCCTCTTC-AGCGATGGCGAAGCCCACCCA hydrolysis of a dual-labeled substrate oligonucleotide. Upon hy- TGTTAGTGA-3 . bridization of a target (input) oligonucleotide complementary 3) The substrate molecule was dual-labeled with a tetram- to the loop sequence, the stem opens up and permits binding ethylrhodamine fluorophore donor (TAMRA) at the and cleavage of the substrate at the site of an embedded ribonu- 5 terminus and a Black-Hole 2 quencher ac- cleic acid (Fig. 2). Our substrate molecule is end-labeled with a ceptor at the 3 terminus. It was HPLC purified and tetramethylrhodamine fluorophore (TAMRA) at the 5 terminus stored as 250 stock solutions in RNase/DNase/Pro- and a Black-Hole 2 quencher at the 3 terminus. TAMRA tease-free water at . TAMRA substrate [8]: emission is quenched by distance-dependent fluorescence reso- . nance energy transfer (FRET) to and upon cleavage and 4) Input sequences were desalted and stored as 1 mM stock separation of the two products, fluorescence increases. solutions in reaction buffer at .

Authorized licensed use limited to: IEEE Xplore. Downloaded on January 12, 2009 at 15:09 from IEEE Xplore. Restrictions apply. MAY et al.: TOWARDS DE NOVO DESIGN OF DEOXYRIBOZYME BIOSENSORS FOR GMO DETECTION 1013

2) Estimation of Substrate Cleavage and Product Formation: The following formula [5] was used to estimate the product formed at time in

(1) where is the fluorescence emission at , is the substrate concentration at in , is the fluorescence emission at time . The formula was derived from

(2) where is the maximum fluorescence increase based on 100% turnover [5].

Fig. 3. Correlation between HyTher melting temperature data measured fluo- B. Computational Methods for Modeling and Analysis of rescence data. SNPs near the 5 and 3 ends are outliers. Deoxyribozyme Gates Biological sensor systems for defense applications must be specific for a predefined threat, the emergence of which is not easy to forecast. Therefore, the time from threat identification to the development and deployment of the threat-specific biosensor must be minimal. Rapid design and development of de novo biosensor systems require computational modeling and simulation to increase accuracy and reduce the need for experimental testing of each gate used in the system. 1) Using Hybridization Thermodynamic Data as a Predictor for Fluorescence Measurements: Since empirically characterizing the fluorescence signature of each gate can be slow and expensive, it is desirable to computationally predict the outcome of the experimental measurements. Even if im- perfect, computational predictions can be useful as qualitative guides, and can quickly and inexpensively yield large data sets for screening purposes. For a simple YES gate, since an excess amount of substrate Fig. 4. By using an empirical correction factor for the SNPs at the 5 and 3 is used, let us assume that each ribozyme molecule is either ends, much a better correlation between HyTher melting temperatures and flu- fully active or fully inactive so that the measured fluorescence orescence measurements is obtained. at a given time is directly proportional to the number of acti- vated . Let us further assume that ribozyme activa- HyTher melting temperature is adjusted by empirically fit pa- tion is directly related to the thermodynamics of the hybridiza- rameters if the SNP is one or two base pairs from either the 3 tion between the input molecules and the YES gate. Following or 5 end this logic, there should be an observable correlation between the input/gate hybridization thermodynamics and the measured flu- orescence. We demonstrate that this is indeed the case. (3) We have created an interface to automatically harvest raw hy- bridization thermodynamics data from the HyTher web server where if a SNP is present at the last from the [9]–[11]. Once collected, we compared hybridization free en- 5 end and otherwise; if a SNP is present at the ergy and melting temperature data with measured fluorescence second-to-last base pair from the 5 end and 0, otherwise; data for the gate (Fig. 7) and found reasonable correla- if a SNP is present at the last base pair from the 3 end and 0, tion, as shown in Fig. 3. Improved correlation was obtained by otherwise; if a SNP is present at the second-to-last base adding corrections for single nucleotide polymorphisms (SNPs) pair from the 3 end and 0, otherwise. The values are near the 3 and 5 ends of the hybridized sequences (Fig. 4). empirically fit parameters that are optimally chosen to give the Given a set of measured fluorescence data, an empirical rela- best fit between predicted and measured fluorescence outputs tionship between the HyTher melting temperature data and the for the given data set. measured fluorescence data can be created and used to infer flu- Initial guesses for the parameters are supplied to the algo- orescence data that has not been measured. The empirical rela- rithm for the first iteration of the fitting procedure. A least tionship that we have used here is simple but effective. First, the squares algorithm is then used to produce a straight line with

Authorized licensed use limited to: IEEE Xplore. Downloaded on January 12, 2009 at 15:09 from IEEE Xplore. Restrictions apply. 1014 IEEE SENSORS JOURNAL, VOL. 8, NO. 6, JUNE 2008 optimal correlation between the modified melt temperatures and the measured fluorescence data. The slope and intercept of the drawn line become additional fit parameters in the following equation:

(4) such that the following difference is minimized:

(5)

After selection of the optimal slope and intercept, the first it- eration is complete, new parameters are chosen for calculation of the modified melting temperature calculation, and the second iteration begins. Iteration continues until a convergence criteria is satisfied.

III. RESULTS AND ANALYSIS Fig. 5. Estimation of substrate turnover. [hete™tion volume a SS "l; (E6 "w ‰iƒ A. Deoxyribozyme Gates as Biosensor Elements deoxyribozyme (100 nM) and TAMRA substrate (2.5 )) or ( gate (100 nM), TAMRA substrate (1.0 "w), Input A and SNPs (0.2 "w)).] To determine the viability of using deoxyribozyme gates as probes for a sensor system and to demonstrate computational TABLE I elements based on deoxyribozymes, we constructed the ESTIMATE OF SUBSTRATE TURNOVER (E6) deoxyribozyme gate (Fig. 1). The E6 deoxyribozyme cleaves the of a chimeric substrate at the site of a single (rA) embedded in the middle of a molecule. It has a reported of 13 and a turnover rate of 0.039 for hydrolysis of a 15-mer oligonucleotide substrate containing a single ribonucleotide [7]. Stojanovic et al. [5], [12] demonstrated that when a single- ™@€ A a@@i i AaWi A  ™@ƒA . stranded oligonucleotide substrate was dual-end labeled with a i a uores™en™e emission at t aH. ™@ƒA a su˜str—te ™on™entr—tion t aH "w fluorescein donor and a TAMRA acceptor, cleavage by the E6 at in . i a uores™en™e emission at time t. deoxyribozyme resulted in a tenfold increase (estimated sub- strate turnover) in fluorescein emission at 520 nm. The authors found that the degree of substrate turnover varied between 2 and 30 with different substrates depending on their structures, as well as variation due to interbatch differences within the same substrates [8], [12]. We performed assays of the E6 that were based on the hydrolysis of dual-labeled substrate molecule (2.5 ) with a tetramethylrhodamine donor (TAMRA) at the 5 terminus and a Black-Hole 2 quencher acceptor at the 3 terminus. Cleavage of substrate resulted in a 14-fold increase in TAMRA fluorescence at 590 nm (excitation at 530 nm) due to separation of the fluorophore from the quencher (Fig. 5). To determine the effect of adding a stem-loop structure to the E6 deoxyribozyme, we estimated the substrate turnover of the (E6) deoxyribozyme gate (using (1)) with a full com- plement input sequence (Input A), SNPs 9, 23, and 36, and TAMRA-labeled substrate (1.0 ). The degree of substrate turnover decreased significantly, as shown in Table I and Fig. 5. Fig. 6. Fluorescence intensity of varying amounts of target DNA The kinetics of this modulated bi-reactant (input and substrate) [dete™tion volume a SS "l, ‰iƒ gate (100 nM), TAMRA substrate system warrants further investigation. (2.5 "w)]. Detection limit of 500.5 attomoles to 1.25 femtomoles (9.1 22.7 pM). To verify the detection capability of the deoxyribozyme gates, we performed fluorogenic assays with the (E6) gate. Using a 55 detection volume containing the 500.5 attomoles ( ; 9.1 pM) of target DNA gate (100 nM) and the TAMRA-labeled substrate ( ), as shown in Fig. 6. To determine (2.5 ), our detection limit, resolvable after 3 min, was how effective the (E6) gate is with regard to its ability between 1.25 femtomoles ( ; 22.7 pM) and to differentiate a perfectly complementary target sequence

Authorized licensed use limited to: IEEE Xplore. Downloaded on January 12, 2009 at 15:09 from IEEE Xplore. Restrictions apply. MAY et al.: TOWARDS DE NOVO DESIGN OF DEOXYRIBOZYME BIOSENSORS FOR GMO DETECTION 1015

B. Quantitative Analysis of Gate Response The accuracy of hybridization-based biosensors depends on how well we are able to determine if the “target” or a modi- fied form of the “target” is encountered based on the fluores- cence signature; where the terminology “target” is used in a general sense to include actual target organisms, as well as clas- sification categories. There are many statistical and non-sta- tistical quantitative tools and algorithms for de-noising, pro- cessing, and analyzing the output of sensors using a classical computing platform. However, our objective is to limit the amount of in silico postprocessing and perform concurrent detection and classification of targets. We investigate alterna- tive analysis methods that produce algorithms realizable using deoxyribozyme gates. As a point of reference, we consider the problem of classi- Fig. 7. Fluorescence Intensity of 45 SNP sequences vs. full complement target sequence [dete™tion volume a SS "l, ‰iƒ gate (100 nM), TAMRA sub- fying the 45 SNPs in the gate experiment (previously strate (1.0 "w), Input A and SNPs (0.2 "w)]. Differences in fluorescence in- discussed in Sections II-A and III-A). Using a traditional statis- € aIXSTiUI tensity between SNPs is statistically significant, (single-factor tical approach, we constructed a two and three class Bayesian ANOVA). classifier based on the output fluorescence data at an optimal time point, . For the two class system, we desig- nate an output fluorescence as either belonging to a SNP in the upstream region (position 1–7) or the downstream region (po- sitions 8–15). The correct classification rate for this classifier was 71.1%. The three class system designates a fluorescence as belonging to the 5 end (base position 1 to base position 5), the middle (base position 6 to base position 10), or the 3 end (base position 11 to base position 15). The correct classification rate for this classification system was 62.2%. If we use the average difference between the full complement and SNP sequence as our classification variable, the correct classification rate for the two class system increases slightly to 73.3%. The correct clas- sification rate for SNPs in the middle and 3 ends increase for the three class system to 66.7% and 93.3%, respectively, but the Fig. 8. Comparison between the predicted and measured output fluorescence overall correct classification rate decreases slightly to 60%. Al- values for InputA and 45 SNP-containing target sequences. though we are confident that better statistical classifiers can be constructed, it would not result in an algorithm we can easily implement in vitro. from sequences containing single base mutations, we compiled An alternative view of this classification problem is: how and compared experimental data for 45 single nucleotide can we algorithmically group sequences into correct or muta- polymorphism (SNP)-containing target sequences (Fig. 7). The tion/error-containing sets. Using ideas from the field of coding 45 sequences represent all possible single base mutations for theory, we can draw parallels between detection and classifi- each of the 15 positions in the target sequence. We were able to cation of mishybridizations and detection and classification of distinguish target from mismatched sequences within 3–5 min. error vectors for error control codes [13]. The differences in fluorescence intensity between the SNPs 1) Error Control Codes (ECCs) and Syndrome Generation: is statistically significant, i.e., (single-factor Error control codes (ECCs) are algorithms that recognize a set ANOVA). of target signals (binary bit streams in digital communications) 1) Comparison of Predicted and Experimental Fluorescence and specific variations (errors) of the reference signal set [14]. Measurements: Using methods discussed in Section II-B1, we The reference signal set is called a codebook, , and is a matrix derived an optimal parameter set for use in thermodynamics- where each row corresponds to a codeword vector. The ECC is based fluorescence prediction. For each of the 45 SNP-con- usually represented by a generator matrix, , which has a corre- taining target sequences, we predicted an output fluorescence sponding dual matrix , referred to as the parity check matrix. signal. Fig. 8 shows a comparison between the predicted and The relationship between the parity check matrix and the code- measured fluorescence values. Optimization of empirical fit pa- book is: , where represents multiplication, rameters yielded further improvement in prediction capability is the transpose of the parity check matrix, is the set of syn- up to a correlation coefficient of 0.8 when comparing predicted drome values, and equals zero if there are no errors in the and measured fluorescence. Our simulation capability will aid sequences in . Each vector, ,in represents an error vector. in our de novo design process by filtering out in silico gates with Unique errors in codewords should produce unique error vec- undesired and nonrobust functionality. tors until the number of errors exceeds the error detection limit

Authorized licensed use limited to: IEEE Xplore. Downloaded on January 12, 2009 at 15:09 from IEEE Xplore. Restrictions apply. 1016 IEEE SENSORS JOURNAL, VOL. 8, NO. 6, JUNE 2008 of the code. The transpose of is our syndrome generation al- gorithm; this can be used to detect and classify variations in .If the codebook, , represents the sequence group we want to rec- ognize, then our objective is to develop methods for finding . 2) Syndrome Generator for SNP Classification: Using the sequence data from Fig. 7, we reverse-engineered an ECC syn- drome generation algorithm that can identify a target and SNPs of the target using computed syndromes. We pose the problem as follows: Find such that

where • is composed of the correct sequence and all SNP variations of this sequence. • is the transpose of the dual matrix in systematic form [14], [15]. • The optimal maximizes the number of zeros in , while minimizing the number of zeros in [14], [15]. We convert the input DNA sequence set (InputA plus 45 SNPs) into their binary equivalence using a chemical ac- tivity-based mapping proposed by MacDonaill [16]. Once converted the binary coding rate is ( , ), where each DNA base is represented by a four bit binary sequence. This coding rate is equivalent to a rate 1/3 code, similar to the degeneracy of the genetic code. The binary sequence set serves as inputs to a genetic algorithm based error control code inverter for linear block codes [17]. The resulting syndrome generation algorithm, , has a fitness of 0.8609, where 1.0 is the maximum fitness. Using this code, we calculate the syndrome for each of the 46 input sequences in Matlab. The non-SNP input (InputA) produced an all zero syndrome, which in the coding theoretic sense indicates the absence of er- rors. The sequences with SNPs near the middle through 3 ends produced unique syndrome vectors (Fig. 9 middle and bottom plots, corresponding to SNPs in position 6 to 15). SNPs in the 5 end (Fig. 9, top plot) appear to have unique syndrome pat- terns, but they contain significantly more nonzero syndrome values. Syndrome patterns for the middle and 3 groups sepa- rate into distinct groups, consistent with the experimentally ob- served clustering in Fig. 7. Also, consistent with these observa- tions, the syndrome patterns for sequences in the 5 SNP group are dispersed throughout the syndrome vector window. We analyzed the syndrome vectors based on their Hamming distance. SNPs in the middle and 3 ends, in general, clustered around SNP base locations. If we omit the non-SNP sequence, the minimum hamming distance values for these SNPs corre- spond to SNPs that are co-located. Conversely, SNPs in position Fig. 9. Unique syndrome pattern can be used to identify location and type of one to five do not cluster according to SNP location. genetic modification. (Top to bottom) Syndromes from SNPs in the 5 , middle, The syndrome generation algorithm specified by and 3 ends of the input probe sequence. can be implemented using simple AND-NOT Boolean logic gates (combined to implement an exclusive-OR operation). Hence, it is feasible to implement using deoxyribozyme gate calculated syndrome vectors, this analysis can potentially be structures. Syndrome vectors can be used to uniquely identify used to determine a minimal set of target SNP mutants we the SNPs in the middle to 3 end and possibly the type of can use to identify region-specific mutations, where region mutation that occurred (usually the transition mutations have indicates the location of the SNP (5 , middle, or 3 ). Methods more ones in their syndrome than transversion mutations). to directly implement for mutation classification are Given the consistency between experimental observations and being explored. We have used this syndrome-based approach

Authorized licensed use limited to: IEEE Xplore. Downloaded on January 12, 2009 at 15:09 from IEEE Xplore. Restrictions apply. MAY et al.: TOWARDS DE NOVO DESIGN OF DEOXYRIBOZYME BIOSENSORS FOR GMO DETECTION 1017

Fig. 10. H5N1 exh (8-17) gate. to develop a deoxyribozyme-compliant algorithm for in vitro target detection and classification of avian influenza into geo- graphical subtypes [18].

IV. TOWARDS De novo DESIGN OF DEOXYRIBOZYME BIOSENSORS FOR GMO DETECTION Using the methods described herein, we demonstrate the de novo design and implementation of a deoxyribozyme biosensor for detection of key mutations in avian influenza.

A. Detecting Mutations in the HA and PB2 of Avian Influenza Since the outbreak of H5N1 avian influenza A in humans (Hong Kong, 1997), 319 confirmed cases and 192 deaths have been reported to World Health Organization (WHO) [19]. It is widely held that the virulence of influenza A in different organisms is caused by multiple [20]. The genome of the Fig. 11. Hemagglutinin (HA) and polymerase PB2 target sequences. H5N1 influenza virus consists of eight single-stranded RNA molecules which encode ten proteins [21]. Two proteins that are implicated in human virulence of H5N1 are hemag- Polymerase Basic Protein 2 (PB2) is one of three proteins glutinin (HA) and polymerase Basic Protein 2 (PB2). Moni- comprising the polymerase that replicates each of the eight viral toring HA and PB2 for the occurrence of mutations that increase genomic in infected cells. A mutation at position 627, a human susceptibility to H5N1 is critical in the race to avert a change from glutamic acid (Glu) to lysine (Lys), allows efficient pandemic. replication in mammalian cells and is implicated in human vir- Hemagglutinin (HA) is the major surface protein of the virus ulence [23]. which binds to sialic acid-containing receptors on host cells. We constructed a H5N1 AND (8-17) gate (a two-input Avian influenza A viruses recognize host receptors with sac- gate), (8-17), by appending two DNA recognition charides terminating in sialic acid 2,6-galactose (SAa2,6Gal), loops to the 8-17 deoxyribozyme (Fig. 10) [24], [25]. The while human influenza viruses recognize host receptors termi- 5 -end recognition loop binds a 15-base HA target sequence nating in SA 2,3-galactose (SAa2,3Gal) [11]. Any mutation containing the implicated position 192 arginine codon, while that might enable an avian H5N1 virus to bind to human-type the 3 -end recognition loop binds a 15-base PB2 target se- receptors would allow H5N1 viruses to replicate efficiently in quence containing the implicated position 627 glutamic acid humans. Mutations at positions 182, i.e., asparagine (Asn) to codon (Fig. 11). Both the HA and PB2 target sequences were lysine (Lys), and 192, i.e., glutamine (Gln) to arginine (Arg), in obtained from the Influenza Sequence Database (ISD), H5N1 hemagglutinin independently convert host-cell recognition from strain A/Viet Nam/HN30408/2005 and H5N1 strain A/Hong the avian receptor to the human receptor [22]. Kong/156/97, respectively [26].

Authorized licensed use limited to: IEEE Xplore. Downloaded on January 12, 2009 at 15:09 from IEEE Xplore. Restrictions apply. 1018 IEEE SENSORS JOURNAL, VOL. 8, NO. 6, JUNE 2008

we can potentially increase the sensitivity and specificity of the biosensor, hence increasing accuracy by increasing computation. We are investigating immobilization methods for electro- chemical detection of deoxyribozyme gate activity. The electro- chemical detection platform will produce differential currents to indicate positive or negative algorithm outcomes. This will obviate our need for a fluorescence reader, thus permitting the development of a field deployable rapid detection system.

ACKNOWLEDGMENT The authors would like to thank M. Finley (Sandia National Laboratories) for insights on avian influenza surveillance; M. Stojanovic (Columbia University), D. Stefanovic (Univer- sity of New Mexico), and J. Macdonald (Columbia University) Fig. 12. Normalized fluorescence output for the H5N1 exh (8-17) for insights on deoxyribozyme gates; M. T. Lee (University gate. of Pennsylvania School of Medicine) for contributions in the computational modeling aspects of this work.

To determine if the (8-17) gate could dif- REFERENCES ferentiate perfectly complementary normal target sequences [1] M. N. Stojanovic and D. Stefanovic, “A deoxyribozyme-based molec- ular automaton,” Nature Biotechnology, vol. 21, pp. 1069–107, 2003. from sequences containing the HA and the [2] J. Wang, “Survey and summary: From DNA biosensors to chips,” PB2 mutations, either independently or con- Nucleic Acids Res., vol. 28, no. 16, pp. 3011–3016, 2000. currently, we compared experimental data for the following [3] D. Horejsh, F. Martini, F. Poccia, G. Ippolito, A. DiCaro, and M. Capo- bianchi, “A molecular beacon, bead-based assay for the detection of nu- sequence pairs. cleic acids by flow cytometry,” Nucleic Acids Res., vol. 33, no. 2, 2005. • Normal HA/Normal PB2 Targets (Inputs HA/PB2). [4] Z.-S. Wu, J.-H. Jian, G.-L. Shen, and R.-Q. Yu, “Highly sensitive DNA • Normal HA/Mutated PB2 Targets (HA/PB2 SNP23). detection and identification: An electrochemical ap- proach based on the combined use of ligase and reverse molecular • Normal PB2/Mutated HA Targets (PB2/HA SNP1). beacon,” Human Mutation, vol. 28, no. 6, pp. 630–637, 2007. • Mutated HA/Mutated PB2 Targets (HA SNP1/PB2 [5] M. N. Stojanovic, P. de Prada, and D. W. Landry, “Catalytic molecular beacons,” Chembiochem, vol. 2, pp. 411–415, 2001. SNP23). [6] M. Zuker, “Mfold web server for nucleic acid folding and hybridization • Normal HA Target Only (Input HA Only). prediction,” Nucleic Acids Res., vol. 31, pp. 3406–3415, 2003. • Normal PB2 Target Only (Input PB2 Only). [7] R. Breaker and G. Joyce, “A DNA enzyme with Mg2+ dependent RNA phosphoesterase activity,” Chem. Biol., vol. 2, pp. 655–660, 1995. Background subtracted fluorescence rates were normalized [8] J. Macdonald, D. Stefanovic, and M. N. Stojanovic, “Solution-phase using the formula: , where is molecular-scale computation with deoxyribozyme-based logic gates and fluorescent readouts,” Methods Mol. Biol., vol. 335, no. NIL, pp. the slope of the fluorescence intensities from 10–20 min (the 343–63, 2006. most linear portion of the curve) for each experimental sample [9] N. Peyret and J. SantaLucia, Jr., HYTHER Version 1.0.. and is the slope of fluorescence intensities [10] J. SantaLucia, Jr., “A unified view of polymer, dumbbell, and oligonu- cleotide DNA nearest-neighbor thermodynamics,” Proc. Natl. Acad. of the fully complement HA and PB2 target sequences. Thus, Sci. USA, vol. 95, p. 1460, 1998. the for the fully complement target [11] N. Peyret, P. A. Seneviratne, H. T. Allawi, and J. SantaLucia, Jr., sequences is equal to 1. Results are shown in Fig. 12. “Nearest-neighbor thermodynamics and NMR of DNA sequences with internal AA, CC, GG, and TT mismatches,” Biochemistry, vol. We were able to distinguish normal targets from mismatched 38, pp. 3468–3477, 1999. sequences, i.e., the greater the total number of mismatches, the [12] M. N. Stojanovic, T. E. Mitchell, and D. Stefanovic, “Deoxyribozyme- based logic gates,” J. Amer. Chem. Soc., vol. 124, pp. 3555–3561, 2002. lower the normalized fluorescence rates. When only one target [13] E. E. May, M. A. Vouk, D. L. Bitzer, and D. I. Rosnick, “Coding theory sequence is present, fluorescence rates are significantly lower based models for protein translation initiation in prokaryotic organ- than either normal or mismatched targets. We noted that the nor- isms,” BioSystems, 2004. [14] S. Lin and D. J. Costello, Error Control Coding: 2nd Edition. Engle- malized fluorescence rate for the HA SNP1/normal PB2 target wood Cliffs, NJ: Prentice-Hall, 2004. pair was high, i.e., 0.94, compared with the rate for the PB2 [15] E. E. May, “Optimal generators for a systematic block code model of SNP23/normal HA target pair, i.e., 0.55. We are conducting fur- prokaryotic translation initiation,” in Proc. 25th Silver Anniversary Int. Conf. IEEE Eng. Med. Biol. Soc., 2003, pp. 4548–4551. ther experiments to determine the reason for this. [16] D. MacDonaill, “A parity code interpretation of nucleotide alphabet composition,” Chem. Commun., pp. 2062–2063, 2002. [17] E. E. May, M. A. Vouk, and D. L. Bitzer, “An error-control coding ONCLUSION V. C model for classification of escherichia coli K-12 ribosome binding We have developed methods that enable the de novo design sites,” IEEE EMB Mag., pp. 90–97, Jan. 2006. [18] E. E. May, P. L. Dolan, P. S. Crozier, S. M. Brozik, M. Manginell, and realization of a computational biosensor based on deoxyri- R. Polsky, J. C. Harper, J. Rawlings, and M. T. Lee, Deoxyribozyme bozymelogicgates.Thebiosensorsystemisabletodetectgenetic Biosensor: DNA-Based Intelligent Microsensors for Genetically Mod- modifications in the target and perform complex detection tasks, ified Organisms (GMO). Albuquerque, NM: Sandia National Labo- ratories, Jan. 2008. reducing reliance on classical computing for output interpreta- [19] WHO, 2007, World Health Organization [Online]. Available: http:// tion. The ability to compute differentiates this system from other www.who.int/csr/disease/avian_influenza/country/cases_tab%le_2007 _07_25/en/index.html molecular beacon based systems. By increasing the complexity [20] R. Krug, “Clues to the virulence of H5N1 viruses in humans,” Science, of the algorithm (or incorporating more computational wells), vol. 311, pp. 1562–1563, 2006.

Authorized licensed use limited to: IEEE Xplore. Downloaded on January 12, 2009 at 15:09 from IEEE Xplore. Restrictions apply. MAY et al.: TOWARDS DE NOVO DESIGN OF DEOXYRIBOZYME BIOSENSORS FOR GMO DETECTION 1019

[21] R. Lamb and R. Krug, “Orthomyxoviridae: The viruses and their Paul S. Crozier received the Ph.D. degree in chem- replication,” in Fundamental Virology, 4th ed. New York: Lippincott ical engineering from Brigham Young University, Williams and Wilkins, 2001, pp. 725–769. Provo, UT, in 2001, his dissertation on slab geometry [22] S. Yamada et al., “Haemagglutinin mutations responsible for the molecular dynamics simulations. binding of H5N1 influenza A viruses to human-type receptors,” For the past seven years, he has been working on Nature, vol. 444, pp. 378–382, 2006. molecular dynamics and related code development [23] K. Shinya, S. Hamm, M. Hatta, H. Ito, T. Ito, and Y. Kawaoka, “PB2 and applications projects at Sandia National Lab- at position 627 affects replicative efficiency, but not cell oratories, where he is currently a Senior Member tropism, of Hong Kong H5N1 influenza A viruses in mice,” Virology, of Technical Staff. He is one of the developers of vol. 320, pp. 258–266, 2004. the open-source parallel molecular dynamics code, [24] S. W. Santoro and G. F. Joyce, “A general purpose RNA-cleaving DNA LAMMPS, and served as the principal investigator of enzyme,” Proc. Natl. Acad. Sci., vol. 94, pp. 4262–4266, 1997. a code development project aimed at acceleration of biomolecular simulation. [25] J. Li, W. Zheng, A. H. Kwon, and Y. Lu, “In vitro selection and char- His research interests include biomolecular and materials science simulation acterization of a highly efficient Zn(II)-dependent RNA-cleaving de- applications, as well as molecular simulation methods development. oxyribozyme,” Nucleic Acids Res., vol. 28, pp. 481–488, 2000. [26] C. Macken, H. Lu, J. Goodman, and L. Boykin, “The value of a data- base in surveillance and vaccine selection,” in Options for the Control of Influenza IV. New York: Elsevier Science, 2001, pp. 103–106. Susan Brozik received the Ph.D. degree in analytical chemistry from Wash- ington State University, Pullman, WA, in 1994. She held postdoctoral research positions in biochemistry at Emory University Elebeoba E. May (S’99–M’02) received the Ph.D. and Los Alamos National Laboratory before joining Sandia National Laborato- degree in computer engineering from North Carolina ries in 1999, where she is currently a Principal Member of Technical Staff. Her State University, Raleigh. current research interests include development of sensitive chemical and bio- She joined Sandia National Laboratories’ Com- logical sensors applicable to CW/BW agent detection and medical diagnostics. putational Biology Department in April 2002. Her Sensor technologies include electrochemical, acoustic, optical, and cell-based research interests include the use and application sensing. This work encompasses development of nano and microelectrode ar- of information theory, coding theory, and signal rays, genetic engineering of reporter cell lines, and development of microflu- processing to the analysis of genetic regulatory idic measurement platforms. Other activities include the generation of novel mechanisms, the design and development of intel- bio-inspired materials through gene engineering, assembly of chaperonin reac- ligent biosensors, and large-scale simulation and tors for synthesis of nanomaterials, and development of catalytic nanoparticles analysis of biological pathways and systems. for signal amplification and label free detection. Dr. May has served as an Associate Editor and Reviewer for the IEEE TRANSACTIONS ON INFORMATION TECHNOLOGY in Biomedicine and as a Guest Editor for the IEEE Engineering in Medicine and Biology Magazine for the Special Issue on Communication Theory, Coding Theory, and Molecular Monica Manginell received the B.S. degree in biology from the University of Biology. She is the recipient of a 2005 NIH/NHLBI Mentored Quantitative New Mexico, Albuquerque, in 1994. Research Career Development Award and the 2003 Women of Color Research She joined Sandia National Laboratories in 1997. Her research interests Sciences and Technology Awards for Outstanding Young Scientist or Engineer. include genetic engineering of cell lines for detection of chemical and biolog- Organizational memberships include the IEEE Engineering in Medicine and ical agents and design of novel in vivo cellular recognition and amplification Biology Society, the IEEE Signal Processing Society, and the IEEE Information schemes. Other activities include biocatalysis and gene engineering for en- Theory Society. hanced enzyme function, and development of miniaturized high throughput microfluidic platforms for protein assays.

Patricia L. Dolan received the Ph.D. degree in biology from the University of New Mexico, Albuquerque. She joined Sandia National Laboratories’ Biosen- sors and Nanomaterials Department in January 2003. Her research interests include development of chem- ical and biological microsensors, biofilm detection and control in membrane-based water treatment sys- tems, and protein engineering and biocatalysis for en- hanced enzyme function. As a member of a multidis- ciplinary team, she engineered a more active glucose oxidase enzyme for use in a glucose-fueled bio micro fuel cell. Dr. Dolan was the recipient of a 2005 Employee Recognition Award for Ex- ceptional Teamwork and Creativity in developing new technologies for sustain- able micro scale power sources powered by carbohydrate fuels. She is a member of the American Society for Microbiology.

Authorized licensed use limited to: IEEE Xplore. Downloaded on January 12, 2009 at 15:09 from IEEE Xplore. Restrictions apply.