
PERSPECTIVES

TIMELINE

The past, present and future of molecular computing

Adam J. Ruben and Laura F. Landweber

Ever since scientists discovered that conventional silicon-based computers have an upper limit in terms of speed, they have been searching for alternative media with which to solve computational problems. That search has led them, among other places, to DNA.

When most people think of a ‘DNA computer’, the first image that springs to mind is a computer-like interface with microcentrifuge tubes lined up inside the central processor and a keyboard that plugs directly into the molecule’s 5′ end. Perhaps someday such a project will become reality. At the moment, however, ‘DNA computing’ is the slightly misleading title applied to experiments in which DNA molecules have computational roles. Often sequences of about 8–20 base pairs are used to represent bits, and numerous methods have been devised to manipulate and evaluate them.

DNA is a convenient choice, as it is both self-complementary, allowing single-stranded DNA to select its own Watson–Crick complement, and can easily be copied. Also, molecular biologists have already built a toolbox for manipulating DNA, including restriction enzyme digestion, ligation, sequencing, amplification and fluorescent labelling, giving DNA a head start over alternative computational media.

This unique combination of computer science and molecular biology has fascinated the world for nearly six years, perhaps because it finally links two popular disciplines that we have always hoped would be innately linked. Much of the human body is, in theory, a flood of binary operations. We could be staring into the abyss of a science as doomed as phrenology or mesmerism. But we may be at the forefront of a new and creative technology whose implications have not even been fully mapped out, let alone realized. Fifty years from now, molecular computing may conceivably be important in our lives (who, fifty years ago, could have predicted the personal computer?), and it all began with an experiment published by a computer scientist in Science in 1994.

The Travelling Salesman Problem

In a seminal paper, Leonard Adleman¹ solved a simplified instance of a famous NP-complete computer problem called the Travelling Salesman Problem. (‘NP’ is the name given to the class of search problems for which correctness of solutions is easy to check, and ‘NP-complete’ problems are considered the hardest of the class because they require exponentially increasing amounts of time to solve.) The problem asks whether, given a set of n cities (‘vertices’) with m paths (‘edges’) connecting them, a ‘hamiltonian’ path exists that starts at a given vertex vin, passes through each vertex exactly once, and ends at vertex vout. For fairly small values of n, today’s computers easily solve this problem. However, when n becomes very large, the amount of time required to generate and check every possible solution increases exponentially, making very large calculations infeasible.

The version Adleman solved contained only seven vertices (FIG. 1), which is no remarkable feat in the world of computer science. He used a simple, brute-force algorithm: generate random paths through the graph, discard any that do not begin at vin and end at vout, discard any that do not enter exactly n vertices, and discard any that do not pass through each vertex at least once. But the use of DNA to solve this small computational problem marked the genesis of a new scientific field.

Figure 1 | An example of a seven-vertex, 13-edge graph. The ‘salesman’ starts his journey at vertex 0 (vin) and ends at vertex 6 (vout).

Figure 2 | Edge DNA forming a splint bridging two vertices. The last five nucleotides of vertex 1, and the first five of vertex 2, complement the nucleotides on vertex 1–2.

Adleman began by synthesizing a random 20-base-pair DNA oligonucleotide (20-mer) to represent each vertex, followed by another series of 20-mers to represent edges. The edge DNA had a certain built-in feature: in each 20-mer, the first ten nucleotides complemented the last ten of one vertex, and the last ten complemented the first ten of another vertex (FIG. 2). For example, a 20-base-pair edge connecting vertex one to vertex two would consist of the complements to vertex one’s last ten base pairs and vertex two’s first ten. That way, when the mixture of DNA is

NATURE REVIEWS | MOLECULAR CELL BIOLOGY VOLUME 1 | OCTOBER 2000 | 69 PERSPECTIVES

denatured and then allowed to anneal and ligate, the edge will form a splint to ‘connect’ both vertices, and T4 DNA ligase can join together all possible such combinations. It is the bulk-annealing step that allows this molecular approach to test a massive number of possibilities in parallel.

This initial denaturing/hybridization/ligation step generated over 10¹³ strands of DNA. Among these, it was probable that at least one encoded the hamiltonian path. Next, all sequences that began at vin and ended at vout were selectively amplified by the polymerase chain reaction (PCR) using primers recognizing those sequences. Any path that did not pass through exactly seven points was then eliminated by gel-purifying only the 140-base-pair product (necessarily containing seven 20-mers). Finally, to eliminate solutions that did not pass through each vertex exactly once, the product from the previous step was affinity purified by denaturing the double-stranded paths and removing the biotinylated complementary strand on magnetic beads. The first affinity target was a single-stranded 20-mer complementary to the sequence of the second vertex, so only paths that contained the second ‘city’ were retained. This process was repeated for every vertex except the first and the last (as all paths were already bounded by vin and vout PCR primers).

The presence of a DNA band at the end indicated that, for the given graph, a hamiltonian path exists. Confirmation that a path exists and ‘readout’ of the correct solution required graduated PCR on the final product. This involved a series of six different PCR reactions, each using the vin forward primer and a primer complementary to each of the other 20-mer vertices, which were then analysed in separate lanes on a gel. (So, for example, a readout showing bands of 40, x, 60, 80, 100 and 120 base pairs, where x represents the absence of a band in the second lane, would represent the path 0→1, 1→3, 3→4, 4→5, 5→6. This is not a hamiltonian path, as it misses the second vertex, and so would have been eliminated in the size-purification step.) Adleman’s path, however, did indeed pass through each vertex exactly once. Although several of these methods have been modified and improved, graduated PCR remains a simple and rapid method for readout that bypasses DNA sequencing.

As this was the first experimental demonstration of DNA computing, Adleman reflected on some possible implications. This computation required about seven days of laboratory work, so for large n such a process could prove unwieldy, unless some sort of automation or alternative algorithm could be devised. However, a typical desktop computer can execute about 10⁶ operations per second, and the fastest supercomputer can execute about 10¹², but Adleman’s molecular computation, if each ligation counts as an operation, did over 10¹⁴. Scaling up the ligation step could push the number over 10²⁰ operations per second. Furthermore, it used an extremely small quantity of energy: 2 × 10¹⁹ operations (ligations) per joule, whereas the second law of thermodynamics dictates a theoretical maximum of 34 × 10¹⁹ operations per joule. Modern supercomputers only operate at 10⁹ operations per joule¹. Finally, one bit of information can be stored in a cubic nanometre of DNA, which is about 10¹² times more efficient than existing storage media¹. So molecular computers have the potential to be far faster and more efficient than any electronics we have developed. There is, of course, a possibility for error, although in this example the thoroughness of each step reduces the chance of survival of an incorrect path. However, in larger instances of the same algorithm, errors are more likely to propagate.

Six years later, DNA computing is still in its infancy. Researchers have developed several algorithms to solve classic problems, and a handful have been tested and work. Despite the current progress, we are still a long way away from solving complex problems.

Computing on surfaces

There is an interesting and potentially useful approach now available to the field of DNA computing: computing on surfaces². This involves affixing the solution DNA strands, correct and incorrect, to a solid medium, and (in a subtractive algorithm, for example) selectively destroying the incorrect ones.

Liu et al.² used this approach to solve a small instance of an NP-complete satisfiability (SAT) problem involving Boolean logic. They began with a logical statement in four variables divided into four connected sections, or ‘clauses’. As each variable could be either 0 or 1, there existed 2⁴ = 16 possible solutions to the problem, four of which were correct.

A DNA strand corresponding to each possibility was synthesized. Each strand had the format 5′-FFFFvvvvvvvvFFFF-3′, where the eight bases labelled ‘F’ are fixed (for use in the PCR amplification step) and ‘vvvvvvvv’ is a unique octanucleotide corresponding to a different predetermined solution. The strands were bound to solid media (a maleimide-functionalized gold surface), and a fixed 21-mer spacer sequence was attached to the 5′ end in order to separate the solution sequence from the support. All DNA was single stranded.

The algorithm used at this point involved a cycle of mark–destroy–unmark operations. First the sequences encoding correct solutions were marked (by hybridization to their complements), then all unmarked (single-stranded) sequences were destroyed using Escherichia coli exonuclease I, and finally the remaining solutions were unmarked by removing their complements. This cycle was then repeated for each clause of the problem.

Figure 3 | Solving the Knight Problem. Multiplex linear PCR produces a ‘bar-code’ representing two configurations of knights on a 3 × 3 chessboard³.

Figure 4 | Five correct solutions to the Knight Problem and one incorrect solution. In the last solution, the white knight in square i (see FIG. 3 for key) threatens the knights in squares b and d.

A distinct advantage of surface computing

is the readout step. This involves a second surface, which is a DNA chip: an addressed array of surface-bound oligonucleotides. The remaining solution strands were amplified by PCR (using FFFF-region primers) and fluorescein labelled. They were then hybridized to a DNA chip on which each of the original 16 solution molecules was bound in a predetermined spot. The four solutions hybridized to their four corresponding spots, and their presence was then detected by a fluorescence intensity of 10 to 777 times the background intensity of the other 12 spots. So a lot of technology was needed to recover the four sequences out of 16, but again it proved the principle.

DNA chips allow more rapid throughput than direct sequencing, particularly in the crucial readout step. However, graduated PCR¹ has the advantage that it can not only determine which strand is present, but it also measures its length, which is necessary for certain problems. The benefits of using the chip method are also sizeable. For example, each 16-base (FFFFvvvvvvvvFFFF) oligonucleotide ‘word’ can be extended to several words in sequence. This sort of flexibility allows for easy scaling up, meaning that a much more complex, chip-based DNA computer could be designed². Another advantage is the ease of using solid-state media such as DNA chips. This allows the DNA to be easily separated from solutions, and so no column separation or nucleic acid precipitation steps are necessary. The only components bound to the chip are the DNA and anything attached to it. This greatly streamlines complex, repetitive chemical processes and, perhaps most importantly for the field of DNA computing, opens a door towards automation.

Figure 5 | DNA computing in vivo. Scrambled genes in some ciliates (microbial eukaryotes of the genus Stylonychia or Oxytricha) undergo massive rearrangement to form a functional gene in the macronucleus (top) that encodes a single protein product. Telomere repeats (yellow boxes) mark and protect the surviving ends. Dispersed coding segments 1–7 (blue boxes) become joined at their ends, analogous to the assembly of a seven-vertex path through a graph, such as the one in FIG. 1.

RNA computing

Earlier this year our laboratory demonstrated the first use of RNA to solve a computational problem³. Known as the Knight Problem, this nine-bit SAT problem (the largest molecular computation solved so far) asks, given a 3 × 3 chessboard, what configurations of knights may be placed on the board such that none threatens any others. (The knight can attack any piece placed two spaces away from it in one direction and one space in the other, that is, moving in an ‘L’ shape.) There are 94 correct solutions (out of a possible 2⁹ = 512), ranging from one solution with zero knights on the board to two solutions with five knights on the board.

We generated a ten-bit DNA data pool, with a tenth bit included in case one of the previous nine should compute less reliably (similar to having ‘spare blocks’ on computer disks). Each RNA strand in the data pool consisted of a series of ten 15-nucleotide bits (each of which can represent either 1 or 0, representing the presence or absence of a knight in this case) separated by nine 5-nucleotide spacers. The 5′ ‘prefix’ and 3′ ‘suffix’ sequences at the beginning and end permit PCR amplification and T7 RNA transcription of each library strand.

As Liu et al.² did, we separated the correct solutions from the data pool by ‘destroying’ the incorrect solutions. This time, however, the enzyme used was ribonuclease H (RNase H) because it digests RNA sequences hybridized to DNA, thus destroying specifically marked strands rather than unmarked ones. Furthermore, the use of RNase H in combination with any complementary DNA oligonucleotide is universally scalable, allowing specific digestion of many ‘words’ in parallel with a single enzyme. This flexibility in cleaving target sequences was an important incentive for choosing RNA. With DNA, computing can be done with the finite catalogue of restriction enzymes⁴, but these require specific, often palindromic, sequences and the use of several enzymes, which may not be compatible, in one pot.

Our readout involved multiplex linear PCR, a process similar to Adleman’s graduated PCR¹, except that we combined all the necessary primers in one reaction tube⁵, producing a bar-code pattern of all the amplified products in two lanes on a polyacrylamide gel (FIG. 3)³. Readouts were taken of 43 randomly sampled strands remaining at the end of the algorithm, of which 42 offered correct solutions to the Knight Problem (FIG. 4). Attention therefore focused on the forty-third solution: if it was incorrect, why had it not been cleaved by RNase H? As it turned out, the RNA strand contained an adjacent point mutation and a deletion in bit nine, preventing its hybridization to the complementary DNA strand and hence digestion by RNase H. So accurate size purification of the data pool (with a resolution of 1–2 nucleotides) would effectively eliminate this most common source of error. However, the 2.3% error (incorrect placement of one knight out of 127 on 43 boards), although not bad in a normal molecular biology experiment, is still higher than that in a computer chip. Fortunately in this type of search problem one can easily check if the recovered solution is correct; hence it is robust to small amounts of error.

Pros and cons

Molecular computing may or may not be a wave of the future, paving the way for technological advances in chemistry, computer science and biology. At its present stage of development it has several challenges.

First, the materials used, whether DNA, RNA or proteins, are not reusable¹. Whereas a silicon computer can operate indefinitely with electricity as its only input, a molecular computer would require periodic refuelling and cleaning. This illustrates a second important drawback to DNA computing: the molecular components used are generally specialized. A very different set of oligonucleotides is used to solve the Knight Problem³ than is used to find a hamiltonian path¹.

A third notable downside is the error rate. Typically, in vivo, with all the cellular machinery functioning properly, a mismatch error occurs only once every billion base pairs or so⁶. However, when DNA is used to compute in vitro, the conditions are hardly as good as those in vivo, and although the error rate may seem acceptable for other experiments, computational problems generally require higher standards. In order to rival conventional computers, all of the procedures described here would need to lower their error rates to possibly unattainable levels.
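The Knight Problem from the ‘RNA computing’ section is small enough to enumerate in software, which also illustrates why a recovered strand is easy to verify. The following is a minimal Python sketch, not the laboratory protocol: the names `SQUARES`, `attacks` and `is_peaceful` are illustrative assumptions, and the nine bits are simply taken in row-by-row order.

```python
from itertools import product

SIZE = 3
# Nine squares of the 3 x 3 board, row by row (assumed bit order).
SQUARES = [(r, c) for r in range(SIZE) for c in range(SIZE)]
KNIGHT_MOVES = {(1, 2), (2, 1), (-1, 2), (-2, 1),
                (1, -2), (2, -1), (-1, -2), (-2, -1)}


def attacks(a, b):
    """True if a knight on square a threatens square b."""
    return (b[0] - a[0], b[1] - a[1]) in KNIGHT_MOVES


def is_peaceful(bits):
    """Check one candidate board: bit = 1 means a knight is present."""
    occupied = [sq for sq, bit in zip(SQUARES, bits) if bit]
    return not any(attacks(a, b) for a in occupied for b in occupied)


# Brute force over all 2**9 = 512 boards, the software analogue of the data pool.
solutions = [bits for bits in product((0, 1), repeat=9) if is_peaceful(bits)]

print(len(solutions))                               # 94
print(sum(1 for s in solutions if sum(s) == 5))     # 2
```

Checking a single board with `is_peaceful` takes a handful of comparisons, which is the sense in which a solution read off a gel can be validated cheaply even when the search itself is error-prone.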


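For comparison with the wet-lab procedure, Adleman’s generate-and-filter search from ‘The Travelling Salesman Problem’ section can be imitated on a conventional computer. The sketch below is only an illustration under stated assumptions: the edge set `EDGES` is a hypothetical seven-vertex, 13-edge directed graph (the edges of FIG. 1 are not listed in the text), candidates are enumerated exhaustively rather than assembled at random, and the function name `hamiltonian_paths` is invented for this example.

```python
from itertools import product

# Hypothetical 7-vertex, 13-edge directed graph (NOT the edge set of FIG. 1).
EDGES = {
    (0, 1), (0, 3), (0, 6), (1, 2), (1, 3), (2, 1), (2, 3),
    (3, 2), (3, 4), (4, 1), (4, 5), (5, 2), (5, 6),
}
N_VERTICES = 7
V_IN, V_OUT = 0, 6


def hamiltonian_paths(edges, n, v_in, v_out):
    """Generate-and-filter search loosely mirroring the laboratory steps."""
    survivors = []
    # Candidate pool: every sequence of n vertices. Fixing the length at n
    # plays the role of gel-purifying the 140-base-pair product.
    for path in product(range(n), repeat=n):
        # 'Ligation': consecutive vertices must be joined by an edge.
        if any(step not in edges for step in zip(path, path[1:])):
            continue
        # 'PCR': keep only paths that start at v_in and end at v_out.
        if path[0] != v_in or path[-1] != v_out:
            continue
        # 'Affinity purification': every vertex must be present.
        if len(set(path)) != n:
            continue
        survivors.append(path)
    return survivors


paths = hamiltonian_paths(EDGES, N_VERTICES, V_IN, V_OUT)
print(paths)  # [(0, 1, 2, 3, 4, 5, 6)]
```

The loop inspects 7⁷ = 823,543 candidates one after another, whereas the ligation step forms its candidates simultaneously; the exponential growth of the pool with n is the same in both cases.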
Finally, the most apparent drawback is the time required for each computation. Whereas a simple desktop computer can solve the seven-city instance of the Travelling Salesman Problem in less than a second, Adleman took seven days¹. The use of DNA chips² or other approaches may eventually lead to automation, which would save considerable amounts of time, but fundamental DNA computing technology needs to advance far beyond its current bounds before it can be made practical.

DNA computing has its advantages, though. One is its massive parallelism: brute-force algorithms can search through quadrillions of molecules at the same time and find a correct solution, akin to in vitro selection³. Another is miniaturization. And once the procedures are under control, the raw materials cost less too. “Here’s nature’s toolbox,” commented Adleman⁷, “a bunch of little tools that are dirt cheap; you can buy a DNA strand for 100 femtocents.”

The near future

Now is an exciting time in the field of DNA computing, as there is so much that has not been tried. In June, over 120 molecular biologists, computer scientists, mathematicians and chemists from around the world gathered in Leiden⁸ to discuss the latest in DNA computing technology.

Clearly a next step is automation. McCaskill and colleagues in Germany have constructed a ‘microflow reactor’ on which they propose to solve a 20-bit satisfiability problem in an hour and a half⁸. One could also construct a microfluidic device consisting of gated channels so small that only one molecule can pass through at a time⁹, vastly improving readout⁸. And a team led by Adleman recently solved a 6-variable, 11-clause satisfiability problem using a ‘dry’ computer consisting of thin, gel-filled glass tubes⁸.

As for DNA chips, their future in DNA computing looks bright as well, because ‘universal’ DNA chips could contain every possible DNA sequence of a given length (probably about 8–12 base pairs). Hagiya and colleagues in Tokyo are finding creative uses for single-stranded DNA molecules that fold into intrastrand ‘hairpins’⁸,¹⁰. Winfree, Seeman and colleagues, responsible for construction of beautiful assemblies with DNA such as a DNA nano-cube¹¹, have proposed the assembly of even more ordered structures that show patterned algorithmic supramolecular self-assembly⁸,¹¹⁻¹³. Even a handful of mathematicians have lent a hand, proposing faster and more efficient algorithms tailored to the needs of DNA computing⁸.

Whether or not nucleic acid computers ultimately prove feasible, they have already contributed to multi-disciplinary science by causing us to question the nature of computing and to forge new links between the biological and computational sciences. For example, it has led us to focus on the nature of biological DNA computations, such as the assembly of modern genes from encrypted building-blocks in the genomes of some single-celled ciliates (FIG. 5)¹⁴. After all, our bodies already contain millions of complicated, efficient, evolved molecular computers called cells.

Adam J. Ruben and Laura F. Landweber are in the Department of Ecology and Evolutionary Biology, Princeton University, Princeton, New Jersey 08544, USA. e-mail: [email protected]

Links
FURTHER INFORMATION DNA computing: a primer | Laura Landweber’s homepage

1. Adleman, L. Molecular computation of solutions to combinatorial problems. Science 266, 1021–1023 (1994).
2. Liu, Q. et al. DNA computing on surfaces. Nature 403, 175–179 (2000).
3. Faulhammer, D., Cukras, A., Lipton, R. J. & Landweber, L. F. Molecular computation: RNA solutions to chess problems. Proc. Natl Acad. Sci. USA 97, 1385–1389 (2000).
4. Ouyang, Q., Kaplan, P. D., Liu, S. & Libchaber, A. DNA solution of the maximal clique problem. Science 278, 446–449 (1997).
5. Henegariu, O., Heerema, N. A., Dlouhy, S. R., Vance, G. H. & Vogt, P. H. Multiplex PCR: Critical parameters and step-by-step protocol. Biotechniques 23, 504–511 (1997).
6. Karp, G. Cell and Molecular Biology: Concepts and Experiments 2nd edn (John Wiley & Sons, New York, 1999).
7. Seife, C. RNA works out knight moves. Science 287, 1182–1183 (2000).
8. Condon, A. & Rozenberg, G. (eds) Prelim. Proc. 6th Int. Meet. DNA Based Computers (Leiden Univ., The Netherlands, 2000).
9. Meller, A. et al. Rapid nanopore discrimination between single polynucleotide molecules. Proc. Natl Acad. Sci. USA 97, 1079–1084 (2000).
10. Sakamoto, K. et al. Molecular computation by DNA hairpin formation. Science 288, 1223–1226 (2000).
11. Seeman, N. C. DNA engineering and its application to biotechnology. Trends Biotechnol. 17, 437–443 (2000).
12. Winfree, E. et al. in DNA Based Computers II. DIMACS Series in Discrete Mathematics and Theoretical Computer Science Vol. 44 (eds Landweber, L. F. & Baum, E. B.) (American Mathematical Soc., Providence, Rhode Island, 1999).
13. Winfree, E., Liu, F., Wenzler, L. A. & Seeman, N. C. Design and self-assembly of two-dimensional DNA crystals. Nature 394, 539–544 (1998).
14. Landweber, L. F., Kuo, T.-C. & Curtis, E. A. Evolution and assembly of an extremely scrambled gene. Proc. Natl Acad. Sci. USA 97, 3298–3303 (2000).

TIMELINE

Hayflick, his limit, and cellular ageing

Jerry W. Shay and Woodring E. Wright

Almost 40 years ago, Leonard Hayflick discovered that cultured normal human cells have limited capacity to divide, after which they become senescent, a phenomenon now known as the ‘Hayflick limit’. Hayflick’s findings were strongly challenged at the time, and continue to be questioned in a few circles, but his achievements have enabled others to make considerable progress towards understanding and manipulating the molecular mechanisms of ageing.

Figure 1 | Leonard Hayflick in 1988. (Photograph: Peter Argentine.)

To set Hayflick’s discoveries in context, we need to go back to 1881 (TIMELINE, overleaf), when the German biologist August Weismann¹ speculated that “death takes place because a worn-out tissue cannot forever renew itself, and because a capacity for increase by means of cell division is not everlasting but finite”. This concept, which was almost entirely forgotten by the time Hayflick began his work, was later challenged by the French Nobel-prize-winning surgeon Alexis Carrel, who suggested that all cells explanted in culture are immortal, and that the lack of continuous cell replication was due to ignorance on how best to cultivate the cells. Carrel’s view was based on his and Albert Ebeling’s work, done at the Rockefeller Institute in New York City, in which they claimed that chick heart fibroblasts grew con-

72 | OCTOBER 2000 | VOLUME 1 www.nature.com/reviews/molcellbio