The Evolution of Cellular Computing: Nature's Solution to a Computational Problem
Total Page:16
File Type:pdf, Size:1020Kb
BioSystems 52 (1999) 3–13 www.elsevier.com/locate/biosystems The evolution of cellular computing: nature’s solution to a computational problem Laura F. Landweber a,*, Lila Kari b a Ecology and E6olutionary Biology, Princeton Uni6ersity, Princeton, NJ 08544-1003, USA b Department of Computer Science, Uni6ersity of Western Ontario, London, Ont., N6A 5B7, Canada Abstract How do cells and nature ‘compute’? They read and ‘rewrite’ DNA all the time, by processes that modify sequences at the DNA or RNA level. In 1994, Adleman’s elegant solution to a seven-city directed Hamiltonian path problem using DNA launched the new field of DNA computing, which in a few years has grown to international scope. However, unknown to this field, two ciliated protozoans of the genus Oxytricha had solved a potentially harder problem using DNA several million years earlier. The solution to this problem, which occurs during the process of gene unscrambling, represents one of nature’s ingenious solutions to the problem of the creation of genes. RNA editing, which can also be viewed as a computational process, offers a second algorithm for the construction of functional genes from encrypted pieces of the genome. © 1999 Elsevier Science Ireland Ltd. All rights reserved. Keywords: DNA computing; Scrambled gene; Molecular evolution; Ciliate; Hypotrich 1. Gene unscrambling as a computational problem germline micronucleus after sexual reproduction, during the course of development. The genomic copies of some protein-coding genes in the mi- 1.1. Introduction cronucleus of hypotrichous ciliates are obscured by the presence of intervening non-protein-coding Ciliates are a diverse group of 8000 or more DNA sequence elements (internally eliminated se- unicellular eukaryotes (nucleated cells) named for quences or IESs). These must be removed before their wisp-like covering of cilia. They possess two the assembly of a functional copy of the gene in types of nuclei: an active macronucleus (soma) and the somatic macronucleus. Furthermore, the a functionally inert micronucleus (germline), which protein-coding DNA segments (macronuclear des- contribute only to sexual reproduction. The so- tined sequences or MDSs) in Oxytricha species matically active macronucleus forms from the are sometimes present in a permuted order rela- tive to their final position in the macronuclear * Corresponding author. Tel.: +1-609-258-1947; fax: +1- copy. For example, in O. no6a, the micronuclear 609-258-1682. a E-mail addresses: lfl@princeton.edu (L.F. Landweber), copy of three genes (Actin I, -telomere binding [email protected] (L. Kari) protein, and DNA polymerase a) must be re- 0303-2647/99/$ - see front matter © 1999 Elsevier Science Ireland Ltd. All rights reserved. PII: S0303-2647(99)00027-1 4 L.F. Landweber, L. Kari / BioSystems 52 (1999) 3–13 1.4 introduces a formal model of gene unscram- bling. (Adleman’s algorithm involves the use of edge-encoding sequences as splints to connect city-encoding sequences, allowing the formation of all possible paths through the graph (Fig. 1). Afterwards, a screening process eliminates the paths that are not Hamiltonian, i.e. ones which either skip a city, enter a city twice, or do not start and end in the correct origin and final destinations.) The developing ciliate macronuclear ‘computer’ (Figs. 2 and 3) apparently relies on the informa- Fig. 1. DNA hybridization in a molecular computer. PCR primers are indicated by arrows. tion contained in short repeat sequences to act as guides in a series of homologous recombination ordered and intervening DNA sequences removed events (Table 1). These guide sequences provide in order to construct functional macronuclear the splints analogous to the edges in Adleman’s genes. Most impressively, the gene encoding DNA graph, and the process of recombination results in polymerase a (DNA pol a)inO. trifallax is linking the protein-encoding segments (MDSs, or apparently scrabled in 50 or more pieces in its ‘cities’) that belong next to each other in the final germline nucleus (Hoffman and Prescott, 1997). protein coding sequence (‘Hamiltonian path’). As Destined to unscramble its micronuclear genes by such, the unscrambling of sequences that encode putting the pieces together again, O. trifallax rou- DNA polymerase a accomplishes an astounding tinely solves a potentially complicated computa- feat of cellular computation, especially as 50-city tional problem when rewriting its genomic Hamiltonian path problems are often considered sequences to form the macronuclear copies. hard problems in computer science and present a This process of unscrambling bears a remark- formidable challenge to a biological computer. able resemblance to the DNA algorithm used by Other structural components of the ciliate chro- Adleman (1994) to solve a seven-city instance of matin presumably play a significant role, but the the Directed Hamiltonian Path problem. Section exact details of the mechanism are still unknown. Fig. 2. Overview of gene unscrambling. Dispersed coding MDSs 1–7 reassemble during macronuclear development to form the functional gene copy (top), complete with telomere addition to mark and protect both ends of the gene. L.F. Landweber, L. Kari / BioSystems 52 (1999) 3–13 5 Fig. 3. A ciliate molecular computer? Correct gene assembly in Stylonychia (inset) (Lynn and Corliss, 1991) requires the joining of many segments of DNA guided by short sequence repeats, only at the ends. Telomeres, indicated by thicker lines, mark the termini of correctly assembled gene-sized chromosomes. Note the striking similarities to DNA computations that specifically rely on pairing of short repeats at the ends of DNA fragments, as in the experiment of Adleman (1994). 1.2. The path towards unscrambling the IESs and reorder the MDSs. For example, the DNA sequence present at the junction between Typical IES excision in ciliates involves the MDS n and the downstream IES is generally the removal of short (14600 bp) A–T rich se- same as the sequence between MDS n+1 and its quences, often released as circular DNA molecules upstream IES, leading to correct ligation of (Tausta and Klobutcher, 1989). The choice of MDS n to MDS n+1 over a distance (Table 1). which sequences to remove appears to be mini- However, the presence of such short repeats (aver- mally ‘guided’ by recombination between direct age length four bp between non-scrambled MDSs, repeats of only 2–14 bp (Table 1). nine bp between scrambled MDSs (Prescott and Unscrambling is a particular type of IES re- Dubois, 1996)) suggests that although these guides moval in which the order of the MDSs in the MIC are necessary, they are certainly not sufficient to is often radically different from that in the MAC. guide accurate splicing. Hence, it is likely that the For example, in the micronuclear genome of repeats satisfy more of a structural requirement Oxytricha no6a, the MDSs of a-telomere binding for MDS splicing, and less of a role in substrate protein (a-TP) are arranged in the cryptic order recognition. Otherwise, incorrectly spliced se- 1–3–5–7–9–11–2–4–6–8–10–12–13–14 rela- quences (the results of promiscuous recombina- tive to their position in the ‘clear’ macronuclear tion) would dominate, especially in the case of sequence 1–2–3–4–5–6–7–8–9–10–11–12– very small (2–4 bp) repeats that would be present 13–14. This particular arrangement predicts a thousands of times throughout the genome. This spiral mechanism in the path of unscrambling, incorrect hybridization could be a driving force in which links odd and even segments in order (Fig. the production of newly scrambled patterns in 4; Mitcham et al., 1992). evolution. However during macronuclear develop- Homologous recombination between identical ment only unscrambled molecules that contain 5% short sequences at appropriate MDS–IES junc- and 3% telomere addition sequences would be selec- tions is implicated in the mechanism of gene tively retained in the macronucleus, ensuring that unscrambling, as it could simultaneously remove most promiscuously ordered genes would be lost. 6 L.F. Landweber, L. Kari / BioSystems 52 (1999) 3–13 Table 1 O. trifallax DNA polymerase a (data modified from Hoffman and Prescott, 1997) 5% MDS/IES junction sequence MDS3% MDS/IES junction sequence Number repeats in Mac sequence* 5% Telomere addition site 1 AGATA 8 AGATA 2 ATT * ATT 3 ATA * ATA 4 ATGATGAGTGGAAT 1 ATGATGAGTGGAAT 5 AACAGAAC 1 AACAGAAC 6 AGAAATATG 1 AGAAATATG 7 n.d. n.d. 9 TTATCATT 2 TTATCATT 10 AAAATAAT 1 AAAATAAT 11 GTTTCTTG 1 GTTTCTTG 12 ATGCAAA 1 ATGCAAA 13 TAAAATGA 1 TAAAATGA 14 AGAGGAG 1 AGAGGAG 15 TAATGATGG 1 TAATGATGG 16 ATGGTGAG 1 ATGGTGAG 17 AAAATCAA 3 AAAATCAA 18 AAAGCATGCTTG 1 AAAGCATGCTTG 19 GATTTCAAGAAAA 1 GATTTTAAGAAAA 20 GTTACTCTTG 1 GTTACTCTTG 21 GCTCAATAAAA 1 GCTCAATAAAA 22 ATCTTG 2 ATCATG 23 AAAACTT 1 AAAACTT 24 GAGAGATAGA 1 GAGAGATAGA 25 TAGTTGCTC 1 TAGTTGCTC 26 AAGCTAGATTTT 1 AAGCTAGATTTT 27 GGAGGATC 1 GGAGGATC 28 CAAGATAA 1 CAAGATAA 29 GTTCAACT 1 GTTCAACT 30 ATAAGACTTTGATGA 1 ATAAGACTTTGATGA 31 CTAATGAA 1 CTAATGAA 32 n.d. n.d 36 CTTGAGAT 1 CTTGAGAT 37 AAAGTAGTTTAG 1 AAAGTAGTTTAG 38 CACTTTCAA 1 CACTTTCAA 39 ATGAAAAATAA 1 ATGAAAAATAA 40 CCTTGGATCA 1 CCTTGGATCA 41 AAGAGTGAAT 1 AAGAGTGAAT 42 TGAACAACTTT 1 TGAACAACTTT 43 GTGCTTAG 1 GTGCTTAG 44 n.d. n.d 49 ATAAAA 4 ATAAAA 50 AT * AT 51 3% Telomere addition site * The number of occurrences of di- and trinucleotides was not determined since they would be extremely represented in any gene sequence. Note the values shown in this column only represent the number of occurrences of these sequence motifs in the macronuclear copy of the gene. There are also several occurrences of these repeats throughout the non-coding portion of the micronuclear copy of this gene as well as throughout the entire genome; each such occurrence offers the opportunity for incorrect pairing, which would lead to the production of ‘dead-end’ copies of the gene.