<<

DNA Computing Based on Splicing

Universality Results

Erzseb et CSUHAJVARJU

Computer and Automation Institute Hungarian Academy of Sciences

Kende u Budap est Hungary



Rudolf FREUND

Institute for Languages Technical UniversityWien

Resselgasse Wien Austria



Lila KARI

Department of Mathematics and

UniversityofWestern Ontario London Ontario NA B Canada



Gheorghe PAUN

Institute of Mathematics of the Romanian Academy

PO Box Bucuresti Romania

Abstract The pap er extends some of the most recently obtained results on

the computational universality of sp ecic variants of H systems e g with

regular sets of rules and proves that we can construct universal

based on various typ es of H systems with a nite set of splicing rules as well

as a nite set of axioms i e weshow the theoretical p ossibility to design

programmable universal DNA computers based on the splicing op eration For

H systems working in the multiset style where the numb ers of copies of all

available strings are counted we elab orate howa computing

a partial recursive function can b e simulated by an equivalent H system com

puting the same function in that way from a universal Turing machine we

obtain a universal H system

Considering H systems as language generating devices wehav e to add vari

ous simple control mechanisms checking the presenceabsence of certain sym

b ols in the spliced strings to systems with a nite set of splicing rules as well

as with a nite set of axioms in order to obtain the full computational p ower

i e to get a characterization of the family of recursively enumerable languages

We also intro duce test tub e systems where several H systems work in parallel

Research supp orted by grant T of the Hungarian Scientic ResearchFund

OTKA



All corresp ondence to this author



Research supp orted by grants OGP and OGP of the National Science

and Engineering Research Council of Canada



Research supp orted by the Academy of Finland pro ject

in their tub es and from time to time the contents of each tub e are redistributed

to all tub es according to certain separation conditions By the construction of

universal test tub e systems weshow that also such systems could serveasthe

theoretical basis for the development of biological DNA computers

Keywords DNA splicing grammar systems H systems test tub es Turing

machines universal computing

Intro duction

One of the recently intro duced paradigms which promises to have a tremendous

inuence on the theoretical and practical progress of computer science is DNA

computing One main step in making it so interesting was the announcement

of solving a small instance of the Hamiltonian path problem in a test tub e

just by handling DNA sequences yet actually the universality of Adlemans

way for computing using DNA still seems to b e not settled theoretically in a

satisfactory way

Another trend in DNA computing is based on the recombinant b ehaviour

of DNA double stranded sequences under the inuence of restriction enzymes

and lygases This approach started with where the op eration of splicing

was intro duced as a mo del for this phenomenon One of the imp ortant results

in the area of H systems states that H systems with nite sets of axioms and

nite sets of splicing rules can generate only regular languages However

if we use a regular set of splicing rules of a very particular typ e a maximal

increase of the generativepower is obtained such H systems characterize

the family of recursively enumerable languages Working with innite

regular sets of splicing rules is natural from a mathematical p oint of view but

unrealistic from a practical p ointofview How to obtain H systems with both

the set of axioms and the set of splicing rules being nite but stil l being able

to reach the ful l power of Turing machines In view of the results in

w ehavetopay the reduction of the sets of rules from b eing regular to b eing

nite One way whichiswellknown from formal language theory to do

this is to regulate the use of the splicing rules by suitable control mechanisms

This idea has b een explored in where computationally universal classes of

nite H systems are obtained by asso ciating permitting or forbidding context

conditions to the splicing rules a splicing rule can b e used only when a certain

favourizing symb ol a catalyst a promotor is present in the strings to b e

spliced resp ectively when no inhibitor from a sp ecied nite set of symbols

is present

Another p owerful idea able to increase the p ower of H systems with nite

sets of axioms and nite sets of splicing rules is to countthenumb er of copies

of each used string In it is proved that extended H systems working in the

multiset style are able to characterize the recursively enumerable languages

Here we extend this theorem and its pro of in a way closer to the computing

framework i e we consider Turing machines as devices for computing partial

recursive functions The work of sucha Turing machine can b e simulated in

a natural wayby an H system for which the string originally written on the

Turing machine tap e is supp osed to app ear in only one copy whereas all the

other strings are available in arbitrarily many copies Moreover in this way

the obtained H system need not to b e extended due to the working styles

of the Turing machines we consider Hence we here nd interesting details

imp ortant from practical p oints of view

A new idea is to consider a parallel communicating architecture as in

grammar system several test tub es work in parallel splicing their con

tents communicating by redistributing their contents in a way similar to the

op eration of separating the contents of a tub e the contents of a tub e

are redistributed to all tub es according to certain sp ecied separation con

ditions Again we obtain a characterization of the recursively enumerable

languages

From the existence of universal Turing machines and from the pro ofs

of all the results mentioned ab ove for dierenttyp es of H systems and for

each of these typ es we obtain a way to construct a universal H system of the

corresp onding typ e This can b e interpreted as a pro of for the theoretical

p ossibility to construct universal programmable DNA computers based on the

splicing op eration

Denitions for H systems

We use the following notations V is the free monoid generated bytheal



phab et V is the empty string V V fg FINREGRE are the

families of nite regular and recursively enumerable languages resp ectively

For general formal language theory prerequisites we refer to for regulated

to and for grammar systems to

Denition A splicing scheme or H scheme isapair V R where

V is an alphab et and R V fg V fg V fg V are sp ecial symbols

not in V V is the alphabet of and R is the set of splicing rules For

x y z w V and r u u u u in R we dene x y z wifand

   r

only if x x u u x y y u u y and z x u u y w y u u x for

         

some x x y y V For a splicing scheme V R and for any language

 

we write L V

Lfz V j x y z worx y w z for some x y L r Rg

r r

S

i  i i

and we dene L L where LL and L L

i

i

L for all i

Denition An H system is a pair A where V Risa

splicing scheme and A V is the set of axioms An extended H system is a

quadruple V T R A where V R A is the underlying H system

and T V is the terminal alphab et The language generated by the extended

A T H system is dened by L

For two families of languages F F an extended H system V T R A



with A F and R F is said to b e of type F F and we denote the family

 

of languages generated by extended H systems of that typ e by EH F F



In the denitions ab ove after splicing two strings x y and obtaining two

strings z and w wemay again use x or y they are not consumed by splicing

as a term of a splicing p ossibly the second one b eing z or w moreover also

the new strings are supp osed to app ear in innitely many copies Probably

more realistic is the assumption that at least a part of the strings is available

in a limited numb er of copies This leads us to consider multisetsiesets

ts with multiplicities asso ciated to their elemen

In the style of a multiset over V is a function M V N fg

M x is the numb er of copies of x V in the multiset M All the multisets

we consider are supp osed to b e dened by recursive mappings M The set

fw V j M w g is called the supp ort of M anditisdenotedby suppM

A usual set S V is interpreted as the multiset dened by S x forx S

and S x for x S

For twomultisets M M we dene their union by M M xM x

 

M x and their dierence byM M xM x M xx V

  

provided M x M x for all x V Usuallyamultiset with nite



supp ort M is presented as a set of pairs x M x for x suppM

Denition An extended mH system is a quadruple V T R A

where V T R are as in an extended H system Denition and A is a multiset

over V

For such an mH system and twomultisets M M over V we dene



M M i there are x y z w V suchthat



i M x M y and if x y then M x

ii x x u u x y y u u y

    

z x u u y w y u u x

    

for x x y y V uu u u R

    

iii M M fx g fy g fz g fw g



At p oint iii wehave op erations with multisets

The language generated by an extended mH system is

Lfw T j w suppM for some M such that A M g

where is the reexive and transitive closure of

For two families of languages F F an extended mH system



V T R A is said to b e of typ e mF F ifsupp A F and R F

 

we denote the family of languages generated by such extended mH systems of

typ e mF F by EH mF F

 

An extended H system as in Denition can b e interpreted as an mH

system working with multisets of the form M x for all x suchthat

M x

Universal computing with mH systems

In this section weshowhow even nonextended H systems with multisets

together with suitable strategies for selecting the nal strings can simulate

arbitrary computations with Turing machines We will assume that the H

system V R A really starts to work only if one additional single

string is added to A In the sense of multisets we take exactly one copy

of this starting string whereas all the other strings in A are assumed to b e

available unb oundedly hence it is sucient to sp ecify the strings to app ear as

axioms without their common multiplicity

In this mo del wenow can consider dierent p ossibilities for selecting the

result of the computation

We takeevery string w that is contained in a sp ecial regular language

i e weuseintersection with regular languages e g T

Wetakeevery string w that cannot b e pro cessed anymorewe call such

strings terminating

We takeevery string w not in A that has reached a steady state i e

there exist rules in R that can b e applied to w but they all still yield w

again

We will use the following mo del of a deterministic Turing machine which

is equivalent to all the other mo dels app earing in literature as the mo del of a

mechanism dening computability

A deterministic Turing machine M is an tuple Q q q VTZ B

 f 

where Q is the nite set of states q is the initial state q is the nal state V

 f

is the nite alphab et of tap e symbols T V is the set of terminal symb ols

Z V is the left b oundary symb ol V V fZ g B V is the blank

   

symbol Q V Q V fL Rg is the transition function with the

following restrictions the fact p Y D q X will b e expressed bythe

relation q X p Y D

q X p Y L implies X V i e X Z

 

q Z pYD DfL Rg implies Y Z and D R

 

q X p Z R implies X Z

 

whereas q X p Y D D fL Rg implies q q

f

for all q Q fq g and all X V there is some q X p Y D in

f

The Turing machine M works on a semiinnite tap e with left b oundary

marker Z An instantaneous description of that tap e during a computation



of M is of the form Z uqvB with u v V and q Q which describ es the



situation that M is in state q the head of the Turing machine lo oks at the

rightmost symbol of Z u and the contents of the tap e is Z uv B where the

 

notation B just describ es the fact that to the rightwe nd an innite number

of blank symb ols B

The eect of a transition sp ecied by on sucha conguration Z uq v B



is dened as follows

 

Applying q X p Y L yields Z u pY v B from Z u XqvB

 

   

Applying q X p Y R yields Z u YUpv B from Z u XqUv B

 

It is wellknown that such a mo del for deterministic Turing machines as

dened ab ove is one of the general mo dels for computabilityie

for every computable partial function f T T there exists a Turing

machine M suchthat

f

if f w w T is dened M started with the initial conguration

f

Z wq B computes f w by ending up in the nal conguration

 

Z f w q B and

 f

if w T is not contained in dom f the domain of the function f

then M started with the initial conguration Z wq B never halts

f  

i e M never enters the nal state q

f f

for each alphab et T there exists a universal Turing machine M ie

UT

every Turing machine M with terminal alphab et T can b e enco ded as



a string C M fc c g in suchaway that M starting with the

 UT

initial conguration Z C M c wq B w T halts in the nal state q

   f

if and only if f w is dened where c c c are new symb ols and f

M   M

is the partial function induced bytheTuring machine M and moreover

the nal conguration is Z f w q B

 M f

Theorem Let f T T beapartial recursive function and let

M Q q q VTZ B be a deterministic Turing machine computing f

f  f 

i e for every w T starting with the initial conguration Z wq B M

  f

halts with the nal conguration Z f w q B if w domf and never halts

 f



otherwise Then we can eectively construct an H system V AR

f

which also computes f in the fol lowing way For an arbitrary w T the

Z wq Z if w domf computation with on w is initialized by the string

  f

then nal ly the string Z f w q Z wil l be derivable and also be terminating

 f

no other string can be obtainedfrom this string any more if w domf

no terminating string is derivable



Proof From M we construct in the following way V AR

f f f

with



V V fZ Z Z gfr r j r g

 

and A A A where



A fZ UpZ j for some q Q q Z pZ R U V p Qg

    

fr YUp r j r q X p Y R UXY V pq Qg



fr pY r j r q X p Y L XY V pq Qg



fZ qB Z j q Q fq gg fZ q Z g

 f  f 

A fZ qU Z j for some q Q q Z pZ R U V p Qg

     

fr XqU r j r q X p Y R UXY V pq Qg



fr Xq r j r q X p Y L XY V pq Qg



fZ qZ j q Q fq ggfZ q BZ g

 f  f 

The set R contains the following splicing rules

Z qU C Z UpZ for q Z pZ R U V

     

p q Q C V fZ g



DX q U C r YUpr for r q X p Y R UXY V



p q Q C V fZ g D V



D XqU r r YUpC for r q X p Y R UXY V



g D V p q Q C V fZ



DX q C r pY r for r q X p Y L XY V



p q Q C V fZ g D V



D Xq r r pY C for r q X p Y L XY V



p q Q C V fZ g D V



U qZ Z qB Z for q Q fq g and U V

 f

D q B Z q Z for D V

f  f 

Dq Z Z q B C for D V C fB Z g

f   f

w w for all w A

Any conguration Z uqvB in M is represented by some nite string



m

Z uq v B Z for some m in the H system The only terminating

 f

string not in A obtained by using the rules from R from an initial string

Z wq Z is the nal string Z f w q Z provided w domf b ecause of the

   f

rules in group this nal string even is the only string that cannot b e derived

any more For w domf no terminating string is derivable from the initial

string Z wq Z

 

If wewant to select the nal strings via the steady state condition in

the pro of of the preceding theorem weonlyhave to replace group by group





q Z q Z

f f

Moreover wehave to add q Z to A Then we obtain another metho d for

f

selecting the nal string conguration i e by taking exactly those strings

q Z will b e the only string in not in A that reach a steady state obviously

f

A also fullling the steady state condition

Another way to select the nal strings congurations is to apply the

regular lter fZ g T fq Z g

 f

The pro of of the preceding theorem could also b e extended in suchaway

that from the nal conguration Z f w q Z we could obtain f w itself as

 f

the nal string that cannot b e pro cessed any more

The following result proving the universality of H systems with multisets

with resp ect to computability is based on the existence of universal Turing

machines and can b e proved by using similar techniques as in the pro of of

the previous theorem

Theorem Let T beanarbitrary alphabet Then we can eectively con

struct an H system V A R with T V such that can compute

every partial recursive function f in the fol lowing way Started with the initial

string Z C M c wq Z where C M is the code of a deterministic Turing

 f   f

machine M realizing f f T T for w domf computes the termi

f

nating string Z f w q Z whereas for w domf no terminating string not

 f

in A is derivable

The result ab ove can again b e obtained for other selection strategies to o

as already elab orated after the pro of of Theorem

The generativepower of H systems

From the results proved in and in we can deduce

Theorem

EH FINFINEH F F I N RE G

for al l F with FIN F RE G

EH F I N RE GEH F F RE



for al l F F with FIN F RE RE G F RE

 

Theorem EH mF I N F I N EH mF F RE for al l fami



lies F and F such that FIN F RE and FIN F RE

 

A natural way to regulate the application of the splicing rules is to use

context conditions as in random context grammars asso ciate sets of sym

b olsstrings to rules and use a rule only when the asso ciated symb olsstrings

are present in the currently spliced strings p ermitting contexts resp ectively

when they are not present forbidden contexts These forbidden contexts can

be interpreted as inhibitors of the asso ciated rules and they can b e checked

manually as in

Denition An extended H system with permitting contexts is a qua

druple V T R A where V T A are as in Denition and R is a set of

triples we call them rules with p ermitting contexts of the form p r C C



with r u u u u where u u u u is a splicing rule over V and

     



C C are nite subsets of V For x y z w V and p R we dene



x y z w if and only if x y z w every string contained in C

p r

app ears as a substring in x and every string contained in C app ears as a



substring in y of course when C or C then this imp oses no



restriction on the use of the rule p The language generated by is dened

in the natural way and the family of languages L for V T R Aas

ab ove with A F and R having the set of strings u u u u C C in the

   

rules with p ermitting contexts in a family F is denoted by EH F cF

 

Denition An extended H system with forbidden contexts is a qua

druple V T R A where V T A are as in Denition and R is a set of

triples we call them rules with forbidden contexts of the form p r D D



with r u u u u where u u u u is a splicing rule over V and

     



D D are nite subsets of V For x y z w V and p R as ab ove



we dene x y z w if and only if x y z w no string contained

p r

in D app ears as a substring in x and no string contained in D app ears as



a substring in y of course when D or D then this imp oses no



restriction on the use of the rule p The language generated by is dened

L for V T R Aas in the natural way and the family of languages

ab ove with A F and R having the set of strings u u u u D D in the

   

rules with forbidden contexts in a family F is denoted by EH F fF

 

Theorem EH FINrFIN EH F rF RE for al l families



F F such that FIN F RE FIN F RE and for each r fc f g

 

The results stated ab ove in this section prove that the considered typ es of

H systems with a nite numb er of rules as well as a nite numberofaxiomsare

already computationally complete Yet in order to prove that programmable

computers based on splicing can b e constructed it is necessary to nd universal

H systems

Denition Given an alphab et T and two families of languages F F



a construct V TR A where V is an alphab et A F and

U U U U U U

R V fg V fg V fg V R F issaidtobea universal H system

U U 

U U U U

of typ e F F if for every V T R Aoftyp e F F there is a language

 

 

A A F and LL where V TR A A A such that

U U U U

U U

The particularizations of this denition to mH systems or to H systems with

p ermitting resp ectively forbidden contexts are obvious

Theorem For every given alphabet T there exist mH systems of

type mFIN F IN extended H systems of type FINFIN with permitting

respectively forbidden contexts that are universal for the class of mH systems

respectively the class of extended H systems with permitting respectively forbid

den contexts with the terminal alphabet T

Test tub e systems

In this section weinvestigate a new mo del for biological computers that incor

p orates basic ideas of parallel communicating grammar systems

Denition A test tube TT for short scheme of degree n n is

a construct V R F R F where V is an alphab et F V

n n i

and R V fg V fg V fg V for each i i n

i

EachpairR F is called a component of the scheme or a tube R

i i i

is the set of splicing rules of the tub e i F is the selector of the tub e i

i

The pair V R is the underlying H scheme asso ciated with the com

i i

 

p onent i of the system For arbitrary ntuples L L L L

n

n

  

L L V i n we dene L L L L if and only if

i n

i n

S

n



L F L for all i i n

j i

i j  j

In words the contents of each tub e are spliced according to the asso ciated

set of rules we pass from L to L i n and the result is re

i i

distributed among the n tub es according to the selectors F F When a

n

string b elongs to several languages F then copies of it will b e distributed to

i

all tub es i with this prop erty

Denition ATT system of degreen n is a construct

V R F R F A A

n n n

where V R F R F is the underlying TT scheme and A

n n i

i n is the set of axioms of the ith comp onent i e of tub e i Considering

the iterated application of the scheme to the initial conguration A A

n

k

i e A A by convention we obtain the language generated byaTT

n

system as the result of all computations in tub e i e

L fw V j w L for some L L with

n

o

k

L L A A for some k

n n

In the following we will restrict ourselves to very sp ecial selectors F i e

i

of the form V where the V V i n In this sense selecting the results

i

i

in the rst tub e corresp onds to the intersection with a terminal alphab et T in

V T extended H systems bycho osing

Denition Given two families of languages F and F a TT system



V R V R V A A with A F and R F for

m n i i 

m

all i i m is said to b e of typ e F F The family of all languages



generated by such TT systems is denoted by TT F F



Theorem TT FINFIN TT F F RE for al l families F



and F such that FIN F RE and FIN F RE

 

Denition Auniversal TT system for a given alphab et T is a construct

V R T R V R V A A A

U U U U nU U U nU

U nU

with the following prop erty If wetake an arbitrary TT system then there



is a set A V such that L L for the TT system



U U



T R V R V A A A A V R

U U U nU nU U U  nU U

U

Theorem For every given alphabet T there exists a universal TT

system of type FINFIN

Concluding remarks

The most signicant of the results we obtained is the existence of universal H

systems of various typ es This theoretically proves the feasibility of designing

programmable universal DNA computers where a program consists of a single

string to b e added to the axiom set of the universal computer In the particular

case of mH systems these program axioms havemultiplicity one while an

unb ounded numb er of copies of all the other axioms is available

Although in this pap er wehave shown the theoretical p ossibilityhowto

obtain universal biological computers based on the splicing op eration many

questions remain op en For instance from a theoretical p oint of view an

interesting problem is to determine the minimal numb er of test tub es needed

for a universal test tub e system with resp ect to a given alphab et From a

practical p oint of view the real implementation of suchtesttubesystems

or other universal H systems intro duced in the preceding sections is a most

challenging task that has to b e done in a joint team of theorists biologists and

practitioners of DNA computing the presentation of this pap er should also b e

seen as a starting p oint for discussions on such topics b etween formal language

theorists and collegues working in other elds

References

L M Adleman Molecular computation of solutions to combinatorial problems

Science Nov

L M Adleman On constructing a molecular computer Manuscript in circula

tion January

E Csuha jVarju J Dassow J Kelemen Gh Paun Grammar Systems A

Grammatical Approach to Distribution and Cooperation Gordon and Breach

London

E Csuha jVarju L Kari Gh Paun Test tub e distributed systems based on

splicing manuscript

J Dassow Gh Paun RegulatedRewriting in Formal Language Theory

SpringerVerlag Berlin Heidelb erg

S Eilenberg Automata Languages and Machines Vol A Academic Press

New York

R Freund L Kari Gh Paun DNA computing based on splicing The existence

of universal computers Techn Rep ort FR TU Wien Institute for

Computer Languages

T Head Formal language theory and DNA An analysis of the generative

capacity of sp ecic recombinant b ehaviors Bul l Math Biology

R J Lipton Sp eeding up computations via molecular biologyManuscript in

circulation Decemb er

Gh Paun Regular extended H systems are computationally universal J In

form Process Cybern EIK to app ear

D Pixton Regularity of splicing languages Discrete Appl Math

A M Turing On computable numb ers with an application to the Entschei

dungsproblem Proc London Math Soc Ser

A Salomaa Formal Languages Academic Press New York