In Search of an Evolutionary Coding Style
Total Page:16
File Type:pdf, Size:1020Kb
View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by Chalmers Research In search of an evolutionary coding style Downloaded from: https://research.chalmers.se, 2019-09-07 22:09 UTC Citation for the original published paper (version of record): Lundh, T. (2000) In search of an evolutionary coding style [Source Title missing], 00(03) N.B. When citing this work, cite the original published paper. research.chalmers.se offers the possibility of retrieving research publications produced at Chalmers University of Technology. It covers all kind of research output: articles, dissertations, conference papers, reports etc. since 2004. research.chalmers.se is administrated and maintained by Chalmers Library (article starts on next page) In search of an evolutionary coding style TorbjÄorn Lundh¤ Department of Mathematics S-412 96 GÄoteborg, Sweden [email protected] of the evolved program codes in his avida set up. He Keywords: style, code, evolution, arti¯cial life, emer- said something like this: gence, complexity, DNA, Avida, stylometry, GRASP, Halstead's complexity. \The codes that are evolved will eventually be almost totally unreadable. Things are never used only once, but two or more times. Abstract It is a kind of a `madman's' code." In the near future, all the human genes will be iden- In recent years, a lot of examples have been found ti¯ed. But understanding the functions coded in the where genes have been \reused" for di®erent purposes genes is a much harder problem. For example, by during development. As an example, take the runt using block entropy, one has that the DNA code is gene in Drosophila, which is used in sex determina- closer to a random code then written text, which in tion, segmentation and central nervous system cre- turn is less ordered then an ordinary computer code; ation. In [7] the authors write: see [17]. \As mature organisms we are composed of Instead of saying that the DNA is badly written, an astonishing array of diverse cell types| using our programming standards, we might say that all derived from a single-celled zygote. it is written in a di®erent style | an evolutionary When faced with the task of generating such style. cellular diversity in a reproducible fashion, We will suggest a way to search for such a style how has the embryo chosen to respond? Re- in a quanti¯ed manner by using an arti¯cial life pro- cent work in a number of developmental sys- gram, and by giving a de¯nition of general codes and tems has suggested that the embryo has em- a de¯nition of style for such codes. ployed two approaches. First, given ¯nite resources, the embryo has e±ciently chosen 1 Background to reutilize a limited set of proteins in di®er- ent temporal and spatial contexts to create Let us as a background cite three di®erent sources. cellular diversity. Second, the embryo has The ¯rst is J. Madox's comment on page 376 in also chosen to install molecular redundan- [11]. cies to ensure the reproducibility of these patterns from individual to individual." \In genetics for example, the task of under- standing the functions of all the 100,000 hu- How can we capture these comments about the man genes will require a much greater ef- style or quality of the computer code and the DNA, fort than that involved in their identi¯ca- in a quanti¯ed manner? Can we do that in such a tion, and by a factor 10 or more." general manner that we will be able to use analogous quality measure both for carbon{ and silicon based Chris Adami from Caltech made, in his survey talk genetic codes? at Renaissance Technologies in Stony Brook 10/27/98 about arti¯cial life, a brief remark about the quality 1.1 Plan of the paper ¤This work was done at SUNY Stony Brook under the sup- port (408003356-8) from the Swedish Natural Science Research We give a straightforward but general de¯nition of Council, NFR. a \code", a general de¯nition of the \style" of such Stony Brook IMS Preprint #2000/3 March 2000 2 TorbjornÄ Lundh codes using a given set of measures. We will also Finally I would like to thank D. Brander for play- suggest such measures of characteristic features, such ing squash with me and for trying to teach me some as some type of \madness", the robustness, and the grammar. (All the remaining language errors are nat- amount of reuse, of such codes. Finally, we will make urally completely my own fault.) some \baby{experiments" by running the avida pro- gram and analyze samples of the evolved code, gen- erate two simulated programs in the same function 2 A general code class, and eventually stylistically compare the real evolved code with these simulated codes, all of this We will give a general de¯nition of a code as a string will be done in a C++ program. Some results will be of generalized \letters" from a given alphabet, that graphically displayed at the end. when interpreted, will de¯ne a function. This inter- pretation is not unique, i.e. many di®erent codes will 1.2 Future goals produce the same function. That fact will give us a way to study classes of codes. That is, two codes are One could then more systematically run the avida in the same class if their interpretation gives equiva- program (or something similar) and analyze the lent functions. evolved codes stylistically with more realistic, and a higher number of comparasion codes. By changing 2.1 Codes parameters for the set up in avida, one might even- tually capture some common features. By comparing We will give a de¯nition of a code by using an under- the carbon based programming style and the silicon lying alphabet, and a generalized interpretor. based style, it might be possible to ¯nd some common parts that would describe the natural programming De¯nition 2.1 Let us de¯ne a code to be a ¯nite style for evolution. That would in turn help us read string of \letters" taken from an \alphabet", , A carbon based code. And it would also give us some k Code = ®i ; where ®i ; hints how to create more robust computer programs, f gi=1 2 A by looking at how nature has solved such problems. such that when the code is interpreted, the code will represent a well de¯ned function (or a process), 1.3 Questions Codej fj , with a domain Df such that for all in- ! j Is there an existing way to study the style or quality puts, x Dfj , to the interpretation of the code, will 2 1 of the DNA? And is there any existing theory in the give fj(x) as the output . information sciences that deals with this on the sili- con side? We are aiming at a complexity level higher 2.2 Classes of codes than the usual information theory measures, such as the notion of entropy (see the comment in Section From the above de¯nition of codes, we see that di®er- 6.2). ent codes can have the same function representation, see Example 2.5 below. 1.4 Acknowledgement Let us therefore introduce the following classi¯ca- tion. Let the code Codef and the codes Codefi have I would very much like to thank professor G. Thom- the functions f and fi as their representations. sen for listen to my ideas and for his constant encour- agement. I would also like to thank him for making De¯nition 2.2 Let the code class with respect to the me feel so much at home in his Xenopus lab, and function f be the following (in¯nite) set of codes. teaching me the basics in frog handling and cultural f = Codef : fi(x) = f(x); for all x Df : behaviour in the life sciences. C f i 2 g I would also like to thank Dr D. Slice for his interest and all his help and suggestions. Remark 2.3 If the interpretor is not able to produce B. Cohen has been a good source for me concerning a well de¯ned function from the code Code, we say that Code is in the error class ". That (huge) class the existing computer code complexity measures. He C has also been a fruitful discussion partner. can be viewed as the complement to all interpretable I am grateful to S. Sutherland for reading an earlier codes. version of this paper and giving me useful comments 1Note that we here consider a function in a general sense, and pointing out errors and weaknesses. i.e. not necessarily numerical. In Search of an Evolutionary Coding Style 3 Question 2.4 With the suggested norm of f below, of codes in f . If a given measure has a range outside C C is f compact? Do there exist extremal codes? [0; 1], let us use the transformation C x Example 2.5 Let x ! 1 + x 1 f(x) = ; x 2 to make it ¯t into [0; 1]. ¡ We de¯ne the following scalar measure. x+2 2 if x = 2 g(x) = x ¡4 6 ¡ ½ 1 if x = 2 ºw(Codeg) = w ¹(Codeg); ¡ 4 ¡ ¢ and where w is a normalized weight vector such that 1 4 if x = 2 w = 1, for some norm . For example, let ¡ 1 ¡ jj jj jj ¢ jj 8 2 if x = 0 > ¡ n 1 h(x) = > 1 if x = 1 p > ¡ p < 1 if x = 3 w = w p = wi ; jj jj jj jj µ j j ¶ 1 Xi=1 > 2 if x = 4 > :> Note that Codeg and Codef are both in h; and f = where p 1.