An Approach to Biological Computation: Unicellular Core-Memory Creatures Evolved Using Genetic Algorithms

Hideaki Suzuki
ATR Human Information Processing Research Laboratories
2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-0288, Japan
[email protected]

Keywords: core memory, unicellular creature, membrane, biological computation, algorithmic complexity, machine language genetic programming, genetic algorithms

Abstract: A novel machine language genetic programming system that uses one-dimensional core memories is proposed and simulated. The core is compared to a biochemical reaction space, and in imitation of biological molecules, four types of data words (Membrane, Pure data, Operator, and Instruction) are prepared in the core. A program is represented by a sequence of Instructions. During execution of the core, Instructions are transcribed into corresponding Operators, and Operators modify, create, or transfer Pure data. The core is hierarchically partitioned into sections by the Membrane data, and the data transfer between sections by special channel Operators constitutes a tree data-flow structure among sections in the core. In the experiment, genetic algorithms are used to modify program information. A simple machine learning problem is prepared for the environment data set of the creatures (programs), and the fitness value of a creature is calculated from the Pure data excreted by the creature. Breeding of programs that can output the predefined answer is successfully carried out. Several future plans to extend this system are also discussed.

1 Introduction

Recent approaches to designing an automatic programming system in imitation of biological evolution are based on the notion that, during the long history of evolution, some lineages of living things have increased their degree of complexity, defined as the number of functional genes in a living cell. For example, higher eukaryotes such as mammals are expected to have about 50,000 genes in each cell, which is about 10 times the number of genes in a yeast cell. Of course, it is controversial whether or not “functional” or “structural” complexity increases at all in evolution [29], and yet so-called “genomic” complexity, which we may view as the number of genes, has clearly increased during evolution [13]. If we could clarify the mechanisms (the necessary and sufficient conditions) that have facilitated this growth of complexity, we might be able to devise a computational system that can increase algorithmic complexity by implementing those mechanisms. This is a strong motivation for many researchers in genetic programming, and with the aim of implementing such a system, various schemes have been proposed and tested [1–3, 5, 11, 26–28, 35, 38–40, 45]. Here I focus on two groups of studies that have much relevance to this article.

© 2000 Massachusetts Institute of Technology. Artificial Life 5: 367–386 (1999)

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/106454699568827 by guest on 25 September 2021 H. Suzuki An Approach to Biological Computation

The first group comprises studies on machine language genetic programming. Several systems have been proposed in this area. Nordin [34, 35] devised the compiling genetic programming system (CGPS), which directly manipulates machine code on a Sun SPARCstation. Ray [38–40] proposed the famous Tierra system, which uses a core memory for breeding self-replicating programs. The author [44, 45] and Huelsbergen [23, 24] independently proposed machine language genetic programming systems that use genetic algorithms (GAs) [21, 22, 32, 33, 43, 46, 49, 50] to evolve programs. The most successful system among them is Tierra, which accomplished the emergence of higher functions such as parasitism between programs. Tierra was the first system to demonstrate that programs can experience a kind of open-ended evolution under an appropriate environment. Since this system was proposed, a number of different approaches have been taken to extend Tierra to a more advanced system that can evolve programs with much algorithmic complexity. One of them is Ray and Hart’s Network Tierra [41]. In this system, the Tierran core was extended to a vast memory space that consists of a large number of computers distributed throughout the world. Several interesting phenomena have been observed in this system; however, it seems that creating programs with much algorithmic complexity has not yet been accomplished therein. Another approach to extending Tierra was taken by Adami and by Adami and Brown [1–3]. The Avida system they devised demonstrated how quite complex programs can be evolved in a Tierra-type architecture.

The second group that is relevant to this article comprises approaches to chemical computation in some mathematical medium. Fontana [20] proposed Algorithmic chemistry (ALChemy), which manipulates Lisp trees as objects (molecules) and allows combinations of trees as reactions between objects.
A similar approach was recently taken by Szuba [51], who designed a chemical-reaction-like system that proceeds with Prolog inference. Rasmussen and colleagues [36, 37] devised a core memory system in which core words react with one another and change their inner codes. Banzhaf and colleagues [8–10, 18, 19] introduced a kind of information object that catalyzes the change of another object. They expressed the objects by binary strings and made them work not only as “operands” but also as “operators” of computation. These studies succeeded in inducing so-called catalytic networks, in which reaction arrows are connected to each other and constitute intricate topologies such as loops. However, from the viewpoint of automatic programming, the functions achieved in these systems are still unsatisfactory. Compared to these systems, the functions of real organisms are much higher and more complicated. The processes of a biological system, that is, the molecular reactions in a cell, are not simple chemical reactions. They are biochemical reactions catalyzed by enzymes that are created from genetic information that has evolved over three billion years. To create a computational system that can evolve complex functions, we might have to make a computational system that imitates biological systems more elaborately.

Here, I propose a novel evolutionary programming system called SeMar. SeMar is an abbreviation of “the sea of matter.” SeMar uses a core memory. The core is compared to a biochemical reaction space, and in imitation of biological matter (substances), four types of data words are prepared in the core. These are the Membrane, Pure data, Operator, and Instruction. The Membrane can be compared to a lipid bilayer, the Pure data to a small molecule, the Operator to a protein, and the Instruction to a gene. In the experiment, GAs are used to modify the sequence of Instructions that a creature (program) has as its genetic information.
To calculate the fitness value of each creature, a core memory is prepared for each creature, and the Instruction sequence is loaded into the core together with a sequence of environmental data (Pure data). The execution of the core proceeds with the transcription of Instructions to Operators and the actions of Operators to induce

368 Artificial Life Volume 5, Number 4

modifications of Pure data. During these processes, an arbitrary number of Membranes, Pure data, or Operators are inserted or deleted at an arbitrary address of the core.

The principal part of this article has already been described in preliminary reports [47, 48]. However, the brevity of those papers allowed only a brief description of the model, and its background was scarcely described. Here, I remedy those problems and give a full description of the basic strategy for SeMar.

The organization of the article is as follows: Section 2 describes several results of preliminary experiments that led me to devise SeMar. In Section 3, I present a concrete implementation of SeMar and the simulation procedure using GAs. The results of a SeMar simulation are given in Section 4, where an external problem is imposed upon the creatures and it is shown that SeMar creatures succeed in creating a program that outputs the desired answer. In Section 5, the characteristics of SeMar are briefly summarized and the differences between SeMar and other programming systems are discussed. The final section (Section 6) is devoted to a description of future plans to extend SeMar.

2 Preliminary Experiments

SeMar stems from a study of a machine language genetic programming system called MUNCs (MUltiple von Neumann Computers) devised by the author [44, 45]. In this section, I survey the path that led me from MUNCs to SeMar.

MUNCs form a system in which machine language programs evolve using GAs. In several experiments using environmental problems, MUNCs succeeded in creating a small functional program [45], and yet they could not create a higher (longer) program with much algorithmic complexity. One of the most serious problems MUNCs suffer from is what I call “evolutionary dead end” (Figure 1). In MUNCs, a sequence of instructions is executed one by one using an instruction pointer. A “jump” instruction can move this pointer to an earlier address, and once such a loop has formed, the other program regions are never tested, no matter how good the functions within them are. (In an extreme case, a program appeared that executed only 5 of its 200 instructions.) The crossover operation cannot destroy jump instructions that are fixed in the population, so the evolutionary speed drops conspicuously. I tried several modifications of the CPU hardware architecture and the basic instruction set, to no avail. Evolutionary dead end is a serious and inevitable problem for any sequentially executed programming system that evolves using GAs. A similar problem occurred in the early form of Tierra too; it was solved to some extent by preparing multiple pointers or multiple threads to execute a program (see the later version of Tierra [52] and Avida by Adami et al. [1–3]).

Biological systems, on the other hand, do not seem to suffer from evolutionary dead end. Of course, the genome of a higher eukaryote ordinarily includes many uncoded regions (base sequences that are not transcribed to proteins); however, these regions include several regulatory sequences and determine regulatory pathways of genes.
Modifications in the uncoded regions can not only make new coded regions (new genes) but also change regulatory networks among present structural proteins. There is a biological theory that these modifications have contributed most to the development of complexity [14]. For modifications in the uncoded regions to affect the entire genetic network, the uncoded as well as the coded regions have to be watched all the time. In a living cell, this is done by a large number of proteins that work in parallel. A set of proteins for transcription is diffused throughout a cell or nucleus, and if there is a particular DNA sequence that reacts with these proteins, transcription begins at that place. In a biological system, evolutionary dead end is thus circumvented by the parallel execution of transcription proteins. This suggested that to overcome evolutionary dead

Figure 1. Evolutionary dead end in MUNCs.

end, I needed to abandon sequential execution of machine codes and adopt a system that executes instructions logically in parallel.

Based upon this reasoning, I next devised a system that consists of multiple computers with data-flow (data-driven) architecture. A data-flow machine is a parallel execution computer whose program is represented by a directed graph. Nodes of the graph denote operators (instructions), and arrows of the graph represent flows of operand data [15–17, 30, 31, 42]. I prepared a population of matrices that represent program graphs and evolved them using GAs. From several experiments using problem data sets, I found that this system did not suffer from evolutionary dead end. The data-flow architecture enabled every operator to start execution through only a local modification of the connection matrix. An operator no longer needed to wait for a visit from the instruction pointer; thus evolutionary dead end was not a problem with this machine. However, a population of data-flow computers was still unable to evolve programs with much algorithmic complexity. They could not answer difficult problems prepared in an environmental data set within a practical simulation time.

To revise this system into a more efficient one, I next focused on another parallel process in living systems. Although an operator in the data-flow computer manipulates only one operand datum at a time, a protein (enzyme) in a living cell can catalyze 1000 chemical reactions per second, on average [4]. This “parallel” execution of molecules can include slightly different versions of catalytic reactions, which might be a test-bed for the search for more advantageous reaction processes in a cell. To imitate this mechanism, I revised the data-flow architecture so that it could deal with multiple operand data at one time. Here I call this system the “array-data-type” data-driven architecture. I prepared an operator matrix and an operand soup. The operator matrix, that is, the genetic information modified by GAs, contains a sequence of instruction-template sets that are executed in parallel. At each time step, every operator chooses all operand data from the soup that match its template, modifies them, and puts them back into the soup. The soup is a size-variable array of label-value sets with no address number. As an enzyme selects substrates by conformational matching, so operand data are selected from the soup by the matching of the instruction template and data labels.

Figure 2. Judging operations.

After several experiments that used GAs to evolve a population of programs (operator matrices), I found that this system could evolve particular kinds of programs efficiently, and yet it suffers from a serious drawback as a computational system. As is well known, for a system to be able to execute any kind of computation, that is, for a computational system to be equivalent to the universal Turing machine, it must be able to execute some kind of judging operation. In the data-flow architecture, this operation is typically achieved by a specific type of operator (a comparator) that compares two operand data and outputs a control datum. In the present system, however, this operation cannot work well because the template matching typically selects many operands at once and a comparator cannot choose an appropriate pair of operands for the judgment (Figure 2). The multiple selection of operands, which was at first devised to improve the performance of the system, destroys the computational capability of the system.

In a biological system, on the other hand, this problem is solved in the following way. In a living cell, substrate molecules (and catalytic molecules also) are not dispersed uniformly in the solvent.
Typically, a living cell is structurally partitioned into sections by membranes, and in a particular section, molecules that are necessary for specific catalytic reactions are concentrated. This partitioning of the cell not only increases the rate of catalytic reactions in a cell, but helps select controlling molecules that serve as switches for succeeding reactions. Inspired by this partitioning of a living cell, I revised the previous system in the following ways. First, I introduced a special type of membrane data that partition the

operand soup into small sections. Then I merged the operator matrix into the operand soup so that an operator can select its operand data only in the section to which it belongs. This constitutes the framework of SeMar. The detailed design of SeMar described in the next section was established by making small modifications on this baseline.
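The comparator problem and its remedy by partitioning can be illustrated with a small Python sketch. It is not taken from the author's implementation: the `Datum` class, the `select` helper, the mask/pattern form of the template, and the section names are all assumptions made for illustration. In an unpartitioned soup a loose template selects many operands, so a two-operand comparator has several candidate pairs; inside a small section the same template leaves a single pair.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Datum:
    label: int  # bits compared against an operator's template (assumed layout)
    value: int


def select(data, mask, pattern):
    """Return every datum whose label agrees with pattern on the masked bits."""
    return [d for d in data if (d.label & mask) == (pattern & mask)]


# Unpartitioned soup: four data share the label prefix 0x1_.
soup = [Datum(0x10, 3), Datum(0x11, 8), Datum(0x12, 5), Datum(0x13, 1)]
candidates = select(soup, mask=0xF0, pattern=0x10)
# Every unordered pair below is a possible input to a two-operand comparator.
ambiguous_pairs = list(combinations(candidates, 2))

# Partitioned soup: the same data split into two sections (hypothetical names).
sections = {"left": soup[:2], "right": soup[2:]}
local = select(sections["left"], mask=0xF0, pattern=0x10)
local_pairs = list(combinations(local, 2))
```

With these made-up labels, the comparator faces several candidate pairs in the open soup but only one inside the `left` section, which is the effect the membrane data are meant to achieve.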

3 The Model

SeMar (the sea of matter) is a simulation of a one-dimensional core memory. The core satisfies the periodic boundary condition (constitutes an endless loop) and is addressed with logical addresses. Since logical addresses are different from physical ones, the core is amenable to the insertion or deletion of any number of words (size-variable). In this section, I first describe the metaphor that I used in the design of SeMar and then give a detailed explanation of the model. Although the final version of SeMar (see future plans described in Section 6) does not necessarily use GAs, I here use GAs to evolve creatures toward the desired direction. The entire simulation procedure using GAs is described in the last subsection (Section 3.6).
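As a rough illustration of a size-variable core with periodic boundary conditions, the following Python sketch wraps logical addresses around a list. The `Core` class and its method names are hypothetical, and a Python list stands in for whatever logical-to-physical address mapping the actual simulator uses.

```python
class Core:
    """A one-dimensional, size-variable core addressed by logical addresses.

    Logical addresses wrap around (periodic boundary), and words can be
    inserted or deleted anywhere without renumbering by hand.
    """

    def __init__(self, words):
        self.words = list(words)

    def _wrap(self, addr):
        # Periodic boundary: any integer address maps onto the loop.
        return addr % len(self.words)

    def read(self, addr):
        return self.words[self._wrap(addr)]

    def insert(self, addr, new_words):
        # Insert before the word currently at the given logical address.
        i = self._wrap(addr) if self.words else 0
        self.words[i:i] = list(new_words)

    def delete(self, addr, count=1):
        # Delete `count` consecutive words, wrapping past the end if needed.
        if not self.words:
            return
        i = self._wrap(addr)
        for _ in range(min(count, len(self.words))):
            if i >= len(self.words):
                i = 0
            del self.words[i]


core = Core(["a", "b", "c", "d"])
wrapped = core.read(5)        # address 5 wraps onto the loop
core.insert(1, ["x", "y"])    # the core grows from 4 to 6 words
last = core.read(-1)
core.delete(5, 2)             # removes the last word, then wraps to the first
```

Keeping all address arithmetic inside `_wrap` is one simple way to let callers ignore the current size of the core.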

3.1 Metaphor

When designing MUNCs [45], I compared a MUNC instruction to a gene and a register operation in a CPU to a chemical reaction in a living cell. The analogy on which SeMar is designed is an extension of this similarity. I compare the SeMar core to a biochemical reaction space (a solvent in which various biological substances are dissolved or agglomerated), and I compare a computational operation in the core to a chemical reaction between biological molecules. In imitation of biological substances, I prepare four types of data words in the core. These are the Membrane, Pure data, Operator, and Instruction. The Membrane can be compared to a lipid bilayer (a septum or a cytoplasmic membrane), the Pure data to a small molecule that functions as a substrate or a ligand, the Operator to a protein, and the Instruction to a gene (see Figure 3). Like lipid bilayers in a solvent, the Membrane data work as core “walls” and partition the core into small sections (compartments). Data words on opposite sides of a Membrane cannot mix without data transfer by specific Operators. Each section can be compared to an organelle or a cell in living systems.

3.2 The Core Words and Their Notation

Each word in the core is coded in a sequence of 32 binary bits, and according to the data type, it is expressed by the notation shown in Figure 4. Depending upon the data type, a data word includes a Header, Type, Label, Value, Mnemonic, or Address. The Type of Membrane data is either Bgn or End. A section is delimited by a pair of Membranes (MEM:Bgn and MEM:End) that have the same Label bits. The Address of the Operator or Instruction is the ordinal number in the sequence of Instructions that a creature holds as its genetic information. At that address, not only the machine code (Mnemonic) but also a Label and several Templates are stored. Templates are used for bit matching with Label bits and help to choose the appropriate data word necessary for the execution of the Instruction/Operator.
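The delimiting of a section by a Membrane pair with equal Labels can be sketched as follows. This is a minimal illustration, assuming Python dataclasses in place of the 32-bit word encoding of Figure 4 (whose bit layout is not reproduced here) and a linear list in place of the circular core; `Membrane`, `PureData`, and `section_contents` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Membrane:
    kind: str   # "Bgn" or "End"
    label: int  # a section is delimited by a Bgn/End pair with equal labels


@dataclass(frozen=True)
class PureData:
    label: int
    value: int


def section_contents(core, label):
    """Return the words strictly between MEM:Bgn and MEM:End with this label."""
    start = next(i for i, w in enumerate(core) if w == Membrane("Bgn", label))
    end = next(i for i, w in enumerate(core) if w == Membrane("End", label))
    return core[start + 1:end]


# A section labelled 1 that contains one datum and a nested section labelled 2.
core = [
    Membrane("Bgn", 1),
    PureData(6, 42),
    Membrane("Bgn", 2),
    PureData(7, 5),
    Membrane("End", 2),
    Membrane("End", 1),
]
inner = section_contents(core, 2)
outer = section_contents(core, 1)
```

Because the inner Membrane pair lies between the outer pair, the contents of section 1 include the whole of section 2, which mirrors the hierarchical partitioning described in the text.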

3.3 Execution of the Core

Whereas the operation of MUNCs (and any other machine language programming system) is directly activated by an instruction, the biochemical reaction in a cell is not directly catalyzed by a gene. The reaction is catalyzed by an enzyme (protein), and

Figure 3. Analogy between the SeMar core and a biochemical reaction space.

Figure 4. Notation of the four types of data words in the core.

Figure 5. An example picture (a snapshot) of a part of the core memory in the execution of an Operator.

a gene works only as original information from which a corresponding protein is created. In a cell, proteins take the initiative in all actions. I imitate this system and make the execution of SeMar proceed with the actions of the Operators and Instructions. (Although in a future version of SeMar I plan to accomplish all the workings of the core by the actions of Operators, by preparing a new Operator for the transcription of Instructions to Operators (Section 6), here I give both the Operators and Instructions the initiative in the actions.) The actions of the Instructions and Operators are as follows.

The execution of an Instruction transcribes it to create the corresponding Operator or a pair of Membranes. When an Instruction is transcribed to an Operator, the Operator is created in the nuclear section (the precise definition of “nuclear” is given later) or in a section chosen by matching between the Template and Label, depending upon the kind of Instruction.

Figure 5 shows a snapshot of a part of the core in which an Operator is in action. The Operator pointed at by the black triangle searches the core in the direction of earlier logical addresses, moves to the nearest operand datum, and modifies it. An operand datum is chosen by matching between the Operator Template and Pure data

Labels. This modification of data is continued until the Operator bumps into the nearest Membrane and disappears. As I mentioned before, the Membrane works as a core wall and the Operator cannot move beyond the Membrane. The execution pointer represented by the gray triangle holds the address of the next Operator/Instruction to be executed. Using this pointer, all Operators and Instructions are put into action one by one in the order of their logical addresses in the core. The entire action of the core, which is logically a parallel process, is thus simulated by this sequential execution.
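The sweep of a single Operator described above can be sketched in Python. This is a simplified illustration under several assumptions: words are plain tuples, the core is a linear list rather than a loop, the Operator changes matching data in place through a caller-supplied `action`, and `run_operator` is a hypothetical name rather than part of SeMar.

```python
def run_operator(core, op_index, matches, action):
    """Sweep one Operator toward earlier addresses until it meets a Membrane.

    core     : list of 3-tuples such as ("DAT", label, value), ("MEM", "Bgn", 1)
               or ("OPE", "inc1", None)  (an assumed, simplified word format)
    op_index : position of the Operator being executed
    matches  : predicate on a Pure data label (stands in for template matching)
    action   : function applied to the value of each matched operand
    """
    i = op_index - 1
    while i >= 0 and core[i][0] != "MEM":
        kind, label, value = core[i]
        if kind == "DAT" and matches(label):
            core[i] = ("DAT", label, action(value))
        i -= 1
    # The Operator disappears once it has bumped into the Membrane.
    del core[op_index]


core = [
    ("DAT", 6, 99),        # outside the section: must stay untouched
    ("MEM", "Bgn", 1),
    ("DAT", 6, 10),
    ("DAT", 7, 20),
    ("DAT", 6, 30),
    ("OPE", "inc1", None),
    ("MEM", "End", 1),
]
run_operator(core, 5, matches=lambda label: label == 6, action=lambda v: v + 1)
```

Calling such a function for each Operator in address order would give the sequential simulation of the logically parallel process; the datum at address 0 is unaffected because the sweep stops at the Membrane.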

3.4 Elementary Instructions

Table 1 shows the 16 elementary Instructions that I prepared. They are classified into four groups.

The first group (CP o and CP e) consists of receptors for regulatory Operators. INS:CP o and INS:CP e are not transcribed and serve only as starting or terminating signals for consecutive Instructions whose activity values (values representing transcription capability) are regulated simultaneously. (In imitation of prokaryotes, I designate such a sequence of Instructions as an “operon.”)

The second group (which includes only MEMB) is a special Instruction to create a Membrane pair. When this Instruction is put into action, the appropriate “outer” section is chosen by matching between the Instruction Template and Membrane Labels, and a new “inner” section is created in it by inserting a pair of Membranes (MEM:Bgn and MEM:End). Thus, all sections in the core have their own “outer” sections, so the core is partitioned hierarchically.

The third group (Prom and Repr) consists of regulatory Instructions. Like regulatory genes in a living cell, when transcribed, INS:Prom and INS:Repr create OPE:Prom and OPE:Repr, respectively, only in the “nuclear” section. (As in biology, where the organelle that contains genes (DNA) is called the nucleus, I call only the section that includes a sequence of Instructions the nuclear section.) These Operators search for the matched INS:CP o or INS:CP e (operands), move to it, and regulate the activity value of the succeeding operon.

The fourth group (cre0 to lt 2) consists of structural Instructions. When a structural Instruction is put into action, the corresponding structural Operator is created in any matched section. A structural Operator searches for an appropriate operand (a Pure data word whose Label matches the Operator Template), moves to it, and changes its value or creates a new Pure data word according to the defined function.
During these processes, all regulatory and structural Operators choose not only operand data but also “ligand” data by the matching process. Like an allosteric protein in a cell, the activity value of an Operator changes under the effect of the ligand data, and when the value is zero (the Operator is inert), the Operator cannot modify the operand data.

Finally in this section, I describe the flexible description-length matching that is used to choose an appropriate operand, ligand, or Membrane (see Figure 6). The matching is tested between a 32-bit Template that an Operator/Instruction has and a 16-bit Label that a Pure data/Membrane has. A Template consists of a 16-bit Mask and a 16-bit Pattern. In the matching process, the Pattern and the Label are compared only at the bit sites at which the Mask has 1 bits. If the Pattern bits and the Label bits are all the same at those bit sites, the Template and the Label are regarded as matched.
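The matching rule just described reduces to a masked comparison, which the following Python sketch makes concrete. The placement of the Mask in the upper 16 bits and the Pattern in the lower 16 bits of the Template is an assumption (the actual layout is given only in Figure 6), and the function name and example values are made up.

```python
def template_matches(template: int, label: int) -> bool:
    """Flexible description-length matching of a 32-bit Template and a 16-bit Label.

    Assumed layout: upper 16 bits = Mask, lower 16 bits = Pattern.
    The Label matches when it equals the Pattern at every bit where the Mask is 1.
    """
    mask = (template >> 16) & 0xFFFF
    pattern = template & 0xFFFF
    return (pattern & mask) == (label & mask)


# A Template that constrains only the middle eight bits of the Label.
loose = (0x0FF0 << 16) | 0x02F0
hit = template_matches(loose, 0x02F4)    # masked bits agree
miss = template_matches(loose, 0x02E4)   # one masked bit differs

# A Template whose Mask is all zeros constrains nothing.
wildcard = template_matches(0x0000FFFF, 0x1234)
```

The number of 1 bits in the Mask sets how specific the description is: more Mask bits select fewer Labels, and an empty Mask matches every Label.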

Table 1. The basic Instruction set.

Mnemonic | Function as an Instruction | Function as an Operator
CP o | Works as a receptor for OPE:Prom. | -
CP e | Works as a receptor for OPE:Repr. | -
MEMB | Creates a new pair of Membranes in the matched section. | -
Prom | Creates OPE:Prom in the nuclear section. | Promotes the transcription capability of Instructions (an operon) succeeding the matched INS:CP o.
Repr | Creates OPE:Repr in the nuclear section. | Represses the transcription capability of Instructions (an operon) succeeding the matched INS:CP e.
cre0 | Creates OPE:cre0 in the matched section. | Creates new Pure data at the beginning of the section.
cop1 | Creates OPE:cop1 in the matched section. | Creates DAT:(a new Label):(operand Value).
inc1 | Creates OPE:inc1 in the matched section. | Creates DAT:(a new Label):(operand Value + 1).
dec1 | Creates OPE:dec1 in the matched section. | Creates DAT:(a new Label):(operand Value - 1).
sfl1 | Creates OPE:sfl1 in the matched section. | Creates DAT:(a new Label):(operand Value × 2).
CHO1 | Creates OPE:CHO1 in the matched section. | Transfers the operand DAT to the outer section or the answer stack.
CHI1 | Creates OPE:CHI1 in the matched section. | Transfers the operand DAT to the inner section.
cop2 | Creates OPE:cop2 in the matched section. | Copies the Value of the second operand to the first one.
add2 | Creates OPE:add2 in the matched section. | Creates DAT:(a new Label):(the first operand Value + the second operand Value).
gt 2 | Creates OPE:gt 2 in the matched section. | Creates new Pure data if (the first operand Value) > (the second operand Value).
lt 2 | Creates OPE:lt 2 in the matched section. | Creates new Pure data if (the first operand Value) < (the second operand Value).

3.5 Tree Data-Flow Structure Among Sections

Although the operations of almost all Operators cannot reach beyond the Membrane, the two structural Operators OPE:CHO1 and OPE:CHI1 are exceptions to this rule. These Operators are devised in imitation of membrane proteins in a cell. Membrane proteins are channels between compartments in a cell. They are anchored to the lipid bilayers and transfer specific molecules from one side to the other. Like these proteins, OPE:CHO1 and OPE:CHI1 transfer operands to the outer or inner section, respectively. As described before, the Membrane partitions the core hierarchically. Every section has its outer section. The channel Operators transfer data between the outer section and the inner section, so that the data flow among sections in the core constitutes a tree structure with one “outermost” section.

The left-hand image in Figure 7 shows a typical partitioning of the core with the data flow by channel Operators expressed as arrows. In this example, the core consists

Figure 6. Flexible description-length matching for data selection.

Figure 7. The partitioned core and the data transfer between sections.

of two parts, the environmental section and the outermost section, and the outermost section includes three inner sections. The lowest inner section is the nuclear section. The root of the tree, that is, the outermost section, imports Pure data from the environmental section and exports Pure data to the answer stack, which is prepared apart from the core. After a transfer, the source data are ordinarily deleted from the core; only data in the environmental section are not deleted by the transfer. This is a representation of the infinity of the environment. In Figure 7, the corresponding biological system is also illustrated on the right-hand side. As is clear from this figure, the data stored in the answer stack can be compared

to molecules excreted by a unicellular creature. The fitness value of each creature used in GAs is calculated from this excreted data.
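The tree data flow of this subsection can be sketched with two small Python functions. The sketch makes several assumptions: the `Section` class and the function names `chi1` and `cho1` are hypothetical stand-ins for the channel Operators, template and ligand matching are omitted, and the environmental section is modelled simply as a source whose data are copied rather than moved.

```python
class Section:
    """A node in the section tree; `outer` is None for the outermost section."""

    def __init__(self, name, outer=None, is_environment=False):
        self.name = name
        self.outer = outer
        self.is_environment = is_environment
        self.data = []


def chi1(datum, source, inner):
    """Channel-in: transfer a datum into an inner section.

    Data taken from the environmental section are not deleted there,
    which represents the infinity of the environment.
    """
    if not source.is_environment:
        source.data.remove(datum)
    inner.data.append(datum)


def cho1(datum, section, answer_stack):
    """Channel-out: transfer a datum to the outer section, or to the
    answer stack when the section is the outermost one."""
    section.data.remove(datum)
    target = answer_stack if section.outer is None else section.outer.data
    target.append(datum)


environment = Section("environment", is_environment=True)
outermost = Section("outermost")
nuclear = Section("nuclear", outer=outermost)
answer_stack = []

environment.data.append(("DAT", 6, 42))
chi1(environment.data[0], environment, outermost)   # import from the environment
cho1(outermost.data[0], outermost, answer_stack)    # excrete to the answer stack

nuclear.data.append(("DAT", 9, 1))
cho1(nuclear.data[0], nuclear, answer_stack)        # moves one level outward
```

Because every section has exactly one outer section, repeated outward transfers always end at the outermost section and then at the answer stack.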

3.6 Simulation Procedure Using Genetic Algorithms

The evolution of creatures is driven by genetic algorithms (GAs), specifically those called simple GAs [21]. Every creature has a sequence of Instructions, Labels, and Templates as its program information and a sequence of Pure data as its initial data for execution. All this genetic information is expressed as a long sequence of binary bits, and GAs modify a population of these sequences using a generation cycle of selection, mutation, and crossover operations. The fitness value of each creature for selection is calculated using the execution process. For the execution, I prepare a core for each creature. A sequence of Instructions, initial Pure data, and an environmental data set are loaded into the core, and the core is executed. When the size of the answer stack has not changed for a long time or has reached the predefined maximum value, the execution process is terminated and the contents of the stack are examined. The fitness value is calculated from the fraction of “correct” values in this answer stack. The higher the fraction, the larger the fitness value. The generation cycle of GAs is continued until a creature that can output the correct answer in the stack dominates the population.
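One generation of such a procedure might look like the following Python sketch. It is only an illustration: the text says the fitness is calculated from the fraction of correct values but does not give the formula, so using the fraction itself is an assumption, as are fitness-proportionate selection, one-point crossover, the mutation rate, and the function names.

```python
import random


def fitness(answer_stack, teaching_value):
    """Fraction of correct values in the answer stack (assumed fitness measure)."""
    if not answer_stack:
        return 0.0
    return sum(v == teaching_value for v in answer_stack) / len(answer_stack)


def next_generation(population, scores, rng, mutation_rate=0.01):
    """One selection / crossover / mutation cycle on bit-string genomes.

    Genomes are equal-length lists of 0/1 with at least two bits.
    """
    weights = [s + 1e-9 for s in scores]  # keep every weight positive
    children = []
    while len(children) < len(population):
        # Fitness-proportionate selection of two parents.
        a, b = rng.choices(population, weights=weights, k=2)
        # One-point crossover.
        cut = rng.randrange(1, len(a))
        child = a[:cut] + b[cut:]
        # Bit-flip mutation.
        child = [bit ^ (rng.random() < mutation_rate) for bit in child]
        children.append(child)
    return children


rng = random.Random(1)
population = [[0] * 8, [1] * 8, [0, 1] * 4, [1, 0] * 4]
scores = [fitness([5, 5, 3, 5], 5), fitness([], 5), 0.5, 0.25]
offspring = next_generation(population, scores, rng)
```

In a full simulation, each score would come from executing the creature's core and examining its answer stack, and the cycle would repeat until a correct creature dominates the population.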

4 Results of Experiments

First I show a snapshot of the SeMar core in Figure 8. In this figure all words are lined up in the order of their logical addresses. (The number on the left-hand side of each word is the physical address.) The outermost section includes two inner sections in this example, and one of them has its own inner section. Active Instructions (those that can be transcribed) are colored dark gray, and inactive ones (those that cannot be transcribed) are colored light gray. At this moment the activity values of Instructions are determined by the initial setting, so the Instructions in an operon, which is delimited by receptors (INS:CP o or INS:CP e), do not necessarily have the same activity value.

Next I show the results of an experiment in which the SeMar creatures are given a problem to solve. The problem is a simple machine learning problem, the same one that was used in the MUNCs experiments (called the “larger-of-two-entries” problem in [44, 45]). Figure 9 shows the environmental (problem) data set I prepared. In this figure, each column of numbers (32 numbers) represents one data set; in other words, eight data sets are shown. I prepared 500 different problem data sets. All problem numbers E[dp] are random integers that range from 0 to 99, and for each problem data set, the teaching value (T) is calculated using

T = max{E[6], E[7]}.

In this case, the teaching value is the larger of two numbers, the sixth and the seventh entries. At the beginning of the execution process of a creature, one problem data set is randomly chosen out of the 500 data sets and is used as the environment data in the core. At this time, each problem number E[dp] is translated into the Pure data DAT:dp:E[dp]. When calculating the fitness value from the answer stack, I judge a number stored in the stack to be correct when its value equals the teaching value T.
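To make the setup concrete, the following sketch generates problem data sets and teaching values as described above. It is an illustration only: the helper names, the use of Python's `random` module, the direct indexing of E by dp, and the textual rendering of the Pure data are assumptions. The text fixes only the 32 entries per set, the range 0 to 99, the 500 sets, and T = max{E[6], E[7]}.

```python
import random

NUM_SETS = 500   # number of problem data sets (from the text)
SET_SIZE = 32    # numbers per data set (from the text)

def make_problem_sets(rng):
    """Return NUM_SETS lists of SET_SIZE random integers in 0..99."""
    return [[rng.randint(0, 99) for _ in range(SET_SIZE)]
            for _ in range(NUM_SETS)]

def teaching_value(env):
    """T = max{E[6], E[7]}; indexing E directly by dp is an assumption."""
    return max(env[6], env[7])

def to_pure_data(env):
    """Render each problem number as a 'DAT:dp:E[dp]' string (format assumed)."""
    return [f"DAT:{dp}:{value}" for dp, value in enumerate(env)]

rng = random.Random(1)
problem_sets = make_problem_sets(rng)
env = rng.choice(problem_sets)   # one set is chosen at random per execution
T = teaching_value(env)
```

A number on the answer stack would then be judged correct exactly when it equals `T`.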


Figure 8. A snapshot of the core.

Figure 10 shows a typical result for this problem. After two drastic adaptive evolutionary steps, the population was dominated by a creature that could output the correct answer on the answer stack. The principal part of the final data-flow is shown in Figure 11. The two significant environmental data (Pure data with Label 006 and Label 007) are imported by the CHI1 Operator from the environmental section to the outermost section. These data are compared by the gt 2 Operator in the outermost section, and when the seventh entry is larger than the sixth entry, the gt 2 Operator creates a Pure data word with Label 2f4. This word works as a ligand of the CHO1 Operator and inhibits the transfer of the sixth entry to the answer stack.
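The inhibition step in this data-flow can be read as a small conditional. The sketch below models only what the text states, namely the comparison and the blocked export of the sixth entry. The function name and the boolean encoding of the ligand are assumptions, and the handling of the seventh entry is left out because this passage does not describe it.

```python
def sixth_entry_is_exported(e6, e7):
    """Return True when the sixth entry may pass to the answer stack.

    The ligand (Pure data with Label 2f4) is modeled as a boolean that is
    set when the seventh entry is larger than the sixth; its presence
    inhibits the export Operator.
    """
    ligand_2f4_present = e7 > e6
    return not ligand_2f4_present
```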

5 Discussion

With the objective of designing an evolutionary programming system that can automatically increase algorithmic complexity, I devised SeMar, a novel programming system that simulates biological catalytic reactions using a core memory. In imitation of biological substances, four types of data words (Membrane, Pure data, Operator, and Instruction) were prepared in the core, and the core was partitioned into sections by the Membrane data. In the following, I compare SeMar with some other studies and discuss its merits and demerits.


Figure 9. Environment data set and the teaching data.

Figure 10. A typical result for the larger-of-two-entries problem (fitness curve).


Figure 11. A typical result for the larger-of-two-entries problem (data-flow diagram).

5.1 Comparison with Tierra

Several researchers have proposed systems that evolve programs with a machine language architecture. These include a machine-language core-memory programming system called Tierra by Ray [38–40], the compiling genetic programming system (CGPS) by Nordin [34, 35], Avida by Adami [1–3], and multiple von Neumann computers (MUNCs) by the author [45]. The most significant difference between Tierra and SeMar is in the analogy by which the systems are designed. Whereas a Tierran instruction can be compared to a codon or a base in DNA, a SeMar Instruction can be compared to a gene. In other words, a Tierran instruction is a lower unit of genetic information than a SeMar Instruction. The Tierran core is basically occupied only by instructions, and the instructions interact with each other through pure data that are stored in working registers kept apart from the core. In SeMar, on the other hand, a core word is compared to a molecule, and in imitation of biological molecules, various types of data words are prepared in the core. An Instruction is designed by referring to a gene. As is often pointed out, for machine language genetic programming, elementary instructions are of critical importance, as they determine the final performance of the system.

Because of the above difference, SeMar has the following merits and demerits compared to Tierra. First, we must remember that in a living cell, many thousands, or many hundreds of thousands, of genes work to maintain the metabolic activities of the cell. This means that a designer who adopts the metaphor in SeMar might need to prepare an enormous number of Instructions to achieve higher functions. In addition, because a protein often has highly complicated functions in a living cell, designing a gene-like Instruction is often rather laborious, and at the same time, the computational cost of each Instruction is usually high.


However, the advantage of this approach is that a protein, into which a gene is ordinarily transcribed, is a physiological functional unit in a cell. In this approach, one can define the function of an instruction one by one by referring to the knowledge provided by modern molecular biology, which has made rapid progress during the last several decades. Preparing gene-like instructions using this knowledge might be a steady and sure approach to the implementation of the functional units that will enable the evolution of higher computational functions in a computer.

5.2 Comparison with Chemical Computation

Here I discuss another line of research related to SeMar: chemical computation using mathematical media. Roughly speaking, studies in this area fall into two groups.

The first group comprises theoretical work that tries to clarify the computational capability of a set and its operations [7, 12]. In these studies, the elements of a set can be compared to molecules, and operations on the elements can be compared to chemical reactions. A computational model called CHAM (CHemical Abstract Machine) by Berry and Boudol [12] first introduced the membrane-like encapsulation of elements in a set and allowed the transfer of elements through “airlocks” between capsules. The partitioning of the SeMar core is similar to this encapsulation in CHAM. Although the study by Berry and Boudol focuses only on theoretical analyses of the performance of parallel computation on multiple sets, if it were extended to an analysis of evolutionary programming techniques, it might provide valuable information for SeMar.

The second group is represented by experimental studies by Bagley and Farmer [6], Fontana [20], Szuba [51], Rasmussen et al. [36, 37], Banzhaf et al. [8–10, 18, 19], and Ikegami and Hashimoto [25]. These researchers implemented computational systems that include chemical-reaction-like operations acting on informational objects and conducted simulations. They succeeded in organizing (auto-)catalytic networks between operations or objects. Among these experiments, a study that has many characteristics in common with SeMar is the core memory model proposed by Rasmussen et al. [36, 37]. They compared a core word (machine code) to a chemical molecule and allowed the modification of codes using code–code interactions that were designed in imitation of chemical reactions between molecules. However, unlike SeMar, their core did not contain different types of data words. All words in the core were of a flat data type, and there were no specific data for catalyzing reactions between data words. Also, there was no partitioning by membranes in their core. SeMar, which has the same level of analogy to a chemical system as that used by Rasmussen et al., is a system that is designed using a detailed comparison to a biological system.

6 Future Work

In this article, I described an experiment with SeMar in which a population of programs was evolved using genetic algorithms (GAs). A simple machine learning problem was prepared for their environment, and by rewarding a creature that can “excrete” the correct answer with a high fitness value, I succeeded in breeding programs that have a desired function. As this example shows, GAs are useful for evolving a population in a direction that a human wishes; in the near future, however, I intend to eliminate the outer loop with GAs and make creatures evolve for their own purposes. Future plans to extend SeMar are as follows.

To put all creatures into one core and prepare Instructions necessary for transcription, reproduction, and crossover. By doing this, I can make creatures choose generation cycles and reproduction units by themselves. The entire simulation process will be accomplished only by the actions of Operators, and I can eliminate the outer loop using GAs. I expect that this will enhance system evolvability dramatically. In the present implementation, the fitness of an organism is given not by its replication rate, that is, the speed of its metabolism, but rather by the chemicals it secretes. This feature limits the system's capacity to yield complex programs. In biochemistry, the identity of the molecules an organism secretes is important only for its biochemistry and its ultimate survival in a given environment.

To induce parasitism, symbiosis, and genetic fusion between creatures in the core. The data operation shown in Figure 11 is accomplished only between the environmental section and the outermost section, and the other inner sections make no direct contribution to the final answer of the creature. However, to create highly functional programs, it is essential to make use of the tree structure among sections and to assign subtasks to the inner sections (a structured programming technique). If the above relations appeared between creatures in the core, it might be the first step in realizing structured programming in SeMar. For instance, symbiosis could be achieved by neighboring creatures that mutually import Pure data excreted by the others and make use of it. Symbiosis is a kind of cooperation, and if such creatures stuck together to work out some more difficult problem, it could be said that the problem was solved in a structured way.

To introduce a special type of data representing energy and make the actions of Operators depend on the energy data. It is expected that this will make creatures scramble for this third resource and devise various strategies for getting it (see the next item).

To prepare Instructions for capturing prey and digestion. If a creature can prey on another creature and obtain its contents, it can snatch energy (and other Pure data) from others and obtain more energy than it can obtain by itself. By introducing this relation between creatures in the core, I expect a food chain to emerge in SeMar. In biological ecosystems, higher organisms almost always occupy the top layer of the food chain hierarchy. If a food chain appeared in the core, the creatures at the top level of the hierarchy might have a program with much algorithmic complexity.

Acknowledgments

The study presented in this article was principally carried out while the author worked at Honda R&D Co., Ltd., Wako Research Center. The author also thanks Dr. Chris Adami, California Institute of Technology, for his many valuable comments on an early version of the manuscript.

References

1. Adami, C. (1995). Learning and complexity in genetic auto-adaptive systems. Physica D, 80, 154–170.
2. Adami, C. (1998). Introduction to artificial life. Santa Clara, CA: Springer-Verlag.
3. Adami, C., & Brown, C. T. (1994). Evolutionary learning in the 2D artificial life system “Avida.” In R. Brooks & P. Maes (Eds.), Artificial Life IV: Proceedings of an Interdisciplinary Workshop on the Synthesis and Simulation of Living Systems (pp. 377–381). Cambridge, MA: MIT Press.
4. Alberts, B., Bray, D., Lewis, J., Raff, M., Roberts, K., & Watson, J. D. (1994). Molecular biology of the cell (3rd ed.). New York: Garland Publishing.
5. Angeline, P. J., & Kinnear, K. E., Jr. (Eds.). (1996). Advances in genetic programming: Volume 2. Cambridge, MA: MIT Press.
6. Bagley, R. J., & Farmer, J. D. (1992). Spontaneous emergence of a metabolism. In C. G. Langton, C. Taylor, J. D. Farmer, & S. Rasmussen (Eds.), Artificial Life II: Proceedings of an Interdisciplinary Workshop on the Synthesis and Simulation of Living Systems (pp. 93–140). Vol. 10 of SFI Studies in the Sciences of Complexity. Redwood City, CA: Addison-Wesley.
7. Banâtre, J.-P., & Le Métayer, D. (1990). The gamma model and its discipline of programming. Science of Computer Programming, 15, 55–77.
8. Banzhaf, W. (1994). Self-organization in a system of binary strings. In R. Brooks & P. Maes (Eds.), Artificial Life IV: Proceedings of an Interdisciplinary Workshop on the Synthesis and Simulation of Living Systems (pp. 109–118). Cambridge, MA: MIT Press.
9. Banzhaf, W. (in press). Self-organization in a system of binary strings with topological interactions. Physica D.
10. Banzhaf, W., Dittrich, P., & Rauhe, H. (1996). Emergent computation by catalytic reactions. Nanotechnology, 7, 307–314.
11. Banzhaf, W., Nordin, P., Keller, R. E., & Francone, F. D. (1997). Genetic programming: An introduction on the automatic evolution of computer programs and its applications. San Francisco: Morgan Kaufmann.
12. Berry, G., & Boudol, G. (1992). The chemical abstract machine. Theoretical Computer Science, 96, 217–248.
13. Bird, A. P. (1995). Gene number, noise reduction and biological complexity. Perspectives, 11, 94–100.
14. Britten, R. J., & Davidson, E. H. (1971). Repetitive and non-repetitive DNA sequences and a speculation on the origins of evolutionary novelty. Quarterly Review of Biology, 46, 111–138.
15. Dennis, J. B. (1975). Packet communication architecture. In Proceedings of the 1975 Sagamore Computer Conference on Parallel Processing (pp. 224–229).
16. Dennis, J. B. (1980). Data flow supercomputer. Computer, 13, 48–56.
17. Dennis, J. B., & Misunas, D. P. (1974). A computer architecture for highly parallel signal processing. In Proceedings of the ACM 1974 National Conference (pp. 402–409).
18. Dittrich, P., & Banzhaf, W. (1998). Self-evolution in a constructive binary string system. Artificial Life, 4, 203–220.
19. Dittrich, P., Ziegler, J., & Banzhaf, W. (1998). Mesoscopic analysis of self-evolution in an artificial chemistry. In C. Adami, R. K. Belew, H. Kitano, & C. E. Taylor (Eds.), Artificial Life VI: Proceedings of the Sixth International Conference on Artificial Life (pp. 95–103). Cambridge, MA: MIT Press.
20. Fontana, W. (1992). Algorithmic chemistry. In C. G. Langton, C. Taylor, J. D. Farmer, & S. Rasmussen (Eds.), Artificial Life II: Proceedings of an Interdisciplinary Workshop on the Synthesis and Simulation of Living Systems (pp. 159–209). Vol. 10 of SFI Studies in the Sciences of Complexity. Redwood City, CA: Addison-Wesley.
21. Goldberg, D. E. (1989). Genetic algorithms in search, optimization and machine learning. New York: Addison-Wesley.
22. Holland, J. (1992). Adaptation in natural and artificial systems. Cambridge, MA: MIT Press.
23. Huelsbergen, L. (1996). Toward simulated evolution of machine-language iteration. In J. R. Koza, D. E. Goldberg, D. B. Fogel, & R. L. Riolo (Eds.), Genetic Programming: Proceedings of the First Annual Conference (pp. 315–320). Cambridge, MA: MIT Press.
24. Huelsbergen, L. (1997). Learning recursive sequences via evolution of machine-language programs. In J. R. Koza et al. (Eds.), Genetic Programming 1997: Proceedings of the Second Annual Conference (pp. 186–194). San Francisco, CA: Morgan Kaufmann.
25. Ikegami, T., & Hashimoto, T. (1995). Coevolution of machines and tapes. In F. Morán, A. Moreno, J. J. Merelo, & P. Chacón (Eds.), Proceedings of the Third European Conference on Artificial Life (pp. 234–245). Berlin: Springer.
26. Koza, J. R. (1992). Genetic programming: On the programming of computers by means of natural selection. Cambridge, MA: MIT Press.
27. Koza, J. R. (1994). Genetic programming II: Automatic discovery of reusable programs. Cambridge, MA: MIT Press.
28. Langdon, W. B. (1998). Genetic programming and data structures. Boston: Kluwer Academic.
29. McShea, D. W. (1996). Metazoan complexity and evolution: Is there a trend? Evolution, 50, 477–492.
30. Miller, R. E., & Cocke, J. (1972). Configurable computers: A new class of general purpose machines. IBM Research (RC 3897). Also available in Lecture notes in computer science 5 (pp. 285–298). Heidelberg, Germany: Springer-Verlag (1974).
31. Misunas, D. P. (1976). Performance analysis of a data-flow processor. In Proceedings of the 1976 International Conference on Parallel Processing (pp. 100–105).
32. Mitchell, M. (1996). An introduction to genetic algorithms. Cambridge, MA: MIT Press.
33. Nimwegen, E. v., Crutchfield, J. P., & Mitchell, M. (1997). Statistical dynamics of the royal road genetic algorithm. Santa Fe Institute Working Paper 97-04-035.
34. Nordin, P. (1994). A compiling genetic programming system that directly manipulates the machine code. In K. E. Kinnear, Jr. (Ed.), Advances in genetic programming (pp. 311–331). Cambridge, MA: MIT Press.
35. Nordin, P. (1997). Evolutionary program induction of binary machine code and its applications. Muenster, Germany: Krehl Verlag.
36. Rasmussen, S., Knudsen, C., & Feldberg, R. (1992). Dynamics of programmable matter. In C. G. Langton, C. Taylor, J. D. Farmer, & S. Rasmussen (Eds.), Artificial Life II: Proceedings of an Interdisciplinary Workshop on the Synthesis and Simulation of Living Systems (pp. 211–254). Vol. 10 of SFI Studies in the Sciences of Complexity. Redwood City, CA: Addison-Wesley.
37. Rasmussen, S., Knudsen, C., Feldberg, R., & Hindsholm, M. (1990). The coreworld: Emergence and evolution of cooperative structures in a computational chemistry. Physica D, 42, 111–194.
38. Ray, T. S. (1992). An approach to the synthesis of life. In C. G. Langton, C. Taylor, J. D. Farmer, & S. Rasmussen (Eds.), Artificial Life II: Proceedings of an Interdisciplinary Workshop on the Synthesis and Simulation of Living Systems (pp. 371–408). Vol. 10 of SFI Studies in the Sciences of Complexity. Redwood City, CA: Addison-Wesley.
39. Ray, T. S. (1994). Evolution, complexity, entropy, and artificial reality. Physica D, 75, 239–263.
40. Ray, T. S. (1997). Selecting naturally for differentiation. In J. R. Koza et al. (Eds.), Genetic Programming 1997: Proceedings of the Second Annual Conference (pp. 414–419). San Francisco, CA: Morgan Kaufmann.
41. Ray, T. S., & Hart, J. (1998). Evolution of differentiated multi-threaded digital organisms. In C. Adami, R. K. Belew, H. Kitano, & C. E. Taylor (Eds.), Artificial Life VI: Proceedings of the Sixth International Conference on Artificial Life (pp. 295–304). Cambridge, MA: MIT Press.
42. Sharp, J. A. (1985). Data flow computing. Chichester, U.K.: Ellis Harwood.
43. Stadler, P. F., & Wagner, G. P. (1997). Algebraic theory of recombination spaces. Evolutionary Computation, 5, 241–275.
44. Suzuki, H. (1997). Functional emergence with multiple von Neumann computers. In C. Langton & K. Shimohara (Eds.), Artificial Life V: Proceedings of the Fifth International Workshop on the Synthesis and Simulation of Living Systems (pp. 108–115). Cambridge, MA: MIT Press.
45. Suzuki, H. (1997). Multiple von Neumann computers: An evolutionary approach to functional emergence. Artificial Life, 3, 121–142.
46. Suzuki, H. (1997). The optimum recombination rate that realizes the fastest evolution of a novel functional combination of many genes. Theoretical Population Biology, 51, 185–200.
47. Suzuki, H. (1998). One-dimensional unicellular creatures evolved with genetic algorithms. In JCIS ’98: The Fourth Joint Conference on Information Sciences, Proceedings Vol. II (pp. 411–414). Association for Intelligent Machinery.
48. Suzuki, H. (1998). Unicellular core-memory creatures evolved using genetic algorithms. Inter-Journal of Complex Systems, Submitted. Available at http://www.interjournal.org/ or http://www.hip.atr.co.jp/hsuzuki/body/papers/1998.12 InterJournal/index.html.
49. Suzuki, H., & Iwasa, Y. (1997). GA performance in a Babel-like fitness landscape. In Proceedings of the Ninth IEEE International Conference on Tools with Artificial Intelligence (pp. 357–366). Los Alamitos, CA: IEEE Computer Society Press.
50. Suzuki, H., & Iwasa, Y. (in press). Crossover accelerates evolution in GAs with a Babel-like fitness landscape: Mathematical analyses. Evolutionary Computation, 7.
51. Szuba, T. (1998). A quasi-chaotic computational model of collective intelligence and its IQ measure. In JCIS ’98: The Fourth Joint Conference on Information Sciences, Proceedings Vol. II (pp. 44–47). Association for Intelligent Machinery.
52. Thearling, K., & Ray, T. S. (1994). Evolving multi-cellular artificial life. In R. Brooks & P. Maes (Eds.), Artificial Life IV: Proceedings of an Interdisciplinary Workshop on the Synthesis and Simulation of Living Systems (pp. 283–288). Cambridge, MA: MIT Press.
