Computational Discovery of Instructionless Self-Replicating Structures in Cellular Automata

Zhijian Pan*,**
IBM Annapolis Lab

James A. Reggia†
University of Maryland

Keywords Artificial life, self-replication, cellular automata, evolutionary computing, computational creativity, genetic programming

Abstract Cellular automata models have historically been a major approach to studying the information-processing properties of self-replication. Here we explore the feasibility of adopting genetic programming so that, when it is given a fairly arbitrary initial cellular automata configuration, it will automatically generate a set of rules that make the given configuration replicate. We found that this approach works surprisingly effectively for structures as large as 50 components or more. The replication mechanisms discovered by genetic programming work quite differently than those of many past manually designed replicators: There is no identifiable instruction sequence or construction arm, the replicating structures generally translate and rotate as they reproduce, and they divide via a fissionlike process that involves highly parallel operations. This makes replication very fast, and one cannot identify which descendant is the parent and which is the child. The ability to automatically generate self-replicating structures in this fashion allowed us to examine the resulting replicators as their properties were systematically varied. Further, it proved possible to produce replicators that simultaneously deposited secondary structures while replicating, as in some past manually designed models. We conclude that genetic programming is a powerful tool for studying self-replication that might also be profitably used in contexts other than cellular spaces.

1 Introduction

The study of self-replicating ‘‘machines’’ has long been of great interest in the field of artificial life. A variety of approaches have been studied [9, 11, 12, 24, 27, 29, 30], motivated in part by the desire to understand the fundamental information-processing principles underlying self-replication, to gain a better understanding of the origins and evolvability of life, and to explore the potential technological relevance of self-replication to producing robust electronic systems and atomic-scale manufacturing (nanotechnology). The use of cellular automata models has played an important and at times central role in this work. Historically, cellular automata models have primarily focused on two broad classes (families) of replicating structures: universal constructors and self-replicating loops. Following John von Neumann’s seminal work in the 1950s [32], large, complex universal constructors consisting of

* Contact author.
** IBM Annapolis Lab, 1997 Annapolis Exchange Parkway, Annapolis, MD 21401. E-mail: [email protected]
† Computer Science Department, University of Maryland, College Park, MD 20742. E-mail: [email protected]

© 2009 Massachusetts Institute of Technology. Artificial Life 16: 39–63 (2010)

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/artl.2009.16.1.16104 by guest on 01 October 2021 Z. Pan and J. A. Reggia Computational Discovery of Instructionless Self-Replicating Structures

numerous components were the dominant focus of attention. This and subsequent work established the feasibility of artificial self-replication, examined many important theoretical issues, and gradually created progressively simpler self-replicating universal systems [13]. While implementation of such universal constructors has made great progress since von Neumann’s time [33], and important work in this area has continued during recent years [4, 22], their implementation remains only marginally realizable. A second class of replicators, self-replicating loops, appeared in the 1980s when Chris Langton demonstrated that looplike structures used in universal constructors [6] could independently reproduce themselves [14]. Subsequent work produced simpler and smaller loops [25], demonstrated that they can emerge from a primordial soup of non-replicating components and/or evolve over time [5, 26], and showed how to embed instruction sequences in them so that they perform other tasks as they replicate [21, 31].

In addition to the two broad families of cellular automata replicators described above, there is a third, less widely investigated class of quite different replicators that we will refer to as instructionless replicators. These models do not incorporate identifiable instructions like those used on the tape of universal constructors or embedded in the loop of self-replicating loops. However, these systems often depend on totalistic transition functions that evaluate the sum of a cell’s neighborhood cell states. For example, Edmond Fredkin and others have described totalistic transition functions, such as those based on parity or modular arithmetic, that cause fairly arbitrary initial configurations to replicate [1, 8, 17, 28, 34, 35].
These instructionless replicators are sometimes dismissed as trivial or as not even exhibiting self-replication, because their replication is a consequence of the transition function alone rather than being guided by a sequence of instructions as with universal constructors and self-replicating loops [13, 28], and because replication can even occur with the limiting case of a single non-quiescent cell (e.g., using Rule 90 in one-dimensional cellular automata [36]).

These and other past results have established the plausibility of self-replicating configurations in cellular spaces, examined their properties, shown that they can be much simpler than originally expected, and shown that they can perform computational tasks. However, over the past half century only a limited number of replicator designs have undergone systematic and extensive study. In part, this is because manually creating the local state transition rules that govern replication in cellular spaces is in many cases both very difficult and time-consuming. Further, the replication process underlying both universal constructors and self-replicating loops, which have received the most attention, incorporates a sequential construction strategy in which an arm extends from the parent structure and deposits the emerging child structure in a step-by-step fashion, making the replication process relatively slow. Such a sequential approach to replication seems at odds with the fact that the underlying physical and computational mechanisms involved in biological reproduction inherently include large-scale parallel computations, and it does not effectively use the enormous amount of parallel processing that occurs in the “underlying physics” of cellular automata.
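As a concrete illustration of the totalistic parity replicators discussed above, the following minimal sketch applies a parity rule on the von Neumann neighborhood: a cell becomes active exactly when an odd number of its four neighbors are active. The specific rule variant, the seed pattern, and the helper names (`parity_step`, `run`, `translate`) are assumptions chosen for illustration and are not taken from the cited work. Because the rule is linear modulo 2, after 4 steps a seed that fits in a 3 × 3 box reappears as four disjoint copies displaced by 4 cells along each axis direction.

```python
from collections import Counter

# The four von Neumann neighbor offsets (center cell excluded).
VON_NEUMANN = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def parity_step(live):
    """One synchronous update of the parity rule on an unbounded grid.

    `live` is the set of active cell coordinates. A cell is active at the
    next step iff an odd number of its four neighbors are active now.
    """
    counts = Counter()
    for (x, y) in live:
        for dx, dy in VON_NEUMANN:
            counts[(x + dx, y + dy)] += 1
    return {cell for cell, k in counts.items() if k % 2 == 1}


def run(live, steps):
    """Apply the parity rule for the given number of time steps."""
    for _ in range(steps):
        live = parity_step(live)
    return live


def translate(cells, dx, dy):
    """Shift a set of cells by (dx, dy)."""
    return {(x + dx, y + dy) for (x, y) in cells}


# An arbitrary (hypothetical) seed that fits in a 3 x 3 box.
seed = {(0, 0), (1, 0), (2, 0), (0, 1), (1, 2)}
after4 = run(seed, 4)

# Four copies of the seed, each displaced by 4 cells in one axis direction.
copies = [translate(seed, 4 * dx, 4 * dy) for dx, dy in VON_NEUMANN]
print(after4 == set().union(*copies))
```

No instruction sequence is involved here: replication follows from the transition function alone, which is why such replicators are sometimes regarded as trivial.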
Further, those past self-replicating structures that have been given the capability of performing a secondary task as they replicate [21, 31] also depend on manually programmed sequential instructions that carry out their task in a similar sequential, step-by-step fashion. The third class of replicators mentioned above does not depend on such sequential instructions; however, such instructionless self-replication capability in the past (e.g., Fredkin replicators) often depended on the presence of totalistic transition functions. In view of the limited types of self-replicating structures examined in past studies, we recently asked the following question:

Given an arbitrary initial cellular automata structure/configuration, is it possible to automatically generate a set of local, non-totalistic rules that will make the given structure replicate, and that will more effectively use concurrent computations reminiscent of those occurring in nature?

One might initially be pessimistic about the possibility of such an approach, because the automated rule generation that would be involved can be viewed as a form of automatic programming, an area

40 Artificial Life Volume 16, Number 1


of computer science that has had only limited success in the past. However, an exception to this limited progress is recent work using genetic programming as a creativity support design tool in a broad range of applications, such as art, chemistry, architecture, engineering, and several other areas [2, 3, 10, 15]. This work, as well as the fact that biological evolution has created numerous self-replicating “machines” of remarkable diversity, suggests that evolutionary computation may provide an approach to more effectively exploring the space of self-replicating structures in cellular spaces.

In this context, we recently adopted and extended genetic programming methods to evolve cellular automata rule sets when given fairly arbitrary initial structures where cells may have several possible states [18, 19]. Our approach uses uniform treelike data structures to represent both structural information and state transition information, making the application of genetic programming straightforward and computationally efficient. Given the automated “manufacturing” of replicating structures that genetic programming produces, we refer to our approach as the replicator factory. Below we briefly describe the underlying methods used in this system (presented in more detail in [18, 19]), and expand the results to a broader range of replicators than previously reported, followed by examples of self-replicators automatically discovered (programmed) by the replicator factory, with arbitrary structures. These novel replicators inherit the properties of the third class of replicators mentioned above, but without requiring the presence of totalistic transition functions, and often further provide additional features that appear to be beyond the power of past models.
Our initial computational experiments using the replicator factory established that it typically produces replicating structures that are qualitatively different from past manually designed universal constructors and self-replicating loops. More specifically, the resulting replicators discovered via genetic programming can be viewed as substantially expanding the range of known instructionless replicators. The rule sets produced by the replicator factory, which can have appreciable sizes, cause both the parent and child structures to grow, split, translate, and sometimes rotate in a much more parallel and distributed fashion than with most past models, and they do not make use of a construction arm that executes a sequence of instructions. If anything, the replication process resembles biological mitosis more than it does past manually created replicating structures studied in cellular automata that follow sequential instructions. As a result of this parallel processing, the replicators generated typically exhibit remarkable replication speed.

The replicator factory can produce replicating structures that have arbitrary size, shape, allowable set of states, substructures, symmetries, or other features. As a result, it is possible to rapidly produce many replicators that vary in just a single property. We use this ability to systematically examine the effects of these different properties on replicating structures. This article illustrates the types of replicators produced using our approach, and summarizes the effects of systematically varying replicator properties. We also extend this approach in two ways. One extension is to have the replicators generate additional structures as they replicate, much like what has been done with manually created self-replicating loops in the past [31]. Second, we have examined whether it is possible to coevolve both the rules and the structures that are replicating simultaneously.
We conclude this article by discussing some of the implications of our results and by making suggestions for future issues that will be important to consider. One significant implication of this work is that genetic programming can serve as a very powerful creativity support tool in the discovery of novel self-replicating structures, and might be adopted profitably to study non-cellular automata replicators in future studies.

2 Methods: The Replicator Factory

Historically, genetic programming (GP) has often used tree structures to represent the genetic material associated with each individual in a population. We accordingly adopted a uniform tree-based approach to representing both arbitrary cellular automata (CA) structures and the rules that control a cell’s transitions. Here we briefly describe this representation, how it is used efficiently in GP, and how fitness for self-replication is assessed with our approach. Further technical details of these issues can be found in [19, 20].


2.1 S-Tree and R-Tree Encodings

Two tree encodings are used in the replicator factory. The first of these is the structure tree, or S-tree, which represents a target configuration [i.e., a pattern of contiguous active cells in the CA space like those shown at t = 0 in Figure 1(a)] for which a set of rules is sought to make that configuration replicate. The S-tree is a minimum spanning tree for the seed structure that includes all of the non-quiescent cell states in the target configuration, every immediately adjacent quiescent cell based on the Moore neighborhood (9-neighborhood), and no other cells. An example is given in Figure 1(b). Each node is one of the components (cell states) of this configuration, and its immediate descendants are adjacent components that have not already appeared elsewhere in the tree. The S-tree has several desirable properties: It precisely represents the structure that is to replicate, it is acyclic and unambiguous, it is efficiently derived using minimum-spanning-tree algorithms, it is universal in that it can represent arbitrary structures, it is compact in that each structural component appears only once, and it is independent of where the structure may be located in the CA space.

Just as the seed structure can be represented by an S-tree, the rules that govern state transitions of individual cells can be represented as a rule tree, or R-tree [19], something that has been recognized independently in other cellular automata models (e.g., in the game of Life simulator). An R-tree is a rooted and ordered tree that encodes every rule needed to direct the state transition of a given structure, and only those rules. An example is given in Figure 1(c). The root is a dummy node. Each node at the first level represents the current state of a cell at time t whose value is to be updated. Each node at lower levels represents the state of each neighbor of that cell. Each leaf node represents what the state of that cell should become at time t + 1.
Therefore, the R-tree can be viewed as similar to a decision tree, where each cell can find a unique path to a leaf (i.e., the new state it is to assume) by selecting each subbranch based on the states of itself and its neighbors. The R-tree is height-balanced, is parsimonious, and can be used efficiently by GP, as described below. In the

Figure 1. Example replicator with rules using the von Neumann neighborhood. (a) The seed structure consists of seven oriented components as shown at t = 0. (b) The S-tree corresponding to this seed structure. (c) An evolved R-tree that drives replication of the seed within two time steps as shown in (a) at t = 1 and t = 2. At time steps t > 2 these replicas attempt to repeat the same replication, and when enough space is available, more replicas will appear, isolated from each other.


following (and unlike in Figure 1, where evolved rules are based on the von Neumann neighborhood for illustrative purposes), R-trees are based on the Moore neighborhood. While S-tree and R-tree encodings can work with other neighborhoods, we elected to use the Moore neighborhood because it provides more local information than the von Neumann neighborhood that the evolutionary process might exploit. We used just a single neighborhood uniformly so that the properties of the different evolved replicators could be compared fairly.
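The decision-tree view of the R-tree can be sketched with nested dictionaries: the first level is keyed by the cell's own state, each deeper level by the state of one Moore neighbor, and the value at the bottom is the next state. This is only an illustrative stand-in for the authors' data structure; the neighbor ordering in `MOORE`, the integer state codes, and the function names `rule_key`, `insert_rule`, and `lookup` are all assumptions.

```python
QUIESCENT = 0

# Moore neighbor offsets in a fixed order (the order itself is an assumption).
MOORE = [(-1, -1), (0, -1), (1, -1), (-1, 0), (1, 0), (-1, 1), (0, 1), (1, 1)]


def rule_key(grid, x, y):
    """Path through the tree for cell (x, y): own state, then neighbor states.

    `grid` maps coordinates of non-quiescent cells to their states; any
    coordinate missing from it is treated as quiescent.
    """
    own = grid.get((x, y), QUIESCENT)
    neighbors = tuple(grid.get((x + dx, y + dy), QUIESCENT) for dx, dy in MOORE)
    return (own,) + neighbors


def insert_rule(rtree, key, next_state):
    """Add one rule: descend (creating branches as needed) and set the leaf."""
    node = rtree
    for state in key[:-1]:
        node = node.setdefault(state, {})
    node[key[-1]] = next_state


def lookup(rtree, key):
    """Follow the unique path selected by `key`; return None if no rule exists."""
    node = rtree
    for state in key:
        if not isinstance(node, dict) or state not in node:
            return None
        node = node[state]
    return node


# The root dict plays the role of the dummy root node.
rtree = {}
insert_rule(rtree, (QUIESCENT,) * 9, QUIESCENT)  # quiescent stays quiescent

grid = {(0, 0): 1}               # a single active component
key = rule_key(grid, 1, 0)       # the quiescent cell just to its right
insert_rule(rtree, key, 2)       # hypothetical rule: that cell becomes state 2
print(lookup(rtree, key))
```

Rules that share a prefix of states share the upper part of their path, which is what makes subtree-level crossover and mutation natural operations on this encoding.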

2.2 Genetic Programming with S-Tree and R-Tree Encodings

To build a GP system that can program a CA to support self-replication using S-trees and R-trees, a specific seed structure is first encoded as an S-tree. Then, an R-tree population is initialized randomly and starts to evolve, guided by a fitness function that evaluates how well structures produced at intermediate evaluation time steps by each R-tree match the S-tree. Evolution continues until an R-tree forms that produces a number of isolated structures that match the original S-tree encoding, that is, that are replicas of the seed structure and that have surrounding quiescent cells.

We base crossover of R-trees on a nonstandard schema theorem [23]. Homologous one-point R-tree crossover swaps the subtrees with crossover points selected only in the common upper part of the trees. This means that until a common upper structure emerges, R-tree crossover is effectively searching a much smaller space, and therefore the algorithm quickly converges toward a common (and good) upper part of the tree, which cannot be modified again without the mutation operator. The R-tree mutation operator simply picks an edge from the entire tree with uniform probability, and then eliminates the subtree below the selected edge. Unlike standard GP, a replacement subtree is not generated immediately, but only on an as-needed basis subsequently during CA simulation, as described below.

To evaluate the fitness of each R-tree in an evolving population, one needs to simulate the R-tree in the CA space in order to measure how well it produces self-replication. The range of the time steps during which a simulation is performed is referred to as the simulation time steps (Ts), such as Ts = (1, 2, ..., 12). It is undesirable for Ts to be either too small or too large. If it is too small, it may be insufficient for capturing the self-replication phenomenon.
If it is too large, it may lead to a significant decrease in evolution efficiency due to oversize R-trees with many random rules. The question then is how to determine an appropriate Ts. Our strategy is to let the simulation start with a small number of time steps, so that the evolutionary search for an optimal R-tree runs fast initially. Only when evolution has made sufficient progress is it allowed to adaptively add more simulation time steps and progressively improve R-trees. This keeps the R-trees parsimonious, avoids the bloating problem that frequently occurs with GP, and maintains an effective evolutionary search for optimal rules in a paced fashion.

In the beginning of an evolutionary run, a population of R-trees is initialized, with each having only one default branch. Therefore, unlike with conventional GP, initially every R-tree in the population is identical and very small, each only containing one trivial rule (a quiescent cell surrounded by quiescent cells stays quiescent). Before a simulation starts (t = 0), every cell in the entire CA space is quiescent, except for those cells containing the active components of a single seed structure. At each subsequent time step t, each cell attempts to change its state for the next time step by identifying and firing a specific rule in the R-tree based on the states of its neighbors. If such a rule is not found within the current R-tree, a new rule is inserted into the R-tree with its target state (the leaf node) randomly generated. This operation is referred to as R-tree expansion. On the other hand, at the end of the current simulation, those branches in the R-tree that represent a rule or rules that were never fired by any cells at any time step are removed. This operation is referred to as R-tree pruning, and it provides a type of parsimony pressure on the evolutionary process. The purpose of an R-tree’s simulation is to evaluate its fitness in terms of producing duplicated seed structures.
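The expansion and pruning operations described above can be sketched as follows. For brevity the R-tree is flattened here into a dictionary that maps each root-to-leaf path (own state followed by the eight Moore neighbor states) to its leaf; this flattening, the neighbor ordering, and the names `neighborhood_key` and `simulate` are assumptions made for illustration, not the authors' implementation.

```python
import random

QUIESCENT = 0
MOORE = [(-1, -1), (0, -1), (1, -1), (-1, 0), (1, 0), (-1, 1), (0, 1), (1, 1)]
TRIVIAL = (QUIESCENT,) * 9   # quiescent cell with all-quiescent neighbors


def neighborhood_key(grid, x, y):
    """Own state followed by the eight Moore neighbor states."""
    return (grid.get((x, y), QUIESCENT),) + tuple(
        grid.get((x + dx, y + dy), QUIESCENT) for dx, dy in MOORE
    )


def simulate(rules, seed, steps, states, rng):
    """Run the CA from `seed`, expanding `rules` on demand, then prune.

    rules  : dict mapping a neighborhood key to the next state (modified in place)
    seed   : dict mapping coordinates to non-quiescent states at t = 0
    states : list of allowed cell states, including QUIESCENT
    Returns the list of configurations at t = 1, ..., steps.
    """
    rules.setdefault(TRIVIAL, QUIESCENT)
    fired = {TRIVIAL}        # fired by every cell far from the structure
    grid = dict(seed)
    history = []
    for _ in range(steps):
        # Only active cells and their Moore neighbors can see an active state.
        candidates = set(grid)
        for (x, y) in grid:
            candidates.update((x + dx, y + dy) for dx, dy in MOORE)
        nxt = {}
        for (x, y) in candidates:
            key = neighborhood_key(grid, x, y)
            if key not in rules:
                rules[key] = rng.choice(states)   # expansion: random leaf
            fired.add(key)
            if rules[key] != QUIESCENT:
                nxt[(x, y)] = rules[key]
        grid = nxt
        history.append(grid)
    # Pruning: discard every rule that no cell fired during this simulation.
    for key in list(rules):
        if key not in fired:
            del rules[key]
    return history
```

Because missing rules are filled in only when a cell actually needs them, a rule set never contains more entries than the simulated time window can exercise, which is the parsimony effect the text describes.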
However, since every R-tree in the initial population is randomly constructed during the first generation’s expansion process, it is extremely unlikely that any of them will directly lead to self-replication. Thus, a critical and very difficult issue in this context for any evolutionary algorithm is to find a fitness function that can identify more promising candidates and discard less promising


Figure 2. Each cell is probed by an S-tree to evaluate a potential match in four different orientations. Such probing can be done with any cell in the configuration.

candidates, that works with arbitrary structures, and that produces precise fitness measures that reflect the subtle differences leading to future self-replication. The introduction of the S-tree as an encoding mechanism of arbitrary structures gives GP the ability to perform precise fitness assignment for full or partial matching structures at any GP stage. This is because S-tree encoding can retrieve complete structural information about the target replicant and compare it with the configurations produced by R-tree simulation, determining how well they are matched. Figure 2 illustrates conceptually how this is done. From a given configuration produced at any time step during a simulation guided by a candidate R-tree, each non-quiescent cell is selected as a potential root cell (such as the shaded one in Figure 2). The state of every component in the target structure (S-tree) is compared with the actual state of the corresponding cell in the current configuration, and the total number of components that match is counted. Dividing this result by the total number of components in the structure, we get a scalar measure in the range [0, 1], indicating a complete mismatch, a partial match, or a perfect match. This S-tree probing is used to test every cell in a given configuration, and to measure how much a structure can be matched if we align the structure with that cell. Each cell in the current configuration is probed from four different orientations with the same S-tree, as illustrated in Figure 2. Further, since an S-tree contains not only the active components from a structure, but also the immediate quiescent cells surrounding the active cells, an actual probing also traverses those surrounding cells, and can determine how many of them also match the states of corresponding nodes in the S-tree (which are all quiescent).
This measures whether the currently probed structure is completely non-isolated, partially isolated, or fully isolated from surrounding components. We adopted the standard that ultimately the last is required for self-replication to have occurred.

More specifically, at any time step during a simulation with a candidate R-tree, every possible non-quiescent root cell and every possible orientation with the given S-tree is probed, returning the best matching result. Let r represent a simulated (evaluated) R-tree, s an S-tree for a given structure, E(s) the number of nodes of s, p an infinite cellular space, c ∈ p a root cell being probed, h ∈ (1,2,3,4) the orientation of the current probe, and t a time step applying r starting with structure s in p. Define function φ(r,s,p,t,c,h) to be, after applying r for t time steps and then probing s from c with orientation h, the number of traversed cells that match the state of the corresponding node (active or quiescent) as guided by s. We can define a probing function as follows:

\[
f_{\varphi}(r, s, p, t) = \max_{c \in p} \; \max_{h \in (1,2,3,4)} \frac{\varphi(r, s, p, t, c, h)}{E(s)} \tag{1}
\]
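A minimal sketch of the probing function in Equation 1 follows. The S-tree is flattened into a template of offsets relative to a root component, covering every active component and every Moore-adjacent quiescent cell, so that the template size plays the role of E(s). Component states are treated here as unoriented, so rotating a probe changes only positions; with oriented components the states would have to be rotated as well. The names `structure_template`, `rotate`, `probe`, and `best_probe` are hypothetical.

```python
QUIESCENT = 0
MOORE = [(-1, -1), (0, -1), (1, -1), (-1, 0), (1, 0), (-1, 1), (0, 1), (1, 1)]


def structure_template(seed):
    """Flattened stand-in for an S-tree.

    Maps offsets (relative to a root component) to the expected state, for
    all active components plus their Moore-adjacent quiescent cells.
    """
    rx, ry = min(seed)                       # pick a deterministic root
    template = {}
    for (x, y), state in seed.items():
        template[(x - rx, y - ry)] = state   # active component
        for dx, dy in MOORE:
            template.setdefault((x + dx - rx, y + dy - ry), QUIESCENT)
    return template


def rotate(offset, h):
    """Rotate an offset by h quarter turns about the root."""
    dx, dy = offset
    for _ in range(h):
        dx, dy = -dy, dx
    return dx, dy


def probe(grid, template, c, h):
    """Fraction of template nodes matched when rooted at cell c, orientation h."""
    cx, cy = c
    matches = 0
    for offset, state in template.items():
        dx, dy = rotate(offset, h)
        if grid.get((cx + dx, cy + dy), QUIESCENT) == state:
            matches += 1
    return matches / len(template)


def best_probe(grid, template):
    """Best match over all non-quiescent root cells and four orientations."""
    return max(
        (probe(grid, template, c, h) for c in grid for h in range(4)),
        default=0.0,
    )


seed = {(0, 0): 1, (1, 0): 2, (0, 1): 3}
template = structure_template(seed)
moved = {(x + 10, y + 5): s for (x, y), s in seed.items()}
print(best_probe(moved, template))
```

Because the quiescent border is part of the template, a copy of the seed that touches other active cells scores below 1, which corresponds to the isolation requirement stated above.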


Since our goal is not just to produce one instance of the seed structure but to allow self-replication to carry on sustainably, ultimately we will want to reward those R-trees that are more likely to generate a maximum number of replicas. Thus, to compute the fitness of an R-tree, we use the probing function above repeatedly to identify multiple best matches. However, such an approach raises a difficult question: How many probes should we accept (i.e., count) at each time step? One might ask, why not accept as many probes as possible? The answer is that, if we accept too many probes at a given time, it may have the effect of promoting the formation of many partially matching structures, few of which would have enough room and potential to grow into full replicas, and this ultimately degrades the performance of evolution. This problem is referred to as overprobing. To address it, when evolution starts, each R-tree is automatically allowed to accept the best two probes at each time step. After some R-trees become more and more successful at generating two perfect probes (likely with a higher number of simulation time steps), we can allow these R-trees to incrementally increase the number of accepted probes (and so become more aggressive in working on additional replicas). Denote the number of accepted probes (those that are selected to determine fitness) at evaluation time t ∈ Ts for R-tree r as kt(r). We adopt a strategy in which an evolving R-tree starts with a basic goal of programming itself to find a minimum CA space just to produce two isolated seed structures. Not until this is achieved can it accept more probes. On the other hand, once this goal is achieved, it gradually raises its goal by adjusting the number of acceptable probes (kt) at a controlled and adaptive pace. Based on the above, an R-tree r at evaluation time t is allowed to accept kt(r) probes.
Each accepted probe identifies a best probe from the cells not yet marked as unavailable by previously accepted probes. The overall fitness function fR for R-tree r is then

\[
f_R(r) = \sum_{t \in T_s} \; \sum_{n=1}^{k_t(r)} \; \max_{c \in p,\; c \notin \bigcup_{m=1}^{n-1} p_m} \; \max_{h \in (1,2,3,4)} \frac{\varphi(r, s, p, t, c, h)}{E(s)} \tag{2}
\]

where pm is the collection of cells no longer available due to previous accepted probes. Equation 2 indicates the overall fitness measure for a candidate R-tree at a given GP generation as its accumulated result of every accepted probe at every evaluation time step. Every accepted probe finds a best probe among the tested probes at every location and every orientation. The number of evaluated time steps at a given generation, Ts, is common to all R-trees in the population, but at any specific evaluation time step, the allowable number of accepted probes varies from R-tree to R-tree. An R-tree can gain higher fitness by earning a higher number of accepted probes, by having better individual accepted probes, or both.
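The accumulation in Equation 2 can be read as a greedy selection at each evaluation time step: take the best probe, mark the cells it covers as unavailable, and repeat until the allowed number of probes has been accepted. The sketch below assumes the candidate probes have already been scored (for every root cell, at its best orientation) and are supplied as `(score, root, cells)` triples, where `cells` includes the root; this input format and the name `accumulated_fitness` are assumptions for illustration.

```python
def accumulated_fitness(probes_by_time, accepted_by_time):
    """Sum the scores of the accepted probes over all evaluation time steps.

    probes_by_time   : dict t -> list of (score, root, cells) candidates
    accepted_by_time : dict t -> number of probes that may be accepted at t
    """
    total = 0.0
    for t, candidates in probes_by_time.items():
        unavailable = set()          # union of cells claimed by accepted probes
        for _ in range(accepted_by_time[t]):
            # A probe is usable only if its root cell is still available.
            usable = [p for p in candidates if p[1] not in unavailable]
            if not usable:
                break
            score, root, cells = max(usable, key=lambda p: p[0])
            total += score
            unavailable |= set(cells)
    return total


# Hypothetical candidates at one time step: A and B overlap, C is separate.
probes = {
    1: [
        (1.0, (0, 0), {(0, 0), (1, 0)}),   # A
        (0.9, (1, 0), {(0, 0), (1, 0)}),   # B, rooted inside A's cells
        (0.5, (5, 5), {(5, 5)}),           # C
    ]
}
print(accumulated_fitness(probes, {1: 2}))
```

With two accepted probes the second pick is C rather than B, because B is rooted in cells already claimed by A. Marking cells as unavailable in this way prevents a single good structure from being counted repeatedly.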

2.3 The Replicator Factory and Experimental Methods

To summarize, the resulting S-tree–R-tree GP model for producing self-replicating structures works as follows. First, an S-tree s is automatically derived from the prespecified seed structure. Then, an R-tree population of size M is initialized. Each R-tree is simulated in the given cellular space within the current time window Ts, during which each R-tree may potentially expand or prune itself as needed. Next, based on the simulation results, fitness is measured for each R-tree. If the desired fitness level is reached, the algorithm has produced a satisfactory R-tree and stops. Other facets of the evolutionary process are as follows. The fitness values are adjusted using fitness sharing. R-tree elitism is performed to ensure that the best R-tree in a new population will be at least as good as before. The entire population is fully ordered based on the final fitness value of each R-tree. Tournament selection is performed, and M/2 pairs of parents are selected. Each pair may perform an R-tree crossover before entering the mating pool, and each R-tree in the mating pool may be further mutated. If the number of generations that have passed since the last increase in fitness has exceeded


a specified threshold, Ts is incrementally increased. Then, the R-tree population enters a new GP generation, and the same process repeats. In the following we apply the replicator factory in a number of experiments. Typically, the following model parameters are used: population size = 100, R-tree mutation probability = 0.45, R-tree crossover probability = 0.85, R-tree GP tournament size = 2, and (threshold for increasing Ts) = 200. Unless explicitly stated otherwise, all evolutionary runs described below use the Moore neighborhood in R-trees [unlike Figure 1(c)]. To make it easier to visualize the produced structures at each evaluation time step, shading is used in subsequent figures. Also, to clearly illustrate relative location information, the edges of cells containing the initial seed structure are highlighted by thickened edges at any time step.
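The selection loop and the adaptive time window summarized above can be sketched generically as follows, using the parameter values quoted in the text (tournament size 2, crossover probability 0.85, mutation probability 0.45, threshold 200). Fitness sharing is omitted, the `crossover` and `mutate` operators are passed in as placeholders rather than implemented as R-tree operators, and the function names are hypothetical.

```python
import random


def tournament(population, fitness, rng, size=2):
    """Return the fittest of `size` individuals drawn at random."""
    contenders = rng.sample(range(len(population)), size)
    return population[max(contenders, key=lambda i: fitness[i])]


def next_generation(population, fitness, crossover, mutate, rng,
                    p_cross=0.85, p_mut=0.45):
    """Build a new population of the same size, keeping the best individual."""
    best = max(range(len(population)), key=lambda i: fitness[i])
    children = [population[best]]                 # elitism
    while len(children) < len(population):
        a = tournament(population, fitness, rng)
        b = tournament(population, fitness, rng)
        if rng.random() < p_cross:
            a, b = crossover(a, b, rng)
        for child in (a, b):
            if rng.random() < p_mut:
                child = mutate(child, rng)
            children.append(child)
    return children[:len(population)]


def update_time_window(ts, stalled_generations, threshold=200):
    """Lengthen the simulation window once fitness has stalled long enough."""
    return ts + 1 if stalled_generations > threshold else ts
```

Starting with a short window and lengthening it only after progress stalls keeps early evaluations cheap, which is the pacing strategy described in Section 2.2.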

3 Results

In this section we first describe the characteristics of the basic replicators that are produced by evolving R-trees for fairly arbitrary structures. These self-replicating structures are substantially different from previous manually designed ones (universal constructors and self-replicating loops). They can be viewed as rediscovering the class of instructionless replicators as described in the introduction, yet differ from the latter in the following ways. First, the transition rules discovered by the replicator factory that enable a given structure to self-replicate are not totalistic, but are regular local rules as in the case of the first and second classes. Second, several of their features (rotation, two-step replication for structures having n-state cells, etc.) are beyond the powers of the third class. Further, we show for the first time that by using multiple S-trees simultaneously, this type of replicator can be extended to carry out a simple task, construction of secondary structures, as they replicate. Finally, we demonstrate that it is also possible to simultaneously evolve both a seed structure and its rule set by allowing the S-tree to evolve along with the R-tree. In the following subsections, we provide more detailed analysis of their properties and such task-performing and coevolutionary capabilities.

3.1 Properties of the New Class of Replicators

In general, we found that, given an arbitrary seed structure of up to 56 contiguous, non-quiescent components (which is as large as we have examined), the approach we outlined above successfully generates a set of rules that cause the given structure to replicate. The replicators created in this fashion are qualitatively different from universal constructors and self-replicating loops. They generally duplicate themselves very quickly: Many of the larger replicators produced manage to self-replicate within only a few time steps, a speed that has generally not been achieved with past manually designed replicator models of corresponding sizes. Almost universally, a fissionlike strategy is used in the self-replication process that is discovered by genetic programming. As noted earlier, past manually designed universal constructors and self-replicating loops have been based on a sequential, step-by-step process in which a series of instructions control a construction arm that creates an initially passive child structure. In contrast, the models created by GP are based on highly parallel computations involving many components in both the parent and child structures simultaneously, allowing the replicants to reach more of the surrounding space in fewer time steps.

Figure 3(a) illustrates an example of the kind of replicators typically produced by our approach. Here we use a small initial seed structure having six oriented components and 25 (6 × 4 + 1 = 25) cell states to demonstrate several typical aspects of the replicants, as follows. First, replicating structures discovered by the replicator factory usually involve a fission process in which a structure literally splits into two copies that move away from each other as occurs here. This is very different from past manually designed instruction-based replicators in which the parent and children replicants remain in fixed locations.
One can no longer claim that one of the two copies produced is the parent and one the child as with these past manual designs. Second, the steps in the replication process occur in parallel as a kind of grow-and-divide process that is more reminiscent of biological cell division (mitosis) than it is of past universal constructors or self-replicating loops that use a construction arm to build a second copy of themselves. As a result of the parallel operations, replication is very fast.
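The state count quoted above (six oriented components giving 6 × 4 + 1 = 25 states) can be made concrete with a small sketch. It assumes that each oriented component type may appear in four orientations and that one further state is reserved for the quiescent cell, which is how the arithmetic in the text works out. The particular integer numbering and the function names are hypothetical, chosen only for illustration, and are not taken from the authors' implementation.

```python
QUIESCENT = 0        # assumed code for the quiescent (empty) cell
ORIENTATIONS = 4     # assumed: four orientations per oriented component


def num_states(num_component_types: int) -> int:
    """Total cell states: every component type in every orientation, plus quiescent."""
    return num_component_types * ORIENTATIONS + 1


def encode(component: int, orientation: int) -> int:
    """Map (component index, orientation index) to a nonzero state code."""
    return 1 + component * ORIENTATIONS + orientation


def decode(state: int):
    """Inverse of encode; returns None for the quiescent state."""
    if state == QUIESCENT:
        return None
    return divmod(state - 1, ORIENTATIONS)


def rotate(state: int, quarter_turns: int = 1) -> int:
    """Rotate an oriented component by quarter turns; quiescent cells are unchanged."""
    if state == QUIESCENT:
        return QUIESCENT
    component, orientation = decode(state)
    return encode(component, (orientation + quarter_turns) % ORIENTATIONS)
```

Under these assumptions, `num_states(6)` gives the 25 states of the Figure 3 example, and rotating a structure amounts to applying `rotate` to each of its cells while also rotating their positions.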

46 Artificial Life Volume 16, Number 1

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/artl.2009.16.1.16104 by guest on 01 October 2021 Z. Pan and J. A. Reggia Computational Discovery of Instructionless Self-Replicating Structures

Figure 3. (a) An example of evolved self-replication, using a seed structure having six oriented components, from time t = 0 to t = 6. For illustrative purposes, non-isolated replicants are shaded light gray and isolated replicants dark gray. Also, to provide relative location information, the cells of the initial seed structure are highlighted by thicker edges at every time step. This and all subsequent replicators use rules based on the Moore neighborhood. (b) The same seed structure as in (a), but now the evolved rules cause replication to occur in a different way, as illustrated here.

Third, the replicator factory is capable of discovering multiple R-trees that reflect different replication strategies for the same seed structure. For example, Figure 3(b) illustrates a second approach for replication of the same seed structure as in Figure 3(a). It illustrates similar properties, except that now the structures also continually rotate as they simultaneously translate and replicate. To our knowledge this is the first time that two different mechanisms for self-replication of the same CA seed structure have been described. This second approach to replication is faster than the first; for example, it produces eight replicas by time t = 5, in contrast to four by time t = 6 with the initial approach. Finally, often after a number of replicas have formed, there exist in the CA space extraneous components that do not belong to any replica. We refer to these extraneous components as debris; examples can be seen in Figure 3(b) (t = 2 to t = 5). To our knowledge, individuals manually designing self-replicators in CA spaces have generally not tolerated the emergence of persisting debris. The debris that occurs in solutions found by the replicator factory is typically transitory, but in some cases may persist. However, we have found that the replicator factory is usually capable of completely cleaning up the debris by further refining the R-tree after the desired replication is achieved, and so we do not consider it further in this article.1

The ability of the replicator factory to generate rule sets for self-replication rapidly (relative to manual design) makes it possible to undertake controlled computational experiments where one systematically varies a single aspect of a seed structure while holding other features constant. This allows one to explore how variations of the seed structure influence the rules produced. For example, a basic question is how the size of the seed structure affects replication. We examined this using a set of 16 structures with seven identical allowable states and similar rectangular shapes, but with the size gradually varying from 4 to 56 components (Figure 4). The replicator factory discovered a rule set (R-tree) for each that enables it to self-replicate with no debris within four or fewer time steps. An example is shown in Figure 5 for a seed structure that has 56 components, and its fissionlike replication process is typical of those produced in general. The seed structure splits into two isolated replicators where both descendant structures translate in opposite directions at maximum speed (one

1 The replicators described in this article can all be produced in an isotropic CA space. However, we found that debris cannot always be completely removed unless one gives up isotropy, so we relax this requirement in the remaining example replicators presented in this section for illustrative purposes.


cell each time step), and hence reach enough CA space to form and isolate the replicators in a minimum of time. In general, over the 16 structures examined, as the seed size increased there was a tendency for the GP to take longer to discover rules leading to replication (in some cases requiring more than 3000 generations) and for the R-tree to become larger, although these relations were imperfect. The Pearson correlation coefficient relating seed size and number of generations needed to produce the first replicants was 0.62, while that relating seed size and the corresponding R-tree size was 0.58.

Another basic question is how the number of allowable cell states influences the replication process. To address this issue, we examined a set of seed structures, of which examples are shown in Figure 6(a), where the number of allowable states was varied from 6 to 15 while keeping the seed’s size and shape constant. Each structure contains exactly one oriented component, always in the upper left position, and a different number of non-oriented components corresponding to the total number of allowable states. The replicator factory successfully and consistently discovered a rule set (R-tree) for every one of these structures that enables it to self-replicate with no debris within three time steps. Correlation and regression analysis based on these experiments suggests that the number of time steps required for a structure to self-replicate, whether it is a square or a loop, does not depend on the number of allowable states. However, the number of states does have a strong positive influence on the evolution cost and on the size of the rule set required to enable self-replication. The correlation coefficient between the number of states and the number of generations required to produce the first pair of replicants was 0.77, and that between the number of states and the size of the corresponding R-tree was 0.53.
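For readers who want to reproduce this kind of analysis on their own runs, the Pearson correlation coefficient used above can be computed as in the following minimal sketch. The function name `pearson` is ours, and the numbers in the usage line are made-up toy values, not measurements from these experiments.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Toy illustration only: hypothetical seed sizes and generation counts.
toy_sizes = [4, 8, 16, 32, 56]
toy_generations = [40, 300, 150, 900, 700]
r = pearson(toy_sizes, toy_generations)
```

A value of `r` near 1 would indicate a nearly linear increase, whereas the moderate values reported in the text indicate a tendency with substantial scatter.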
As noted above, GP generally produced replicants via a parallel fission process in which the child structures move away from each other. This raises the question whether GP, when given a seed structure similar to self-replicating loops used in past manual designs, would discover sequential replication processes using the construction arm that are similar to those created by people in the past [25] or would still produce replication via fission. To examine this question, we used a set of seven looplike structures, examples of which are shown in Figure 6(b), which have identical size and shape, but with their allowable number of states varying from 7 to 13. Each structure contains two adjacent oriented components, followed by, repeatedly, a different number of non-oriented components corresponding to the total number of allowable states. Given these looplike structures, the replicator factory successfully discovered a rule set (R-tree) that enables each of these structures to self-replicate. However, we found that the serial construction-arm strategy used in past studies was never adopted by the replicator factory for any of these looplike structures. Instead, the replicator factory always evolved a fission strategy first, where self-replication takes place in a more parallel and distributed fashion, as with non-loop structures. As in the example shown in Figure 7, GP-generated loops replicated quickly (within only two time steps here) by growing, dividing, moving, and rotating.

Another basic question is how the shape of the seed structure influences replication. To examine this, and to verify that our approach can also handle structures of arbitrary shape, we tested another set of 14 structures, shown in Figure 8, that all had an identical size and number of allowable states, but with their shapes randomly altered. Again, the replicator factory discovered an R-tree for each of these seed structures that enables the structure to self-replicate with no debris within two to three time steps, an example being shown in Figure 9. The splitting replicators move at maximum speed in

Figure 4. Examples of sixteen structures, which vary from 4 to 56 components. With the rules evolved by the replicator factory, each of these structures replicates itself within four or fewer time steps.


Figure 5. The replicator factory evolved an R-tree causing this 7 × 8 structure (t = 0) to replicate in just four time steps. The seed structure splits into two isolated replicators as both of the replicas translate diagonally in opposite directions, moving one cell each time step.

opposite directions and are isolated in only two time steps. Further, the replicators in this case have aligned themselves in such a way that they occupy a minimal cellular space but nevertheless are fully isolated from each other.

Finally, it is possible that the presence of repeated substructures or symmetry in the seed structure may also affect the performance of replication and evolution. We used the replicator factory to take


Figure 6. Examples of two sets of replicators: The first set (a) has a fixed size of 25 components and a fixed square shape, but gradually increasing numbers of allowable cell states (from 6 to 15). The second set (b) has a fixed size of 10 and a fixed loop shape, but a gradually increasing number of allowable cell states (from 7 to 13).

some initial steps in examining these possibilities. Figure 10 shows four pairs of structures that are pairwise identical except that only the first structure contains repeated substructures. The replicator factory successfully evolved an R-tree for each structure to self-replicate without debris within two to four time steps, the self-replication process of the first pair of structures shown in Figure 11 being an example. Typically, more rules were required when the structure did not have repeated substructures. We also tested four types of symmetry: reflection, rotation, translation, and glide reflection. For example, each structure in Figure 12 has exactly the same substructures, size, and states, but with various types of symmetry (translation symmetry is actually the same as repeated substructures). The replicator factory successfully discovered R-trees for each of these structures that enable them to self-duplicate without debris within two to three time steps. The self-replication of two of the structures, one of reflection and the other of glide-reflection symmetry, is illustrated in Figure 13. There was no striking difference in the number of rules required with any of the types of symmetry, but it did take more generations on average to produce the rules with glide reflection (2,289 generations on average) than with the other types (275 generations).

3.2 Self-Replication with Secondary Task Performance

Most past cellular automata models of self-replicating structures, including those we derived using GP as above, have generally done only one thing: replicate themselves. However, it has occasionally

Figure 7. An example loop-structure seed (t = 0) for which GP generated rules for replication. Self-replication occurs quickly (within only two time steps) as the structure grows and then divides. This result, which is typical for all of the loop structures we examined, is much faster than that of manually designed self-replicating loops and occurs without using the components on the loop as circulating instructions that cause a stationary parent’s construction arm to extend and form a second child loop.


Figure 8. Fourteen structures having identical size and allowable states, but varying shapes. Rules to make each self-replicate were produced by the replicator factory.

been demonstrated that manually designed replicators can be formulated to also carry out a task during the replication process. Typically this involves creating additional structures in the cellular space, something that is motivated in part by the idea that self-replication may ultimately have applications in manufacturing at the nano scale [7]. For example, it is possible to add instructions to self-replicating loops, giving them the added capability of writing out extra structures in the CA space as they replicate, say, the letters “LSL,” the acronym for “Logic Systems Laboratory” [31]. Other work has shown how self-replicating loops can be used to solve simple satisfiability problems [5], and that a simple yet universal language can be embedded into self-replicating loops, enabling them to carry out secondary tasks in general [21]. In these past models, such task-performing capabilities depend on instruction sequences that are prewritten manually and that require a specific loop or tape structure in which to embed them. Self-replication is carried out in a sequential construction process with the construction

Figure 9. A representative example replication process for one of the 14 structures of Figure 8. The splitting replicators move at maximum speed in opposite directions and are isolated within only two time steps.


Figure 10. Four pairs of replicators (a–b; c–d; e–f; g–h). Structures in each pair are identical except that only the first structure contains repeated substructures.

arm, and task performance is done in the same fashion, despite the highly parallel nature of the underlying CA computations.

In this subsection, we demonstrate that evolutionary methods can also discover rule sets for our new class of replicators that support secondary structure construction at the same time as replication occurs. In our model, both the secondary structure to be constructed and the seed structure can be quite arbitrary. When the state transition rules that are evolved by GP are executed in parallel in each cell, they perform the given secondary task in a parallel and distributed fashion different from that

Figure 11. Self-replication of the first pair of replicators in Figure 10. The upper left seed has repeated substructures. The lower left seed does not.


Figure 12. Sample seeds with reflection (first row), rotation (second row), translation (third row), and glide-reflection (last row) symmetry. These structures otherwise have exactly the same substructures, size, and number of allowable states.

used in past self-replicating loops, and thus, as with self-replication, often have high processing speeds due to the parallelism involved. We address the problem of secondary structure construction by having R-tree evolution occur in the presence of two S-trees. In other words, a secondary target structure to be constructed can be viewed as an arbitrary configuration of active cells in the CA space and thus can also be encoded by an S-tree. The replicator factory is modified to evolve an R-tree that produces a sustainable number of replicating and secondary structures, guided by two S-trees at the same time, and using two types of probing. One type of probing uses the seed S-tree to detect configurations matching the seed structure, and the other uses the target S-tree to detect configurations that match the secondary target structure. However, this raises a number of problems. First, overprobing can now happen not only by accepting too many seed probes, but also by accepting too many target probes. Second, if a relatively high number of accepted seed probes dominates the limited space, this can prevent the secondary target structure from being formed, or vice versa. Third, the order of seed probing versus target probing becomes important, because any accepted probes will mark the traversed active cells as unavailable to subsequent probes, and thus potentially affect their results. Our approach to these problems is to use an adaptive algorithm where evolving an R-tree starts with the basic goal of finding a set of rules that produce two isolated seed structures and one target structure in a minimal CA space. Until this initial goal is achieved, neither more seed probes nor more target probes are accepted. On the other hand, once this goal is achieved, it is extended by gradually adjusting the number of acceptable seed probes and target probes as before. 
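The bookkeeping described above (two probe types, a limit on how many probes of each type may be accepted, and cells becoming unavailable once a probe that traverses them is accepted) can be sketched as follows. This is only an illustration under stated assumptions: the `Probe` record, its `score` field (higher meaning a closer match), the `quota` dictionary, and the best-score-first ordering are all hypothetical choices made here and are not the authors' implementation, which may also re-probe the space after each acceptance.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple

Cell = Tuple[int, int]


@dataclass(frozen=True)
class Probe:
    kind: str                # "seed" or "target" (hypothetical labels)
    score: float             # assumed match quality; higher is better
    cells: FrozenSet[Cell]   # active cells traversed by this probe


def accept_probes(probes: List[Probe], quota: Dict[str, int]) -> List[Probe]:
    """Greedily accept probes best-first, regardless of type.

    A probe is skipped if its type has no quota left or if any of its
    cells were already claimed by a previously accepted probe.
    """
    remaining = dict(quota)
    unavailable = set()
    accepted = []
    for probe in sorted(probes, key=lambda p: p.score, reverse=True):
        if remaining.get(probe.kind, 0) <= 0:
            continue
        if not unavailable.isdisjoint(probe.cells):
            continue
        accepted.append(probe)
        unavailable |= probe.cells
        remaining[probe.kind] -= 1
    return accepted


# Initial goal from the text: two seed structures and one target structure.
initial_quota = {"seed": 2, "target": 1}
```

Sorting seed and target probes in a single list is what keeps either type from being systematically favored; raising the quotas over time would be handled outside this function.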
Whenever these numbers are adjusted, the new numbers have to be realized by finding perfectly matched structures as evaluated


by the seed or target S-tree, before they can be adjusted again. Therefore, the better an R-tree has performed in the previous evaluation time steps, the faster it can accept more seed and target probes and produce more sustained time steps.

In addition, the order of seed probing versus target probing needs to be determined, so as to avoid introducing a bias in the evolutionary process favoring either reproduction or target construction. We adopted a strategy where seed probing and target probing are conducted and accepted in an intermixed and interlaced fashion, so that a more promising probe, whether it is a seed probe or a target probe, can always be accepted before a less promising one, given the fixed number of acceptable seed probes and target probes at a given time step. Therefore, for each R-tree at each evolution step, the replicator factory probes each location and orientation in the CA space both as a seed probe and as a target probe. After each available cell and orientation has been probed, the stored best probe, whether a seed probe or a target probe, is accepted, and the traversed active cells are marked as unavailable for subsequent probings. This is repeated until there are no more acceptable seed or target probes. In the interlaced order dynamically determined in this fashion, a less promising probe is never accepted ahead of a more promising one, regardless of its type.

Further, the evolutionary process is now more complex in that it involves multiple, simultaneous objectives. We need to define appropriate fitness functions to encourage the R-tree evolution to discover

Figure 13. Self-replication of two of the structures in Figure 12. The structure on the left has reflection symmetry, while the one on the right has glide-reflection symmetry.


novel strategies that (1) support self-replication, (2) support concurrent task performance, and (3) do so in a sustainable, persistent fashion. Since multiple objectives and fitness functions are now involved, the R-tree population may no longer be fully ordered. This is handled by basing GP selection on three fitness measures, as follows. For the first objective of self-replication, the same fitness measure fR(r) for R-tree r as before is used (Equation 2). For the second objective of task performance, a completely analogous fitness measure fT(r) is used, based on the accumulated result of every accepted target probe during every evaluation time step. Finally, for the third objective of sustainability of replicants and target structures, fS(r) is taken to be the number of time steps among the evaluated ones (the latter being the same for every R-tree in a population) during which at least two perfectly matching seed probes and at least one perfectly matching secondary target structure are present.

Given the above, the fitness of any R-tree is now represented as a vector [fR, fS, fT], so selection for reproduction during evolution is no longer straightforward. One approach would be to adopt a multi-objective evolutionary approach in this context. However, for simplicity and to minimize changes to the replicator factory, we adopted a simpler approach that borrows many concepts from multi-objective evolutionary optimization yet still retains a single overall fitness value that is used during selection, as follows. An R-tree is said to be more fit overall, or to dominate another R-tree, if it is not worse in any objective and is better in at least one objective. We created a domination measure based on this concept.
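The dominance relation just defined, together with one possible count-based domination measure, can be sketched as follows. The sketch assumes that larger values are better for each of fR, fS, and fT, and it uses a strength-weighted count in the general style of the raw fitness of SPEA2; both are assumptions made here for illustration, since the exact calculation used by the replicator factory is given in [20] and may differ. The function names are hypothetical.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is not worse than b in any objective and better in at least one.

    Assumes every objective is to be maximized.
    """
    not_worse = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return not_worse and strictly_better


def domination_measures(population: List[Sequence[float]]) -> List[int]:
    """Assumed SPEA2-style measure; 0 means the individual is non-dominated.

    strength[j] is the number of individuals that j dominates; the measure
    of i is the sum of the strengths of all individuals that dominate i.
    """
    strength = [sum(dominates(p, q) for q in population) for p in population]
    return [
        sum(strength[j] for j, q in enumerate(population) if dominates(q, p))
        for p in population
    ]


# Each vector is [fR, fS, fT] for one R-tree (toy values, not from the paper).
toy_population = [(3, 3, 3), (2, 2, 2), (1, 1, 1), (3, 1, 4)]
toy_measures = domination_measures(toy_population)
```

With this scheme an R-tree that is dominated by individuals which themselves dominate many others receives a larger (worse) value than one dominated by a single weak individual.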
A domination measure of zero means that an R-tree is not dominated by any other R-trees (it is potentially a good choice to select for replication), while a high domination measure means an R-tree is dominated by many other R-trees, which themselves dominate still other R-trees. In addition, to promote R-trees with greater diversity, we introduce a second, distribution measure inspired by the fitness assignment methods used in the multi-objective evolutionary optimization system SPEA2 [38]. This distribution measure is inversely proportional to the density of neighboring peers, which can be estimated with the distance to the kth closest neighbor of each R-tree; that is, it is based on calculating the Euclidean distance between their vectors [fR, fS, fT]. Finally, a third, parsimony measure is introduced to keep each R-tree as simple as possible. The parsimony measure is defined as proportional to the number of nodes in the R-tree. Since it is desirable to minimize each of these three measures, the overall fitness of an R-tree is taken to vary inversely with the sum of the domination, distribution, and parsimony measures. The detailed calculations used in this process can be found in [20].

We tested the above approach with various combinations of seed structures and target structures. We found that the resulting R-tree evolution process often discovered intriguing strategies that not only enable self-replication, but also perform simple secondary tasks. Both the seed and target structures can be fairly arbitrary, with the complexity of the extra structure being allowed to significantly exceed that of the seed structure. We present two representative examples.

The first example uses prespecified six-component target and four-component seed structures as shown in Figure 14(a,b) in a CA space allowing 29 cell states.
Each component in the seed and target structures is oriented, and each cell in the cellular space can be either in the quiescent state or contain one of the seven oriented components (hence there are altogether 7 × 4 + 1 = 29 states per cell). For illustrative purposes, the seed structure in this example is chosen to spell the word “seed,” and the larger target structure is chosen to spell the word “target.” The evolved R-tree directed the given seed structure to self-replicate and construct the given secondary target structure at the same time, starting from an initial state in which only the seed structure was present. The execution results from t = 11 to t = 14 are shown in Figure 14(c). The cells covered by the initial seed structure are highlighted by widened edges at each time step. Here a wall of replicating seed structures progressively moves to the left and expands, leaving behind a population of secondary target structures that rotate where they were deposited. By time t = 14, there are eight seed replicas near the left edge and nine target structures organized in the triangular space behind the edge.

The second example consists of a bigger and more complex target structure and a different, smaller seed structure [Figure 15(a,b)] in a seven-state CA space. The new target structure [Figure 15(a)] has 18 components that represent “UM,” the acronym for University of Maryland. The replicator factory finds an R-tree that permits the seed structure to both replicate and construct “UM” in a clear


Figure 14. A desired target structure (a) that is to be repeatedly constructed by a replicating seed structure (b), as the seed structure replicates. Using an evolved R-tree and starting from an initial state containing just the seed structure (its initial location indicated by the four cells with thick boundaries), both replication and secondary structure construction occur, as shown here for time steps t = 11 to t = 14 (c). The seed structures move to the left as they replicate, while the secondary target structures they deposit rotate.

Figure 15. (a) The target structure, prespecified as an 18-component, seven-state weakly rotationally symmetric structure that represents “UM,” the acronym for University of Maryland. (b) The seed structure, prespecified as a three-component, seven-state weakly rotationally symmetric structure. (c) The replicating seeds move to the left and upward in this case, depositing a progressively increasing number of secondary structures over time.


and consistent pattern, as illustrated in Figure 15(c). We were unsuccessful in scaling this result up to the three letters ‘‘UMD’’ in spite of long evolutionary runs.

3.3 Self-Replication with Structure and Rule Coevolution

So far we have shown that the evolutionary approach used in the replicator factory is capable of discovering novel strategies that enable both self-replication and the simultaneous construction of secondary structures, starting from just a given seed structure. In such cases, both the target structure and the seed structure are known before evolution starts, and only R-trees are evolved, guided by static S-trees generated from the seed and target structure. Now we ask if it is possible to only specify a task to be performed, and let the replicator factory search for a suitable seed structure and associated rules that collectively and cooperatively provide a strategy for both replication and performing the given task. This appears to be particularly daunting in that the seed structure itself is now evolving, so the state transitional rules of the emerging R-tree have to coevolve with the secondary structure in a coordinated fashion.

Structure evolution can be implemented by evolving S-trees, owing to the S-tree’s property that the encoding and decoding between a structure and an S-tree are unambiguous in both directions. To generate an initial population of seed structures to start evolution, we use a random structure initialization algorithm, which takes two configurable control parameters, the initial seed size r and density control U. First, an empty center cell is selected in the cellular automata space, and an active state for that cell is randomly selected from among legal states with uniform probability. This cell now becomes the root of the S-tree. Next, starting from the root, each active cell attempts to expand the structure by recursively assigning states to its Moore neighbors, controlled by U, until the structure size reaches r. This is done as follows. First, the root cell attempts to assign a state to each of its eight Moore neighbors.
Either a quiescent state or an active state can be selected, with the probability of selecting an active state being U. If an active state is selected, it is picked with uniform probability among all active states. In either case, each of these neighbor cells is marked as assigned. Next, in the order in which they were assigned, each of the active Moore neighbors, in turn, recursively attempts to assign states to its own Moore neighbors. It is possible that some of their Moore neighbors are already marked assigned, in which case they are left untouched. This recursive construction process continues until the initial structure size r is reached. In general, a higher U results in a more compact shape. Once the random seed structure is complete, an S-tree can be immediately derived, exactly as before.

Just as with R-trees, existing GP operators can readily be applied to S-trees. One-point crossover between two S-trees is equivalent to exchanging substructures between two structures, and one-point mutation is equivalent to local modification to the structure. Figure 16 shows an example of structure evolution using a one-point GP crossover operator. As illustrated here, a subtle point is that with S-tree crossover, a substructure is not always exactly copied over to the new structure: It may be reorganized as its new neighborhood requires. Here Structure1new is much larger and more complex than either Structure1 or Structure2, and at the same time Structure2new becomes a smaller structure containing only two components. As with R-tree crossover operations, the more restrictive homologous one-point crossover genetic operator is used. The essential idea is almost the same as with R-tree homologous crossover, except the definition of “mismatch” is different here.
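Returning briefly to the random structure initialization algorithm described above, its breadth-first growth can be sketched as follows. The function and parameter names are hypothetical (`size` stands for r and `density` for U), active states are assumed to be numbered from 1, growth stops as soon as the requested size is reached, and an attempt whose frontier dies out before reaching that size is simply restarted. The last two points are assumptions made here, since the text does not specify them.

```python
import random
from collections import deque

QUIESCENT = 0
MOORE = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def _grow_once(size, density, num_active_states, rng):
    """One growth attempt; returns None if it stalls below the requested size."""
    root = (0, 0)
    assigned = {root: rng.randint(1, num_active_states)}
    active_count = 1
    frontier = deque([root])  # active cells, in the order they were assigned
    while frontier and active_count < size:
        row, col = frontier.popleft()
        for d_row, d_col in MOORE:
            cell = (row + d_row, col + d_col)
            if cell in assigned:
                continue  # already assigned cells are left untouched
            if rng.random() < density:
                assigned[cell] = rng.randint(1, num_active_states)
                active_count += 1
                frontier.append(cell)
                if active_count == size:
                    break
            else:
                assigned[cell] = QUIESCENT
    if active_count < size:
        return None
    return {cell: s for cell, s in assigned.items() if s != QUIESCENT}


def random_seed_structure(size, density, num_active_states, rng):
    """Return a dict mapping (row, col) to an active state, with `size` entries."""
    if size < 1 or not 0 < density <= 1 or num_active_states < 1:
        raise ValueError("need size >= 1, 0 < density <= 1, num_active_states >= 1")
    while True:
        structure = _grow_once(size, density, num_active_states, rng)
        if structure is not None:
            return structure
```

Because every new active cell is a Moore neighbor of an earlier active cell, the resulting structure is contiguous, and a higher `density` fills more of each neighborhood before growth moves outward, which is consistent with the more compact shapes noted above.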
In S-tree homologous one-point crossover, two nodes (each from a different parent) are not a “mismatch” as long as both are either quiescent or active (meaning the states themselves do not have to exactly match). The S-tree homologous one-point crossover operator is less restrictive than the R-tree one, so that structural crossover can take place on two components as long as the topologies of their Moore neighbors match each other. This was done because otherwise it is very difficult to get the structural evolutionary process started, as there are so few matches between the initial random structures.

The offspring S-trees resulting from crossover or mutation operations may contain nodes that have more or fewer child nodes than needed to reconstruct their Moore neighbors. This is similar to the situation in R-tree evolution when a missing-rule or unused-rule scenario occurs and the R-tree is allowed to repair itself by generating new rules as needed or pruning obsolete rules. The same operations can be done for S-tree evolution. If any node in an offspring S-tree is found missing the child nodes needed


Figure 16. An example of the effects on the corresponding structures of one-point S-tree crossover. A crossover takes place between S-tree1 (a, left) and S-tree2 (a, right) with the indicated subtrees exchanged. Correspondingly, Structure1 (b, left) and Structure2 (c, left) are now replaced by Structure1new (b, right) and Structure2new (c, right), respectively. The substructures being exchanged are circled for illustrative purposes. The resulting new S-trees after the crossover operation now represent the new structures.

to fully reconstruct its Moore neighborhood, a new random child node can be added to repair the S-tree. Likewise, if any node is found containing more child nodes than it needs, the extraneous child nodes can be pruned.

When only the secondary task to be performed is specified with an encoded target structure, and a replicator is to be discovered for performing the given task, then the replicator factory must produce an evolved pair consisting of both a seed S-tree and an R-tree. The S-tree part can be viewed as the structural aspect of the replicator, and the R-tree part can be viewed as the state-transitional aspect of the replicator. Therefore, here both trees are initially randomly generated, then subjected to coevolution with separate genetic operators. Once the replicator has been simulated for a given number of time steps by executing the evolving R-tree against the evolving seed S-tree, its fitness measures can be calculated exactly as before. A domination measure, distribution measure, and parsimony measure can also be derived as before. Tournament selection based on these criteria drives the replicator population to evolve toward diversified strategies that reflect different structure-rule combinations that perform the given task.

Using this approach, and given only an initial target structure, we found that it is indeed possible for the replicator factory to discover replicators with unspecified seed structures. Diverse structure-rule combinations can be found concurrently, with each reflecting a different strategy with which to perform the same given task. Here, we first present two examples where the random seeds are initialized with size r = 5 and seed density U = 0.8. In the first example, a six-component, 29-state target structure is prespecified, exactly the same as in the earlier example of Figure 14(a), but now no seed structure is prespecified. Structure-rule coevolution produced a set of replicators with both the


Figure 17. Coevolution example 1: executing an automatically programmed R-tree against the automatically discovered seed structure at t = 6, t = 7, and t = 14. Same target structure as in Figure 15.

S-tree and R-tree evolved simultaneously. These replicators were typically simple structures and produced the target structure as the seed structure replicates. For example, in Figure 17 it can be seen that the evolutionary process has discovered a strategy in which increasing numbers of seed structures are replicated as they move upward, while depositing target structures that in turn move downward. The target structures appear to be organized in a nested triangular pattern reminiscent of similar patterns demonstrated previously in simpler CA spaces [37]. By time t = 14 (Figure 17), eight seed replicas have formed at the top and 19 target structures have been deposited below them.

Figure 18(a) illustrates a second example with the same target structure but a different outcome. Here, evolution has discovered a very different and interesting spiral trail strategy to perform the same task. In this scenario, after the first isolated pair of seed and target structures is formed at the center, the structures keep spinning and seemingly emit an endless stream of material that forms a spiral curve. There are an equal and increasing number of seed and target structures, all lined up on this

Figure 18. Two additional coevolution examples. Both the seed structures and the rules were automatically programmed in each example: (a) the same target structure as Figure 14; (b) part of the target structure in Figure 15. Random seeds initialized with size r = 5 and seed density U = 0.8.


Figure 19. Coevolution example 4: the replicator factory finds a seed (shown at t = 0), which replicates and constructs an extra structure that spells the acronym for University of Maryland at t = 4, when executed with the associated rules also automatically programmed by the replicator factory.

spiral trail. Note that at t = 17, 21 seed replicas and 21 target structures can be found on the spiral trail. The spiral can extend indefinitely and produce any number of seed and target structures.

A third example, in which the target structure is a part of that used earlier [Figure 15(a)], illustrates that similar results are often found with different target structures. From Figure 18(b), it can be seen that this larger target structure can also be constructed by the discovered replicator in a way that closely resembles the earlier results. The seed structure replicates as it moves upward, and target structures get deposited behind it and move downward. Compared to the earlier results where a given fixed seed structure was prespecified, here the freedom to evolve a seed structure that carries out the same task produces a much more efficient and cleaner result, without producing any persisting debris in the process. Further, it was also possible to evolve larger three-letter secondary structures such as “UMD” (see Figure 19) that we were unable to produce from fixed initial seed structures. Here, random seeds were initialized with size r = 15 and seed density U = 0.6. This suggests that the additional flexibility provided by being able to coevolve an appropriate seed structure along with the R-tree can be very important in some situations.
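The structure-rule coevolution used in these examples can be outlined as a generic loop over (seed, rules) pairs. The sketch below is only a schematic under stated assumptions: the names `Replicator`, `tournament`, and `coevolve` are hypothetical, the domination, distribution, and parsimony measures are collapsed into a single caller-supplied scalar `fitness`, and the two variation operators are passed in as plain functions rather than being the S-tree and R-tree operators themselves.

```python
import random
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Replicator:
    seed: Any   # stands in for the seed S-tree (structural aspect)
    rules: Any  # stands in for the R-tree (state-transitional aspect)


def tournament(population, fitness, k, rng):
    """Return the fittest of k individuals drawn without replacement."""
    return max(rng.sample(population, k), key=fitness)


def coevolve(population, fitness, vary_seed, vary_rules, generations, k, rng):
    """Evolve (seed, rules) pairs; each part has its own variation operator.

    fitness    -- assumed scalar score of a Replicator (higher is better)
    vary_seed  -- function(seed_a, seed_b, rng) -> child seed
    vary_rules -- function(rules_a, rules_b, rng) -> child rules
    """
    for _ in range(generations):
        offspring = []
        while len(offspring) < len(population):
            mother = tournament(population, fitness, k, rng)
            father = tournament(population, fitness, k, rng)
            offspring.append(Replicator(
                seed=vary_seed(mother.seed, father.seed, rng),
                rules=vary_rules(mother.rules, father.rules, rng),
            ))
        population = offspring
    return population


if __name__ == "__main__":
    # Toy stand-ins: integers play the role of both trees.
    rng = random.Random(1)
    start = [Replicator(rng.randint(0, 10), rng.randint(0, 10))
             for _ in range(20)]

    def score(r):
        return -(abs(r.seed - 3) + abs(r.rules - 7))

    def step(a, b, rng):
        return rng.choice([a, b]) + rng.choice([-1, 0, 1])

    final = coevolve(start, score, step, step, generations=30, k=3, rng=rng)
    print(max(final, key=score))
```

Keeping the seed and the rules as separate fields with separate operators mirrors the division into a structural part and a state-transitional part, while evaluating them only as a pair reflects that fitness is measured by running the rules on the seed.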

4 Discussion

During the past several decades, most studies attempting to create artificial self-replicating machines in cellular automata have focused on the difficult issues of manually creating state transition rules that produce global, self-replicating behavior when executed locally. This past work has been very successful but has yielded only a limited number of self-replicating structures. Many past CA replicators studied computationally have been either universal constructors or self-replicating loops. In both cases, self-replication takes place by extending a construction arm in a sequential fashion, despite the potentially highly parallel nature of computations that can be done in a cellular automata space.

In this article, we have used genetic programming as an approach to discovering novel self-replicating models having fairly arbitrary structures and the ability to perform secondary tasks. By using trees as data structures (“chromosomes”) that represent both structural information and the rules forming the state transition functions, it was straightforward to apply fairly typical genetic programming. Evolutionary search was guided by a fitness function that gave partial credit for the


occurrence of multiple copies of an original structure in the cellular space over time. We refer to the resulting evolutionary approach as the replicator factory model because it can automatically “manufacture” self-replicators when given an initial, fairly arbitrary structure.

The self-replicating structures produced by GP were strikingly different from those found in past universal constructors and self-replicating loops. There is no identifiable instruction sequence and no construction arm. The structures replicated via a fission process in which highly parallel processing occurs. An initial structure would typically grow and then divide, making replication very fast. In addition, the structures involved generally moved and rotated over time, and one could not distinguish between parent and child. Based on these properties, the self-replicating structures described in this article are best viewed as new examples of the class of instructionless replicators described earlier. The evolved replicators extend Fredkin replicators described previously [28], but differ from them both in exhibiting strategies such as rotation during replication, and in being based on structure-specific rules rather than totalistic transition functions. In contrast to others (e.g., [13, 28]), we believe that such instructionless replicators are intriguing because, in some ways, their fissionlike replication process is reminiscent of the way that biological cells split during mitosis. Further, replication is very fast, even compared to small self-replicating loops described previously [25].

We further demonstrated that GP-produced instructionless replicators can also support the construction of secondary structures as they replicate, either with or without a given initial seed structure, as has

Table 1. Number of rules for each sample replicator.

Replicator (Figure)    Number of rules
3 (a)                  29
3 (b)                  85
5                      128
7                      61
9                      66
11 (top)               67
11 (bottom)            70
13 (left)              58
13 (right)             83
14                     112
15                     1,559
17                     56
18 (a)                 56
18 (b)                 80
19                     1,223


been done with self-replicating loops in the past. Both the task to be performed and the seed structure to be used can be quite arbitrary. Self-replication and secondary structure construction are carried out by the same set of state transition rules that are automatically evolved by GP. When these rules are executed in parallel in each cell, they often use a strategy that has not been manually created in the past, such as the moving-wall strategy [e.g., Figure 18(b)] in which a line of replicating structures deposits secondary structures behind it.

Table 1 provides a summary of the number of rules produced by the replicator factory for the sample replicators we described. The first column of the table gives the figure number of each replicator, and the second column is the corresponding number of rules needed to produce the first replicant. For example, for the replicator in Figure 3(a), 29 rules were needed to produce replication. For the replicators in Figures 14–19, the number of rules includes those that also construct the secondary target structures. The rule sets for sample replicators are available online at http://www.replicatorfactory.org/examples.php along with a simulator that can be used to run them.

We conclude that genetic programming can be used as a very powerful creativity support tool in the discovery of novel self-replicating structures in cellular automata spaces. Among the many issues that might be examined in the future, it would be interesting to investigate the discovery of evolvability in replicators, and to explore the use of GP as a creative design tool for self-replicating systems other than those in CA spaces, including physical (kinematic) models [9] and electronic hardware directly supporting self-replication [9, 16].

Acknowledgments

This research is supported by NSF award IIS-0753845.

References

1. Amoroso, S., & Cooper, G. (1971). Tesselation structures for reproduction of arbitrary patterns. Journal of Computer and System Sciences, 5, 455–464.
2. Bentley, P. (1999). An introduction to evolutionary design by computers. In P. Bentley (Ed.), Evolutionary design by computers (pp. 1–73). San Mateo, CA: Morgan Kaufmann.
3. Bentley, P., & Corne, D. (2002). Introduction to creative evolutionary systems. In P. Bentley & D. Corne (Eds.), Creative evolutionary systems (pp. 1–75). New York: Academic Press.
4. Buckley, W., & Mukherjee, A. (2005). Constructability of signal-crossing solutions in von Neumann 29-state cellular automata. In V. Sunderam et al. (Eds.), Lecture Notes in Computer Science, 3515, 395–403.
5. Chou, H., & Reggia, J. (1998). Problem solving during artificial selection of self-replicating loops. Physica D, 115, 293–312.
6. Codd, E. (1968). Cellular automata. New York: Academic Press.
7. Drexler, K. (1989). Biological and nanomechanical systems: Contrasts in evolutionary capacity. In C. Langton (Ed.), Artificial life (pp. 501–509). Reading, MA: Addison-Wesley.
8. Fredkin, E. (1990). Digital mechanics. Physica D, 45, 254–270.
9. Freitas, R., & Merkle, R. (2004). Kinematic self-replicating machines. Georgetown, TX: Landes Bioscience.
10. Haupt, R., & Wermer, D. (2007). Genetic algorithms in electromagnetics. New York: Wiley.
11. Hutton, T. (2002). Evolvable self-replicating molecules in an artificial chemistry. Artificial Life, 8, 341–356.
12. Hutton, T. (2004). A functional self-reproducing cell in a two-dimensional artificial chemistry. In J. Pollack et al. (Eds.), Proceedings of the Ninth International Conference on the Simulation and Synthesis of Living Systems (ALIFE9) (pp. 444–449).
13. Langton, C. (1984). Self-reproduction in cellular automata. Physica D, 10, 135–144.
14. Langton, C. (1986). Studying artificial life with cellular automata. Physica D, 22, 120–149.
15. Lohn, J., Linden, D., Hornby, G., Kraus, W., Rodriguez, A., & Seufert, S. (2004). Evolutionary design of an X-band antenna for NASA’s Space Technology 5 Mission. In Proceedings of the IEEE Antenna and Propagation Society International Symposium and USNC/URSI National Radio Science Meeting, Vol. 3 (pp. 2313–2316).
16. Mange, D., Goeke, M., Madon, D., Stauffer, A., Tempesti, G., & Durand, S. (1996). Embryonics. In Towards evolvable hardware (pp. 197–200). Berlin: Springer-Verlag.


17. Mitra, S., & Kumar, S. (2005). Fractal replication in time-manipulated one-dimensional cellular automata. Complex Systems, 16(3).
18. Pan, Z., & Reggia, J. (2007). Properties of self-replicating cellular automata systems discovered using genetic programming. Advances in Complex Systems, 10, Suppl. 1, 61–84.
19. Pan, Z., & Reggia, J. (2006). Artificial evolution of arbitrary self-replicating structures. Journal of Cellular Automata, 1(2), 105–123.
20. Pan, Z. (2007). Artificial evolution of arbitrary self-replicating cellular automata (Technical report). Department of Computer Science, University of Maryland–College Park. http://hdl.handle.net/1903/7404.
21. Perrier, J., Sipper, M., & Zahnd, J. (1996). Toward a viable self-reproducing universal computer. Physica D, 97, 335–352.
22. Pesavento, U. (1995). An implementation of von Neumann’s self-reproducing machine. Artificial Life, 2, 337–354.
23. Poli, R., & Langdon, W. (1998). Schema theory for genetic programming with one-point crossover and point mutation. Evolutionary Computation, 6, 231–252.
24. Rebek, J. (1994). Synthetic self-replicating molecules. Scientific American, 271(1, July), 48–55.
25. Reggia, J., Armentrout, S., Chou, H., & Peng, Y. (1993). Simple systems that exhibit self-directed replication. Science, 259, 1282–1288.
26. Sayama, H. (1999). A new structurally dissolvable self-reproducing loop evolving in a simple cellular automata space. Artificial Life, 5, 343–365.
27. Sipper, M. (1998). Fifty years of research on self-reproduction: An overview. Artificial Life, 4, 237–257.
28. Schiff, J. (2008). Cellular automata. New York: Wiley-Interscience.
29. Stauffer, A., & Sipper, M. (2002). Interactive self-replicating, self-incrementing and self-decrementing loops. In R. K. Standish, M. A. Abbass, & M. A. Bedau (Eds.), Artificial Life VIII (pp. 53–56). Cambridge, MA: MIT Press.
30. Stauffer, A., & Sipper, M. (2002). An interactive self replicator implemented in hardware. Artificial Life, 8, 175–183.
31. Tempesti, G. (1995). A new self-reproducing capable of construction and computation. In F. Moran, A. Moreno, J. J. Merelo, & P. Chacón (Eds.), ECAL’95: Third European Conference on Artificial Life, Lecture Notes in Computer Science, 929, 555–563. Berlin: Springer-Verlag.
32. von Neumann, J. The general and logical theory of automata. A. H. Taub (Ed.), Collected Works, Vol 5, p. 288.
33. von Neumann, J. (1966). Theory of self-reproducing automata. Edited and completed by A. W. Burks. University of Illinois Press.
34. Waksman, A. (1969). A model of replication. Journal of the Association for Computing Machinery, 16, 178–188.
35. Winograd, T. (1970). A simple algorithm for self-replication (Artificial Intelligence Memo No. 197). Massachusetts Institute of Technology, Cambridge, MA.
36. Wolfram, S. (1983). Statistical mechanics of cellular automata. Reviews of Modern Physics, 55, 601–644.
37. Wolfram, S. (2002). A new kind of science. Champaign, IL: Wolfram Media.
38. Zitzler, E., Laumanns, M., & Thiele, L. (2001). SPEA2: Improving the strength Pareto evolutionary algorithm for multiobjective optimization. In EUROGEN 2001. Athens, Greece.
