A Novel Generative Encoding for Evolving Modular, Regular and Scalable Networks

Marcin Suchorzewski
Artificial Intelligence Laboratory, West Pomeranian University of Technology
[email protected]

Jeff Clune
Creative Machines Laboratory, Cornell University
[email protected]

ABSTRACT

In this paper we introduce the Developmental Symbolic Encoding (DSE), a new generative encoding for evolving networks (e.g. neural or boolean). DSE combines elements of two powerful generative encodings, Cellular Encoding and HyperNEAT, in order to evolve networks that are modular, regular, scale-free, and scalable. Generating networks with these properties is important because they can enhance performance and evolvability. We test DSE's ability to generate scale-free and modular networks by explicitly rewarding these properties and seeing whether evolution can produce networks that possess them. We compare the networks DSE evolves to those of HyperNEAT. The results show that both encodings can produce scale-free networks, although DSE performs slightly, but significantly, better on this objective. DSE networks are far more modular than HyperNEAT networks. Both encodings produce regular networks. We further demonstrate that individual DSE genomes during development can scale up a network pattern to accommodate different numbers of inputs. We also compare DSE to HyperNEAT on a pattern recognition problem. DSE significantly outperforms HyperNEAT, suggesting that its potential lies not just in the properties of the networks it produces, but also in its ability to compete with leading encodings at solving challenging problems. These preliminary results imply that DSE is an interesting new encoding worthy of additional study. The results also raise questions about which network properties are more likely to be produced by different types of generative encodings.

Categories and Subject Descriptors: I.2.6 [Artificial Intelligence]: Learning

General Terms: Algorithms

Keywords: Generative and Developmental Representations, Indirect Encoding, Networks, Modularity, Regularity, Scale-free, Scalability

1. INTRODUCTION

Networks designed by humans and evolution have certain properties that are thought to enhance both their performance and adaptability, such as modularity, regularity, hierarchy, scalability, and being scale-free [12, 11, 1]. In this paper we introduce a new generative representation called the Developmental Symbolic Encoding that evolves networks possessing these properties. We first review the meaning of these terms before discussing them with respect to previous evolutionary algorithms.

Modularity is the encapsulation of function within mostly independently operating units, where the internal workings of a module tend to affect the rest of the structure only through the module's inputs and outputs [12, 11]. For example, a car wheel is a module because it is a separate functional unit and changing its design is unlikely to affect other car systems (e.g. the muffler). This functional independence enables the wheel design to be optimized independently from most other car modules. In networks, modules are clusters of densely connected nodes, where connectivity is high within the cluster and low to nodes outside the cluster [11, 12].

Regularity describes the compressibility of the network description, and often involves symmetries and the repetition of modules [12]. Regularity and modularity are distinct concepts: a unicycle has one wheel module (modularity without regularity), whereas a bicycle's repetition of the wheel module is a form of regularity. Hierarchy is the recursive composition of lower-level modules [12].

Scalability in evolved networks refers to the ability to produce effective solutions at different network sizes.
The concept applies both to evolving large, but fixed-size, networks [17] and to a single genome that can produce networks of different sizes as needed [6, 16, 5]. The latter capability is important for transferring knowledge to related problems of different complexity, and is a main motivation for DSE. Scalability can be enhanced by other properties, such as modularity, regularity and hierarchy [12].

Scale-free networks have a few nodes with many connections and many nodes with few connections, making them efficient at transmitting data and robust to random node failure [1]. Many different types of networks are scale-free, from computer science (e.g. the Internet) and sociology (e.g. social networks) to biology (e.g. protein and ecological networks) [1]. Animal brains, including the human brain, are scale-free, and it is thought that this property is one reason they are so capable [18]. While the scale-free property may not help performance on all problems, it is evidently beneficial on many problems, and it is therefore beneficial if an encoding can produce it when it is useful.

Previous evolutionary algorithms have evolved networks that have some of the aforementioned network properties. Most such algorithms use generative encodings [17], wherein information in a genotype can influence multiple parts of a phenotype, instead of direct encodings, wherein genotypic information specifies separate aspects of phenotypes. Generative encodings tend to produce more regular phenotypes that outperform direct encodings [3, 8, 10] and can produce modular and hierarchical structures [10]. Individual generative genomes can also produce networks of various sizes, as appropriate [16, 9, 7, 5]. We are not aware of research that has investigated whether evolved networks are scale-free.

Two generative encodings that DSE is inspired by have become well known in part because of their ability to produce important network properties. Cellular Encoding (CE) iteratively applies instructions to manipulate network nodes and links. The encoding allows cells to call functions, which involve an entire sequence of instructions (encoded in a grammar tree), facilitating the production of regular, modular, and hierarchical networks [5]. Intriguingly, because CE allows a repeated symbol to unpack into the same subnetwork, the exhibited regularity, while high, tends to be a formulaic repetition of the same subnetwork, without complex variation.

HyperNEAT generates patterns in neural networks via an abstraction of biological development [16]. Just as in natural developing organisms, phenotypic attributes (in this case, edge weights in a network) are a function of evolved geometric coordinate frames. HyperNEAT departs from nature, however, by not having an iterative growth process. A single HyperNEAT genome can encode networks of different sizes; one was scaled up from thousands of connections to millions and still performed well [16]. The neural wiring patterns HyperNEAT produces are highly functional, complex, and regular, including symmetries and repetition, with and without variation [3]. However, HyperNEAT struggles to produce modular networks [2] unless it is manually injected with a bias towards modular wiring patterns [20].

Intriguingly, CE excels at producing modularity, hierarchy, and formulaic regularity, but fails to generate complex regularities with variation. HyperNEAT, on the other hand, excels at producing complex regularities, but tends to create non-modular networks. This raises the question of whether an encoding that combines the iterative growth of CE with pattern-generating capabilities similar to HyperNEAT's can create modular, hierarchical networks with complex regularities. DSE was designed with this motivation.

We will describe DSE (see [19] for additional details) and then present preliminary investigations into the properties of the networks it evolves. The properties under focus are the modular, scalable, and scale-free nature of DSE networks. We explicitly reward networks with these properties to test whether DSE can produce them, and compare the results to HyperNEAT. Unfortunately, we were unable to obtain a functioning distribution of CE, making a comparison with that encoding impossible. By explicitly selecting for a desired property, we can study an encoding's ability to produce it. While we could have evolved on a problem where scale-free, modular, scalable networks are supposed to be beneficial, it is often difficult to ensure that these properties are beneficial on a specific problem. To perform a more traditional test of the capability of an encoding, we also compare DSE to HyperNEAT on a visual pattern recognition task. On all three challenges DSE outperforms HyperNEAT, suggesting that it is an encoding that deserves additional investigation.

2. DESCRIPTION OF DSE

2.1 Phenotype

The phenotype of an individual is the network

G = (V, E) = ([v_1 ... v_N], {e_ij}),   (1)

where V is a vector of nodes and E is a set of connections. Nodes and connections can have attribute values, which play a role either in development or in computation. Weighted connections have a weight attribute and plastic connections can have a number of attributes (parameters). The representation of G differs slightly from the conventional notation for directed graphs, in which nodes and connections are contained in sets and attributes are imposed using functions. This departure follows from the assumed sequential model of computation, in which nodes are evaluated in an ordered manner; it is therefore natural to have them explicitly ordered in a vector.

The vector V is composed of 3 contiguous subvectors: input V_u, hidden V_h and output V_y. The input and output subvectors constitute the network interface and their length depends on the problem. The hidden part of the network is free to vary in size.
DSE network phenotypes are executed similarly to artificial neural networks. The network is initialized before an evaluation by zeroing the values of all nodes and setting the weights of any plastic connections to their initial values. Then, for any input vector u, the network is evaluated as follows:

1: Assign inputs V_u := u
2: For each v_j in [V_h V_y] do
3:   Evaluate node: x_j := f_j({e_ij(x_i)} for i = 1, ..., n_i)
4:   For each connection e_ij do
5:     Update e_ij (if applicable)
6: Return V_y

where x_j denotes the node's value, e_ij(x_i) denotes the operation performed by the connection (usually multiplying the source node value x_i by the weight of the connection), and f_j denotes the transfer function assigned to node j. The transfer function operates on the set of incoming signals, which are usually aggregated by summing. While DSE can handle plastic connections, network plasticity is not explored in this paper and is therefore not described further.
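To make the evaluation procedure concrete, here is a minimal Python sketch of the loop above. The class names and fields (Connection, Network) are illustrative assumptions rather than DSE's actual implementation; weighted connections multiply their source value by a weight, and transfer functions aggregate incoming signals by summing.

```python
class Connection:
    def __init__(self, src, dst, weight=1.0):
        self.src, self.dst, self.weight = src, dst, weight

    def __call__(self, x_src):
        # e_ij(x_i): usually multiply the source node's value by the weight
        return self.weight * x_src

class Network:
    def __init__(self, n_in, n_hidden, n_out, transfer_fns, connections):
        self.n_in, self.n_hidden, self.n_out = n_in, n_hidden, n_out
        self.transfer_fns = transfer_fns   # one f_j per hidden/output node
        self.connections = connections     # list of Connection objects
        self.values = [0.0] * (n_in + n_hidden + n_out)

    def evaluate(self, u):
        # 1: assign inputs V_u := u
        self.values[:self.n_in] = list(u)
        # 2: evaluate hidden and output nodes in their order in V
        for j in range(self.n_in, len(self.values)):
            incoming = [c(self.values[c.src]) for c in self.connections if c.dst == j]
            # 3: x_j := f_j({e_ij(x_i)})
            self.values[j] = self.transfer_fns[j - self.n_in](incoming)
            # 4-5: plastic connections would be updated here (not used in this paper)
        # 6: return the output subvector V_y
        return self.values[-self.n_out:]

# A two-node example: one input feeding one summing output through a 0.5 weight
net = Network(1, 0, 1, [sum], [Connection(0, 1, 0.5)])
print(net.evaluate([2.0]))   # -> [1.0]
```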

2.2 Genotype

The genotype is a program to grow the network. The program is tree-structured, with its nodes being routines. The root of the tree constitutes the main routine and is sufficient to solve many test problems. Each routine tree has the same structure: γ = (R, Body, Tail, γ_γ), where R is an identifier, Body is the main list of instructions, Tail is a list of terminating instructions, and γ_γ is a set of subroutines. The whole genomic program Γ, i.e. the tree of γ routines, grows the network from its initial state into its final state, i.e. G_τ = Γ(G_0). The main component of a routine is Body, which contains instructions that act on the network to develop it. Each instruction in Body can be seen as a function λ taking the network G_t and returning a new network, i.e. G_{t+1} = λ(G_t). The instructions and their properties are described in the list below. Note that the arguments X and Y following an instruction can select many nodes at once, allowing operations to be carried out on groups of nodes. A central concept of DSE is that nodes are identified by an index and a label. Nodes having the same label make up groups (e.g. inputs), and instructions can act on all nodes from a group at once. Within groups, index values provide a geometry for patterns of nodes to be selected and acted upon.

1. Con X Y C: connect nodes X and Y, with the connection having the same attribute values as the reference connection C.
2. Cut X Y: cut connection(s) between X and Y.
3. ConE X Y C E: connect nodes X and Y that satisfy the expression E, using C as the reference connection.
4. CutE X Y E: cut connection(s) between X and Y that satisfy the expression E.
5. DivP X Y: divide node(s) X in parallel, which duplicates the node(s) along with all of their connections, places the new node(s) in V right after the original, and assigns label Y to them.
6. DivS X Y: same as DivP, except the division is sequential, meaning that instead of duplicating connections, Y takes over the outgoing connections of X and a connection from X to Y is created.
7. Subst X Y: substitute node symbols X with Y.
8. Call R X: call the subroutine R for each node X.
9. Term X Y: like Subst, except Y is necessarily terminal, i.e. it denotes a transfer function.
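As a concrete illustration of the list above, the sketch below interprets the Con and DivP instructions on a toy network. The representation (an ordered list of (index, label) node pairs plus a dict of weighted connections) is our own simplification, not the paper's data structures.

```python
def con(V, E, x_label, y_label, ref_weight):
    """Con X Y C: connect every node labelled X to every node labelled Y,
    copying the reference connection's attributes (here just a weight)."""
    for i, (_, lx) in enumerate(V):
        for j, (_, ly) in enumerate(V):
            if lx == x_label and ly == y_label:
                E[(i, j)] = ref_weight
    return V, E

def div_p(V, E, x_label, y_label):
    """DivP X Y: duplicate every node labelled X together with its connections,
    place the copy right after the original, and give it label Y."""
    out_V, out_E, mapping = [], {}, {}
    for i, (idx, label) in enumerate(V):
        mapping[i] = len(out_V)
        out_V.append((idx, label))
        if label == x_label:
            out_V.append((idx, y_label))        # copy placed right after the original
    # re-index old connections and duplicate those touching copied nodes
    for (i, j), w in E.items():
        ni, nj = mapping[i], mapping[j]
        targets_i = [ni] + ([ni + 1] if V[i][1] == x_label else [])
        targets_j = [nj] + ([nj + 1] if V[j][1] == x_label else [])
        for a in targets_i:
            for b in targets_j:
                out_E[(a, b)] = w
    return out_V, out_E

V = [(0, "u"), (0, "A"), (0, "y")]          # input, hidden A0, output
E = {(0, 1): 1.0, (1, 2): 1.0}              # u -> A0 -> y
V, E = div_p(V, E, "A", "G")                # duplicate A0, label the copy G
V, E = con(V, E, "u", "G", -0.58)           # connect inputs to the new G node
print(V, E)
```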
These instructions provide DSE with interesting capabilities motivated by Cellular Encoding [5] and HyperNEAT [16]. As in Cellular Encoding, nodes can divide in a parallel or sequential manner, and subroutines can be called. These features may encourage modularity and hierarchy. Similar to HyperNEAT, complex geometric patterns can be evolved with an expression tree (E), a genetic program that takes arguments derived from the index values of nodes, which correspond to geometric coordinates in HyperNEAT. Expression tree outputs determine connection weights via the ConE and CutE instructions. The Con and Cut operations can also create regular patterns by selecting groups of nodes, or can adjust individual connections. The ability to create regular and irregular changes, including changes to single weights, can improve the performance of an encoding [3]. The substitution operation can also create regular patterns by selecting groups of nodes and assigning transfer functions to them.

An initial population of genotypes consists of either empty or randomly-generated genotypes, the latter of which have instructions with arguments that are undetermined (explained in Section 2.4).

2.3 Development

Development starts from an initial network G_0 with an optional hidden node called A0 that is fully connected (weights = 1) to inputs and outputs. The final network G_τ is developed by executing the Body and Tail parts of the genomic program, i.e. G_τ = Γ(G_0). The network G_τ is guaranteed to be functionally valid due to the Tail and the covering operator, which automatically inserts Term instructions for any non-terminal symbol in V.

Figure 1 provides a simple example of network development. Multiplication nodes (marked with *) without inputs produce an output of 1.0. If the output is thresholded at 0.5, the network solves the logical equals function.

An important feature of DSE is the reuse of code via routines. For example, when the instruction Call R X is executed, if a subroutine with an identifier matching R exists in the list of subroutines, development replaces X with a new subnetwork. Specifically, the new subnetwork takes on the inputs and outputs of X, and X becomes the hidden node in the newly developed subnetwork.

Figure 2 provides an example of this feature, and also illustrates how DSE can encode networks with regularity, modularity, and scalability. This network evolved to solve the n-bit parity problem, which identifies whether a given input vector contains an even number of ones. The network is composed of n-input versions of two logical functions: NAND (n), which returns 1 for 0 inputs and otherwise returns 1 iff all inputs are 0; and OR (|), which returns 0 for 0 inputs and otherwise returns 1 if at least one input is 1. The solution is partially scalable in that a single genotype encodes networks that solve several different problem sizes: the genotype produces valid solutions for 2, 3, ..., 9 inputs, but it fails for higher numbers. A subroutine (labeled r214 in Figure 2), which is a genotypic module, produces a five-node phenotypic module that computes a 2-input XOR function. The routine is repeatedly called to generate a regular pattern of three of these subnetwork building blocks. This genome has thus discovered the key regularity of the problem, which would likely enhance its evolvability were it further evolved to solve problems with more than 9 inputs, since doing so would only require appropriately calling the r214 subroutine.

2.4 Genetic operators

The four operators that modify the genotype are mutation, crossover, cleaning and covering. We describe them in general terms, because they are not essential to the encoding and different implementations could be chosen.

Mutations are performed with a fixed probability (here 0.8) during reproduction. They start with the main routine and recurse deeper into the tree, and can insert a new instruction, delete an instruction, mutate an instruction's argument, or duplicate an instruction while mutating an argument. A mutated argument is changed to undetermined and is determined again during the next development of the network. Undetermined arguments are resolved during network development by selecting a randomly-chosen set of nodes already in the network (otherwise, arguments would not match nodes in the network and instructions would be mostly ineffective). This pertains to all node-selecting arguments (denoted as X and Y); expression trees are mutated as is typically done in genetic programming [15].

The crossover operator transfers two randomly selected instructions from a donor's routine to a recipient's routine, much as in Linear Genetic Programming [15]. The cleaning operator periodically (here, in 5% of reproductions) removes genotypic instructions that have no effect. This operator improves genotype readability and can impact search by removing neutral variation, which improved performance on some problems in preliminary experiments, but further experiments are necessary to determine its impact.
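A sketch of the mutation step just described, under the assumption that a routine's Body is a Python list of instruction tuples; undetermined arguments are represented here by None and would be re-resolved during the next development of the network.

```python
import random

UNDETERMINED = None   # resolved to a randomly chosen existing node during development

def mutate_body(body, instruction_factory, p_mutate=0.8):
    """Apply one of the four edit types to a routine's Body with probability p_mutate."""
    if random.random() > p_mutate or not body:
        return body
    body = list(body)
    choice = random.choice(["insert", "delete", "mutate_arg", "duplicate"])
    pos = random.randrange(len(body))
    if choice == "insert":
        body.insert(pos, instruction_factory())       # new instruction, args undetermined
    elif choice == "delete":
        del body[pos]
    else:
        name, *args = body[pos]
        if args:
            args[random.randrange(len(args))] = UNDETERMINED
        new_instr = (name, *args)
        if choice == "duplicate":
            body.insert(pos, new_instr)                # keep original, add mutated copy
        else:
            body[pos] = new_instr
    return body

example = [("DivP", "A0", "G"), ("Con", "u", "y0", -0.58), ("Cut", "u", "G")]
print(mutate_body(example, lambda: ("Subst", UNDETERMINED, UNDETERMINED)))
```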

[Figure 1 panels: the initial network, the networks after each Body instruction, and the final network after the Tail. Genome shown in the figure — Body: DivP A0 G; Con u y0 w:-0.58; Cut u G. Tail: Term A *; Term y +; Term G *.]

Figure 1: Example of network development. The three Body instructions in the genome (left) are executed before the Tail, which assigns valid transfer functions. The final network performs the logical equivalence (equals) operation. Input nodes have irregular ovals and the output is the lowest node.

[Figure 2 genome — main routine Body: Call r214 A; ConE n | 1 (+ (= j j) (= i i)); Call r214 |1 (repeated); DivP |0 R. Tail: Term y n; Term D n; Term R n. Routine r214 — Body: DivS u1 B; CutE u A (i); DivS A D; ConE u B 1 (i). Tail: Term A |; Term y n; Term B |.]

Figure 2: A modular, partially-scalable solution to the n-bit parity problem. The visualized network is developed for 3 inputs (irregular ovals), although the genome produces solutions for 2, 3, ..., 9 inputs. The output is the rightmost node.

Unlike the previously-described operators, covering is not applied during reproduction, but at the end of development. For each non-terminal symbol X remaining in V at the end of Tail, the operator appends a new Term X Y instruction. Covering may be applied for each routine or for the main routine only. Either way it guarantees that all the nodes in the final network are assigned valid transfer functions.

3. EXPERIMENTS

In all experiments the population size and number of generations is 200, and networks are feed-forward. Experiments are implemented identically in DSE and HyperNEAT, with default parameter settings. Full details for HyperNEAT are in Stanley et al. [16]. Briefly, HyperNEAT evolves genomic networks that encode the weights of neural network phenotypes as a function of the geometric location of the weights.

3.1 Evolving scale-free networks

Networks are scale-free when the distribution of the number of connections per node (its degree) follows a power law, such that the proportion of nodes having degree k is

P(k) ∝ k^(−γ),   (2)

where γ is an exponent parameter, typically in the range [2, 3] (set to 2 in our experiments). To test whether DSE and HyperNEAT can produce scalable, scale-free networks, we rewarded genotypes that produce scale-free networks at several different target sizes.

The scale-free property is measured with the Bhattacharyya coefficient as the overlap between the distributions P (the power law distribution) and q (the degree distribution of the evolved network):

B(P, q) = Σ_k √(P(k) q(k)).   (3)
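A sketch of how the scale-free score of Eqs. (2)-(3) could be computed for an evolved network. How the target power law is normalized and which degrees it ranges over are our assumptions; the paper does not specify these details.

```python
from collections import Counter
import math

def degree_distribution(degrees):
    """Empirical distribution q(k) over node degrees."""
    counts = Counter(degrees)
    n = len(degrees)
    return {k: c / n for k, c in counts.items()}

def power_law(ks, gamma=2.0):
    """Target distribution P(k) proportional to k^-gamma, normalized over the given degrees."""
    weights = {k: k ** -gamma for k in ks if k > 0}
    z = sum(weights.values())
    return {k: w / z for k, w in weights.items()}

def bhattacharyya(P, q):
    """B(P, q) = sum_k sqrt(P(k) * q(k)); 1.0 means identical distributions."""
    ks = set(P) | set(q)
    return sum(math.sqrt(P.get(k, 0.0) * q.get(k, 0.0)) for k in ks)

# Example: a star network of 9 nodes (one hub of degree 8, eight leaves of degree 1)
degrees = [8] + [1] * 8
q = degree_distribution(degrees)
P = power_law(range(1, 9), gamma=2.0)
print(round(bhattacharyya(P, q), 3))
```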

A single HyperNEAT genotype generates networks for 7 different substrate sizes, (i + 3)^2 for i = 0, ..., 6, and for each of these resolutions the scale-free property is measured. The fitness measure for each network size is

f_i = 100 + B(P, q) − (n_c − 1),   (4)

where n_c is the number of separate network components. The second term penalizes networks with disconnected pieces, and the 100 ensures all fitnesses are positive.

DSE is similarly challenged with generating genotypes that encode networks that maximize the scale-free property and have approximate sizes of n_i = (i + 3)^2, i = n_u = 0, ..., 6. The genome produces different sized networks based on the number of inputs n_u. To encourage correct network sizes, an additional penalty term is added to the DSE fitness measure for each network size:

f_i = 1 − B(P, q) + (n_c − 1) + |1 − n/n_i|,   (5)

where n is the number of nodes in the developed network. Contrary to HyperNEAT, the DSE fitness function is minimized. In both cases the overall fitness is the following weighted average over all fitness cases:

F = Σ_i f_i √n_i / Σ_i √n_i.   (6)

This equation weights the results for bigger networks as more important than smaller networks, in square-root proportion. Because the two fitness functions differ, we compare only the B coefficients of evolved networks.
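The per-size fitness terms and the overall weighted average of Eqs. (4)-(6) translate directly into code; the helper names below are ours.

```python
import math

def hyperneat_fitness(B, n_components):
    # Eq. (4): maximized; disconnected components are penalized
    return 100 + B - (n_components - 1)

def dse_fitness(B, n_components, n, n_target):
    # Eq. (5): minimized; also penalizes deviation from the target size
    return 1 - B + (n_components - 1) + abs(1 - n / n_target)

def overall_fitness(per_size_fitness, target_sizes):
    # Eq. (6): square-root-weighted average over all fitness cases
    weights = [math.sqrt(n) for n in target_sizes]
    return sum(f * w for f, w in zip(per_size_fitness, weights)) / sum(weights)

targets = [(i + 3) ** 2 for i in range(7)]          # 9, 16, ..., 81
fs = [dse_fitness(0.95, 1, n, n) for n in targets]  # perfect sizes, B = 0.95
print(overall_fitness(fs, targets))                 # ~0.05
```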

The results from 30 runs of DSE and HyperNEAT indicate that both encodings produce networks at different network sizes that are scale-free (Figure 3). The DSE results are slightly, but significantly, better (p < 0.001, permutation test), although the networks from both encodings had B coefficients near the maximum value of 1.0. The plotted B coefficients are weighted in the same way that fitness is weighted in Eq. (6). DSE was also nearly perfect at producing networks of the target size (Table 1). HyperNEAT networks had fixed, predefined sizes, and thus cannot be compared on this dimension.

Table 1: Target and evolved sizes of networks evolved with DSE in both the scale-free and modular network experiments.

Target size    9     16     25     36     49     64     81
Scale-free
  Mean size   8.7   16.5   25.4   36.3   49.0   64.1   76.3
  Std. dev.   0.4    0.7    0.7    0.8    1.4    1.6    3.6
Modular
  Mean size   9.2   18.2   28.0   39.3   52.5   65.3   76.5
  Std. dev.   0.4    1.8    2.0    2.9    2.6    2.8    2.8

Figure 3: Both DSE and HyperNEAT can evolve genomes that produce networks of different size that are scale-free. Plotted are weighted averages ± SD of the best B coefficients from 30 runs (DSE mean weighted B = .9474 ± .0038; HyperNEAT = .9235 ± .0097).

Visualizing the networks reveals interesting differences in the types of networks each encoding produces (Figures 4 and 5; produced in Graphviz with the radial layout). While both encodings produce regular networks, the DSE networks appear more modular, even though modularity was not rewarded in these experiments. By quantifying modularity with the Q metric [14] (see the next section), we validate that DSE networks from these experiments are significantly more modular (DSE Q = 0.48 ± 0.08, HyperNEAT Q = 0.04 ± 0.05, p < 0.001, permutation test). The DSE networks also appear hierarchical in that nodes typically have only one path through multiple other nodes to hub nodes, although quantifying hierarchy is beyond the scope of this paper.

3.2 Evolving modular networks

Testing the ability to evolve modular networks is formulated identically to the scale-free test, except that the quantity maximized is the modularity measure Q from Newman and Girvan [14, 13]. Let e_ij denote the proportion of connections between the i-th and j-th group of nodes (a module), and let a_i = Σ_j e_ij; then

Q = Σ_i (e_ii − a_i^2)   (7)

is the proportion of all connections within modules minus the expected value of the same quantity if the connections were distributed uniformly at random. In other words, Q indicates how much the modularity of the given network differs from that of a network with all random connections. For a fully random network Q is about 0, while values above 0.3 indicate some modularity in the network; the highest Q observed at the time of publication was 0.807 [13]. Computation of the Q coefficient requires the network to be partitioned into modules in advance, otherwise it is not possible to calculate the values e_ij. Ideally Q would be calculated for all possible partitions of the network and the partition maximizing Q would be selected. In practice this exhaustive approach is computationally prohibitive, so a partition is selected with a greedy hill-climbing approach that maximizes Q [13]. The algorithm starts by considering each node a separate module and iteratively joins the modules for which the increase in Q is the highest. The complexity of the algorithm is O((m + n)n), where n and m respectively denote the number of nodes and connections.
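A sketch of Eq. (7) for a given partition, together with a naive greedy agglomeration in the spirit of Newman's algorithm [13]; the graph representation, the merging loop, and its (worse) complexity are our simplifications.

```python
from collections import defaultdict

def modularity(edges, partition):
    """Q from Eq. (7): fraction of edges inside modules minus the expectation
    for a random network with the same module degree totals."""
    m = len(edges)
    inside = defaultdict(int)   # edges with both ends in module i
    degree = defaultdict(int)   # sum of node degrees within module i
    for u, v in edges:
        degree[partition[u]] += 1
        degree[partition[v]] += 1
        if partition[u] == partition[v]:
            inside[partition[u]] += 1
    return sum(inside[i] / m - (degree[i] / (2 * m)) ** 2 for i in degree)

def greedy_partition(nodes, edges):
    """Start with singleton modules and repeatedly merge the pair of modules
    that increases Q the most, stopping when no merge improves Q."""
    part = {v: v for v in nodes}
    while True:
        best, best_q = None, modularity(edges, part)
        for a in set(part.values()):
            for b in set(part.values()):
                if a < b:
                    trial = {v: (a if c == b else c) for v, c in part.items()}
                    q = modularity(edges, trial)
                    if q > best_q:
                        best, best_q = trial, q
        if best is None:
            return part
        part = best

# Two triangles joined by a single edge form two obvious modules (Q ~ 0.36)
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
part = greedy_partition(range(6), edges)
print(round(modularity(edges, part), 3), part)
```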

DSE significantly outperforms HyperNEAT at evolving modular networks of different sizes (Figure 6, p < 0.001, permutation test). The Q values plotted are weighted identically to the B values in the previous section, averaged over 20 runs. Interestingly, DSE networks are initially quite modular, and reach near-maximum modularity within a few generations, whereas HyperNEAT required many generations to approach its maximum, revealing the default tendencies of these encodings. DSE also evolved to produce networks closely approximating the required size (Table 1). Interestingly, additional experiments where HyperNEAT genomes were tested on only one network size reveal that HyperNEAT can produce highly-modular networks (data not shown), suggesting that the need to simultaneously produce modularity on networks of different sizes is a unique challenge for HyperNEAT, but not for DSE.

Figures 7 and 8 (Graphviz with the spring-model layout) show the best evolved networks from both methods for increasing numbers of inputs (DSE) or increasing network size (HyperNEAT). The evolved networks display topological patterns that scale up in interesting ways. The DSE networks resemble starfish that have one more arm than the number of inputs. The HyperNEAT networks consist of clusters that increase in number along with the network size, but are too densely interconnected to yield a high average Q score.

To further probe differences between the encodings on properties not directly selected for, we measured the scale-free property of the networks from this experiment. Although they were not selected for it, the DSE networks were significantly more scale-free (DSE B = 0.67 ± 0.07, HyperNEAT B = 0.50 ± 0.18, p < 0.001, permutation test).

3.3 A visual pattern recognition problem

It is also important to test whether DSE is competitive with cutting-edge generative encodings at solving problems. We chose a visual pattern recognition problem that HyperNEAT was recently tested on [4], which itself was motivated by a problem from the paper that introduced HyperNEAT [16]. In this problem, 3 shapes are visible and the network has to identify the location of a target shape while ignoring the other two shapes. Each shape fits within a 3 × 3 grid of pixels (Figure 9) and all 3 shapes are distributed on a 9 × 9 pixel canvas. The S shape is most unique and thus easiest to distinguish. The networks are fixed-size, with a 9 × 9 input sheet of nodes and a corresponding 9 × 9 output sheet.

Figure 4: The networks produced by the DSE genome that evolved the highest average B coefficient. n_u = number of inputs, n = network size, and B measures the scale-free property. Panel values: n_u = 0, ..., 6 with n = 9, 17, 25, 36, 49, 65, 76 and B = 0.92, 0.92, 0.95, 0.96, 0.96, 0.96, 0.97.

Figure 5: Network phenotypes for the HyperNEAT genotype that evolved the highest average B coefficient. Panel values: n = 9, 16, 25, 36, 49, 64, 81 with B = 0.85, 0.89, 0.93, 0.95, 0.96, 0.95, 0.95.

Figure 6: Resultant modularity values (weighted mean Q ± SD) when modularity is selected for; DSE mean weighted Q = .6772 ± .0062, HyperNEAT = .4248 ± .0362.

For a single evaluation, shapes are placed at random locations without overlap. A network evaluation is considered a success if the output node with the highest activation value is in the same location as the center of the target shape; otherwise an error is scored. Averaged over 20 evaluations, the fitness for DSE genotypes is the error rate (to be minimized) and for HyperNEAT it is the success rate (to be maximized). There are 3 variants of the problem, with each shape respectively being the target and the other two being distractions.
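A sketch of how a single evaluation could be scored under the rules just described; the placement logic and data layout are our own simplification of the setup.

```python
import random

def place_shapes(canvas_size=9, shape_size=3, n_shapes=3):
    """Pick non-overlapping top-left corners for the three 3x3 shapes."""
    corners = []
    while len(corners) < n_shapes:
        r, c = (random.randrange(canvas_size - shape_size + 1) for _ in range(2))
        if all(abs(r - r2) >= shape_size or abs(c - c2) >= shape_size
               for r2, c2 in corners):
            corners.append((r, c))
    return corners

def success(output_sheet, target_corner, shape_size=3):
    """An evaluation succeeds iff the most active output node sits at the
    centre of the target shape."""
    best = max(((r, c) for r in range(len(output_sheet))
                for c in range(len(output_sheet[0]))),
               key=lambda rc: output_sheet[rc[0]][rc[1]])
    centre = (target_corner[0] + shape_size // 2, target_corner[1] + shape_size // 2)
    return best == centre

corners = place_shapes()
target = corners[0]
outputs = [[0.0] * 9 for _ in range(9)]
outputs[target[0] + 1][target[1] + 1] = 1.0     # pretend the network found the centre
print(success(outputs, target))                  # -> True
```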

Because this task is two-dimensional, DSE uses 2D variants of the ConE and CutE instructions, which can create connectivity patterns via an expression tree. Since these are the only two instructions in this experiment, networks remain a fixed size. Nodes have two indexes, as in a matrix.

A key element of both encodings is that evolved functions create geometric patterns that specify network connectivity. In DSE, such patterns are created by expression trees (Section 2.2). In both encodings, the functions that produce geometric patterns have 8 arguments: the x and y values of the input and output nodes (normalized to [0, 1)), Δx and Δy (the difference between input and output nodes in the x and y dimensions), and the straight-line distance and polar angle between the input and output nodes. For DSE, the node coordinates come from normalizing their indexes.

One of the main differences between HyperNEAT and expression trees is the set of primitive (building block) operations allowed in each system. In these experiments, HyperNEAT has a default set of operations following previous studies [16, 3], with sigmoid, sine, Gaussian, and bounded linear functions. DSE has +, −, *, abs(), <, Gauss(), and constants randomly generated from a normal distribution (mean 0, SD 1). For both HyperNEAT and DSE, the output of an evolved pattern-generating function specifies a network weight (set to 0 if between -0.2 and 0.2). Another important difference is that in DSE the weights produced by subsequent ConE2D instructions accumulate. CutE2D instructions cut any connections for which the expression value is positive.
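The following sketch shows how an evolved geometric function could assign weights between the 9 × 9 input and output sheets under the conventions just described (eight geometric arguments, zeroing outputs in [-0.2, 0.2], and accumulation across repeated ConE2D-style instructions). The particular expression used here is an arbitrary stand-in, not an evolved one.

```python
import math

GRID = 9

def geometric_args(ix, iy, ox, oy):
    """The eight arguments available to a pattern-generating function."""
    x1, y1, x2, y2 = ix / GRID, iy / GRID, ox / GRID, oy / GRID   # normalized to [0, 1)
    dx, dy = x2 - x1, y2 - y1
    dist = math.hypot(dx, dy)
    angle = math.atan2(dy, dx)
    return x1, y1, x2, y2, dx, dy, dist, angle

def cone2d(weights, expression):
    """Accumulate the expression's output into every input-output weight,
    zeroing contributions whose magnitude is at most 0.2."""
    for ix in range(GRID):
        for iy in range(GRID):
            for ox in range(GRID):
                for oy in range(GRID):
                    w = expression(*geometric_args(ix, iy, ox, oy))
                    if abs(w) > 0.2:
                        weights[(ix, iy, ox, oy)] = weights.get((ix, iy, ox, oy), 0.0) + w
    return weights

# Arbitrary example expression: strong positive weights for nearby input-output
# pairs, negative weights for distant ones, zero in between
weights = cone2d({}, lambda x1, y1, x2, y2, dx, dy, dist, angle: 1.0 - 2.0 * dist)
print(len(weights), "non-zero connections")
```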

Figure 7: The DSE genotype that produced the highest Q coefficient and the 7 networks it develops for different numbers of inputs n_u = 0, ..., 6. Panel values: n = 9, 19, 29, 39, 49, 72, 83 with Q = 0.30, 0.55, 0.65, 0.69, 0.72, 0.77, 0.79.

Figure 8: Network phenotypes for the HyperNEAT genome with the highest Q value. Panel values: n = 9, 16, 25, 36, 49, 64, 81 with Q = 0.35, 0.56, 0.59, 0.57, 0.57, 0.54, 0.54.

We perform 30 evolutionary runs for each of the three task variants. DSE substantially and significantly outperforms HyperNEAT on all three problem variants (Table 2, p < 0.01). These scores underrepresent DSE's advantage, because plots over time (not shown) suggest that only DSE would improve with additional generations on the X and O problems, and because DSE achieved near-perfect fitness on the S problem much earlier than HyperNEAT. While only from a single problem, these results suggest that DSE can compete with cutting-edge generative encodings, such as HyperNEAT, especially given that this problem type was chosen to highlight HyperNEAT's merits.

Figure 9: Shape X (cross), O (circle) and S (square).

A benefit of encodings like DSE that have separate instructions or rules is that individual instructions can serve as genetic modules that create wiring motifs that may be beneficial on related problems. To test whether instructions from one task aid in solving another task, we implemented an island model, where organisms on three islands are rewarded for their performance on the three different task variants. Migrants travel between islands, where they can be crossed with native individuals. 5% of individuals migrate between each pair of islands each generation, crossover occurs 40% of the time, and 60% of genomes are mutated. Exchanging genetic information between organisms rewarded for different tasks should only be beneficial if instructions contain partial solutions common to multiple tasks.

DSE performance significantly improves with migration and crossover on the X and O problems (DSE 11, Table 2, p < 0.01, permutation test). There was no meaningful room for improvement on the S variant. To ensure that the performance gains are due to exchanging beneficial instructions, we test DSE with migration but not crossover (DSE 10) and with crossover but not migration (DSE 01). Neither of these controls significantly improves performance over regular DSE (Table 2, p > 0.05, permutation test), demonstrating that DSE can exchange functional building blocks between individuals working on similar problems to enhance performance.

Table 2: Error rates (percent wrong) on the pattern recognition problem. Averages of the final best fitness values are shown, tested on 600 shape placements to increase reporting accuracy. The letters X, O, and S indicate which pattern was the target to be recognized.

            X              O              S
H-NEAT   67.1 ± 5.0     63.6 ± 4.2     1.3 ± 5.5
DSE      29.0 ± 22.8    20.4 ± 29.1    0.1 ± 0.3
DSE 10   31.8 ± 20.0    12.1 ± 21.9    0.0 ± 0.1
DSE 01   32.6 ± 21.7    22.9 ± 29.8    0.1 ± 0.3
DSE 11   14.2 ± 15.8     6.9 ± 15.3    0.0 ± 0.1

This capability should be increasingly beneficial as the number of objectives being solved simultaneously rises. It is an open question whether other leading generative encodings, such as HyperNEAT, can similarly exchange partial solutions to different problems between individuals.

To illustrate how expression trees can solve problems by creating connectivity patterns, we describe one of the shortest DSE solutions for the S problem variant, which was the easiest. This solution contained only one instruction that had an effect: ConE2D u y (abs (< u2 0.22)). It creates connections between all input-output node pairs with a straight-line distance between them of less than 0.22, which connects each output node to its 3 × 3-neighborhood counterpart in the input layer. This connectivity pattern ensures that the highest activation in the output layer is at the node corresponding to the center of the square shape.
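The 3 × 3-neighborhood claim can be checked numerically: with indexes normalized by 9 the grid spacing is 1/9 ≈ 0.111, so a straight-line distance threshold of 0.22 admits exactly the offsets (0,0), (±1,0), (0,±1), and (±1,±1). A quick sketch (our own check, not the authors' code):

```python
import math

SPACING = 1 / 9   # node indexes are normalized by the 9x9 grid size

offsets = [(dr, dc) for dr in range(-2, 3) for dc in range(-2, 3)
           if math.hypot(dr * SPACING, dc * SPACING) < 0.22]
print(sorted(offsets))   # the 9 offsets of a 3x3 neighborhood: (-1,-1) ... (1,1)
```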
4. DISCUSSION AND CONCLUSION

This paper demonstrates that a novel encoding for evolving networks, DSE, can produce networks that are regular, modular, scalable, and scale-free. The networks also appear hierarchical, although additional work is needed to quantify this property. As such, DSE produces networks with many properties thought to enhance evolvability and performance [12, 11, 1]. DSE is interesting because it combines concepts from two different camps of generative encodings. Like Cellular Encoding and L-Systems, DSE has a developmental process that iteratively rewrites symbols according to instructions. Like HyperNEAT, it can also specify connectivity as a function of evolved geometric patterns. Our results suggest that this hybridization can combine the best attributes of both camps of generative encodings. For example, DSE can create scalable and modular networks (features typically associated with iterative-rewrite encodings [10, 8, 5]) with regular network patterns (HyperNEAT's forte [16, 3]). It is interesting, for instance, that DSE produced more modular and scale-free networks than HyperNEAT when these properties were not explicitly rewarded. Future work is required to better understand the relative strengths and weaknesses of both these camps of encodings and their hybridizations. DSE thus raises interesting questions about what types of networks different encodings produce. It also appears to be a promising encoding in its own right, worthy of additional investigation.

Acknowledgments: NSF Postdoctoral Research Fellowship in Biology to Jeff Clune (award number DBI-1003220).

5. REFERENCES

[1] A. Barabási. Scale-free networks: A decade and beyond. Science, 325:412, 2009.
[2] J. Clune, B. E. Beckmann, P. K. McKinley, and C. Ofria. Investigating whether HyperNEAT produces modular neural networks. In Proceedings of the Genetic and Evolutionary Computation Conference, pages 635-642. ACM, 2010.
[3] J. Clune, K. Stanley, R. Pennock, and C. Ofria. On the performance of indirect encoding across the continuum of regularity. IEEE Transactions on Evolutionary Computation, to appear, 2011.
[4] O. J. Coleman. Evolving neural networks for visual processing. B.S. thesis, U. New South Wales, 2010.
[5] F. Gruau. Neural Network Synthesis using Cellular Encoding and the Genetic Algorithm. PhD thesis, Ecole Normale Supérieure de Lyon, France, 1994.
[6] S. Harding and W. Banzhaf. Artificial development. Organic Computing, pages 201-219, 2008.
[7] S. Harding, J. Miller, and W. Banzhaf. Self modifying cartesian genetic programming: Parity. In IEEE Congress on Evolutionary Computation (CEC'09), pages 285-292. IEEE, 2009.
[8] G. Hornby, H. Lipson, and J. Pollack. Generative representations for the automated design of modular physical robots. IEEE Transactions on Robotics and Automation, 19(4):703-719, 2003.
[9] G. Hornby and J. Pollack. Creating high-level components with a generative representation for body-brain evolution. Artificial Life, 8(3), 2002.
[10] G. S. Hornby. Measuring, enabling and comparing modularity, regularity and hierarchy in evolutionary design. In Proceedings of the Genetic and Evolutionary Computation Conference. ACM, 2005.
[11] N. Kashtan and U. Alon. Spontaneous evolution of modularity and network motifs. Proc. Natl. Acad. Sci. U.S.A., 102(39):13773, 2005.
[12] H. Lipson. Principles of modularity, regularity, and hierarchy for scalable systems. Journal of Biological Physics and Chemistry, 7(4):125, 2007.
[13] M. E. J. Newman. Fast algorithm for detecting community structure in networks. Physical Review E, 69(6):066133, 2004.
[14] M. E. J. Newman and M. Girvan. Finding and evaluating community structure in networks. Physical Review E, 69(2):026113, 2004.
[15] R. Poli, W. Langdon, and N. McPhee. A Field Guide to Genetic Programming. Lulu Press, 2008.
[16] K. Stanley, D. D'Ambrosio, and J. Gauci. A hypercube-based encoding for evolving large-scale neural networks. Artificial Life, 15(2):185-212, 2009.
[17] K. Stanley and R. Miikkulainen. A taxonomy for artificial embryogeny. Artificial Life, 9(2), 2003.
[18] G. Striedter. Principles of Brain Evolution. Sinauer Associates, Sunderland, MA, 2005.
[19] M. Suchorzewski. Evolving scalable and modular adaptive networks with Developmental Symbolic Encoding. Evolutionary Intelligence, to appear, 2011.
[20] P. Verbancsics and K. Stanley. Constraining connectivity to encourage modularity in HyperNEAT. U. Central Florida Tech Report, 2010.
