A Novel Generative Encoding for Evolving Modular, Regular and Scalable Networks
Marcin Suchorzewski, Artificial Intelligence Laboratory, West Pomeranian University of Technology, [email protected]
Jeff Clune, Creative Machines Laboratory, Cornell University, [email protected]
ABSTRACT

In this paper we introduce the Developmental Symbolic Encoding (DSE), a new generative encoding for evolving networks (e.g. neural or boolean). DSE combines elements of two powerful generative encodings, Cellular Encoding and HyperNEAT, in order to evolve networks that are modular, regular, scale-free, and scalable. Generating networks with these properties is important because they can enhance performance and evolvability. We test DSE's ability to generate scale-free and modular networks by explicitly rewarding these properties and seeing whether evolution can produce networks that possess them. We compare the networks DSE evolves to those of HyperNEAT. The results show that both encodings can produce scale-free networks, although DSE performs slightly, but significantly, better on this objective. DSE networks are far more modular than HyperNEAT networks. Both encodings produce regular networks. We further demonstrate that individual DSE genomes during development can scale up a network pattern to accommodate different numbers of inputs. We also compare DSE to HyperNEAT on a pattern recognition problem. DSE significantly outperforms HyperNEAT, suggesting that its potential lies not just in the properties of the networks it produces, but also in its ability to compete with leading encodings at solving challenging problems. These preliminary results imply that DSE is an interesting new encoding worthy of additional study. The results also raise questions about which network properties are more likely to be produced by different types of generative encodings.

Categories and Subject Descriptors: I.2.6 [Artificial Intelligence]: Learning

General Terms: Algorithms

Keywords: Generative and Developmental Representations, Neuroevolution, Indirect Encoding, Networks, Modularity, Regularity, Scale-free, Scalability

GECCO'11, July 12-16, 2011, Dublin, Ireland. Copyright 2011 ACM 978-1-4503-0557-0/11/07.

1. INTRODUCTION

Networks designed by humans and evolution have certain properties that are thought to enhance both their performance and adaptability, such as modularity, regularity, hierarchy, scalability, and being scale-free [12, 11, 1]. In this paper we introduce a new generative representation called the Developmental Symbolic Encoding that evolves networks that possess these properties. We first review the meaning of these terms before discussing them with respect to previous evolutionary algorithms.

Modularity is the encapsulation of function within mostly independently operating units, where the internal workings of the module tend to only affect the rest of the structure through the module's inputs and outputs [12, 11]. For example, a car wheel is a module because it is a separate functional unit and changing its design is unlikely to affect other car systems (e.g. the muffler). This functional independence enables the wheel design to be optimized independently from most other car modules. In networks, modules are clusters of densely connected nodes, where connectivity is high within the cluster and low to nodes outside the cluster [11, 12].

Regularity describes the compressibility of the network description, and often involves symmetries and the repetition of modules [12]. Regularity and modularity are distinct concepts. A unicycle has one wheel module (modularity without regularity), whereas a bicycle's repetition of the wheel module is a form of regularity. Hierarchy is the recursive composition of lower-level modules [12].

Scalability in evolved networks refers to the ability to produce effective solutions at different network sizes. The concept applies both to the idea of being able to evolve large, but fixed-size networks [17], and to a single genome that can produce networks of different sizes as needed [6, 16, 5]. The latter capability is important for transferring knowledge to related problems of different complexity, and is a main motivation for DSE. Scalability can be enhanced by other properties, such as modularity, regularity and hierarchy [12].

Scale-free networks have a few nodes with many connections, and many nodes with few connections, making them efficient at transmitting data and robust to random node failure [1]. Many different types of networks are scale-free, from computer science (e.g. the Internet) and sociology (e.g. social networks) to biology (e.g. protein and ecological networks) [1]. Animal brains, including the human brain, are scale-free, and it is thought that this property is one reason they are so capable [18]. While the scale-free property may not help performance on all problems, it is evidently beneficial on many problems, and it is therefore beneficial if an encoding can produce it when it is useful.

Previous evolutionary algorithms have evolved networks that have some of the aforementioned network properties. Most such algorithms are generative encodings [17], wherein information in a genotype can influence multiple parts of a phenotype, instead of direct encodings, wherein genotypic information specifies separate aspects of phenotypes. Generative encodings tend to produce more regular phenotypes that outperform direct encodings [3, 8, 10] and can produce modular and hierarchical structures [10]. Individual generative genomes can also produce networks of various sizes, as appropriate [16, 9, 7, 5]. We are not aware of research that has investigated whether evolved networks are scale-free.

Two generative encodings that DSE is inspired by have become well-known in part because of their ability to produce important network properties. Cellular Encoding (CE) iteratively applies instructions to manipulate network nodes and links. The encoding allows cells to call functions, which involve an entire sequence of instructions (encoded in a grammar tree), facilitating the production of regular, modular, and hierarchical networks [5]. Intriguingly, because CE allows a repeated symbol to unpack into the same subnetwork, the exhibited regularity, while high, tends to be a formulaic repetition of the same subnetwork, without complex variation.

HyperNEAT generates patterns in neural networks via an abstraction of biological development [16]. Just as in natural developing organisms, phenotypic attributes (in this case, edge weights in a network) are a function of evolved geometric coordinate frames. HyperNEAT departs from nature, however, by not having an iterative growth process. A single HyperNEAT genome can encode networks of different sizes; one was scaled up from thousands of connections to millions and still performed well [16]. The neural wiring patterns HyperNEAT produces are highly functional, complex, and regular, including symmetries and repetition, with and without variation [3]. However, HyperNEAT struggles to produce modular networks [2] unless it is manually injected with a bias towards modular wiring patterns [20].

Intriguingly, CE excels at producing modularity, hierarchy, and formulaic regularity, but fails to generate complex regularities with variation. HyperNEAT, on the other hand, excels at producing complex regularities, but tends to create non-modular networks. This raises the question of whether an encoding that combines the iterative growth of CE with pattern-generating capabilities similar to HyperNEAT can create modular, hierarchical networks with complex regularities. DSE was designed with this motivation.

We will describe DSE (see [19] for additional details) and then present preliminary investigations into the properties of the networks it evolves. The properties under focus are the modular, scalable, and scale-free nature of DSE networks. We explicitly reward networks with these properties to test if DSE can produce them, and compare the results to HyperNEAT. Unfortunately, we were unable to obtain a functioning distribution of CE, making a comparison with that encoding impossible. By explicitly selecting for a desired property, we can study an encoding's ability to produce it. While we could have evolved on a problem where it is supposed that scale-free, modular, scalable networks would be beneficial, it is often difficult to ensure that these properties are beneficial on a specific problem. To perform a more traditional test of the capability of an encoding, we also compare DSE to HyperNEAT on a visual pattern recognition task. On all three challenges DSE outperforms HyperNEAT, suggesting that it is an encoding that deserves additional investigation.

2. DESCRIPTION OF DSE

2.1 Phenotype

The phenotype of the individual is the network:

G = (V, E) = ([v_1 ... v_N], {e_ij}),   (1)

where V is a vector of nodes and E is a set of connections. Nodes and connections can have attribute values, which play a role either in development or in computation. Weighted connections have a weight attribute and plastic connections can have a number of attributes (parameters). The representation of G is a bit different from conventional notation for directed graphs, where nodes and connections are contained in sets and the attributes are eventually imposed using functions. This departure follows from the assumed sequential model of computation, in which nodes are evaluated in an ordered manner. It is therefore natural to have them explicitly ordered in a vector.

The vector V is composed of 3 contiguous subvectors: input V_u, hidden V_h and output V_y. Input and output subvectors constitute the network interface and their length depends on the problem. The hidden part of the network is free to vary in size.

DSE network phenotypes are executed similarly to artificial neural networks. The network is initialized before an evaluation by zeroing the values of all nodes and setting the weights of any plastic connections to their initial values. Then, for any input vector u, the network is evaluated as follows:

1: Assign inputs V_u := u
2: For each v_j in [V_h V_y] do
3:   Evaluate node: x_j := f_j({e_ij(x_i) | i = 1, ..., n_i})
4:   For each connection e_ij do
5:     Update e_ij (if applicable)
6: Return V_y,

where x_j denotes the node's value, e_ij(x_i) denotes the operation performed by the connection (usually multiplying the source node value x_i by the weight of the connection), and f_j denotes a transfer function assigned to node j. The transfer function operates on the set of incoming signals, which are usually aggregated by summing. While DSE can handle plastic connections, network plasticity is not explored in this paper and is therefore not described further.

2.2 Genotype

The genotype is a program to grow the network. The program is tree-structured, with nodes being routines. The root of the tree constitutes the main routine and is sufficient to solve many test problems. Each routine tree has the same structure: γ = (R, Body, Tail, {γ}), where R is an identifier, Body is the main list of instructions, Tail is a list of terminating instructions, and {γ} is a set of subroutines. The whole genomic program Γ, i.e. the tree of γ routines, grows the network from its initial state into its final state, i.e. G_τ = Γ(G_0). The main component of the routine is Body, which contains instructions that act on the network to develop it. Each instruction in Body can be seen as a function λ taking the network G_t and returning a new network, i.e. G_{t+1} = λ(G_t). The instructions and their properties are described in the subsequent list. Note that the arguments X and Y following an instruction can select many nodes at once, allowing operations to be carried out on groups of nodes. A central concept of DSE is that nodes are identified by an index and a label. Nodes having the same label make up groups (e.g. inputs), and instructions can act on all nodes from a group at once. Within groups, index values provide a geometry for patterns of nodes to be selected and acted upon.

1. Con X Y C: connect nodes X and Y, with the connection having the same attribute values as the reference connection C.
2. Cut X Y: cut connection(s) between X and Y.
3. ConE X Y C E: connect nodes X and Y that satisfy the expression E, using C as reference connection.
4. CutE X Y E: cut connection(s) between X and Y that satisfy the expression E.
5. DivP X Y: divide node(s) X in parallel, which duplicates the node(s) along with all of its connections, places the new node(s) in V right after the original and assigns label Y to them.
6. DivS X Y: same as DivP, except the division is sequential, which means that instead of duplicating connections, Y takes over the outgoing connections of X and a connection from X to Y is created.
7. Subst X Y: substitute node symbols X with Y.
8. Call R X: call the subroutine R for each node X.
9. Term X Y: like Subst, except Y is necessarily terminal, i.e. it denotes a transfer function.

These instructions provide DSE with interesting capabilities motivated from Cellular Encoding [5] and HyperNEAT [16]. As in Cellular Encoding, nodes can divide in a parallel or sequential manner, and subroutines can be called. These features may encourage modularity and hierarchy. Similar to HyperNEAT, complex geometric patterns can be evolved with an expression tree (E), which is a genetic program that takes arguments derived from index values of nodes, which correspond to geometric coordinates in HyperNEAT. Expression tree outputs determine connection weights via the ConE and CutE instructions. The Con and Cut operations can also create regular patterns by selecting groups of nodes, or can adjust individual connections. The ability to create regular and irregular changes, including changes to single weights, can improve the performance of an encoding [3]. The substitution operation can also create regular patterns by selecting groups of nodes and assigning transfer functions to them.

An initial population of genotypes consists of either empty or randomly-generated genotypes, the latter of which have instructions with arguments that are undetermined (explained in Section 2.4).

2.3 Development

Development starts from an initial network G_0 with an optional hidden node called A0 that is fully connected (weights = 1) to inputs and outputs. The final network G_τ is developed by executing the Body and Tail parts of the genomic program, i.e. G_τ = Γ(G_0). The network G_τ is guaranteed to be functionally valid due to the Tail and the covering operator, which automatically inserts Term instructions for any non-terminal symbol in V.

Figure 1 provides a simple example of network development. Multiplication nodes (marked with *) without inputs produce an output of 1.0. If the output is thresholded at 0.5, the network solves the logical equals function.

An important feature of DSE is the reuse of code via routines. For example, when the instruction Call R X is executed, if a subroutine with an identifier matching R exists in the list of subroutines, development replaces X with a new subnetwork. Specifically, the new subnetwork takes on the inputs and outputs of X, and X becomes the hidden node in the newly developed subnetwork.

Figure 2 provides an example of this feature, and also illustrates how DSE can encode networks with regularity, modularity, and scalability. This network evolved to solve the n-bit parity problem, which identifies whether a given input vector contains an even number of ones. The network is composed of n-input versions of two logical functions: NAND (n), which returns 1 for 0 inputs and otherwise returns 1 iff all inputs are 0; and OR (|), which returns 0 for 0 inputs and otherwise returns 1 if at least one input is 1. The solution is partially scalable in that a single genotype encodes networks that solve several different problem sizes: the genotype produces valid solutions for 2, 3, ..., 9 inputs, but it fails for higher numbers. A subroutine (labeled r214 in Figure 2), which is a genotypic module, produces a five-node phenotypic module that computes a 2-input XOR function. The routine is repeatedly called to generate a regular pattern of three of these subnetwork building blocks. This genome has thus discovered the key regularity of the problem, which would likely enhance its evolvability were it further evolved to solve problems with more than 9 inputs, since doing so would only require appropriately calling the r214 subroutine.

2.4 Genetic operators

The four operators that modify the genotype are mutation, crossover, cleaning and covering. We describe them in general terms, because they are not essential to the encoding and different implementations could be chosen.

Mutations are performed with a fixed probability (here 0.8) during reproduction. They start with the main routine and recurse deeper into the tree, and can insert a new instruction, delete an instruction, mutate an instruction's argument, or duplicate an instruction while mutating an argument. A mutated argument is changed to undetermined and is determined again during the next development of the network. Undetermined arguments are determined during network development to select a randomly-chosen set of nodes already in the network (otherwise, arguments would not match nodes in the network and instructions would be mostly ineffective). This pertains to all node-selecting arguments (denoted as X and Y); expression trees are mutated as is typically done in Genetic Programming [15].

The crossover operator transfers two randomly selected instructions from a donor's routine to a recipient's routine, much as in Linear Genetic Programming [15]. The cleaning operator periodically (here, 5% of reproductions) removes genotypic instructions that have no effect. This operator improves genotype readability and can impact search by removing neutral variation, which improved performance on some problems in preliminary experiments, but further experiments are necessary to determine its impact.
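The evaluation procedure of Section 2.1 can be sketched in code. This is a minimal illustration assuming weighted connections and summation as the aggregation step; the node ids, data layout, and function names are ours, not taken from the DSE implementation.

```python
# Minimal sketch of the DSE phenotype evaluation loop (Section 2.1).
# Assumes weighted connections and summed aggregation; all identifiers
# are illustrative, not taken from the DSE implementation.

def evaluate(inputs, hidden, outputs, connections, transfer):
    """inputs: input values u; hidden/outputs: node ids in evaluation order;
    connections: node id -> list of (source id, weight) pairs;
    transfer: node id -> transfer function f_j on the aggregated signal."""
    x = {"u%d" % i: v for i, v in enumerate(inputs)}  # 1: assign V_u := u
    for j in hidden + outputs:                        # 2: ordered evaluation
        incoming = [w * x[i] for i, w in connections.get(j, [])]
        x[j] = transfer[j](sum(incoming))             # 3: x_j := f_j({e_ij(x_i)})
    return [x[j] for j in outputs]                    # 6: return V_y
```

For example, with inputs [1.0, 2.0], one hidden node summing both inputs, and an output connection of weight 2.0 (identity transfer functions throughout), the network returns [6.0].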
[Figure 1 panels, left to right: the genome (Body: DivP A0 G; Con u y0 w:-0.58; Cut u G. Tail: Term A *; Term y +; Term G *) and the network after each step: initial network, DivP A0 G, Con u y0 w:-0.58, Cut u G, and the final network after the Tail.]

Figure 1: Example of network development. The three Body instructions in the genome (left) are executed before the Tail, which assigns valid transfer functions. The final network performs the logical equivalence (equals) operation. Input nodes have irregular ovals and the output is the lowest node.
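The genotype structure γ = (R, Body, Tail, {γ}) and the development model G_{t+1} = λ(G_t) of Sections 2.2-2.3 can be sketched abstractly. Here each instruction is modeled as a plain function from network to network; the class and field names are our own, not DSE's.

```python
from dataclasses import dataclass, field

# Abstract sketch of a DSE routine gamma = (R, Body, Tail, {gamma}) and
# of development as instruction folding, G_{t+1} = lambda(G_t)
# (Sections 2.2-2.3). Instructions are modeled as plain functions
# network -> network; all names are illustrative.

@dataclass
class Routine:
    ident: str                                       # R: routine identifier
    body: list = field(default_factory=list)         # Body: developing instructions
    tail: list = field(default_factory=list)         # Tail: terminating instructions
    subroutines: dict = field(default_factory=dict)  # {gamma}: callable subroutines

def develop(routine, network):
    """G_tau = Gamma(G_0): fold Body, then Tail, over the network."""
    for instruction in routine.body + routine.tail:
        network = instruction(network)
    return network
```

Representing instructions as closures over their arguments mirrors the paper's view of each Body instruction as a function λ applied to the current network state.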
[Figure 2 shows the developed 3-input network together with its genome. Main routine, Body: Call r214 A; ConE n | 1 (+ (= j j) (= i i)); repeated Call r214 |1 instructions; DivP |0 R. Tail: Term y n; Term D n; Term R n. Routine r214, Body: DivS u1 B; CutE u A (i); DivS A D; ConE u B 1 (i). Tail: Term A |; Term y n; Term B |.]

Figure 2: A modular, partially-scalable solution to the n-bit parity problem. The visualized network is developed for 3 inputs (irregular ovals), although the genome produces solutions for 2, 3, . . . , 9 inputs. The output is the rightmost node.
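The genome in Figure 2 exploits the key regularity of parity: n-bit parity can be computed by chaining 2-input XOR building blocks, the kind of module grown by subroutine r214. A sketch of that regularity (the function name and even/odd convention are ours): the XOR chain yields 0 exactly when the input contains an even number of ones.

```python
from functools import reduce

# The regularity exploited by the genome in Figure 2: chaining 2-input
# XOR blocks computes n-bit parity. The chain outputs 0 exactly when
# the input vector contains an even number of ones.

def xor_chain_parity(bits):
    return reduce(lambda a, b: a ^ b, bits, 0)
```

For example, xor_chain_parity([1, 1, 0]) is 0 (two ones, even) and xor_chain_parity([1, 1, 1]) is 1 (three ones, odd).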
Unlike the previously-described operators, covering is not applied during reproduction, but at the end of development. For each non-terminal symbol X remaining in V at the end of Tail, the operator appends a new Term X Y instruction. Covering may be applied for each routine or for the main routine only. Either way it guarantees that all the nodes in the final network are assigned valid transfer functions.

3. EXPERIMENTS

In all experiments the population size and number of generations is 200, and networks are feed-forward. Experiments are implemented identically in DSE and HyperNEAT, with default parameter settings. Full details for HyperNEAT are in Stanley et al. [16]. Briefly, HyperNEAT evolves genomic networks that encode the weights of neural network phenotypes as a function of the geometric location of the weights.

3.1 Evolving scale-free networks

Networks are scale-free when the distribution of the number of connections per node (its degree) follows a power law, such that the proportion of nodes having degree k is:

P(k) ∝ k^(-γ),   (2)

where γ is an exponent parameter, typically in the range [2, 3] (set to 2 in our experiments). To test whether DSE and HyperNEAT can produce scalable, scale-free networks, we rewarded genotypes that produce scale-free networks at several different target sizes.

The scale-free property is measured with the Bhattacharyya coefficient as the overlap between the distributions P (the power law distribution) and q (the degree distribution of the evolved network):

B(P, q) = Σ_k √(P(k) q(k)).   (3)

A single HyperNEAT genotype generates networks for 7 different substrate sizes: (i + 3)^2, where i = 0, ..., 6, and for each of these resolutions the scale-free property is measured. The fitness measure for each network size is

f_i = 100 + B(P, q) − (n_c − 1),   (4)

where n_c is the number of separate network components. The second term penalizes networks with disconnected pieces. The 100 ensures all fitnesses are positive.

DSE is similarly challenged with generating genotypes that encode for networks that maximize the scale-free property and have approximate sizes of n_i = (i + 3)^2, i = n_u = 0, ..., 6. The genome produces different sized networks based on the number of inputs n_u. To encourage correct network sizes, an additional penalty term is added to the DSE fitness measure for each network size:

f_i = 1 − B(P, q) + (n_c − 1) + |1 − n/n_i|,   (5)

where n is the number of nodes in the developed network. Contrary to HyperNEAT, the DSE fitness function is minimized. In both cases the overall fitness function is the following weighted average over all fitness cases:

F = (Σ_i f_i √n_i) / (Σ_i √n_i).   (6)

This equation weights the results for bigger networks as more important than smaller networks in square root proportions. Because the two fitness functions differ, we will compare only the B coefficients of evolved networks.

The results from 30 runs of DSE and HyperNEAT indicate that both encodings produce networks at different network sizes that are scale-free (Figure 3). The DSE results are slightly, but significantly, better (p < 0.001, permutation test), although the networks from both encodings had B coefficients near the maximum value of 1.0. The plotted B coefficients are weighted in the same way that fitness is weighted in eq. 6. DSE was also nearly perfect at producing networks of the target size (Table 1). HyperNEAT networks had fixed, predefined sizes, and thus cannot be compared on this dimension.

Table 1: Target and evolved sizes of networks evolved with DSE in both scale-free and modular networks experiments.

              Target size    9    16    25    36    49    64    81
  Scale-free  Mean size    8.7  16.5  25.4  36.3  49.0  64.1  76.3
              Std. dev.    0.4   0.7   0.7   0.8   1.4   1.6   3.6
  Modular     Mean size    9.2  18.2  28.0  39.3  52.5  65.3  76.5
              Std. dev.    0.4   1.8   2.0   2.9   2.6   2.8   2.8

Figure 3: Both DSE and HyperNEAT can evolve genomes that produce networks of different size that are scale-free. Plotted are weighted averages ± SD of the best B coefficients from 30 runs (DSE, m.w.B = .9474 ± .0038; H-NEAT, m.w.B = .9235 ± .0097).

Visualizing the networks reveals interesting differences in the types of networks each encoding produces (Figures 4 and 5; produced in Graphviz with the radial layout). While both encodings produce regular networks, the DSE networks appear more modular, even though modularity was not rewarded in these experiments. By quantifying modularity with the Q metric [14] (see the next section), we validate that DSE networks from these experiments are significantly more modular (DSE Q = 0.48 ± 0.08, HyperNEAT Q = 0.04 ± 0.05, p < 0.001, permutation test). The DSE networks also appear hierarchical in that nodes typically have only one path through multiple other nodes to hub nodes, although quantifying hierarchy is beyond the scope of this paper.

3.2 Evolving modular networks

Testing the ability to evolve modular networks is formulated identically to the scale-free test, except the quantity maximized is the modularity measure Q from Newman and Girvan [14, 13]. Let e_ij denote the proportion of connections between the i-th and j-th group of nodes (a module), and let a_i = Σ_j e_ij; then

Q = Σ_i (e_ii − a_i^2)   (7)

is the proportion of all connections within modules minus the expected value of the same quantity if the connections were distributed uniformly randomly. In other words, Q indicates how much the modularity of the given network differs from a network with all random connections. For a fully random network, Q is about 0, while values above 0.3 indicate some modularity in the network, and the highest Q observed at the time of publication was 0.807 [13]. Computation of the Q coefficient requires the network to be partitioned into modules in advance, otherwise it is not possible to calculate the values e_ij. Ideally Q would be calculated for all possible partitions of the network and the partition maximizing Q would be selected. In practice this exhaustive approach is computationally prohibitive, so a partition is selected with a greedy hill-climbing approach for which Q is the maximum [13]. The algorithm starts by considering each node a separate module and iteratively joins the modules for which the increase in Q is the highest. The complexity of the algorithm is O((m + n)n), where n and m respectively denote the number of nodes and connections.

DSE significantly outperforms HyperNEAT at evolving modular networks of different sizes (Figure 6, p < 0.001, permutation test). The Q values plotted are weighted identically to the B values in the previous section, averaged over 20 runs. Interestingly, DSE networks are initially quite modular, and reach near-maximum modularity within a few generations, whereas HyperNEAT required many generations to approach its maximum, revealing the default tendencies of these encodings. DSE also evolved to produce networks closely approximating the required size (Table 1). Interestingly, additional experiments where HyperNEAT genomes were tested on only one network size reveal that HyperNEAT can produce highly-modular networks (data not shown), suggesting that the need to simultaneously produce modularity on networks of different sizes is a unique challenge for HyperNEAT, but not DSE.

Figures 7 and 8 (Graphviz with spring-model layout) show the best evolved networks from both methods for increasing numbers of inputs (DSE) or increasing network size (HyperNEAT). Evolved networks display some topological patterns that scale up in interesting ways. The DSE networks resemble starfish that have one more arm than the number of inputs. The HyperNEAT networks consist of clusters that increase in number along with the network size, but are too densely interconnected to yield a high average Q score.

To further probe differences between the encodings on properties not directly selected for, we measured the scale-free property for the networks from this experiment. Although they were not selected for it, the DSE networks were significantly more scale-free (DSE B = 0.67 ± 0.07, HyperNEAT B = 0.50 ± 0.18, p < 0.001, permutation test).

3.3 A visual pattern recognition problem

It is also important to test if DSE is competitive with cutting-edge generative encodings at solving problems. We chose a visual pattern recognition problem that HyperNEAT was recently tested on [4], which itself was motivated by a problem from the paper that introduced HyperNEAT [16]. In this problem, 3 shapes are visible and the network has to identify the location of a target shape while ignoring the other two shapes. Each shape fits within a 3 x 3 grid of pixels (Figure 9) and all 3 shapes are distributed on a 9 x 9 pixel canvas. The S shape is most unique and thus easiest to distinguish. The networks are fixed-size, with a 9 x 9 input sheet of nodes and a corresponding 9 x 9 output sheet.
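The scale-free score of eqs. 2-3 can be sketched directly: build the target power-law distribution P(k) ∝ k^(-γ), estimate q(k) from the network's degree sequence, and take the Bhattacharyya overlap. Normalizing P over k = 1 up to the maximum observed degree is our assumption; function names are illustrative.

```python
import math
from collections import Counter

# Sketch of the scale-free measure (eqs. 2-3): the Bhattacharyya
# overlap between a power-law target P(k) ~ k^-gamma and the evolved
# network's degree distribution q(k). Normalizing P over k = 1..max
# degree is our assumption, not necessarily the paper's.

def scale_free_score(degrees, gamma=2.0):
    kmax = max(degrees)
    z = sum(k ** -gamma for k in range(1, kmax + 1))
    P = {k: (k ** -gamma) / z for k in range(1, kmax + 1)}   # eq. 2
    q = {k: c / len(degrees) for k, c in Counter(degrees).items()}
    return sum(math.sqrt(P[k] * q.get(k, 0.0)) for k in P)   # eq. 3
```

The score lies in [0, 1] and equals 1 only when the two distributions coincide, matching the near-1.0 B coefficients reported for the evolved networks.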
[Figure 4 panels: nu: 0, n: 9, B: 0.92; nu: 1, n: 17, B: 0.92; nu: 2, n: 25, B: 0.95; nu: 3, n: 36, B: 0.96; nu: 4, n: 49, B: 0.96; nu: 5, n: 65, B: 0.96; nu: 6, n: 76, B: 0.97.]
Figure 4: The networks produced by the DSE genome that evolved the highest average B coefficient. nu = number of inputs, n = network size, and B measures the scale-free property.
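The modularity measure of Section 3.2 (eq. 7) can be sketched for a given partition; the greedy agglomerative merging the paper uses to choose that partition is omitted here. The edge-list layout and names are our own.

```python
# Sketch of Newman-Girvan modularity (eq. 7) for a *given* partition.
# The paper selects the partition by greedy agglomerative merging,
# which is omitted here. Edges are undirected (u, v) node pairs.

def modularity(edges, partition):
    """partition: node -> module id. Returns Q = sum_i (e_ii - a_i^2)."""
    m = len(edges)
    modules = sorted(set(partition.values()))
    e = {(i, j): 0.0 for i in modules for j in modules}
    for u, v in edges:                     # each edge contributes to both
        i, j = partition[u], partition[v]  # (i, j) and (j, i), so an
        e[(i, j)] += 1.0 / (2 * m)         # intra-module edge adds 1/m
        e[(j, i)] += 1.0 / (2 * m)         # to e_ii
    a = {i: sum(e[(i, j)] for j in modules) for i in modules}
    return sum(e[(i, i)] - a[i] ** 2 for i in modules)
```

As a check, two disconnected triangles partitioned into their two natural modules give Q = 0.5, comfortably above the 0.3 threshold the paper cites for "some modularity".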
[Figure 5 panels: n: 9, B: 0.85; n: 16, B: 0.89; n: 25, B: 0.93; n: 36, B: 0.95; n: 49, B: 0.96; n: 64, B: 0.95; n: 81, B: 0.95.]
Figure 5: Network phenotypes for the HyperNEAT genotype that evolved the highest average B coefficient. a single evaluation, shapes are placed at random locations 0.7 without overlap. A network evaluation is considered a suc- 0.6 cess if the output node with the highest activation value is in the same location as the center of the target shape, oth- 0.5 erwise an error is scored. Averaged over 20 evaluations, the 0.4
fitness for DSE genotypes is the error rate (to be minimized) Q 0.3 and for HyperNEAT is the success rate (to be maximized). There are 3 variants of the problem, with each shape respec- 0.2 ± tively being the target and the other two being distractions. 0.1 DSE, m.w.Q = .6772 .0062 Because this task is two-dimensional, DSE uses 2D vari- H−NEAT, m.w.Q = .4248± .0362 0 ants of the ConE and CutE instructions, which can create 0 50 100 150 200 connectivity patterns via an expression tree. Since these Generation are the only two instructions in this experiment, networks remain a fixed size. Nodes have two indexes as in a matrix. Figure 6: Resultant modularity values (weighted A key element of both encodings is that evolved functions mean Q SD) when modularity is selected for. create geometric patterns that specify network connectiv- ± ity. In DSE, such patterns are created by expression trees (Section 2.2). In both encodings, the functions that produce has a default set of operations following previous studies [16, geometric patterns have 8 arguments: the x and y values of 3], with sigmoid, sine, Gaussian, and bounded linear func- the input and output nodes (normalized to [0, 1)), ∆x and tions. DSE has +, , , abs(), <, Gauss(), and constants − ∗ ∆y (the difference between input and output nodes in the x randomly generated from a normal distribution (mean 0 and y dimensions), and the straight-line distance and polar 1 SD). For both HyperNEAT and DSE, the output of angle between input and output nodes. For DSE, the node an± evolved pattern-generating function specifies a network coordinates come from normalizing their indexes. weight (set to 0 if between -0.2 and 0.2). Another important One of the main differences between HyperNEAT and ex- difference is that in DSE weights produced by subsequent pression trees are the primitive (building block) operations ConE2D instructions accumulate. CutE2D instructions cut allowed in both systems. 
In these experiments, HyperNEAT any connections for which the expression value is positive.
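The eight geometric arguments and the weight-thresholding rule described above can be sketched as follows. This is a minimal illustration with hypothetical function names; `pattern_fn` stands in for an evolved expression tree or CPPN, which is not implemented here:

```python
import math

def geometric_args(ix, iy, ox, oy):
    """The 8 geometric arguments described in the text: input and
    output node coordinates (already normalized to [0, 1)), their
    differences, straight-line distance, and polar angle."""
    dx, dy = ox - ix, oy - iy
    dist = math.hypot(dx, dy)
    angle = math.atan2(dy, dx)
    return (ix, iy, ox, oy, dx, dy, dist, angle)

def thresholded_weight(pattern_fn, ix, iy, ox, oy, band=0.2):
    """Query an evolved pattern-generating function (any callable
    taking the 8 arguments) and zero out weak outputs, i.e. set the
    weight to 0 if it falls between -band and +band."""
    w = pattern_fn(*geometric_args(ix, iy, ox, oy))
    return 0.0 if -band < w < band else w
```

With a toy pattern function that weakens with distance, nearby node pairs receive a nonzero weight while distant pairs fall inside the dead band and are left unconnected.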
[Figure 7 image omitted; genome body as printed in the figure: Con A A 1, DivS A5 O, DivS u A, DivS u4 A0, DivS A S, DivP S E, DivS A5 H, DivS A C, DivS A X, DivP C G, DivS C4 G, DivS X4 C, DivS P D, DivS X C2, DivS A D, Cut E X0, DivP D M. Panel statistics: nu = 0, n = 9, Q = 0.30; nu = 1, n = 19, Q = 0.55; nu = 2, n = 29, Q = 0.65; nu = 3, n = 39, Q = 0.69; nu = 4, n = 49, Q = 0.72; nu = 5, n = 72, Q = 0.77; nu = 6, n = 83, Q = 0.79.]
Figure 7: The DSE genotype that produced the highest Q coefficient and the 7 networks it develops for different numbers of inputs nu = 0, ..., 6.
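The Q coefficient reported in Figures 6-8 appears to be the modularity measure of [13, 14]. For a given partition of an unweighted, undirected network, it can be computed as in the sketch below (hypothetical function name; the paper's weighted mean Q over runs is not reproduced here):

```python
def modularity(edges, community):
    """Newman-Girvan modularity Q for an undirected graph.

    edges: iterable of (u, v) pairs; community: dict node -> label.
    Q = sum over communities c of (e_cc - a_c**2), where e_cc is the
    fraction of edges with both ends in c and a_c is the fraction of
    edge endpoints landing in c.
    """
    edges = list(edges)
    m = len(edges)
    e = {}  # fraction of within-community edges per community
    a = {}  # fraction of edge endpoints per community
    for u, v in edges:
        cu, cv = community[u], community[v]
        if cu == cv:
            e[cu] = e.get(cu, 0.0) + 1.0 / m
        a[cu] = a.get(cu, 0.0) + 0.5 / m
        a[cv] = a.get(cv, 0.0) + 0.5 / m
    return sum(e.get(c, 0.0) - a[c] ** 2 for c in a)
```

Two triangles joined by a single bridge edge, partitioned into their natural communities, score Q = 5/14 ≈ 0.357; a random partition of the same graph scores near zero.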
[Figure 8 image omitted; panel statistics: n = 9, Q = 0.35; n = 16, Q = 0.56; n = 25, Q = 0.59; n = 36, Q = 0.57; n = 49, Q = 0.57; n = 64, Q = 0.54; n = 81, Q = 0.54.]
Figure 8: Network phenotypes for the HyperNEAT genome with the highest Q value.
We perform 30 evolutionary runs for each of the three task variants. DSE substantially and significantly outperforms HyperNEAT on all three problem variants (Table 2, p < 0.01). These scores underrepresent DSE's advantage, because plots over time (not shown) suggest that only DSE would improve with additional generations on the X and O problems, and because DSE achieved near-perfect fitness on the S problem much earlier than HyperNEAT. While only from a single problem, these results suggest that DSE can compete with cutting-edge generative encodings, such as HyperNEAT, especially given that this problem type was chosen to highlight HyperNEAT's merits.

A benefit of encodings like DSE that have separate instructions or rules is that individual instructions can serve as genetic modules that create wiring motifs that may be beneficial on related problems. To test whether instructions from one task aid in solving another task, we implemented an island model, where organisms on three islands are rewarded for their performance on the three different task variants. Migrants travel between islands, where they can be crossed with native individuals. 5% of individuals migrate between each pair of islands each generation, crossover occurs 40% of the time, and 60% of genomes are mutated. Exchanging genetic information between organisms rewarded for different tasks should only be beneficial if instructions contain partial solutions common to multiple tasks.

DSE performance significantly improves with migration and crossover on the X and O problems (DSE 11, Table 2, p < 0.01, permutation test). There was no meaningful room for improvement on the S variant. To ensure that performance gains are due to exchanging beneficial instructions, we test DSE with migration, but not crossover (DSE 10) and with crossover, but not migration (DSE 01).
Neither of these controls significantly improves performance over regular DSE (Table 2, p > 0.05, permutation test), demonstrating that DSE can exchange functional building blocks between individuals working on similar problems to enhance performance. This capability should be increasingly beneficial as the number of objectives simultaneously being solved rises. It is an open question whether other leading generative encodings, such as HyperNEAT, can similarly exchange partial solutions to different problems between individuals.

Table 2: Error rates (percent wrong) on the pattern recognition problem. Averages of the final best fitness values are shown, tested on 600 shape placements to increase reporting accuracy. The letters X, O, and S indicate which pattern was the target to be recognized.

                 X             O             S
  H-NEAT     67.1 ± 5.0    63.6 ± 4.2     1.3 ± 5.5
  DSE        29.0 ± 22.8   20.4 ± 29.1    0.1 ± 0.3
  DSE 10     31.8 ± 20.0   12.1 ± 21.9    0.0 ± 0.1
  DSE 01     32.6 ± 21.7   22.9 ± 29.8    0.1 ± 0.3
  DSE 11     14.2 ± 15.8    6.9 ± 15.3    0.0 ± 0.1

[Figure 9 image omitted.]
Figure 9: Shape X (cross), O (circle) and S (square).

To illustrate how expression trees can solve problems by creating connectivity patterns, we describe one of the shortest DSE solutions for the S problem variant, which was the easiest. This solution contained only one instruction that had an effect: ConE2D u y (abs (< u2 0.22)). It creates connections between all input-output node pairs with a straight-line distance between them of less than 0.22, which connects each output node to its 3×3-neighborhood counterpart in the input layer. This connectivity pattern ensures that the highest activation in the output layer is the node corresponding to the center of the square shape.

4. DISCUSSION AND CONCLUSION
This paper demonstrates that a novel encoding for evolving networks, DSE, can produce networks that are regular, modular, scalable, and scale-free. The networks also appear hierarchical, although additional work is needed to quantify this property. As such, DSE produces networks with many properties thought to enhance evolvability and performance [12, 11, 1]. DSE is interesting because it combines concepts from two different camps of generative encodings. Like Cellular Encoding and L-Systems, DSE has a developmental process that iteratively rewrites symbols according to instructions. Like HyperNEAT, it can also specify connectivity as a function of evolved geometric patterns. Our results suggest that this hybridization can combine the best attributes of both camps of generative encodings. For example, DSE can create scalable and modular networks (features typically associated with iterative-rewrite encodings [10, 8, 5]) with regular network patterns (HyperNEAT's forte [16, 3]). It is interesting, for instance, that DSE produced more modular and scale-free networks than HyperNEAT when these properties were not explicitly rewarded. Future work is required to better understand the relative strengths and weaknesses of both these camps of encodings and their hybridizations. DSE thus raises interesting questions about what types of networks different encodings produce. It also appears to be a promising encoding in its own right, worthy of additional investigation.

Acknowledgments: NSF Postdoctoral Research Fellowship in Biology to Jeff Clune (award number DBI-1003220).

5. REFERENCES
[1] A. Barabási. Scale-Free Networks: A Decade and Beyond. Science, 325(5939):412-413, 2009.
[2] J. Clune, B. E. Beckmann, P. K. McKinley, and C. Ofria. Investigating whether HyperNEAT produces modular neural networks. In Proceedings of the Genetic and Evolutionary Computation Conference, pages 635-642. ACM, 2010.
[3] J. Clune, K. Stanley, R. Pennock, and C. Ofria. On the performance of indirect encoding across the continuum of regularity. IEEE Transactions on Evolutionary Computation, to appear, 2011.
[4] O. J. Coleman. Evolving neural networks for visual processing. B.S. Thesis, U. New South Wales, 2010.
[5] F. Gruau. Neural Network Synthesis using Cellular Encoding and the Genetic Algorithm. PhD thesis, École Normale Supérieure de Lyon, France, 1994.
[6] S. Harding and W. Banzhaf. Artificial development. Organic Computing, pages 201-219, 2008.
[7] S. Harding, J. Miller, and W. Banzhaf. Self modifying cartesian genetic programming: Parity. In Evolutionary Computation, 2009. CEC'09. IEEE Congress on, pages 285-292. IEEE, 2009.
[8] G. Hornby, H. Lipson, and J. Pollack. Generative representations for the automated design of modular physical robots. IEEE Transactions on Robotics and Automation, 19(4):703-719, 2003.
[9] G. Hornby and J. Pollack. Creating high-level components with a generative representation for body-brain evolution. Artificial Life, 8(3), 2002.
[10] G. S. Hornby. Measuring, enabling and comparing modularity, regularity and hierarchy in evolutionary design. In Proceedings of the Genetic and Evolutionary Computation Conference. ACM, 2005.
[11] N. Kashtan and U. Alon. Spontaneous evolution of modularity and network motifs. Proc. Natl. Acad. Sci. U.S.A., 102(39):13773, 2005.
[12] H. Lipson. Principles of modularity, regularity, and hierarchy for scalable systems. Journal of Biological Physics and Chemistry, 7(4):125, 2007.
[13] M. E. J. Newman. Fast algorithm for detecting community structure in networks. Physical Review E, 69(6):066133, 2004.
[14] M. E. J. Newman and M. Girvan. Finding and evaluating community structure in networks. Physical Review E, 69(2):026113, 2004.
[15] R. Poli, W. Langdon, and N. McPhee. A Field Guide to Genetic Programming. Lulu Press, 2008.
[16] K. Stanley, D. D'Ambrosio, and J. Gauci. A hypercube-based encoding for evolving large-scale neural networks. Artificial Life, 15(2):185-212, 2009.
[17] K. Stanley and R. Miikkulainen. A taxonomy for artificial embryogeny. Artificial Life, 9(2), 2003.
[18] G. Striedter. Principles of brain evolution. Sinauer Associates, Sunderland, MA, 2005.
[19] M. Suchorzewski. Evolving scalable and modular adaptive networks with Developmental Symbolic Encoding. Evolutionary Intelligence, to appear, 2011.
[20] P. Verbancsics and K. Stanley. Constraining Connectivity to Encourage Modularity in HyperNEAT. U. Central Florida Tech Report, 2010.
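To make the single effective instruction of the S-variant solution concrete, the connectivity rule it encodes — connect every input-output node pair whose normalized straight-line distance is below 0.22 — can be re-implemented as a sketch. The function name is hypothetical, and a 9×9 grid with coordinates normalized by the grid size is assumed for illustration:

```python
import math

def s_variant_connections(n):
    """Connections created by the evolved instruction
    ConE2D u y (abs (< u2 0.22)): connect every input-output pair
    whose normalized straight-line distance is below 0.22.

    Nodes sit on an n-by-n grid; coordinates are the indexes
    normalized by n, as described for DSE in the text.
    """
    conns = []
    for i in range(n):
        for j in range(n):          # input node (i, j)
            for k in range(n):
                for l in range(n):  # output node (k, l)
                    d = math.hypot((i - k) / n, (j - l) / n)
                    if d < 0.22:
                        conns.append(((i, j), (k, l)))
    return conns
```

On a 9×9 grid, offsets of one step (1/9 ≈ 0.111) and one diagonal step (√2/9 ≈ 0.157) fall under the 0.22 threshold while two steps (2/9 ≈ 0.222) do not, so each interior output node is wired exactly to the 3×3 block of input nodes around it, matching the description above.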