<<

Revisiting the

Evolving Cellular Automata to Perform

Computations

Melanie Mitchell Peter T Hrab er and James P Crutcheld

Santa Fe Institute Working Pap er

Submitted to Complex

Abstract

We present results from an exp eriment similar to one p erformed by Packard in

which a is used to evolve cellular automata CA to p erform a particular

computational task Packard examined the frequency of evolved CA rules as a function of

Langtons parameter and interpreted the results of his exp eriment as giving evidence

for the following two hyp otheses CA rules able to p erform complex computations are

most likely to b e found near critical values which have b een claimed to correlate with

a b etween ordered and chaotic b ehavioral regimes for CA When CA

rules are evolved to p erform a complex computation will tend to select rules with

values close to the critical values Our exp eriment pro duced very dierent results and

we suggest that the interpretation of the original results is not correct We also review and

discuss issues related to dynamicalb ehavior classes and computation in CA

The main constructive results of our study are identifying the and comp etition

of computational strategies and analyzing the central role of symmetries in an evolutionary

In particular we demonstrate how symmetry breaking can imp ede the evolution

toward higher computational capability

1

Santa Fe Institute Old Pecos Trail Suite A Santa Fe New Mexico USA

Email mmsantafeedu pthsantafeedu

2

Physics Department University of California Berkeley CA USA

Email chaosgo jirab erkeleyedu

Intro duction

The notion of computation at the edge of chaos has gained considerable attention in the

study of complex systems and articial life eg This notion is related

to the broad question What is the relation b etween a computational systems ability for

complex information pro cessing and other measures of the systems b ehavior In particular

do es the ability for nontrivial computation require a systems dynamical b ehavior to b e near

a transition to chaos There has also b een considerable attention given to the notion of

the edge of chaos in the context of evolution In particular it has b een hyp othesized that

when biological systems must p erform complex computation in order to survive the pro cess

of evolution under natural selection tends to select such systems near a phase transition from

ordered to chaotic b ehavior

This pap er describ es a reexamination of one study that addressed these questions in

the context of cellular automata The results of the original study were interpreted

as evidence that an evolutionary pro cess in which cellularautomata rules are selected to

p erform a nontrivial computation preferentially selected rules near the transition to chaos

We show that this conclusion is neither supp orted by our exp erimental results nor consistent

with basic mathematical prop erties of the computation b eing evolved In the pro cess of

this demonstration we review and clarify notions relating to terms such as computation

dynamical b ehavior and edge of chaos in the context of cellular automata

Cellular Automata and Dynamics

Cellular automata CA are discrete spatiallyextended dynamical systems that have b een

studied extensively as mo dels of physical pro cesses and as computational devices

In its simplest form a CA consists of a spatial lattice of cel ls each of which at time

t can b e in one of K states We denote the lattice size or numb er of cells as L A CA has a

single xed rule used to up date each cell the rule maps from the states in a neighb orho o d

of cellseg the states of a cell and its nearest neighb orsto a single state which is the

up date value for the cell in question The lattice starts out with some initial conguration

of lo cal states and at each time step the states of all cells in the lattice are synchronously

up dated In the following we will use the term state to refer to the value of a single

celleg or and conguration to mean the of states over the entire lattice

The CA we will discuss in this pap er are all onedimensional with two p ossible states p er

cell and In a onedimensional CA the neighb orho o d of a cell includes the cell itself

and some numb er of neighb ors on either side of the cell The numb er of neighb ors on either

side of the center cell is referred to as the CAs radius r All of the simulations will b e of CA

with spatially p erio dic b oundary conditions ie the onedimensional lattice is viewed as a

circle with the right neighb or of the rightmost cell b eing the leftmost cell and vice versa

The equations of motion for a CA are often expressed in the form of a rule table This is

a lo okup table listing each of the neighb orho o d and the state to which the central

cell in that neighb orho o d is mapp ed For example Figure displays one p ossible rule table

for an elementary onedimensional twostate CA with radius r The lefthand column

gives the p ossible neighb orho o d congurations and the states in the righthand column

Figure Rule table for the elementarybinarystate nearest

3

neighb orCA with rule numb er The neighb orho o d

patterns are given in the lefthand column For each of these the

righthand column gives the output bit the center cells value at

the next time step

are referred to as the output bits of the rule table To run the CA this lo okup table is

applied to each neighb orho o d in the current lattice conguration resp ecting the choice of

b oundary conditions to pro duce the conguration at the next time step

A common metho d for examining the b ehavior of a twostate onedimensional CA is to

display its spacetime diagram a twodimensional picture that vertically strings together the

onedimensional CA lattice congurations at each successive time step with white squares

corresp onding to cells in state and black squares corresp onding to cells in state Two

such spacetime diagrams are displayed in Figure These show the actions of the Gacs

KurdyumovLevin GKL binarystate CA on two random initial congurations of dierent

densities of s In b oth cases over time the CA relaxes to a xed patternin one

case all s and in the other case all s These patterns are in fact xed p oints of the

GKL CA That is once reached further applications of the CA do not change the pattern

The GKL CA will b e discussed further b elow

CA are of interest as mo dels of physical pro cesses b ecause like many physical systems

they consist of a large numb er of simple comp onents cells which are mo died only by

lo cal interactions but which acting together can pro duce global complex b ehavior Like

the class of dissipative dynamical systems even the class of elementary onedimensional CA

exhibit the full sp ectrum of dynamical b ehavior from xed p oints as seen in Figure to

limit cycles p erio dic b ehavior to unpredictable chaotic b ehavior Wolfram considered

a coarse classication of CA b ehavior in terms of these categories He prop osed the following

four classes with the intention of capturing all p ossible CA b ehavior

Class All initial congurations relax after a transient p erio d to the same xed

conguration eg all s

Class All initial congurations relax after a transient p erio d to some xed p oint or

some temp orally p erio dic cycle of congurations but which one dep ends on the initial

conguration

Class Some initial congurations relax after a transient p erio d to chaotic b ehavior (a) (b) 0 0

t t

148 148

0 i 148 0 i 148

Figure Two spacetime diagrams for the binarystate Gacs

KurdyumovLevin CA L sites are shown evolving with time

increasing down the page from two dierent initial congurations over

time steps In a the initial conguration has a density of s of

approximately in b a density of approximately Notice

that by the last time step the CA has converged to a xed pattern of

a all s and b all s In this way the CA has classied the initial

congurations according to their density

The term chaotic here and in the rest of this pap er refers to apparently unpredictable

spacetime b ehavior

Class Some initial congurations result in complex lo calized structures sometimes

longlived

Wolfram do es not state the requirements for memb ership in Class any more precisely than

is given ab ove Thus unlike the categories derived from dynamical Class

is not rigorously dened

L

It should b e p ointed out that on nite lattices there is only a nite numb er of

p ossible congurations so all rules ultimately lead to p erio dic b ehavior Class refers not

L

to this typ e of p erio dic b ehavior but rather to cycles with p erio ds much shorter than

Cellular Automata and Computation

CA are also of interest as computational devices b oth as theoretical to ols and as practical

highly ecient parallel machines

Computation in the context of CA has several p ossible meanings The most common

meaning is that the CA do es some useful computational task Here the rule is interpreted

as the program the initial conguration is interpreted as the input and the CA runs

for some sp ecied numb er of time steps or until it reaches some goal patternp ossibly a

xed p oint pattern The nal pattern is interpreted as the output An example of this is

using CA to p erform imagepro cessing tasks

A second meaning of computation in CA is for a CA given certain sp ecial initial con

gurations to b e capable of universal computation That is the CA can given the right

initial conguration simulate a programmable complete with logical gates timing

devices and so on Conways Game of Life is such a CA one construction for univer

sal computation in the Game of Life is given in Similar constructions have b een made

for onedimensional CA Wolfram sp eculated that all Class rules have the capacity

for universal computation However given the informality of the denition of Class

not to mention the diculty of proving that a given rule is or is not capable of universal

computation this hyp othesis is imp ossible to verify

A third meaning of computation in CA involves interpreting the b ehavior of a given CA on

an ensemble of initial congurations as a kind of intrinsic computation Here computation

is not interpreted as the p erformance of a useful transformation of the input to pro duce

the output Rather it is measured in terms of generic structural computational elements

such as memory information pro duction information transfer logical op erations and so on

It is imp ortant to emphasize that the measurement of such intrinsic computational elements

do es not rely on a semantics of utility as do the preceding computation typ es That is

these elements can b e detected and quantied without reference to any sp ecic useful

computation p erformed by the CAsuch as enhancing edges in an image or computing

the digits of This notion of intrinsic computation is central to the work of Crutcheld

Hanson and Young

Generally CA have b oth the capacity for all kinds of dynamical b ehaviors and the ca

pacity for all kinds of computational b ehaviors For these reasons in addition to the compu

tational ease of simulating them CA have b een considered a go o d class of mo dels to use in

studying how dynamical b ehavior and computational ability are related Similar questions

have also b een addressed in the context of other dynamical systems including continuous

state dynamical systems such as iterated maps and dierential equations Bo olean

networks and recurrent neural networks Here we will conne our discussion to CA

With this background we can now rephrase the broad questions presented in Section

in the context of CA

What prop erties must a CA have for nontrivial computation

In particular do es a capacity for nontrivial computation in any of the three senses

describ ed ab ove require a CA to b e near a transition from ordered to chaotic b ehavior

When CA rules are evolved to p erform a nontrivial computation will evolution tend

to select rules near such a transition to chaos

Structure of CA Rule Space

Over the last decade there have b een a numb er of studies addressing the rst question ab ove

Here we fo cus on Langtons empirical investigations of the second question in terms of the

structure of the space of CA rules The relationship of the rst two questions to the

thirdevolving CAwill b e describ ed subsequently

One of the ma jor diculties in understanding the structure of the space of CA rules

and its relation to computational capability is its discrete In contrast to the well

develop ed theory of bifurcations for continuousstate dynamical systems there app ears to

b e little or no geometry in CA space and there is no notion of smo othly changing one CA to

get another nearby in b ehavior In an attempt to emulate this however Langton dened

a parameter that varies incrementally as single output bits are turned on or o in a given

rule table For a given CA rule table is computed as follows For a K state CA one state

3

q is chosen arbitrarily to b e quiescent The of a given CA rule is then the fraction of

nonquiescent output states in the rule table For a binarystate CA if is chosen to b e the

quiescent state then is simply the fraction of output bits in the rule table Typically

there are many CA rules with a given value For a binary CA the numb er is strongly

p eaked at due to the combinatorial dep endence on the radius r and the numb er of

states K It is also symmetric ab out due to the symmetry of exchanging s and

s Generally as is increased from to K the CA move from having the most

homogeneous rule tables to having the most heterogeneous

Langton p erformed a range of Monte Carlo samples of twodimensional CA in an attempt

to characterize their average b ehavior as a function of The notion of average b e

havior was intended to capture the most likely b ehavior observed with a randomly chosen

initial conguration for CA randomly selected in a xed subspace The observation was

that as is incremented from to K the average b ehavior of rules passes through

the following regimes

xed p oint p erio dic complex chaotic

That is according to Figure in for example the average b ehavior at low is for a

rule to relax to a xed p oint after a relatively short transient phase As is increased rules

tend to relax to p erio dic patterns again after a relatively short transient phase As reaches

a critical value rules tend to have longer and longer transient phases Additionally

c

the b ehavior in this regime exhibits longlived complexnonp erio dic but nonrandom

patterns As is increased further the average transient length decreases and rules tend

to relax to apparently random spacetime patterns The actual value of dep ends on r K

c

and the actual path of CA found as is incremented

These four b ehavioral regimes roughly corresp ond to Wolframs four classes Langtons

claim is that as is increased from to K the classes are passed through in the

order He notes that as is increased one observes a phase transition b etween

highly ordered and highly disordered dynamics analogous to the phase transition b etween

the solid and uid states of matter p

According to Langton as is increased from K to the four regimes o ccur in the

reverse order sub ject to some constraints for K For twostate CA since b ehavior

is necessarily symmetric ab out there are two values of at which the complex

c

regime o ccurs

3

In all states ob eyed a strong quiescence requirement For any state s f K g the

neighb orho o d consisting entirely of state s must map to s γ

0.0 λ 1.0

Figure A graph of the average dierencepattern spreading rate

of a large numb er of randomly chosen r K CA as a function

of Adapted from with p ermission of the author No vertical

scale was provided there

How is determined Following standard practice Langton used various statistics such

c

as singlesite entropy twosite mutual information and transient length to classify CA b e

havior The additional step was to correlate b ehavior with via these statistics Langtons

Monte Carlo samples showed there was some correlation b etween the statistics and But

the averaged statistics did not reveal a sharp transition in average b ehavior a basic prop

erty of a phase transition in which macroscopic highlyaveraged quantities do make marked

changes We note that Wo otters and Langton gave evidence that in the limit of an increasing

numb er of states the transition region narrows The main result indicates that in one

class of twodimensional innitestate sto chastic cellular automata there is a sharp transition

in singlesite entropy at

c

The existence of a critical and the dep endence of the critical regions width on r and

K is less clear for nitestate CA Nonetheless Packard empirically determined rough values

of for r K CA by lo oking at the dierencepattern spreading rate as a function

c

of The spreading rate is a measure of unpredictability in spatiotemp oral patterns

and so is one p ossible measure of chaotic b ehavior It is analogous to but not the

same as the Lyapunov exp onent for continuousstate dynamical systems In the case of CA

it indicates the average propagation sp eed of information through spacetime though not

the rate of pro duction of lo cal information

At each a large numb er of rules was sampled and for each CA was estimated The

average over the selected CA was taken as the average spreading rate at the given The

results are repro duced in Figure As can b e seen at low and high s vanishes at

intermediate it is maximal and in the critical regionscentered ab out and

it rises or falls gradually

While not shown in Figure for most values s variance is high The same is true

for singlesite entropy and twosite mutual information as a function of That is the

b ehavior of any particular rule at a given might b e very dierent from the average b ehavior

at that value Thus the interpretations of these averages is somewhat problematic This

recounting of the b ehavioral structure of CA rule space as parameterized by is based on

statistics taken from Langtons and Packards Monte Carlo simulations Various problems

in correlating with b ehavior will b e discussed in Section A detailed analysis of some of

these problems can b e found in Other work on investigating the structure of CA rule

space is rep orted in

The claim in is that predicts dynamical b ehavior well only when the space of rules

is large enough Apparently is not intended to b e a go o d b ehavioral predictor for the

space of elementary CA rulesr K and p ossibly r K rules as well

CA Rule Space and Computation

Langton hyp othesizes that a CAs computational capability is related to its average dynam

ical b ehavior which is claimed to predict In particular he hyp othesizes that CA

capable of p erforming nontrivial computationincluding universal computationare most

likely to b e found in the vicinity of phase transitions b etween order and chaos that is near

values The hyp othesis relies on a basic observation of computation theory that any form

c

of computation requires memoryinformation storageand communicationinformation

transmission and interaction b etween stored and transmitted information Ab ove and b e

yond these prop erties though universal computation requires memory and communication

over arbitrary distances in time and space Thus complex computation requires signicantly

long transients and spacetime correlation lengths in the case of universal computation

arbitrarily long transients and correlations are required Langtons claim is that these phe

nomena are most likely to b e seen near valuesnear phase transitions b etween order

c

and chaos This intuition is b ehind Langtons notion of computation at the edge of chaos

4

for CA

Evolving CA

The empirical studies describ ed ab ove addressed only the relationship b etween and the

dynamical b ehavior of CAas revealed by several statistics Those studies did not correlate

or b ehavior with an indep endent measure of computation Packard addressed this

issue by using a genetic algorithm GA to evolve CA rules to p erform a particular

computation This exp eriment was meant to test two hyp otheses CA rules able to

p erform complex computations are most likely to b e found near values and When

c

CA rules are evolved to p erform a complex computation evolution will tend to select rules

near values

c

The Computational Task and an Example CA

The original exp eriment consisted of evolving twostates f gonedimensional CA

with r That is the neighb orho o d of a cell consists of itself and its three neighb ors

4

This should b e contrasted with the analysis of computation at the onset of chaos in and in

particular with the discussion of the structure of CA space there

on each side The computational task for the CA is to decide whether or not the initial

conguration contains more than half s If the initial conguration contains more than

half s the desired b ehavior is for the CA after some numb er of time steps to relax to

a xedp oint pattern of all s If the initial conguration contains less than half s the

desired b ehavior is for the CA after some numb er of time steps to relax to a xedp oint

pattern of all s If the initial conguration contains exactly half s then the desired

b ehavior is undened This can b e avoided in practice by requiring the CA lattice to b e of

o dd length Thus the desired CA has only two invariant patterns either all s or all s In

the following we will denote the density of s in a lattice conguration by the density of

s in the conguration at time t by t and the threshold density for classication by

c

Do es the classication task count as a nontrivial computation for a small

c

radius r L CA Though this term was not rigorously dened in or one p ossible

denition might b e any computation for which the memory requirement increases with L

ie any computation which corresp onds to the recognition of a nonregular language and

in which information must b e transmitted over signicant spacetime distances on the order

of L Under this denition the classication task can b e thought of as a nontrivial

c

computation for a small radius CA The eective minimum amount of memory is prop ortional

to l og L since the equivalent of a counter register is required to track the excess of s in a

serial scan of the initial pattern And since the s can b e distributed throughout the lattice

information transfer over long spacetime distances must o ccur This is supp orted in a CA

by the nonlo cal interactions among many dierent neighb orho o ds after some p erio d of time

Packard cited a K r rule constructed by Gacs Kurdyumov and Levin

which purp ortedly p erforms this task The GacsKurdyumovLevin GKL CA is dened by

the following rule

If s t then s t ma jority s t s t s t

i i i i1 i3

If s t then s t ma jority s t s t s t

i i i i+1 i+3

where s t is the state of site i at time t

i

In words this rule says that for each neighb orho o d of seven adjacent cells if the state

of the central cell is then its new state is decided by a ma jority vote among itself its left

neighb or and the cell two cells to the left away Likewise if the state of the central cell is

then its new state is decided by a ma jority vote among itself its right neighb or and the

cell two cells to the right away

Figure gives spacetime diagrams for the action of the GKL rule on an initial congu

ration with and on an initial conguration with It can b e seen that although

c c

the CA eventually converges to a xed p oint there is a transient phase during which a spa

tial and temp oral transfer of information ab out lo cal neighb orho o ds takes place and this

lo cal information interacts with other lo cal information to pro duce the desired nal state

Very crudely the GKL CA successively classies lo cal densities with the lo cality range

increasing with time In regions where there is some ambiguity a signal is propagated

This is seen either as a checkerb oard pattern propagated in b oth spatial directions or as

a vertical whitetoblack b oundary These signals indicate that the classication is to b e

made at a larger scale Note that b oth signals lo cally have the result is that the

c

signal patterns can propagate since the density of patterns with is not increased or

c

decreased under the rule In a simple sense this is the CAs strategy for p erforming the

computational task

It has b een claimed that the GKL CA p erforms the task but actually this is

c

true only to an approximation The GKL rule was invented not for the purp ose of p erforming

any particular computational task but rather as part of studies of reliable computation and

phase transitions in one spatial The goal in the former for example was to

nd a CA whose b ehavior is robust to small errors in the rules up date of the conguration

It has b een proved that the GKL rule has only two attracting patterns either all s or

all s Attracting patterns here are those invariant patterns which when p erturb ed

a small amount return to the same pattern It turns out that the basins of attraction

for the all and all patterns are not precisely the initial congurations with or

5

resp ectively On nite lattices the GKL rule do es classify most initial congurations

according to this criterion but on a signicant numb er the incorrect is reached

One set of exp erimental measures of the GKL CAs classication p erformance is displayed

in Figure To make this plot we ran the GKL CA on randomly generated initial

congurations at each of densities The fraction of correct classications

was then plotted at each The rule was run either until a xed p oint was reached or for a

maximum numb er of time steps equal to L This was done for CA with three dierent

lattice sizes L f g

Note that approximately of the initial congurations with were misclassied

c

All the incorrect classications are made for initial congurations with In fact the

c

worst p erformances o ccur at Interestingly although the error region narrows with

c

increasing lattice size the p erformance at decreases when the lattice size is increased

c

from to

The GKL rule table has not Since it app ears to p erform a computational

c

task of some at a minimum it is a deviation from the edge of chaos hyp othesis

for CA computation The GKL rules puts it right at the center of the chaotic

region in Figure This may b e puzzling since clearly the GKL rule do es not pro duce chaotic

b ehavior during either its transient or asymptotic ep o chsfar from it in fact However the

parameter was intended to correlate with average b ehavior of CA rules at a given

value Recall that in Figure represent an average over a large numb er of randomly

chosen CA rules and while not shown in that plot for most values the variance in is

high Thus it can b e claimed that the b ehavior of any particular rule at its value might

b e very dierent from the average b ehavior at that value

More to the p oint though we exp ect a value close to for a rule that p erforms well on

the task This is largely b ecause the task is symmetric with resp ect to the exchange

c

of s and s Supp ose for example a rule that carries out the task has

c

This implies that there are more neighb orho o ds in the rule table that map to output bit

than to output bit This in turn means that there will b e some initial congurations

with on which the action of the rule will decrease the numb er of s And this is the

c

5

The terms attractor and basin of attraction are b eing used here in the sense of and This

diers substantially from the notion used in for example There attractor refers to any invariant or

timep erio dic pattern and basin of attraction means that set of nite lattice congurations relaxing to it 1

0.75

0.5 149 cell lattice 599 cell lattice 0.25 999 cell lattice fraction of correct classifications 0 0 0.25 0.5 0.75 1

ρ

Figure Exp erimental p erformance of the GKL rule as a function

of for the task Performance plots are given for three

c

lattice sizes L the size of the lattice used in the GA runs

and

opp osite of the desired action However if the rule acts to decrease the numb er of s on

an initial conguration with it risks pro ducing an intermediate conguration with

c

which then would lead under the original assumption that the rule carries out the

c

task correctly to a xed p oint of all s misclassifying the initial conguration A similar

argument holds in the other direction if the rules value is greater than This informal

argument shows that a rule with will misclassify certain initial congurations

Generally the further away the rule is from the more such initial congurations

there will b e Such rules may p erform fairly well classifying most initial congurations

correctly However we exp ect any rule that p erforms reasonably well on this taskin the

sense of b eing close to the GKL CAs average p erformance shown in Figure to have a

value close to

This analysis p oints to a problem with using this task as an evolutionary goal in order

to study the relationship among evolution computation and As was shown in Figure

for r K CA the values o ccur at roughly and and one hyp othesis

c

that was to b e tested by the original exp eriment is that the GA will tend to select rules

close to these values But for the classication tasks the range of values required for

c

go o d p erformance is simply a function of the task and sp ecically of For example the

c

underlying exchange symmetry of the task implies that if a CA exists to do

c

the task at an acceptable p erformance level then it has Even though this do es not

directly invalidate the hyp othesis or claims ab out s correlation with average

b ehavior it presents problems with using classication tasks as a way to gain evidence

ab out a generic relation b etween and computational capability

The Original Exp eriment

Packard used a GA to evolve CA rules to p erform the task His GA started out

c

with a randomly generated initial p opulation of CA rules Each rule was represented as a bit

string containing the output bits of the rule table That is the bit at p osition in the string

is the state to which the neighb orho o d is mapp ed the bit at p osition in the string

is the state to which the neighb orho o d is mapp ed and so on The initial p opulation

was randomly generated but it was constrained to b e uniformly distributed across values

b etween and

A given rule in the p opulation was evaluated for ability to p erform the classication task

by cho osing an initial conguration at random running the CA on that initial conguration

for some sp ecied numb er of time steps and at the nal time step measuring the fraction

of cells in the lattice that have the correct state For initial congurations with the

c

correct nal state for each cell is and for initial congurations with the correct

c

nal state for each cell is For example if the CA were run on an initial conguration with

and at the nal time step the lattice contained s the CAs score on that initial

c

6

conguration would b e The tness of a rule was simply the rules average score over a

large numb er of initial congurations For each rule in the p opulation Packard generated a

large set of initial congurations that were uniformly distributed across values from to

Packards GA worked as follows At each generation

The tness of each rule in the p opulation is calculated

The p opulation is ranked by tness

Some fraction of the lowest tness rules are removed

The removed rules are replaced by new rules formed by crossover and mutation from

the remaining rules

Crossover b etween two strings involves randomly selecting a p osition in the strings at random

and exchanging parts of the strings b efore and after that p osition Mutation involves ipping

one or more bits in a string with some low probability

A diversityenforcement scheme was also used to prevent the p opulation from converging

to o early and losing diversity If a rule is formed that is to o close in Hamming distance

ie the numb er of matching bits to existing rules in the p opulation its tness is decreased

The results from Packards exp eriment are displayed in Figure The two histograms

display the observed frequency of rules in the GA p opulation as a function of with rules

merged from a numb er of dierent runs The top graph gives this data for the initial

6

A slight variation on this metho d was used in Instead of measuring the fraction of correct states in

the nal lattice the GA measured the fraction of correct states over congurations from a small numb er n

of nal time steps This prevented the GA from evolving rules that were temp orally p erio dic viz those

with patterns that alternated b etween all s and all s Such rules obtained higher than average tness at

early generations by often landing at the correct phase of the oscillation for a given initial conguration

That is on the next time step the classication would have b een incorrect In our exp eriments we used a

slightly dierent metho d to address this problem This is explained in subsection Figure from The b ottom gure is Figure repro duced here for reference p opulations CA γ

0.0 Prob(rule) Prob(rule) for with p ermission of the author the Results of c CA from at generations classication the original λ and exp erimen task The resp ectiv t top on GA t w ely o ev gures olution v Adapted ersus 1.0 are of

generation As can b e seen the rules are uniformly distributed over values The middle

graph gives the same data for the nal generationin this case after the GA has run for

generations The rules now cluster around the two regions as can b e seen by comparison

c

with the dierencepattern spreading rate plot reprinted here at the b ottom of the gure

Note that each individual run pro duced rules at one or the other p eak in the middle graph

so when the runs were merged together b oth p eaks app ear Packard interpreted these

results as evidence for the hyp othesis that when an ability for complex computation is

required evolution tends to select rules near the transition to chaos He argues like Langton

that this result intuitively makes sense b ecause rules near the transition to chaos have the

capability to selectively communicate information with complex structures in spacetime

thus enabling computation p

New Exp eriments

As the rst step in a study of how well these general conclusions hold up we carried out a set

of exp eriments similar to that just describ ed We were unable to obtain some of the exact

details of the original exp eriments parameters such as the exact p opulation size for the

GA the mutation rate and so on As a result we used what we felt were reasonable values

for these various parameters We carried out a numb er of parameter sensitivity tests which

indicated that varying the parameters within small b ounds did not change our qualitative

results

Details of Our Exp eriments

In our exp eriments as in the original the CA rules in the p opulation all have r and

2r +1

K Thus the bit strings representing the rules are of length and the

128

size of the search space is hugethe numb er of p ossible CA rules is The tests

for each CA rule are carried out on lattices of length L with p erio dic b oundary

conditions The p opulation size is which was roughly the p opulation size used in the

original exp eriment The initial p opulation is generated at random but constrained to

b e uniformly distributed among dierent values A rules tness is estimated by running

the rule on randomly generated initial congurations that are uniformly distributed over

Exactly half the initial congurations have and exactly half have

c

7

c

We allow each rule to run for a maximum numb er M of iterations where a new M

is selected for each rule from a Poisson distribution with mean This is the measured

maximum amount of time for the GKL CA to reach an invariant pattern over a large numb er

7

It was necessary to have this exact symmetry in the initial congurations at each generation to avoid

early biases in the of selected rules If say of the initial congurations have and of initial

c

congurations have high rules would obtain slightly higher tness than low rules since high

c

rules will map most initial congurations to all s A rule with say would in this case classify

of the initial congurations correctly whereas a rule with would classify only correctly But such

slight dierences in tness have a large eect in the initial generation when all rules have tness close to

since the GA selects the best rules even if they are only very slightly b etter than the worst rules

This biases the representative rules in the early p opulation And this bias can p ersist well into the later

generations

8

of initial congurations on lattice size A rules tness is its average scorethe fraction

of cell states correct at the last iterationover the initial congurations We term this

tness function proportional tness to contrast with a second tness functionperformance

tnesswhich will b e describ ed b elow A new set of initial congurations is generated

every generation At each generation all the rules in the p opulation are tested on this set

Notice that this tness function is sto chasticthe tness of a given rule may vary from

generation to generation dep ending on the set of initial congurations used in testing it

Our GA is similar to Packards In our GA the fraction of new strings in the next

generationthe generation gapis That is once the p opulation is ordered according

to tness the top half of the p opulation the set of elite strings is copied without mo di

cation into the next generation For GA practitioners more familiar with nonoverlapping

generations this may sound like a small generation gap However since testing a rule on

training cases do es not necessarily provide a very reliable gauge of what the tness

would b e over a larger set of training cases our selected gap is a go o d way of making a rst

cut and allowing rules that survive to b e tested over more initial congurations Since a

new set of initial congurations is pro duced every generation rules that are copied without

mo dication are always retested on this new set If a rule p erforms well and thus survives

over a large numb er of generations then it is likely to b e a genuinely b etter rule than those

that are not selected since it has b een tested with a large set of initial congurations An

alternative metho d would b e to test every rule in every generation on a much larger set of

initial congurations but given the amount of compute time involved that metho d seems

unnecessarily wasteful Much to o much eort for example would go into testing very weak

rules which can safely b e weeded out early using our metho d

The remaining half of the p opulation for each new generation is created by crossover

9

and mutation from the previous generations p opulation Fifty pairs of parent rules are

chosen at random with replacement from the entire previous p opulation For each pair a

single crossover p oint is selected at random and two ospring are created by exchanging the

subparts of each parent b efore and after the crossover p oint The two ospring then undergo

mutation A mutation consists of ipping a randomly chosen bit in the string The numb er

of mutations for a given string is chosen from a Poisson distribution with a mean of this

is equivalent to a p erbit mutation rate of Again to GA practitioners this may seem

to b e a high mutation rate but one must take into account that at every generation half

the p opulation is b eing copied without mo dication

Results of Prop ortionalFitness Exp eriment

8

It may not b e necessary to allow the maximum numb er of iterations M to vary In some early tests with

smaller sets of xed initial congurations though we found the same problem Packard rep orted that

if M is xed then p erio d rules evolve that alternate b etween all s and all s These rules adapted to

the small set of initial congurations and the xed M by landing at the correct pattern for a given initial

conguration at time step M only to move to the opp osite pattern and so wrong classication at time step

M These rules did very p o orly when tested on a dierent set of initial congurationsevidence for

overtting

9

This metho d of pro ducing the nonelite strings diers from that in where the nonelite strings were

formed from crossover and mutation among the elite strings only rather than from the entire p opulation We

observed no statistically signicant dierences in our tests using the latter mechanism other than a mo dest

dierence in time scale 1

best fitness 0.75 mean fitness

(a) 0.5

fraction of total rules 0.25

0 0 0.25 0.5 0.75 1 λ

1

best fitness 0.75 mean fitness

(b) 0.5

fraction of total rules 0.25

0 0 0.25 0.5 0.75 1

λ

Figure Results from our exp eriment with prop ortional tness The

top histogram a plots as a function of the frequencies of rules

merged from the initial generations of runs The b ottom histogram

b plots the frequencies of rules merged from the nal generations

generation of these runs Following the xaxis is divided

into bins of length each The rules with are included

in the rightmost bin In each histogram the b est cross and mean

circle tnesses are plotted for each bin The y axis interval for

tnesses is also

We p erformed dierent runs of the GA with the parameters describ ed ab ove each with

a dierent randomnumb er seed On each run the GA was iterated for generations We

found that running the GA for longer than this up to generations did not result in

improved tness The results of this set of runs are displayed in Figure Figure a is

a histogram of the frequency of rules in the initial p opulations as a function of merging

together the rules from all initial p opulations thus the total numb er of rules represented

in this histogram is The bins in this histogram are the same ones that were used by

Packard each of width Packards highest bin contained only rules with that

is rules that consist of all s We have merged this bin with the immediately lower bin

As was said earlier the initial p opulation consists of randomly generated rules uniformly

spread over the values b etween and Also plotted are the mean and b est tness

values for each bin These are all around which is exp ected for a set of randomly

generated rules under this tness function The b est tnesses are slightly higher in the very

low and very high bins This is b ecause rules with output bits that are almost all s or

s correctly classify all low density or all high density initial congurations In addition

these CA obtain small partial credit on some high density low density initial congurations

Such rules thus have tness sightly higher than

Figure b shows the histogram for the nal generation merging together rules

from the nal generations of all runs Again the mean and b est tness values for each

bin are plotted

In the nal generation the mean tnesses in each bin are all around The exceptions

are the central bin with a mean tness of and the leftmost bin with a mean tness of

The leftmost bin contains only ve ruleseach at right next to the the bins

upp er limit The standard deviations of tness for each bin not shown in the gure are

all approximately except the leftmost bin which has a standard deviation of The

b est tnesses for each bin are all b etween and except the leftmost bin which has

a b est tness of Under this tness function the GKL rule has tness the GA

never found a rule with tness ab ove

As was mentioned ab ove the tness function is sto chastic a given rule might b e assigned

a dierent tness each time the tness function is evaluated The standard deviation under

the present tness scheme on a given rule is approximately This indicates that the

dierences among the b est tnesses plotted in the histogram are not signicant except for

that in the leftmost bin

The lower mean tness in the central bin is due to the fact that the rules in that bin

largely come from nonelite rules generated by crossover and mutation in the nal generation

This is a combinatorial eect the density of CA rules as a function of is very highly p eaked

ab out as already noted We will return to this combinatorial drift eect shortly

Many of the rules in the middle bin have not yet undergone selection and thus tend to have

lower tnesses than rules that have b een selected in the elite This eect disapp ears in

Figure which includes only the elite rules at generation for the runs As can b e

seen the dierence in mean tness disapp ears and the height of the central bin is decreased

by half

The results presented in Figure b are strikingly dierent from the results of the original

exp eriment In the nal generation histogram in Figure most of the rules clustered around 1

best fitness 0.75 mean fitness

0.5

fraction of total rules 0.25

0 0 0.25 0.5 0.75 1

λ

Figure Histogram including only the elite rules from the nal gen

erations of the runs cf Figure b with the prop ortionaltness

function

either or Here though there are no rules in these regions Rather the

c

rules cluster much closerwith a ratio of variances of b etween the two distributionsto

Recall this clustering is what we exp ect from the basic exchange symmetry of

the task

c

One rough similarity is the presence of two p eaks centered around a dip at a

phenomenon which we will explain shortly and which is a key to understanding how the GA

is working But there are signicant dierences even within this similarity In the original

exp eriments the p eaks are in bins centered ab out and In Figure b

though the p eaks are very close to b eing centered in the neighb oring binsthose

with and Thus the ratio of original to current p eak spread is roughly a

factor of Additionally in the nalgeneration histogram of Figure the two highest bin

p opulations are roughly ve times as high as the central bin whereas in the nalgeneration

histogram of Figure b the two highest bins are roughly three times as high as the central

bin Finally the nalgeneration histogram in Figure shows the presence of rules in every

bin but in the new nalgeneration histogram there are only rules in six of the central bins

Similar to the original exp eriment we found that on any given run the p opulation was

clustered ab out one or the other p eak but not b oth Thus in the histograms that merge

all runs two p eaks app ear This is illustrated in Figure which displays histograms from

the nal generation of two individual runs In one of these runs the p opulation clustered to

the left of the central bin in the other run it clustered to the right of the center The fact

that dierent runs result in dierent clustering lo cations is why we p erformed many runs

and merged the results rather than p erforming a single run with a much larger p opulation

The latter metho d might have yielded only one p eak Said a dierent way indep endent of

the p opulation size a given run will b e driven by and the p opulation organized around the

t individuals that app ear earliest Thus examining an ensemble of individual runs reveals

more details of the evolutionary dynamics

The asymmetry in the heights of the two p eaks in Figure b results from a small

statistical asymmetry in the results of the runs There were out of runs in which

the rules clustered at the lower bin and out of runs in which the rules clustered at

the higher bin This dierence is not signicant but explains the small asymmetry in the

p eaks heights

We extended of the runs to generations and found that not only do the tnesses

not increase further but the basic shap e of the histogram do es not change signicantly

Eects of Drift

The results of our exp eriments suggest that for the task an evolutionary pro cess

c

mo deled by a genetic algorithm tends to select rules with This is what we exp ect

from the theoretical discussion given ab ove concerning this task and its symmetries We

will delay until the next section a discussion of the curious feature near viz the

dip surrounded by two p eaks Instead here we fo cus on the largerscale clustering in that

region

To understand this clustering we need to understand the degree to which the selection of

rules close to is due to an intrinsic selection pressure and the degree to which it is

due to drift By drift we refer to the force that derives from the combinatorial asp ects of

CA space as explored by random selection genetic drift along with the eects of crossover

and mutation The intrinsic eects of random selection with crossover and mutation are to

move the p opulation irresp ective of any selection pressure to This is illustrated by

the histogram mosaic in Figure These histograms show the frequencies of the rules in the

p opulation as a function of every generations from runs on which selection according

to tness was turned o That is on these runs the tness of the rules in the p opulation

was never calculated and at each generation the selection of the elite group of strings was

p erformed at random Everything else ab out the runs remains the same as b efore Since

there is no selection drift is the only force at work here As can b e seen under the eects

of random selection crossover and mutation by generation the p opulation has largely

drifted to the region of and this clustering b ecomes increasingly pronounced as the

run continues

This drift to is related to the combinatorics of the space of bit strings For

N

2

binary binary CA rules with neighb orho o d size N r the space consists of all

N

strings of length Denoting the subspace of CA with a xed and N as CA N we

see that the size of the subspace is binomially distributed with resp ect to

 

N

jCA N j

N

N

The distribution is symmetric in and tightly p eaked ab out with variance

Thus the vast ma jority of rules is found at The steepness of the binomial distribu

tion near its maximum gives an indication of the magnitude of the drift force Note that

the last histogram in Figure gives the GAs rough approximation of this distribution

Drift is thus a p owerful force moving the p opulation to cluster around For

comparison Figure gives the rulefrequencyversus histograms for the runs of our 1

0.75

0.5

fraction of total rules 0.25

0 0 0.25 0.5 0.75 1 λ 1

0.75

0.5

fraction of total rules 0.25

0 0 0.25 0.5 0.75 1

λ

Figure Histograms from the nal generations of two individual runs

of the GA employing prop ortional tness Each run had a p opulation

of rules The nal distribution of rules in each of the runs we

p erformed resembled one or the other of these two histograms 0.6 0 5 10 15 20 25 30

P

0

0.6 35 40 45 50 55 60 65

P

0

0.6 70 75 80 85 90 95 100

P

0

0 λ 1 0 λ 1 0 λ 1 0 λ 1 0 λ 1 0 λ 1 0 λ 1

Figure Noselection mosaic Rulefrequencyversus histograms

given every ve generations for p opulations evolved under the genetic

algorithm with no selection that is the tness function was not cal

culated As b efore each histogram is merged from runs each run

had a p opulation of rules The generation numb er is given in the

upp er left corner of each histogram

0.6 0 5 10 15 20 25 30

P

0

0.6 35 40 45 50 55 60 65

P

0

0.6 70 75 80 85 90 95 100

P

0

0 λ 1 0 λ 1 0 λ 1 0 λ 1 0 λ 1 0 λ 1 0 λ 1

Figure Proportionaltness mosaic Rulefrequencyversus his

tograms given every ve generations merged from the GA runs

with prop ortional tness Each run had a p opulation of rules

prop ortionaltness exp eriment every ve generations The last histogram in this gure is

the same one that was displayed in Figure b Figure gives the merged data from the

entire p opulation of each run every ve generations A similar mosaic plotting only the elite

strings at each generation lo oks qualitatively similar

Figure lo oks very similar to Figure up to generation The main dierence in

generations is that Figure indicates a more rapid p eaking ab out  = The in

creased sp eed of movement to the center over that seen in Figure is presumably due to the

additional evolutionary pressure of prop ortional tness At generation something new

app ears The p eak in the center has b egun to shrink signicantly and the two surrounding

bins are b eginning to rival it in magnitude By generation the rightofcenter bin has

exceeded the central bin and by generation the histogram has develop ed two p eaks sur

rounding a dip in the center The dip b ecomes increasingly pronounced as the run continues

but stabilizes by generation or so

The dierences b etween Figure and Figure over all generations shows that the

p opulations structure in each generation is not entirely due to drift Indeed after generation

the distinctive features of the p opulation indicates new qualitatively dierent and unique

prop erties due to the selection mechanism The two p eaks represent a symmetry breaking

in the evolutionary pro cessthe rules in each individual run initially are clustered around

 = but move to one side or the other of the central bin by around generation The

causes of this symmetry breaking will b e discussed in the next subsection

Evolutionary Mechanisms Symmetry Breaking and the Dip at  =

At this p oint we move away from questions related to the original exp eriment and instead

concentrate on the mechanisms involved in pro ducing our results Two ma jor questions need

to b e answered Why in the nal generation are there signicantly fewer rules in the central

bin than in the two surrounding bins And what causes the symmetry breaking that b egins

near generation seen in Figure

In the briefest terms the answer obtained by detailed analysis of the GA runs is

the following The course of CA evolution under our GA roughly falls into four strategy

ep o chs Each ep o ch is asso ciated with an discovered by the GA for solving the

problem Though the absolute time at which these app ear in each run varies

somewhat each run basically passes through each of these four ep o chs in succession The

ep o chs are shown in Figure which plots the b est tness the mean tness of the elite

strings and the mean tness of the p opulation versus generation for one typical run of the

GA The b eginnings of ep o chs through are p ointed out on the b esttness plot Ep o ch

b egins at generation

Ep o ch Randomly generated rules

The rst ep o ch starts at generation when the b est tness in the initial generation is

approximately and the  values are uniformly distributed b etween and No rule

is much tter than any other rule though as was seen in Figure a rules with very low

and very high  tend to have slightly higher tness The strategy hereif it can b e called

K

o

epoch

epoch

Fitness

epoch

b est tness

elite mean tness

p opulation mean tness

Generation

Figure Best tness elite mean tness and p opulation mean tness

versus generation for one typical run The b eginnings of ep o chs

are p ointed out on the b esttness plot Ep o ch b egins at generation

this at allderives from only the most elementary asp ect of the task Rules either sp ecialize

for  >  congurations by mapping highdensity neighb orho o ds in the CA rule table to

c

or sp ecialize for  <  congurations by mapping lowdensity neighb orho o ds to

c

Ep o ch Discovery of two halves of the rule table

The second ep o ch b egins when a rule is discovered in which most neighb orho o d patterns in

the rule table that have  <  map to and most neighb orho o d patterns in the rule table

c

that have  >  map to This is roughly correlated with the left and right halves of the

c

rule table namely neighb orho o ds to and to resp ectively

Such a strategy is presumably easy for the GA to discover due to singlep oint crossovers ten

dency to preserve contiguous sections of the rule table It diers from the accidental strategy

of ep o ch in that there is now an organization to the rule table output bits are roughly

asso ciated with densities of neighb orho o d patterns It is the rst signicant attempt at dis

tinguishing initial congurations with more s than s and vice versa Under our tness

function the tness of such rules is approximately b etween and which is signicantly

higher than the tness of the initial random rules This innovation typically o ccurs b etween

generations and in the run displayed in Figure it o ccurred in generation and can

b e seen as the steep rise in the b esttness plot at that generation All such rules tend to

have  close to There are many p ossible variations on these rules with similar tness so

such rulesall close to  =b egin to dominate in the p opulation This along with the

natural tendency for the p opulation to drift to  = is the cause of the clustering around

 = seen by generation in Figure For the next several generations the p opulation

tends to explore small variations on this broad strategy This can b e seen in Figure as

the leveling o in the b esttness plot b etween generations and (a) (b) 0 0

t t

148 148

0 i 148 0 i 148

Figure Spacetime diagrams of one ep o ch rule with 

that increases suciently large blo cks of adjacent or nearly adjacent

s Both diagrams have L and are iterated for time steps

the time displayed here is shorter than the actual time allotted under

the GA In a  and  In b  and

Thus in a the classication is incorrect but partial

credit is given in b it is correct

Ep o ch Growing blo cks of s or s

The next ep o ch b egins when the GA discovers one of two new strategies The rst strategy is

to increase the size of a suciently large blo ck of adjacent or nearly adjacent s the second

strategy is to increase the size of a suciently large blo ck of adjacent or nearly adjacent s

Examples of these two strategies are illustrated in Figures and These gures give

spacetime diagrams from two rules that marked the b eginning of this ep o ch in two dierent

runs of the GA Figure illustrates the action of a rule discovered at generation of one

run This rule has   : which means that the rule maps most neighb orho o ds to Its

strategy is to map initial congurations to mostly sthe congurations it pro duces have

 <  unless the initial conguration contains a suciently large blo ck of s in which

c

case it increases the size of that blo ck The left spacetime diagram Figure a shows how

the rule evolves an initial conguration with  <  to a nal lattice with mostly s This

c

pro duces a fairly go o d score The right spacetime diagram Figure b shows how the

rule evolves an initial conguration with  >  The initial conguration contains a few

c

suciently large blo cks of adjacent or nearly adjacent s and the size of these blo cks is

quickly increased to yield a nal lattice with all s for a p erfect score The tness of this

rule at generation was 

Figure illustrates the action of a second rule discovered at generation in another

run This rule has   : which means that the rule maps most neighb orho o ds to Its

strategy is the inverse of the previous rule It maps initial congurations to mostly s unless

the initial conguration contains a suciently large blo ck of s in which case it increases the (a) (b) 0 0

t t

148 148

0 i 148 0 i 148

Figure Spacetime diagrams of one ep o ch rule with 

that increases suciently large blo cks of adjacent or nearly adjacent

s In a the initial conguration with  maps to a correct

classication pattern of all s In b the initial conguration with

 is not correctly classied  but partial credit

is given

size of that blo ck The left spacetime diagram a illustrates this for an initial conguration

with  <  here a suciently large blo ck of s app ears in the initial conguration and is

c

increased in size yielding a p erfect score The right spacetime diagram b shows the action

of the same rule on an initial conguration with  >  Most neighb orho o ds are mapp ed

c

to so the nal conguration contains mostly s yielding a fairly high score The tness

of this rule at generation was 

The general idea b ehind these two strategies is to rely on statistical uctuations in the

initial congurations An initial conguration with  >  is likely to contain a suciently

c

large blo ck of adjacent or nearly adjacent s The rule then increases this regions size to

yield the correct classication Similarly this holds for the CA in Figure with resp ect to

blo cks of s in initial congurations with  <  In short these strategies are assuming

c

that the presence of a suciently large blo ck of s or s is a go o d predictor of 

Similar strategies were discovered in every run They typically emerge by generation

A given strategy either increased blo cks of s or blo cks of s but not b oth These

strategies result in a signicant jump in tness typical tnesses for the rst instances of

such strategies range from to This jump in tness can b e seen in the run of

Figure at approximately generation and is marked as the b eginning of ep o ch This

is the rst ep o ch in which a substantial increase in tness is asso ciated with a symmetry

breaking in the p opulation which will b e explained b elow

The rst instances of ep o ch strategies typically have a numb er of problems As can

b e seen in Figures and the rules often rely on partial credit to achieve fairly high

tness on structurally incorrect classication They typically do not get p erfect scores on (a) (b) 0 0

t t

148 148 0 i 148 0 i 148 (c) 0

t

148

0 i 148

Figure Spacetime diagrams illustrating three typ es of classica

tion errors committed by ep o ch rules a growing a blo ck of s in

a sea of  <  b growing a blo ck of s for an initial conguration

c

with  >  to o slowly the correct xed p oint of all s do es not o ccur

c

until iteration and c generating a blo ck of s from a sea of

 <  and growing it so that  >  the incorrect xed p oint of all

c c

s o ccurs at iteration The initial conguration densities are a

  : b   : and c   : (a) (b) 0 0

t t

148 148

0 i 148 0 i 148

Figure Spacetime diagrams of one ep o ch rule with   :

that increases suciently large blo cks of adjacent or nearly adjacent

s In a   : in b   : Both initial congurations

are correctly classied

many initial congurations The rules also often make mistakes in classication Three

common typ es of classication errors are illustrated in Figure Figure a illustrates a

rule increasing a to osmall blo ck of s and thus misclassifying an initial conguration with

 <  Figure b illustrates a rule that do es not increase blo cks of s fast enough on an

c

initial conguration with  >  leaving many incorrect bits in the nal lattice Figure c

c

illustrates the creation of a blo ck of s that did not app ear in an initial conguration with

 <  ultimately leading to a misclassication The rules that pro duced these diagrams

c

come from ep o ch in various GA runs

The increase in tness seen in Figure b etween generation and or so is due to

further renements of the basic strategies that correct these problems to some extent

Ep o ch Reaching and staying at a maximal tness

In most runs the b est tness is typically at its maximum value of to by generation

or so In Figure this o ccurs at approximately generation and is marked as the

b eginning of ep o ch The b est tness do es not increase signicantly after this the GA

simply nds a numb er of variations of the b est strategies that all have roughly the same

tness When we extended of the runs to generations we did not see any signicant

increase in the b est tness

The actions of the b est rules from generation of two separate runs are shown in Fig

ures and The leftmost spacetime diagrams in each gure are for initial congurations

with  <  and the rightmost diagrams are for initial congurations with  >  The

c c

rule illustrated in Figure has  : its strategy is to map initial congurations to s

unless there is a suciently large blo ck of adjacent or nearly adjacent s which if present (a) (b) 0 0

t t

148 148

0 i 148 0 i 148

Figure Spacetime diagrams of one ep o ch rule with   :

that increases suciently large blo cks of adjacent or nearly adjacent

s In a   : in b   : Both initial congurations

are correctly classied

is increased The rule shown in Figure has  : and has the opp osite strategy Each

of these rules has tness  They are b etter tuned versions of the rules in Figures

and

Symmetry breaking in ep o ch

Notice that the  values of the rules that have b een describ ed are in the bins centered around

and rather than In fact it seems to b e much easier for the GA to discover

versions of the successful strategies close to  : and  : than to discover them

close to  = though some instances of the latter rules were found Why is this One

reason is that rules with high or low  work well by specializing The rules with low 

map most neighb orho o ds to s and then increase suciently large blo cks of s when they

app ear Rules with high  sp ecialize in the opp osite direction A rule at  = cannot

easily sp ecialize in this way Another reason is that a successful rule that grows suciently

large blo cks of say s must avoid creating a suciently large blo ck of s from an initial

conguration with less than half s Doing so will lead it to increase the blo ck of s and

pro duce an incorrect answer as was seen in Figure An easy way for a rule to avoid

creating a suciently large blo ck of s is to have a low  This ensures that lowdensity

initial congurations will quickly map to all s as was seen in Figure Likewise if a rule

increases suciently large blo cks of s it is safer for the rule to have a high  value so it will

avoid creating suciently large blo cks of s where none existed A rule close to  = will

not have this safety margin and may b e more likely to inadvertently create a blo ck of s or

s that will lead it to a wrong answer A nal element that contributes to the diculty of

nding go o d rules with  = is the combinatorially large numb er of rules there In eect

the search space is much larger which makes the global search more dicult Lo cally ab out

a given adequate rule at  = there are many more rules close in Hamming distance and

thus reachable via mutation that are not markedly b etter

Once the more successful versions of the ep o ch strategies are discovered in ep o ch

their variants spread in the p opulation and the most successful rules have  on the low or

high side of  = This explains the shift from the clustering around  = as seen

in generations in Figure to a twop eaked distribution that b ecomes clear around

generation The rules in each run cluster around one or the other p eak sp ecializing

in one or the other way We b elieve this typ e of symmetry breaking is a key mechanism

that determines much of the p opulation dynamics and the GAs successor lack thereofin

optimization

How do es this analysis of the symmetry breaking jib e with the argument given earlier

that the b est rules for the  = task must b e close to  = None of the rules found

c

by the GA had a tness as high as the tness of the GKL rule whose  is exactly

That is the evolved rules make signicantly more classication errors than the GKL rule

To obtain the tness of the GKL rule a numb er of careful balances in the rule table must b e

achieved This is evidently very hard for the GA to do esp ecially in light of the symmetries

in the task and their sub optimal breaking by the GA

Performance of the Evolved Rules

Recall that the prop ortional tness of a rule is the fraction of correct cell states at the nal

time step averaged over initial congurations This tness gives a rule partial credit

for getting some nal cell states correct However the actual task is to relax to either

all s or all s dep ending on the initial conguration In order to measure how well the

evolved rules actually p erform the task we dene the performance of a rule to b e the fraction

of times the rule correctly classies initial congurations averaged over a large numb er of

initial congurations Here credit is given only if the initial conguration relaxes to exactly

the correct xed p oint after some numb er of time steps We measured the p erformance of

each of the elite rules in the nal generations of the runs by testing it on randomly

generated initial congurations that were uniformly distributed in the interval   

letting the rule iterate on each initial condition for time steps Figure displays the

mean p erformance diamonds and b est p erformance squares in each  bin This gure

shows that while the mean p erformances in each bin are much lower than the mean tnesses

for the elite rules shown in Figure the b est p erformance in each bin is roughly the same as

the b est tness in that bin In some cases the b est p erformance in a bin is slightly higher

than the b est tness shown in Figure This is b ecause dierent sets of initial conditions

were used to calculate tness and p erformance This dierence can pro duce small variations

in the tness or p erformance values The b est p erformance we measured was  Under

this measure the p erformance of the GKL rule is  Thus the GA never discovered

a rule that p erformed as well as the GKL rule even up to generations In addition

when we measure the p erformance of the ttest evolved rules on larger lattice sizes their

p erformances decrease signicantly while that of the GKL rule remains roughly the same 1

best performance 0.75 mean performance

0.5

fraction of total rules 0.25

0 0 0.25 0.5 0.75 1

λ

Figure Performances of the rules evolved with the prop ortional

tness function Only the elite rules from generation are included

in this histogram The mean p erformance in each bin op en dia

monds and the b est p erformance in each bin black squares is plot

ted

Using Performance as the Fitness Criterion

Can the GA evolve b etterp erforming rules on this task To test this we carried out an

additional exp eriment in which p erformance as dened in the previous section is the tness

criterion As b efore at each generation each rule is tested on initial congurations that

are uniformly distributed over density values However in this exp eriment a rules tness

is the fraction of initial congurations that are correctly classied An initial conguration

is considered to b e incorrectly classied if any bits in the nal lattice are incorrect Aside

from this mo died tness function everything ab out the GA remained the same as in the

prop ortionaltness exp eriments We p erformed runs of the GA for generations each

The results are given in Figure which gives a histogram plotting the frequencies of the

elite rules from generation of all runs as a function of  As can b e seen the shap e

of the histogram again has two p eaks centered around a dip at  = This shap e results

from the same symmetrybreaking eect that o ccurred in the prop ortionaltness case these

runs also evolved essentially the same strategies as the ep o ch strategies describ ed earlier

The b est and mean p erformances here are comparable to the b est p erformances in the

prop ortionaltness case the b est p erformances found here are  The p erformance as

a function of  for one of the b est rules is plotted in Figure for lattice sizes of

the lattice size used for testing the rules in the GA runs and We used the same

pro cedure to make these plots as was describ ed earlier for Figure As can b e seen the

p erformance according to this measure is signicantly worse than that of the GKL rule cf

Figure esp ecially on larger lattice sizes The worst p erformances for all lattice sizes are

centered close to the rules  value of  1

best performance 0.75 mean performance

0.5

fraction of total rules 0.25

0 0 0.25 0.5 0.75 1

λ

Figure Results from our exp eriment with p erformance as the

tness criterion The histogram plots the frequencies of elite rules

merged from the nal generations generation of runs in which

the p erformancetness function was used

1

0.75

0.5 149 cell lattice 599 cell lattice 0.25 999 cell lattice fraction of correct classifications 0 0 0.25 0.5 0.75 1

ρ

Figure Performance of one of the b est rules evolved using p erfor

mance tness plotted as a function of  Performance plots are

given for three lattice sizes the size of the lattice used in the

GA runs and This rule has   : 1

best fitness 0.75 mean fitness

0.5

fraction of total rules 0.25

0 0 0.25 0.5 0.75 1

λ

Figure Results from our exp eriment in which a diversity

enforcement mechanism was added to the GA The histogram plots

the frequencies of rules merged from the entire p opulation at gener

ation of runs in which our diversityenforcement scheme was

used

Adding A DiversityEnforcement Mechanism

The description given ab ove of the four ep o chs in the GAs search explains the results of

our exp eriment but it do es not explain the dierence b etween our results and those of the

original exp eriment rep orted in One dierence b etween our GA and the original was

the inclusion in the original of a diversityenforcement scheme that p enalized newly formed

rules that were to o similar in Hamming distance to existing rules in the p opulation To

test the eect of this scheme on our results in one set of exp eriments we included a similar

scheme In our scheme every time a new string is created through crossover and mutation

the average Hamming distance b etween the new string and the elite stringsthe strings

that are copied unchangedis measured If this average distance is less than of the

string length here bits then the new string is not allowed in the new p opulation New

strings continue to b e created through crossover and mutation until new strings have met

this diversity criterion We note that many other diversityenforcement schemes have b een

develop ed in the GA literature eg crowding

The results of this exp eriment are given in Figure The histogram in that gure

represents the merged rules from the entire p opulation at generation of runs of the

GA using the prop ortionaltness function and our diversityenforcement scheme As can

b e seen the histogram in this gure is very similar to that in Figure b The only ma jor

dierence is the signicantly lower mean tness in the middle and leftmost bins which results

from the increased requirement for diversity in the nal nonelite p opulation We conclude

that the use of this diversityenforcement scheme was not resp onsible for the dierence

b etween the results from and our results

Dierences Between Our Results and the Original Exp eriment

As was seen in Figure b our results are strikingly dierent from those rep orted in

These exp erimental results along with the theoretical argument that the most successful

rules for this task should have close to lead us to conclude that the interpretation of

the original results as giving evidence for the hyp otheses concerning evolution computation

and is not correct However we do not know what accounted for the dierences b etween

our results and those obtained in the original exp eriment We sp eculate that the dierences

are due to additional mechanisms in the GA used in the original exp eriment that were not

rep orted in

Although the results were very dierent there is one qualitative similarity the rule

frequencyversus histograms in b oth cases contained two p eaks separated by a dip in the

center As already noted in our histogram the two p eaks were closer to by a

factor of but it is p ossible that the original results were due to a mechanism similar to

i the ep o ch sensitivity to initial conguration and p opulation asymmetry ab out

or ii the symmetry breaking we observed in ep o ch as describ ed ab ove Perhaps these

were combined with some additional force in the original GA that kept rules far away from

Unfortunately the b est and mean tnesses for the bins were not rep orted for

the original exp eriment As a consequence we do not know whether or not the p eaks in

the original histogram contained hightness rules or even if they contained rules that were

more t than rules in other bins Our results and the basic symmetry in the problem suggest

otherwise

General Discussion

What We Have Shown

The results rep orted in this pap er have demonstrated that the results from Packards original

exp eriment do not hold up under our exp eriments We conclude that the original exp eriment

do es not give rm evidence for the hyp otheses it was meant to test rst that rules capable of

p erforming complex computation are most likely to b e found close to values and second

c

that when CA rules are evolved by a GA to p erform a nontrivial computation evolution will

tend to select rules close to values

c

As we argued theoretically and as our exp erimental results suggest the most successful

rules for p erforming a given classication task will b e close to a particular value of that

dep ends on the particular of the task Thus for this class of computational tasks the

c

values asso ciated with an edge of chaos are not correlated with the ability of rules to

c

p erform the task

In the remainder of this section we step back from these particular exp eriments and

discuss in more general terms the ideas that motivated these studies

Dynamical Behavior and Computation

As was noted earlier Langton presented evidence that given certain caveats regarding the

radius r and numb er of states K there is some correlation b etween and the b ehavior

of an average CA on an average initial conguration Behavior was characterized

in terms of quantities such as singlesite entropy twosite mutual information dierence

pattern spreading rate and average transient length The correlation is quite go o d for very

low and very high values which predict xedp oint or shortp erio d b ehavior However

for intermediate values there is a large degree of variation in b ehavior Moreover there

is no precise correlation b etween these values and the lo cation of a b ehavioral phase

transition other than that describ ed by Wo otters and Langton in the limit of innite K

The remarks ab ove and all the exp erimental results in are concerned with the re

lationship b etween and the dynamical b ehavior of CA They do not directly address the

relationship b etween and computational capability of CA The basic hyp othesis was that

correlates with computational capability in that rules capable of complex and in particular

universal computation must b e or at least are most likely to b e found near values As

c

10

far as CA are concerned the hyp othesis was based on the intuition that complex compu

tation cannot b e supp orted in the shortp erio d or chaotic regimes b ecause the phenomena

that apparently o ccur only in the complex nonp erio dic nonchaotic regimes such as

long transients and long spacetime correlation are necessary to supp ort complex computa

tion There has thus far b een no exp erimental evidence correlating with an indep endent

measure of computation Packards exp eriments were intended to address this issue since

they involved an indep endent measure of computationp erformance on a particular complex

computational taskbut as we have shown these exp eriments do not provide evidence for

the hyp othesis linking values with computational ability In order to test this hyp othesis

c

more general measures of computation need to b e used

The argument that complex computation cannot o ccur in the shortp erio d or chaotic

regimes may seem intuitively correct but there is actually a theoretical framework and

strong exp erimental evidence to the contrary Hanson and Crutcheld have devel

op ed a metho d for ltering out chaotic domains in the spacetime diagram of a CA some

times revealing particles that have the nonp erio dic nonchaotic prop erties of structures

in Wolframs Class CA That is with the appropriate lter applied complex structures

can b e uncovered in a spacetime diagram that to the human eye and to the statistics used

in app ears to b e completely random As an extreme example it is conceivable that

such lters could b e applied to a seemingly chaotic CA and reveal that the CA is actually

implementing a universal computer with guns implementing AND OR and NOT

gates etc Hanson and Crutchelds results strikingly illustrate that apparent complexity

of b ehaviorand apparent computational capabilitycan dep end on the implicit lter

imp osed by ones chosen statistics

10

In the context of continuousstate dynamical systems it has b een shown that there is a direct relationship

b etween intrinsic computational capability of a pro cess and the degree of of that pro cess at the

phase transition from order to chaos Computational capability was quantied with the statistical complexity

a measure of the amount of memory of a pro cess and via the detection of an emb edded computational

mechanism equivalent to a stack automaton

What Kind of Computation in CA Do We Care Ab out

In the section ab ove the phrases complex computation and computational capability

were used somewhat lo osely As was discussed in Section there are at least three dierent

interpretations of the notion of computation in CA The notion of a CA b eing able to

p erform a complex computation such as the task where the CA p erforms the

c

same computation on all initial congurations is very dierent from the notion of a CA

b eing capable under some sp ecial set of initial congurations of simulating a universal

computer Langtons sp eculations regarding the relationship b etween dynamical b ehavior

and computational capability seemed to b e more concerned with the latter than the former

though the implication is that the capability to sustain long transients long correlation

lengths and so on are necessary for b oth notions of computation

If computationally capable is taken to mean capable under some initial congura

tions of universal computation then one might ask why this is a particularly imp ortant

prop erty of CA on which to fo cus In CA were used merely as a vehicle to study

the relationship b etween phase transitions and computation with an emphasis on universal

computation But for those who want to use CA as scientic mo dels or as practical compu

tational to ols a fo cus on the capacity for universal computation may b e misguided If a CA

is b eing used as a mo del of a natural pro cess eg turbulence then it is currently of limited

interest to know whether or not the pro cess is in principle capable of universal computation

if universal computation will arise only under some sp ecially engineered initial conguration

that the natural pro cess is extremely unlikely to ever encounter Instead if one wants to

understand emergent computation in natural phenomena as mo deled by CA then one should

try to understand what computation the CA intrinsically do es rather than what it

is in principle capable of doing only under some very sp ecial initial congurations Thus

understanding the conditions under which a capacity for universal computation is p ossible

will not b e of much value in understanding the natural systems mo deled by CA

This general p oint is neither new nor deep Analogous arguments have b een put forth

in the context of neural networks for example While many constructions have b een made

of universal computation in neural networks eg some psychologists eg have

argued that this has little to do with understanding how brains or minds work in the natural

world

Similarly if one wants to use a CA as a parallel computer for solving a real problem

such as face recognitionit would b e very inecient if not practically imp ossible to solve

the problem by say programming Conways Game of Life CA to b e a universal computer

that simulates the action of the desired face recognizer Thus understanding the conditions

under which universal computation is p ossible in CA is not of much practical value either

In addition it is not clear that anything like a drive toward universalcomputational

capabilities is an imp ortant force in the evolution of biological organisms It seems likely

that substantially less computationallycapable prop erties play a more frequent and robust

role Thus asking under what the conditions evolution will create entities including CA

capable of universal computation may not b e of great imp ortance in understanding natural

evolutionary mechanisms

In short it is mathematically imp ortant to know that some CA are in principle capable

of universal computation But we argue that this is by no means the most scientically

interesting prop erty of CA More to the p oint this prop erty do es not help scientists much

in understanding the emergence of complexity in nature or in harnessing the computational

capabilities of CA to solve real problems

Conclusion

The main purp ose of this study was to examine and clarify the evidence for various hyp othe

ses related to evolution dynamics computation and cellular automata We hop e this study

has shed some new and constructive light on these issues As a result of our study we have

identied a numb er of evolutionary mechanisms such as the role of combinatorial drift and

the role of symmetry and the imp ediments to emerging computational strategies caused by

symmetry breaking For example we have found that the breaking of the goal tasks sym

metries in the early generations can b e an imp ediment to further optimization of individuals

in the p opulation The symmetry breaking results is a kind of sub optimal sp eciation in the

p opulation that is stable or at least metastable over long times The symmetrybreaking

eects we describ ed here may b e similar to symmetrybreaking phenomena such as bilateral

symmetry and handedness that emerge in biological evolution It is our goal to develop a

more rigorous framework for understanding these mechanisms in the context of evolving CA

We b elieve that a deep understanding of these mechanisms in this relatively simple context

can yield insights for understanding evolutionary pro cesses in general and for successfully

applying evolutionarycomputation metho ds to complex problems

Though our exp eriments did not repro duce the results rep orted in we b elieve that

the original conception of using GAs to evolve computation in CA is an imp ortant idea

Aside from its p otential for studying various theoretical issues it also has a p otential prac

tical side that could b e signicant As was mentioned earlier CA are increasingly b eing

studied as a class of ecient parallel the main b ottleneck in applying CA more

widely to parallel computation is programmingin general it is very dicult to program CA

to p erform complex tasks Our results suggest that the GA has promise as a metho d for ac

complishing such programming automatically In order to further test the GAs eectiveness

as compared with other search metho ds we p erformed an additional exp eriment comparing

the p erformance of our GA on the task with the p erformance of a simple steep est

c

ascent hillclimbing metho d We found that the GA signicantly outp erformed hillclimbing

reaching much higher tnesses for an equivalent numb er of tness evaluations This gives

some evidence for the relative eectiveness of GAs as compared with simple gradient ascent

metho ds for programming CA Koza has also evolved CA rules using a very dierent

typ e of representation scheme it is a topic of substantial practical interest to study the

relationship of representation and GA success on such tasks

Acknowlegments

This research was supp orted by the Santa Fe Institute under the Adaptive Computation and

External Faculty Programs and by the University of California Berkeley under contract

AFOSR Thanks to Doyne Farmer Jim Hanson Erica Jen Chris Langton Wentian

Li Cris Mo ore and for many helpful discussions and suggestions concerning

this pro ject Thanks also to Emily Dickinson and Terry Jones for technical advice

References

E Berlekamp J H Conway and R Guy Winning ways for your mathematical plays Aca

demic Press New York NY

J P Crutcheld and J E Hanson Turbulent pattern bases for cellular automata Technical

Rep ort SFI Santa Fe Institute

J P Crutcheld and K Young Inferring statistical complexity Physical Review Letters

J P Crutcheld and K Young Computation at the onset of chaos In W H Zurek edi

tor Complexity Entropy and the of Information pages Addison Wesley

Redwo o d City CA

P Gonzaga de Sa and C Maes The GacsKurdyumovLevin automaton revisited Journal of

Statistical Physics

D Farmer T Tooli and S Wolfram editors Cel lular Automata Proceedings of an Inter

disciplinary Workshop North Holland Amsterdam

P Gacs G L Kurdyumov and L A Levin Onedimensional uniform arrays that wash out

nite islands Probl Peredachi Inform

D E Goldb erg Genetic Algorithms in Search Optimization and Machine Learning Addison

Wesley Reading MA

J Guckenheimer and P Holmes Nonlinear Oscil lations Dynamical Systems and Bifurcations

of Vector Fields SpringerVerlag New York

H A Gutowitz editor Cel lular Automata MIT Press Cambridge MA

J E Hanson and J P Crutcheld The attractorbasin p ortrait of a

Journal of

J H Holland Adaptation in Natural and Articial Systems The University of Michigan Press

Ann Arb or MI

S A Kauman Requirements for evolvability in complex systems Orderly dynamics and

frozen comp onents Physica D

S A Kauman and S Johnson Coevolution to the edge of chaos Coupled tness landscap es

p oised states and coevolutionary avalanches In C G Langton G Taylor J Doyne Farmer

and S Rasmussen editors Articial Life II pages Redwo o d City CA Addison

Wesley

J R Koza Genetic programming On the programming of computers by means of natural

selection MIT Press Cambridge MA

C G Langton Computation at the edge of chaos Phase transitions and emergent computa

tion Physica D

W Li Nonlo cal cellular automata In L Nadel and D Stein editors Lectures in

Complex Systems pages Addison Wesley Redwo o d City CA

W Li and N H Packard The structure of the elementary cellular automata rule space

Complex Systems

W Li N H Packard and C G Langton Transition phenomena in cellular automata rule

space Physica D

K Lindgren and M G Nordahl Universal computation in a simple onedimensional cellular

automaton Complex Systems

N Packard Complexity of growing patterns in cellular automata In J Demongeot E Goles

and M Tchuente editors Dynamical Behavior of Automata Theory and Applications Aca

demic Press New York

N H Packard Personal communication

N H Packard Adaptation toward the edge of chaos In J A S Kelso A J Mandell and

M F Shlesinger editors Dynamic Patterns in Complex Systems pages Singap ore

World Scientic

J B Pollack The induction of dynamical recognizers Machine Learning

K Preston and M Du Modern Cel lular Automata Plenum New York

A Rosenfeld Parallel image pro cessing using cellular arrays Computer

D E Rumelhart and J L McClelland PDP mo dels and general issues in cognitive

In D E Rumelhart and J L McClelland editors Paral lel Distributed Processing volume

Cambridge MA MIT Press

H Siegelman and E D Sontag Neural networks are universal computing devices Technical

Rep ort SYCON Rutgers Center for Systems and Control Rutgers University New

Brunswick NJ

T Tooli and N Margolus Cel lular Automata Machines A new environment for modeling

MIT Press Cambridge MA

S Wolfram Universality and complexity in cellular automata Physica D

S Wolfram editor Theory and applications of cel lular automata World Scientic Singap ore

W K Wo otters and C G Langton Is there a sharp phase transition for deterministic cellular

automata Physica D

A Wuensche and M Lesser The global dynamics of cel lular automata Santa Fe Institute

Studies in the of Complexity Addison Wesley Reading MA