<<

Error Thresholds and their Relation to

Optimal Rates

Gabriela Ochoa Inman Harvey and Hilary Buxton

Centre for the Study of

Centre for Computational Neuroscience and Rob otics

COGS The University of Sussex

Falmer Brighton BN QH UK

fgabroinmanhhilarybgcogssusxacuk

Abstract The error threshold a notion from molecular evolution

is the critical mutation rate b eyond which structures obtained by the

evolutionary pro cess are destroyed more frequently than selection can

repro duce them We argue that this notion is closely related to the more

familiar notion of optimal mutation rates in Evolutionary Algorithms

EAs This corresp ondence has b een intuitively p erceived b efore

However no previous study to our knowledge has b een aimed at

explicitly testing the hyp othesis of such a relationshi p Here wepropose

a metho dology for doing so Results on a restricted range of tness land

scap es suggest that these two notions are indeed correlated There is not

however a critically precise optimal mutation rate but rather a range

of values pro ducing similar nearoptimal p erformance When recombi

nation is used b oth error thresholds and optimal mutation ranges are

lower than in the asexual case This knowledge mayhave b oth theoret

ical relevance in understanding EA b ehavior and practical implicati ons

for setting optimal values of evolutionary parameters

Intro duction

The error threshold a notion from molecular evolution is the critical mu

tation rate b eyond which structures obtained by the evolutionary pro cess are

destroyed more frequently than selection can repro duce them With mutation

rates ab ove this critical value an optimal solution would not b e stable in the

p opulation ie the probability that the p opulation loses these structures is no

longer negligible On the other hand an optimum mutation rate a more fa

miliar notion within the EAs community isthemutation value whichsolves

a sp ecied search or optimization problem with optimal eciency that is with

the least numb er of generations or function evaluations

The notion of error threshold seems to b e intuitively related to the idea of an

optimal balance b etween exploitation and exploration in genetic search In this

sense we argue that optimal mutation rates are related to error thresholds The

aim of this pap er is to test this hyp othesis using an empirical approach together

with knowledge from molecular evolution theory

Optimal parameter settings have b een the sub ject of numerous studies within

the EA community and particular emphasis has b een placed on

nding optimal mutation rates There is however no conclusive agreement

on what is b est most p eople use what has worked well in previously rep orted

cases It is very dicult to formulate a priori general principles ab out parameter

settings in view of the variety of problem typ es enco dings and p erformance cri

teria p ossible in dierent applications Our hyp othesis that optimal mutation

rates are correlated to the notion of error thresholds promises practical rele

vance and useful guidelines in nding optimal parameter settings thus enhancing

evolutionary search

In the remainder of the pap er we summarize the knowledge from molecular

evolution relevant to our argument the notions of quasispecies and error thresh

olds we discuss the relation b etween error thresholds and optimal mutation

rates and we describ e the tness landscap e used for our exp eriments the Royal

Staircase functions Thereafter we describ e the empirical metho dology used to

test the hyp othesis under studywe present the exp erimental results obtained

and we discuss the insight gained

Quasisp ecies And Error Thresholds

The concept of a quasisp ecies was develop ed in the context of p olynucleotide

replication and in particular studies of early RNA evolution A pro

tein space or more generally a sequence space can b e mo delled as the space

of all p ossible sequences of length drawn from a nite alphab et of size A

Each sequence has a tness value which sp ecies its replication rate or exp ected

numb er of ospring p er unit time The tnesses of all A p ossible sequences

dene a tness landscap e When A a binary alphab et the tness land

scap e is equivalent to sp ecifying tness valuesateachvertex of a dimensional

hyp ercub e with some mathematical imagination and some caution this

can b e pictured as spread out over a geographical landscap e where tness is

analogous to height and the dynamics of evolution of a p opulation corresp onds

to movement of the p opulation over such a landscap e

Given an innite p opulation and a sp ecied mutation rate governing errors

in asexual replication one can determine the stationary sequence distribution

reached after any transients from some original distribution have died away

Unless the mutation rate is to o large or dierences in tnesses to o small

the p opulation will typically cluster around the ttest sequences forming a

concentrated cloud the average Hamming distance b etween twomemb ers of

such a distribution drawn at random will b e relatively small Such a clustered

distribution is called a quasisp ecies As the mutation rate is increased the

lo cal distribution widens and ultimately loses its hold on the lo cal optimum

This can b e seen at its clearest in an extreme form of a tness landscap e

whichcontains a single p eak of tness all other sequences having a tness

of With an innite p opulation there is a phase transition at a particular error

rate p the mutation rate at eachofthe lo ci in a sequence In this critical

error rate the error threshold is determined analytically Equation and it

is dened as the rate ab ovewhich the prop ortion of the innite p opulation on

the p eak drops to chance levels

ln

p

In equation represents the selectiveadvantage of the master sequence

over the rest of the p opulation and the chromosome length In the simplest

case is the ratio of the master sequence repro duction rate tness to the

average repro duction rate of the rest

Error Thresholds In Finite Populations

In the calculations of an error threshold for innite asexually replicating

p opulations are extended to nite p opulations we shall call the critical rate p

M

for a p opulation of size M Finite p opulations lose grip on the solitary spike

of sup erior tness easily b ecause of the added hazard of natural uctuations

in this case In we derived a reformulation of the Nowak and Schuster

analytical expression This new expression equation explicitly approximates

the extent of the reduction in the error threshold as wemove from innite to

nite p opulations The expression strictly should b e an innite series in which

successive terms get smaller here we are ignoring all after the rst few

p p

ln ln

p p

p

M



M M

Error Thresholds and Optimal Mutation Rates

The notion of error threshold seems to b e intuitively related to the idea of an

optimal balance b etween exploitation and exploration in genetic search Too low

amutation rate implies to o little exploration in the limit of zero mutation suc

cessive generations of selection removeallvariety from the p opulation and once

the p opulation has converged to a single p ointingenotyp e space all further ex

ploration ceases On the other hand clearlymutation rates can b e to o excessive

in the limit where mutation places a randomly chosen allele at every lo cus on an

ospring genotyp e then the evolutionary pro cess has degenerated into random

search with no exploitation of the acquired in preceding generations

Any optimal mutation rate must lie b etween these two extremes but its

precise p osition will dep end on several factors including in particular structure

of the tness landscap e It can however b e hyp othesized that where evolution

pro ceeds through a successiveaccumulation of information then a mutation rate

close to the error threshold is an optimal mutation rate for the landscap e under

study since this should maximise the search done through mutation sub ject to

the constraint of not losing information already gained The main purp ose of

our pap er is to empirically test this hyp othesis section

Some biological evidence supp orts the relationship b etween error thresholds

and optimal mutation rates Eigen and Schuster havepointed out that viruses

which are very eciently evolving entities live within and close to the error

thresholds given by the known rates of nucleotide This corresp on

dence has also b een noticed b efore in the GA community Hesser and Manner

devised a heuristic formula for optimal setting of mutation rates inspired by

previous work on error thresholds Kauman p also suggest a

relationship b etween these two notions

Royal Staircase Functions

van Nimwegen and Crutcheld prop osed the Royal Staircase functions for

analyzing ep o chal evolutionary search This class of functions are related to the

previous Royal Road functions In the authors justify their particular

choice of tness function b oth in terms of biological motivations and in terms of

articial evolution issues In short many biological systems and articial evolu

tion problems have highly degenerate genotyp etophenotyp e maps that is the

mapping from genetic sp ecication to tness is a manytoone function Conse

quently the numb er of dierent tness values that genotyp es can takeismuch

smaller than the numb er of dierentgenotyp es Moreover due to its high dimen

sionality it is p ossible for the genotyp e to break into networks of connected

sets of equaltness genotyp e that can reacheach other via elementary genetic

variation steps suchaspointmutation These connected subsets of isotness

genotyp es are referred to as neutral networks

Our pap er is guided by the working hyp othesis that many real searchprob

lems have genotyp e search spaces that decomp ose into a number of such neutral

networks Such neutrality has b een observed in problem domains as diverse as

molecular folding evolvable hardware and evolutionary rob otics One

symptom of evolutionary search where neutral networks are imp ortant is that

of long p erio ds of sometimes noisy tness stasis search along a neutral

network punctuated by o ccasional tness leaps transitions to a higher neu

tral network The Royal Staircase class of tness functions capture the essential

elements discussed ab ove and are suitable for evaluating our hyp othesis They

are dened as follows

Genotyp es are sp ecied by binary strings s s s s s f gof

  L i

length L NK

Starting from the rst p osition the number I s of consecutive s in a string

is counted

The tness f s of string s with I s consecutive ones followed by a zero

is f sbI sK c The tness is thus an integer b etween and N

corresp onding to plus the numb er of consecutive fullyset blo cks starting

from the left

L

The single global optimum is s namely the string of all s

Fixing N number of blocks and K bits p er blo ck determines a particular problem or tness landscap e

Exp erimental Design

The approachtaken here is to indep endently assess error thresholds and optimal

mutation rates comparing then these two measures

For the exp eriments weusedRoyal Staircase functions section The ratio

nale for this choice is twofold First b ecause we agree with their prop osers

that these functions despite their simplicityhave some ingredients encountered

in evolutionary search problems Secondly b ecause Staircase functions havea

step feature similar to that of single p eak landscap es Theoretical results on er

ror thresholds are available for single p eak landscap es Error thresholds can b e

extended to other landscap es however a degree of ruggedness is needed

We used a generational GA with tness prop ortional selection and without

elitism Fitness functions were Staircase functions Sp ecicallywe tested dif

ferent functions choices of N and K N K and N K

Population size was genetic op erators were standard bit mutation and two

point crossover with a rate of Several mutation rates were tested from

to exp ected mutations p er bit The algorithm was run in two mo des Asexual

using mutation only and Sexual using b oth mutation and recombination Each

run lasted a maximum of generations

Empirically Determined Optimal Mutation Rates

For the purp ose of this pap er we dened the optimal mutation rate as that

which nds the p eak on average with the least numb er of generations

For determining optimal mutation rates as dened ab ove we ran the GA

starting from a random p opulation and stored the generation numb er at which

the p eak was attained for the rst time This measure was averaged over

trials for eachmutation rate tested

Empirically Determined Error Thresholds

The error threshold is the critical mutation rate b eyond which structures ob

tained by the evolutionary pro cess are destroyed more frequently than selection

can repro duce them Aiming at capturing this denition in an algorithm we

designed the following metho d for empirically estimating error thresholds

For the selected range of mutation rates

Start from a p opulation of all s that is all memb ers on the p eak

Run the GA for a maximum of generations or until the whole p opula

tion has completely lost the p eak

Counthowmany times out of trials the p opulation completely loses

the p eak

For lowmutation rates at least one memb er of the p opulation is on the

p eak during all the generations for all the trials As the mutation rate is

increased there is a p oint where the p opulation completely loses the p eak for

some or all the trials The error threshold is identied as the mutation rate

where this transition o ccurs The observations are approximate in that rstly

precision is limited to the mutation step size used and secondly the limit of

generations was assumed to b e suciently long for the purp ose

Validating the Empirical Metho d For validating the empirical metho d de

scrib ed ab ove we designed the following exp eriment we considered a single p eak

landscap e with and for distinct string lengths and

we calculated the error threshold analytically using equation valid for nite

asexual p opulations The p opulation size M was Results of these calcula

tions are shown in the second column of table Next for the the same landscap e

and settings we estimated the error thresholds empirically following our metho d

Results of these estimations are shown in the third column of the table There is

reasonable agreementbetween analytical and empirical gures though worst at

short string lengths The empirical gures were always higher than the analytical

ones which is related to the limited numb er of generations used The higher the

numb er of generations the lower one should exp ect the empirical gure to b e

the analytical gure assumes an innite numb er of generations

Table Comparing Analytical and Empirical Error Thresholds on a Single Peak

Landscap e

String Length Analytical Empirical

Results

Figure summarizes exp erimental results for the six landscap es studied The

curves show the numb er of generations to reach the global p eak as a function

of the mutation rate for asexual and sexual p opulations Eachdatapoint gives

the numb er of generations for nding the p eak averaged over runs Optimal

mutation rates are those which nd the p eak with the least number of gen

erations Two general trends maybeobserved if one excludes the rst result

shown First there is not a single critically precise optimal mutation rate but

instead a range of mutation values pro ducing nearoptimal results The curves

are Ushap ed with a at b ottom Secondly the curves for sexual p opulations

are shifted to the left that is to lower mutation values when compared to the

asexual p opulation curves

In the plots we indicate the empirically estimated error thresholds for asexual

solid arrows and sexual dotted arrows p opulations Error thresholds for the

landscap es studied were found to b e within the range of optimal mutation rates

for b oth asexual and sexual p opulations The results supp ort the hyp othesized

relationship b etween these two measures

An exception to these general trends is the plot for N K This

scenario is equivalent to a single p eak landscap e where a single string has the

highest tness and all the others strings have the same but lower tness The

rationale for hyp othesising that optimal mutation rates are correlated with op

timal mutation rates Section was that of maximising search sub ject to the

constraint of not losing information already gained However this landscap e is

an extreme case where there is no intermediate step no accumulation of in

formation whichmight b e lost through excessivemutation The p eak is found

randomly without any gradual approach Here high mutation rates close to

were optimal for b oth asexual and sexual p opulations

Notice that error thresholds for sexual p opulations were in all cases lower

than for asexual p opulations A similar trend is observed for optimal mutation

rates this is compatible with the hyp othesized correlation b etween error thresh

olds and optimal mutation rates

The exp eriments determining optimal mutation rates showed standard devi

ations of the same order as the average numb er of generations measured Thus

there were large runtorun variations in the time to reach the optimal string

This observation was also rep orted byvan Nimwegen and Crutcheld who p er

formed similar exp eriments using the Royal Staircase function Figure

shows standard deviations for the N K landscap e

Discussion

Our results suggest that error thresholds and optimal mutation rates are indeed

correlated This empirical evidence supp orts previous intuitions exp ecting this

correlation There is not however a single critically precise

value for the optimal mutation rate but instead a range of values pro ducing

nearoptimal p erformance The error threshold on the other hand is a more

precise measure Hence mutation rates slightly lower or higher than estimated

error thresholds are likely to pro duce nearoptimal results

The implication of this nding is twofold First theoretically in helping to

understand EAs b ehavior as insights regarding error thresholds will b e reected

in our understanding of optimal mutation rates Second practically as heuristics

for nding error thresholds will provide useful guidelines for setting optimal

mutation rates thus improving the p erformance of EAs

In our exp eriments b oth error thresholds and optimal mutation ranges were

lower for sexual compared to asexual p opulations This result has b een observed

b efore a recentwork from the evolutionary biology literature studied the

role of recombination on evolving p opulation of viruses particularly the eect

of recombination on the magnitude of the error threshold They rep ort that

recombination shifts the error threshold to lower mutation rates Moreover in

now in the realm of genetic algorithms we found that recombination shifts N = 1, K = 10 N = 2, K = 10

250 1200 Asex Asex 1000 200 Sex Sex 800 150 600 0.055 0.09 0.11 0.035 100 400

50 200

0 0 0.000 0.001 0.005 0.010 0.020 0.030 0.040 0.050 0.060 0.070 0.080 0.090 0.100 0.110 0.120 0.130 0.140 0.000 0.001 0.005 0.010 0.030 0.050 0.070 0.090 0.110 0.130 0.150 0.170 0.190 0.210

N = 3, K = 10 N = 4, K = 5

3500 500 Asex Asex 3000 Sex 400 Sex 2500

2000 300 0.045 0.025 0.035 0.035 1500 200 1000 100 500 0 0 0.000 0.001 0.005 0.010 0.015 0.020 0.025 0.030 0.035 0.040 0.045 0.050 0.055 0.060 0.065 0.070 0.000 0.001 0.005 0.010 0.030 0.050 0.070 0.090 0.110 0.130 0.150 0.170 0.190

N = 5, K = 5 N = 6, K = 5

500 1400 Asex Asex 1200 400 Sex Sex 1000 300 0.03 0.035 800 0.025 0.025 200 600 400 100 200 0 0 0.000 0.001 0.005 0.010 0.020 0.030 0.040 0.050 0.060 0.070 0.080

0.000 0.001 0.005 0.010 0.020 0.030 0.040 0.050 0.060 0.070 0.080 0.090 0.100

Fig For the six dierent landscap es explored plots showthenumb er of generations

for nding the p eak Y axis as a function of the mutation rate X axis note the

scale is not linear at the lower end of the axis for b oth asexual and sexual p opulation s

Error thresholds are indicated by solid vertical arrows asexual and dotted vertical

arrows sexual N = 2, K = 10

1400 Asex 1200 Sex

1000

800

600

400

200

0

-200

0.000 0.001 0.005 0.010 0.020 0.030 0.040 0.050 0.060 0.070 0.080 0.090 0.100 0.110 0.120 0.130 0.140

Fig To demonstrate the signicantvariance in the results for one of the landscap es

weshow error bars for one standard deviation in the numb er of generations for

nding the p eak

optimal mutation rates to lower values This evidence p oints indirectly towards

the relationship b etween the notions of error thresholds and optimal mutation

rates

In this pap er we explored a single family of tness functions namely Royal

Staircase functions Wehave found however compatible results using the Kau

man NK family of landscap es Moreover the metho ds discussed here can b e ap

plied elsewhere Error thresholds and hence the implications of our results hold

for landscap es with certain degree of discontinuityThus they are not applica

ble to smo oth monotonic landscap es or regions Results will also b e mo died

where there is elitism either explicit or that implicit in some tournament selec

tion algorithms for steadystate GAs

Finally there is a circularity implicit in using the metho d suggested here for

assessing optimal mutation rates in real problems as opp osed to toy landscap es

namely that one cannot estimate the error threshold until after one has found

the global p eak However knowledge of error thresholds in one region of the

landscap e or in one memb er of a class of landscap es may b e of guidance in

assessing error thresholds and thus optimal mutation rates elsewhere For this

to b e the case one needs to make some assumptions of statistical regularitybut

such assumptions are necessary anyway for EAs to b e practical at all So despite

this circularity there is the prosp ect that in real EA applications where the tness

landscap e is unknown except to the extent that it is sampled exp erimentallythe

metho dology given ab ove allows for exp erimental assessment of error thresholds

and thus guidance for setting optimal mutation rates

Acknowledgements Thanks to A Meier for supp ort and critical reading

References

M C Bo erlijst S Bonho eer and M A Nowak Viral quasisp ecies and recom

bination Proc R Soc London B

K A DeJong AnAnalysis of the Behavior of a Class of Genetic Adaptive Systems

PhD thesis UniversityofMichigan Ann Arb or MI

M Eigen Selforganization of matter and the evolution of biological macro

molecules Naturwissenschaften

M Eigen J McCaskill and PSchuster Molecular quasisp ecies J Phys Chem

M Eigen and PSchuster The Hypercycle A Principle of Natural Self

Organization SpringerVerlag

J Grefenstette Optimisation of control parameters for genetic algorithms IEE

Trans SMC

I HarveyP Husbands D Cli A Thompson and N Jakobi Evolutionary

rob otics the sussex approach Robotics and Autonomous Systems

I Harvey and A Thompson Through the labyrinth evolution nds a way A silicon

ridge In Proc of The First International Conference on Evolvable Systems From

Biology to Hardware ICES SpringerVerlag

J Hesser and R Manner Towards an optimal mutation probability for genetic

algorithms In HPSchwefel and R Manner editors Paral lel Problem Solving

from Nature SpringerVerlag Lecture Notes in Computer Science Vol

MA Huynen Exploring phenotyp e space through neutral evolution Technical

Rep ort Preprint Santa Fe Institute

S A Kauman The Origins of Order SelfOrganization and Selection in Evolu

tion Oxford University Press

J Maynard Smith and the concept of a space Nature

M Mitchell S Forrest and J H Holland The Royal Road for genetic algorithms

tness landscap es and GA p erformance In F J Varela and P Bourgine editors

Proceedings of the First European ConferenceonArticial Life TowardaPractice

of Autonomous Systems pages MIT Press Cambridge MA

M Nowak and PSchuster Error thresholds of replication in nite p opulation s

Mutation frequencies and the onset of Mullers ratchet J Theor Biol

G Ochoa and I HarveyRecombination and error thresholds in nite p opulation s

In W Banzhaf and C Reeves editors Foundations of Genetic Algorithms FOGA

Morgan Kauman

G Ochoa I Harvey and H Buxton On recombination and optimal muta

tion rates In Proceedings of Genetic and Evolutionary Computation Conference

GECCO To app ear

JD Schaer RA Caruana LJ Eshelman and R Das A study of control param

eters aecting online p erformance of genetic algorithms for function optimization

In J D Schaer editor Proceedings of the rdICGA Morgan Kaufmann

PSchuster Extended molecular evolutionary biology Articial life bridging the

gap b etween chemestry and biology Articial Life

E van Nimwegen and J P Crutcheld Optimizing ep o chal evolutionary search

Populationsi ze dep endent theoryTechnical Rep ort Preprint Santa Fe Institute