Regimes of asexual in rugged fitness landscapes

Macroevolution and

The adaptive landscape

Sequence space, fitness and Fisher-Wright sampling

Regimes of asexual population dynamics

Evolutionary trajectories in the quasispecies model

Joint work with Kavita Jain JSTAT (2005) P04008, q-bio.PE/0606025, q-bio.PE/0508008 Supported by DFG within SFB/TR 12 “Symmetries and Universality in Mesoscopic Systems” & SFB 680 “Molecular Basis of Evolutionary Innovations” Macroevolution: Episodic patterns in the fossil record

60

Number of extinct families of known marine organisms (Sepkoski, 1992) 40

Extinction rate is intermittent with power 20 law event size distribution per My (families)

Decline of the average 0 600 400 200 0

extinction rate age (My) non-stationary dynamics Newman & Eble (1999) Microevolution: of viral populations

Fitness decline induced in populations of bacteriophage φ6 by successive population bottlenecks

Fitness recovery through increasing population size

Step-like fitness changes reflect single deleterious or beneficial (compensatory)

Evidence for ruggedness of the fitness landscape

C.L. Burch, L. Chao, Genetics 151, 921 (1999) Microevolution: Adaptation of viral populations

Fitness decline induced in populations of bacteriophage φ6 by successive population bottlenecks

Fitness recovery through increasing population size

Step-like fitness changes reflect single deleterious or beneficial (compensatory) mutations

Evidence for ruggedness of the fitness landscape

C.L. Burch, L. Chao, Genetics 151, 921 (1999) Microevolution: Adaptation of bacterial populations

S.F. Elena, R.E. Lenski, Nature Reviews Genetics 4, 457 (2003)

12 populations of E. coli propagated in identical environments 1.6

1.5 • • • Step-wise increase of cell • • s • • s 1.4 • • size and fitness reflects e ••

n •

t •

i •

f •• • selection of rare beneficial 1.3 • • • • e • • v i • • t • mutations; 4 - 6 steps in • • • • • a • • l 1.2 •

e •

10000 generations r • • • 1.1 •

Fitness increase slows 1.0 • down, but rate of genetic 100 1000 10000 change does not time (generations) fit: Sibani, Brandt & Alstrøm (1998)

Parallel and divergent changes Adaptation as a hill-climbing process

S. Wright, 1932

fitness landscape

population

Initial population placed in a new environment Adaptation as a hill-climbing process

S. Wright, 1932

fitness landscape

phenotypic trait

Rapid adaptation to a nearby fitness maximum Adaptation as a hill-climbing process

S. Wright, 1932

fitness landscape

phenotypic trait

Stabilizing selection at the fitness peak; -selection balance Adaptation as a hill-climbing process

S. Wright, 1932

fitness landscape

phenotypic trait

Rare, rapid transition through non-adaptive zone to a higher fitness peak: Adaptation as a hill-climbing process

S. Wright, 1932

fitness landscape

phenotypic trait

Rare, rapid transition through non-adaptive zone to a higher fitness peak: (G.G. Simpson, 1944) Adaptation as a hill-climbing process

S. Wright, 1932

fitness landscape

phenotypic trait

Goal: To turn this mental imagery into a theory of adaptation Sequence space

Each individual carries a genetic sequence of length L

σ σ σ σ σ ¥ £ £ £ with ¢§¦ ¢§¨ ¢ © ¢ ¢ ¢ ¡ 1 2 ¤ L i

Simplification: Binary sequences σ i 0¢ 1

Point mutations change individual letters in the sequence:

σ σ i 1 i

The Hamming distance between two sequences σ σ is the number of ¢ letters in which they differ:

L

2

σ σ σ σ

d ¢ ∑ ¤ ¤ ¡ ¡ i i

i 1

L

Total number of sequences: S 2 volume

Maximal distance between two sequences: L log S diameter 2 ¡ ¤

infinite dimensionality! L-dimensional hypercubes/Hamming graphs Fitness

The fitness W σ of genotype σ is the expected number of offspring of ¡ ¤ an individual carrying σ

σ σ The mapping W is very complicated: ¡ ¤

genotype + fitness environment

Single peak landscapes: σ σ σ σ : master sequence W w d ¢ ¤ ¤ ¡ ¤ ¡ ¡ 0 0

Maximally rugged fitness landscape: Fitnesses W σ are uncorrelated ¡ ¤ quenched random variables drawn from a common distribution

random energy model (REM) of spin glasses (Derrida, 1981)

house of cards model of (Kingman, 1977) Evolution of asexual populations

Basic model: Stochastic Fisher-Wright sampling of a finite population

N individuals

1 generations

Each individual choses an ancestor from the preceding generation

Genotype σ is chosen with probability W σ ¡ ¤

Point mutations occur with probability µ per site and generation

Interplay of disorder with two distinct sources of fluctuations µ 1 N¢ ¤ ¡ ¡ Fixation

In the absence of mutations µ 0 the population becomes genetically ¤ ¡ ∞ homogeneous (monomorphic) for t

When a single mutant σ is introduced into a monomorphic population with

σ ∞ σ genotype , the outcome for t is either fixation (all ) or loss of the mutation (all σ)

σ σ Fixation of deleterious mutations (W W ) is exponentially unlikely ¡ ¤ ¡ ¤ for large N

σ σ Beneficial mutations W ¡ W are fixed with probability ¡ ¤ ¡ ¤ ¤ ¡

W σ

¡ ¤

Π σ σ

¢ 1 ¤ ¡ W σ ¡ ¤ Neutral evolution and adaptive walks

¢ £ In a flat fitness landscape [W ¡ £¤ ] the population diffuses randomly in sequence space ()

Average genetic distance between two individuals: (Derrida & Peliti 1991)

L 4µN

σ σ

d ¢ ¥ ¡ ¤ ¦ µ 2 1 § 4 N

µ

L N ¨ 1 d ¨ 1 population is monomorphic most of the ¥ ¦ time and evolves by fixation of rare neutral mutations

¢ £ In the presence of selection [ ¡ £ ¤ ] the population performs an uphill

W © adaptive walk by fixation of rare beneficial mutations (Gillespie, 1984)

Adaptive walks terminate at local fitness maxima ¢ 5 ¢ 6 ¢ 7 Adaptive walk: µ ¡ ¡ L 15 ¡ N 1024 ¡ 10 10 10

1 + X28795 (0-00------0--) × X26725 (0-01------0--) ✳ X21345 (1-00------0--) 0.5 ■ X6634 (1-10------0--) Ο X52 (1-10------1--)

0

1 + X28795 (--0----0----0--) × X24458 (--0----1----0--) ✳ X13912 (--0----1----1--) 0.5 ■ X6507 (--1----1----1--)

0

1 + X28795 (--0---0--0-----) × X25483 (--0---0--1-----) ✳ X11692 (--0---1--1-----) 0.5 ■ X2947 (--1---1--1-----)

0 1 10 100 1000 ¢ 4

“Clonal interference”: µ ¡ L 15 ¡ N 1024 10

1 + X28795 (--0------00---) × X20940 (--0------10---) ✳ X4688 (--1------00---) 0.5 ■ X159 (--1------01---)

0

1 + X28795 (--000---0-000--) × X20940 (--000---0-100--) ✳ X9195 (--001---0-000--) 0.5 ■ X4117 (--010---0-100--) Ο X1524 (--110---0-100--) ● X940 (--110---1-101--)

0

1 + X28795 (-----00----0---) × X14622 (-----00----1---) ✳ X2711 (-----10----1---) 0.5 ■ X5 (-----01----1---)

0 1 10 100 1000 ¢ 4

Locally deterministic evolution: µ ¡ L 6 ¡ N 16384 10

1 + X55 (00001-) ● X40 (01001-) × X28 (00101-) 0.5 ✳ X5 (00111-) ■ X4 (01000-) Ο X1 (11010-)

0

1 + X55 (--001-) × X28 (--101-) ✳ X5 (--111-) 0.5 ■ X2 (--010-)

0

1 + X55 (--00--) × X28 (--10--) ✳ X5 (--11--) 0.5

0 1 10 100 1000 Classification of evolutionary regimes

K. Jain, JK, q-bio.PE/0606025

Key parameters: Population size N, mutation probability µ per site & generation, sequence length L

LNµ: Number of mutants produced per generation

µ LN ¨ 1: Adaptive walk of a monomorphic population

µ 1 ¨ LN ¨ L: Stochastic regime with clonal interference

µ LN ¡ L: Deterministic evolution within a shell of size

lnN d ¡ ln µ ¢ ¢

around the dominating genotype

d L: Fully deterministic evolution quasispecies dynamics ¡

The quasispecies model

M. Eigen, Naturwissenschaften 58, 465 (1972); K. Jain, JK, q-bio.PE/0508008

Unnormalized population fraction σ satisfies linear time evolution Z ¢ t ¡ ¤

µ ¡ ¢ σ σ σ σ σ § ¢ ¢ Z ¢ t 1 ∑W M Z t ¡ ¤ ¡ ¤ ¡ ¤ ¡ ¤

σ

£ µ £ µ µ ¡ ¢ ¡ ¢ ¡ ¢

ˆ ˆ ˆ ˆ ˆ § diagonal, random Z t 1 E Z t ¢ E WM ¢ W : ¡ ¤ ¡ ¤

¥

µ σ σ σ σ σ σ ¤ ¤ d ¤ L d d ¢ ¢ ¢ ¡ ¡ ¡ ¡ ¢

Mutation matrix: σ σ µ µ ¦ µ M ¢ 1 ¡ ¤ ¤ ¡

σ σ

d ¤ ¡ ¢

σ δ σ µ 0 Initial condition σ σ after one generation ¢ Z ¢ 0 Z 1 ¡ ¤ ¡ ¤ ¤ 0

population fraction drops to 1 N at distance ¡

lnN

d ¦ d ¡ ln µ ¢ ¢

Ev olutionar 0.5 0.5 0.5 0 1 0 1 0 1 1 y tr ajector

ies

L are 10

15 ¢

deter

µ ministic

10

¥ ¢ 8 10

and

¥ ¢ 6 10

independent ¥ 100 ● Ο ■ ✳ × + ● Ο ■ ✳ × + ● Ο ■ ✳ × + X X X X X X X X X X X X X X X X X X 4 4928 28795 4928 28795 (0) 4928 28795 11920 5 5 9195 9195 9195 4688 4688 4688 5 1 1 (1) (0) (1) (1) (0) (1) (2) (2) (1) (1) (1) (1) (1) (1) (2) (7) (7) ▼ ▲ ∆ X X X 74 (2) 0 1 (10) (7)

of µ : 1000 The strong selection limit

JK, C. Karl, Physica A 318, 137 (2003)

µ For 0 mutations are irrelevant after the first generation: £ £ £ µ t 0 ¥ t 1 ¡ ¢ ¡ ¢

Z t Eˆ Z 0 Eˆ Z 1 ¡ ¤ ¡ ¤ ¡ ¤ ¡ ¡

¥ σ σ

t 1 d ¤ ¡ ¢

σ ¦ σ σ µ 0

Z ¢ t W W ¡ ¤ ¡ ¤ ¡ ¤ 0

Logarithmic population variables evolve linearly in time:

σ ¦ σ µ σ σ ¢ lnZ ¢ t lnW t 1 ln d ¡ ¤ ¡ ¤ ¡ ¤ ¡ ¤ ¢ ¢ 0

Change in µ is compensated by rescaling of time with ln µ ¢ ¢ ¢

Most populated genotype σ is determined by maximizing σ with t lnZ ¢ t ¤ ¡ ¤ ¡ respect to σ Geometry of the evolutionary race

lnZ(σ,t) σ*(t)

t Only the most fit mutant d=1 in each shell of constant 2 Hamming distance σ σ d ¢ 3 ¡ ¤ 0

needs to be considered L

system size 2 L

L ¢ ¢ σ £ σ σ σ ¡ lnZ ¡ t lnW t 1 d ¢ ¢ ¢ ¢ 0 Geometry of the evolutionary race

lnZ(σ,t) σ*(t)

t d=1 Contenders and 2 spectators: Contenders 3 σ are fitness records k with σ σ W ¡ W ¡ ¤ ¡ ¤

k 1 k

L ¢ ¢ σ £ σ σ σ ¡ lnZ ¡ t lnW t 1 d ¢ ¢ ¢ ¢ 0 Geometry of the evolutionary race

lnZ(σ,t) σ*(t)

Contenders and spectators: Contenders t σ d=1 are fitness records k with σ σ 2 W ¡ W ¡ ¤ ¡ ¤

k 1 k 3

Not all contenders are winners; some are bypassed

L ¢ ¢ σ £ σ σ σ ¡ lnZ ¡ t lnW t 1 d ¢ ¢ ¢ ¢ 0 Geometry of the evolutionary race

lnZ(σ,t)

Contenders and σ*(t) spectators: Contenders σ are fitness records k with σ σ W ¡ W ¡ ¤ ¡ ¤

k 1 k t d=1 Not all contenders are 2 3 winners; some are bypassed

The evolutionary trajectory is composed of non- bypassed fitness records; how many? L ¢ ¢ σ £ σ σ σ ¡ lnZ ¡ t lnW t 1 d ¢ ¢ ¢ ¢ 0 Bypassing for i.i.d. shell fitnesses

Simplification: Replace logarihmic shell fitnesses

F max lnW σ k ¡ ¤

σ σ 

d ¤ k ¢ ¡ 0

by i. i. d. random variables from a distribution P F ¡ ¤

The number of records is lnL with Poisson distribution

β β Numerical conjecture: Bypassing probability 1 with 1 depending on the tail of P F JK, C. Karl (2003) ¡ ¤

(i) exponential-like: β 1 2 ¡

¥ ¥ δ

1

(ii) power law: β δ 1 2δ 1 P F F ¤ ¡ ¤ ¡ ¤ ¡ ¡ ν

β ν ν § § ¢ (iii) bounded: 2 3 2 P F F ¡ F ¤ ¡ ¤ ¡ ¤ ¡ ¤ ¡ ¡

Fluctuations in the number of non-bypassed records are sub-Poissonian

Analytic proof by first-passage techniques C. Sire, S.N. Majumdar, D.S. Dean, J. Stat. Mech. (2006) L07001 Bypassing in sequence space

K. Jain, JK, JSTAT (2005) P04008

L Number of sequences at distance d from σ is 0 d ¡

Probability to find a new fitness record at d: (M.C.K. Yang, 1975)

L 1 2d L ¡ d ¡

¦ ∞

for P d L¢ d ¢ d L 1 2 ¡ ¤ ¡ ¡ ∑d L 1 d L ¡ 

k 1 k ¡

independent of the fitness distribution

Total number of records ¢ has mean and variance

2 2

¦ ¦ £ £ £ £ £ £ £ £ ¢ ¢ ¢

1 ln2 N 0 307 N¢ 3ln2 2 N 0 0794 N ¡ ¥ ¦ ¤ ¡ ¥ ¦ ¦ ¤ ¥

Bypassing reduces this to £ L for exponential-like fitness distributions, ¡¥¤ ¤

and to £ 1 for power-law fitness distributions ¡ ¤ Distribution of evolution times

T T T 3 2 1 fitness

time

Global fitness maximum is reached at time T1

Universal power law tail of distribution of time Tk of the k’th last jump: ¥

1 k ¢ ¡ P T T k ¡ ¤ k k

with prefactor depending on fitness distribution and sequence length.

Expected evolution time is infinite: T ∞ ¥ 1 ¦

P Exponential

P(T) o w 0.0001 1e-06 1e-05 er 0.001 0.01 0.1 1 la 1 w fitness fitness Nonlocal shell model, power law p(F) with exponent 2, 1000000 runs distr distr ib ib ution: ution: 10 P

P ¡ F

1

¤ ¡

T ¤

T

F

¥

¤

¡ δ

L

¡ 1

T

¢

2 100 P

1 ¡

T ¤ L=20 L=15 L=10

L=5

¤ L

2 ¥

L

δ ¡ T 2

P Exponential All scaled P(T) o w 0.0001 1e-06 1e-05 er 0.001 inter 0.01 0.1 10 1 la 1 w mediate fitness Nonlocal shell model, power law p(F) with exponent 2, P(T) scaled according to theory fitness maxima distr distr ib ib ution: ution: are 10 b P ypassed

P ¡ F

1

¤ ¡

T ¤

T

F

¥ ¤

in

¡ δ the

L

¡ 1

T ¢

po

2 100 w P er

1 ¡ T

la ¤ w L=20 L=15 L=10

L=5

case! ¤ L

2 ¥

L

δ ¡ T 2 Summary

Evolutionary biology & population genetics a rich source of stochastic many-body problems with disorder

State of the art largely restricted to neutral evolution, small number of sites

( ), non-interacting (=multiplicative) fitness landscapes L 1¢ 2

Distinct fluctuations from mutations & sampling noise

Different kinds of “thermodynamic limits”: ∞

N quasispecies model ∞

L infinite sites model

Future directions (in SFB 680) (i) Speed of adaptation in smooth landscapes (ii) Timing of adaptive events in the infinite sites model (iii) Modes of reproduction in diploids (sexual, parthenogenetic, selfing)