Evolutionary Dynamics ©David Krakauer Evolutionary Theory - specialities

/neutral • Phylogenetic reconstruction/ theory inference • • Niche Construction • Quasispecies theory • Phenotypic plasticity & learning • Game theory • Gene-Culture Coevolution • Hierarchical & (Price equation) • Evolutionary ecology (macro, etc)

• Adaptive Dynamics The Intellectual Foundations are simple, The Orbit of influence is wide... Evolutionary Stoichiometry

replication

ri gi 2gi −Energ−−−−y−+−−Res−−o−urces−−→ Evolutionary Stoichiometry

competition

c g + g ij g i j −−→ j Evolutionary Stoichiometry

mij gi gj −Ra−−dia−−tio−→n

H(i,j) L H(i,j) m = µ (1 µ) − ij − Evolutionary Stoichiometry

recombination b g + g ijl g j l −−→ i

bijl = 1, if i = j = l

1 1 H(j,l) b = (1 c) + c if i = j or i = l ijl 2 − 2 ￿ ￿ ￿ ￿ 1 H(j,l) b = c if H(i, j) + H(i, l) = H(j, l) ijl 2 ￿ ￿ Evolutionary Stoichiometry

Development

dijl ontogenetic : g + g p j l −−→ i

Typically Treated Thus

d g i p i −→ i ri gi 2gi −→ cij

gi + gj rj gj n genome−−→s g˙ = g (r f¯) i i i − n where f¯ = rigi and cij = 1 i ￿ gi(t = 0) = 1 i ￿ : Frequency dependent Replicator Equation r (g) g i 2g i −−−→ i cij gi + gj rj gj n genome−−→s

g˙i = gi(ri(g) f¯) n − where f¯ = ri(g)gi and cij = 1 i ￿ Evolutionary Game Theory: Frequency dependent Replicator Equation g˙ = g (r (g) f¯) i i i −

Payoff Matrix P = [pij] with linear payoffs: n ri(g) = gjpij j ￿ Evolutionary Game Theory: Frequency dependent Replicator Equation

n n n g˙ = g ( g p g g p ) i i j ij − j k jk j j ￿ ￿ ￿k Stable Equilibria of Replicator Eq. and Nash Equilibria

Mathematica Demo Mutation & Sequence Space Biophysics: Eigen et al. Proc. Natl. Acad. Sci. USA 85 (1988) 5915

I Z 10 ol o

0 00 0 000

FIG. 3. The iterative buildup of sequence space, starting with one position. Each additional position requires a doubling of the former diagram and to connect corresponding points in both diagrams (which represent nearest neighbors). The final hypercube of dimension v contains as subspaces ( )2,k hypercubes of dimension k. space concept to requires the introduction of a with x being a measure of deviation from tree-likeness. value topography. Value landscapes have rugged fractal (Likewise, x and y together measure the deviation from ideal structures, causing populations to accumulate on ridges and bundle-likeness.) For partly randomized bundles, x and y are peaks in the mountainous regions (12, 13), which is the deeper nonzero and of similar magnitude, with x (by definition) being reason for the metric nonuniformity commonly found in the smaller of both parameters. comparative sequence analysis. Why do we call this method statistical geometry? There are (4) different quartets that can be formed from a set of n Statistical Geometry sequences (e.g., 27,405 for n = 30 sequences). Hence, the averages of x, y, and !14(a + b + c + d) for a set of n Statistical geometry as such can be exemplified with mere sequences usually are statistically well-defined parameters. distance relationships. Two sequences define one distance; If a tree is constructed by compromises that yield an optimal three can always be fitted into a tripod diagram, because they fit, and x/y average values of -0.5 or higher are found, one yield three explicit equations for the three unknown seg- should be suspicious. Randomization then has proceeded so ments. The tripod, however, may be unrealistic, because the far that a tree cannot be discriminated from a bundle. On the precursor, the tripodal node, may not have existed. The truth other hand, one can prove mathematically (ref. 14; see also then emerges by adding a fourth sequence. Four sequences ref. 15 and references therein) that, if in a set of more than define six distances and hence match a diagram that, in four sequences all x values are zero while 9 is nonzero, the general, has six segments, as shown in Fig. 4a. The three total set has an exact tree-like topology. Unfortunately, types of segments can be obtained from statistical geometry based on distance only is not very sensitive in differentiating topologies, the main shortcoming AB + CD = a + b + c + d + 2x = S (small) being neglect of positional information. As explained above, such information is available from order relationships in AC + BD = a + b + c + d + 2y = M (medium) sequence space. In sequence space formally the procedure is analogous to AD + BC = a + b + c + d + 2x + 2y = L (large), that in distance space: For each quartet of sequences, we analyze the optimal network connecting the four sequences as 2x = L - M, 2y = L - S, and a + b + c + d = S + M in sequence space and try to reconstruct a geometry that is - L. representative for the whole family of sequences. We begin The diagram reduces to an ideal bundle if both x and y are with the case of binary (R, Y) sequences (Fig. 4b). There are zero and to a tree-like dendrogram, with finite branching eight distinguishable classes of positions in three categories: distance y, if only x is zero. The general "net" form in Fig. 0, all four sequences having equal occupation; 1, one se- 4a is due to the presence of reverse and parallel , quence differing from the three others (a, f3, 'y, 8); and 2, two A

3d !

By B

FIG. 4. Representative geometries of quartet combinations of sequences in distance space (a), RY sequence space (b), and AUGC sequence space (c). Biophysics: Eigen et al. Proc. Natl. Acad. Sci. USA 85 (1988) 5915

I Z 10 ol o

0 00 0 000

FIG. 3. The iterative buildup of sequence space, starting with one position. Each additional position requires a doubling of the former diagram and to connect corresponding points in both diagrams (which represent nearest neighbors). The final hypercube of dimension v contains as subspaces ( )2,k hypercubes of dimension k. space concept to evolution requires the introduction of a with x being a measure of deviation from tree-likeness. value topography. Value landscapes have rugged fractal (Likewise, x and y together measure the deviation from ideal structures, causing populations to accumulate on ridges and bundle-likeness.) For partly randomized bundles, x and y are peaks in the mountainous regions (12, 13), which is the deeper nonzero and of similar magnitude, with x (by definition) being reason for the metric nonuniformity commonly found in the smaller of both parameters. comparative sequence analysis. Why do we call this method statistical geometry? There are (4) different quartets that can be formed from a set of n Statistical Geometry sequences (e.g., 27,405 for n = 30 sequences). Hence, the averages of x, y, and !14(a + b + c + d) for a set of n Statistical geometry as such can be exemplified with mere sequences usually are statistically well-defined parameters. distance relationships. Two sequences define one distance; If a tree is constructed by compromises that yield an optimal three can always be fitted into a tripod diagram, because they fit, and x/y average values of -0.5 or higher are found, one yield three explicit equations for the three unknown seg- should be suspicious. Randomization then has proceeded so ments. The tripod, however, may be unrealistic, because the far that a tree cannot be discriminated from a bundle. On the precursor, the tripodal node, may not have existed. The truth other hand, one can prove mathematically (ref. 14; see also then emerges by adding a fourth sequence. Four sequences ref. 15 and references therein) that, if in a set of more than define six distances and hence match a diagram that, in four sequences all x values are zero while 9 is nonzero, the general, has six segments, as shown in Fig. 4a. The three total set has an exact tree-like topology. Unfortunately, types of segments can be obtained from statistical geometry based on distance only is not very sensitive in differentiating topologies, the main shortcoming AB + CD = a + b + c + d + 2x = S (small) being neglect of positional information. As explained above, such information is available from order relationships in AC + BD = a + b + c + d + 2y = M (medium) sequence space. In sequence space formally the procedure is analogous to AD + BC = a + b + c + d + 2x + 2y = L (large), that in distance space: For each quartet of sequences, we analyze the optimal network connecting the four sequences as 2x = L - M, 2y = L - S, and a + b + c + d = S + M in sequence space and try to reconstruct a geometry that is - L. representative for the whole family of sequences. We begin The diagram reduces to an ideal bundle if both x and y are with the case of binary (R, Y) sequences (Fig. 4b). There are zero and to a tree-like dendrogram, with finite branching eight distinguishable classes of positions in three categories: distance y, if only x is zero. The general "net" form in Fig. 0, all four sequences having equal occupation; 1, one se- 4a is due to the presence of reverse and parallel mutations, quence differing from the three others (a, f3, 'y, 8); and 2, two A

3d !

By B

FIG. 4. Representative geometries of quartet combinations of sequences in distance space (a), RY sequence space (b), and AUGC sequence space (c). Biophysics: Eigen et al. Proc. Natl. Acad. Sci. USA 85 (1988) 5915

I Z 10 ol o

0 00 0 000

FIG. 3. The iterative buildup of sequence space, starting with one position. Each additional position requires a doubling of the former diagram and to connect corresponding points in both diagrams (which represent nearest neighbors). The final hypercube of dimension v contains as subspaces ( )2,k hypercubes of dimension k. space concept to evolution requires the introduction of a with x being a measure of deviation from tree-likeness. value topography. Value landscapes have rugged fractal (Likewise, x and y together measure the deviation from ideal structures, causing populations to accumulate on ridges and bundle-likeness.) For partly randomized bundles, x and y are peaks in the mountainous regions (12, 13), which is the deeper nonzero and of similar magnitude, with x (by definition) being reason for the metric nonuniformity commonly found in the smaller of both parameters. comparative sequence analysis. Why do we call this method statistical geometry? There are (4) different quartets that can be formed from a set of n Statistical Geometry sequences (e.g., 27,405 for n = 30 sequences). Hence, the averages of x, y, and !14(a + b + c + d) for a set of n Statistical geometry as such can be exemplified with mere sequences usually are statistically well-defined parameters. distance relationships. Two sequences define one distance; If a tree is constructed by compromises that yield an optimal three can always be fitted into a tripod diagram, because they fit, and x/y average values of -0.5 or higher are found, one yield three explicit equations for the three unknown seg- should be suspicious. Randomization then has proceeded so ments. The tripod, however, may be unrealistic, because the far that a tree cannot be discriminated from a bundle. On the precursor, the tripodal node, may not have existed. The truth other hand, one can prove mathematically (ref. 14; see also then emerges by adding a fourth sequence. Four sequences ref. 15 and references therein) that, if in a set of more than define six distances and hence match a diagram that, in four sequences all x values are zero while 9 is nonzero, the general, has six segments, as shown in Fig. 4a. The three total set has an exact tree-like topology. Unfortunately, types of segments can be obtained from statistical geometry based on distance only is not very sensitive in differentiating topologies, the main shortcoming AB + CD = a + b + c + d + 2x = S (small) being neglect of positional information. As explained above, such information is available from order relationships in AC + BD = a + b + c + d + 2y = M (medium) sequence space. In sequence space formally the procedure is analogous to AD + BC = a + b + c + d + 2x + 2y = L (large), that in distance space: For each quartet of sequences, we analyze the optimal network connecting the four sequences as 2x = L - M, 2y = L - S, and a + b + c + d = S + M in sequence space and try to reconstruct a geometry that is - L. representative for the whole family of sequences. We begin The diagram reduces to an ideal bundle if both x and y are with the case of binary (R, Y) sequences (Fig. 4b). There are zero and to a tree-like dendrogram, with finite branching eight distinguishable classes of positions in three categories: distance y, if only x is zero. The general "net" form in Fig. 0, all four sequences having equal occupation; 1, one se- 4a is due to the presence of reverse and parallel mutations, quence differing from the three others (a, f3, 'y, 8); and 2, two A

3d !

By B

FIG. 4. Representative geometries of quartet combinations of sequences in distance space (a), RY sequence space (b), and AUGC sequence space (c). Biophysics: Eigen et al. Proc. Natl. Acad. Sci. USA 85 (1988) 5915

I Z 10 ol o

0 00 0 000

FIG. 3. The iterative buildup of sequence space, starting with one position. Each additional position requires a doubling of the former diagram and to connect corresponding points in both diagrams (which represent nearest neighbors). The final hypercube of dimension v contains as subspaces ( )2,k hypercubes of dimension k. space concept to evolution requires the introduction of a with x being a measure of deviation from tree-likeness. value topography. Value landscapes have rugged fractal (Likewise, x and y together measure the deviation from ideal structures, causing populations to accumulate on ridges and bundle-likeness.) For partly randomized bundles, x and y are peaks in the mountainous regions (12, 13), which is the deeper nonzero and of similar magnitude, with x (by definition) being reason for the metric nonuniformity commonly found in the smaller of both parameters. comparative sequence analysis. Why do we call this method statistical geometry? There are (4) different quartets that can be formed from a set of n Statistical Geometry sequences (e.g., 27,405 for n = 30 sequences). Hence, the averages of x, y, and !14(a + b + c + d) for a set of n Statistical geometry as such can be exemplified with mere sequences usually are statistically well-defined parameters. distance relationships. Two sequences define one distance; If a tree is constructed by compromises that yield an optimal three can always be fitted into a tripod diagram, because they fit, and x/y average values of -0.5 or higher are found, one yield three explicit equations for the three unknown seg- should be suspicious. Randomization then has proceeded so ments. The tripod, however, may be unrealistic, because the far that a tree cannot be discriminated from a bundle. On the precursor, the tripodal node, may not have existed. The truth other hand, one can prove mathematically (ref. 14; see also then emerges by adding a fourth sequence. Four sequences ref. 15 and references therein) that, if in a set of more than define six distances and hence match a diagram that, in four sequences all x values are zero while 9 is nonzero, the general, has six segments, as shown in Fig. 4a. The three total set has an exact tree-like topology. Unfortunately, types of segments can be obtained from statistical geometry based on distance only is not very sensitive in differentiating topologies, the main shortcoming AB + CD = a + b + c + d + 2x = S (small) being neglect of positional information. As explained above, such information is available from order relationships in AC + BD = a + b + c + d + 2y = M (medium) sequence space. In sequence space formally the procedure is analogous to AD + BC = a + b + c + d + 2x + 2y = L (large), that in distance space: For each quartet of sequences, we analyze the optimal network connecting the four sequences as 2x = L - M, 2y = L - S, and a + b + c + d = S + M in sequence space and try to reconstruct a geometry that is - L. representative for the whole family of sequences. We begin The diagram reduces to an ideal bundle if both x and y are with the case of binary (R, Y) sequences (Fig. 4b). There are zero and to a tree-like dendrogram, with finite branching eight distinguishable classes of positions in three categories: distance y, if only x is zero. The general "net" form in Fig. 0, all four sequences having equal occupation; 1, one se- 4a is due to the presence of reverse and parallel mutations, quence differing from the three others (a, f3, 'y, 8); and 2, two A

3d !

By B

FIG. 4. Representative geometries of quartet combinations of sequences in distance space (a), RY sequence space (b), and AUGC sequence space (c). Biophysics: Eigen et al. Proc. Natl. Acad. Sci. USA 85 (1988) 5915

I Z 10 ol o

0 00 0 000

FIG. 3. The iterative buildup of sequence space, starting with one position. Each additional position requires a doubling of the former diagram and to connect corresponding points in both diagrams (which represent nearest neighbors). The final hypercube of dimension v contains as subspaces ( )2,k hypercubes of dimension k. space concept to evolution requires the introduction of a with x being a measure of deviation from tree-likeness. value topography. Value landscapes have rugged fractal (Likewise, x and y together measure the deviation from ideal structures, causing populations to accumulate on ridges and bundle-likeness.) For partly randomized bundles, x and y are peaks in the mountainous regions (12, 13), which is the deeper nonzero and of similar magnitude, with x (by definition) being reason for the metric nonuniformity commonly found in the smaller of both parameters. comparative sequence analysis. Why do we call this method statistical geometry? There are (4) different quartets that can be formed from a set of n Statistical Geometry sequences (e.g., 27,405 for n = 30 sequences). Hence, the averages of x, y, and !14(a + b + c + d) for a set of n Statistical geometry as such can be exemplified with mere sequences usually are statistically well-defined parameters. distance relationships. Two sequences define one distance; If a tree is constructed by compromises that yield an optimal three can always be fitted into a tripod diagram, because they fit, and x/y average values of -0.5 or higher are found, one yield three explicit equations for the three unknown seg- should be suspicious. Randomization then has proceeded so ments. The tripod, however, may be unrealistic, because the far that a tree cannot be discriminated from a bundle. On the precursor, the tripodal node, may not have existed. The truth other hand, one can prove mathematically (ref. 14; see also then emerges by adding a fourth sequence. Four sequences ref. 15 and references therein) that, if in a set of more than define six distances and hence match a diagram that, in four sequences all x values are zero while 9 is nonzero, the general, has six segments, as shown in Fig. 4a. The three total set has an exact tree-like topology. Unfortunately, types of segments can be obtained from statistical geometry based on distance only is not very sensitive in differentiating topologies, the main shortcoming AB + CD = a + b + c + d + 2x = S (small) being neglect of positional information. As explained above, such information is available from order relationships in AC + BD = a + b + c + d + 2y = M (medium) sequence space. In sequence space formally the procedure is analogous to AD + BC = a + b + c + d + 2x + 2y = L (large), that in distance space: For each quartet of sequences, we analyze the optimal network connecting the four sequences as 2x = L - M, 2y = L - S, and a + b + c + d = S + M in sequence space and try to reconstruct a geometry that is - L. representative for the whole family of sequences. We begin The diagram reduces to an ideal bundle if both x and y are with the case of binary (R, Y) sequences (Fig. 4b). There are zero and to a tree-like dendrogram, with finite branching eight distinguishable classes of positions in three categories: distance y, if only x is zero. The general "net" form in Fig. 0, all four sequences having equal occupation; 1, one se- 4a is due to the presence of reverse and parallel mutations, quence differing from the three others (a, f3, 'y, 8); and 2, two A

3d !

By B

FIG. 4. Representative geometries of quartet combinations of sequences in distance space (a), RY sequence space (b), and AUGC sequence space (c). Biophysics: Eigen et al. Proc. Natl. Acad. Sci. USA 85 (1988) 5915

I Z 10 ol o

0 00 0 000

FIG. 3. The iterative buildup of sequence space, starting with one position. Each additional position requires a doubling of the former diagram and to connect corresponding points in both diagrams (which represent nearest neighbors). The final hypercube of dimension v contains as subspaces ( )2,k hypercubes of dimension k. space concept to evolution requires the introduction of a with x being a measure of deviation from tree-likeness. value topography. Value landscapes have rugged fractal (Likewise, x and y together measure the deviation from ideal structures, causing populations to accumulate on ridges and bundle-likeness.) For partly randomized bundles, x and y are peaks in the mountainous regions (12, 13), which is the deeper nonzero and of similar magnitude, with x (by definition) being reason for the metric nonuniformity commonly found in the smaller of both parameters. comparative sequence analysis. Why do we call this method statistical geometry? There are (4) different quartets that can be formed from a set of n Statistical Geometry sequences (e.g., 27,405 for n = 30 sequences). Hence, the averages of x, y, and !14(a + b + c + d) for a set of n Statistical geometry as such can be exemplified with mere sequences usually are statistically well-defined parameters. distance relationships. Two sequences define one distance; If a tree is constructed by compromises that yield an optimal three can always be fitted into a tripod diagram, because they fit, and x/y average values of -0.5 or higher are found, one yield three explicit equations for the three unknown seg- should be suspicious. Randomization then has proceeded so ments. The tripod, however, may be unrealistic, because the far that a tree cannot be discriminated from a bundle. On the precursor, the tripodal node, may not have existed. The truth other hand, one can prove mathematically (ref. 14; see also then emerges by adding a fourth sequence. Four sequences ref. 15 and references therein) that, if in a set of more than define six distances and hence match a diagram that, in four sequences all x values are zero while 9 is nonzero, the general, has six segments, as shown in Fig. 4a. The three total set has an exact tree-like topology. Unfortunately, types of segments can be obtained from statistical geometry based on distance only is not very sensitive in differentiating topologies, the main shortcoming AB + CD = a + b + c + d + 2x = S (small) being neglect of positional information. As explained above, such information is available from order relationships in AC + BD = a + b + c + d + 2y = M (medium) sequence space. In sequence space formally the procedure is analogous to AD + BC = a + b + c + d + 2x + 2y = L (large), that in distance space: For each quartet of sequences, we analyze the optimal network connecting the four sequences as 2x = L - M, 2y = L - S, and a + b + c + d = S + M in sequence space and try to reconstruct a geometry that is - L. representative for the whole family of sequences. We begin The diagram reduces to an ideal bundle if both x and y are with the case of binary (R, Y) sequences (Fig. 4b). There are zero and to a tree-like dendrogram, with finite branching eight distinguishable classes of positions in three categories: distance y, if only x is zero. The general "net" form in Fig. 0, all four sequences having equal occupation; 1, one se- 4a is due to the presence of reverse and parallel mutations, quence differing from the three others (a, f3, 'y, 8); and 2, two A

3d !

By B

FIG. 4. Representative geometries of quartet combinations of sequences in distance space (a), RY sequence space (b), and AUGC sequence space (c). Biophysics: Eigen et al. Proc. Natl. Acad. Sci. USA 85 (1988) 5915

I Z 10 ol o

0 00 0 000

FIG. 3. The iterative buildup of sequence space, starting with one position. Each additional position requires a doubling of the former diagram and to connect corresponding points in both diagrams (which represent nearest neighbors). The final hypercube of dimension v contains as subspaces ( )2,k hypercubes of dimension k. space concept to evolution requires the introduction of a with x being a measure of deviation from tree-likeness. value topography. Value landscapes have rugged fractal (Likewise, x and y together measure the deviation from ideal structures, causing populations to accumulate on ridges and bundle-likeness.) For partly randomized bundles, x and y are peaks in the mountainous regions (12, 13), which is the deeper nonzero and of similar magnitude, with x (by definition) being reason for the metric nonuniformity commonly found in the smaller of both parameters. comparative sequence analysis. Why do we call this method statistical geometry? There are (4) different quartets that can be formed from a set of n Statistical Geometry sequences (e.g., 27,405 for n = 30 sequences). Hence, the averages of x, y, and !14(a + b + c + d) for a set of n Statistical geometry as such can be exemplified with mere sequences usually are statistically well-defined parameters. distance relationships. Two sequences define one distance; If a tree is constructed by compromises that yield an optimal three can always be fitted into a tripod diagram, because they fit, and x/y average values of -0.5 or higher are found, one yield three explicit equations for the three unknown seg- should be suspicious. Randomization then has proceeded so ments. The tripod, however, may be unrealistic, because the far that a tree cannot be discriminated from a bundle. On the precursor, the tripodal node, may not have existed. The truth other hand, one can prove mathematically (ref. 14; see also then emerges by adding a fourth sequence. Four sequences ref. 15 and references therein) that, if in a set of more than define six distances and hence match a diagram that, in four sequences all x values are zero while 9 is nonzero, the general, has six segments, as shown in Fig. 4a. The three total set has an exact tree-like topology. Unfortunately, types of segments can be obtained from statistical geometry based on distance only is not very sensitive in differentiating topologies, the main shortcoming AB + CD = a + b + c + d + 2x = S (small) being neglect of positional information. As explained above, such information is available from order relationships in AC + BD = a + b + c + d + 2y = M (medium) sequence space. In sequence space formally the procedure is analogous to AD + BC = a + b + c + d + 2x + 2y = L (large), that in distance space: For each quartet of sequences, we analyze the optimal network connecting the four sequences as 2x = L - M, 2y = L - S, and a + b + c + d = S + M in sequence space and try to reconstruct a geometry that is - L. representative for the whole family of sequences. We begin The diagram reduces to an ideal bundle if both x and y are with the case of binary (R, Y) sequences (Fig. 4b). There are zero and to a tree-like dendrogram, with finite branching eight distinguishable classes of positions in three categories: distance y, if only x is zero. The general "net" form in Fig. 0, all four sequences having equal occupation; 1, one se- 4a is due to the presence of reverse and parallel mutations, quence differing from the three others (a, f3, 'y, 8); and 2, two A

3d !

By B

FIG. 4. Representative geometries of quartet combinations of sequences in distance space (a), RY sequence space (b), and AUGC sequence space (c). Biophysics: Eigen et al. Proc. Natl. Acad. Sci. USA 85 (1988) 5915

I Z 10 ol o

0 00 0 000

FIG. 3. The iterative buildup of sequence space, starting with one position. Each additional position requires a doubling of the former diagram and to connect corresponding points in both diagrams (which represent nearest neighbors). The final hypercube of dimension v contains as subspaces ( )2,k hypercubes of dimension k. space concept to evolution requires the introduction of a with x being a measure of deviation from tree-likeness. value topography. Value landscapes have rugged fractal (Likewise, x and y together measure the deviation from ideal structures, causing populations to accumulate on ridges and bundle-likeness.) For partly randomized bundles, x and y are peaks in the mountainous regions (12, 13), which is the deeper nonzero and of similar magnitude, with x (by definition) being reason for the metric nonuniformity commonly found in the smaller of both parameters. comparative sequence analysis. Why do we call this method statistical geometry? There are (4) different quartets that can be formed from a set of n Statistical Geometry sequences (e.g., 27,405 for n = 30 sequences). Hence, the averages of x, y, and !14(a + b + c + d) for a set of n Statistical geometry as such can be exemplified with mere sequences usually are statistically well-defined parameters. distance relationships. Two sequences define one distance; If a tree is constructed by compromises that yield an optimal three can always be fitted into a tripod diagram, because they fit, and x/y average values of -0.5 or higher are found, one yield three explicit equations for the three unknown seg- should be suspicious. Randomization then has proceeded so ments. The tripod, however, may be unrealistic, because the far that a tree cannot be discriminated from a bundle. On the precursor, the tripodal node, may not have existed. The truth other hand, one can prove mathematically (ref. 14; see also then emerges by adding a fourth sequence. Four sequences ref. 15 and references therein) that, if in a set of more than define six distances and hence match a diagram that, in four sequences all x values are zero while 9 is nonzero, the general, has six segments, as shown in Fig. 4a. The three total set has an exact tree-like topology. Unfortunately, types of segments can be obtained from statistical geometry based on distance only is not very sensitive in differentiating topologies, the main shortcoming AB + CD = a + b + c + d + 2x = S (small) being neglect of positional information. As explained above, such information is available from order relationships in AC + BD = a + b + c + d + 2y = M (medium) sequence space. In sequence space formally the procedure is analogous to AD + BC = a + b + c + d + 2x + 2y = L (large), that in distance space: For each quartet of sequences, we analyze the optimal network connecting the four sequences as 2x = L - M, 2y = L - S, and a + b + c + d = S + M in sequence space and try to reconstruct a geometry that is - L. representative for the whole family of sequences. We begin The diagram reduces to an ideal bundle if both x and y are with the case of binary (R, Y) sequences (Fig. 4b). There are zero and to a tree-like dendrogram, with finite branching eight distinguishable classes of positions in three categories: distance y, if only x is zero. The general "net" form in Fig. 0, all four sequences having equal occupation; 1, one se- 4a is due to the presence of reverse and parallel mutations, quence differing from the three others (a, f3, 'y, 8); and 2, two A

3d !

By B

FIG. 4. Representative geometries of quartet combinations of sequences in distance space (a), RY sequence space (b), and AUGC sequence space (c). Replicator-Mutator Equation

2n g˙ = g r (g)m g f¯ i j j ij − i j ￿

H(i,j) L H(i,j) m = µ (1 µ) − ij − Curious Implications..... Delta-function landscape

r

i=0 i Error Threshold

gi

µ Error Threshold Delta function: s 1 µ < = L L

µ µ µ L = 1 L = 2 L = 3 L = 4 Landscape

Delta function: s 1 µ< L ≈ L

Multiplicative function: µ < s

The “Life Cone”

The evolutionary future time in generations in time

space in mutations per genome

The evolutionary past Max n subject to preserving g1 in the locally stable equilibrium distribution

2n g˙i = gjrj(g)mij gif¯ Biophysics: Eigen et al. − Proc. Natl. Acad. Sci. USA 85 (1988) 5915 j ￿ I Z 10 ol o H0 (i,00j) 0 000L H(i,j) m = µ (1 µ) − ij −

FIG. 3. The iterative buildup of sequence space, starting with one position. Each additional position requires a doubling of the former diagram and to connect corresponding points in both diagrams (which represent nearest neighbors). The final hypercube of dimension v contains as subspaces ( )2,k hypercubes of dimension k. space concept to evolution requires the introduction of a with x being a measure of deviation from tree-likeness. value topography. Value landscapes have rugged fractal (Likewise, x and y together measure the deviation from ideal structures, causing populations to accumulate on ridges and bundle-likeness.) For partly randomized bundles, x and y are peaks in the mountainous regions (12, 13), which is the deeper nonzero and of similar magnitude, with x (by definition) being reason for the metric nonuniformity commonly found in the smaller of both parameters. comparative sequence analysis. Why do we call this method statistical geometry? There are (4) different quartets that can be formed from a set of n Statistical Geometry sequences (e.g., 27,405 for n = 30 sequences). Hence, the averages of x, y, and !14(a + b + c + d) for a set of n Statistical geometry as such can be exemplified with mere sequences usually are statistically well-defined parameters. distance relationships. Two sequences define one distance; If a tree is constructed by compromises that yield an optimal three can always be fitted into a tripod diagram, because they fit, and x/y average values of -0.5 or higher are found, one yield three explicit equations for the three unknown seg- should be suspicious. Randomization then has proceeded so ments. The tripod, however, may be unrealistic, because the far that a tree cannot be discriminated from a bundle. On the precursor, the tripodal node, may not have existed. The truth other hand, one can prove mathematically (ref. 14; see also then emerges by adding a fourth sequence. Four sequences ref. 15 and references therein) that, if in a set of more than define six distances and hence match a diagram that, in four sequences all x values are zero while 9 is nonzero, the general, has six segments, as shown in Fig. 4a. The three total set has an exact tree-like topology. Unfortunately, types of segments can be obtained from statistical geometry based on distance only is not very sensitive in differentiating topologies, the main shortcoming AB + CD = a + b + c + d + 2x = S (small) being neglect of positional information. As explained above, such information is available from order relationships in AC + BD = a + b + c + d + 2y = M (medium) sequence space. In sequence space formally the procedure is analogous to AD + BC = a + b + c + d + 2x + 2y = L (large), that in distance space: For each quartet of sequences, we analyze the optimal network connecting the four sequences as 2x = L - M, 2y = L - S, and a + b + c + d = S + M in sequence space and try to reconstruct a geometry that is - L. representative for the whole family of sequences. We begin The diagram reduces to an ideal bundle if both x and y are with the case of binary (R, Y) sequences (Fig. 4b). There are zero and to a tree-like dendrogram, with finite branching eight distinguishable classes of positions in three categories: distance y, if only x is zero. The general "net" form in Fig. 0, all four sequences having equal occupation; 1, one se- 4a is due to the presence of reverse and parallel mutations, quence differing from the three others (a, f3, 'y, 8); and 2, two A

3d !

By B

FIG. 4. Representative geometries of quartet combinations of sequences in distance space (a), RY sequence space (b), and AUGC sequence space (c). The “Life Cone”

The evolutionary future time in generations in time

space in mutations per genome

The evolutionary past The “Life Cone”

The evolutionary 1 mutation per genome per generation future Microbes obey the Eigen Law 1 µ = n time in generations in time

space in mutations per genome

The evolutionary past Testing the Prediction: 1 n = metabolic constraint µ

Evol. Maximization 1 1 µ = km− 4 n = km 4 Prokaryotic life evolves on the membrane of life cone time in generations in time

space in mutations per genome in microbes is extracting adaptive information at the maximum possible velocity Selection as Statistical Inference ∆g (t) i = g (t 1)(r (g) f¯) ∆t i − i − P (Y X) P (X Y ) = P (X) | | P (Y ) L P (X Y ) = P (X) X | L¯

L¯ = P (Y ) = P (Y X)P (X) | x ω ￿∈ L P (t) = P (t 1) X X X − L¯ L 1 ∆P (t) = P (t 1)( X 1) = P (t 1) (L L¯) X X − L¯ − X − L¯ X −

∆P (t) = P (t 1)(f f¯), where f = L /L¯ X X − t − t X Selection as Learning: (Imitation & Operant) Biological Complexity is the mechanical/inferential support for making more accurate hypotheses about an expanding number of environmental regularities

But Evolutionary Light Speed Constrains the Maximum Rate of Inference Introducing Phenotypes and hierarchical selection: deriving the Price Equation

E˙ [p]=Cov(r, p)+E[˙p]

Insights from Page, Sigmund & others 2n g˙ = g r (g)m g r¯ i j j ji − i j ￿ gi pi gi(t = 0) = 1 r¯ = rigi → i i ￿ ￿ p¯ = E[p]= pigi i ￿ E˙ [p]= pig˙i + gip˙i i i 2n ￿ ￿ E˙ [p]= p [ g r (g)m g r¯)] + g p˙ i j j ji − i i i i j i ￿ ￿ ￿ E˙ [p]= p g r m r¯p¯ + E[˙p] i j j ji − ij ￿ E˙ [p]= p g r r¯p¯ + g r m (p p )+E[˙p] j j j − j j ji i − j j j i ￿ ￿ ￿ Cov(x, y)=E[(x µ)(y ν)] = E[xy] µν − − − Cov(r, p)= p g r r¯p¯ j j j − j ￿ E[r∆ p]= g r m (p p ) m j j ji i − j j i ￿ ￿

E˙ [p]=Cov(r, p)+E[˙p]+E[r∆mp]

The Price Equation Uses of Price Equation

• Fisher’s fundamental theorem • Kin selection (Hamilton’s rule) • • Evolution of cooperation Phenotypic Evolution Deriving Adaptive Dynamics

dx 1 δf(x ,x) = µσ2N¯(x) ￿ dt 2 δx￿ ￿x￿=x ￿ ￿ ￿ Evolutionary Game Adaptive Dynamics for Continuous Traits

dx 1 δf(x ,x) = µσ2N¯(x) ￿ dt 2 δx￿ ￿x￿=x ￿ ￿ ￿ E˙ [p]=Cov(r, p)+E[˙p]+E[r∆mp]

E[r∆mp]=0 E[˙p]=0 E˙ [p]=Cov(r, p)=Cov(r(p; g),p)

∞ f (n)(a) Taylor f(x)= (x a)n series n! − n=0 ￿ ∂r(a, g) r(p; g)=r(¯p; g)+ (p p¯) ∂a |a=¯p − ∂r(a, g) E˙ [p]=Cov[p, r(¯p; g)+ (p p¯)] ∂a |a=¯p − ∂r(a, g) E˙ [p]=V ar(p) ∂a |a=¯p Conclusions 1.

• Simple stoichiometry allows us to derive many of the fundamental, equations of evolutionary dynamics

• How genomes change in frequency as a result of frequency-dependence, density dependence and mutation.

• These equations provide insights into how total genomic information is constrained by mutation rates - Eigen law

• Allow us to study game dynamics in an evolutionary and ecological framework

• Through AD & Price Eq. provide the basis for many current studies on cooperation, kin & group selection, microbial dynamics and cultural/language evolution. Select Bibliography

Advanced • Sigmund K. Hofbauer, J, Evolutionary Games and Population Dynamics. (1998) Intermediate • Kimura, M. The Neutral Theory of Molecular Evolution. (1983) • Eigen, M. . "Self-organization of matter and evolution of biological Macromolecules". Naturwissenschaften 58 (10): 465. (1971) • Price, G.R. . "Selection and ". Nature 227: 520–521. (1970) Advanced • Jaynes, E.T. Probability Theory: The Logic of Science. (1998) Intermediate • Frank, S.A. . Foundations of Social Evolution. PUP. (1998) Nowak, Martin , Evolutionary Dynamics: Exploring the Equations of Life, Intro • Belknap Press. (2006)

Intermediate • Rice. S. Evolutionary Theory: Mathematical and Conceptual Foundations. (2004)