Quantifying the tape of life: Ancestry-based metrics provide insights and intuition about evolutionary dynamics Emily Dolson1,2,3, Alexander Lalejini1,2,3, Steven Jorgensen1,2 and Charles Ofria1,2,3

1BEACON Center for the Study of in Action, Michigan State University, East Lansing, MI, 48824 2Department of and Engineering, Michigan State University, East Lansing, MI, 48824 3Ecology, , and Behavior Program, Michigan State University, East Lansing, MI, 48824 [email protected]

Abstract digital systems, however, allow for perfect lineage tracking at a level of granularity that is impossible in modern wet lab Fine-scale evolutionary dynamics can be challenging to tease experiments. These data allow us to replay the tape of life out when focused on broad brush strokes of whole popula- tions over long timespans. We propose a suite of diagnos- in precise detail and to tease apart the evolutionary recipe tic metrics that operate on lineages and phylogenies in digi- for any phenomenon we are interested in (McPhee et al., tal evolution experiments with the aim of improving our ca- 2016b). In one notable example, Lenski et al., used the lin- pacity to quantitatively explore the nuances of evolutionary eage of an evolved digital in to tease apart, histories in digital evolution experiments. We present three step-by-step how a complex feature (the capacity to perform types of lineage measurements: lineage length, ac- cumulation, and phenotypic volatility. Additionally, we sug- the equals logical operation) emerged (Lenski et al., 2003). gest the adoption of four phylogeny measurements from bi- Yet, tracking the full details of a single lineage, much less ology: depth of the most-recent common ancestor, phylo- a population of lineages, can be computationally expensive genetic richness, phylogenetic divergence, and phylogenetic regularity. We demonstrate the use of each metric on a set of and will inevitably generate an unwieldy amount of data that two-dimensional, real-valued optimization problems under a can be challenging to visualize or interpret (McPhee et al., range of mutation rates and selection strengths, confirming 2016a). Summary can help alleviate these issues our intuitions about what they can tell us about evolutionary by enabling the user to focus on aggregate trends across a dynamics. population rather than needing to examine each individual’s lineage. The question is how to effectively summarize a path Introduction through fitness space. One useful abstraction is to treat the Evolution is a collective effect of many smaller events such path as a sequence of states. Here, we use phenotypes and as replication, variation, and competition that occur on a genotypes as the states in the sequence, but we could just fine-grained temporal scale. While evolution’s emergent na- as easily use some other descriptor of the lineage’s position ture can be fascinating, it also presents challenges to study- in the fitness landscape at a given point in time. With this ing the short-term mechanisms that, in aggregate, govern abstraction in hand, a few metrics are easily formalized: the long-term results. In computational evolutionary systems, number of unique states, the number of transitions between we can theoretically collect data to help untangle these states, and the amount of time spent in each state. Addition- mechanisms. In practice, however, the sheer number of con- ally, we may care about how the transitions between states stituent events produce an overwhelming quantity of data. happened. What led to them? Were those muta- In response, we have developed a standardized suite of di- tions beneficial, deleterious, or neutral at the time? These agnostic metrics to summarize short-term evolutionary dy- mutations are particularly notable because they did not sim- namics within a population by measuring lineages and phy- ply appear briefly, but stood the test of time, leaving descen- logenies. Here, we describe these metrics and provide ex- dants in the final population. Here, we explore a subset of perimental results to develop an intuition for what they can these metrics that we expect will be broadly useful. tell us about evolution. Whereas a lineage recounts the evolutionary history of a A lineage describes a continuous line of descent, linking single individual, a phylogeny details the evolutionary his- parents and offspring in an unbroken chain from an original tory of an entire population. Measurements that summa- ancestor. A complete lineage can provide a post-hoc, step- rize phylogenies can provide useful insight into population- by-step guide to the evolution of an extant organism where level evolutionary dynamics, such as diversification and co- each step involves replication and inherited variation. In- existence among different clades. A variety of useful phy- deed, lineage analyses are a powerful tool for disentangling logeny measurements have already been developed by bi- evolutionary dynamics in both natural and digital systems; ologists (Tucker et al., 2017). These measurements tend to

PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.26883v1 | CC BY 4.0 Open Access | rec: 23 Apr 2018, publ: 23 Apr 2018 treat the phylogeny as a graph and make calculations about the direct application of many techniques de- its topology. Tucker et al. group them into three broad cat- signed to operate over sequences such as sequential pattern egories: assessments of the quantity of evolutionary history mining, trend analysis, et cetera (Han et al., 2011). represented by a population, assessments of the amount of Only asexual lineages where genetic material is exclu- divergence within that evolutionary history, and assessments sively vertically transmitted can be directly abstracted as of the topological regularity of the . Such a linear sequence of states. Sexual reproduction (and any measurements can help quantify the behavior of the popula- form of horizontal gene transfer) complicates matters sig- tion as a whole, providing insight into interactions between nificantly as such lineages are more appropriately repre- its members. Thus, they are useful indicators of the presence sented by trees rooted at the extant organism, branching for of various types of eco-evolutionary dynamics. each contributor of genetic material. One possibility is to Here, we present three types of lineage measurements compress sexual lineages into linear sequences of states by and suggest adopting four phylogeny measurements from modeling sexual reproduction events as asexual reproduc- biology; these are lineage length, mutation accumulation, tion events, designating one parent to be a part of the lineage phenotypic volatility, depth of the most-recent common an- and considering the genetic contributions of other parents as cestor, phylogenetic richness, phylogenetic divergence, and sources of (mutations). The primary down- phylogenetic regularity. For each metric, we discuss its ap- side to this approach is its lossy-ness (i.e. the fact that it dis- plication and our intuition for what it can tell us about evolu- cards potentially important parentage information). Alter- tion. We evaluate our intuition on a set of two-dimensional, natively, we can extend our metrics to operate over the more real-valued optimization problems under a range of mutation complex state sequences that constitute the lineages of sexu- rates and selection strengths. For this work, we restrict our ally reproducing . One such approach would be to attention to asexually reproducing populations; however, we consider all possible ancestor paths for an extant individual, suggest how these metrics can extend to sexual populations. calculating a given metric for each of them and then averag- In addition to demonstrating a range of metrics that are ing the resulting values together. Another approach would useful to digital evolution research, we intend for this work be to divide an organism into its constituent parts that are in- to begin a conversation within the artificial life commu- herited atomically (such as genes or instructions, depending nity about how we quantify, interpret, and compare ob- on the representation); an organism would then be viewed as served evolutionary histories. There have been extensive ef- a collection of lineages rather than a single one. Assessing forts to improve our ability to represent and visualize both the efficacy of these and potentially other approaches would lineages and phylogenies (Standish and Galloway, 2002; be a useful line of research to pursue in the future. Burlacu et al., 2013; McPhee et al., 2016b,a; Lalejini and Ofria, 2016), which are indispensable for building intuitions and qualitatively understanding the dynamics embedded in Lineage Length Lineage length describes the number of a population’s evolutionary history. However, we are un- states traversed by a lineage. If a state is defined as a single aware of efforts to formalize a suite of quantitative lineage individual, lineage length is a count of the number of genera- and phylogeny-based metrics for computational evolution. tions. Generation count is most useful in systems where gen- erational turnover is not fixed, but instead determined by the Metrics life history strategies of organisms. For lineages that span Code for all of our metrics is open source and avail- equal lengths of time, more generations imply faster repli- able in the Empirical (https://github.com/ cation rates (e.g. r-selected lineage) while fewer generations devosoft/Empirical). Empirical is a C++ library imply slower replication rates (e.g. K-selected lineage). built to facilitate writing efficient and easily sharable scien- Lineage length becomes a more flexible and informa- tific software. Empirical is a header-only library, so adding tive metric if we consider more abstract definitions of states these metrics to an existing project has minimal overhead. along a lineage. We might measure lineage length where a state represents a sequence of individuals that share a partic- Lineage Metrics ular phenotypic or genotypic characteristic. In these cases, Each of the three lineage metrics that we discuss — lineage lineage length only increases when the characteristic of in- length, mutation accumulation, and phenotypic volatility — terest changes from parent to offspring. For example, in an reduces a lineage to a linear sequence of states where each environment where organisms must perform tasks to be suc- state represents an individual or sequence of individuals that cessful, we might define state as the set of tasks performed share a common genotypic or phenotypic characteristic of by an individual. In this scenario, lineage length would only interest; Figure 1 is given as a toy example to help guide increase when the set of tasks performed by an ancestor our discussion of these metrics. While we limit our focus changes; sequential ancestors that perform the identical sets to three lineage metrics, this abstraction places lineages in a of tasks would be compressed into a single state in the se- form suitable for a wide range of measurements, including quence, even if other traits differ.

PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.26883v1 | CC BY 4.0 Open Access | rec: 23 Apr 2018, publ: 23 Apr 2018 Figure 1: Four methods of representing a lineage. This example lineage has accumulated three mutations (one reverse mutation and two substitutions) and gone through three distinct phenotypes. In (A), each state along the lineage represents a single individual; lineage length is the number of generations spanned by the lineage (eight). In (B), states represent the sequence of genotypes along the lineage, reducing lineage length to four. In (C) states represent the sequence of phenotypes along the lineage; lineage length is the number of times a different phenotype is expressed (three). In (D), states are a particular phenotypic characteristic; here, lineage length is two.

Mutation Accumulation Mutation accumulation defines eage (although the same concept can be applied to specific a set of measurements that track mutational changes across phenotypic traits or other types of state). In systems with a lineage. These changes can be measured as the magnitude discrete/categorical phenotypes, this can be measured by of the change (for real-valued genomes) or as the total count summing the number of times the phenotype changes. A re- of changes (for discrete-valued genomes). Mutation effects lated but subtly different measurement in such systems is the can also be tracked to gain insights about their distribution number of unique phenotypes on a lineage. In most cases, along a given lineage. Measures of mutation accumulation these values will be similar; a discrepancy would suggest along the lineages of successful individuals can help tease that the lineage was cycling through a set of phenotypes. apart the relative importance of different types of mutational Such behavior could, for example, be indicative of some events when compared to what is expected by chance. form of evolutionary bet-hedging (Beaumont et al., 2009). In conjunction with collected fitness information, the In systems with continuous-valued phenotypes, a subtly class of a mutation (e.g. beneficial, deleterious, or neutral) different approach is needed to measure phenotypic volatil- can also be tracked. Different evolutionary conditions are ity, because there are no discrete state transitions. Instead, expected to cause different distributions of mutations along we can measure the overall variance in phenotype along a a lineage (Barrick and Lenski, 2013); deviations revealed lineage. In some cases, it may be desirable to smooth out by measures of mutation accumulation can act as a barome- the noise inherent in a real-valued phenotype. We can do ter for unexpected evolutionary dynamics. The number and so by instead taking the variance of the moving average of magnitude of deleterious mutations along a lineage can tell fitness, to more closely approximate the idea of measuring us both about the ruggedness of the fitness landscape, and phase transitions. about a lineage’s ability to cross fitness valleys (Covert et al., 2013). Similarly, an elevated measure of neutral mutations Summary statistics Each of these metrics can be calcu- relative to beneficial or deleterious mutations can tell us that lated for each member of the population at each time step. the fitness landscape has a large neutral area that the lineage Doing so, however, would produce an amount of data so is spending most of its time drifting around. large that it would be difficult to make sense of. Instead, we need to come up with ways to generate useful summaries. Phenotypic Volatility Phenotypic volatility addresses the There are two main approaches to doing so: 1) choose a rate at which phenotype changes as you move down a lin- small number of representative lineages from a given time

PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.26883v1 | CC BY 4.0 Open Access | rec: 23 Apr 2018, publ: 23 Apr 2018 point, or 2) collect summary statistics about the distribution logical interactions within the population. The metrics dis- of metric values across the population. cussed here can provide insight into the of those in- A single lineage can be chosen by selecting the lineage teractions and their long-term evolutionary effects. As a re- of a representative organism (either the most fit or the more sult, they are often referred to as phylogenetic diversity met- numerous; here we use the most fit). In populations where rics (Tucker et al., 2017). diverse strategies coexist, this approach can be uninforma- An important distinction between phylogenies in natural tive as any one lineage is unlikely to be representative of all versus computational systems is that natural phylogenies are successful lineages. One alternative is to filter out lineages generally inferred from extant taxa, whereas computational that do not have offspring some predetermined number of phylogenies are directly recorded. Inferred phylogenies do generations later as such lineages were likely not represen- not contain internal nodes except at branch points. They tative of an important subset of the population. Still, any ap- also do not contain history prior to the most recent common proach based on measuring only a subset of lineages can be ancestor (MRCA) of all extant organisms. For consistency, challenging to interpret when the current dominant lineage we exclude pre-MRCA taxa from our analyses. However, (or lineages) is replaced with a different one; such changes we will not remove non-branching internal nodes, as these can introduce a discontinuity if the value is being measured only serve to make our phylogenies more informative. over time. If graceful responses to changes in which lineage Here we provide a high-level summary of phylogeny met- is dominant are required, it can be advantageous to instead rics that we expect will be particularly useful. For more met- measure summary statistics (e.g. mean, variance, and range) rics and more detail on all of these metrics, see (Winter et al., across the entire population. 2013; Tucker et al., 2017). In scenarios with frequent selective sweeps, the dominant lineage will likely be similar to the average lineage, as most Depth of Most-Recent Common Ancestor The depth of of the population will be closely related. When the pop- the MRCA (i.e., the number of steps it is from the original ulation contains more phylogenetic diversity, however, the ancestor) is an informative metric and is easy to calculate. dominant lineage may differ from the mean. Of course, the A recent MRCA implies frequent selective sweeps and less nature of such differences is likely informative about the long-term stable coexistence between clades. Measuring the evolutionary dynamics occurring in the population. frequency with which the MRCA changes (i.e. the number of coalescence events) can also be informative, as some con- Phylogeny metrics ditions can inflate the length of the lineage relative to other conditions without actually increasing the frequency of se- These metrics operate on entire phylogenies rather than sin- lective sweeps. This scenario is particularly likely when the gle lineages within a population, eliminating the need to population size is changing over time. A downside to the identify a representative organism or lineage. Because they depth of MRCA as a metric is that any population that does use data from the entire population, phylogeny metrics can have a stable ecology will likely never change its MRCA af- be more computationally expensive to calculate than single ter the very beginning of evolution (which at least allows us lineage metrics. On the other hand, because most lineages to detect stably coexistence in the population). tend to share substantial history, phylogeny metrics can usu- ally be calculated more rapidly than full-population lineage Phylogenetic Richness Measurements of phylogenetic metrics. Note that phylogenies can be constructed with re- richness quantify the total amount of evolutionary history gard to any taxonomic level of organization, be it individual, contained in a set of taxa. The most traditional metric of genotype, phenotype, etc. Thus, when we refer generally to phylogenetic richness is “Phylogenetic Diversity”, which is items in a phylogeny, we will use the term taxa. calculated as the number of nodes in the minimum spanning A standard technique for saving memory and time when tree from the MRCA to all extant taxa (Faith, 1992). An- working with phylogenies in computational systems is to other approach is to calculate the pairwise distances between “prune” them, removing dead (extinct) branches. Since all all taxa and sum them (Tucker et al., 2017). A third approach of the phylogeny metrics we discuss here are borrowed from is to sum evolutionary distinctiveness, a measurement of a natural systems (where we do not have information about taxon’s evolutionary uniqueness (Isaac et al., 2007), across taxa without offspring), they all are designed to work on all extant taxa (Tucker et al., 2017). pruned phylogenies. Thus, for the remainder of this paper, we will assume we are working with pruned phylogenies. Phylogenetic Divergence Measurements of phylogenetic In populations without ecological forces promoting co- divergence quantify how distinct the taxa in the population existence, phylogenies should coalesce periodically, result- are from each other and are often averaged across individ- ing in pruned lineages that mostly consist of a single path. ual taxa. For example, one option is to average the pair- When there is strong selection, this coalescence should hap- wise distances across all taxa in the population (Webb and pen even more rapidly. Thus, phylogenies with topologies Losos, 2000). Similarly, phylogenetic divergence can be cal- that deviate from that expectation are an indication of eco- culated by averaging the evolutionary distinctiveness across

PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.26883v1 | CC BY 4.0 Open Access | rec: 23 Apr 2018, publ: 23 Apr 2018 For each test problem, we evolved populations of 1000 organisms under a range of mutation rates and selection strengths for 5000 generations. Each organism’s genome consisted of two floating point numbers that defined its po- sition in the fitness landscape. We initialized populations by randomly generating a number of organisms equal to the population size. To determine which organisms reproduced each generation, we used tournament selection. We evolved populations under five different tournament sizes: one, two, four, eight, and sixteen. Tournament size represents strength of selection where higher tournament sizes correspond to a strong selection and lower tournament sizes correspond to weak selection (Blickle and Thiele, 1995). A tournament size of one is equivalent to no selection pressure (i.e. as ev- ery organism in the population has an equal chance of being selected to reproduce). Organisms selected to reproduce did so asexually. Values in an offspring’s genome were mutated by adding noise given by a normal distribution with a mean of 0; the ‘mutation rate’ of a treatment defined the standard deviation used to define this normal distribution and was given as a proportion of the test problem’s domain. We pre- Figure 2: The fitness landscapes used in this experiment: vented mutations from causing a value to exceed the valid A) Himmelblau, B) Six-humped Camel Back, C) Shubert, domain of the given problem. For each problem and tour- and D) Composition Function 2. Interactive versions avail- able at https://emilydolson.github.io/fitness_ nament size, we evolved populations at eight mutation rates: landscape_visualizations. 1e-08, 1e-07, 1e-06, 1e-05, 1e-04, 1e-03, 1e-02, and 1e-01. We also ran a second set of experiments to explore each taxon in the population. the impact of ecological dynamics on these metrics. For these experiments, we generated a stable ecology use the Phylogenetic Regularity Measurements of phylogenetic Eco-EA as a selection technique (Goings et al., regularity quantify how balanced the branches are in a phy- 2012). Eco-EA is a technique for creating niches that pro- logeny and are often the variances of values calculated for mote stable diversification in the context of an evolution- individual taxa. Just as the mean of the pairwise distances ary algorithm. In our test problems, we created niches between all taxa in the population is a measurement of phy- associated with spatial locations across the fitness land- logenetic divergence, taking their variance is a measurement scape. For all experimental conditions, we ran ten repli- of phylogenetic regularity. The same is true of the variance cates, each with a unique random number seed. Our exper- of evolutionary distinctiveness across the population. iment is implemented using the Empirical library; our im- plementation can be found at https://github.com/ Test Problems stevenjson/ALife2018-Lineage. To understand the metrics defined above, the test problems Data Analysis used need to be well understood and studied. The bench- mark functions from the GECCO Competition on Niching 3D visualizations Methods meet both of these requirements and allow us to In order to make these metrics useful, we must have an accu- visualize the actual fitness landscape, due to the low dimen- rate understanding of how various measurements correspond sionality of the problems (Li et al., 2013). For each prob- to the actual behavior of lineages. The most direct way to lem, the X and Y coordinates offered by a given organism confirm our expectations is to visualize the path that each are translated by the function into a fitness value. We chose lineage takes through the fitness landscape, mapping the x, y, a diverse subset of these functions (Himmelblau, Shubert, and z (fitness) coordinates of each ancestor of each member Composition Function 2, and Six-Humped Camel Back) as of the population (Virgo et al., 2017). Creating such a visu- our test problems in order to gain a broad understanding of alization entails condensing a large quantity of information our metrics. We used the implementations of these problems into a limited space. When projected onto two dimensions, at https://github.com/mikeagn/CEC2013 (C++ lineages can obscure parts of the fitness landscape (and each for fitness calculations during evolution, Python for post-hoc other). To mitigate this problem, we built a thee-dimensional analysis). Figure 2 shows the fitness landscapes defined by (see Figure 3) using the A-Frame Framework, each of our four chosen test problems. available at https://www.aframe.io.

PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.26883v1 | CC BY 4.0 Open Access | rec: 23 Apr 2018, publ: 23 Apr 2018 Figure 3: A close-up on two adjacent peaks in the Shubert func- tion fitness landscape. Lineages are depicted as paths fading from white to black over evolutionary time. The lienages shown here evolved under a mutation rate of 0.01. A) Was evolved using a tournament size of 2, whereas B) was evolved using a tournament size of 16. These figures neatly illustrate how increased tournament size keeps the lineage near the tops of the peaks.

A-Frame supports building 3D scenes and rendering them to a variety of platforms. In the simplest case, the visualization is rendered in WebGL, allowing it to be viewed in a standard web browser. Mouse interactions such as rotation make it possible to view the visualization from all angles, and WebGL’s use of the graphics card allows it to render data-rich visualizations. A-Frame also supports rendering the page with WebVR, allowing it to be viewed using various headsets. These platforms allow the user to explore the data in three dimensions. For the data interpretation in this paper, we used an Oculus Rift to provide us with fine-grained control of which Figure 4: Values of example metrics across different mutation rates for each of the four problems. All lineage-based metrics are part of the visualization we were looking at. Our full calculated on the lineage of the fittest organism at the final time visualization, complete with data, can be viewed on the point; population-level means behaved similarly. All experiments web or in VR at https://stevenjson.github. shown here used a tournament size of 4. Circles are medians, ver- io/ALife2018-Lineage/analysis/fitness_ tical lines show inter-quartile range, and shaded area is a boot- landscape_visualizations/. strapped 95% confidence interval around the mean. Note that both axes are on log scales. Metric analysis Material). Variance of evolutionary distinctiveness and pair- We analyzed trends in our metrics using the R Statisti- wise distance between taxa (phylogenetic regularity met- cal Computing Language (R Core Team, 2017). Specif- rics) behaved similarly to the phylogenetic divergence met- ically, we used the ggplot2 library for all graphs in- rics. This pattern makes sense, as most phylogenetic diver- cluded in this paper (Wickham, 2009). All analy- gence on these landscapes will produce unbalanced phylo- sis scripts are available at https://github.com/ genetic trees. If there were stable coexistence between mul- stevenjson/ALife2018-Lineage. tiple clades, we would expect to see a reduced correlation between the phylogenetic divergence metrics and the phy- Results and Discussion logenetic regularity metrics. Increased mutation rate also Overall, our results were consistent with evolutionary the- increases the number of deleterious steps taken, a logical ory. As mutation rate increases, coalescence takes longer, consequence of increasing mutation relative to strength of as evidenced by the fact that the MRCA is farther back in selection. time at higher mutation rates (see Figure 4). Consequently, Similarly, increasing tournament size generally increases phylogenetic richness (as measured by phylogenetic diver- the rate of coalescence, as higher tournament sizes corre- sity) is higher at high mutation rates. Phylogenetic diver- spond to stronger selection (see Figure 5). As a result, all of gence, measured here as mean pairwise distance between the measurements of phylogenetic richness and divergence taxa, is similarly higher at high mutation rates. Evolutionary decrease as tournament size increases. MRCA depth, on the distinctiveness, being another measurement of phylogenetic other hand, increases, directly reflecting the increased fre- divergence, behaved almost identically (see Supplementary quency of selective sweeps.

PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.26883v1 | CC BY 4.0 Open Access | rec: 23 Apr 2018, publ: 23 Apr 2018 Figure 5: Values of example metrics across different tournament Figure 6: Values of example metrics across different mutation sizes for each of the four problems. All experiments shown here rates for each of the four problems under a diversity-preserving used a mutation rate of 0.001. All lineage-based metrics are calcu- selection regime, Eco-EA. All lineage-based metrics are calcu- lated on the lineage of the fittest organism at the final time point; lated on the lineage of the fittest organism at the final time point; population-level means behaved similarly. Circles are medians, population-level means behaved similarly. All experiments shown vertical lines show inter-quartile range, and shaded area is a boot- here used a tournament size of 4. Circles are medians, vertical lines strapped 95% confidence interval around the mean. Note that both show inter-quartile range, and shaded area is a bootstrapped 95% axes are on log scales. confidence interval around the mean. Note that both axes are on log scales. Surprisingly, there is no clear effect of tournament size on the count of deleterious steps along the dominant lineage (as at most mutation rates, they tend to remain on a single peak. evidenced by the fact that the confidence intervals all over- Thus, we can infer that this effect is the result of a drift- lap). Values for all selection schemes and tournament sizes like phenomenon where, at sufficiently high mutation rates, hover near 2,500, meaning that a deleterious step is taken in all members of the population are constantly somewhat dis- roughly half of the 5000 generations. This result is partially placed from their local fitness peak. an effect of mutation rate; at the lowest mutation rate, there Having reinforced our intuition about these metrics in a is a clear trend toward fewer deleterious steps as tournament simple system, we can now expand them to a slightly more sizes increase (see Supplementary Material). However, the complex system. A large proportion of interesting short- effect of mutation rate on the relationship between tourna- term evolutionary dynamics relate to interaction between in- ment size and dominant deleterious steps is complex, partic- dividuals in the population (i.e. ecological dynamics). In ularly for composition function 2 (see Supplementary Ma- particular, such interactions often promote the stable coex- terial). These trends likely share a common cause with the istence of clades occupying different niches. As such, it is thresholding effect evident in Figure 4, where the number of important to establish a baseline for how our metrics respond deleterious steps along the dominant lineage abruptly climbs to ecological coexistence. between mutation rates of 10−7 and 10−5 and remains rela- Indeed, the presence of stabilizing ecological dynamics tively flat over other mutation rates. Based on an inspection substantially changes the values we observe for most met- of the 3D fitness landscape visualizations, we can see that rics (see Figure 6). Perhaps the least surprising of these is this is not an effect of lineages moving from peak-to-peak; MRCA depth is far lower than it was for tournament selec-

PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.26883v1 | CC BY 4.0 Open Access | rec: 23 Apr 2018, publ: 23 Apr 2018 tion, reflecting the rarity of coalescence events under these Burlacu, B., Affenzeller, M., Kommenda, M., Winkler, S., and conditions. Consequently, phylogenetic diversity is higher, Kronberger, G. (2013). Visualization of genetic lineages and as the extant population represents a greater amount of evo- inheritance information in genetic programming. page 1351. ACM Press. lutionary history. Relatedly, mean pairwise distance among Covert, A. W., Lenski, R. E., Wilke, C. O., and Ofria, C. (2013). extant taxa is higher in the presence of ecology, as clades in Experiments on the role of deleterious mutations as stepping different niches continue to diverge. Interestingly, the rela- stones in adaptive evolution. Proceedings of the National tionship of many metrics (e.g. deleterious steps and phylo- Academy of Sciences, 110(34):E3171–E3178. ) to mutation rate is reversed in the presence Faith, D. P. (1992). Conservation evaluation and phylogenetic di- of ecology. Explaining the underlying mechanisms behind versity. Biological Conservation, 61(1):1–10. these distinctions is beyond the scope of this paper, but the Goings, S., Goldsby, H. J., Cheng, B. H., and Ofria, C. (2012). ease with which the metrics identified their presence clearly An ecology-based to evolve solutions to complex problems. Artificial Life, 13:171–177. indicates their power. Han, J., Pei, J., and Kamber, M. (2011). Data mining: concepts Conclusions and techniques. Elsevier. Isaac, N. J. B., Turvey, S. T., Collen, B., Waterman, C., and Baillie, Our goals for this work are two-fold: 1) to suggest a set J. E. M. (2007). Mammals on the EDGE: Conservation Priorities of metrics that will improve our capacity to quantitatively Based on Threat and Phylogeny. PLOS ONE, 2(3):e296. understand evolutionary histories in digital evolution exper- Lalejini, A. and Ofria, C. (2016). The Evolutionary Origins of iments, and 2) to spark a conversation in the computational Phenotypic Plasticity. In Proceedings of the Artificial Life Con- ference 2016 evolution community about how to quantify, interpret, and , pages 372–379. The MIT Press. compare observed evolutionary histories. With feedback Lenski, R. E., Ofria, C., Pennock, R. T., and Adami, C. (2003). The evolutionary origin of complex features. Nature, from the community, we will expand our suite of lineage 423(6936):139–144. and phylogeny metrics, compiling accessible descriptions Li, X., Engelbrecht, A., and Epitropakis, M. G. (2013). Bench- and examples of each metric. mark Functions for CEC2013 Special Session and Competi- We have demonstrated that these metrics behave reason- tion on Niching Methods for Multimodal Function Optimiza- ably on a set of toy problems with simple organisms. Hav- tion. page 10. ing established baseline expectations for their responses to McPhee, N. F., Casale, M. M., Finzel, M., Helmuth, T., and Spec- tor, L. (2016a). Visualizing Genetic Programming Ancestries. In common conditions, our next step is to apply these metrics Proceedings of the 2016 on Genetic and Evolutionary Compu- to more complex scenarios: populations of digital organisms tation Conference Companion, GECCO ’16 Companion, pages that we evolve in a variety of qualitatively different environ- 1419–1426, New York, NY, USA. ACM. ments where we would expect to observe a wide range of McPhee, N. F., Donatucci, D., and Helmuth, T. (2016b). Us- evolutionary dynamics. It is under these conditions that we ing Graph to Explore the Dynamics of Genetic Pro- expect that true value of these metrics to become clear. gramming Runs. In Genetic Programming Theory and Practice XIII, Genetic and , pages 185–201. Acknowledgements Springer, Cham. R Core Team (2017). R: A Language and Environment for Statisti- We thank members of the MSU Digital Evolution Lab for cal Computing. R Foundation for Statistical Computing, Vienna, helpful comments and suggestions on this manuscript. This Austria. research was supported by the National Science Foundation Standish, R. K. and Galloway, J. (2002). Visualising Tierras tree (NSF) through the BEACON Center (Cooperative Agree- of life using Netmap. In ALife VIII Workshop proceedings, page ment DBI-0939454), Graduate Research Fellowships to ED 171. and AL (Grant No. DGE-1424871), and NSF Grant No. Tucker, C. M., Cadotte, M. W., Carvalho, S. B., Davies, T. J., Fer- rier, S., Fritz, S. A., Grenyer, R., Helmus, M. R., Jin, L. S., Moo- DEB-1655715 to CO. Michigan State University provided ers, A. O., Pavoine, S., Purschke, O., Redding, D. W., Rosauer, computational resources through the Institute for Cyber- D. F., Winter, M., and Mazel, F. (2017). A guide to phyloge- Enabled Research and the Digital Scholarship Lab. Any netic metrics for conservation, community ecology and macroe- opinions, findings, and conclusions or recommendations ex- cology. Biological Reviews, 92(2):698–715. pressed in this material are those of the author(s) and do not Virgo, N., Agmon, E., and Fernando, C. (2017). Lineage selection necessarily reflect the views of the NSF or MSU. leads to evolvability at large population sizes. pages 420–427. MIT Press. References Webb, C. O. and Losos, A. E. J. B. (2000). Exploring the Phylo- genetic Structure of Ecological Communities: An Example for Barrick, J. E. and Lenski, R. E. (2013). Genome dynamics during Rain Forest Trees. The American Naturalist, 156(2):145–155. . Nature Reviews Genetics, 14(12):827. Wickham, H. (2009). ggplot2: elegant graphics for data analysis. Beaumont, H. J., Gallie, J., Kost, C., Ferguson, G. C., and Rainey, Springer New York. P. B. (2009). Experimental evolution of bet hedging. Nature, 462(7269):90. Winter, M., Devictor, V., and Schweiger, O. (2013). Phylogenetic diversity and nature conservation: where are we? Trends in Blickle, T. and Thiele, L. (1995). A of tour- Ecology & Evolution, 28(4):199–204. nament selection. In ICGA, pages 9–16. Citeseer.

PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.26883v1 | CC BY 4.0 Open Access | rec: 23 Apr 2018, publ: 23 Apr 2018