Quantifying the tape of life: Ancestry-based metrics provide insights and intuition about evolutionary dynamics

Emily Dolson1,2,3, Alexander Lalejini1,2,3, Steven Jorgensen1,2 and Charles Ofria1,2,3
1 BEACON Center for the Study of Evolution in Action, Michigan State University, East Lansing, MI, 48824
2 Department of Computer Science and Engineering, Michigan State University, East Lansing, MI, 48824
3 Ecology, Evolutionary Biology, and Behavior Program, Michigan State University, East Lansing, MI, 48824
[email protected]

PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.26883v1 | CC BY 4.0 Open Access | rec: 23 Apr 2018, publ: 23 Apr 2018

Abstract

Fine-scale evolutionary dynamics can be challenging to tease out when focused on broad brush strokes of whole populations over long timespans. We propose a suite of diagnostic metrics that operate on lineages and phylogenies in digital evolution experiments, with the aim of improving our capacity to quantitatively explore the nuances of evolutionary histories. We present three types of lineage measurements: lineage length, mutation accumulation, and phenotypic volatility. Additionally, we suggest the adoption of four phylogeny measurements from biology: depth of the most-recent common ancestor, phylogenetic richness, phylogenetic divergence, and phylogenetic regularity. We demonstrate the use of each metric on a set of two-dimensional, real-valued optimization problems under a range of mutation rates and selection strengths, confirming our intuitions about what they can tell us about evolutionary dynamics.

Introduction

Evolution is a collective effect of many smaller events such as replication, variation, and competition that occur on a fine-grained temporal scale. While evolution's emergent nature can be fascinating, it also presents challenges to studying the short-term mechanisms that, in aggregate, govern long-term results. In computational evolutionary systems, we can theoretically collect data to help untangle these mechanisms. In practice, however, the sheer number of constituent events produces an overwhelming quantity of data. In response, we have developed a standardized suite of diagnostic metrics to summarize short-term evolutionary dynamics within a population by measuring lineages and phylogenies. Here, we describe these metrics and provide experimental results to develop an intuition for what they can tell us about evolution.

A lineage describes a continuous line of descent, linking parents and offspring in an unbroken chain from an original ancestor. A complete lineage can provide a post-hoc, step-by-step guide to the evolution of an extant organism where each step involves replication and inherited variation. Indeed, lineage analyses are a powerful tool for disentangling evolutionary dynamics in both natural and digital systems; digital systems, however, allow for perfect lineage tracking at a level of granularity that is impossible in modern wet lab experiments. These data allow us to replay the tape of life in precise detail and to tease apart the evolutionary recipe for any phenomenon we are interested in (McPhee et al., 2016b). In one notable example, Lenski et al. used the lineage of an evolved digital organism in Avida to tease apart, step by step, how a complex feature (the capacity to perform the equals logical operation) emerged (Lenski et al., 2003).

Yet, tracking the full details of a single lineage, much less a population of lineages, can be computationally expensive and will inevitably generate an unwieldy amount of data that can be challenging to visualize or interpret (McPhee et al., 2016a). Summary statistics can help alleviate these issues by enabling the user to focus on aggregate trends across a population rather than needing to examine each individual's lineage. The question is how to effectively summarize a path through fitness space. One useful abstraction is to treat the path as a sequence of states. Here, we use phenotypes and genotypes as the states in the sequence, but we could just as easily use some other descriptor of the lineage's position in the fitness landscape at a given point in time. With this abstraction in hand, a few metrics are easily formalized: the number of unique states, the number of transitions between states, and the amount of time spent in each state. Additionally, we may care about how the transitions between states happened. What mutations led to them? Were those mutations beneficial, deleterious, or neutral at the time? These mutations are particularly notable because they did not simply appear briefly, but stood the test of time, leaving descendants in the final population. Here, we explore a subset of these metrics that we expect will be broadly useful.

Whereas a lineage recounts the evolutionary history of a single individual, a phylogeny details the evolutionary history of an entire population. Measurements that summarize phylogenies can provide useful insight into population-level evolutionary dynamics, such as diversification and coexistence among different clades. A variety of useful phylogeny measurements have already been developed by biologists (Tucker et al., 2017). These measurements tend to treat the phylogeny as a graph and make calculations about its topology. Tucker et al. group them into three broad categories: assessments of the quantity of evolutionary history represented by a population, assessments of the amount of divergence within that evolutionary history, and assessments of the topological regularity of the phylogenetic tree. Such measurements can help quantify the behavior of the population as a whole, providing insight into interactions between its members. Thus, they are useful indicators of the presence of various types of eco-evolutionary dynamics.

Here, we present three types of lineage measurements and suggest adopting four phylogeny measurements from biology; these are lineage length, mutation accumulation, phenotypic volatility, depth of the most-recent common ancestor, phylogenetic richness, phylogenetic divergence, and phylogenetic regularity. For each metric, we discuss its application and our intuition for what it can tell us about evolution. We evaluate our intuition on a set of two-dimensional, real-valued optimization problems under a range of mutation rates and selection strengths. For this work, we restrict our attention to asexually reproducing populations; however, we suggest how these metrics can extend to sexual populations.

In addition to demonstrating a range of metrics that are useful to digital evolution research, we intend for this work to begin a conversation within the artificial life community about how we quantify, interpret, and compare observed evolutionary histories. There have been extensive efforts to improve our ability to represent and visualize both lineages and phylogenies (Standish and Galloway, 2002; Burlacu et al., 2013; McPhee et al., 2016b,a; Lalejini and Ofria, 2016), which are indispensable for building intuitions and qualitatively understanding the dynamics embedded in a population's evolutionary history. However, we are unaware of efforts to formalize a suite of quantitative lineage and phylogeny-based metrics for computational evolution.

Metrics

Code for all of our metrics is open source and available in the Empirical [...]

[...] the direct application of many data mining techniques designed to operate over sequences such as sequential pattern mining, trend analysis, et cetera (Han et al., 2011).

Only asexual lineages where genetic material is exclusively vertically transmitted can be directly abstracted as a linear sequence of states. Sexual reproduction (and any form of horizontal gene transfer) complicates matters significantly, as such lineages are more appropriately represented by trees rooted at the extant organism, branching for each contributor of genetic material. One possibility is to compress sexual lineages into linear sequences of states by modeling sexual reproduction events as asexual reproduction events, designating one parent to be a part of the lineage and considering the genetic contributions of other parents as sources of genetic variation (mutations). The primary downside to this approach is its lossiness (i.e. the fact that it discards potentially important parentage information). Alternatively, we can extend our metrics to operate over the more complex state sequences that constitute the lineages of sexually reproducing organisms. One such approach would be to consider all possible ancestor paths for an extant individual, calculating a given metric for each of them and then averaging the resulting values together. Another approach would be to divide an organism into its constituent parts that are inherited atomically (such as genes or instructions, depending on the representation); an organism would then be viewed as a collection of lineages rather than a single one. Assessing the efficacy of these and potentially other approaches would be a useful line of research to pursue in the future.

Lineage Length. Lineage length describes the number of states traversed by a lineage. If a state is defined as a single individual, lineage length is a count of the number of generations. Generation count is most useful in systems where generational turnover is not fixed, but instead determined by the life history strategies of organisms. For lineages that span equal lengths of time, more generations imply faster repli-
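
The sequence-of-states abstraction described in the Introduction can be illustrated with a minimal sketch. It assumes an asexual lineage stored as a chronological list of hashable state labels, one per individual, and it measures time in generations. The function name `summarize_lineage` and the dictionary keys are hypothetical and are not taken from the Empirical library.

```python
from collections import Counter
from itertools import groupby


def summarize_lineage(states):
    """Summarize a lineage given as a chronological list of hashable states.

    Assumption: each entry is the state (e.g. a phenotype label) of one
    individual on the line of descent, ordered from ancestor to extant organism.
    """
    # Collapse consecutive repeats into (state, run_length) pairs.
    runs = [(state, sum(1 for _ in group)) for state, group in groupby(states)]

    # Total number of generations spent in each state, across all runs.
    time_in_state = Counter()
    for state, duration in runs:
        time_in_state[state] += duration

    return {
        "generations": len(states),            # lineage length, one state per individual
        "unique_states": len(set(states)),     # distinct states ever visited
        "transitions": max(len(runs) - 1, 0),  # changes between adjacent states
        "time_in_state": dict(time_in_state),
    }


# Hypothetical lineage that revisits state "A" after a detour through "B".
lineage = ["A", "A", "B", "B", "B", "A", "C"]
summary = summarize_lineage(lineage)
print(summary)
```

Counting transitions over runs of identical states, rather than over individuals, keeps the number of state changes separate from the generation count, so a long but phenotypically static lineage reports many generations and few transitions.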
