The History of Life Through Evolutionary Computation

Gary B. Fogel
IEEE Potentials, August/September 2005. 0278-6648/05/$20.00 © 2005 IEEE

Prior to our understanding of DNA and molecular sequence information, paleontologists inferred the history of life on Earth by comparing morphological characteristics (preserved phenotypic traits) found in fossils. Accurate estimation of this history of life, or “phylogeny,” was generally possible only for those organisms that had “hard parts” capable of preservation, died in a “fossilization-friendly” context, and were unearthed over time only to be discovered by a very fortunate paleontologist. According to Newton and Laporte, paleontologists are the first to recognize the incompleteness of the fossil record and the resulting difficulty of formulating evolutionary relationships between extant organisms. It is a shame that many of Earth’s organisms have very likely left no trace of their existence.

Our ability to infer evolutionary histories changed significantly following the discovery of the genetic code. Fitch and Margoliash were the first to offer a computer process for the construction of phylogenetic trees from protein sequence information, specifically for 20 cytochrome C protein sequences (the most for any protein to that date) that had been elucidated from humans to yeast. This publication provided a tremendous opportunity for the use of biological sequence information as “molecular fossils” that could be compared between extant organisms to determine their evolution. Seemingly all that was required was additional sequence information and better computers and algorithms for their interpretation. Assessing the reliability of phylogenies involves difficult statistical and computational problems, including the NP-complete problems of sequence alignment and discovering the best phylogenetic tree that fits the data.

Given modern databases filled with sequence information, interest has turned from generating sequence data to the rapid interpretation and discovery of “true” phylogenies, for application not only in resolving the history of life but also in epidemiology as it relates to human disease. Three developments have been essential in this progression: 1) the development of criteria and algorithms for discriminating among potential phylogenies, 2) increased computational power over time, and 3) the rapid increase in sequence data availability. An assortment of algorithms has been offered to solve the phylogenetic reconstruction problem, some using evolutionary algorithms.

Challenges in phylogenetics

The phylogenetics problem can be loosely defined as the search for a treelike structure that defines ancestral relationships between related objects over time. The related objects can be anything from biological sequence information to morphological characters. The divergence represented over evolutionary time is captured in a treelike structure termed a “phylogeny” (Fig. 1). The number of unrooted, bifurcating tree topologies T for n taxa is given by

$$T = \frac{(2n-5)!}{(n-3)!\,2^{n-3}} \tag{1}$$

              |         ||   |        ||
    Human     AATAATATTCTTTATGACAACTCATTTGATTT
    Monkey    AATAATATTCTTTATGACAATTCATTTGATTT
    Mouse     AATAATATTCTTTATCGCAACTCATTTGATTT
    Kangaroo  AATAAAATTCTTTATCGCAATTCATTTGAAAT

Fig. 1 Phylogenetic reconstruction. The nucleotide sequences for four organisms are aligned and in this simple example are highly similar except for six positions noted as (“|”). At each of these informative positions (characters), change has occurred over evolutionary time. All other nucleotide positions are invariant and not required for proper phylogenetic reconstruction. Combining key characters and resolving the true phylogeny is challenging when the number and type of characters/taxa increases.

In general, three possible methods are used to search for the best topology from this set of possible trees: 1) exhaustive methods, 2) branch and bound methods, and 3) heuristic methods. In analogy to the traveling salesman problem (TSP), exhaustive methods are useful when the number of possible tree topologies (TSP routes) is low. This is the case when the number of taxa n being compared is low (on the order of 12 or fewer), but exhaustive search rapidly becomes infeasible with larger numbers of taxa. Branch and bound methods exclude trees that do not meet specific criteria, reducing the search space to a more reasonable size and increasing the probability of an exact solution of merit. However, there are still limitations on the upper number of taxa that can be used with this approach.

Heuristic searches commonly are used to build tree topologies either by changing the order in which the trees are built or via branch swapping, with some scoring metric (such as parsimony) being used to determine which changes are more useful than others. It is easy to envision how evolutionary computation can be applied in this regard as a method of global optimization where the best resulting tree topology is estimated after searching only a small fraction of the search space. An unfortunate consequence of this approximation is that the resulting tree topology cannot be guaranteed to be optimal; however, it does allow the researcher to efficiently search large numbers of character states and infer “reasonable” historical relationships between organisms. Additional confidence in a proposed phylogeny results from overlap of “best” trees generated from different sequence or character sets.

It should be noted that one key discovery of 20th century evolutionary biology was the determination of a “tree of life” based purely on molecular sequences for a set of genes (ribosomal RNA genes) that are common to essentially all forms of extant life. According to Pace, the tree of life has reshaped our thinking of the evolution of life on Earth and helped us identify the three major domains of life (eukarya, bacteria, and archaea). Phylogenetic methods are now integral parts of almost every major area of evolutionary biology and several parts of ecology. Those interested in learning more about recent discoveries in this area are referred to Assembling the Tree of Life and the Tree of Life web project <http://tolweb.org/tree/phylogeny.html>.

Phylogenetic methods

Correct alignment of the sequence information is generally a critical first component of phylogenetic analysis and can be difficult when there are large numbers of sequences to be compared, when the sequences are long, and when the sequences have limited evolutionary conservation.

Parsimony is a common method used to infer phylogenetic trees. This method is based on the hypothesis that the “best” tree in the space of all possible trees is the one that explains the relationship between the extant taxa with the fewest evolutionary changes throughout the tree. The principle of inheritance implies that when a characteristic state is modified in a species, the descendants of that species will have a high probability of sharing that characteristic. Thus the most parsimonious tree is the one that groups taxa together with similar characteristics and minimizes the number of observed overall changes in the tree. There are several statistical inconsistencies with this approach, most notably long-branch attraction and unequal rates of evolution.

Distance methods are a second common method to infer phylogeny. These methods use pairwise distance calculations for characters (e.g., a pairwise calculation of mismatches measured across all positions of a nucleotide sequence alignment). A model of sequence evolution is then applied to the distance matrix to correct for unobserved changes and chance similarities. If the model of sequence evolution is accurate, then the correct tree can be recovered. These models may be specific to the characters that are being investigated. It is clear that models of sequence evolution do not work equally well over all character sets. Thus, this approach can work only as well as the underlying model of sequence evolution.

Maximum likelihood methods represent a third common method of phylogenetic inference. This approach is generally similar to maximum parsimony. However, maximum likelihood methods use a specified model of sequence evolution to assign a value of confidence to ancestral states. Maximum parsimony methods assume that a shared character between two extant taxa must have also existed in the ancestor of those two taxa. Maximum likelihood methods assign a confidence to the existence of that ancestral character based on the distance (or length of time) that exists between the two extant taxa. As a result, maximum likelihood methods tend to be more computationally intensive than parsimony methods. They also share the same requirements as distance methods for a model of sequence evolution specific to the character set under investigation. However, maximum likelihood approaches can improve tree inference when the sequences are not closely related. All three of the above approaches are the subject of considerable investigation in the literature and are the subject of optimization with simulated evolution.

Applications of evolutionary computation to phylogenetics

Evolutionary computation has been applied to phylogenetic reconstruction for 10 years. Matsuda was the first to use evolutionary algorithms for phylogenetic reconstruction, doing so with protein sequences. Since that time, these strategies have been extended for PAUP
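
As a rough illustration of Equation (1) and of exhaustive search under the parsimony criterion described earlier, the following Python sketch counts the unrooted, bifurcating topologies for n taxa and then scores all three possible topologies for the four sequences in Fig. 1. It is only a sketch under stated assumptions: the function names are hypothetical, and the per-site change count uses Fitch's standard small-parsimony procedure, which the article does not name and which is swapped in here as one common way to count changes on a fixed tree.

```python
from math import factorial

def num_unrooted_topologies(n_taxa):
    """Equation (1): count of unrooted, bifurcating topologies for n >= 3 taxa."""
    if n_taxa < 3:
        raise ValueError("Equation (1) applies for n >= 3 taxa")
    return factorial(2 * n_taxa - 5) // (factorial(n_taxa - 3) * 2 ** (n_taxa - 3))

# Aligned nucleotide sequences copied from Fig. 1.
SEQS = {
    "Human":    "AATAATATTCTTTATGACAACTCATTTGATTT",
    "Monkey":   "AATAATATTCTTTATGACAATTCATTTGATTT",
    "Mouse":    "AATAATATTCTTTATCGCAACTCATTTGATTT",
    "Kangaroo": "AATAAAATTCTTTATCGCAATTCATTTGAAAT",
}

def fitch(tree, site):
    """Return (possible ancestral states, minimum changes) for one alignment column.

    A tree is either a taxon name (leaf) or a pair (left_subtree, right_subtree).
    Assumption: Fitch's algorithm with equal cost for every substitution.
    """
    if isinstance(tree, str):
        return {SEQS[tree][site]}, 0
    left_states, left_cost = fitch(tree[0], site)
    right_states, right_cost = fitch(tree[1], site)
    shared = left_states & right_states
    if shared:
        # The subtrees can agree on a state, so no extra change is needed here.
        return shared, left_cost + right_cost
    # No shared state: one additional change must have occurred.
    return left_states | right_states, left_cost + right_cost + 1

def parsimony_score(tree):
    """Total number of changes over all alignment columns for a given topology."""
    n_sites = len(next(iter(SEQS.values())))
    return sum(fitch(tree, site)[1] for site in range(n_sites))

def quartet_topologies(taxa):
    """The three unrooted topologies for four taxa, written as nested pairs.

    Each is rooted arbitrarily on its internal branch; with equal substitution
    costs the parsimony score does not depend on where the root is placed.
    """
    a, b, c, d = taxa
    return [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]

if __name__ == "__main__":
    for n in (4, 8, 12):
        print(n, "taxa:", num_unrooted_topologies(n), "topologies")
    candidates = quartet_topologies(list(SEQS))
    for tree in candidates:
        print(parsimony_score(tree), tree)
    print("most parsimonious:", min(candidates, key=parsimony_score))
```

With four taxa the search space has only three topologies, so every tree can be scored directly. The count from Equation (1) grows so quickly with n that the same loop becomes impractical for larger data sets, which is the motivation given above for branch and bound and heuristic searches.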
