A Rooted Net of Life David Williams1*†, Gregory P Fournier2*†, Pascal Lapierre3, Kristen S Swithers1, Anna G Green1, Cheryl P Andam1 and J Peter Gogarten1*
Total Page:16
File Type:pdf, Size:1020Kb
Williams et al. Biology Direct 2011, 6:45 http://www.biology-direct.com/content/6/1/45 REVIEW Open Access A Rooted Net of Life David Williams1*†, Gregory P Fournier2*†, Pascal Lapierre3, Kristen S Swithers1, Anna G Green1, Cheryl P Andam1 and J Peter Gogarten1* Abstract Phylogenetic reconstruction using DNA and protein sequences has allowed the reconstruction of evolutionary histories encompassing all life. We present and discuss a means to incorporate much of this rich narrative into a single model that acknowledges the discrete evolutionary units that constitute the organism. Briefly, this Rooted Net of Life genome phylogeny is constructed around an initial, well resolved and rooted tree scaffold inferred from a supermatrix of combined ribosomal genes. Extant sampled ribosomes form the leaves of the tree scaffold. These leaves, but not necessarily the deeper parts of the scaffold, can be considered to represent a genome or pan- genome, and to be associated with members of other gene families within that sequenced (pan)genome. Unrooted phylogenies of gene families containing four or more members are reconstructed and superimposed over the scaffold. Initially, reticulations are formed where incongruities between topologies exist. Given sufficient evidence, edges may then be differentiated as those representing vertical lines of inheritance within lineages and those representing horizontal genetic transfers or endosymbioses between lineages. Reviewers: W. Ford Doolittle, Eric Bapteste and Robert Beiko. Open peer review reconstructed, revealing new aspects of the evolutionary Reviewed by W. Ford Doolittle, Eric Bapteste and Robert process. Beiko. For the full reviews, see the Reviewers’ Comments Substantial incongruities in gene histories and uneven section. taxonomic distributions of gene families within groups of organisms have challenged a tree-like bifurcating process Background as an adequate model to describe organismal evolution The use of DNA and protein sequence residues as charac- [4-6]. In addition, evidence is abundant that the evolu- ter states for phylogenetic reconstruction was a profound tionary history of Eukarya includes numerous primary, breakthrough in biology [1]. It has facilitated advances in secondary and tertiary endosymbiotic events often pro- population genetics and reconstructions of evolutionary viding important traits such as photosynthesis [7]. These histories encompassing all life with most of the molecular inferences have caused a shift in the consensus among diversity found among microorganisms [2]. While progress evolutionary biologists towards a view that the horizontal in theoretical aspects of reconstruction has allowed more transfer of genetic material relative to vertical inheritance confident and detailed inferences, it has also revealed the is a major source of evolutionary innovation [5,8,9]. With necessity for caution, as these inferences can be misleading a growing recognition for the need to represent more if methodologies are not applied with care. At the same than just the lines of vertical inheritance, various alterna- time, exponentially growing sequence databases including tive models have been suggested. These vary in detail but complete genome sequences [3] have allowed a more broadly describe a reticulated network representation of complete picture of biological lineages over time to be organismal relationships [4,6,10-12]. * Correspondence: [email protected]; [email protected]; The Rooted Net of Life [email protected] † Contributed equally In this manuscript we present a model, the Rooted Net of 1Department of Molecular and Cell Biology, University of Connecticut, Storrs, Life, in which the evolutionary relationships of organisms CT 06269-3125, USA aremorefullydescribedthaninexistingTreeofLife 2Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA concepts [13,14]. Importantly, we address the observation Full list of author information is available at the end of the article that organisms consist of many discrete evolutionary © 2011 Williams et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Williams et al. Biology Direct 2011, 6:45 Page 2 of 20 http://www.biology-direct.com/content/6/1/45 units: open reading frames, operons, plasmids, chromo- in practice, explicitly described by molecular phylogenies. somes and in some cases plastids and other organelles, As such, the ToCD is only knowable insofar that a verti- each with discrete and possibly different evolutionary his- cal signal is preserved; if all gene histories were domi- tories. These multiple histories are combined and plotted nated by random horizontal transfer, there would be no as a single reticulated network phylogenetic representa- connection between cellular and genetic history. Addi- tion in which misleading artifacts of reconstruction and tionally, the ToCD concept fails when a new cell is cre- loss of information due to the averaging of phylogenetic ated through the fusion of two cells. If this fusion is part signals are minimized. In some instances it may be possi- of the sexual life cycle, the principle of the ToCD is vio- ble to assign some edges as representative of ancestral lated, but the deviations may be inconsequential if phylo- vertical descent by genetic inheritance and other edges as geny is considered at a larger scale. However, instances reticulations due to horizontal genetic transfers. In other of symbioses that lead to lineage and/or cell fusions instances, this decision is less certain, for example, did between divergent partners (as in the serial endosymbio- the ancestor of the Thermotogales acquire the ribosome sis theory for eukaryogenesis, if mitochondria and plas- from a relative of the Aquificales, or did the Thermoto- tids are no longer considered individual cells) lead to gales acquire most of their genes from the clostridia? reticulations in the ToCD. Therefore, when all life is (See “Highways of Gene Sharing” below for details.) included, the ToCD does not represent a strictly bifurcat- Despite the distinct evolutionary histories among the ing process. genes in an organism, when they are found together in Bridging the gap between gene and species trees has an extant genome, they are assigned to the same terminal traditionally been approached via two methods: (1) super- node and edge that remains intact until their histories matrix methods, which seek to infer a species tree by the differ. This organism-genome definition includes his- concatenation of a large number of genes, integrating tories of endosymbioses, which evolved to a point of across many sites within aligned sequences to arrive at a bidirectional dependence e.g., mitochondria and plastids well-supported, comprehensive tree [17]; and (2) supertree with the “host” cell [7], but excludes parasitisms and methods, which integrate across phylogenies calculated for mutualisms in which partners are facultative or inter- many individual genes [18]. Both methods attempt to changeable e.g., the gut microflora of animals [15]. Ribo- arrive at a consensus phylogeny to approximate the spe- somal RNA and protein sequences are combined into a cies tree by overcoming the insufficient and occasionally supermatrix and used to infer a well-resolved phyloge- conflicting phylogenetic information that each molecular netic tree scaffold which we anticipate to mostly, but not unit (typically genes) can provide. However, if applied necessarily, approximate the vertical descent of a coher- indiscriminately, biased horizontal gene transfer can invali- ent biological entity (but see the “Endosymbioses” section date these methodologies, as multiple strong, distinct below). One terminal node may represent a group of phylogenetic patterns can exist within a dataset [10,19]. In sequenced genomes sharing very similar ribosomal this case, it is possible that the resulting phylogeny will sequences. All other genetic sequences including plas- not only be incorrect, but even contain bipartitions not mids and chromosomes are assigned to tips by member- supported by any subset of the data due to fallacious aver- ship within these ribosome-defined pan-genomes and are aging between signals [20]. While these approaches further grouped into homologous gene families across acknowledge that a comprehensive history of life must other tips. Reconstructed phylogenetic trees of each are take into account many individual gene histories, it is clear superimposed on top of the scaffold, forming reticula- that, at best, this is insufficient for capturing the true com- tions where necessary. plexity of the evolution of life. In supermatrix approaches, to avoid averaging over phy- The Ribosomal Tree Scaffold logenies with conflicting phylogenetic signal, gene families The complex relationship between individual genetic with conflicting gene phylogenies are usually removed. components and the evolutionary history of organisms This results in genome or species phylogenies that only must be well understood in order for a biologically mean- represent a small fraction of the genetic information ingful, comprehensive history of life to be assembled within each organism, the so-called “tree of one percent” from molecular data. Since species are propagated