<<

Williams et al. Biology Direct 2011, 6:45 http://www.biology-direct.com/content/6/1/45

REVIEW Open Access A Rooted Net of Life David Williams1*†, Gregory P Fournier2*†, Pascal Lapierre3, Kristen S Swithers1, Anna G Green1, Cheryl P Andam1 and J Peter Gogarten1*

Abstract Phylogenetic reconstruction using DNA and protein sequences has allowed the reconstruction of evolutionary histories encompassing all life. We present and discuss a means to incorporate much of this rich narrative into a single model that acknowledges the discrete evolutionary units that constitute the organism. Briefly, this Rooted Net of Life genome phylogeny is constructed around an initial, well resolved and rooted tree scaffold inferred from a supermatrix of combined ribosomal genes. Extant sampled ribosomes form the leaves of the tree scaffold. These leaves, but not necessarily the deeper parts of the scaffold, can be considered to represent a genome or pan- genome, and to be associated with members of other gene families within that sequenced (pan)genome. Unrooted phylogenies of gene families containing four or more members are reconstructed and superimposed over the scaffold. Initially, reticulations are formed where incongruities between topologies exist. Given sufficient evidence, edges may then be differentiated as those representing vertical lines of inheritance within lineages and those representing horizontal genetic transfers or endosymbioses between lineages. Reviewers: W. Ford Doolittle, Eric Bapteste and Robert Beiko.

Open peer review reconstructed, revealing new aspects of the evolutionary Reviewed by W. Ford Doolittle, Eric Bapteste and Robert process. Beiko. For the full reviews, see the Reviewers’ Comments Substantial incongruities in gene histories and uneven section. taxonomic distributions of gene families within groups of organisms have challenged a tree-like bifurcating process Background as an adequate model to describe organismal The use of DNA and protein sequence residues as charac- [4-6]. In addition, evidence is abundant that the evolu- ter states for phylogenetic reconstruction was a profound tionary history of Eukarya includes numerous primary, breakthrough in biology [1]. It has facilitated advances in secondary and tertiary endosymbiotic events often pro- population and reconstructions of evolutionary viding important traits such as photosynthesis [7]. These histories encompassing all life with most of the molecular inferences have caused a shift in the consensus among diversity found among microorganisms [2]. While progress evolutionary biologists towards a view that the horizontal in theoretical aspects of reconstruction has allowed more transfer of genetic material relative to vertical inheritance confident and detailed inferences, it has also revealed the is a major source of evolutionary innovation [5,8,9]. With necessity for caution, as these inferences can be misleading a growing recognition for the need to represent more if methodologies are not applied with care. At the same than just the lines of vertical inheritance, various alterna- time, exponentially growing sequence databases including tive models have been suggested. These vary in detail but complete genome sequences [3] have allowed a more broadly describe a reticulated network representation of complete picture of biological lineages over time to be organismal relationships [4,6,10-12].

* Correspondence: [email protected]; [email protected]; The Rooted Net of Life [email protected] † Contributed equally In this manuscript we present a model, the Rooted Net of 1Department of Molecular and Cell Biology, University of Connecticut, Storrs, Life, in which the evolutionary relationships of organisms CT 06269-3125, USA aremorefullydescribedthaninexistingTreeofLife 2Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA concepts [13,14]. Importantly, we address the observation Full list of author information is available at the end of the article that organisms consist of many discrete evolutionary

© 2011 Williams et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Williams et al. Biology Direct 2011, 6:45 Page 2 of 20 http://www.biology-direct.com/content/6/1/45

units: open reading frames, operons, plasmids, chromo- in practice, explicitly described by molecular phylogenies. somes and in some cases plastids and other organelles, As such, the ToCD is only knowable insofar that a verti- each with discrete and possibly different evolutionary his- cal signal is preserved; if all gene histories were domi- tories. These multiple histories are combined and plotted nated by random horizontal transfer, there would be no as a single reticulated network phylogenetic representa- connection between cellular and genetic history. Addi- tion in which misleading artifacts of reconstruction and tionally, the ToCD concept fails when a new cell is cre- loss of information due to the averaging of phylogenetic ated through the fusion of two cells. If this fusion is part signals are minimized. In some instances it may be possi- of the sexual life cycle, the principle of the ToCD is vio- ble to assign some edges as representative of ancestral lated, but the deviations may be inconsequential if phylo- vertical descent by genetic inheritance and other edges as geny is considered at a larger scale. However, instances reticulations due to horizontal genetic transfers. In other of symbioses that lead to lineage and/or cell fusions instances, this decision is less certain, for example, did between divergent partners (as in the serial endosymbio- the ancestor of the Thermotogales acquire the ribosome sis theory for eukaryogenesis, if mitochondria and plas- from a relative of the Aquificales, or did the Thermoto- tids are no longer considered individual cells) lead to gales acquire most of their genes from the clostridia? reticulations in the ToCD. Therefore, when all life is (See “Highways of Gene Sharing” below for details.) included, the ToCD does not represent a strictly bifurcat- Despite the distinct evolutionary histories among the ing process. genes in an organism, when they are found together in Bridging the gap between gene and species trees has an extant genome, they are assigned to the same terminal traditionally been approached via two methods: (1) super- node and edge that remains intact until their histories matrix methods, which seek to infer a species tree by the differ. This organism-genome definition includes his- concatenation of a large number of genes, integrating tories of endosymbioses, which evolved to a point of across many sites within aligned sequences to arrive at a bidirectional dependence e.g., mitochondria and plastids well-supported, comprehensive tree [17]; and (2) supertree with the “host” cell [7], but excludes parasitisms and methods, which integrate across phylogenies calculated for mutualisms in which partners are facultative or inter- many individual genes [18]. Both methods attempt to changeable e.g., the gut microflora of animals [15]. Ribo- arrive at a consensus phylogeny to approximate the spe- somal RNA and protein sequences are combined into a cies tree by overcoming the insufficient and occasionally supermatrix and used to infer a well-resolved phyloge- conflicting phylogenetic information that each molecular netic tree scaffold which we anticipate to mostly, but not unit (typically genes) can provide. However, if applied necessarily, approximate the vertical descent of a coher- indiscriminately, biased horizontal gene transfer can invali- ent biological entity (but see the “Endosymbioses” section date these methodologies, as multiple strong, distinct below). One terminal node may represent a group of phylogenetic patterns can exist within a dataset [10,19]. In sequenced genomes sharing very similar ribosomal this case, it is possible that the resulting phylogeny will sequences. All other genetic sequences including plas- not only be incorrect, but even contain bipartitions not mids and are assigned to tips by member- supported by any subset of the data due to fallacious aver- ship within these ribosome-defined pan-genomes and are aging between signals [20]. While these approaches further grouped into homologous gene families across acknowledge that a comprehensive must other tips. Reconstructed phylogenetic trees of each are take into account many individual gene histories, it is clear superimposed on top of the scaffold, forming reticula- that, at best, this is insufficient for capturing the true com- tions where necessary. plexity of the evolution of life. In supermatrix approaches, to avoid averaging over phy- The Ribosomal Tree Scaffold logenies with conflicting phylogenetic signal, gene families The complex relationship between individual genetic with conflicting gene phylogenies are usually removed. components and the evolutionary history of organisms This results in genome or species phylogenies that only must be well understood in order for a biologically mean- represent a small fraction of the genetic information ingful, comprehensive history of life to be assembled within each organism, the so-called “tree of one percent” from molecular data. Since species are propagated by the [13,21]. While such empirical approaches naturally result reproduction of individuals within a population, and gen- in a dataset dominated by the ribosomal machinery, they erated by the divergence of populations over time, cytolo- are philosophically unsatisfying not only in that they disre- gically speaking, a single vertical tree of descent exists, at gard all other gene histories (many, if not most, of which least for prokaryotes that procreate through division of will be congruent across most of the tree, with the possible the parent cell. However, in principle, this “tree of cellu- exception of closely related groups where transfers are far lar divisions” [16] (ToCD) can only be indirectly inferred more frequent), but also because they are not definitive; from molecular data, as opposed to gene trees, which are, revisiting gene phylogenies and definitions of sequence Williams et al. Biology Direct 2011, 6:45 Page 3 of 20 http://www.biology-direct.com/content/6/1/45

similarity with more advanced techniques could always ribosomal dataset. As many ribosomal protein sequences add or remove genes from the dataset, affecting the are very short, there is limited phylogenetic information inferred conclusions. The history of accounting for hori- per protein, and the resulting tree topologies reflect this in zontal gene transfer (HGT) within phylogenies shows a their lack of resolution. Therefore, a stringent criterion is normalizing progression from the filtering of genomic required for the identification of clear conflicts, as poorly “noise”, to the cataloging of HGT events as unique excep- supported conflicts within these trees may merely reflect a tions, to the acknowledgement of HGT as a major force in very weak power of detection for actual events. evolution [5,9,22]. Acceptance of the relevance of HGT for The use of the ribosome in providing a scaffold for a Net reconstructing life’s history also follows this progression, of Life reconstruction is also fitting in that a recent study and any serious attempt to capture a universal evolution- has also used universal ribosomal proteins for an empirical ary schema must include reticulations, not merely as a rooting of their respective universal tree [19]. In this study, decoration, but as intrinsic and essential to the under- ancestral reconstruction of ribosomal protein sequences standing of the whole. identifiedauniquecompositional signature along the However, it is clear that regardless of its primacy (or branch on the bacterial side of the tripartition between the lack thereof), a reference tree representing a robust, con- three domains. Compared with simulations and other sistent evolutionary signal is an essential initial scaffold parts of the tree, this branch showed a significant under- for any such holistic effort. Such a reference tree should representation of amino acids presumed to be more recent be not only highly resolved and robust against artifacts, additions to the genetic code (Tyr, Trp, Phe, Cys), and a but reflect a biological reality consistent with its central significant over-representation of those presumed to be organizing role, as opposed to an empirically determined the most ancient (Gly, Ala). As the current state of the collection of genes which are solely defined by their uni- genetic code is a character shared between all domains, versal presence. A ribosomal tree, derived from the con- this signal should be preferentially detected on the branch catenated sequences of both ribosomal RNAs and closest to its formative state, that is, the branch that con- proteins, is well-suited to this purpose [4,23,24]. The tains the root. high level of sequence conservation within the ribosome, While, strictly speaking, this only explicitly roots the combined with infrequent horizontal transfer of its con- “ribosomal tree of life” [19], it is a reasonable starting stituent molecular elements between distantly related point for rooting the reticulate phylogeny, as it serves to groups, makes this an ideal candidate for providing a polarize the proposed scaffold, allowing the full complexity scaffold reference phylogeny [22,25]. of reticulations in a comprehensive evolutionary history to Toverifythecongruenceoftheevolutionarysignal also be rooted with respect to one another. The majority within the ribosome, highly supported bifurcations of molecular phylogenies rooted using ancient gene dupli- between all sets of ribosomal gene trees were compared, cations placed the root in the same location (see review in identifying cases where specific topologies were consis- [26]); and the deep split between Bacteria and Archaea is tently in conflict with others. In such cases, the particular also recovered from genome-wide analyses using midpoint sequences for those species in the conflicted area of the rooting of splits trees, and averaging over phylogenies of tree would not be included in the concatenation, in order nearly universal protein families [27-29]. Interestingly, to avoid fallacious signal averaging within the dataset. The reconciliations of gene trees to the reference scaffold tree vast majority of comparisons showed no highly supported can also provide further support for the correct rooting, as conflicts, while 23 intra-order conflicts were identified alternative placements of the root should consistently within 10 groups across three domains. As these groups force less parsimonious reconciliations, if incorrect. It may tend to be highly similar to one another at the ribosomal even be seen that a distinct subset of reconciliations for sequence level, and do not challenge the relationships related genes are more parsimonious with an alternative between larger phylogenetic categories that are of the rooting (e.g., on the archaeal or eukaryotic branch), sup- most evolutionary interest in a ToL/rooted Net of Life porting HGT events occurring between the stem groups (RNoL), these were preserved within the dataset. Addition- of each domain, which would be extremely difficult to ally, three inter-order conflicts were detected, with Metha- infer otherwise. nosaeta thermophila L29 showing strong support for grouping with Methanomicrobiales, and Staphylococcus Examples of reticulations aureus S19 and L5 showing strong support for grouping There are many organismal lineages that have been with Lactobacilliales. No inter-domain conflicts were involved in horizontal genetic transfers, some at frequen- detected. It is important to note that this methodology cies sufficient to be considered highways of gene sharing does not specifically detect horizontal transfers; rather, it [10,24], thus leading to many different gene histories in simply identifies well-supported conflicts that would the (s) of one organism [8]. When these violate the assumptions necessary for a concatenated organismal histories are considered internally consistent Williams et al. Biology Direct 2011, 6:45 Page 4 of 20 http://www.biology-direct.com/content/6/1/45

and tree-like, conventional phylogenetic reconstruction Highways of gene sharing methods that combine sequence data often reflect an There is evidence that Thermotogales have a significant average between distinct signals. This is especially a pro- portion of their genomes transferred from the Firmicutes blem in those cases where highways of gene sharing and Archaea, about 48% and 11%, respectively [48]. between divergent organisms dominate the phylogenetic Averaging across the whole genome with supertree or information retained in the analyzed genomes. Multiple supermatrix methods places the Thermotogales with the endosymbioses have occurred in many lineages, therefore Firmicutes [48,49] and neither highways of gene sharing, organismal histories are better represented by a Rooted nor the history of the ribosome emerges from the aver- Net of Life able to reflect both vertical descent and hori- aged signal. A similar case is seen for the Aquificales, zontal genetic transfers. Here we outline examples that which according to averaging methods are placed with demonstrate a bifurcating tree-like phylogeny as an the Epsilonproteobacteria, apparently due to an over- inadequate depiction of the history of life. whelming number of HGTs from that group [50]. 16S rRNA gene trees and concatenated ribosomal gene trees Horizontal genetic transfer place both the Thermotogales and the Aquificales, as There are numerous important gene sharing events, deeply branching bacterial lineages [48,50]. Other exam- some between members of different Domains of life, that ples include the Thermoplasmatales, an acidophilic eur- arelostwhenonlyasingulartree of life is considered. yarchaeal order, with about 58% of their genome inferred These include inventions of new metabolic pathways, to have been transferred from the phylogenetically such as a single transfer event in which genes encoding distant crenarchaeal Sulfolobales [51-53]; and Methano- acetate kinase and phosphoacetyltransferase were trans- sarcina mazei, with about 33% of its genome identified as ferred to the Methanosarcina from cellulolytic clostridia transferred from bacteria [54]. Such examples continue allowing the use of acetate as a substrate for methano- to emerge, and more are likely to be discovered as the genesis (acetoclastic methanogenesis) [30]. There are also number of sequenced genomes increases. many examples of gene transfers from bacterial to single celled eukaryotes. The Fungi acquired many genes Endosymbioses involved in various metabolic processes from both the We consider an organism to be a group of distinct evolu- Proteobacteria and Actinobacteria [31-36]. The proto- tionary units currently engaged in an obligate mutualism. zoan Blastocystis, found in various gut environments, has Thus we include the bacterium Thermotoga petrophila acquired genes involved in energy metabolism, adhesion with its set of ancestrally archaeal genes as a single organ- and osmotrophy from bacteria. These transfers have ism, assigned to a single terminal node on the Rooted Net allowed for successful of Blastocystis spp. to of Life. Likewise, we would consider an animal with its digestive environments [37]. Genes involved in organic numerous mitochondria-containing cells or a plant with carbon and nitrogen utilization, the urea cycle, cell wall its many mitochondria-and chloroplast-containing cells as silification and DNA replication, repair and recombina- respectively assignable to terminal nodes. tion have all been transferred from bacteria to the dia- The events that led to these relationships can be consid- toms [38]. Bdelloid rotifers, metozoan freshwater ered large-scale horizontal genetic transfers in which an invertebrates, have acquired genes for a xylosidase, cell entire chromosome, along with a cell membrane, is wall peptidoglycan synthesis and various reductases and engulfed via endosymbiosis. Subsequent evolution leads to dehydrogenases from bacteria [39]. A pivotal gene trans- an obligate mutualism [55] with gene transfer from the fer from the bacteria to the Cnidarians allowed for the endosymbiont chromosome to the host nuclear chromo- development of the stinging cells that this lineage uses to somes [56]. The primary endosymbiosis leading to plastids capture prey [40]. The gene encodes a polyanionic poly- refers to an original uptake and retention of an ancestral mer (PGA), which, when present in large quantities in cyanobacterium by an ancestral eukaryote [57]. Extant the stinging cells (nematocysts), causes an explosive, organisms retaining this ancestral condition are the Glau- stinging discharge to be released upon contact [41]. cophytes, Red Algae and Green Algae. Other lineages Examples of gene transfers from bacteria to multicellular underwent secondary and even tertiary endosymbioses [7] eukaryotes include ancestral bacterivorous nematodes providing not only prominent morphological features but acquiring cell wall degradation genes from a bacterial also defining metabolic pathways (e.g., photosynthesis). In lineage [42-44]. These genes are required for the initial tracing the genealogies of these discrete evolutionary step in parasitizing plants, enabling the free living nema- units, numerous reticulations within the ribosomal tree tode to “transition” into a parasite [45]. Other examples scaffold itself are necessary, and these reticulations are include Wolbachia endosymbiont sequences in the X congruent with the lineages of other genes present on the chromosome of the host adzuki bean beetle [46] and in endosymbiont chromosome. These examples illustrate the the Aedes aegypti genome [47]. reticulate complexities within all Domains of Life, and Williams et al. Biology Direct 2011, 6:45 Page 5 of 20 http://www.biology-direct.com/content/6/1/45

show that the assumption of a single, bifurcating organis- mal tree is problematic not only within specific groups of prokaryotes. However, to say the history of life is better represented by a Rooted Net of Life is not to say there is no structure or form to it; rather, that the structure and story is too complex for a single tree-like narrative to con- tain [58].

Reconstructing the Rooted Net of Life Phylogenetic reconstruction suffers less stochastic error when more data are available for most branch-length sce- narios [59]. In reconstructing the Rooted Net of Life model proposed here, whole-genome data sets are required to provide both the tree-like ribosome scaffold and the potential reticulations from other gene trees. One extreme approach for mitigating stochastic error would be multiple whole-genome alignments, but this would not be realistic (or even possible given the incom- plete homology of gene families across extant life) because the discrete evolutionary histories within organ- Figure 1 Phylogenetic tree used to simulate genome evolution isms would not described. Where regions of a genome including a directed highway of gene sharing. Two different trees were tested, one having a slightly longer internal branch of arelikelytohavehadthesamehistories,combining 0.05 substitutions per site compared to the other tree with only sequences to improve resolution is a useful approach and 0.01 substitutions per site. Genome B’ was used as a donor for is discussed in detail below. It is important to note that genes transferred into the lineage leading to genome F. Genome B’ even well resolved phylogenies may be deceptive, with was not included in the phylogenetic reconstruction and genes ’ reconstruction artifacts masking complex evolutionary from genome B were used as replacements for their orthologs in genome F. The simulations were repeated with increasing amount events if the reconstruction model was inadequate to of transfers from genome B’ to F. The genome sequences were describe the evolutionary process [60]. This is especially generated using Evolver from the PAML package [113]. Each likely when incorporating diverse homologous sequences simulated genomes contained a total of 100 genes, each 300 amino as is necessary in a Net of Life reconstruction. acids long.

Mitigation of stochastic error: combining sequences for improved resolution lineage whose neighboring branch was separated by 0.05 To solve difficult phylogenies, it is sometimes advanta- substitutions per site (Figure 2A), the supermatrix geous to use information from many genes in order to approach (concatenation of genes) was able to recover extract phylogenetic signals which otherwise may be too the correct tree topology only when less than 25% of the dilute if taken from individual genes. As previously men- genes underwent homologous replacement. In contrast, tioned, two widely used methods consist of concatenation embedded quartet decomposition followed by supertree of multiple genes (supermatrix) [17] and construction of reconstruction recovered the correct topology, even consensus phylogenies using several trees calculated when 45% of the genes underwent HGT replacement from individual genes (supertrees) [18]. It is believed that (Figure 2A). At more than 50% HGT, genome F was these phylogenomic methods are capable of capturing a recovered as the sister group to B, reflecting a situation plurality consensus of a dataset while minimizing the where the signal due to ancestry is overwhelmed by a presence of artifacts in the data such as presence of gene highway of gene sharing. When the recipient lineage is transfers or low phylogenetic signals. However, if too positioned closer to its sister group, the supermatrix many conflicts are present in the datasets or the phyloge- approach was even more susceptible to HGT (Figure 2B). netic signal is too weak, the resulting consensus tree may Thepresenceof10to15%ofmisleadingsignalinthe notbeinformative,asitmaynotaccuratelyreflectthe concatenated dataset was sufficient to induce the recov- history of any of its constituent datasets [61]. This can be ery of the wrong topology in the majority of cases. In the illustrated using simple genome simulations involving a same situation, the quartet based supertree approach singlehighwayofgenesharingbetweentwounrelated failed in the presence of 35% or more of conflicting sig- lineages (Figure 1) where supertrees based on embedded nals. In contrast, when no gene transfers were simulated quartet decomposition outperformed gene concatena- and the amount of phylogenetic signal varied only tions (Figure 2). When genes were transferred to a between datasets, supermatrix approaches fared better in Williams et al. Biology Direct 2011, 6:45 Page 6 of 20 http://www.biology-direct.com/content/6/1/45

Figure 2 Comparison of supermatrix and supertree approaches for recovering the correct tree following horizontal genetic transfer. Horizontal genetic transfer was simulated between lineage B’ and F (Figure 1) with an internal branch of 0.05 (A) or 0.01 substitutions per site (B). The frequency with which the correct tree is recovered from supermatrix and supertree approaches from data that include increasing amounts of genes transferred along a single highway of gene sharing was tested. Each simulated genome contained a total of 100 genes, each 300 amino acids long. Genes were concatenated into a single sequence from each simulated genome for the supermatrix tree calculation or alternatively, gene trees were calculated individually from each gene for the supertree approach. The sequences were not realigned to avoid any additional artifact potentially introduced from alignment algorithms. Neighbor-joining trees were calculated with Kimura correction in ClustalW version 2.0.12 [114]. Maximum likelihood trees were calculated with PhyML V.3.0 [115] with Pinvar, JTT model and estimated gamma distribution under 4 categories. The embedded quartet trees [116] as well as the resulting plurality trees (supertree) were calculated from the individual gene family trees using Quartet Suite v.1.0 [117]. The simulations were repeated 100 times to measure the reproducibility of the different tree reconstruction methods in recovering the original tree topology. extracting the correct phylogenetic signal compared to Accounting for heterogeneous evolutionary processes supertrees (data not shown). The reconstruction of phylogenetic trees from biological These results indicate that when using sets of genes sequences relies on estimation of the evolutionary dis- that are known to be less frequently transferred, as may tance between the sequences of interest. This estimate is be case for ribosomal proteins, a supermatrix approach obtained from evolutionary models that describe the is preferable, whereas for datasets where cryptic high- probability of different nucleotide or amino acid substitu- ways of gene sharing may connect divergent organisms, tions [62]. Traditional evolutionary models are based on supertree approaches such as quartet decomposition a set of simplifying assumptions, and when these assump- may be more accurate. An additional source of error tions are violated by the dataset examined, incorrect trees caused by the stochastic way in which lineages sort dur- may be recovered [62,63]. In phylogenetic reconstruction ing can result in anomalous gene trees in on a RNoL scale, where a large degree of sequence diver- phylogenetic inference [59]. This can arise during peri- sity is included, these simplifying assumptions run an ods of rapid diversification where short edges are pre- even greater risk of violating observed biological realities sent in gene trees and is not mitigated by combining not explicitly described within the reconstruction model. more genes into a single analysis. Some of these challenges to evolutionary models are Williams et al. Biology Direct 2011, 6:45 Page 7 of 20 http://www.biology-direct.com/content/6/1/45

described below, along with the work being done to over- Improvements to evolutionary models are constantly come them. being made, and lead to improved ability to distinguish Extant lineages can substantially differ in base and phylogenetic information from noise. These new models amino acid composition, a phenomenon known as com- increase the number of parameters used to describe the positional heterogeneity [62,64]. In many cases, this is data, and this strategy is merited in many cases. However, driven by physiological adaptation to environments with it is important to recognize that adding unimportant distinct demands on protein physiochemistry (e.g.,ther- parameters decreases the power to draw conclusions mophily, halophily). Changes in nucleotide composition [64], and that not all datasets will be best described by of the genome (e.g., high or low G+C content) can also the same model. Including more parameters does not occur within specific lineages, indirectly affecting amino necessarily improve reconstruction - for example, evolu- acid composition. Models that assume compositional tionary models that use different parameters for each homogeneity (constant sequence composition through- branch of the tree are often outperformed by models that out the tree) tend to group lineages with similar compo- allow for only two different sets of parameters, one for sitions together, regardless their actual evolutionary each major clade on a tree [64,71]. As evolutionary mod- history, and produce high bootstrap values for these els are being developed and improved, it is important incorrect topologies [62]. A solution to the problem of that methods for selecting the best model for a dataset describing compositionally heterogeneous datasets is the also be explored [71], as has been done in some cases implementation of models that allow for different equili- [64], and developed for use by wider audiences. brium frequencies (parameters to describe sequence Other artifacts can also be present within reconstruc- composition) on different parts of the tree [62,64]. tions, independent of rate and composition model para- Another challenge for evolutionary models is heterota- meters.Longerbrancheswilltendtogrouptogether chy, the variability in evolutionary rate at a site on differ- regardless of their true relationships [72], a phenomenon ent branches of the tree [63]. Heterotachy can cause seen in the artifactual placement of microsporidia as a evolutionary models to group taxa on long branches deep branching eukaryotic lineage [73,74]. Periods of rapid together, affecting both maximum parsimony and maxi- diversification causing shorter branches will leave recon- mum likelihood methods [65], and producing incorrect struction vulnerable to the node-density effect where trees with high bootstrap support [63]. The deleterious branch lengths may be overestimated in areas of the tree effect of heterotachy on phylogenetic reconstruction can with more nodes [75]. Although balanced taxon sampling be mitigated by the use of probabilistic models with suffi- may mitigate some of these artifacts, the course of evolu- cient parameters to correctly describe this phenomenon tion is not obliged to supply phylogenetic distributions [63,65]. that are easily reconstructed across the whole Net of Life Most current evolutionary models are also ignorant of [73], thus the development of improved algorithms is an secondary and tertiary structure - that is, they assume that important area of research. substitutions at one site are completely independent of substitutions at another, an assumption that is violated by Acknowledging diversity within the Rooted Net of Life the sequence evolution of protein and ribozyme coding Biological evolution has manifested itself in an impress- genes (including ribosomal RNA). Models of nucleotide ivearrayofdiversity.Lifehistories among organisms substitution that weigh the rate of nonsynonymous vary widely with corresponding differences in population nucleotide substitutions by their effect on protein tertiary dynamics and modes of diversification (“speciation”), structure [66], or that estimate the variation in nonsynon- perhaps most significantly between unicellular and mul- ymous substitution rate in a sequence [67], are being ticellular lineages. These two groups differ greatly in developed. These models show promise, especially for the their propensity for horizontal genetic transfer with detection of positive selection, but remain computationally implications for the interpretation of gene tree conflicts. expensive and are outperformed at phylogenetic recon- For multicellular organisms with somatic cell lines, the struction by site-independent models [68]. Accounting for probability of horizontally transferred genetic material structural information is also known to improve RNA being copied into the progeny of the host is much lower alignments, especially in divergent sequences [69], and than for unicellular organisms. However, examples of models that account for secondary structure when per- the former do exist. As noted above, these are often forming phylogenetic reconstruction are under develop- transfers from a bacterial symbiont to the host genome. ment. These models improve phylogenetic trees in some Interpretation of gene trees conflicting with the back- situations [70], but produce incorrect results in some bone reference tree should thus be informed by life his- others [69]. Nevertheless, they show promise and deserve tories and other prior biological knowledge of the further investigation. lineages concerned: a conflicting topology among Williams et al. Biology Direct 2011, 6:45 Page 8 of 20 http://www.biology-direct.com/content/6/1/45

unicellular taxa is more likely to be due to HGT than is exclusively and explicitly represents a history of vertical a conflict among multicellular taxa where an alternative descent. For this reason, even if the initial reference hypothesis of differential gene loss or incomplete lineage topology is carefully chosen, a simple application of this sorting may be preferred. approach has a limited capacity to reflect a comprehen- When considering macroevolutionary relationships, sive evolutionary history of life. However, these conflicting topologies within closely related groups, approaches can be accommodated within the RNoL which are more likely even for ribosomal genes, will not model by removing assumptions equating the reference change the deeper relationships. Of 568 species of Bac- tree with vertical inheritance, and extending subsequent teria and Archaea represented in the NCBI Complete analyses to take more complex events into account, such Microbial Genomes database in late 2009 [76], 235 had as those previously described (e.g., endosymbioses, line- diversity among multiple 16S rRNA copies [77]. In the age-specific trends of HGT vs. duplication). In these majority of cases intragenomic sequence diversity is less models as in the RNoL, there will be an inevitable “thin- than that conventionally defined for interspecies diversity ning” of edges towards the root because of genetic losses [78]. Of the 2.5% of species with sequenced representa- (genes, plasmids, organelles etc). Assigning these losses tives that exceeded the interspecies limit [77]Thermoa- to HGT events or to lineages of vertical descent will not naerobacter tengcongensis with 6.7% diversity and certain be possible in regions of lower phylogenetic resolution lineages of Halobacteriales including Haloarcula carlsba- where there are ambiguities associated with HGT; but in dense [79] and Halomicrobium mukohataei JCM 9738(T) principle this model provides a retrodictive representa- [80] are of particular note. While resolution at deeper tion of biological evolution levels would be unaffected, there is sufficient divergence in this small minority potentially to cause resolution pro- Conclusion blems at the genus level. The use of a supermatrix As more genome sequence data have become available including ribosomal proteins, which are single copy and are analyzed, evolutionary biologists and philoso- genes [77], would mitigate this. Thus the use of riboso- phers have begun to question the legitimacy of the Tree mal sequences (protein and rRNA) as a scaffold of mostly of Life concept. Various analytical approaches for dealing vertical descent onto which a Rooted Net of Life can be with the newly inferred and distinctly non tree-like nat- inferred is not negated. However, the correlation between ure of organismal lineages have been presented with dif- scaffold and vertical inheritance is not inviolate, or essen- fering underlying assumptions with respect to the nature tial to the construction of such a Rooted Net: the transfer of the evolutionary process [28,58,86-88]. We have of an entire ribosome may be inferred by a topological described a Rooted Net of Life model of evolution, incongruity between the initial scaffold and a large accommodating the numerous examples of reticulated majority of the other gene phylogenies associated with histories, that is better able to describe the history of life that lineage. than the pervasive Tree of Life concept while retaining retrodictive power. Retrodiction is lost in some alterna- Reconciling gene histories tive propositions that phenetically cluster extant organ- Various approaches for obtaining a single supertree from isms by patterns of diversity left by the evolutionary several gene trees within the same set of genomes (some- process. The macromolecular sequences of the ribosome, times referred to as a “species tree” in the literature) have homologous in all cellular life, provide the information to been proposed [81-83]. As emphasized above, such reconstruct an initial scaffold of predominately, but not approaches are only appropriate for situations where necessarily, vertical descent. This averages over many HGT between divergent lineages is unlikely - either reticulations at lower taxonomic levels, and includes a because of the nature of the lineages considered (multi- few large-scale reticulations where the ribosomes in the cellular) or the nature of the sequences used (e.g., riboso- eukaryotic organelles are mapped to the same tips as mal). Rather than infer a new topology representing a those of the nucleocytoplasmic components. All other “species” tree, related algorithms have been developed by genetic sequences can then be recruited to combine with Beiko and Hamilton [84] and Lawrence and Alm [85] this ribosome-based scaffold to more fully depict and using a predetermined reference topology with similari- better define both the vertical and horizontal compo- ties to the model proposed here. In the latter, through a nents of the history of life. process called “reconciliation”,genetreetopologiesare chosen that both support the sequence data and mini- Reviewers’ comments mize a cost function determined by gene loss, gain and Reviewer 1: W. Ford Doolittle, Dalhousie University transfer relative to a reference phylogeny. Reticulations “Rooted Net of Life” might well be the right name for what representing HGT are therefore accommodated, although I suspect is currently the most popular way of thinking unlike the model proposed here, the initial topology about microbial phylogeny within the systematics and Williams et al. Biology Direct 2011, 6:45 Page 9 of 20 http://www.biology-direct.com/content/6/1/45

evolution community, and Williams et al. do a fine job of organisms (such as described in [94]) from the averaging. articulating this view as a model. Still, some critique seems This scaffold is not intended to tell us anything about the called for. central tendency of the genome or of the organism. If for First, one might object that there is a conflict with the part of the phylogeny the central tendency of the genome other paper from the Gogarten lab included in this spe- agrees with the central tendency of the ribosome, then there cial thematic series of Biology Direct. If gene transfer can is no indication for highways of gene sharing that are not be so biased as to assume responsibility for certain ami- biased by close relationship. If the two conflict, such as in noacyl tRNA synthetase tree topologies - which I take to case of the extreme thermophilic bacteria, we can conclude be the import of the Andam and Gogarten submission - that genes were transferred with a bias determined by then why do we not also assume that to be the case for other factors such as the ecological niche. We cannot distin- genes that do not so readily lend themselves to analysis guish a priori the transfer of the ribosome from a highway as do those homeoallelic exemplars? And why do we of gene-sharing through which the majority of genes were assume that “phylogenetic bias” so often trumps other transferred; however, increased taxon sampling may detect sorts of physiological, ecological or geographical biases? transfers spread out over time, as would be expected for a No doubt the Tree of Life, constructed by either super- transfer bias caused by a shared ecological niche, and matrix or supertree methods (which Willams et al. thereby allow us to discriminate this from a single event distinguish very nicely) tells us something about central leading to the formation of a chimera between two tendencies in prokaryotic evolution, but it is only the partners. “complexity hypothesis” that holds out some promise Trickle-down transfer vs shared ancestry: We cannot that the first of these methods might give us something exclude the possibility that an organism replaced its ribo- like the Tree of Cell Divisions. some, either though acquisition of a superoperon in a single Authors’ response: To avoid confusion, we briefly want transfer, or through many transfer events that are biased to summarize the interplay between HGT and our rooted not by close relationship (reflecting recent shared ancestry) Net of Life proposal. In light of the homeoallelic exemplars but through other factors, such as a shared ecological niche. and other evidence for biased gene transfer [89-91],we The ribosomal scaffold would place the recipient’sribosome indeed need to reconcile our proposal to the possibility of close to the donating lineage. In case frequent transfer and phylogenetically biased transfers. recombination events occur within a group, individuals Transfer of ribosomal components between close rela- within this group in the ribosomal scaffold will appear tives: Undoubtedly, highly conserved ribosomal components more related to one another, and organisms not participat- are frequently transferred between close relatives and fol- ing in the frequent within-group transfers may be left lowing transfer are integrated into the recipient’s genome. behind [22]. In either of these cases, the ribosomal scaffold At least for ribosomal RNAs, it was shown convincingly does not represent the tree of cells but only the history of the that a gene acquired through transfer does recombine with ribosome. In many instances it will be possible to further the homolog already present in the recipient (see discussion elucidate the history of the genome, as is exemplified by the in [22,92]and [93] for examples), thus turning the riboso- thermophilic bacteria [48,50], and this might allow further mal RNA into a mosaic. However, most of these transfers inference regarding a likely tree of cells. However, the rela- are indeed between close relatives and only become detect- tionship between organisms is not sufficiently described by able when many genomes of close relatives are analyzed. a single tree, and the RNoL provides a first step to elucidate The proposed ribosomal scaffold averages over these trans- the history. If the complexity hypothesis is true for the ribo- fers and subsequent recombination events. Consequently, somal components, the ribosomal scaffold may be similar the transfers between close relatives will only rarely affect to the tree of cell divisions. However, this is not a precondi- the relative placement of families and higher taxonomic tion to reconstruct the RNoL. Reconstructing the RNoL will units; however, the scaffold may be an unreliable reference identify those parts of life’s history where a single tree of cell for within family and within-genera phylogenies. divisions provides an incomplete narrative. Transfer of ribosomal components between divergent Reviewer 1 continued: Second, we might ask why the organisms: Screening individual ribosomal protein families microbial systematics and evolution community still feels for phylogenetic conflict, and assigning the sequences from that we need some single way of describing the relation- the recipient and its descendants to different data parti- ships of organisms and some singly historical “metanarra- tions, will avoid averaging over transfers between less tive” to undergird it. I’d guess our colleagues doing human related organisms. However, individual ribosomal proteins linguistic, cultural and social history would see this as an contain little phylogenetic information, and thus this screen unnecessarily simplistic and ultimately misleading aspira- will be unreliable for within-family transfers. The riboso- tion (see for instance [95]). Is it just our need to defend mal scaffold will tell us about the central tendency of the Darwinism from its politically powerful opponents that ribosome, after removing transfers between divergent causes us to cling to it? Williams et al. Biology Direct 2011, 6:45 Page 10 of 20 http://www.biology-direct.com/content/6/1/45

Authors’ response: This is a fascinating question. In lateral branches to this tree. The rNOL research program the context of this manuscript, we make the assumption really goes beyond the research program associated with that there is a single “true” sequence of events or organi- the TOL. The former nodes and edges are not directly zation of matter on the temporal and spatial biological comparable to the nodes and edges represented in the scale (i.e., Life on Earth). The goal of reconstructing the TOL. Therefore the nodes and edges of the rNOL and of resulting relationships between organisms is therefore to the TOL cannot really be interpreted alike. It would be recover a single, historical description - but any such misleading, therefore, and for the sake of convenience - a attempts are limited by the methods used and the data rhetorical trick - to describe the rNOL with the words available (which at present do impose limitations on the and notions designed to analyze the TOL. Tree-thinking confidence of historical events/relationships). should not be directly imported en bloc into rNOL- Indeed, this proposed Rooted Net of Life is intended as a thinking, as if not much was changing when the rNOL phylogeny of biological lineages that accounts for the hori- replaces the TOL to represent evolution. If the interest of zontal exchange of genetic material and is composed from evolutionists shifts from the TOL to the rNOL, some gene families found in sequenced genomes. It therefore has new concepts are needed to interpret the rNOL. This the same limitations as conventional phylogenetic com- fundamental aspect of the transition from a TOL to a parative methods (it requires accurate alignments for rNOL should be made much more explicit in this MS. I homologous comparisons, three or more tips for a rooted would like to suggest that the authors devote a short but reconstruction etc). We think a strength of this model is its entirely novel section to the issue of rNOL-thinking, that direct depiction of evolutionary events allowing historical shows that going from the TOL to the rNOL requires inferences rather than phenetic approaches (such as split- significant (and not just minor) conceptual adjustments. graphs representations or clustering genomes by genome Authors’ response: We agree that adoption of the content etc). which serve a different purpose in evolutionary RNoL concept requires conceptual adjustments. Change is biology. no longer gradual along a lineage, but often instantaneous due to HGT. Nodes no longer represent exclusively events Reviewer 2: Eric Bapteste, Université Pierre et Marie Curie of lineage divergence, but also the confluence of genetic Peter Gogarten and his team play a major role in the information. Most microbiologists do recognize the impor- debate on the Tree of Life (TOL). Therefore, their contri- tance of the processes that lead to reticulation, but only bution to this special issue on how to go beyond the TOL phylogeneticists have struggled to incorporate the diversity is of unquestionable importance. They propose the recon- of biological processes into their reconstruction of evolu- struction of a “rooted net of life” (rNOL) as a new reason- tionary history. Given that processes of reticulated evolu- able goal for phylogenomics. In many respect, this notion tion are the focus of much research in microbiology, we do seems sound: it is likely a research program that many not think it necessary do devote additional space in the phylogenomicists will be tempted to embrace. In particu- current manuscript to its discussion. lar, I entirely agree that organisms consist of many discrete Reviewer 2 continued: For example, the authors pro- evolutionary units, with multiple histories, a fact that is pose that each organism in a rNOL is represented by a lost with the TOL, and therefore the TOL is not sufficient single node and a single edge, unless the organism to capture true complexity of the evolution of life. It is changes. For them a node is a meeting place for a possi- also important to reckon that a universal evolutionary ble genetic melting pot: the organism lies where various schema must include reticulations, not merely as decora- units join in a collective obligate mutualism. This notion tion but as an intrinsic feature. of an organism is interesting, but is it the organismal Two major comments however. First, the rNOL is not notion associated with the TOL? I would say “no”. the only possible research path for evolutionists “beyond Authors’ response: By “terminal node” we mean to the TOL”. Second, if embraced, important conceptual clar- refer to the “tips” of the inferred gene and ribosome trees ifications are still required to interpret the rNOL, because from which the network will be constructed. All sequences it cannot be done merely with the concepts of the TOL. A at these tips are taken from sequenced genomes (that is well understood rNOL is not just a TOL plus some fancy all chromosomes and plasmids sequenced from a sampled lateral edges, it is not quite “phylogenetic business almost “organism”) and so members of different gene families can as usual”. be confidently associated with one another, at the tips, on Major comments that basis. This model is intended as a phylogeny as 1. The rNOL is not the TOL opposed to a more general clustering scheme based on This claim is crucial and should be made more signifi- evolutionary relationships. Internal nodes do, therefore, cant, because it has practical and conceptual implica- represent ancestral organisms inasmuch as the resolution tions. The move from a TOL to a rNOL is more than of the data allows. Gene family members lost from an just an extension of the TOL, through the addition of ancestral organism along a lineage cannot of course be Williams et al. Biology Direct 2011, 6:45 Page 11 of 20 http://www.biology-direct.com/content/6/1/45

represented via this comparative approach and thus inter- fact consist of many edges from the combined phylogenies nal edges and nodes can only be a partial representation of the gene families and ribosome. Tracing that vertical of the genome complement of an ancestral organism. edge either towards or away from the root will cross nodes (Further inferences of what could be missing from such an at which reticulations will leave or join it, so that all inferred ancestral genome complement could perhaps be genomic components of an ancestral organism for which made though). It would be permissible to take a single the phylogenetic comparative approach is suited will be ribosome as representative of a group of sequenced gen- represented, regardless of mobility. Notable omissions are omes (defined by ribosome gene sequence similarity) and discussed below. include the pan-genome of those organisms in the same Reviewer 2 continued: However, with such a definition, way. the organism itself changes each time a new genetic unit Reviewer 2 continued: Why does it matter? Because (i.e. one or several genes, or a symbiont) comes in or goes then the vertical backbone of the rNOL does not track out of the collective obligate mutualism. Therefore, in the organismal evolution. It tracks the evolution of the least rNOL every lateral connection in addition to the vertical mobile units of this collective obligate mutualism, or, if splits gives rise to a new organism. New names are needed one wishes, it captures the “(less mobile) background to describe these nodes, which do not exist on a tree. This organism”. in turn has an important consequence for another default Authors’ response: The reviewer makes an insightful notion of tree-thinking: the notion of (phylogenetic) spe- observation here and below. However, something we per- cies. Phylogeneticists cannot track species as easily on a haps failed to make clear in the original MS is that the rNOL as they were hoping to do on a TOL. What type of ribosomal tree-shaped scaffold need not represent the line “chunk of the rNOL” corresponds to a species cannot of vertical descent if the topologies of the other gene probably be decided without considering what biological families suggest otherwise. In fact, where there is insuffi- features the in-edges and out-edges provide or remove cient evidence to attribute any one set of internal edges to from the “background organism”.Inotherwords,not the line of vertical descent, we do not consider an agnostic every edge (and not all sets of nodes/not every node) cre- attitude to be a problem. But we do anticipate that many ates a new species. How is it decided what edge does and of the edges will be less ambiguous and assignable as what edge does not define a new species? We need names either representative of a horizontal genetic transfer or to distinguish these edges. (And this is without mentioning vertical genetic inheritance. The ribosomal scaffold serves the fact that sometimes “species” of interest lie in the very only as an initial, well resolved rooted phylogeny with mesh of the lateral edges, precisely when gene exchanges which other gene family phylogenies can be compared as are the defining criteria of an evolutionary unit one wishes a means of inferring a rooted net. The meaning of the to call a species rather than organisms with a conserved term “reconciliation” as most often used in the literature vertical core). As the rNOL would be a real opportunity to (in the context of a “species tree” and several “gene trees”) acknowledge the multiple processes at play in evolution, would be inappropriate here and so we agree the term this clarifying goal is also part of this new research pro- “species tree” is best avoided. Another reason to object to gram. It likely requires creating suitable concepts, rather the term “species” is the difficulty in applying the already than importing “good old notions” that worked (to some troublesome idea of a macrobial species to the microbial extent) soley for the vertical process (e.g. the tree of cell diversity of which most of the RNoL consists. division is not telling us where a species starts or ends, However, we would suggest the term “organismal lineage” etc.). Advocates of the rNOL should therefore refrain from is not such a problem. As the reviewer suggests for the calling the vertical part of the rNOL the “species tree” or RNoL model, the identity of the organism will change along the “organismal tree": species/organisms may not be asetof“vertical” edges as nodes due to reticulations are defined by vertical processes to start with. There are many crossed and genes are gained. This seems comparable to reasons to give a more accurate name to that likely impor- the accepted use of this term in a ToL model where the tant vertical backbone, while not conflating it with a “spe- conceptual identity of an organism could change along an cies tree”. I encourage the authors to rephrase their MS edge due to adaptation to a changing environment, or even accordingly, where necessary, and to replace “species tree” more abruptly before and after a bifurcating speciation or “organismal tree” or “TOL” by “vertical backbone” or event. by “tree of the least frequently transferred units” when We agree with the reviewer that these vertical edges, that is what they mean. Discriminating a vertical backbone where identified, are likely to capture more of the “(less in the net of life matters, and calling it the TOL may limit mobile) background organism"’, because of the difficulty of the deeper meaning of the rNOL enterprise. (Interested mapping with any certainty to map the more mobile readers can also refer to [96]). genetic elements to deeper edges. However, a vertical edge Authors’ response: We agree with the reviewer and midway between the root and tip of the RNoL would in have updated the manuscript accordingly. Williams et al. Biology Direct 2011, 6:45 Page 12 of 20 http://www.biology-direct.com/content/6/1/45

Reviewer 2 continued: 2. The rNOL presented here reconstruct life’s history, this reconstruction will often be is a rNOC, but is the rNOC inclusive enough to ambiguous and incomplete. For example, at present no describe evolution? algorithm exists that would allow the reconstruction of As it is described in the MS, the rNOL seems first con- ancient gene families that have left no extant descendants. cerned with the evolution of cells and that of cellular While a complete reconstruction of life’s phylogeny will genomes. Where are the plasmids and the viruses in the likely be impossible, we believe that the RNoL will provide rNOL? Is their evolution also modeled by it, and where? a more detailed and more accurate phylogeny than is pos- Or, unfortunately their evolution is not really repre- sible under the ToL paradigm. sented, meaning that the rNOL has room only for cellu- Reviewer 2 continued: Other research paths are also lar genomes and not all evolving elements with DNA possible beyond the TOL. genomes? It is unclear how the many plasmidic and viral This is not a major criticism, simply an observation: genomes (some of which are without homologues to cel- the evolutionary literature about what evolutionists lular genomes and to other plasmids and viruses), or could do if the TOL were no longer their default option even how ORFan genes, or all the sequences too diver- is a bit more heterogeneous than suggested in this MS. gent to be aligned and put in a tree, or the many environ- Some more literature could have been cited at places to mental genes, could fit in a single rNOL. Where do they put the rNOL solution retained by the authors in a lar- fit? The reference scaffold of the rNOL, based on riboso- ger scientific perspective. I can think of at least two very mal RNAs and proteins, seems largely to act as the refer- different options that were not discussed here, and I ence phylogeny of ribocells [97]. would like to encourage the authors to quote them Authors’ response: The limitations of the RNoL are the somewhere in the slightly revised version of their MS: same as those of the comparative methods that are used a) Pattern pluralism [58] that questions whether we need to construct it. True ORFans (i.e. open reading frames to replace a unique representation by another unique that have no detectable homolog in any other genome) representation. See also [98] that explicitly proposes to would not provide information on the topology but could model different evolutionary outcomes with different evo- be included in the model as tip metadata (quantified per lutionary patterns (one tree, one rNOL, disconnected gen- genome). Comparison of the tips, each being all sequence ome networks based on shared sequences, etc.). About data from a sampled organism or the pan-genome of a these latter genome networks, see all the refs in [99], and group of organisms with similar ribosome sequences, pro- the research program suggested in [100]. vides the internal topology. b) Analyses of phylogenetic forests [28,86-88]. Unrooted Thus the contents of a plasmid can be treated in the gene trees can be analyzed through various methods of same way as any other chromosomal gene: its position at tree-cutting, the most famous so far being the methods of the tips is defined by the other sequences sampled with it quartet decomposition that can inform us about evolution from an organism or group. We would expect to recognize without necessarily providing a grand rooted unified evo- reticulations leading from these gene trees closer to the lutionary scheme, or requiring the reduction to a single tips than is typically found for chromosomal genes. Proph- graph (tree-like or web-like). agesequencescanbeincorporatedinthesameway. I feel it is important to acknowledge that how to go Although tips are defined as organismal (pan)genomes, beyond the TOL is itself debated. viral genomes are not in principle excluded and the Authors’ response: We added and discussed some of reviewer makes a salient enquiry in this respect. The only the suggested citations in the revised manuscript and we limitation for inclusion is homology shared with enough expanded the discussion of the RNoL concept. However, for phylogeny reconstruction. the goal of this manuscript was to propose an approach Reviewer 2 continued: As such, the rNOL describes a that allows reconstructing evolutionary history. There are larger part of the history of life than the TOC (tree of many very useful approaches in comparative cells), yet it does not really describe the “full history of that allow identification of genomic islands, molecular life”.That’s why it is important to acknowledge that parasites, prophages and agents of gene transfer that are going beyond the TOL could be achieved by using addi- important in understanding microbial genetics and tional/alternative paths than the rNOL. mechanisms of molecular evolution. However, these have Authors’ response: In “The Rooted Net of Life” section only limited value for reconstructing the more ancient his- we say “evolutionary relationships of organisms are more tory of life. We already devoted a significant portion of fully described than in existing Tree of Life concepts”. This the manuscript to discuss consensus tree approaches and was the meaning intended in the conclusion but was mis- their limitations; however, we do not think it will improve communicated in error and the manuscript has been readability of the manuscript if we add a more detailed revised. The reviewer is correct in pointing out limitations discussion of other approaches that use phylogenetic infor- of the RNoL. While the RNoL provides an approach to mation retained in gene families to detect plurality and Williams et al. Biology Direct 2011, 6:45 Page 13 of 20 http://www.biology-direct.com/content/6/1/45

conflicting phylogenetic signals. We and others have co- resolution. Therefore, a stringent criterion is required for authored manuscripts on this question in the past the identification of clear conflicts, as poorly supported [101,102], and the interested reader is invited to consult conflicts within these trees reflect a very weak power of these and the manuscripts mentioned by the reviewer for detection for biological events. The manuscript has been further information on how to extract and use phyloge- changed to communicate more clearly communicate the netic information from genome data. objectives of the conflict detection, and to elaborate on the Reviewer 2 continued: details of the methodology. As is also now stated in the Minor comments manuscript, it is important to note that the RNoL metho- The authors indicate that “many, if not most of [the dology is initially agnostic about “transfers” since the genes] will be congruent across most of the tree”.Ido backbone reference tree is simply meant to be a cohesive not think we know that (most of the time this is not scaffold; gene phylogenies are reconciled to this scaffold, tested but assumed), and for the datasets that I tested I resulting in reticulations. Only once a robust, rooted did not observe this kind of agreement. Rather most of network of life is generated can something approximating the prokaryotic/viral/plasmidic genes are surprisingly a “vertical” signal be discerned (if even then), and then incongruent. We will hopefully have some data pub- reticulations with respect to this history be described as lished on that question in future works (Leigh et al.,in horizontal gene transfers. However, this being said, it is prep.), but the thousands upon thousands of microbial not surprising that a technique dedicated to detecting pos- trees I had the opportunity to view are in my opinion sible transfer events (instead of highly-supported conflicts more messy than suggested here. See also [103] for mul- among greater taxonomic categories), would find more tiple phylogenetic histories in E. coli strains. conflicts. Authors’ response: As is now better described in the As far as the comment referring to evidence within E. manuscript using more precise nomenclature, the objec- coli strains for multiple histories, while transfers between tive of testing for ribosomal congruence was to determine closely related groups may be universally occurring at to what extent the ribosomal proteins could be used as a high rates, mediated by homologous recombination rooted reference backbone tree upon which to map gene machinery acting on high sequence similarity, these kinds reticulations. To this end, we constructed phylogenies for of events are omitted by the resolution of our approach, as ribosomal proteins (both universal core proteins and they are not “interesting” from the perspective of deep evo- domain-specific proteins). Comparing highly supported lutionary questions, and may fundamentally differ in bifurcations between all sets of trees, we identified cases mechanism. where specific proteins were consistently in conflict with Reviewer 2 continued: The sentence “it is clear that others. As such, the particular sequences for those species [. . . ] a reference tree representing a history of predomi- in the conflicted area of the tree would not be included in nantly vertical descent is an essential scaffold for any the concatenation, in order to avoid fallacious signal such holistic effort” is certainly correct, but maybe not as averaging within the dataset. The vast majority of com- dramatically as evolutionists have long thought. First, parisons showed no highly supported conflicts, while 23 such a unique reference tree cannot be produced for all intra-order conflicts were identified within 10 groups evolving forms. Viruses and plasmids from isolated across three domains. As these groups tend to be highly genetic worlds (see [99]) can never branch in a single ver- similar to one another at the ribosomal sequence level, tical tree. More than one vertical tree would be required and do not challenge the relationships between larger to describe their history. If the number of viruses without phylogenetic categories that are of the most evolutionary direct connection to the cellular gene pool increases, this interest in a ToL/RNoL, these were preserved. Addition- genetic disconnection will increasingly become a pro- ally, three inter-order conflicts were detected, with Metha- blem. Second, the “organizing importance” of the histori- nosaeta thermophila L29 showing strong support for cal tree also largely depends on the (relative) lack of grouping with Methanomicrobiales, and Staphylococcus information regarding other possible organizing meta- aureus S19 and L5 showing strong support for grouping data: had we more knowledge on DNA vehicles and orga- with Lactobacilliales. No inter-domain conflicts were nismal lifestyles for instance, we might decide that detected. lifestyle is an essential scaffold for an holistic effort. It is important to note that this methodology was not Maybe it would be worth encouraging, along with the designed to detect horizontal transfers; rather, simple reconstruction of a rNOL, the development of additional well-supported conflicts that would violate the assump- organizing scaffolds for microbial evolution rather than tions necessary for a concatenated ribosomal dataset. to give this major role only to the history of vertical des- As many ribosomal protein sequences are very short, cent. Yes, history matters (we would not be evolutionists there is limited phylogenetic information per protein, and otherwise), but to what extent it is of “organizing impor- the resulting tree topologies reflect this in their lack of tance” is largely an empirical question : what proportion Williams et al. Biology Direct 2011, 6:45 Page 14 of 20 http://www.biology-direct.com/content/6/1/45

of the genetic characters are well explained based on the Reviewer 2 continued: “118 species": what species? vertical tree vs what proportion are well explained Please be precise: prokaryotes, eukaryotes? (although in different terms) using another interpretative Authors’ response: We sampled across available gen- framework [88]? In lineages with open pangenomes, life- omes of Bacteria, Archaea, and Eukaryotes to the Order style may matter more than vertical descent, at least at and Phylum level, respectively. some scale of the analysis. Open lineages [104] will also Reviewer 2 continued: The authors suggest that root- be a problem. ing the ribosomal tree of life should help by polarizing What the “biological meaning” is of the central (verti- the complex reticulations of the many gene trees cal) trend is a really good question, and should be treated mapped onto it. This seems optimistic: individual gene first like that: as a question, even though it may be tempt- phylogenies can be so messy (due to duplication, losses, ing to assume that the vertical trend has good explana- and recombinational lateral gene transfer in addition to tory power. Many evolutionists hope it does, but we do speciation) that even knowing how to root the riboso- not really know that. In the reconstruction of the rNOL, mal tree may not be that decisive for the polarization of it should be carefully tested to what extent the gene his- these gene trees. What can be done when there are mul- tories are (largely) disconnected from the vertical history. tiple copies of the same species? And why should we In other words, maybe the authors could add some root patchy gene trees, for instance trees with three bac- thoughts to the following issue: Should the methodologi- teria and one archaeon, between archaea and bacteria? cal approach to the rNOL be quite the same than the Such small trees are typical outcomes of lateral gene methodological approach to the TOL, or would not be transfers: rooting them according to the ribosomal tree additional and better congruence tests required to justify of life would hide these transfers by making us believe the vertical backbone? Can the goal of obtaining a rNOL that patchy gene families are ancestral gene families lost be a sufficient justification for combining sequences for everywhere but in these particular lineages. improved resolution (a classical approach well described Authors’ response: We agree that mapping a gene tree in the authors’ text) without testing the congruence of onto the ribosomal scaffold is a complex, non-trivial pro- these sequences? Should the assumption that there is a cess that needs to consider probabilities of gene duplica- real meaningful vertical history recorded in the genes tions, gene loss, and gene transfer. Certainly, mapping a used to build the background be tested? It seems that gene with sporadic disjoint distribution will need to incor- rNOL builders should not rely on aprioriassumptions porate gene transfer relative to the ribosomal scaffold. about the rate of HGT of genes, and that some tests are Furthermore, the comment on messiness is entirely correct. critical. The authors have convincingly argued that, In many instances multiple mappings are possible, espe- depending on the expected rate of HGT, supermatrices cially if extinct and unsampled lineages are taken into or supertrees should be preferred: what to do when we consideration. Especially for small gene families the dis- do not know the amount of HGT in our taxa, over time ? tinction between gene-transfer donor and recipient often The transition from TOL to rNOL is largely determined is not possible. The identification of donors and recipients by the fact that HGT may be major in some genomes is certainly probabilistic and not absolute. However, these and lineages, not the TOL. Thus, maybe a little section limitations not withstanding, the availability of a rooted entitled ‘Practical consequences of the TOL to rNOL reference tree greatly facilitates the integration between transition” could discuss this aspect in a few sentences? If gene and reference tree [84,85]. one wants to put his/her hopes in algorithmic develop- Reviewer 2 continued: “Themajorityofmolecular ment to improve tree reconstruction models, improved phylogenies rooted using ancient gene duplications . . .": models should account for lineages with different rates of Please remind the readers how many phylogenies did HGT (as the developments discussed in “Accounting for that amount to? heterogeneous evolutionary processes” clearly indicate). Authors’ response: The better resolved phylogenies Authors’ response: Many interesting points are raised with ancient gene duplications include the ATPase cata- here. With reference to the “organizing importance” of evo- lytic and noncatalytic subunits, several aminoacyl-tRNA lutionary events, the ToL has been used to apply a strictly synthetases, elongation factor proteins, dehydrogenases, hierarchical classification system to extant organisms. carbamoylphosphate synthetases, and the signal recogni- Although we are promoting the RNoL an improved alter- tion particle/ftsZ proteins. For details see [26]. native phylogeny, we are not promoting a specific means of Reviewer 2 continued: There are many more exam- classification based on it. We agree that any felling of a ples of bacterial HGT to eukaryotes (in algae, rotifers, ToL concept and its associated tree-thinking casts doubt cnidarian), . . . over the utility of a hierarchical classification system also Authors’ response: More examples have been added “rooted” in the same concept. to the manuscript Williams et al. Biology Direct 2011, 6:45 Page 15 of 20 http://www.biology-direct.com/content/6/1/45

Reviewer 2 continued: “more complex than a single steps in implementing the RNoL concept will be to recon- tree-like narrative": I agree entirely, and you could have cile the history of these co-evolving systems of well- quoted [58] on about that topic (and other things) resolved protein-coding genes with the ribosomal scaffold. Authors’ response: We broadly subscribe to process and “pattern pluralism”, specifically that different repre- Reviewer 3: Robert Beiko, Dalhousie University sentations of relationships will be appropriate for different In this paper, the authors describe a representation of purposes. We hope we have been more precise in commu- evolution they feel would be appropriate to capture both nicating that the rooted Net of Life is intended as a phylo- the vertical and important lateral phylogenetic signals of geny retaining the power of retrodiction where the gene trees. The model would use a tree based on a conca- resolution of reconstructed component gene trees allows. tenated ribosomal dataset as a “scaffold” over which Other (and we would say, less narrative) ways of depicting could be laid frequently observed conflicting signals à la relationships between extant organisms are certainly Thermotogae, Aquificae, Thermoplasmatales, etc. valuable as discussed in our response above. These The idea is certainly an attractive one, but the paper is approaches, such as an unrooted network with weighted quite short on detail and I’m not sure how this model edges defined by the proportion of homologous sequences will hold up in the face of the data. Specifically: shared between pairs of nodes representing genomes Ribosomal proteins clearly do tend to stick together in (Figure 1in [105]), and different approaches to extract interaction and evolutionary terms, but the statements and compare phylogenetic information retained in a set of about there being no LGT outside the order level in a genome [87,88,105-108]certainly depict evolutionary whole bunch of ribo-proteins very much conflicts with our information, but largely serve a different purpose. In addi- results and those of other groups. For example, the Aquifi- tion to the ribosome, other characteristics have been used cae have some ribosomal proteins that are shared exclu- to place organisms into a taxonomic framework, and, per- sively with Archaea, or have strongest affinities with them. haps surprisingly given what we have learned about gene Please elaborate on your unpublished results. Are they transfer, many of these approaches have resulted in simi- based on a somehow restricted subset of ribosomal pro- lar groups as the ribosomal rRNA [109]. There is value in teins? Did you use special reconstruction techniques (e.g., exploring different taxonomic classification schemes [110], correcting for compo or rate biases as alluded to later in but here we restrict ourselves to discussing a particular the manuscript)? Is the result based on concatenations, or phylogenetic framework, that at least initially will not comparisons of individual gene trees? impact current microbial taxonomic practice. Given that Authors’ response: SeeresponsetoReviewer2.Inthis the rooted Net of Life includes reticulations, it is not way, the concatenated ribosomal tree is only special in its intended as an explanandum for Darwin’s explanans robust consistent phylogenetic signal, which increases confi- [58]. dence in reconciliation topologies. While the resulting infer- Reviewer 2 continued: “if too many conflicts are pre- ences about vertical inheritance may very well map to this sent in the datasets or the phylogenetic signal is too ribosomal tree in many instances, this is not an a priori weak [. . . ] these artifacts”. Please add a few references assumption in our method, nor is it an assured outcome. after this sentence - there are many Reviewer 3 continued: There is a LOT of LGT, and Authors’ response: More references have been added considering all lateral relationships leads to the “hazes” of to the manuscript the Dagan/Martin papers. Of course these trees are pre- Reviewer 2 continued: I understand and appreciate sented in a way to maximize the visual impact of LGT, but why the authors prefer to use the ribosomal genes over an there is still the question of how an insane number of average tree to build the vertical backbone, yet as a plura- alternative relationships are going to be displayed on a listic thinker I would be happier if several rNOLs were reference backbone. Do you envision some kind of filter- reconstructed based on different vertical backbones (i.e. ing procedure by which infrequent avenues of gene shar- for different gene selections), so users could estimate how ing are suppressed? Would filtering be based on numbers important the choice of the vertical backbone may be (or of events relative to genome size? Would short-distance finally may not be) for future evolutionary conclusions. paths (e.g., within genera or named species) be suppressed Authors’ response: There is no other dataset that has since they are expected to occur for various mechanistic as strong a signal and as biologically valid justification as reasons ? the ribosome. Other backbones would likely represent How would the tree/network actually be inferred and more horizontal transfers between divergent organisms displayed? It is not a trivial matter to overlay a large set than the ribosomal backbone. However, there are a few of reticulations onto a tree. Galled networks and cluster systems, such as the multi-subunit V/A/F-ATPases [111] networksaimtodothis,buteventheyhaveconsider- that have good phylogenetic resolution over most of the able difficulty in capturing the complex relationships evolutionary history of cellular organisms. One of the first among a relatively small set of trees [112]. Williams et al. Biology Direct 2011, 6:45 Page 16 of 20 http://www.biology-direct.com/content/6/1/45

Authors’ response: These are excellent points. factors. While beyond the methodologies and scope of this Firstly, as we have now articulated better in the manu- manuscript, once a rNOL is constructed, a carefully devel- script, phylogenetically biased transfers occurring over oped set of such criteria could be used to evaluate reticula- “short” distances are averaged over so that sub-order tions, determining to what degree signals reflect vertical relationships with potentially high frequencies of genetic descent, artifacts, noise, highways of gene transfer, or other exchange are not explicitly depicted. patterns of inheritance. For now, while the choice of the On a broader scale, there may still be a sufficiently ribosome is arbitrary in the absence of initial assumptions high frequency of reticulations to demand special consid- of vertical vs. horizontal inheritance, it is deliberate in the eration when plotting. Effectively depicting a reticulated cohesive, robust signal it represents, which is necessary in a phylogeny covering all three domains in a static two scaffold. dimensional figure probably is not possible. A filtering Reviewer 3 continued: “The transfer of an entire ribo- procedure is a good idea, perhaps in the context of a some...” Wait, doesn’t this invalidate the whole model and computer based interactive graphical display so that contradict what you have been saying for the entire manu- levels of detail can be adjusted for clarity when viewing script? Many of the concatenated ribo analyses (e.g., Bous- a particular part of the model. A range of filtering cri- sau et al. 2008, which you cite) ultimately make some teria could be implemented including, where known, assertion that the ribosome is king, and that this signal is inferred function, distance over vertical edges, frequency the one that must be correct, even in the face of over- between certain lineages. Using a range of filtering cri- whelming evidence from other gene trees and systems. To teria could also be adapted to inferring the nature of continue beating the unicellular, hyperthermophilic Aqui- biases (including more frequent avenues) among certain fex horse, most molecular systems (e.g.,brokenoutby gene families and between certain lineages. COG category) favour Epsilonproteobacteria-Aquificae Reviewer 3 continued: “...the ToCD is only knowable linkages rather than the canonical, ribosomal Aquificae insofar that a vertical signal is preserved...” To this I +Thermotogae story. What would it take, then, to con- would add “and identifiable as such”. It very well may be vince someone that the ribosome really has been trans- that whatever extant set of organisms are the closest cel- ferred, and that Aquificae+Epsilonproteobacteria is “real"? lular sisters to the Aquificae do indeed share some phylo- Authors’ response: In the original abstract where we genetic affinities with them, but short of privileging said “predominantly vertical lines of descent ” and in the certain molecular systems such as the ribosome or cell introduction where we said “the mostly vertical evolu- wall synthesis, it is statistically very difficult to decide tionary descent of a coherent biological entity” with which of the phylogenetic affinities, none of which con- respect to the ribosome phylogeny scaffold, we were stitutes a majority of the overall signal, is the one to be anticipating that a ribosome would prove to be rarely pinned down as “sister” to the Aquificae. transferred for the reasons discussed below. We realize Authors’ response: We agree that it has not yet been this speculation may have been unhelpful and have proven beyond reasonable doubt that the Aquificales are made revisions emphasizing that vertical inheritance of not epsilonproteobacteria that picked up a ribosome from the ribosome need not be the rule. We also realize the an ancient lineage by HGT. The assumption that the ribo- sub-heading “The Reticulated Ribosomal Tree” was posi- some of the Aquficales and Thermotogales reflects their ver- tively misleading (reticulations are only labeled HGTs tical ancestry indeed reflectsbiasinconsideringthe given sufficient evidence) and apologize accordingly! Our phylogenetic import of particular molecular systems. We speculation that total ribosomal transfer is extremely note that this bias is not a prerequisite for reconstructing unlikely, was due to these reasons: the RNoL; however, it does influence the interpretation. There is no a priori reason why such bias is unreasonable 1. Several operons (of both protein and RNA) would or undesirable, provided it is not arbitrary; even in tradi- all have to be transferred, involving many many kilo- tional taxonomies, the usefulness of characters is evaluated bases of sequence and numerous independent events; based on their utility in defining groups, frequency of gain/ 2. Ribosomal components are highly expressed, and loss, or ease of identification. In the light of gene-based phy- for all these dozens of extra proteins and large RNAs, logenies and horizontal transfer, the problem therefore the cellular economy would provide strong selection appears to be that no quantitative, objective means yet against their successful transfer unless there was exists for weighing the often disparate phylogenetic signals some major advantage; inferred for different parts of the molecular machinery. It is 3. What major advantage could an entire transfer clear that different kinds of genes are transferred with dif- provide? Antibiotic resistance could be achieved by ferent frequencies between groups at varying taxonomic the transfer of single riboproteins, in most cases; levels, and that this is influenced by protein function, the 4. Having two functional ribosomes with so many structure of macromolecular systems, as well as other highly similar, but slightly different subunits floating Williams et al. Biology Direct 2011, 6:45 Page 17 of 20 http://www.biology-direct.com/content/6/1/45

around would likely poison both assembly processes Institute of Technology, Cambridge, MA 02139, USA. 3Biotechnology- and be extremely lethal; Bioservices Center, University of Connecticut, Storrs, CT 06269-3149, USA. 5. Since the native ribosome must be lost, and this Authors’ contributions can’t happen without the new one being replaced, JPG, GPF and DW conceived of the study. PL, GPF and CPA performed data analyses. JPG, DW, GPF, PL, KSS and AGG participated in its design and both must be expressed at the same time, but see (4); wrote the manuscript. All authors read and approved the final manuscript. 6. In the case that subunits are compatible enough to GPF and DW contributed equally. avoid toxicity, then one would expect more random Competing interests subunit loss resulting in a hybrid ribosome. This is The authors declare that they have no competing interests. not observed. Received: 7 February 2011 Accepted: 21 September 2011 Data that would convince us of a ribosomal transfer to Published: 21 September 2011 the ancestor of the Thermotogales or Aquificales would be References a strong coherent signal for many other genes placing a 1. Zuckerkandl E, Pauling L: Molecules as documents of evolutionary history. large part of the remainder of the genome at a single Journal of Theoretical Biology 1965, 8(2):357-66. 2. Woese CR, Kandler O, Wheelis ML: Towards a natural system of organisms point, e.g., a finding that the majority of genes in the - proposal for the domains archaea, bacteria, and eucarya. Proceedings of Thermotogales appear specifically related to the Thermo- the National Academy of Sciences of the United States of America 1990, anaerobacter lineage would support these as a possible 87(12):4576-4579. sistergroup to the Thermotogales in a tree of cell division. 3. Wu DY, Hugenholtz P, Mavromatis K, Pukall R, Dalin E, Ivanova NN, Kunin V, Goodwin L, Wu M, Tindall BJ, Hooper SD, Pati A, Lykidis A, Spring S, However, this is not what we observe. If the ribosome were Anderson IJ, D’haeseleer P, Zemla A, Singer M, Lapidus A, Nolan M, transferred in a trickle down fashion (see above) then dif- Copeland A, Han C, Chen F, Cheng JF, Lucas S, Kerfeld C, Lang E, Gronow S, Chain P, Bruce D, Rubin EM, Kyrpides NC, Klenk HP, Eisen JA: A ferent signals for different ribosomal components might be phylogeny-driven genomic encyclopaedia of Bacteria and Archaea. detected. Our preliminary data suggest the opposite, that Nature 2009, 462(7276):1056-1060. genes from clostridia and archaea appear to be continu- 4. Dagan T, Artzy-Randrup Y, Martin W: Modular networks and cumulative ously acquired in the different lineages of the Thermoto- impact of lateral transfer in prokaryote genome evolution. Proceedings of the National Academy of Sciences of the United States of America 2008, gales. In contrast, the ribosomal components contain a 105(29):10039-10044. weak but consistent signal that is reinforced as more ribo- 5. Doolittle WF: Phylogenetic classification and the universal tree. Science somal components are added to the analysis. 1999, 284(5423):2124-2128. 6. Hilario E, Gogarten J: Horizontal transfer of ATPase genes - the tree of life Reviewer 3 continued: Aself-servingcomment:our becomes a net of life. BioSystems 1993, 31(2-3):111-119. 2008 paper in Systematic Biology [61] dealt extensively 7. Keeling PJ: Diversity and evolutionary history of plastids and their hosts. American Journal of Botany 2004, 91(10):1481-1493. with the averaging of phylogenetic signals that goes on 8. Gogarten J, Townsend J: Horizontal gene transfer, genome innovation in genome phylogeny analysis; it may be worth citing in and evolution. Nature Reviews Microbiology 2005, 3(9):679-687. the discussion of phylogenetic signal averaging, since it 9. Ochman H, Lawrence J, Grolsman E: Lateral gene transfer and the nature demonstrates that the robustness of inference is highly of bacterial innovation. Nature 2000, 405(6784):299-304. 10. Beiko R, Harlow T, Ragan M: Highways of gene sharing in prokaryotes. dependent on both the rate and regime of LGT. Proceedings of the National Academy of Sciences of the United States of Authors’ response: We added this citation to the America 2005, 102(40):14332-14337. discussion 11. Gribaldo S, Brochier C: Phylogeny of prokaryotes: does it exist and why should we care? Research In Microbiology 2009, 160(7):513-521. Reviewer 3 continued: Finally, a grammatical com- 12. Valas RE, Bourne PE: Save the tree of life or get lost in the woods. Biology ment: Compound adjectives must be hyphenated, e.g. Direct 2010, 5. “genome-wide analyses” and elsewhere. 13. Ciccarelli FD, Doerks T, von Mering C, Creevey CJ, Snel B, Bork P: Toward automatic reconstruction of a highly resolved tree of life. Science 2006, Italicize “Methanosarcina mazei”. 311(5765):1283-1287. Authors’ response: We changed the text as suggested. 14. Simonson AB, Servin JA, Skophammer RG, Herbold CW, Rivera MC, Lake JA: Decoding the genomic tree of life. Proceedings of the National Academy of Sciences of the United States of America 2005, 102:6608-6613. 15. Wostmann BS, Larkin C, Moriarty A, Brucknerkardoss E: Dietary-intake, List of abbreviations energy-metabolism, and excretory losses of adult male germfree Wistar RNoL: rooted net of life; HGT: horizontal genetic transfer; ToCD: tree of rats. Laboratory Animal Science 1983, 33:46-50. cellular divisions; ToL: tree of life. 16. Cavalier-Smith T: Rooting the tree of life by transition analyses. Biology Direct 2006, 1. Acknowledgements and Funding 17. Delsuc F, Brinkmann H, Philippe H: Phylogenomics and the reconstruction This work was supported by National Science Foundation Assembling the of the tree of life. Nature Reviews Genetics 2005, 6(5):361-375. Tree of Life (DEB 0830024) and NASA exobiology (NNX08AQ10G, 18. Bininda-Emonds ORP: The evolution of supertrees. Trends In Ecology & NNX07AK15G) programs. GPF is recipient of a NASA Postdoctoral Fellowship Evolution 2004, 19(6):315-322. at the Massachusetts Institute of Technology. The Leverhulme Trust funded 19. Fournier GP, Gogarten JP: Rooting the ribosomal tree of life. Molecular the publication charges as part of the project “Questioning the Tree of Life”. Biology and Evolution 2010, 27(8):1792-1801. 20. Wilkinson M, Cotton JA, Lapointe FJ, Pisani D: Properties of supertree Author details methods in the consensus setting. Systematic Biology 2007, 56(2):330-337. 1Department of Molecular and Cell Biology, University of Connecticut, Storrs, 21. Dagan T, Martin W: The tree of one percent. Genome Biology 2006, 7(10). CT 06269-3125, USA. 2Department of Biological Engineering, Massachusetts Williams et al. Biology Direct 2011, 6:45 Page 18 of 20 http://www.biology-direct.com/content/6/1/45

22. Gogarten J, Doolittle W, Lawrence J: Prokaryotic evolution in light of gene 43. McCarter JP: Nematology: terra incognita no more. Nature Biotechnology transfer. Molecular Biology and Evolution 2002, 19(12):2226-2238. 2008, 26(8):882-884. 23. Gogarten JP: The early evolution of cellular life. Trends In Ecology & 44. Mitreva M, Smant G, Heider J: Role of horizontal gene transfer in the Evolution 1995, 10(4):147-151. evolution of plant parasitism among nematodes. Horizontal Gene Transfer: 24. Swithers KS, Gogarten JP, Fournier GP: Trees in the web of life. Journal of Genomes in Flux 2009, 517-535. Biology 2009, 8(6):54. 45. Dieterich C, Sommer RJ: How to become a parasite - lessons from the 25. Sorek R, Zhu YW, Creevey CJ, Francino MP, Bork P, Rubin EM: Genome- genomes of nematodes. Trends In Genetics 2009, 25(5):203-209. wide experimental determination of barriers to horizontal gene transfer. 46. Kondo N, Nikoh N, Ijichi N, Shimada M, Fukatsu T: Genome fragment of Science 2007, 318:1449-1452. Wolbachia endosymbiont transferred to X chromosome of host insect. 26. Zhaxybayeva O, Lapierre P, Gogarten JP: Ancient gene duplications and Proceedings of the National Academy of Sciences of the United States of the root(s) of the tree of life. Protoplasma 2005, 227:53-64. America 2002, 99(22):14280-14285. 27. Dagan T, Roettger M, Bryant D, Martin W: Genome Networks Root the 47. Klasson L, Kambris Z, Cook PE, Walker T, Sinkins SP: Horizontal gene Tree of Life between Prokaryotic Domains. Genome Biology and Evolution transfer between Wolbachia and the mosquito Aedes aegypti. BMC 2010, 2:379-392. Genomics 2009, 10. 28. Puigbo P, Wolf YI, Koonin EV: The tree and net components of prokaryote 48. Zhaxybayeva O, Swithers KS, Lapierre P, Fournier GP, Bickhart DM, evolution. Genome Biol Evol 2010, 2:745-56. Deboy RT, Nelson KE, Nesbo CL, Doolittle WF, Gogarten JP, Noll KM: On the 29. Wolf YI, Aravind L, Grishin NV, Koonin EV: Evolution of aminoacyl-tRNA chimeric nature, thermophilic origin, and phylogenetic placement of the synthetases - analysis of unique domain architectures and phylogenetic Thermotogales. Proceedings of the National Academy of Sciences of the trees reveals a complex history of horizontal gene transfer events. United States of America 2009, 106(14):5865-5870. Genome Research 1999, 9(8):689-710. 49. Gophna U, Doolittle WF, Charlebois RL: Weighted genome trees: 30. Fournier G, Gogarten J: Evolution of acetoclastic methanogenesis in refinements and applications. Journal of Bacteriology 2005, Methanosarcina via horizontal gene transfer from cellulolytic Clostridia. 187(4):1305-1316. Journal of Bacteriology 2008, 190(3):1124-1127. 50. Boussau B, Gueguen L, Gouy M: Accounting for horizontal gene transfers 31. Hall C, Brachat S, Dietrich FS: Contribution of horizontal gene transfer to explains conflicting hypotheses regarding the position of aquificales in the evolution of Saccharomyces cerevisiae. Eukaryotic Cell 2005, the phylogeny of Bacteria. BMC Evolutionary Biology 2008, 8. 4(6):1102-1115. 51. Angelov A, Liebl W: Insights into extreme thermoacidophily based on 32. Fitzpatrick DA, Logue ME, Butler G: Evidence of recent interkingdom genome analysis of Picrophilus torridus and other thermoacidophilic horizontal gene transfer between bacteria and Candida parapsilosis. BMC archaea. Journal of Biotechnology 2006, 126:3-10. Evolutionary Biology 2008, 8. 52. Frickey T, Lupas AN: PhyloGenie: automated phylome generation and 33. Garcia-Vallve S, Romeu A, Palau J: Horizontal gene transfer of glycosyl analysis. Nucleic Acids Research 2004, 32(17):5231-5238. hydrolases of the rumen fungi. Molecular Biology and Evolution 2000, 53. Futterer O, Angelov A, Liesegang H, Gottschalk G, Schleper C, Schepers B, 17(3):352-361. Dock C, Antranikian G, Liebl W: Genome sequence of Picrophilus torridus 34. Schmitt I, Lumbsch HT: Ancient Horizontal Gene Transfer from Bacteria and its implications for life around pH 0. Proceedings of the National Enhances Biosynthetic Capabilities of Fungi. PLoS One 2009, 4(2). Academy of Sciences of the United States of America 2004, 35. Marcet-Houben M, Gabaldon T: Acquisition of prokaryotic genes by 101(24):9091-9096. fungal genomes. Trends In Genetics 2010, 26:5-8. 54. Deppenmeier U, Johann A, Hartsch T, Merkl R, Schmitz R, Martinez-Arias R, 36. Strope PK, Nickerson KW, Harris SD, Moriyama EN: Molecular evolution of Henne A, Wiezer A, Bäumer S, Jacobi C, Brüggemann H, Lienard T, urea amidolyase and urea carboxylase in fungi. BMC Evolutionary Biology Christmann A, Bameke M, Steckel S, Bhattacharyya A, Lykidis A, Overbeek R, 2011, 11. Klenk HP, Gunsalus R, Fritz HJ, Gottschalk G: The genome of 37. Denoeud F, Roussel M, Noel B, Wawrzyniak I, Da Silva C, Diogon M, Methanosarcina mazei: Evidence for lateral gene transfer between Viscogliosi E, Brochier-Armanet C, Couloux A, Poulain J, Segurens B, bacteria and archaea. Journal of Molecular Microbiology and Biotechnology Anthouard V, Texier C, Blot N, Poirier P, Ng GC, Tan K, Artiguenave F, 2002, 4(4):453-461. Jaillon O, Aury JM, Delbac F, Wincker P, Vivares C, El Alaoui H: Genome 55. Goksøyr J: Evolution of eucaryotic cells. Nature 1967, 214:1161. sequence of the stramenopile Blastocystis, a human anaerobic parasite. 56. Martin W, Brinkmann H, Savonna C, Cerff R: Evidence for a chimeric nature Genome Biology 2011, 12(3):R29. of nuclear genomes - eubacterial origin of eukaryotic glyceraldehyde-3- 38. Bowler C, Allen AE, Badger JH, Grimwood J, Jabbari K, Kuo A, Maheswari U, phosphate dehydrogenase genes. Proceedings of the National Academy of Martens C, Maumus F, Otillar RP, Rayko E, Salamov A, Vandepoele K, Sciences of the United States of America 1993, 90(18):8692-8696. Beszteri B, Gruber A, Heijde M, Katinka M, Mock T, Valentin K, Verret F, 57. Jarvis P, Soll M: Toc, Tic, and chloroplast protein import. Biochimica Et Berges JA, Brownlee C, Cadoret JP, Chiovitti A, Choi CJ, Coesel S, De Biophysica Acta-molecular Cell Research 2001, 1541(1-2):64-79. Martino A, Detter JC, Durkin C, Falciatore A, Fournet J, Haruta M, 58. Doolittle WF, Bapteste E: Pattern pluralism and the Tree of Life Huysman MJJ, Jenkins BD, Jiroutova K, Jorgensen RE, Joubert Y, Kaplan A, hypothesis. Proceedings of the National Academy of Sciences of the United Kroger N, Kroth PG, La Roche J, Lindquist E, Lommer M, Martin-Jezequel V, States of America 2007, 104(7):2043-2049. Lopez PJ, Lucas S, Mangogna M, McGinnis K, Medlin LK, Montsant A, 59. Degnan J, Rosenberg N: Discordance of species trees with their most Oudot-Le Secq MP, Napoli C, Obornik M, Parker MS, Petit JL, Porcel BM, likely gene trees. PLoS Genetics 2006, 2(5):762-768. Poulsen N, Robison M, Rychlewski L, Rynearson TA, Schmutz J, Shapiro H, 60. Rodriguez-Ezpeleta N, Brinkmann H, Roure B, Lartillot N, Lang BF, Siaut M, Stanley M, Sussman MR, Taylor AR, Vardi A, von Dassow P, Philippe H: Detecting and overcoming systematic errors in genome-scale Vyverman W, Willis A, Wyrwicz LS, Rokhsar DS, Weissenbach J, Armbrust EV, phylogenies. Systematic Biology 2007, 56(3):389-399. Green BR, Van De Peer Y, Grigoriev IV: The Phaeodactylum genome reveals 61. Beiko RG, Doolittle WF, Charlebois RL: The Impact of Reticulate Evolution the evolutionary history of diatom genomes. Nature 2008, on Genome Phylogeny. Systematic Biology 2008, 57(6):844-856. 456(7219):239-244. 62. Galtier N, Gouy M: Inferring phylogenies from DNA-sequences of unequal 39. Gladyshev EA, Meselson M, Arkhipova IR: Massive horizontal gene transfer base compositions. Proceedings of the National Academy of Sciences of the in bdelloid rotifers. Science 2008, 320(5880):1210-1213. United States of America 1995, 92(24):11317-11321. 40. Denker E, Bapteste E, Le Guyader H, Manuel M, Rabet N: Horizontal gene 63. Kolaczkowski B, Thornton JW: Performance of maximum parsimony and transfer and the evolution of cnidarian stinging cells. Current Biology likelihood when evolution is heterogeneous. Nature 2004, 2008, 18(18):R858-R859. 431(7011):980-984. 41. Candela T, Fouet A: Poly-gamma-glutamate in bacteria. Molecular 64. Foster PG: Modeling compositional heterogeneity. Systematic Biology 2004, Microbiology 2006, 60(5):1091-1098. 53(3):485-495. 42. Mayer W, Schuster L, Bartelmes G, Dieterich C, Sommer R: Horizontal gene 65. Philippe H, Zhou Y, Brinkmann H, Rodrigue N, Delsuc F: Heterotachy and transfer of microbial cellulases into nematode genomes is associated long-branch attraction in phylogenetics. BMC Evolutionary Biology 2005, 5. with functional assimilation and gene turnover. BMC Evolutionary Biology 2011, 11:31. Williams et al. Biology Direct 2011, 6:45 Page 19 of 20 http://www.biology-direct.com/content/6/1/45

66. Robinson DM, Jones DT, Kishino H, Goldman N, Thorne JL: Protein 90. Andam CP, Williams D, Gogarten JP: Biased gene transfer mimics patterns evolution with dependence among codons due to tertiary structure. created through shared ancestry. Proceedings of the National Academy of Molecular Biology and Evolution 2003, 20(10):1692-1704. Sciences of the United States of America 2010, 107(23):10679-10684. 67. Huelsenbeck JP, Jain S, Frost SWD, Pond SLK: A Dirichlet process model 91. Popa O, Hazkani-Covo E, Landan G, Martin W, Dagan T: Directed networks for detecting positive selection in protein-coding DNA sequences. reveal genomic barriers and DNA repair bypasses to lateral gene Proceedings of the National Academy of Sciences of the United States of transfer among prokaryotes. Genome Research 2011, 21(4):599-609. America 2006, 103(16):6263-6268. 92. Morandi A, Zhaxybayeva O, Gogarten JP, Graf J: Evolutionary and 68. Rodrigue N, Kleinman C, Philippe H, Lartillot N: Computational methods diagnostic implications of intragenomic heterogeneity in the 16S rRNA for evaluating phylogenetic models of coding sequence evolution with gene in Aeromonas strains. Journal of Bacteriology 2005, 187(18):6561-6564. dependence between codons. Molecular Biology and Evolution 2009, 93. Silver AC, Williams D, Faucher J, Horneman AJ, Gogarten JP, Graf J: 26(7):1663-1676. Complex Evolutionary History of the Aeromonas veronii Group Revealed 69. Letsch HO, Kuck P, Stocsits RR, Misof B: The impact of rRNA secondary by Host Interaction and DNA Sequence Data. PLoS One 2011, 6(2). structure consideration in alignment and tree reconstruction: simulated 94. Brochier C, Bapteste E, Moreira D, Philippe H: Eubacterial phylogeny based data and a case study on the phylogeny of hexapods. Molecular Biology on translational apparatus proteins. Trends In Genetics 2002, 18:1-5. and Evolution 2010, 27(11):2507-2521. 95. Gray RD, Bryant D, Greenhill SJ: On the shape and fabric of human 70. Erpenbeck D, Nichols SA, Voigt O, Dohrmann M, Degnan BM, Hooper JNA, history. Philosophical Transactions of the Royal Society B: Biological Sciences Worheide G: Phylogenetic analyses under secondary structure-specific 2010, 365(1559):3923-3933[http://rstb.royalsocietypublishing.org/content/ substitution models outperform traditional approaches: case studies 365/1559/3923.abstract]. with diploblast LSU. Journal of Molecular Evolution 2007, 64(5):543-557. 96. Bapteste B, Burian : Evolutionary Genomics: Statistical and Computational 71. Dutheil J, Boussau B: Non-homogeneous models of sequence evolution Methods Humana press; 2011, chap. Philosophy and evolution: minding the in the Bio++ suite of libraries and programs. BMC Evolutionary Biology gap between Evolutionary Patterns and Tree-like Patterns. 2008, 8. 97. Forterre P: Manipulation of cellular syntheses and the nature of viruses: 72. Felsenstein J: Cases in which parsimony or compatibility methods will be The virocell concept. Comptes Rendus Chimie 2011, 14(4):392-399[http:// positively mislead. Systematic Zoology 1978, 27:401-410. www.sciencedirect.com/science/article/pii/S1631074810001724]. 73. Inagaki Y, Susko E, Fast N, Roger A: Covarion shifts cause a long-branch 98. Bapteste E, Burian R: On the need for integrative phylogenomics, and attraction artifact that unites microsporidia and archaebacteria in EF-1α some steps toward its creation. Biology and Philosophy 2010, 25(4):711-736 phylogenies. Molecular Biology and Evolution 2004, 21(7):1340-1349. [http://dx.doi.org/10.1007/s10539-010-9218-2]. 74. Keeling PJ, McFadden GI: Origins of microsporidia. Trends In Microbiology 99. Halary S, Leigh J, Cheaib B, Lopez P, Bapteste E: Network analyses 1998, 6:19-23. structure genetic diversity in independent genetic worlds. Proceedings of 75. Webster A, Payne R, Pagel M: Molecular phylogenies link rates of the National Academy of Sciences 2010, 107:127-132[http://www.pnas.org/ evolution and speciation. Science 2003, 301(5632):478. content/107/1/127.abstract]. 76. NCBI Complete Microbial Genomes. [http://www.ncbi.nlm.nih.gov/ 100. Skippington E, Ragan MA: Lateral genetic transfer and the construction of genomes/MICROBES/Complete.html]. genetic exchange communities. FEMS Microbiology Reviews 2011, 77. Pei AY, Oberdorf WE, Nossa CW, Agarwal A, Chokshi P, Gerz EA, Jin ZD, 35(5):707-735[http://dx.doi.org/10.1111/j.1574-6976.2010.00261.x]. Lee P, Yang LY, Poles M, Brown SM, Sotero S, DeSantis T, Brodie E, 101. Xu Y, Gogarten J: Computational Methods for Understanding Bacterial Nelson K, Pei ZH: Diversity of 16S rRNA genes within individual and Archaeal Genomes. In Series on Advances in and prokaryotic genomes. Applied and Environmental Microbiology 2010, . Volume 7. Imperial College Press; London; 2008. 76(12):3886-3897. 102. Zhaxybayeva O: Detection and quantitative assessment of horizontal 78. Stackebrandt E, Goebel BM: A place for DNA-DNA reassociation and 16S gene transfer. Methods in Molecular Biology 2009, 532:195-213. ribosomal-RNA sequence analysis in the present species definition in 103. Touchon M, Hoede C, Tenaillon O, Barbe V, Baeriswyl S, Bidet P, Bingen E, bacteriology. International Journal of Systematic Bacteriology 1994, Bonacorsi S, Bouchier C, Bouvet O, Calteau A, Chiapello H, Clermont O, 44(4):846-849. Cruveiller S, Danchin A, Diard M, Dossat C, El Karoui M, Frapy E, Garry L, 79. Boucher Y, Douady C, Sharma A, Kamekura M, Doolittle W: Intragenomic Ghigo J, Gilles A, Johnson J, Le Bouguénec C, Lescat M, Mangenot S, heterogeneity and intergenomic recombination among haloarchaeal Martinez-Jéhanne V, Matic I, Nassif X, Oztas S, Petit M, Pichon C, Rouy Z, rRNA genes. Journal of Bacteriology 2004, 186(12):3980-3990. Ruf C, Schneider D, Tourret J, Vacherie B, Vallenet D, Médigue C, Rocha E, 80. Cui HL, Zhou PJ, Oren A, Liu SJ: Intraspecific polymorphism of 16S rRNA Denamur E: Organised genome dynamics in the Escherichia coli species genes in two halophilic archaeal genera, Haloarcula and Halomicrobium. results in highly diverse adaptive paths. PLoS Genetics 2009, 5: [http:// Extremophiles 2009, 13:31-37. www.scopus.com/scopus/inward/record.url?eid=2-s2.0- 81. Ané C, Larget B, Baum D, Smith S, Rokas A: Bayesian estimation of 59249089471&partnerID=40]. concordance among gene trees. Molecular Biology and Evolution 2007, 104. Boucher Y, Bapteste E: Revisiting the concept of lineage in prokaryotes: a 24(2):412-426. phylogenetic perspective. Bioessays 2009, 31(5):526-536[http://dx.doi.org/ 82. Kubatko L, Carstens B, Knowles L: STEM: species tree estimation using 10.1002/bies.200800216]. maximum likelihood for gene trees under coalescence. Bioinformatics 105. Zhaxybayeva O, Gogarten JP: Bootstrap, Bayesian probability and 2009, 25(7):971-973. maximum likelihood mapping: exploring new tools for comparative 83. Bansal MS, Burleigh JG, Eulenstein O: Efficient genome-scale phylogenetic genome analyses. BMC Genomics 2002, 3. analysis under the duplication-loss and deep coalescence cost models. 106. Zhaxybayeva O, Lapierre P, Gogarten JP: Genome mosaicism and BMC Bioinformatics 2010, 11. organismal lineages. Trends In Genetics 2004, 20(5):254-260. 84. Beiko RG, Hamilton N: Phylogenetic identification of lateral genetic 107. Hamel L, Zhaxybayeva O, Gogarten JP: PentaPlot: A software tool for the transfer events. BMC Evolutionary Biology 2006, 6. illustration of genome mosaicism. BMC Bioinformatics 2005, 6. 85. Lawrence AD, Alm EJ: Rapid evolutionary innovation during an Archean 108. Zhaxybayeva O, Gogarten J, Charlebois R, Doolittle W, Papke R: genetic expansion. Nature 2011, 469:93-96. Phylogenetic analyses of cyanobacterial genomes: Quantification of 86. Puigbo P, Wolf YI, Koonin EV: Search for a ‘Tree of Life’ in the thicket of horizontal gene transfer events. Genome Research 2006, 16(9):1099-1108. the phylogenetic forest. Journal of Biology 2009, 8(6):59. 109. Woese CR: Bacterial Evolution. Microbiological Reviews 1987, 51(2):221-271. 87. Lapointe FJ, Lopez P, Boucher Y, Koenig J, Bapteste E: Clanistics: a multi- 110. Bapteste E, Boucher Y: Lateral gene transfer challenges principles of level perspective for harvesting unrooted gene trees. Trends In microbial systematics. Trends In Microbiology 2008, 16(5):200-207. Microbiology 2010, 18(8):341-347. 111. Gogarten JP, Starke T, Kibak H, Fishmann J, Taiz L: Evolution and Isoforms 88. Schliep K, Lopez P, Lapointe FJ, Bapteste E: Harvesting Evolutionary of V-ATPase Subunits. Journal of Experimental Biology 1992, 172:137-147. Signals in a Forest of Prokaryotic Gene Trees. Molecular Biology and 112. Beiko R: Telling the Whole Story in a 10,000-Genome World. Biology Evolution 2011, 28(4):1393-1405. Direct 2011, 6:34[http://www.biology-direct.com/content/6/1/34]. 89. Hooper SD, Mavromatis K, Kyrpides NC: Microbial co-habitation and lateral 113. Yang Z: PAML 4: Phylogenetic analysis by maximum likelihood. Molecular gene transfer: what transposases can tell us. Genome Biology 2009, 10(4). Biology and Evolution 2007, 24(8):1586-1591. Williams et al. Biology Direct 2011, 6:45 Page 20 of 20 http://www.biology-direct.com/content/6/1/45

114. Larkin M, Blackshields G, Brown N, Chenna R, McGettigan P, McWilliam H, Valentin F, Wallace I, Wilm A, Lopez R, Thompson J, Gibson T, Higgins D: Clustal W and Clustal X version 2.0. Bioinformatics 2007, 23(21):2947-2948. 115. Guindon S, Dufayard J, Lefort V, Anisimova M, Hordijk W, O G: New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Systematic Biology 2010, 59:307-21. 116. Zhaxybayeva O, Gogarten JP: An improved probability mapping approach to assess genome mosaicism. BMC Genomics 2003, 4. 117. Piaggio-Talice R: Quartet suite v1.0 (supertrees by quartets). 2004 [http:// genome.cs.iastate.edu/CBL/download/].

doi:10.1186/1745-6150-6-45 Cite this article as: Williams et al.: A Rooted Net of Life. Biology Direct 2011 6:45.

Submit your next manuscript to BioMed Central and take full advantage of:

• Convenient online submission • Thorough peer review • No space constraints or color figure charges • Immediate publication on acceptance • Inclusion in PubMed, CAS, Scopus and Google Scholar • Research which is freely available for redistribution

Submit your manuscript at www.biomedcentral.com/submit