From Phylogenetics to Phylogenomics: the Evolutionary Relationships of Insect Endosymbiotic Γ-Proteobacteria As a Test Case
Total Page:16
File Type:pdf, Size:1020Kb
Syst. Biol. 56(1):1–16, 2007 Copyright c Society of Systematic Biologists ISSN: 1063-5157 print / 1076-836X online DOI: 10.1080/10635150601109759 From Phylogenetics to Phylogenomics: The Evolutionary Relationships of Insect Endosymbiotic γ-Proteobacteria as a Test Case INAKI˜ COMAS,ANDRES´ MOYA, AND FERNANDO GONZALEZ´ -CANDELAS Institut Cavanilles de Biodiversitat i Biologia Evolutiva. Universitat de Valencia,` Valencia, Spain; E-mail: [email protected] (I.C.); [email protected] (A.M.); [email protected] (F.G.-C.) Abstract.—The increasing availability of complete genome sequences and the development of new, faster methods for phylogenetic reconstruction allow the exploration of the set of evolutionary trees for each gene in the genome of any species. This has led to the development of new phylogenomic methods. Here, we have compared different phylogenetic and phylogenomic methods in the analysis of the monophyletic origin of insect endosymbionts from the γ -Proteobacteria, a hotly debated issue with several recent, conflicting reports. We have obtained the phylogenetic tree for each of the 579 identified protein-coding genes in the genome of the primary endosymbiont of carpenter ants, Blochmannia floridanus, after determining their presumed orthologs in 20 additional Proteobacteria genomes. A reference phylogeny reflecting the monophyletic origin of insect endosymbionts was further confirmed with different approaches, which led us to consider it as the presumed species tree. Remarkably, only 43 individual genes produced exactly the same topology as this presumed species tree. Most discrepancies between this tree and those obtained from individual genes or by concatenation of different genes were due to the grouping of Xanthomonadales with β-Proteobacteria and not to uncertainties over the monophyly of insect endosymbionts. As previously noted, operational genes were more prone to reject the presumed species tree than those included in information-processing categories, but caution should be exerted when selecting genes for phylogenetic inference on the basis of their functional category assignment. We have obtained strong evidence in support of the monophyletic origin of γ -Proteobacteria insect endosymbionts by a combination of phylogenetic and phylogenomic methods. In our analysis, the use of concatenated genes has shown to be a valuable tool for analyzing primary phylogenetic signals coded in the genomes. Nevertheless, other phylogenomic methods such as supertree approaches were useful in revealing alternative phylogenetic signals and should be included in comprehensive phylogenomic studies. [Concatenate alignment; incongruence; microbial evolution; phylome; supermatrix alignment; supertree analysis.] The evolution of Bacteria is strongly influenced by the markers have been shown to be laterally transferred in fluid nature of their genomes. Processes such as duplica- some cases (Asai et al., 1999; Yap et al., 1999). tion, gene gain and loss limit the inference of their evo- The advantages of multiple gene approaches versus lutionary relationships. However, in the last decade the those based on single genes are a priori evident. In the- genomic revolution has represented not only a change ory, we can evade the single gene evolutionary histo- in scale for the analysis of sequences but also for phylo- ries in favor of a common “true” phylogenetic signal; genetic inference. The phylogenomic approach, initially it is possible to avoid problems derived from insuffi- proposed as the application of phylogenetic analysis cient sample size by addition of more sites from multiple to help annotating complete genome sequences (Eisen, genes or to compensate for biased base compositions. In 1998), has transformed into the (possibly) most appropri- practice, some of these problems will also affect phylo- ate way to derive the phylogenetic history of organisms genies reconstructed from large data sets, but others will through genome-scale phylogenetic inference (O’Brien be substantially reduced and/or easily diagnosed. Sev- and Stanyon, 1999; Sicheritz-Pont´en and Andersson, eral alternative methods to single-gene tree phylogenies 2001; Eisen and Fraser, 2003). This leap from phylogenet- have been proposed recently (reviewed in Delsuc et al., ics to phylogenomics has allowed avoiding some of the 2005). These methods are based on different kinds of se- inherent problems in the inference of species trees from quence characters and genome structure, such as gene single gene phylogenies. However, it has also brought content (Snel et al., 1999; Gu et al., 2004; Huson and Steel, new issues in the reconstruction of species trees and even 2004), gene order (Wolf et al., 2001; Korbel et al., 2002; questioned the existence of such a single tree for all the Belda et al., 2005), concatenated sequences (Brown et al., genes in the genome of a bacterial species (Doolittle, 1999; 2001; Rokas et al., 2003), or gene tree–based techniques Bapteste et al., 2005). (Bininda-Emonds et al., 2002; Bininda-Emonds, 2004b). The single-gene approach reached its peak with the In this article we focus on methods based on direct or use of ribosomal RNA sequences as phylogenetic mark- indirect analyses of genomic sequence data. ers for microorganisms (Woese, 1987). These genes have The concatenation approach has proven useful in been, and still are, a powerful tool in bacterial phylo- many cases and its use is increasingly common (Bapteste genetic analysis. Their properties as a good marker for et al., 2002; Rokas et al., 2003). The method has been phylogenetic inference, such as universal presence and justified in cases where single-gene phylogenetic sig- evolutionary conservation, have enabled the proposal of nals are insufficient (Herniou et al., 2001), when there a universal tree of life (Woese et al., 1990) and the clas- is heterogeneity in the evolutionary rates (Bapteste et al., sification and reconstruction of evolutionary relation- 2002; Gontcharov et al., 2004), or when a high influ- ships for the three domains in the absence of genomic ence of nonstandard evolutionary patterns in the shap- data (Woese and Fox, 1977). However, even ribosomal ing of gene trees, such as horizontal gene transfer, is to 1 2 SYSTEMATIC BIOLOGY VOL. 56 be expected (Brochier et al., 2002). This method increases is an ongoing debate on whether bacterial endosym- phylogenetic signal by joining the sequences from multi- bionts of insects from the gamma subdivision of Pro- ple genes, thus creating a supermatrix of characters, and teobacteria conform to a monophyletic group (Gil et al., generally recovers accepted (and presumably correct) 2003; Lerat et al., 2003; Canback et al., 2004) or they are phylogenies with highly supported nodes, thus evad- paraphyletic and their grouping results from artefacts ing many of the above pitfalls (for instance, insufficient in the phylogeny reconstruction process (Charles et al., amounts of informative sites or particular gene histories) 2001; Herbeck et al., 2004; Belda et al., 2005). This study as far as the correct model of evolution is used, although investigates the evolutionary relationships of five γ - it may not overcome problems associated to systematic Proteobacteria endosymbionts, including the three Buch- biases in the data (Phillips et al., 2004). nera strains sequenced so far. Several evolutionary fea- Contrary to the concatenation approach, consensus tures of these genomes, such as their high evolutionary and supertree approaches have an indirect association rates and low G+C content, and genome disintegration with the genome sequence. The consensus method is led to different methodological problems (Moreira and based on the integration of multiple source trees into Philippe, 2000; Sanderson and Shaffer, 2002). a single topology. When the initial data set does not Our work is based on the determination of the phy- include the same taxa in all the gene trees, then a su- lome (Sicheritz-Pont´en and Andersson, 2001)—the set of pertree is constructed combining the overlapping topolo- phylogenetic trees for each protein-coding gene in the gies (Bininda-Emonds, 2004a). All these methods have genome—of Blochmannia floridanus (Gil et al., 2003), the proven useful for phylogenetic inference (Daubin et al., primary endosymbiont of carpenter ants. We have used 2002; Rokas et al., 2003), but they have different possible it as a starting point to explore the phylogenetic land- associated errors. For instance, the strength of the super- scape of γ -Proteobacteria. In this work we have used matrix approach decreases when the number of shared an array of phylogenetic and phylogenomic techniques orthologous genes is low or when many of the genes to infer the phylogenetic relationships among 21 Pro- concatenated are influenced by horizontal gene trans- teobacteria. In consequence, we have analyzed several fer events or hidden paralogies (Daubin et al., 2002). On phylogenetic hypotheses for these species through the the other hand, a supertree approach has also its own examination of the phylogenies derived from the set sources of problems. The use of wrong source trees, the of protein-coding genes in Blochmannia and their com- indirect relationship with molecular data, taxon hetero- parison with topologies obtained from the 16S rDNA geneity among the trees, or lack of a universal method- and different phylogenomic methods. Our aim has been ology for assessing the reliability of the nodes are among not only to test alternative species tree hypotheses but the most