Mammalian Evolution May Not Be Strictly Bifurcating Open Access Research Article
Björn M. Hallström1 and Axel Janke*,2
1 Department of Cell and Organism Biology, Division of Evolutionary Molecular Systematics, University of Lund, Lund, Sweden
2 LOEWE—Biodiversität und Klima Forschungszentrum BiK-F, Frankfurt, Germany
*Corresponding author: E-mail: [email protected].
Associate editor: Arndt von Haeseler

Abstract
The massive amount of genomic sequence data that is now available for analyzing evolutionary relationships among 31 placental mammals reduces the stochastic error in phylogenetic analyses to virtually zero. One would expect that this would make it possible to finally resolve controversial branches in the placental mammalian tree. We analyzed a 2,863,797 nucleotide-long alignment (3,364 genes) from 31 placental mammals for reconstructing their evolution. Most placental mammalian relationships were resolved, and a consensus of their evolution is emerging. However, certain branches remain difficult or virtually impossible to resolve. These branches are characterized by short divergence times in the order of 1–4 million years. Computer simulations based on parameters from the real data show that as little as about 12,500 amino acid sites could be sufficient to confidently resolve short branches as old as about 90 million years ago (Ma). Thus, the amount of sequence data should no longer be a limiting factor in resolving the relationships among placental mammals. The timing of the early radiation of placental mammals coincides with a period of climate warming some 100–80 Ma and with continental fragmentation. These global processes may have triggered the rapid diversification of placental mammals. However, the rapid radiations of certain mammalian groups complicate phylogenetic analyses, possibly due to incomplete lineage sorting and introgression. These speciation-related processes led to a mosaic genome and conflicting phylogenetic signals.
Split network methods are ideal for visualizing these problematic branches and can therefore depict data conflict and possibly the true evolutionary history better than strictly bifurcating trees. Given the timing of tectonics, of placental mammalian divergences, and the fossil record, a Laurasian rather than Gondwanan origin of placental mammals seems the most parsimonious explanation.

Key words: continental drift, Cretaceous warming, genome analysis, hybridization, phylogenomics, split decomposition.

© The Author(s) 2010. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. Mol. Biol. Evol. 27(12):2804–2816. 2010. doi:10.1093/molbev/msq166. Advance Access publication June 29, 2010.

Introduction

As genomic sequences are the ultimate source of molecular data for evolutionary studies, the wealth of information available from the whole-genome sequencing of metazoans has revolutionized evolutionary studies. After publication of the human genome (Lander et al. 2001; Venter et al. 2001), it was decided that low-coverage (2×) genome sequences from additional mammalian species would be beneficial for more accurate sequence annotation and for studying the evolution of disease genes. The data from these genome projects also enable us to study the evolution of mammalian lineages with a previously unimaginable amount of data, opening the field of phylogenomics. The first phylogenomic study of placental mammals initially involved some 200,000 nucleotides (nt) of protein-coding sequences (Nikolaev et al. 2007). Improving the sequence and taxon coverage, phylogenomic analyses of mammals included 2.2 million nucleotides (Mnt) from about 2,840 protein-coding genes (Hallström et al. 2007) or some 1,700 conserved genome loci (Wildman et al. 2007). The largest and most complete data set analyzed so far comprised more than 2.8 Mnt from 3,012 protein-coding genes (Hallström and Janke 2008). However, phylogenomics has failed to reliably resolve certain branches of the placental mammal tree, thereby posing new problems and questions in animal evolution (Hallström and Janke 2008).

Despite the amount of available data, some branches in the placental mammal tree are only weakly supported, and phylogenomic analyses leave some branches as yet completely unresolved due to the variable and poor support for different evolutionary scenarios (Hallström and Janke 2008). This was unexpected because, theoretically, the sheer amount of genomic data should have easily overcome stochastic errors from single and multiple (20) gene analyses, a problem that vexed molecular phylogenetic studies before the wealth of information from the genomic age (Kullberg et al. 2008).

To date, three important but difficult to resolve branching points in the mammalian tree have been identified. The first is the primary branching among placental mammals, that is, between the superorders Xenarthra (in this study: sloth and armadillo), Afrotheria (elephant, tenrec, and hyrax), and Boreoplacentalia (all remaining species used in this study). Previous resolutions of their divergences depended on the choice of analytical methodology and type of data but did not enable the rejection of alternative hypotheses (Nishihara et al. 2007; Hallström and Janke 2008). Even the analyses of retroposon insertions, otherwise generally regarded as a solid phylogenetic marker system, failed to resolve a clear bifurcation in the most basal divergences among placental mammals (Churakov et al. 2009; Nishihara et al. 2009), thus supporting the original sequence-based findings (Hallström and Janke 2008). Retroposon insertions are, with a few exceptions (Cantrell et al. 2001; van de Lagemaat et al. 2005), free of homoplasies (Steel and Penny 2000). Therefore, the apparently contradicting results from sequence data and retroposon insertion analyses require a natural explanation. Similar observations have also been made for the other two poorly resolved or unresolved mammalian phylogenetic branches. The detection of apparently conflicting phylogenetic signals for the position of the Scandentia (tree shrews) relative to Primates and Glires (rodents and lagomorphs) within the Euarchontoglires clade, and the position of Chiroptera (bats) relative to Artiodactyla (even-toed ungulates), Carnivora, Lipotyphla (hedgehog, common shrew, and allies), and Perissodactyla (odd-toed ungulates) within the Laurasiaplacentalia, leaves both controversial and poorly supported (Nishihara et al. 2006; Janecka et al. 2007; Kriegs et al. 2007; Hallström and Janke 2008).

A common feature of these three problematic divergences is the divergence of groups within 1–4 million years (My) of one another (Hallström and Janke 2008). Such time intervals may have been too short for the respective genomes to gather enough substitutions to resolve rapid divergences some 100 My later, or limited taxon sampling may impede such phylogenetic analyses. Another possible reason for the problems involved in clearly resolving short central branches of the placental mammalian tree by sequence and retroposon data may be speciation-associated processes, such as species hybridization and incomplete lineage sorting (Nei 1987). Species hybridization leads to introgression, the incorporation of genes from one species into the gene pool of another species, whereas incomplete lineage sorting produces a pattern of allele fixations from ancestral polymorphisms that does not reflect the species history.

[...] branches in the mammalian tree and to study the tree likeness of the mammalian genome data. The number of placental mammals included in this study has been increased to 31, 12 (63%) more than in a previous phylogenomic analysis (Hallström and Janke 2008), and the genome data from six outgroup species were included to ensure a solid root of the tree from a maximal taxon sampling. The previously unresolved branchings inside the Laurasiaplacentalia, which include Perissodactyla, Carnivora, Artiodactyla, Cetacea, Chiroptera, and Lipotyphla (Arnason et al. 2008), may be clarified by the new genome data of a perissodactyl (horse), a second chiropteran (mega bat), a cetacean (dolphin), and an artiodactylan (alpaca). The genome data from a hyracoid (hyrax) and a second xenarthran (sloth) are invaluable additions to the hitherto long and undivided branches of Afrotheria and Xenarthra, allowing the basal divergences among placental mammals to be examined in more detail. Computer simulations were made to estimate the amount of data needed to resolve short branches, and finally, split decomposition methods (Huson and Bryant 2006) were used for visualizing conflict in the tree.

Materials and Methods

Data

Predicted cDNA sequences from all tetrapods with assemblies and gene builds in release 54 of ENSEMBL were downloaded from ftp://ftp.ensembl.org/pub/current_fasta/. In total, 37 species (table 1) were included in the data build. The taxon sampling represents 16 of the 21 extant eutherian orders. The sequence data from metatherian (marsupials), prototherian (monotremes), avian (bird), reptile, and amphibian species were collected for rooting the placental mammal tree. For increasing the usable sequence length