Protein Sequence Signatures Support the African Clade of Mammals
Total Page:16
File Type:pdf, Size:1020Kb
Protein sequence signatures support the African clade of mammals Marjon A. M. van Dijk*, Ole Madsen*, Franc¸ois Catzeflis†, Michael J. Stanhope‡, Wilfried W. de Jong*§¶, and Mark Pagel¶ʈ *Department of Biochemistry, University of Nijmegen, P.O. Box 9101, 6500 HB Nijmegen, The Netherlands; †Institut des Sciences de l’E´ volution, Universite´Montpellier 2, 34095 Montpellier, France; ‡Queen’s University of Belfast, Biology and Biochemistry, Belfast BT9 7BL, United Kingdom; §Institute for Systematics and Population Biology, University of Amsterdam, 1090 GT Amsterdam, The Netherlands; and ʈSchool of Animal and Microbial Sciences, University of Reading, Whiteknights, Reading RG6 6AJ, United Kingdom Edited by Elwyn L. Simons, Duke University Primate Center, Durham, NC, and approved October 9, 2000 (received for review May 12, 2000) DNA sequence evidence supports a superordinal clade of mammals subfamily living outside of Madagascar. To assess the signifi- that comprises elephants, sea cows, hyraxes, aardvarks, elephant cance of the candidate signatures, we use likelihood methods shrews, golden moles, and tenrecs, which all have their origins in (20) to reconstruct their most probable ancestral states at the Africa, and therefore are dubbed Afrotheria. Morphologically, this basal node of the Afrotherian clade. These calculations use a appears an unlikely assemblage, which challenges—by including phylogeny reconstructed independently of the protein under golden moles and tenrecs—the monophyly of the order Lipotyphla investigation. We further use likelihood and combinatorial meth- (Insectivora). We here identify in three proteins unique combina- ods to estimate the probability of the signatures on three tions of apomorphous amino acid replacements that support this alternative morphology-based trees that are incompatible with clade. The statistical support for such ‘‘sequence signatures’’ as an African clade. We then combine the evidence from CRYAA, unambiguous synapomorphic evidence for the naturalness of the AQP2, and IRBP by using Bayesian techniques to yield a Afrotherian clade is reported. Using likelihood, combinatorial, and posterior probability for the Afrotherian clade. Demonstrating Bayesian methods we show that the posterior probability of the the statistical improbability of such events in the course of mammalian tree containing the Afrotherian clade is effectively 1.0, biological evolution (21) may help to escape from the current based on conservative assumptions. Presenting sequence data for stalemating in the molecules-versus-morphology debate on ver- another African insectivore, the otter shrew Micropotamogale tebrate phylogeny (3). lamottei, we demonstrate that such signatures are diagnostic for including newly investigated species in the Afrotheria. Sequence Materials and Methods signatures provide ‘‘protein-morphological’’ synapomorphies that Searching for Afrotherian Signatures. Databases were searched for may aid in visualizing monophyletic groupings. sets of protein sequences that included representatives of at least four Afrotherian orders, i.e., Proboscidea (elephants), Sirenia olecular sequence data are increasingly used in mamma- (sea cows), Hyracoidea (hyraxes), Tubulidentata, Macros- Mlian phylogeny and recently have led to a number of celidea, and Afrosoricida (golden moles and tenrecs; ref. 7). This unorthodox proposals (1–3). These proposals range from the yielded data sets of CRYAA, AQP2, IRBP, von Willebrand ␣ ␥ ␣ claim that the guinea pig is not a rodent (4) to making whales and factor, -2B adrenergic receptor, -fibrinogen, hemoglobin-  hippos sister groups (5). One of the most remarkable proposi- and - , and cytochrome b. The AQP2 data set was comple- tions is that of an ‘‘African clade’’ in which species as diverse as mented with newly determined sequences of pig, fin whale, and elephant shrews (Macroscelidea), golden moles (Chrysochlori- sperm whale (see below). From these data sets, one or, if dae), and tenrecs (Tenrecidae) are grouped with aardvarks available, two representatives of all included eutherian orders (Tubulidentata) and paenungulates (elephants, sea cows, and were taken. When more than two species were available for an hyraxes; refs. 6 and 7). All of the African clade species find their order, only the two most divergent sequences were retained. This fossil roots in Africa, and most are still confined to this continent, increases the homoplastic background, and thus the significance hence the name Afrotheria (7). The sequence evidence for of retrieved signatures. Retaining all sequences would make the Afrotheria is unanimous and strong, deriving from various taxon representation unbalanced and hamper the signature nuclear and mitochondrial genes (6–10). Morphologically, how- searches. The selected sequences were aligned, using PILEUP, and ever, there is no evidence whatsoever for a natural grouping of manually edited. Where available, two divergent Marsupialia these taxa (11–14), prompting us to subject the molecular were included as outgroups. For full species names and accession evidence to further scrutiny. numbers, and for protein alignments of CRYAA, AQP2, and If Afrotheria is a real clade, it might be possible to find specific IRBP, see Table 4 and Figs. 3–5, which are published as combinations of amino acid replacements in the proteins that supplemental data on the PNAS web site, www.pnas.org. support them. These replacements would represent synapomor- Candidate sequence signatures were retrieved from the align- phous character states, as remnants of mutational events during the last common ancestry of a clade. Several authors have used the concept of such ‘‘sequence signatures’’ qualitatively in mo- This paper was submitted directly (Track II) to the PNAS office. lecular phylogeny (e.g., refs. 15–19), but thorough statistical Abbreviations: CRYAA, ␣A-crystallin; AQP2, aquaporin-2; IRBP, interphotoreceptor retinol- binding protein; ML, maximum likelihood. interpretations are lacking. We here search for the presence of unique Afrotherian See commentary on page 1. sequence signatures in nine protein data sets—eight nuclear and Data deposition: The sequences reported in this paper have been deposited in the GenBank database (accession nos. AJ251100–AJ251106, AJ277647, and AJ270463–AJ270468). one mitochondrial—that include at least four Afrotherian or- ␣ ¶To whom reprint requests should be addressed. E-mail: [email protected] or ders. Putative Afrotherian signatures were traced in A- [email protected]. crystallin (CRYAA), aquaporin-2 (AQP2), and interphotore- The publication costs of this article were defrayed in part by page charge payment. This ceptor retinol-binding protein (IRBP). To demonstrate the article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. diagnostic value of the signatures we seek their presence in §1734 solely to indicate this fact. CRYAA and AQP2 of other potential members of the African Article published online before print: Proc. Natl. Acad. Sci. USA, 10.1073͞pnas.250216797. clade, including the otter shrew—representing the only tenrecid Article and publication date are at www.pnas.org͞cgi͞doi͞10.1073͞pnas.250216797 188–193 ͉ PNAS ͉ January 2, 2001 ͉ vol. 98 ͉ no. 1 Downloaded by guest on September 24, 2021 Fig. 1. Afrotherian signatures in CRYAA, AQP2, and IRBP. (A) Positions from an alignment of 26 eutherian and two marsupial CRYAA sequences at which the same putatively apomorphous replacement occurs in 4–6 eutherian sequences. In black are replacements that occur in at least four of the five Afrotheria. Note that ‘‘.’’ indicates all residues that did not pass the 4–6 search window; they may be identical to the two top out-group sequences or be apomorphies occurring in Ͻ4orϾ6 sequences; x is unassigned residue; * denotes species that are included in the trees in Fig. 2. (B) Apomorphous replacements occurring in 5–7 AQP2 sequences, considering armadillo as out-group for the other Eutheria; rodents as out-group yields the same signature (shown in Fig. 4). (C) Positions in IRBP that pass the search for four- to six-species eutherian clades. Note that at certain positions different putatively apomorphous replacements fulfil the search criterium and may set apart different clades (e.g., 13E for Afrotheria, 13G for Cetartiodactyla). (D and E) Afrotherian signature positions in newly determined CRYAA and AQP2 sequences, respectively. ments by using the spreadsheet SIGNWIN (available from the izing to exon 1 and a reverse primer complementary to the 3Ј end authors). No phylogenetic information is included in this search; of exon 2 (22). AQP2 was sequenced (23) for pig (Sus scrofa), SIGNWIN solely selects positions at which a designated number of sperm whale (Physeter macrocephalus), manatee (Trichechus in-group species have the same putatively apomorphous replace- manatus), fin whale (Balaenoptera physalus), tree hyrax (Den- ment, considering the out-group residue(s) as plesiomorphous drohyrax dorsalis), tail-less tenrec, small Madagascar hedgehog, condition. The selection window is set to be appropriate for the and otter shrew. number of species for which the monophyly is investigated. Thus, when searching for positions that might support the monophyly Phylogenetic Tree Reconstruction. To study the evolution of the of the five Afrotheria among the 26 selected eutherian CRYAA candidate Afrotherian signatures found in CRYAA, AQP2, and sequences (Figs. 1A and 3), the window is set at 5 Ϯ 1. This allows IRBP (see Results), phylogenetic