Molecular Systematics: Perfect Sines of Evolutionary History? Michael M
Total Page:16
File Type:pdf, Size:1020Kb
R816 Dispatch Molecular systematics: Perfect SINEs of evolutionary history? Michael M. Miyamoto Short interspersed repetitive elements — SINEs — are species carrying it form a ‘monophyletic’ group, and in being championed as near-perfect phylogenetic that sense the SINE is an example of what is known in the characters; they have recently been used with notable trade as a shared derived character or ‘synapomorphy’: in success to resolve some phylogenetic conundrums, but other words, the insertion event is assumed to have they do have certain limitations that restrict their use as occurred in the stem lineage from which the various ‘perfect’ characters for molecular systematics. species are descended. Address: Department of Zoology, P.O. Box 118525, University of Florida, Gainesville, Florida 32611-8525, USA. The use of SINE insertions as phylogenetic markers E-mail: [email protected] starts with the identification and characterization of new retroposons from a few target species of the study group Current Biology 1999, 9:R816–R819 [4,6]. Locus-specific primers that match the flanking 0960-9822/99/$ – see front matter regions of these elements are then designed for PCR © 1999 Elsevier Science Ltd. All rights reserved. amplification of the corresponding orthologous regions from the other species. The amplified products of the The ultimate goal of phylogenetic systematics is to different species are resolved by gel electrophoresis, with provide an accurate estimate of the true history of life [1]. long and short fragments corresponding to those with and Towards this goal, systematists are constantly searching without the targeted SINE, respectively. These for new characters that are free of the various shortcom- experiments are followed by Southern hybridizations, ings — such as parallel changes in different lineages or first with a SINE-specific probe and then with one for the reversals to a more primitive state, collectively known as flanking regions, to confirm the presence or absence of ‘homoplasies’ — that can mask the true phylogenetic the targeted element and the fidelity of the amplifica- history [2]. In molecular systematics, DNA sequences, as tions. In most cases, final confirmation of the presence or generally obtained these days by the polymerase chain absence of a SINE is obtained by sequencing the PCR reaction (PCR) and automated sequencing, remain the products. The entire procedure is repeated for other principal source of new information for phylogeny estima- SINEs and loci to generate a data set for the study group tion. But both DNA and protein sequence data are vul- with many presence/absence characters. nerable to the confounding effects of parallel and back substitutions, thereby necessitating the use of statistical The retroposition process that generates a new SINE methods to distinguish signal from noise [3]. At this time, insertion is duplicative: the parent SINE is retained at no single source of comparative information — not even its original location, while giving rise to copies that insert complete genome sequences [1] — appears to be entirely at new sites throughout the genome [4,5]. SINE inser- free of error and ambiguity for the determination of tions are accepted to be free of the problem of conver- species’ phylogenies. gence (Figure 1a), as the vast number of available target sites within a eukaryotic genome and the near-random This situation is changing, according to a recent series of integration pattern mean that the chance of precisely the articles by Okada and his colleagues [4,5]. They are same insertion occurring more than once is negligible. championing the use of short interspersed repetitive They are similarly accepted to be free of reversals elements — SINEs — as near-perfect phylogenetic (Figure 1b), as no specific mechanism is known for their characters for molecular systematics. SINEs are precise deletion. SINEs can be lost through non-specific retroposons of 70–500 bases in length, which occur at deletions, but the chance that such a loss will exactly more than 104 copies per eukaryotic genome. These - match the boundaries of an element is again thought to retroposons have multiplied and inserted at various loci be negligible. As a final bonus, the duplicative nature of throughout the ancestral genomes of different lineages retroposition confers an evolutionary directionality onto via RNA intermediates and reverse transcription. These the characters, from the primitive ‘absence’ state to the ancestral retropositions have generated an extensive derived ‘presence’ state. In view of these features, SINE record of SINE insertions that varies across extant insertions are being championed as near-perfect charac- species according to their evolutionary histories. For ters for phylogeny estimation. phylogenetic purposes, this variable record is read as a series of two-state characters, the states being ‘presence’ Overall, the supporting evidence for this claim appears or ‘absence’ (of the SINE). The presence of a specific solid, with the least convincing assumption being that SINE insertion is thought to be evidence that the SINEs insert randomly into the genome. Some Dispatch R817 Figure 1 (a) Convergence Hypothetical examples illustrating potential sources of phylogenetic Estimated tree error and ambiguity with SINE insertions. The true phylogeny for A B A C species A, B and C is shown at the bottom right. (a) An independent B C insertion at the same site in two different species is a convergence that introduces error into the phylogenetic analysis. (b) A precise deletion of an existing SINE is a reversal that also introduces error. (c) An ancestral polymorphism that spans successive speciations can be randomly fixed across species in a pattern that conflicts with their true relationships. (d) The loss of a primer-annealing site in a (b) Reversal species by mutational decay prevents the PCR amplification of its A B C targeted locus, thereby introducing missing data and ambiguity into A the study. (e) The number of independent synapomorphies for a B group can be over-estimated when different SINE insertions are C derived from the same amplification event. retroelements do show specific target-site selection [7], (c) Incomplete lineage sorting and there is at least one known example of convergent A B A C SINE insertions — in deer mice ([8], reported in [2]). B These cases can, however, be regarded as exceptions to C the rule of minimum insertion-site specificity by SINEs , and other retroelements [4,5,7]. Although not entirely random, SINE insertions appear sufficiently so to justify , their acceptance as essentially convergence-free. (d) Missing data A B A B There are, however, two other potential sources of error C C (No PCR and ambiguity for phylogenetic analyses based on SINE ? product) insertions, which have been emphasized as of greater Or X concern [4,5]. The first involves the problem of incom- plete sorting of ancestral polymorphisms into the progeny lineages after speciation (Figure 1c). This source of phylo- (e) Non-independence genetic error becomes a primary concern when two or more successive speciation events occur close in time [9]. A B A B When two successive speciations are separated by N gen- C erations — where N is the effective population size — the C probability of such an error for a single SINE insertion can be as high as 0.3 [6]. Alleviation of this problem requires the use of multiple SINE insertions to corroborate the monophyly of individual groups. Locus 1 Evolutionary event A B Locus 1 with SINE Inferred synapomorphy C The second problem involves the loss over evolutionary Locus 2 Random fixation True time of the primer-annealing sites for specific loci as a Locus 2 with SINE ? Missing data species Locus 3 with X Lost primer-annealing site phylogeny result of mutational decay (Figure 1d). This loss parent SINE precludes the amplification of each targeted locus from Current Biology every species. If a PCR fragment is missing for a species, the corresponding character state is considered to be ‘missing data’, thereby introducing ambiguities into the One further possible source of error in SINE-based final character matrix. For example, in the study of cetar- phylogenetic analysis involves the non-independence of tiodactyl relationships considered below [4], 18 of the 20 SINE insertions that arose from the same multiplication insertion characters for the peccary — a pig-like event (Figure 1e). Although the separate insertions are artiodactyl — are missing data. As mutational decay is independent, the co-occurrence of these SINEs across correlated with time, this source of ambiguity is of great- taxa is not, because of their common origins from the est concern for the more distantly related species. This same amplification event. Thus, each insertion from the approach is thus effectively limited to taxa with amplification cannot be counted as a truly independent proportional sequence differences of less than about character or synapomorphy. This problem was not 25%, and to phylogenetic questions where lineages are discussed by Okada and colleagues [4,5], but it may be separated by less than about 50 million years [4,5]. significant for their work on cetartiodactyl relationships, R818 Current Biology Vol 9 No 21 Figure 2 the Hippopotamidae (Figure 2). These molecular data include DNA and protein sequences for many nuclear and (a) Morphological and paleontological data mitochondrial genes, and for complete mitochondrial DNA genomes [11,12]. Suiformes Suidae (pigs) Tayassuidae (peccaries) Artiodactyla The available insertion sequence data for Cetartio- Hippopotamidae (hippopotamuses) dactyla consist of twenty informative characters for Ruminantia fifteen SINEs, four long interspersed repetitive ele- Tylopoda (camels) Pecora (cattle, deer, ments (LINEs) and one uncharacterized retroposon [4]. giraffes, pronghorns) On the basis of these twenty characters, the ‘most-parsi- Tragulina (chevrotains) Cetacea monious’ phylogeny — the one requiring the fewest dis- Mysticeti (baleen whales) tinct character changes — requires no homoplasy and is Odontoceti (toothed whales) congruent with the large body of other molecular data (Figure 2b).