P1: FIZ/FEA/FGI P2: Fne/FGO QC: FDS/anil T1: FDX September 17, 1999 15:27 Annual Reviews AR093-12
Annu. Rev. Ecol. Syst. 1999. 30:327–62 Copyright © 1999 by Annual Reviews. All rights reserved
POLYMORPHISM IN SYSTEMATICS AND COMPARATIVE BIOLOGY
John J. Wiens Section of Amphibians and Reptiles, Carnegie Museum of Natural History, Pittsburgh, Pennsylvania 15213-4080; e-mail: [email protected].
Key Words: comparative methods, phylogenetic analysis, phylogeny, intraspecific variation, species-limits
Abstract: Polymorphism, or variation within species, is common in all kinds of data and is the major focus of research on microevolution. However, polymorphism is often ignored by those who study macroevolution: systematists and comparative evolutionary biologists. Polymorphism may have a profound impact on phylogeny reconstruction, species delimitation, and studies of character evolution. A variety of methods are used to deal with polymorphism in phylogeny reconstruction, and many of these methods have been extremely controversial for more than 20 years. Recent research has attempted to address the accuracy of these methods (their ability to estimate the true phylogeny) and to resolve these issues, using computer simulation, congruence, and statistical analyses. These studies suggest three things: (a) the exclusion of polymorphic characters (as is commonly done in morphological phylogenetics) is unjustified and may greatly decrease accuracy relative to analyses that include these characters; (b) methods that incorporate frequency information on polymorphic characters tend to perform best; and (c) distance and likelihood methods designed for polymorphic data may often outperform parsimony methods. Although rarely discussed, polymorphism may also have a major impact on comparative studies of character evolution, such as the reconstruction of ancestral character states. Finally, polymorphism is an important issue in the delimitation of species, although this area has been somewhat neglected methodologically. The integration of within-species variation and microevolutionary processes into studies of systematics and comparative evolutionary biology is another example of the benefits of exchanging ideas between the fields of population genetics and systematics.
INTRODUCTION
One of the most important trends in systematics and evolutionary biology in recent years has been an increasing appreciation for the interconnectedness of these fields. For example, phylogenies are used increasingly by evolutionary biologists studying ecology and behavior (e.g. 9, 60, 82), and systematists using DNA and RNA sequence data are beginning to incorporate more and more details of molecular evolutionary processes into their phylogenetic analyses (e.g. 130). One of the areas in which a phylogenetic approach has had an important impact is the study of within-species variation, particularly in the fields of phylogeography and molecular population genetics (e.g. 4, 45, 51, 62, 126). However, many unresolved questions remain as to what the study of within-species variation and microevolutionary processes might offer to between-species systematic and comparative evolutionary studies (e.g. 58).
Heritable variation within species is the basic material of evolutionary change and the major subject of research on microevolutionary processes. Intraspecific variation is abundant in all kinds of phenotypic and genotypic traits, including morphology, behavior, allozymes, and DNA sequences. This variation is not surprising: if characters vary between species, they must also vary within species, at least at some point in their evolution. In many cases, especially among closely related species, this intraspecific variation may persist and be abundant. For example, among the nine species of the lizard genus Urosaurus, 23 of 24 qualitative morphological characters that vary between species were found to vary within one or more species as well (136).
I define polymorphism as variation within species that is (at least partly) independent of ontogeny and sex. I assume that this variation is genetically based and heritable, and for the purposes of this paper I deal primarily with variation in discrete or qualitative characters, rather than continuous variation in quantitative traits.
Despite the prevalence of intraspecific variation, phylogenetic biologists have a long and continuing tradition of ignoring polymorphism. For example, morphological systematists often exclude characters that show any or “too much” variation within species (109a).
Both molecular and morphological systematists often “avoid” or minimize polymorphism by sampling only a single individual per species. When polymorphism is dealt with explicitly, as in phylogenetic analyses of allozyme data and some studies of morphology, the appropriateness of different methods for phylogenetic analysis of these data is controversial and has been the subject of heated debate for over 20 years (e.g. 11, 12, 20, 33–35, 39, 40, 43, 75, 90–93, 96, 97, 116, 129, 130, 137–139, 142, 143). The controversy over the efficacy of different methods for analyzing polymorphism is not merely academic, because different methods may give very different estimates of phylogeny from the same data (Figure 1; 137). Different phylogenetic hypotheses may have very different implications for comparative evolutionary studies. But even if the tree is stable, different methods of treating within-species variation in ancestral state reconstructions may lead to radically different hypotheses about how traits evolve (see below). Descriptions of comparative methods designed for discrete traits (e.g. 76, 104, 118) rarely mention that these traits may vary within species, or what impact this variation may have on the methods or results.
Species-level systematics, or alpha taxonomy, also involves analyzing polymorphic characters. Analytically, the main task of species-level systematics is to distinguish between intraspecific and interspecific character variation. The delimitation, diagnosis, and description of species is at least as important an endeavor of systematics as phylogeny reconstruction. Yet, in contrast to phylogeny reconstruction, there has been relatively little methodological improvement in this area, especially as practiced by morphological systematists, who have described and will continue to describe most of the world’s species. Alpha taxonomy is a branch of systematics that would benefit tremendously from a more explicit treatment of polymorphism.
Figure 1 Different methods for coding polymorphic characters for phylogenetic analysis lead to very different hypotheses of evolutionary relationships. Results are based on morphological data for the lizard genus Urosaurus (136). Numbers at nodes indicate bootstrap values (42; bootstrap values <50% not shown). Each data set was analyzed with 1000 pseudoreplicates with the branch-and-bound search option.
In this paper, I review the implications of within-species variation for studies of systematics and comparative biology. I first provide an overview of common methodologies for dealing with polymorphism in phylogeny reconstruction and of some of the controversies surrounding these methods. I then describe recent studies designed to test the accuracy of these methods and resolve these controversies. I also discuss the less-studied impact of within-species variation on comparative studies of the evolution of discrete or qualitative characters. Finally, I review the problem of delimiting species and the operational criteria and methodologies used for delimiting species and distinguishing within- from between-species variation.
PHYLOGENY RECONSTRUCTION
General Approaches
Polymorphism is important in reconstructing the phylogeny among species for two reasons. First, it is common in data of all types, including morphology, molecules, and behavior. Second, when polymorphism is present, it may have a significant impact on phylogenetic analyses. In particular, various methods for dealing with polymorphism may lead to very different estimates of phylogeny, even when relationships are strongly supported by one or more methods (Figure 1). The abundance and impact of polymorphism are especially clear for closely related species, but different methods for analyzing polymorphic data may affect higher-level relationships as well (e.g. relationships among genera; 138, 139). Yet, surprisingly, the issue of polymorphism is frequently ignored by systematists, particularly those working with morphological and DNA sequence data.
Systematists deal with polymorphism, or avoid dealing with it, in a number of different ways. These general approaches loosely reflect different types of data. Morphologists often exclude characters in which polymorphism is observed, and in fact this is the most common reason given for excluding characters (109a). This practice may be far more common than is apparent from the literature, because morphologists rarely provide criteria for excluding or including characters (109a). The next most common exclusion criterion, excluding characters that show continuous variation, also reflects the desire to avoid characters that vary within and overlap between species. The justification for excluding polymorphic characters is rarely made clear by empirical systematists. Yet there is a persistent idea in the systematics literature, dating back to Darwin (22), that the more variation characters show within species, the less reliable they will be for inferring the phylogeny among species (32, 86, 123). There have been few empirical tests of this idea.
Systematists working with sequence and restriction-site data typically deal with intraspecific variation by treating each individual organism (or each unique genotype or haplotype) as a separate terminal unit in phylogenetic analyses. Thus, variation within species is effectively treated in the same way as variation between species (134). However, some authors have recently suggested modifications to this general approach, specifically tailored to the problem of analyzing variation within species (e.g. 18, 19, 127, 132). Of course, one variant of this approach is to sample only a single individual from each species. This sampling regime, although obviously controversial (2, 127, 138, 142, 143), is often employed by both molecular and morphological systematists.
A third general approach involves treating each species (or population) as a terminal unit in the phylogenetic analysis. This approach incorporates intraspecific variation either through different coding methods in a parsimony or discrete-character framework or by converting trait frequencies to genetic distances (or analyzing frequencies directly with continuous maximum likelihood; 38). This general approach is most frequently applied to allozyme data but is sometimes used for morphological data as well (12, 14, 108). A plethora of methods for dealing directly with polymorphism have been proposed and used, including at least eight parsimony coding methods (described below), two maximum likelihood methods (38, 100), and no fewer than 36 genetic distance methods (e.g. 114, 115, 130, 148), where each distance method is a combination of a tree-building algorithm and a genetic distance measure.
These parsimony, likelihood, and distance methods, designed explicitly for polymorphic data, have been the subject of considerable controversy dating back more than 20 years (20, 39, 90, 91, 96, 97, 129, 130). Two questions have been particularly prominent. First, are frequency data appropriate for phylogenetic analysis? Many authors have argued that the frequencies of traits or alleles within species are not useful for reconstructing phylogenies among species, largely because they are thought to be too variable in space and time within species (e.g. 20, 96, 97) and are not heritable, organismal traits (e.g. 97, 122). Proponents of frequency methods have argued that these methods use valuable information ignored by other methods (e.g. a trait occurring at a frequency of 1% is different from one occurring at a frequency of 99%), even if frequencies are not stable over a macroevolutionary time scale (129, 130, 137). These authors have also argued that frequency methods downweight rare traits and will therefore be less subject to sampling error than methods that merely treat traits as present or absent (i.e. a trait that is rare but present in several related species will be detected only sporadically with finite sample sizes, creating homoplasy, but this homoplasy will have little impact if frequency methods are used).
The second question is whether polymorphic data should be analyzed using parsimony or distance methods (e.g. 33–35, 39, 40, 43, 90, 97, 130). Most of the debate surrounding this topic has not directly involved the accuracy of the methods, but rather issues such as the meaning of branch lengths and negative distances (e.g. 33–35, 39, 40, 43). The maximum likelihood method most widely applicable to polymorphic data (continuous maximum likelihood or CONTML; 38) has been largely ignored by empirical systematists (but see 120), presumably because it assumes a clearly unrealistic model of evolution (e.g. 71, 129). Namely, it assumes no mutations and no fixations or losses of polymorphic traits (38). However, the sensitivity of the method to violations of these assumptions was not thoroughly explored until recently.
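The sampling-error argument made by proponents of frequency methods can be illustrated with a short sketch. Assuming individuals are sampled at random (binomial sampling), a trait at true frequency p appears at least once in a sample of n individuals with probability 1 - (1 - p)^n. The Python below computes that probability and simulates three related species that all carry the same rare trait; the frequency, sample size, seed, and function names are hypothetical and are not taken from the studies cited.

```python
import random


def detection_probability(p: float, n: int) -> float:
    """Chance that a trait at true frequency p shows up at least once
    among n randomly sampled individuals (assumes binomial sampling)."""
    return 1.0 - (1.0 - p) ** n


def sampled_frequency(p: float, n: int, rng: random.Random) -> float:
    """Observed frequency of the trait in one random sample of n individuals."""
    hits = sum(1 for _ in range(n) if rng.random() < p)
    return hits / n


def presence_code(freq: float) -> int:
    """Presence/absence scoring: 1 if the trait was seen at all, else 0."""
    return 1 if freq > 0 else 0


def main() -> None:
    rng = random.Random(42)  # fixed seed so the run is repeatable
    p, n = 0.05, 10          # hypothetical: 5% trait, 10 individuals per species
    print(f"P(trait detected in a sample) = {detection_probability(p, n):.3f}")
    # Three related species that all truly carry the trait at frequency p.
    freqs = [sampled_frequency(p, n, rng) for _ in range(3)]
    codes = [presence_code(f) for f in freqs]
    print("observed frequencies:", freqs)
    # Presence/absence codes may flip between 0 and 1 across these species
    # (spurious homoplasy), and each flip counts as a full step; a
    # frequency-based method instead weights the change by the (typically
    # small) difference in observed frequency.
    print("presence/absence codes:", codes)
    print(f"largest frequency difference: {max(freqs) - min(freqs):.2f}")


if __name__ == "__main__":
    main()
```

With a 5% trait and 10 individuals, the detection probability is only about 0.4, so presence/absence scores for species that share the trait can easily disagree by chance.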
Methods for Coding Polymorphic Data
In this section, I briefly review some of the methods commonly used for coding polymorphic data for parsimony analysis (see also Figure 2). The terminology for these methods follows Campbell & Frost (12) and Wiens (137).
Any Instance
Using this coding method, a derived trait is coded as present regardless of the frequency at which it occurs within a species (anywhere from 1 to 100%). However, this method is problematic in that it can hide potentially informative reversals (12), such as the reappearance of the primitive trait as a polymorphism (e.g. a transition from 100% to 50% for the derived state). Mutation coding (96, 97) is similar to any-instance coding but potentially allows characters with multiple derived states to be analyzed. However, its application is “frequently impossible for most loci” (96, p. 32).
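A minimal sketch of any-instance coding for a binary character, assuming each taxon is summarized by the observed frequency of the derived state (1 = derived present, 0 = absent). The function name and taxon labels are hypothetical. The example shows the hidden reversal described above: a taxon fixed for the derived trait and a descendant in which the primitive trait has reappeared at 50% receive the same code.

```python
def any_instance_code(derived_freq: float) -> int:
    """Any-instance coding: score the derived state as present (1) if it
    occurs at any frequency above zero, otherwise absent (0)."""
    if not 0.0 <= derived_freq <= 1.0:
        raise ValueError("frequency must lie in [0, 1]")
    return 1 if derived_freq > 0.0 else 0


# Hypothetical frequencies of the derived state in four taxa.
frequencies = {"ancestor": 1.00, "descendant": 0.50, "rare": 0.01, "absent": 0.0}
codes = {taxon: any_instance_code(f) for taxon, f in frequencies.items()}
print(codes)  # prints {'ancestor': 1, 'descendant': 1, 'rare': 1, 'absent': 0}

# The 100% -> 50% transition (reappearance of the primitive trait) is
# invisible: both taxa are coded 1, so no reversal is counted.
assert codes["ancestor"] == codes["descendant"] == 1
```

Note that a trait present at 1% and one present at 100% are also indistinguishable under this scheme.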
Majority
Using majority or “modal” coding, a species is coded as having the most common state of the polymorphic character. Potential disadvantages of this method are that it ignores the gain and loss of traits at frequencies less than 50% and that it gives a large weight to small changes in frequency close to 50% (e.g. a change from 49% to 51% has the same weight as a change from 0 to 100%).
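The weighting problem with majority coding can be shown with a short sketch, again assuming a binary character summarized by the frequency of the derived state. The text does not say how an exact 50% tie is coded; this sketch codes ties as missing ("?"), which is an assumption, and both function names are hypothetical.

```python
def majority_code(derived_freq: float) -> str:
    """Majority (modal) coding of a binary character.

    Returns "1" if the derived state is the more common one, "0" if the
    primitive state is, and "?" for an exact 50:50 tie (an assumed rule).
    """
    if not 0.0 <= derived_freq <= 1.0:
        raise ValueError("frequency must lie in [0, 1]")
    if derived_freq > 0.5:
        return "1"
    if derived_freq < 0.5:
        return "0"
    return "?"


def steps(a: str, b: str) -> int:
    """Changes counted between two adjacent coded nodes; '?' is treated
    as compatible with either state and adds no change."""
    if "?" in (a, b):
        return 0
    return 0 if a == b else 1


# A 2% shift across 50% costs as much as a full 0% -> 100% substitution,
print(steps(majority_code(0.49), majority_code(0.51)))  # prints 1
print(steps(majority_code(0.00), majority_code(1.00)))  # prints 1
# while a gain of the trait from 0% to 40% goes unrecorded.
print(steps(majority_code(0.00), majority_code(0.40)))  # prints 0
```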
Missing
When a species that is polymorphic for a given character is coded as missing, the state is treated as unknown in the phylogenetic analysis. Any state is considered a possible assignment to the species, even a state that was not observed in the variable species (at least using PAUP). Disadvantages of the missing method are that polymorphic data cells are uninformative in tree reconstruction, and polymorphic states can be treated neither as synapomorphies nor as homoplasies.
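Why a cell coded as missing is uninformative can be checked with Fitch's small-parsimony algorithm for an unordered character, used here as a generic illustration rather than as PAUP's implementation. A leaf coded "?" is given the full set of possible states, including states never observed in that species. That full set always intersects its sibling's set at no cost, so the tree length equals that of the tree with the "?" taxon pruned, and every placement of that taxon is equally parsimonious. The taxon names, trees, and three-state character below are hypothetical.

```python
from typing import Dict, FrozenSet, Tuple, Union

# A rooted binary tree: either a taxon name or a (left, right) pair.
Tree = Union[str, Tuple["Tree", "Tree"]]


def leaf_states(code: str, all_states: FrozenSet[str]) -> FrozenSet[str]:
    """Admissible states for one data cell; '?' (missing) admits every
    state of the character, whether or not it was observed."""
    return all_states if code == "?" else frozenset({code})


def fitch_length(tree: Tree, codes: Dict[str, str],
                 all_states: FrozenSet[str]) -> int:
    """Minimum number of changes for one unordered character (Fitch)."""
    def visit(node: Tree) -> Tuple[FrozenSet[str], int]:
        if isinstance(node, str):
            return leaf_states(codes[node], all_states), 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        common = left_set & right_set
        if common:  # children share a state: no extra change needed
            return common, left_cost + right_cost
        return left_set | right_set, left_cost + right_cost + 1
    return visit(tree)[1]


# Taxon P is polymorphic for states 0 and 1 but coded as missing,
# so it may even be assigned state 2 (as in the "P with E" tree).
states = frozenset({"0", "1", "2"})
codes = {"A": "0", "B": "0", "C": "1", "D": "1", "E": "2", "P": "?"}

placements = {
    "P with D": (("A", "B"), ("E", ("C", ("D", "P")))),
    "P with B": (("A", ("B", "P")), ("E", ("C", "D"))),
    "P with E": (("A", "B"), (("E", "P"), ("C", "D"))),
}
for label, tree in placements.items():
    print(label, fitch_length(tree, codes, states))  # each prints length 2
```

Because all three placements cost the same as the pruned tree (("A", "B"), ("E", ("C", "D"))), the polymorphic taxon contributes nothing to choosing among trees for this character.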
Polymorphic
Under polymorphic coding, a variable species is coded as having both states (using PAUP or MacClade). When the data are analyzed, the variable species is treated as if either state is present, but the variable cell is largely uninformative in building the tree (although some placements of the variable taxon may be considered more parsimonious than others), and the most parsimonious