<<

Ecology, and Organismal Ecology, Evolution and Organismal Biology Publications

2004 Use of Nuclear Genes for Phylogeny Reconstruction in Plants Randall L. Small University of Tennessee - Knoxville

Richard Clark Cronn United States Department of Agriculture

Jonathan F. Wendel Iowa State University, [email protected]

Follow this and additional works at: http://lib.dr.iastate.edu/eeob_ag_pubs Part of the Ecology and Evolutionary Biology Commons, Molecular Genetics Commons, and the Plant Breeding and Genetics Commons The ompc lete bibliographic information for this item can be found at http://lib.dr.iastate.edu/ eeob_ag_pubs/42. For information on how to cite this item, please visit http://lib.dr.iastate.edu/ howtocite.html.

This Article is brought to you for free and open access by the Ecology, Evolution and Organismal Biology at Iowa State University Digital Repository. It has been accepted for inclusion in Ecology, Evolution and Organismal Biology Publications by an authorized administrator of Iowa State University Digital Repository. For more information, please contact [email protected]. Use of Nuclear Genes for Phylogeny Reconstruction in Plants

Abstract Molecular data have had a profound impact on the field of plant , and the application of DNA- sequence data to phylogenetic problems is now routine. The am jority of data used in plant molecular phylogenetic studies derives from chloroplast DNA and nuclear rDNA, while the use of low-copy nuclear genes has not been widely adopted. This is due, at least in part, to the greater difficulty of isolating and characterising low-copy nuclear genes relative to chloroplast and rDNA sequences that are readily amplified with universal primers. The higher level of sequence variation characteristic of low-copy nuclear genes, however, often compensates for the experimental effort required to obtain them. In this review, we briefly discuss the strengths and limitations of chloroplast and rDNA sequences, and then focus our attention on the use of low-copy nuclear sequences. Advantages of low-copy nuclear sequences include a higher rate of evolution than for organellar sequences, the potential to accumulate datasets from multiple unlinked loci, and bi-parental inheritance. Challenges intrinsic to the use of low-copy nuclear sequences include distinguishing orthologous loci from divergent paralogous loci in the same gene family, being mindful of the complications arising from concerted evolution or recombination among paralogous sequences, and the presence of intraspecific, intrapopulational and intraindividual polymorphism. Finally, we provide a detailed protocol for the isolation, characterisation and use of low-copy nuclear sequences for phylogenetic studies.

Keywords plant systematics, DNA-sequence data, chloroplast DNA, nuclear rDNA

Disciplines Ecology and Evolutionary Biology | Molecular Genetics | Plant Breeding and Genetics

Comments This article is from Australian Systematic 17 (2004): 145, doi:10.1071/SB03015.

Rights Works produced by employees of the U.S. Government as part of their official duties are not copyrighted within the U.S. The onc tent of this document is not copyrighted.

This article is available at Iowa State University Digital Repository: http://lib.dr.iastate.edu/eeob_ag_pubs/42 CSIRO PUBLISHING www.publish.csiro.au/journals/asb Australian Systematic Botany 17, 145–170

L. A. S. JOHNSON REVIEW No. 2 Use of nuclear genes for phylogeny reconstruction in plants

Randall L. SmallA,D, Richard C. CronnB and Jonathan F. WendelC

ADepartment of Botany, The University of Tennessee, Knoxville, TN 37996, USA. BUS Forest Service, Pacific Northwest Research Station, 3200 SW Jefferson Way, Corvallis, OR 97331, USA. CDepartment of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA. DCorresponding author; email: [email protected]

Abstract. Molecular data have had a profound impact on the field of plant systematics, and the application of DNA-sequence data to phylogenetic problems is now routine. The majority of data used in plant molecular phylogenetic studies derives from chloroplast DNA and nuclear rDNA, while the use of low-copy nuclear genes has not been widely adopted. This is due, at least in part, to the greater difficulty of isolating and characterising low-copy nuclear genes relative to chloroplast and rDNA sequences that are readily amplified with universal primers. The higher level of sequence variation characteristic of low-copy nuclear genes, however, often compensates for the experimental effort required to obtain them. In this review, we briefly discuss the strengths and limitations of chloroplast and rDNA sequences, and then focus our attention on the use of low-copy nuclear sequences. Advantages of low-copy nuclear sequences include a higher rate of evolution than for organellar sequences, the potential to accumulate datasets from multiple unlinked loci, and bi-parental inheritance. Challenges intrinsic to the use of low-copy nuclear sequences include distinguishing orthologous loci from divergent paralogous loci in the same gene family, being mindful of the complications arising from concerted evolution or recombination among paralogous sequences, and the presence of intraspecific, intrapopulational and intraindividual polymorphism. Finally, we provide a detailed protocol for the isolation, characterisation and use of low-copy nuclear sequences for phylogenetic studies.

SB03015 UsR.et alL. e .ofS manuclllear genesi n plant phylogeny

Introduction these tools to the exclusion of other, perhaps more The impact of molecular data on the field of plant appropriate, tools is pervasive (Alvarez and Wendel 2003). systematics can hardly be overstated. In combination with Alternatives to cpDNA and rDNA include both explicit methods for phylogenetic analysis, molecular data mitochondrial (mtDNA) and nuclear (nDNA) sequences have reshaped concepts of relationships and other than rDNA. Because of its generally slow rate of circumscriptions at all levels of the taxonomic hierarchy sequence evolution and fast rate of structural evolution (Qiu et al. 1999; Soltis et al. 1999; Crawford 2000). As (Palmer 1992; Palmer et al. 2000), mtDNA generally has molecular phylogenetic studies have accumulated, it has been ignored by plant systematists as a potential source of become apparent that different molecular tools are required data (but see e.g. Qiu et al. 1998, 1999; Freudenstein and for different questions because of varying rates of sequence Chase 2001; Anderberg et al. 2002; Sanjur et al. 2002). For evolution among , genes and gene regions. The this reason, mtDNA will not be considered further in this choice of molecular tool is of paramount importance to review. Nuclear sequences other than rDNA represent most ensure that an appropriate level of variation is recovered to of the DNA contained in any given cell, comprising both answer the phylogenetic question at hand. Nonetheless, the high-copy repetitive DNA (e.g. transposons, centromeric and plant systematics community is using only a small fraction of telomeric repeats), and low- to moderate-copy DNA the available molecular tools. The preponderance of elements, including the majority of genes. The evolutionary molecular data applied to plant systematics problems come dynamics of these two classes of DNA can be dramatically from two sources: chloroplast DNA (cpDNA) or nuclear different. For example, repetitive DNA may experience ribosomal DNA (rDNA). While the contributions of cpDNA non-Mendelian transmission, be subject to concerted and rDNA to plant systematics are undeniable, reliance on evolution, and/or be mobile within a (e.g.

© CSIRO 29 April 2004 10.1071/SB03015 1030-1887/04/020145 146 Australian Systematic Botany R. L. Small et al. transposable elements). Accordingly, distinguishing among closely related (Olmstead and Palmer 1994). sequences related by descent (orthologs) from a sea of As evolutionary divergence increases, structural mutations related (but non-orthologous) sequences may be a challenge. (inversions, indels, and expansion/shrinkage of the inverted Low-copy nuclear sequences on the other hand, typically repeat) become more pronounced, but overall gene content evolve independently of paralogous sequences and tend to be and order remain remarkably consistent. This structural stable in position and copy number (but see e.g. Fu and stability has facilitated the design of ‘universal’ PCR primers Dooner 2002), thereby facilitating identification and and heterologous cpDNA probes. In addition, and most isolation of orthologous sequences. It is this low-copy importantly, it allows the expectation that specific DNA fraction of nDNA that we wish to focus on here, particularly sequences isolated from different species are orthologous. with reference to applications at the level that most Haploidy, uniparental inheritance and the absence of systematists work, namely, among closely related species recombination among cpDNA are also important within one to several genera. We briefly review the pros and features. A central assumption of phylogenetic analysis is cons of using the now-traditional cpDNA and rDNA that terminal taxa are the product of bifurcating sequences. We then focus our attention on low-copy nuclear -splitting events, rather than products of reticulation. sequences, describing methodological, biological and For haploid (and thus non-recombinant), uniparentally conceptual issues surrounding their use in phylogenetic inherited cpDNA molecules, relationships are by definition analysis. Finally, we describe a generally applicable strategy bifurcating rather than reticulate, and hence these are for the isolation, characterisation and phylogenetic use of appropriate terminals for phylogenetic analysis (Doyle low-copy nuclear genes. 1992). Additionally, to the extent that cpDNA is haploid, intra-individual (allelic) variation is absent, thus simplifying Chloroplast DNA analyses. The haploid nature of cpDNA also serves to reduce The most widely used source of data in plant molecular the amount of intraspecific and intrapopulation variation. phylogenetic analyses has been cpDNA, either in the form of Since the effective population size of a haploid genome is restriction site or DNA sequence analysis. The use of cpDNA smaller than that of a diploid genome (1/4 in dioecious has been reviewed extensively (Olmstead and Palmer 1994; plants; 1/2 in monoecious plants), coalescence times and Soltis and Soltis 1998a) and will not be belabored here, but time to fixation of cpDNA within a population several points merit restating, with respect to the nature of are short relative to diploid genomes. cpDNA evolution and the general utility of cpDNA relative These generalisations are, however, not without to nDNA. exceptions. For example, while organellar genomes are assumed to be non-recombinant, evidence from Pinus Evolutionary dynamics of cpDNA contorta (Marshall et al. 2001) suggests cpDNA The primary advantages of cpDNA as a molecular tool lie in recombination may occur in plants, similar to reports of its relatively simple genetics. The chloroplast genome is a mtDNA recombination in hominids (Awadalla et al. 1999; circular found in multiple copies per chloroplast, Eyre-Walker 2000). Uniparental inheritance is usually and contains both coding (gene) and non-coding (intron and assumed in cpDNA studies, and while most studies have intergenic spacer) sequences. The presence of multiple confirmed uniparental inheritance of organellar genomes copies of the chloroplast genome per chloroplast, coupled (maternal in angiosperms, paternal in gymnosperms), with the presence of multiple chloroplasts per leaf cell, exceptions, including both paternal and biparental means that cpDNA is relatively high copy within a typical inheritance, have been reported in angiosperms (Birky 1995; genomic DNA prep. This property is a significant advantage Corriveau and Coleman 1988; Reboud and Zeyl 1994). The as the high copy number facilitates restriction site analysis as primary effect of these processes on molecular systematics is well as PCR amplification of specific cpDNA regions an increase in homoplasy over what might be expected from because higher copy-number sequences are more readily a uniparentally inherited, and thus non-recombining accessible. molecule. Chloroplast DNA is generally characterised as There is some irony in the realisation that the properties structurally stable, haploid, non-recombinant and generally of cpDNA that make it an attractive tool for molecular uniparentally inherited (primarily maternally in systematics also hinder its overall utility in phylogenetic angiosperms, although examples of paternal and biparental analyses. These stem from the propensity of plants to inheritance are also known), all features that facilitate its use hybridise and undergo polyploidisation. Because cpDNA is in systematic studies. Structural stability across large uniparentally inherited and haploid, it reveals only half of the evolutionary scales has been demonstrated by comparative parentage in plants of hybrid or polyploid origin (generally cpDNA mapping and , and numerous studies the seed in angiosperms because of maternal have shown that cpDNA molecules are highly conserved inheritance). Thus, cpDNA analysis of hybrid or polyploid with respect to gene content and arrangement, especially plants may incorrectly identify them as belonging to a Use of nuclear genes in plant phylogeny Australian Systematic Botany 147 of one of the two , without revealing the hybrid tandem arrays of genes composed of hundreds to thousands history. If hybridisation is followed by introgression and of copies per array (Baldwin et al. 1995; Cronn et al. 1996; subsequent fixation of the alien cpDNA, then the phylogeny Hanson et al. 1996b; Kuzoff et al. 1998; Soltis et al. 1997; may be accurately resolved with regard to the maternal Soltis and Soltis 1998b; Wendel et al. 1995a). The higher history; however, cpDNA fails to identify the phylogenetic copy number facilitates evaluation of rDNA by both conflict arising from a hybrid ancestry. restriction site- and PCR-based strategies. In addition, the repetitive structure of these arrays promotes a process of Evolutionary rates of cpDNA homogenisation (‘concerted evolution’: Zimmer et al. 1980; Generally, in plants the mitochondrial genome evolves at the Arnheim 1983) that may result in a single predominant slowest rate, the chloroplast genome at a slightly faster rate sequence across all copies and arrays. This homogenisation and the nuclear genome at the fastest rate (Wolfe, Li et al. allows PCR products to be directly sequenced, generally 1987;Gaut 1998). There is, of course, substantial rate yielding a single dominant sequence that is assumed to be variation within genomes, and coding regions evolve more representative of the underlying genomic sequences. slowly than non-coding regions (introns and intergenic While rDNA has contributed substantially to our present spacers), presumably because of selective constraints. For understanding of plant systematics, the highly repetitive this reason, cpDNA gene sequences (e.g. rbcL, atpB, matK nature of rDNA gives it properties that effect the potential and ndhF) have been used extensively at the family level and utility and reliability of these regions in phylogenetic studies above (Chase et al. 1993; Qiu et al. 1999; Soltis et al. 1999), (Alvarez and Wendel 2003; Bailey et al. 2003). These while non-coding sequences such as introns (e.g. rpL16, potential difficulties stem from the structure and rpoC1, rpS16, trnL, trnK) and intergenic spacers (e.g. evolutionary dynamics of rDNA—specifically, the presence trnT–trnL, trnL–trnF, atpB–rbcL, psbA–trnH) are used of both multiple copies per array and multiple arrays per more frequently at lower taxonomic levels (Taberlet et al. genome, and the presence, absence, or variable strength of 1991; Sang et al. 1997a; Small et al. 1998). Given the concerted evolution. As noted above, concerted evolution relatively slow rate of cpDNA evolution, however, even tends to homogenise sequences within and sometimes non-coding cpDNA regions often fail to provide significant between rDNA arrays. The important phrase here is ‘tends phylogenetic information at low taxonomic levels (Small to.’ The strength of concerted evolution is far from uniform et al. 1998). This latter problem represents a significant across repeat units, arrays and taxa. In the absence of limitation of cpDNA sequences for phylogenetic complete concerted evolution, sequence variants can arise reconstruction, as applicability is restricted to relatively deep and be maintained within and between arrays, yielding divergence histories. multiple distantly related rDNA types within individuals (Suh et al. 1993; Cronn et al. 1996; Buckler et al. 1997; Nuclear ribosomal DNA Hartmann et al. 2001; Mayol and Rossello 2001; Muir et al. To compensate for the limitations of cpDNA, as well as to 2001; Bailey et al. 2003). Indeed, rDNA pseudogenes may be obtain additional and independent estimates of phylogeny, regular constituents of many genomes, representing nuclear rDNA has been widely adopted as a tool in plant phylogenetically distant sequences that may preferentially be systematics, and is now as commonly used as cpDNA. At amplified over functional rDNA loci (Buckler and Holtsford higher taxonomic levels the slowly evolving rRNA genes are 1996; Buckler et al. 1997; Hartmann et al. 2001; Mayol and used (Hamby and Zimmer 1988, 1992; Steele et al. 1991; Rossello 2001; Chase et al. 2003). Soltis et al. 1997; Kuzoff et al. 1998; Soltis and Soltis More problematic, however, is the possibility that such 1998b), while at lower taxonomic levels internal and external erratic variation is the norm for rDNA rather than the intergenic spacers are more commonly employed (Baldwin exception, even though such findings may be generally et al. 1995; Alvarez and Wendel 2003; Bailey et al. 2003). ignored, underreported or undetected (Alvarez and Wendel 2003). For example, polymorphic positions are found Evolutionary dynamics of rDNA frequently in published rDNA datasets. Such polymorphisms In land plants rRNA genes are organised into two distinct generally are simply coded as polymorphic characters for sets of tandem arrays. The first set is composed of 5S rRNA phylogenetic analyses, ignoring the fact that these positions genes and intergenic spacers in tandem arrays at one or more are evidence of unhomogenised sequence variants, perhaps chromosomal loci. The second set includes the of distantly related sequences. In addition, sequence 18S–5.8S–26S rDNA cistron in tandem arrays at one or more polymorphism may go undetected in automated or manual chromosomal loci. Both sets of rDNA arrays have been used sequencing, or else the strongest peak (or band) at a position in systematic studies, although the 18S–5.8S–26S arrays may be scored as the ‘correct’ base. Finally, PCR bias may have been used far more frequently than 5S. As for cpDNA, selectively amplify one of the genomic sequences because of the potential phylogenetic utility of rDNA is facilitated by its differences in genomic copy number or primer affinity structure and . Ribosomal genes exist in (Wagner et al. 1994). The sum of these observations is that 148 Australian Systematic Botany R. L. Small et al. rDNA sequences obtained by direct sequencing of PCR independent confirmation of phylogenetic signal as products may fail to reveal the complexity of nuclear rDNA confirmation of genetic and physical linkage. content, and may in fact preferentially reveal paralogous rDNA sequences in different taxa or accessions. Low-copy nuclear genes While the prevalence of this phenomenon can be difficult While the use of low-copy nuclear genes for phylogeny to ascertain in diploid species, the problem becomes reconstruction is still in its relative infancy, several exacerbated in allopolyploid and hybrid taxa in which conclusions may be drawn regarding both utility and multiple divergent rDNA loci are expected to exist on limitations. Among the advantages of nuclear genes are an different donated by the different parents. For overall faster rate of sequence evolution than in organellar example, allotetraploid (AD-genome) species of Gossypium genomes, the presence of multiple independent (unlinked) contain genomes donated from an African A-genome diploid loci and biparental inheritance. Disadvantages of nuclear and a New World D-genome diploid. Direct sequencing of loci stem primarily from the more complex genetic PCR-amplified ITS fragments (Wendel et al. 1995a) from architecture and evolutionary dynamics of the nuclear the five allotetraploid species revealed only a single rDNA genome and possible difficulties in isolating and identifying sequence type characteristic of either the A- or D-genome orthologous genes. Other relevant issues include the (but not both), despite the fact that Southern hybridisation possibilities of concerted evolution and/or recombination (Wendel et al. 1995a) and FISH (Hanson et al. 1996b) among paralogous sequences and the presence of evidence indicated that both genome types were maintained intraspecific, intrapopulational and intraindividual variation in the allotetraploid. Similar results have been reported in (heterozygosity). allotetraploid Glycine (Rauscher et al. 2002) and elsewhere (reviewed in Alvarez and Wendel 2003; Bailey et al. 2003) Advantages of nuclear genes where biased amplification of ITS sequences from specific parental diploid genomes was observed, perhaps because of Rate variation in nuclear genes differential underlying copy number. One of the primary advantages of nuclear genes for phylogenetic analysis is the elevated rate of sequence Evolutionary rates of rDNA evolution relative to organellar genes. Broad surveys have One of the reasons for adding rDNA sequences to the arsenal revealed that synonymous substitution rates of nuclear genes of tools available to plant molecular systematists is to obtain are up to five times greater than those of chloroplast genes data from a source independent of cpDNA. The other and 20 times greater than those of mitochondrial genes primary reason, especially at lower taxonomic levels, is to (Wolfe et al. 1987; Gaut 1998). This elevated evolutionary obtain sequences that evolve at a faster rate, so that more rate yields a greater efficiency of sequencing effort, since phylogenetically informative characters are obtained. In more variation is detected per unit of sequence than in phylogenetic studies at low taxonomic levels in which both organellar genes. This advantage becomes particularly acute non-coding cpDNA and ITS data are collected for a set of at low taxonomic levels (Small et al. 1998) or among taxa, ITS sequences generally provide greater levels of lineages created by rapid divergence events (Cronn et al. divergence and thus greater resolution and stronger support 2002b; Malcomber 2002). For example, in an analysis of the than an equivalent sample of cpDNA sequence (e.g. Hodges relationships among the allotetraploid species of Gossypium and Arnold 1994; Gielly et al. 1996; Sang et al. 1997a; L. (Small et al. 1998), direct comparison of non-coding Whitten et al. 2000). The more rapid rate of evolution of ITS, cpDNA and nuclear-encoded alcohol dehydrogenase (AdhC) however, is tempered by its relatively short length (generally sequences showed that relationships were incompletely 500–600 bp in angiosperms) and the relatively high levels of resolved and poorly supported by >7 kb of non-coding homoplasy (Alvarez and Wendel 2003). Recently, attempts cpDNA, while data from a 1.65-kb section of AdhC provided to add the flanking external transcribed spacer (ETS) of complete and robustly supported resolution of relationships. nuclear rDNA to supplement ITS sequences have met with Extrapolation of these results suggested that to obtain some success (Baldwin and Markos 1998; Linder et al. equivalent phylogenetic resolution from the cpDNA as the 2000). The internally repetitive structure of the ETS region, AdhC sequences, >40 kb of non-coding cpDNA would have however, can make both PCR amplification and sequence to be sampled. alignment difficult, thus presenting an additional obstacle to While this example highlights a potential advantage of the widespread adoption of ETS in systematic studies. nuclear genes in terms of a faster evolutionary rate, it ignores Finally, it is important to note that because of the linked the huge range of evolutionary rates observed among nuclear nature of the 45S rDNA cistron, the ETS region will fall prey genes, regions within genes, and even sites within genes. For to the same pitfalls and problematic evolutionary dynamics example, the selection of AdhC for the study of Small et al. as ITS. For this reason, congruence between ITS- and (1998) was fortuitous, as this gene is the most rapidly ETS-derived topologies provides not so much an evolving of the Gossypium Adh gene family (Small and Use of nuclear genes in plant phylogeny Australian Systematic Botany 149

Wendel 2000a) and the fifth most rapidly evolving gene Multiple independent loci among a sample of 49 nuclear genes in Gossypium (Senchina In addition to faster evolutionary rates, nuclear genes offer et al. 2003). Variation in overall evolutionary rate is observed another vitally important feature—multiple unlinked loci whenever multiple genes are sampled in a given phylogenetic may be used for independent phylogenetic inference. context. For example, in comparisons of A- and D-genome Corroboration of phylogenetic hypotheses by independent diploid cottons for the Adh gene family of Gossypium, a datasets increases confidence in a given . greater than 3-fold range of variation was observed for Likewise, incongruence between datasets has been of use to synonymous sites (Small and Wendel 2000a). Broader infer evolutionary phenomena such as hybridisation, sampling of 16 mostly anonymous nuclear loci from introgression or lineage sorting (Wendel and Doyle 1998). In Gossypium showed a 5-fold range of variation in overall plant systematics, such comparisons are typically between evolutionary divergence (Cronn et al. 1999), and a recent cpDNA and nrDNA datasets. Given the complete linkage study of 36 ‘fibre-expressed’ genes from the same species among cpDNA sites owing to its structure (a single circular (Senchina et al. 2003) revealed a 7-fold range of variation in ) and its non-recombining nature, multiple overall divergence and a 2.9-fold range in synonymous cpDNA datasets are expected to converge on a single substitution rates. These ranges are in accordance with those topology (irrespective of whether or not this reflects the derived from other well studied groups (Gaut 1998; Mathews organismal phylogeny). Ribosomal DNA offers a single et al. 2002). Such wide ranging variation will clearly alternative from the nuclear genome, but if incongruence influence the lineage-specific utility of a given nuclear gene, between rDNA and cpDNA is identified, independent data and points to the importance of screening multiple loci and must be obtained to sort out the source of the incongruence. choosing markers on the basis of preliminary evidence from Moreover, and as alluded to above and discussed at length the taxa of interest. elsewhere (Alvarez and Wendel 2003), rDNA sequences may In addition to rate variation among genes, there is rate prove phylogenetically unreliable under certain conditions. variation among regions within genes. Nuclear-encoded In Gossypium, unexpected and anomalous phylogenetic genes can generally be divided into different functional results obtained with rDNA (Wendel et al. 1995a, 1995b) regions, such as the 5′ untranslated region (UTR), exons, have been reconciled only through the use of multiple, introns and 3′ UTR. Given the variable functions of these unlinked low-copy loci (Small et al. 1998; Liu et al. 2001; regions, and thus variable evolutionary constraints on Cronn et al. 2002b, 2003), each of which provides an them, considerable variation in evolutionary rates exists independent evaluation of the unexpected results from within a single locus. Important promoter elements rDNA. responsible for gene regulation are generally found in the Low-copy nuclear genes are capable of providing a 5′ UTR which may include relatively conserved domains virtually limitless source of additional, independent (Wray et al. 2003). In some cases, however, 5′ UTRs may phylogenetic information. Because eukaryotic nuclear contain introns that are highly variable (Liu et al. 2001). genomes are composed of multiple chromosomes that Exons are likely to be more conserved at non-synonymous undergo recombination, genes found on different sites (primarily 1st and 2nd codon positions) while chromosomes, or even on the same chromosome if synonymous 3rd codon positions typically diverge at rates sufficiently far apart, are effectively evolutionarily similar to those of non-coding regions. Nuclear independent of each other. Factors that may give rise to spliceosomal introns generally are under fewer functional phylogenetic incongruence between datasets (e.g. constraints at the sequence level, although there often are hybridisation/introgression; non-homologous gene length limits on introns that are important for proper conversion) are likely to affect a limited number of genes in mRNA processing, and occasionally regulatory elements a localised region. Acquisition of data from multiple lie within introns (e.g. Ahlandsberg et al. 2002; Hong et independent regions allows an opportunity to determine al. 2003) that also are likely to be highly conserved. phylogenetic relationships supported by a majority of the Elements in the 3′-UTR sequences control mRNA data and can highlight particular genomic regions that may processing and poly-A addition signals, but otherwise are be problematic (Cronn et al. 2002b, 2003; Rokas et al. often highly variable. Perhaps counterintuitively, 2003). accumulating evidence indicates that synonymous sites within exons (i.e. 3rd codon positions) may exhibit evolutionary rates that are equal to or greater than those Biparental inheritance found at ‘silent’ non-coding sites (introns, UTRs) A final desirable property of low-copy nuclear genes is their (Charlesworth and Charlesworth 1998; Small and Wendel explicitly biparental Mendelian inheritance. Because the 2002; Senchina et al. 2003), presumably because of chloroplast genome is usually uniparentally inherited, conserved structural features and/or existence of cpDNA datasets can tell only half the story in cases of regulatory domains in the latter. hybridisation or allopolyploidisation. Likewise, while 150 Australian Systematic Botany R. L. Small et al.

Fig. 1. Hypothetical scenario of gene duplication, followed by speciation events to depict relationships among orthologous, paralogous and homoeologous gene copies. Orthologous genes are related solely by speciation (e.g. 1A and 1B). Paralogous genes are related by gene duplication and are found both within species (e.g. 1A and 2A) or between species (e.g. 1A and 2B). Homeologous genes are orthologous genes that are related by polyploidisation (e.g. 1A′ and 1C′). A gene duplication occurs near the base of the tree, resulting in two genetic loci (Locus 1 and Locus 2). Two speciation events give rise to three extant species (Species A–C), each of which is diploid and thus contains two haploid genomes (AA, BB and CC, respectively). Hybridisation between species A and C, followed by polyploidisation results in the tetraploid AC species (with four haploid genomes, AACC). Gene copies in the tetraploid are noted with a′ to distinguish them from copies found in the diploid progenitors. Orthologous genes: 1A, 1B, 1C; 2A, 2B, 2C. Paralogous gene pairs: 1A, 2A; 1B, 2B; 1C, 2C; 1A′, 2A′; 1C′, 2C′; 1A, 2B; 1A, 2C; 1A, 2A′; 1A, 2C′; 1B, 2A; 1B, 2C; 1B, 2A′; 1B, 2C′; 1C, 2A; 1C, 2B; 1C, 2A′; 1C, 2C′; 1A′, 2A; 1A′, 2B; 1A′, 2C; 1A′, 2C′; 1C′, 2A; 1C′, 2B; 1C′, 2C; 1C′, 2A′. Homeologous gene pairs: 1A′, 1C′; 2A′, 2C′.

nrDNA is biparentally inherited, the process of concerted allotetraploid Gossypium based on cpDNA identified the evolution, array expansion/contraction and the presence of maternal lineage (Wendel 1989; Wendel and Albert 1992), paralogous sequences can make isolation of both parental and ITS analyses were complicated by bi-directional copies difficult, if not impossible (Rauscher et al. 2002; interlocus concerted evolution (Wendel et al. 1995a), Alvarez and Wendel 2003). analyses of low-copy nuclear genes from tetraploid In contrast, low-copy nuclear genes are less frequently Gossypium have unambiguously identified the lineages subject to concerted evolution (although exceptions have representing the parental donor species (Small et al. 1998; been reported—see below), thus making them ideal Cronn et al. 1999, 2002b; Small and Wendel 2000a; Liu candidates for identifying parental donors of suspected et al. 2001). Low-copy nuclear genes have been used to hybrids or polyploids. While phylogenetic studies of identify the origins of hybrid or allopolyploid taxa in a Use of nuclear genes in plant phylogeny Australian Systematic Botany 151 number of other groups as well (Sang et al. 1997b; Doyle et in the taxa of interest. The usual route taken by many al. 1999a, 1999b; Ford and Gottlieb 1999; Ge et al. 1999; investigators is to design or obtain ‘universal’ primers for a Sang and Zhang 1999; Mason-Gamer 2001). particular gene family, usually by comparison of gene sequences from a wide phylogenetic array of taxa, with Conceptual, methodological and biological issues in primers located in regions of strong sequence conservation using nuclear genes (Sang et al. 1997b; Strand et al. 1997; Mason-Gamer et al. 1998; Small et al. 1998; Evans et al. 2000; Small and Wendel Gene families and the isolation and identification of 2000a). orthologous genes PCR using ‘universal’ primers frequently results in The primary disadvantage of using nuclear genes stems from amplification of more than one product, as evidenced by the complicated genetic architecture of eukaryotic nuclear either multiple amplified bands or obvious sequence genomes and the tendency for nuclear genes to exist in gene heterogeneity in the PCR pool. Resolution of this sequence families—multiple copies of homologous genes related by complexity and subsequent development of locus-specific gene (or genome) duplication (Fig. 1) (Henikoff et al. 1997; primers generally requires isolation of individual PCR Thornton and DeSalle 2000). In plants, nuclear-gene products from the heterogeneous initial amplification families vary widely in size, ranging from genes that appear reaction, often accomplished by cloning PCR products and to be truly single copy in some species (e.g. GBSSI in diploid screening multiple clones. The most efficient approach is to Poaceae; Mason-Gamer et al. 1998) to gene families with select a few representative taxa for this initial screening. dozens to hundreds of copies (e.g. actins: Moniz de Sá and Once all apparent sequence types from the representative Drouin 1996; or small heat-shock : Waters 1995). taxa have been obtained, phylogenetic analysis of those Further complicating the situation is the fact that gene and sequences is performed. Ideally, these preliminary analyses genome duplication (and subsequent gene loss) appear to be should take place in the context of related sequences from ongoing and dynamic processes (Gottlieb and Ford 1996; other taxa in GenBank (http://www.nlm.nih.gov). Clegg et al. 1997; Wagner 1998, 2001; Grant et al. 2000; Appropriate sequences for comparison can often be obtained Lynch and Conery 2000; Small and Wendel 2000a; Bancroft from GenBank by reference to the literature, searching by 2001; Gaut 2001; Ford and Gottlieb 2002). Because of this, gene name or by BLAST searches [Basic Local Alignment characterisation of a gene family necessarily becomes a Search Tool; http://www.ncbi.nlm.nih.gov/BLAST/; lineage-specific problem, and inferences of gene family (Altschul et al. 1990)] for similar sequences. This exercise is structure from one lineage may not be applicable to other an important component of a preliminary study (see below), lineages. For example, plant Adh gene families are often as it can differentiate between relatively recent and ancient considered to be small, with one to three loci present in most gene-duplication events. Confidence in assumptions of diploids (e.g. Chang and Meyerowitz 1986; Sang et al. orthology generally are stronger when the putatively 1997b; Gaut et al. 1999). In contrast, the Adh gene families orthologous sequences in question are monophyletic and in Gossypium (Small and Wendel 2000a) and Pinus (Perry divergent from paralogs. Preliminary phylogenetic analyses and Furnier 1996) have been shown to have a minimum of also provide important information of relative rates of seven loci in diploid species. Phylogenetic analyses of the sequence evolution within and among genes (see below). Adh gene family in angiosperms suggest multiple rounds of Once an initial characterisation of a gene family has been both ancient and recent gene duplication and deletion (Clegg conducted and appropriate loci have been selected for further et al. 1997; Small and Wendel 2000a). study, the next logical step is to develop locus-specific PCR Given the complexity of nuclear genomes, the need to primers. This step can improve efficiency by reducing the characterise gene family composition prior to phylogenetic necessity of cloning heterogeneous PCR products (except for analysis is of paramount importance. Phylogenetic analyses those cases where heterozygosity is encountered). implemented to estimate organismal phylogeny assume that Locus-specific primers have the added benefit of eliminating comparisons are between strictly orthologous gene copies, the possibility of recovering cloned PCR-generated chimeras and the inclusion of a mixture of orthologous and paralogous (PCR-mediated recombinants) that can form whenever two sequences can produce robust, yet erroneous, hypotheses of highly similar templates are co-amplified in a single PCR relationships (Wendel and Doyle 1998). Rigorous reaction (Bradley and Hillis 1997; Cronn et al. 2002a). If elucidation of the size of a gene family, as well as criteria for homogeneous sequences are obtained from direct sequences establishing orthology should be considered a first step in of PCR products from all taxa of interest there is also greater preliminary studies of the potential utility of nuclear genes confidence that a single orthologous locus is being amplified (see below). and sequenced from all taxa. As gene family structure varies widely both among gene Once a group of candidate loci have been identified, families and among plant lineages, initial studies of nuclear orthology must be evaluated empirically. A number of genes should focus on characterising the target gene family criteria that vary in their methodology and assumptions have 152 Australian Systematic Botany R. L. Small et al. been suggested as evidence of orthology. These criteria analysis. In some cases, expression patterns can be inferred include overall sequence similarity, tissue specificity and from sequence data, as is the case with gene families that expression patterns (Doyle 1991; Doyle and Doyle 1999), have both cytosolic- and organellar-expressed genes. Such Southern hybridisation analysis (Cronn and Wendel 1998; cases usually represent ancient gene duplications that are Evans et al. 2000; Small and Wendel 2000a) and readily identifiable from the sequence data alone and the comparative genetic mapping (Cronn and Wendel 1998; evidence of orthology is clear. For example, both the Small and Wendel 2000a; Cronn et al. 2002b). phosphoglucose isomerase (PGI) (Gottlieb and Ford 1996; The simplest approach to identifying orthologs is through Ford and Gottlieb 1999; Ford and Gottlieb 2002) and sequence similarity and phylogenetic analysis. Clearly, glutamine synthase (GS) (Doyle 1991; Emshwiller and orthologs are expected to be more similar and Doyle 1999; Emshwiller and Doyle 2002) gene families phylogenetically more closely related to each other than to contain both cytosolic- and chloroplast-expressed any paralogous sequence, assuming complete sampling of paralogous gene copies. Sequence conservation among these genes within all taxa. This expectation can be violated, classes allows ready identification of the expression pattern however, for at least two reasons. First, complete sampling of of each paralog. Difficulties may arise, however, if a all genes of a gene family in all taxa may not be relatively recent gene duplication creates paralogous gene accomplished, owing to the challenges of generating and copies with similar or identical expression patterns, isolating all relevant gene copies from each in a study. highlighting a second difficulty of using expression patterns PCR amplification with ‘universal’ primers may result in as a criterion of orthology. Specifically, recent gene differential amplification of loci in different taxa because of duplications within a particular expression class may result imperfect pairing between PCR primers and templates or in paralogous gene copies that share both strong sequence relative qualities of template DNA (Wagner et al. 1994). similarity and expression patterns. A recent study of gene Subsequent cloning of heterogeneous PCR products samples pairs duplicated by polyploidy in allotetraploid cotton only a subset of the PCR products present, and screening of (Adams et al. 2003) shows how subsequent expression numerous clones (by sequencing or restriction digestion) is patterns of duplicated genes may be not only tissue specific, required to differentiate orthologs from paralogous but locus- or genome-specific. This could create a situation sequences. Subsequent phylogenetic analysis of the whereby one copy (e.g. the A′ genome homoeolog in Fig. 1) sequences may reveal two sequences from different taxa that would be expressed in one species, while the other copy (e.g. are more closely related to each other than either is to the C′ genome homoeolog in Fig. 1) could be expressed in a suspected paralogs; yet these sequences may indeed be closely related species. For this reason estimating paralogous rather than orthologous if they are related by a orthologous relationships among sequences on the basis of recent gene-duplication event and both paralogs are not tissue-specific mRNA pools could prove particularly sampled in both taxa. The second issue that on this challenging, especially if polyploids were included in the approach is the problem of either in vivo or in vitro analysis. An additional example is provided by the (PCR-mediated) recombination. In this scenario, closely chloroplast-expressed PGI genes of Clarkia (Onagraceae) related paralogous loci may undergo non-homologous (Gottlieb and Ford 1996; Ford and Gottlieb 1999; Ford and recombination, and phylogenetic analysis of the resulting Gottlieb 2002). chimeric sequences will confound rather than illuminate A third and more practical criterion that can strengthen relationships and inferences of orthology/paralogy. It should inferences of orthology are Southern blot hybridisation be noted that this sequence-based method of determining experiments. Once a suite of sequences have been obtained orthology represents minimal evidence for orthology and is from a given set of taxa, comparisons of sequence identity typically included as a precursor to the following three between classes of loci can be used to identify regions that methods of homology assessment. are ‘locus-specific,’ i.e. sequence motifs found in only one of A second approach to identifying orthologs is to use the sequence types or are sufficiently divergent between loci shared expression patterns as evidence of orthology (Doyle to minimise cross-hybridisation under conditions of high 1991; Doyle and Doyle 1999). One of the features of stringency. Generally, introns and 3′ UTRs make the best nuclear-gene families may be differential expression patterns candidates for such motifs. Locus-specific hybridisation (either timing or tissue specificity) of paralogous genes. In probes are then designed from these regions, amplified by fact, gene families may exist primarily because paralogous PCR, and used to probe restriction-digested genomic copies are adapted to function differentially, and if of representative taxa. Results from high-stringency orthologous genes serve the same function in different hybridisations using these ideal probes provide an species they can be expected to maintain similar expression opportunity to ‘count genes,’ since each band on the patterns. However, expression data may be difficult to resulting autoradiograph represents a single locus. If a single obtain, often requiring either detailed RNA (e.g. Northern band is detected using this approach in the taxa examined, blot or RT–PCR) or (Western blot or isozyme) this constitutes strong evidence that the locus is unique, and Use of nuclear genes in plant phylogeny Australian Systematic Botany 153 by inference, orthologous among taxa. If multiple bands are heterogeneity in these four species. Given this information, detected under stringent hybridisation conditions, then either we conducted Southern hybridisation experiments on these (1) a restriction site was present within the probe region species and found that while all other D-genome species (which can be avoided by choosing appropriate displayed a single AdhA-hybridising fragment, all species of and/or evaluating multiple enzymes), or (2) there are section Erioxylum displayed two hybridising fragments. multiple, closely related (but paralogous) loci that share high Apparently, a gene-duplication event confined to these sequence identity with the probe. species had occurred, and both loci were being amplified An example of the power of this approach is provided by with our PCR reaction conditions. the Adh gene family of Gossypium that contains a minimum Finally, the most rigorous approach for demonstrating of seven loci. Southern hybridisation experiments were orthology can be obtained by comparative genetic-mapping informative in identifying both single-copy orthologous loci studies. In practice, such analyses are restricted to taxa for shared among species as well as documenting a small gene which genetic-mapping experiments are already underway ‘subfamily’—a group of closely related genes resulting from for other reasons, simply because of the extensive time and recent gene duplications (Small and Wendel 2000a). The effort required for comparative genetic mapping. AdhA gene in Gossypium is a truly single-copy gene, and this Nevertheless, it is important to note that comparative is reflected in the Southern hybridisation results—using a mapping studies have been completed for a wide variety of small intron probe, a single band was detected in diploid plant lineages (especially crop plants), including, for species screened, and two bands were detected in the example, Asteraceae, , Chenopodiaceae, allotetraploid species, reflecting the additivity of the loci Fabaceae, Malvaceae, Pinaceae, Poaceae, Rosaceae and contributed from their diploid progenitors. . Information from these ‘data rich’ species can ‘AdhB’ in Gossypium, on the other hand, was found to be used to bolster inferences of orthology to more distant include several closely related sequence types. Again, by relatives. While evidence from sequence identity, shared using a small intron probe, Southern hybridisation showed a expression patterns and Southern hybridisation analysis can complex banding pattern with several hybridising fragments, all assist the process of inferring orthology, the retention of despite the fact that PCR-based experiments had isolated shared genomic position among species is arguably the most only a single AdhB-like sequence type. This conundrum was rigorous evidence of orthology. resolved when independent sequence data from genomic and cDNA clones of Adh sequences from Gossypium were Gene families and rate variation among genes described (Millar and Dennis 1996). The Adh sequences In addition to the requirement of orthology, rate variation is described by (Millar and Dennis 1996) were subsequently an important consideration when choosing an appropriate included in a phylogenetic analysis, with Adh sequences gene. Preliminary data from representative species for all isolated via PCR in our laboratory (Small and Wendel potential loci provide the raw data required to inform this 2000a) and found to be closely related but not orthologous to decision. Given the observation of extensive rate variation AdhB, and thus, to constitute a small gene subfamily that among nuclear genes at all levels (among gene families, presumably resulted from recent gene-duplication events. among genes within gene families, among regions within Given these data and evidence of potential interlocus genes, among plant lineages: see e.g. Gaut 1998; Gaut et al. recombination (Millar and Dennis 1996), it was determined 1996; Senchina et al. 2003; Small and Wendel 2000a), the that AdhB would be a poor choice for use in phylogenetic gene selected should provide an appropriate level of studies. sequence variation to answer the question being asked. The Screening exemplar taxa by Southern hybridisation, while appropriate level of variation depends on the level of clearly the most efficient approach for counting genes, may resolution being sought: inter- or intrafamilial studies may not always prove completely effective. While preliminary utilise more slowly evolving sequences than studies at the screening of AdhA in Gossypium indicated that AdhA was a inter- or intraspecific level. Further, the regions of a single-copy gene in the three diploid species surveyed particular gene that are to be used can vary from question to (spanning the diversity of Gossypium), a subsequent question—exons are generally easily alignable across wide phylogenetic study using AdhA of the 13 D-genome diploid phylogenetic distances (in many cases even among extant species showed a previously undetected AdhA land plants), while introns are often unalignable outside of gene-duplication event (Small and Wendel 2000b). During individual genera. Accordingly, questions focused at higher the course of data collection for this study, all species taxonomic levels may choose to exploit regions with high surveyed showed little to no sequence heterogeneity, with the exon content, while lower-level studies may emphasise exception of the group of four species that constitute intron sequence. Gossypium section Erioxylum subsection Erioxylum. Direct While examples of particular plant lineages that have sequencing of PCR products of AdhA (amplified using AdhA been broadly sampled for nuclear genes are relatively few, locus-specific primers) detected extensive sequence the available examples highlight the range in variation (and 154 Australian Systematic Botany R. L. Small et al. thus phylogenetic utility) found thus far. Our work in of this work has been on model systems such as maize, Gossypium has resulted in characterisation of a large number Arabidopsis and cotton (Miyashita et al. 1996; Gaut and of nuclear genes across the well established phylogeny of the Doebley 1997; Innan and Tajima 1997; Kawabe et al. 1997; primary genome groups. More than 50 nuclear loci have Hilton and Gaut 1998; Purugganan and Suddith 1998; been sequenced from multiple species, and the range of Kawabe and Miyashita 1999; Small et al. 1999; Kuittinen phylogenetic utility spans from an almost complete lack of and Aguade 2000; Small and Wendel 2002). variation to highly variable loci that provide robustly Allelic variation within species may take two forms, with resolved and supported depictions of relationships (Small et different implications for phylogenetic studies. First, al. 1998; Cronn et al. 1999, 2002b; Small and Wendel 2000a, irrespective of the extent of allelic variation, if all alleles of 2000b; Senchina et al. 2003). a species coalesce within that species (i.e. all alleles of a Other well studied examples include Zea (Gaut and Clegg species are more closely related to each other than they are 1993; Hanson et al. 1996a; Gaut and Doebley 1997; to any allele of a different species), then allelic variation is Eyre-Walker et al. 1998; Gaut 1998, 2001; Hilton and Gaut irrelevant to the ultimate goal of recovering a species 1998; Zhang et al. 2001), Poaceae (Mathews and Sharrock phylogeny. This type of allelic variation may be useful, 1996; Mason-Gamer et al. 1998; Mathews et al. 2000, 2002; however, for intraspecific studies, e.g. ascertaining Mason-Gamer 2001), Brassicaceae (Galloway et al. 1998; population-level relationships, , and studies Bailey and Doyle 1999; Bailey et al. 2002), Paeonia (Sang of rates and patterns of sequence evolution. et al. 1997b; Sang and Zhang 1999; Ferguson and Sang A second possibility is that allelic variation spans 2001; Tank and Sang 2001; Sang 2002), and Glycine (Doyle species boundaries (deep coalescence); i.e. some alleles of et al. 1996, 1999a; 1999b, 2000, 2002; Doyle and Doyle a species are more closely related to alleles of other 1999). In all of these examples, variation in the relative species than they are to those of the same species. A phylogenetic utility of nuclear genes is evident, again number of population genetic phenomena can give rise to highlighting the need for preliminary studies to determine this commonly observed pattern. One primary cause lies in the most appropriate locus (or loci) for a given question. the population genetics of nuclear genes. Because of the greater effective population size and faster mutation rate Intraspecific variation in nuclear genes of nuclear genes relative to organellar genes, coupled with Two features of nuclear genes that require attention in the process of recombination, extensive intraspecific phylogenetic studies are the high probability of allelic allelic variation is both expected and observed in species variation within and among populations of a species, and with sufficient population sizes. When speciation occurs, alleles that are shared between species. Because of the it is likely that both descendants of an ancestral species smaller effective population size of organellar genes, and will contain some, if not all, of the allelic variation present concerted evolution in nrDNA sequences, allelic variation is in the ancestral species. If the alleles contained in each of often low for these markers. Accordingly, a single or just a the descendant species are reciprocally monophyletic, then few individuals of a species are often sampled as alleles will coalesce within species and phylogenetic placeholders, with the implicit assumption that all alleles analysis using any alleles should reflect this history. If, on from individuals of that species will be more closely related the other hand, allelic variation in one or both of the to each other than they are to any other species. When descendant species is not monophyletic, then trans-species sampling of individuals is increased, this assumption is often polymorphism will be observed. borne out, although exceptions do exist, especially with In exceptional cases, natural selection acts to promote this nrDNA (Suh et al. 1993; Mason-Gamer et al. 1995; Levy et trans-species polymorphism. This is the case for genes that al. 1996; Mayer and Soltis 1999; Hartmann et al. 2001; undergo balancing selection where natural selection acts to Mayol and Rossello 2001; Muir et al. 2001). Allelic variation promote and maintain allelic variation. Striking examples of within individuals, populations and species in low-copy this phenomenon are found for genes involved in nuclear genes, however, can be extensive. This is due to the self-recognition (reviewed in Klein et al. 1998), for example larger effective population size of nuclear genes than that of major histocompatibility (Mhc) loci in mammals (Hughes organellar genes, the process of allelic recombination and the and Yeager 1998) and self-incompatibility (S-genes) in faster rate of molecular evolution of nuclear genes than that plants (Charlesworth and Awadalla 1998). In the case of of organellar genes. Evidence of such variation was first S-allele variation, analyses in Solanaceae have revealed recognised in isozyme analyses (Crawford 1985; Gottlieb allelic variation that exceeds not only species boundaries, but 1977), where extensive allelic variation was detected in even generic boundaries (Richman et al. 1996; Richman and many species, but also shared allelic variation among species Kohn 1999). While clearly of interest for molecular was evident. As results from sequencing studies of plant evolutionary studies, loci that tend to undergo balancing nuclear genes have accumulated, the presence of substantial selection would be poor choices for phylogenetic analyses allelic variation has been confirmed, although the majority attempting to infer species histories. Use of nuclear genes in plant phylogeny Australian Systematic Botany 155

Less dramatic examples of trans-specific polymorphism Recombination and concerted evolution are also evident for genes in which segregation of ancestral An additional feature of nuclear genes that has implications polymorphism may be the sole cause. Several examples from for phylogenetic studies is the potential for recombination maize show such a pattern. For example, alleles found in Zea both at individual loci (thus generating additional allelic mays ssp. mays are found phylogenetically intermingled with diversity), and more importantly, between paralogous genes. alleles from other subspecies of Zea mays or even other Such recombination can take place either in vivo or in vitro species of Zea (Hanson et al. 1996a; Eyre-Walker et al. 1998; (i.e. PCR-mediated recombination). Hilton and Gaut 1998; Wang et al. 1999; White and Doebley Allelic recombination (including both crossing-over and 1999; Gaut 2001; Zhang et al. 2001). Studies of several gene conversion) results in alleles that are chimeric between nuclear genes in Leavenworthia (Brassicaceae) have shown parental alleles. This phenomenon violates the assumption of similar sharing of alleles across species boundaries phylogenetic analysis that relationships among terminals are (Charlesworth et al. 1998; Liu et al. 1998; Filatov and strictly bifurcating, rather than reticulate. If, however, alleles Charlesworth 1999). A study of diversity in two within species are monophyletic, this phenomenon will not pairs of homeologous Adh loci in allotetraploid Gossypium affect reconstruction of species relationships from a gene hirsutum and G. barbadense revealed an unusual pattern of tree, although it may introduce homoplasy into the non-coalescence (Small and Wendel 2002). In this particular phylogenetic analysis (Doyle 1995, 1997). Analytical tools case, four loci were sampled (two AdhA and two AdhC loci, are available for identifying putatively recombinant one from each of the diploid progenitors). For three of four sequences and their potential parental alleles (Stephens loci, alleles coalesced within species; however, for the fourth 1985; Sawyer 1989; Jakobsen and Easteal 1996; Grassly and locus (AdhC in the D-genome of the tetraploids) alleles found Holmes 1997; Drouin et al. 1999). Thus, while allelic in G. hirsutum were placed phylogenetically into two separate recombination has implications for phylogenetic analysis, it . One of these clades included only G. hirsutum does not necessarily prevent accurate reconstruction of sequences, but the second clade included all G. barbadense supra-specific phylogenies from gene sequence data. alleles as well as several G. hirsutum alleles. This observation Recombination among paralogous sequences is significant because, regardless of the underlying cause (non-homologous recombination), however, is of greater (non-coalescence or introgression), it was found for only one consequence. This phenomenon may take place only of four loci, indicating that these population-genetic sporadically, or be a general feature of a particular gene or phenomena can act differentially across loci. gene family. One noted example is concerted evolution, which A second cause of trans-species polymorphism is tends to homogenise rDNA sequences both within and among hybridisation and introgression. A prominent feature of plant rDNA repeats (Zimmer et al. 1980; Arnheim 1983; Baldwin population biology, hybridisation, and the subsequent et al. 1995; Elder and Turner 1995). In the particular case of possibility of introgression, may be responsible for instances rDNA, which generally consists of thousands of repeats per of alleles shared among species. Many well known examples locus, concerted evolution tends to homogenise repeats such have been described of introgression of chloroplast DNA that a single predominant allelic form exists. Concerted (‘chloroplast capture’) and many isozyme studies have evolution appears to be a common feature of highly repetitive documented instances of putative introgression of nuclear nuclear sequences. Low-copy nuclear genes are not free from genes (Rieseberg and Soltis 1991; Rieseberg and Wendel concerted evolution, however, and examples exist of 1993; Rieseberg 1997). Few studies have addressed the concerted evolution even among fairly small gene families prevalence of nuclear-gene introgression, however, despite (e.g. rbcS: Clegg et al. 1997). the power of phylogenetic analysis of variable nuclear genes The effect of concerted evolution (or any recombination to illuminate the phenomenon (Sang and Zhang 1999). In the among paralogous sequences) on phylogenetic analysis case of trans-specific allelic variation in maize, the variation depends on its extent. As noted by Sanderson and Doyle may reflect a combination of non-coalescence and (1992), if concerted evolution among members of a gene introgression between subspecies of Zea mays (Hanson et al. family is absent, then complete sampling of all genes from 1996a; White and Doebley 1999). The paucity of empirical all species will result in an orthology–paralogy tree (OP tree) data may stem from the theoretical difficulty of in which sequences of orthologous loci are all more closely distinguishing non-coalescence or lineage sorting from related to each other than to any paralogous sequence, and introgression, as both processes are expected to result in phylogenetic relationships among each ortholog may be similar patterns of allele sharing. Arguing for one or the expected to reflect organismal relatioships. If, at the other other generally requires independent evidence, e.g. extreme, concerted evolution results in complete biosystematic evidence that hybridisation is occurring, or homogenisation of members of a gene family (as is often geographic evidence that allele sharing occurs only in assumed in studies of rDNA), then sampling of any gene of regions of sympatry between putatively introgressing species a gene family will result in its correct phylogenetic (e.g. Doyle et al. 1999a). 156 Australian Systematic Botany R. L. Small et al. placement (assuming concerted evolution occurs at a rate steps (Fig. 2). These include selecting candidate genes and faster than speciation). If, however, concerted evolution representative taxa for a preliminary study, isolating occurs but is incomplete, then sampled genes may represent candidate genes from representative taxa, assessing a mixture of orthologous and non-homogenised paralogous orthology among sequences isolated from representative sequences. Accurate reconstruction of organismal taxa, assessing relative rates of sequence evolution in order phylogenies from such data are practically impossible to choose among potential loci and finally, generating (Sanderson and Doyle 1992). sequences from the taxa of interest from the chosen loci. In The probability of non-homologous recombination an effort to facilitate a general application of these sequential appears to be influenced by several factors. Principle among experimental necessities, we will use as an example our these may be the degree of sequence similarity among previous investigations of the alcohol dehydrogenase (Adh) paralogs, and the genomic proximity of paralogs. As gene family in Gossypium (Small and Wendel 2000a). recombination is a similarity-driven process, paralogous sequences that are more closely related (recently diverged) Selection of candidate genes are more likely to experience non-homologous One of the great advantages of using nuclear genes for recombination. This can create a circular pathway of phylogenetic analysis is the practically unlimited number of recombinational events—after duplication, paralogous genes from which to choose (e.g. the sequences diverge in sequence, but if they retain sufficient nuclear genome contains about 26000 genes: The similarity, non-homologous recombination may result in Arabidopsis Genome Initiative 2000). While there clearly homogenisation of the paralogs, which leads to greater are differences among genes and gene families in their sequence similarity, which can then promote inter-locus potential phylogenetic utility, the huge number of possible recombination. Genomic proximity may also play an genes from which to choose indicates that a multitude of important role in the probability of non-homologous genes exist that will be useful in any given plant lineage at recombination. Because recombination occurs primarily any given phylogenetic level. It is important to note that there between homologous chromosomes, paralogous loci that are is no a priori reason to expect that any particular gene or closely linked on a chromosome may be more likely to gene family will be universally useful at any given undergo non-homologous recombination than paralogous phylogenetic depth because of the vagaries of the loci that are located on separate chromosomes. evolutionary dynamics of gene families. However, it is even Finally, the methodological concern of PCR-mediated more important to note that with a reasonable investment in recombination must be addressed, as most gene sequences characterisation, almost any gene will prove to be useful at used in molecular phylogenetic studies are generated via some level. Thus, there is no reason to limit explorations to PCR. PCR-mediated recombination is a well characterised those genes that have previously been shown to be useful in phenomenon (Myerhans et al. 1990; Bradley and Hillis other plant groups. To paraphrase and emphasise this point, 1997; Cronn et al. 2002a) that results from either there is nothing ‘special’ in the attributes of commonly used template-switching during PCR or from incompletely genes such as Adh, particularly in-as-much as relatively extended copies from one locus serving as a primer for unexplored genes (Liu et al. 2001; Malcomber 2002; Wendel subsequent extension from a paralogous locus. This is a et al. 2002) and even anonymous nuclear loci (Blake et al. notable problem for analyses of low-copy nuclear genes 1999; Cronn et al. 2002b) have proven phylogenetically because studies often utilise general or universal PCR useful. primers that amplify multiple loci, especially during Given the diversity of genes from which to choose, where preliminary stages of a study, prior to the development of should one start? One logical place is with an assessment of locus-specific primers (Sang et al. 1997b; Evans, Alice et al. genes that have been used by previous workers in taxa related 2000; Small and Wendel 2000a; Cronn et al. 2002a). Similar to the group of interest. While the list continues to grow, at to in vivo non-homologous recombination, the propensity for present a relatively small number of gene families have been PCR-mediated recombination may depend on the degree of widely investigated for their phylogenetic utility in plants sequence similarity among paralogs as well as PCR reaction (Sang 2002). These include Adh (alcohol dehydrogenase: conditions. Specifically, as noted by Cronn et al. (2002a), e.g. Gaut and Clegg 1991, 1993; Morton et al. 1996; Sang annealing temperature, amplicon length and extension time et al. 1997b; Small et al. 1998; Ge et al. 1999; Small and are factors that may be optimised to avoid PCR-mediated Wendel 2000a, 2000b), G3PDH (glyceraldehyde recombination. 3-phosphate dehydrogenase: e.g. Olsen and Schaal 1999; Olsen 2002), GBSSI (granule-bound starch synthase: e.g. Procedure to determine appropriate nuclear genes for Mason-Gamer et al. 1998; Evans et al. 2000; Mason-Gamer phylogenetic analysis 2001; Evans and Campbell 2002; Small 2003), MADS-box When planning a phylogenetic study that includes nuclear genes (e.g. pistillata, apetala1, apetala3, leafy: e.g. Bailey genes, a generalised protocol would entail several sequential and Doyle 1999; Barrier et al. 1999; Bailey, Price et al. Use of nuclear genes in plant phylogeny Australian Systematic Botany 157

Fig. 2. Generalised protocol describing the steps necessary in foundational studies of the potential phylogenetic utility of nuclear genes.

2002), PHY (phytochrome: e.g. Mathews and Sharrock not thousands of genes. This wealth of systematically useful 1996; Lavin et al. 1998; Simmons et al. 2001; Mathews et al. information has been generated as a consequence of the 2002) and PGI (phospho-glucose isomerase: e.g. Gottlieb many ‘genome projects’ being undertaken worldwide. and Ford 1996; Filatov and Charlesworth 1999; Ford and A first test of potential utility of a given gene or gene Gottlieb 1999, 2002). Many of these genes have been family, then, is whether or not it proved useful in other investigated in a wide array of taxa, including most major systems at a phylogenetic depth similar to that being angiosperm groups. In addition, however, many model attempted (e.g. interspecific, intergeneric, interfamilial). A plants, including most major crops, have associated with second advantage of a literature survey is that papers will them extensive libraries of gene sequences for hundreds if generally describe the methodology used to isolate specific 158 Australian Systematic Botany R. L. Small et al. genes and include PCR-amplification primers. These may Alignment Search Tool: http://www.ncbi.nlm.nih.gov/ include general primers that were used for preliminary BLAST/). This search tool compares an input sequence to all investigations, as well as locus-specific primers that were of the sequences in GenBank and identifies those sequences used to amplify single members of a gene family. At the time that have high sequence similarity to the input sequence. One we began our investigations of Gossypium nuclear genes, the of the advantages of BLAST searching is that results are Adh gene family was the most widely investigated retrieved without prior knowledge of gene name or nuclear-gene family in plants, making it a good candidate for taxonomic rank. If preliminary sequence data are available further study. Additionally, previous isozyme studies in for some taxon of interest (e.g. from a genome project), then Gossypium had shown that ADH protein products BLAST searches can identify sequences in GenBank that are (isozymes) were variable among Gossypium species. closely related to the sequence of interest, which can then Primary sources of candidate genes are the publicly provide useful information for primer design, identification available DNA-sequence databases (GenBank—United of gene structure (exons and introns) and preliminary States National Center for Biotechnology Information: phylogenetic analyses. In addition to searches in the general http://www.ncbi.nlm.nih.gov/; EMBL—European GenBank database, searches can be made through specific Molecular Biology Laboratory—European sequence sets such as ESTs (expressed sequence tags) and Institute: http://www.ebi.ac.uk/embl/index.html; DDJB— STSs (sequence-tagged sites) that are not included in the DNA Data Bank of Japan: http://www.ddbj.nig.ac.jp/). DNA general nucleotide database searches, as well as with specific sequences deposited in any of these databases are ultimately whole genomes (e.g. Arabidopsis thaliana and Oryza shared among all three, thus searching through one database sativa—the only plants with complete genome sequences). generally is sufficient, and because of our familiarity with Given the large number of genome sequencing, EST and GenBank, it will be used as an example in the following STS projects currently underway for various (see discussion. Searching for candidate genes by using the DNA NCBI’s Plant Genomes Central: http://www.ncbi.nlm.nih. sequence databases can be conducted in several ways. First, gov/genomes/PLANTS/PlantList.html), these searches may taxonomic queries can be made at any level of the taxonomic be especially useful for identifying candidate genes. hierarchy through the section of GenBank (http://www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome. Selection of representative taxa html/). Note here that GenBank uses a specific taxonomic When screening candidate genes for potential phylogenetic hierarchical nomenclature for plants that follows recent utility, one efficient approach is to select several taxa that phylogenetic studies (APGII 2003), and which must be represent the range of diversity that is ultimately to be followed for successful searches. Within angiosperms, included. This approach allows the preliminary study to be groups are classified not only by traditional ranks (e.g. relatively contained in terms of the number of sequences that subclasses, orders, families), but also by rankless but well must be generated, while still providing enough data to make accepted groups (e.g. eudicotyledons, eurosid I). If searches an informed decision with respect to phylogenetic utility. for a particular genus or species do not result in any Preliminary studies should include a minimum of two taxa, sequences of interest, the next higher level of the taxonomic as this is required to understand relative levels of divergence. hierarchy can be investigated. Preferably, up to five taxa should be included to provide a Queries can be made by gene name in the search window more comprehensive overview of potential phylogenetic of the GenBank homepage, if a particular gene family is of utility. The specific choice of taxa will depend on the amount interest. Again, nomenclature is an issue that must be dealt of preliminary data from other sources (taxonomy, with, given the variable names used for some genes. For biosystematic studies, other phylogenetic hypotheses) that is example, searches for ‘alcohol dehydrogenase AND available. For example, if a study is to investigate Magnoliophyta’ result in matches not only for medium-chain relationships among species within a genus, then species alcohol dehydrogenase sequences that have been the primary representing the major groups within that genus should be focus of molecular systematic and molecular evolutionary selected on the basis of, for example, taxonomic studies, but also matches for short-chain alcohol classification of species into subgenera or sections, dehydrogenases, cinnamyl alcohol dehydrogenases, cytogenetic data grouping species into genome groups, or unknown mRNA and genomic clones with some sequence previous phylogenetic hypotheses. similarity to alcohol dehydrogenase genes. Further, some A second criterion that should be used when considering genes may be deposited in GenBank as ‘alcohol choice of taxa to use in pilot studies is ploidy level. Given the dehydrogenase’, while others may be deposited as ‘Adh.’ added layer of sequence complexity that accompanies Multiple searches using different combinations of gene and polyploidy, it is prudent to select taxa at the lowest possible taxon names may be necessary to identify candidate genes. ploidy level for initial investigation. As ploidy level A final method of searching GenBank that can be increases, gene family size for any given gene also increases, especially useful is BLAST searching (Basic Local making it more challenging to confidently identify Use of nuclear genes in plant phylogeny Australian Systematic Botany 159 orthologous sequences. Multiple homeologous loci If locus-specific primers are available (i.e. have been (orthologous loci donated by the progenitors of the designed by other research groups), then PCR amplification allopolyploid) are expected to be found in allopolyploids, and sequencing can be relatively straightforward. It is which must be distinguished from paralogous sequences. necessary, however, to remain mindful of complications that Third, taxa for which significant background information may arise from heterozygosity, particularly if alleles differ in is available are better choices for preliminary studies than length. If, on the other hand, little or nothing is known about relatively unknown taxa. Such background information may a given candidate gene from the taxa of interest, an approach include ploidy level (see above), cytogenetic characterisation, using general primers is necessary. This approach involves previous knowledge of phylogenetic relationships, and/or either development of general gene-family primers or use of previous molecular biological studies. All of this information primers developed by others. Such primers usually are may help place preliminary molecular phylogenetic data into designed by comparing gene sequences of available genes of a useful organismal and molecular framework, which may the gene family (e.g. downloaded from GenBank) and then guide further experimental choices. placing primers in regions of high sequence conservation Finally, representative taxa for which sufficient quantities (e.g. highly conserved exon sequences). General primers can of high-quality template DNA are available should be be designed to incorporate some polymorphism within the chosen. Preliminary studies may involve numerous PCR primer if necessary, and it is useful to be aware of codon experiments to find optimal PCR conditions, and successful structure of the exons where primers are to be placed. As the PCR amplification is often dependent on DNA quality. 3′ end of the primer is the most important in terms of Additionally, if Southern hybridisation experiments are to be homology with the target, and especially the last 3′ performed (see below), large quantities (5–10 µg per nucleotide, the 3′ end of the primer should be placed at a digestion) of restriction -digestible DNA must be second codon-position nucleotide, as these are likely to be available. the most highly conserved. General primers have been For our studies of Adh in Gossypium we chose three developed for a number of gene families, including representative taxa for our initial studies. There are ~50 Adh—alcohol dehydrogenase (Sang et al. 1997b; Gaut et al. species of Gossypium distributed circumtropically with a 1999; Small and Wendel 2000a), GBSSI—granule-bound well developed infrageneric classification system (Fryxell starch synthase I (Mason-Gamer et al. 1998; Evans et al. 1968, 1979, 1992). There are three primary centers of 2000), GS—glutamine synthetase (Emshwiller and Doyle diversity in Gossypium: Australia, Africa and the New 1999) and PHY—phytochrome (Mathews and Sharrock World. Additionally, previous cytogenetic studies in 1996). In addition, the publication of Strand et al. (1997) Gossypium (reviewed in Endrizzi et al. 1985) had identified gives primers for a number of different nuclear-gene eight different genome groups, and phylogenetic studies families. based on cpDNA restriction-site data (Wendel and Albert Once primers have been designed or obtained, initial PCR 1992) had identified major clades. On the basis of this wealth optimisation experiments should be performed. Because of preliminary data we chose three species representative of general primers have not been designed to be locus-specific, the major groups: G. robinsonii (Australian C-genome), the number and size of PCR amplicons from a given taxon G. herbaceum (African A-genome), and G. raimondii (New cannot be predicted a priori. The efficiency of amplification World D-genome). Although Gossypium includes both of various loci may be affected by a number of factors, diploid and allotetraploid species, the representative taxa including template DNA quality, efficiency of the PCR chosen were all diploids to simplify estimates of gene copy polymerase (Taq, or other available polymerases), primer number. It should be noted that while the amount of annealing temperature, MgCl2 concentration and the use of preliminary data available in Gossypium facilitated our other PCR additives. Relative to other loci amplified for selection of representative taxa, any one of the above sources molecular phylogenetic studies (e.g. cpDNA, rDNA), of information (taxonomy, distribution, cytogenetics, nuclear genes exist in much lower copy number and hence previous phylogenetic hypothesis) alone would have are more difficult to amplify. In many cases, significant provided sufficient information to make an informed experimental effort is required to optimise conditions. decision on which taxa to include. Template DNA quality is important, especially when using general primers that may not match the template Isolation of candidate genes from representative taxa exactly. Optimal tissue for DNA extraction may vary across Once candidate genes and exemplar taxa have been chosen, lineages. In our experience in Malvaceae, DNA extracted the next step is to generate preliminary sequence data. This from fresh, young leaf material generally provides the step generally is accomplished via PCR amplification using best-quality DNA. Other researchers (personal either general gene family primers or locus-specific primers. communication from a reviewer) have found that DNA The choice of approach is dictated by the amount of extracted from properly processed silica gel-dried material preliminary data available to the investigator. also provides high-quality DNA. In addition, in some groups 160 Australian Systematic Botany R. L. Small et al.

Fig. 3. Gel photo demonstrating the effect of MgCl2 concentration and annealing temperature on the amplification of nuclear genes from genomic DNA. A single genomic DNA (Gossypium raimondii) was amplified with universal Adh primers across a range of annealing temperatures (46–64°C), with either 3.0 or 2.0 mM MgCl2 final concentrations. As annealing temperature increases, the number of amplified products decreases. As the MgCl2 concentration increases, the number of amplified products increases. very young leaves may contain PCR-inhibiting compounds, paralogous copies of most genes are expected to be amplified indicating that mature leaves may be more suitable tissue with a general primer set, it may be desirable in some cases sources for DNA extraction (personal communication from to amplify multiple members of a gene family for initial a reviewer). While typical CTAB DNA isolation procedures characterisation. At low annealing temperatures and high (e.g. Doyle and Doyle 1987, 1990) often result in amplifiable MgCl2 concentrations a large number of bands often are DNA for high-copy templates, these DNAs may retain observed, many of which may turn out to be non-specific sufficient impurities to inhibit the more selective amplification products (i.e. not the target sequence). amplification of low-copy nuclear genes. In these cases, the Alternatively, at high annealing temperatures and low MgCl2 use of DNA extraction kits (e.g. Plant DNeasy, Qiagen) that concentrations, fewer to no bands may be observed because produce cleaner DNAs may be indicated. Template of the high stringency of the reaction conditions. A combined concentration in a PCR reaction may also be important, optimisation approach that evaluates a range of annealing given the lower copy number of nuclear genes; temperatures (e.g. ± 5°C of the theoretical melting –1 low-concentration (<20 ng µL ) DNAs that have been temperature of the primer pair) and MgCl2 concentrations adequate for amplification of cpDNA or rDNA loci may be (e.g. 1.5–3.0 mM final concentration) will provide an insufficient for amplification of nuclear genes. estimate of the combination of conditions under which a The choice of amplification polymerase may also affect reasonable number of bands is amplified (Fig. 3). If a gradient efficiency or even ability to PCR amplify some nuclear genes. thermal cycler is available, these optimisation experiments In our laboratories we have used a wide variety of polymerases can be conducted in a single run by using a single DNA in from various suppliers and found significant variation in the one set of reactions, varying both the MgCl2 concentration ability of these enzymes to amplify low-copy nuclear and the temperature gradient. Finally, a number of PCR sequences. This is, unfortunately, an example of ‘you get what additives have been suggested to improve amplification. you pay for.’ While some lower-cost polymerases may be While we have not exhaustively screened these additives, our sufficient for amplification of high-copy templates, they may experience of amplifying low-copy loci from a variety of seed be insufficient for amplification of low-copy sequences. plants indicates that the addition of bovine serum albumin In addition to the specific polymerase used, other (BSA; final concentration of ~0.2 µg µL–1) significantly PCR-reaction conditions will affect the ability to amplify improves amplification from problematic DNA. low-copy templates. Specifically, annealing temperature, If locus-specific PCR primers are used to amplify the MgCl2 concentration and addition of PCR additives may need genes of interest and a single homogenous PCR product is to be adjusted to optimise amplification conditions. The amplified, these products may be directly sequenced. More interplay between primer-annealing temperature and MgCl2 often, however, especially when universal primers are used, concentration is especially important since increasing multiple PCR products that may vary in size are obtained. To annealing temperature and decreasing MgCl2 concentration isolate individual PCR products for sequencing, cloning of results in more specific amplification. Because multiple these PCR products into an appropriate plasmid is generally Use of nuclear genes in plant phylogeny Australian Systematic Botany 161 required (e.g. Promega’s pGEM-T). Alternatively, if multiple products of interest and (2) the relative sizes of the inserts. If bands are widely spaced in size and can be sliced different-sized PCR products were ligated and transformed individually from a gel, these products can then be directly in the same reaction, this screen will provide quick sequenced. An exception to these generalisations are studies identification of which colonies contain different-length of highly heterozygous outcrossing plants. For example, PCR products. Because a single-sized PCR product may analysis of four low-copy loci in conifers (R. Cronn, unpubl.) contain amplicons from more than one gene or multiple has revealed that heterozygosity is nearly universal across alleles of a single locus, a secondary screening using loci and species and that length polymorphism is common restriction-enzyme digestion (typically frequently cutting within intron regions. In these more challenging cases, enzymes with 4-bp recognition sequences) can be conducted cloning may be necessary. One final alternative to cloning is to differentiate among plasmids with identically sized the design of allele-specific PCR-amplification and inserts. Ideally, this two-step screening procedure will result sequencing primers (e.g. Rauscher et al. 2002) if sufficient in the identification of unique sets of plasmids: a set for each preliminary sequence data are available. size class, and subsets of plasmids with the same size, but Once PCR products have been ligated into a plasmid and different restriction digestion profiles. Once these sets have transformed into competent E. coli cells, the next decision been identified, one to several plasmids from each set should that must be made is to choose how many transformants to be screened by sequencing. Although the foregoing screen. At this point in a preliminary study thoroughness is pre-screening procedures may be skipped, the only way to be more important than speed, and although these screening sure that all variants present are sampled is to sequence all of steps are often laborious and time-consuming, they increase the colonies. the probability of completely sampling the pool of PCR Template DNA for sequencing from individual colonies products. The number of colonies to pick depends on the may be obtained either by PCR amplification from the boiled initial starting PCR product. If only one or two unique PCR colonies or by miniprep isolation of the plasmid DNA either products are expected in a pool of transformants, then a by using published protocols (Ausubel et al. 1992; smaller number of colonies can be screened, whereas if Sambrook et al. 1989) or commercially available kits. We several unique PCR products are expected, a greater number suggest that initial sequencing efforts utilise the of colonies must be screened to ensure identifying all plasmid-specific primers that generally flank the cloning site products. As a starting point, we have found that picking 20 of the plasmid (e.g. in Promega’s pGEM-T plasmid primer colonies is reasonable if a small number of PCR products is sites for the T7 and SP6 promoters are available about 50 bp expected, and 40 or more if a larger number is expected. on either side of the cloning site) as this allows sequencing Apparent transformants (i.e. white colonies in systems using reads through the PCR-primer sites and into the PCR product blue–white selection criteria with X-gal and IPTG) can itself. Depending on the size of the PCR product, internal quickly and easily be screened for inserts via PCR by using sequencing primers may be necessary to fully sequence a the following procedure. given plasmid. Such primers usually are easily designed to Individual colonies are picked from plates with a pipet tip work across a gene family by placing them in conserved and suspended in 10 µL of H2O in a numbered regions of exons (see below), and if necessary, incorporating microcentrifuge tube. Once the colony is resuspended by some ambiguity into the primer. pumping the pipettor several times, the pipet tip is applied to a fresh LB-agar–ampicillin plate that has a numbered grid Preliminary characterisation of genes isolated from which corresponds to the numbers on the microcentrifuge representative taxa tubes. This results in (1) a suspension of bacterial colonies in Gene structure tends to be fairly conserved across genes of a the microfuge tube and (2) a grid plate with corresponding given gene family. For example, plant Adh genes generally bacterial colonies that (after overnight culture) provide cells have a 10 exon/9 intron structure. There are examples of for plasmid minipreps. The suspended cells are then boiled introns that have been lost from some lineages—for example, for 10 min to lyse cells and release plasmid into the the Arabidopsis thaliana Adh gene (Chang and Meyerowitz supernatant. After centrifugation (1 min, maximum speed) to 1986), Gossypium AdhA (Small and Wendel 2000a)—but pellet the cell debris, 1 µL of this suspension can then be most genes retain the 10 exon/9 intron structure, and the used as a template in a small-volume (e.g. 10 µL) PCR intron losses are easily discerned through sequence reaction using either the gene-specific or vector-specific alignment. Identification of exon and intron regions can be primers to screen the colonies for presence of an appropriate accomplished through alignment of sequences obtained from insert. Test gels may be run using high-percentage agarose the taxa of interest, with previously published sequences gels (e.g. 2% or higher) to help distinguish among PCR downloaded from GenBank. Gene sequences deposited in products of slightly different lengths. GenBank usually have the exon/intron structure designated, A test gel of the colony PCR results will indicate (1) how and individual exon sequences can be isolated and aligned many of the apparent transformants contain the PCR individually with the preliminary data. In addition, the 162 Australian Systematic Botany R. L. Small et al.

Fig. 4. Representation of the relationship among gene sequences derived from cDNA clones, individual exon sequences and genomic DNA. (A) Diagrammatic representation of an alignment of cDNA, exon and genomic DNA sequences from a hypothetical gene with five exons (shaded and numbered boxes) and four introns (lower-case letters). (B) Alignment of a portion of Adh cDNA (Solanum tuberosum Adh1, GenBank M25154), individual exons (Zea mays Adh1, GenBank AF123535) and genomic DNA (Gossypium raimondii AdhA, GenBank AF182116). Note the conservation of the intron boundary dinucleotides (5′ ‘GT’ and 3′ ‘AG’). presence of the highly conserved 5′ ‘GT’ and 3′ ‘AG’ GenBank. A large dataset of angiosperm Adh exon intron-boundary dinucleotides can be used to confirm the sequences from GenBank was then compiled and subjected start and end points of an intron. Thus, through sequential to phylogenetic analysis. The resulting phylogenetic tree alignment of individual exons and confirmation of (Fig. 5) grouped the Gossypium Adh sequences into five intron-boundary sequences, the exon/intron structure of primary clades that included sequences from the preliminary sequence data can be defined (Fig. 4). Once these representative species. These clades were inferred to steps are accomplished for the exemplar taxa, the next logical represent orthologous genes, which consequently were step is to assess copy number and orthology of the isolated named AdhA–AdhE (Fig. 5). The inference of orthology was sequences. based on the fact that for each of the named genes, sequences of each type were isolated from each of the representative Copy number and orthology assessment species, and in each case these sequences formed As noted earlier in this review, several criteria can be used as monophyletic groups. Note that the Gossypium Adh genes evidence of orthology. In practice, the two most useful and are found in two disparate parts of the angiosperm Adh tree: efficient approaches are phylogenetic analysis and Southern AdhA, AdhB and AdhC in one clade, and AdhD and AdhE in hybridisation analysis, the former to infer orthologous a separate clade. These data indicate an ancient gene relationships among sequences, and the latter to confirm that duplication giving rise to these two primary clades, with the inference of orthology is not confounded by the presence more recent gene duplications giving rise to the genes within of multiple, closely related paralogs. each clade. Once sequence data have been collected for Once these preliminary phylogenetic analyses had been representative taxa, sequences should be trimmed to include performed, examination of the entire sequences (exons + only exons and aligned with genes from related plants (exons introns) further supported the inference of orthology of the only because introns are generally unalignable outside of genes. Intron sequences were easily alignable within putative closely related species). Phylogenetic analyses of these data orthologs, but generally unalignable between orthologs. produce a hypothesis of relationships among genes, and Furthermore, the absence of two introns in the AdhA inferred clades may represent orthologous sequence groups. sequences isolated from all representative species, but For example, in our studies of Gossypium Adh we initially present in most other angiosperm Adh sequences supported isolated several Adh sequence types from three the inference of orthology among these sequences. Thus, all representative species. Putative exons were inferred by available phylogenetic and gene structure data corroborated comparison with exons of a Solanum Adh sequence from the orthology of the named Adh genes. Use of nuclear genes in plant phylogeny Australian Systematic Botany 163

Fig. 5. Strict consensus of seven equally parsimonious trees resulting from maximum parsimony analysis of Adh exon sequences from a wide range of angiosperms (rooted with the gymnosperm Pinus), including representative Gossypium sequences. Bootstrap values (1000 replicates) greater than 50% are shown above each branch. For each Gossypium locus (AdhA–AdhE), sequences were obtained from each of three representative diploid taxa denoted by their genome affiliations (A = G. herbaceum or G. arboreum, C = G. robinsonii, D = G. raimondii). Each inferred Gossypium Adh locus includes all three representative taxa, is monophyletic, and is supported by a 100% bootstrap value.

To ensure that the named genes were unique, and that hybridisation probes that included both intron and exon closely related paralogous genes did not exist, we then sequence. Probes were small (about 500 bp) and included the conducted Southern hybridisation analysis. Because introns majority of intron 3 + exon 4 of the Adh genes. Individual are unalignable between genes and exons are fairly divergent probes were obtained by PCR for each of the Adh genes and (generally 20–30%, except for AdhD/AdhE which are only used in high-stringency Southern hybridisations (see Small about 10% divergent: Small and Wendel 2000a) we designed and Wendel 2000a for detailed hybridisation conditions). 164 Australian Systematic Botany R. L. Small et al.

The results of these experiments in some cases confirmed using a variety of genetic-distance algorithms available in the uniqueness of putative orthologs, and in other cases most phylogenetic inference packages (e.g. PAUP*: refuted those inferences. For example, for most Gossypium Swofford 2002). Calculation of p-distances (proportion of Adh genes, a single hybridising band was seen for the diploid variable sites) provides a straightforward estimate of relative species and two hybridising bands in the tetraploid, as variation. More complex models (e.g. Jukes–Cantor, Kimura expected. For AdhB, however, multiple hybridising bands 2-parameter) can be invoked if deviations from a simple were observed in all species indicative of additional model of evolution are observed (e.g. transition:transversion unsampled AdhB-like genes in the Gossypium genome. bias, GC-content bias). Subsequent publication of Adh sequences from Gossypium genomic and cDNA libraries (Millar and Dennis 1996) Data collection from all taxa of interest corroborated this inference. Phylogenetic analysis of the Once a gene or set of genes has been identified as potentially sequences published by Millar and Dennis (1996) revealed useful in representative taxa, the next step is to generate that they were closely related to the AdhB sequences isolated sequence data for all taxa of interest. In our experience, the via PCR in our study. Orthology–paralogy relationships most efficient approach to accomplish this is to develop among these AdhB- and AdhB-like genes were not clear on locus-specific PCR-amplification primers for each the basis of either phylogenetic analysis or sequence particular gene of interest. There are several advantages to alignment; thus, it was concluded that these particular genes this approach. First, locus-specific amplification may would be poor choices for further phylogenetic studies. minimise the costly, tedious and time-consuming step of In sum, the combination of phylogenetic and Southern having to clone PCR products from a large number of taxa, hybridisation analyses allowed rigorous inference of although this may be necessary in any event if heterozygosity orthology (or lack thereof) among specific sequence types is high. Second, the more specific the PCR primers are, the isolated from the representative Gossypium species surveyed less will be the nagging problem of chimeric, recombinant in our preliminary studies. This in turn allowed us to choose PCR products that occasionally (to often) are produced from among the isolated sequences those that were during PCR amplification of genes (as discussed earlier in potentially phylogenetically useful. this review). Third, the ability to reliably and reproducibly amplify a single homogenous PCR product from multiple Choosing among potential sequences species bolsters the inference of orthology of those Once orthologous and potentially useful genes have been sequences. identified, evaluation of rates of sequence evolution can provide insight into the relative phylogenetic utility of Conclusions various candidate genes. For example, in a recent study of The paths chosen by modern plant systematists increasingly Gossypium phylogeny (Cronn et al. 2002b), 11 low-copy lead to questions that are difficult to resolve through the nuclear encoded sequences were evaluated, and the application of one source, or even sometimes two sources, of percentages of variable and phylogenetically informative molecular inference. There has been enormous growth over sites varied over 3.5-fold and 7.9-fold ranges, respectively. the past decade of publicly available gene-sequence Percentage of variable sites ranged from 3.43 to 12.1%, databases including whole-genome sequences for model while percentage of phylogenetically informative sites organisms such as Arabidopsis thaliana (The Arabidopsis ranged from 0.21 to 1.66%. Genome Initiative 2000) and Oryza sativa (Goff et al. 2002; Evaluation of relative rates, and thus potential Yu et al. 2002) and EST databases from many other phylogenetic utility, can be conducted several ways. First and species (see Plants Genome Central at NCBI: perhaps simplest, phylogenetic analysis of sequences http://www.ncbi.nlm.nih.gov/genomes/PLANTS/PlantList.h obtained from representative taxa can be performed and tml). These accumulating data have set the stage for the relative branch lengths can be assessed. Those genes with development of new, low-copy gene markers that are up to longest branch lengths are the most variable and thus may the challenge of resolving difficult phylogenetic questions. provide the most information for a larger set of taxa. Low-copy nuclear markers come at a higher cost with regard On a finer scale, it is useful to evaluate variation on a per to development than either cpDNA or rDNA markers, but site basis, thus allowing inferences of the amount of they are certain to eclipse cpDNA and rDNA with regards to information obtained per unit of sequence. Clearly, longer reliability and resolving power. Since single-copy nuclear sequences are, in general, more likely to contain more genes are biparentally inherited, they have the potential to variable sites. However, if efficiency in sequencing effort is reveal the ancestry of all lineages contributing to an extant a desirable goal, then those sequences that have the greatest species, whether diploid (Cronn et al. 2003) or polyploid variation per site may be more useful even if they are shorter (Small et al. 1998; Liu et al. 2001; Cronn et al. 2002b). than some other sequences that provide greater overall Low-copy loci are rarely subject to concerted evolution numbers of variable sites. This value can be estimated by (Cronn et al. 1999; Senchina et al. 2003), and they do not Use of nuclear genes in plant phylogeny Australian Systematic Botany 165 display the high intragenic polymorphism (and Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local corresponding increase in homoplasy) characteristic of alignment search tool. Journal of Molecular Biology 215, 403–410. rDNA, which requires concerted evolution to maintain intra- doi:10.1006/JMBI.1990.9999 Alvarez I, Wendel JF (2003) Ribosomal ITS sequences and plant and inter-array homogeneity. Finally, nuclear genes contain phylogenetic inference. Molecular and Evolution 29, exonic regions that limit alignment ambiguity and facilitate 417–434. doi:10.1016/S1055-7903(03)00208-2 homologous comparisons (Bailey and Doyle 1999; Doyle et Anderberg AA, Rydin C, Kallersjo M (2002) Phylogenetic relationships al. 1999b; Bortiri et al. 2002; Sang 2002), as well as introns in the order Ericales s.l.: analyses of molecular data from five genes and other non-coding regions that diverge at a substantially from the plastid and mitochondrial genomes. American Journal of Botany 89, 677–687. higher rate than either cpDNA or rDNA. APGII (2003) An update of the Angiosperm Phylogeny Group To the extent that plant molecular systematic studies classification for the orders and families of flowering plants: APG would benefit from the routine application of one nuclear II. Botanical Journal of the Linnean Society 141, 399–436. gene, the development and application of multiple Arnheim N (1983) Concerted evolution of multigene families. In independent nuclear loci holds even greater promise for ‘Evolution of genes and proteins’. (Eds M Nei, RK Koehn) pp. 38–61. (Sinauer: Sunderland, MA) addressing more complex questions. For example, the Ausubel FM, Brent R, Kingston RE, Moore DD, Seidman JG, Smith JA, application of multiple, low-copy nuclear markers may Struhl K (Eds) (1992) ‘Short protocols in molecular biology.’ 2nd present the only viable approach for teasing apart temporally edn. (John Wiley & Sons: New York) compressed divergence events, since the variance of Awadalla P, Eyre-Walker A, Maynard Smith J (1999) Linkage divergence rates across sites from exonic and intronic disequilibrium and recombination in hominid mitochondrial DNA. Science 286, 2524–2525. doi:10.1126/SCIENCE.286.5449.2524 regions almost ensures that a fraction of sites will yield Bailey CD, Carr TG, Harris SA, Hughes CE (2003) Characterization of sufficient phylogenetic signals to mark lineages (Small et al. angiosperm nrDNA polymorphism, paralogy, and pseudogenes. 1998; Cronn et al. 2002b; Malcomber 2002). Similarly, the Molecular Phylogenetics and Evolution 29, 435–455. topological ‘stalemates’ frequently presented by cpDNA and doi:10.1016/J.YMPEV.2003.08.021 rDNA can only be confidently resolved through the use of Bailey CD, Doyle JJ (1999) Potential phylogenetic utility of the low-copy nuclear gene pistillata in dicotyledonous plants: comparison to multiple independent markers, such as low-copy nuclear nrDNA ITS and trnL intron in Sphaerocardamum and other genes (Small et al. 1998; Cronn et al. 2002b, 2003; Brassicaceae. Molecular Phylogenetics and Evolution 13, 20–30. Malcomber 2002). In many cases, the resolution provided by doi:10.1006/MPEV.1999.0627 nuclear genes will help to shed light on the true topology, as Bailey CD, Price RA, Doyle JJ (2002) Systematics of the Halimolobine Brassicaceae: evidence from three loci and morphology. Systematic well as the underlying methodological or biological basis of Botany 27, 318–332. phylogenetic incongruence. Less frequently, attempts to Baldwin BG, Markos S (1998) Phylogenetic utility of the external resolve cpDNA–rDNA incongruence using multiple nuclear transcribed spacer (ETS) of 18S–26S nrDNA: congruence of ETS loci may turn up exciting instances where multiple nuclear and ITS trees of Calycadenia (Compositae). Molecular markers return a consistent topology that stands in stark Phylogenetics and Evolution 10, 449–463. doi:10.1006/MPEV.1998.0545 contrast to the topologies returned by both cpDNA and Baldwin BG, Sanderson MJ, Porter JM, Wojciechowski MF, Campbell rDNA (Cronn et al. 2003). In these challenging cases, the CS, Donoghue MJ (1995) The ITS region of nuclear ribosomal DNA: single accomplishment of resolving a phylogeny pales in a valuable source of evidence on angiosperm phylogeny. Annals of significance to the greater challenges of understanding the the Missouri Botanical Garden 82, 247–277. evolutionary dynamics that have given rise to the genomic Bancroft I (2001) Duplicate and diverge: the evolution of plant genome microstructure. Trends in Genetics 17, 89–93. complexity embodied in living plant species. doi:10.1016/S0168-9525(00)02179-X Barrier M, Baldwin BG, Robichaux RH, Purugganan MD (1999) Acknowledgments Interspecific hybrid ancestry of a plant adaptive radiation: We thank the many colleagues who have contributed to our allopolyploidy of the Hawaiian silversword inferred from duplicated floral homeotic genes. Molecular Biology and Evolution thoughts on these issues, Jeff Doyle and two anonymous 16, 1105–1113. reviewers whose comments improved this manuscript, and Birky CW (1995) Uniparental inheritance of mitochondrial and acknowledge funding support from the United States chloroplast genes: mechanisms and evolution. Proceedings of the National Science Foundation. National Academy of Sciences of the United States of America 92, 11331–11338. References Blake NK, Lehfeldt BR, Lavin M, Talbert LE (1999) Phylogenetic reconstruction based on low copy DNA sequence data in an Adams KL, Cronn R, Percifield R, Wendel JF (2003) Genes duplicated allopolyploid: the B genome of wheat. Genome 42, 351–360. by polyploidy show unequal contributions to the and doi:10.1139/GEN-42-2-351 organ-specific reciprocal silencing. Proceedings of the National Bortiri E, Oh S-H, Gao F-Y, Potter D (2002) The phylogenetic utility of Academy of Sciences of the United States of America 100, nucleotide sequences of sorbitol 6-phosphate dehydrogenase in 4649–4654. doi:10.1073/PNAS.0630618100 Prunus (Rosaceae). American Journal of Botany 89, 1697–1708. Ahlandsberg S, Sun C, Jansson C (2002) An intronic element directs Bradley RD, Hillis DM (1997) Recombinant DNA sequences generated endosperm-specific expression of the sbellb gene during barley seed by PCR amplification. Molecular Biology and Evolution 14, development. Genetics and Genomics 20, 864–868. 592–593. 166 Australian Systematic Botany R. L. Small et al.

Buckler ES, Holtsford TP (1996) Zea ribosomal repeat evolution and Doyle JJ (1992) Gene trees and species trees: molecular systematics as substitution patterns. Molecular Biology and Evolution 13, one-character taxonomy. Systematic Botany 17, 144–163. 623–632. Doyle JJ (1995) The irrelevance of allele tree topologies for species Buckler ES, Ippolito A, Holtsford TP (1997) The evolution of delimitation, and a nontopological alternative. Systematic Botany ribosomal DNA: divergent paralogues and phylogenetic 20, 574–588. implications. Genetics 145, 821–832. Doyle JJ (1997) Trees within trees: genes and species, molecules and Chang C, Meyerowitz EM (1986) Molecular cloning and DNA morphology. Systematic Biology 46, 537–553. sequence of the Arabidopsis thaliana alcohol dehydrogenase gene. Doyle JJ, Doyle JL (1987) A rapid DNA isolation procedure for small Proceedings of the National Academy of Sciences of the United quantities of fresh leaf tissue. Phytochemical Bulletin 19, 11–15. States of America 83, 1408–1412. Doyle JJ, Doyle JL (1990) Isolation of plant DNA from fresh tissue. Charlesworth D, Awadalla P (1998) Focus 12, 13–15. self-incompatibility: the molecular population genetics of Brassica Doyle JJ, Doyle JL (1999) Nuclear protein-coding genes in phylogeny S-loci. Heredity 81, 1–9. doi:10.1038/SJ.HDY.6884000 reconstruction and homology assessment: some examples from Charlesworth D, Charlesworth B (1998) Sequence variation: looking Leguminosae. In ‘Molecular systematics and plant evolution’. (Ed. for effects of genetic linkage. Current Biology 8, R658–R661. RJ Gornall) pp. 229–254. (Taylor and Francis: London) Charlesworth D, Liu F, Zhang L (1998) The evolution of the alcohol Doyle JJ, Kanazin V, Shoemaker RC (1996) Phylogenetic utility of dehydrogenase gene family in plants of the genus Leavenworthia histone H3 intron sequences in the perennial relatives of soybean (Brassicaceae): loss of introns, and an intronless gene. Molecular (Glycine: Leguminosae). Molecular Phylogenetics and Evolution 6, Biology and Evolution 15, 552–559. 438–447. doi:10.1006/MPEV.1996.0092 Chase MW, Knapp S, Cox AV, Clarkson JJ, Butsko Y, Joseph J, Doyle JJ, Doyle JL, Brown AHD (1999a) Incongruence in the diploid Savolainen V, Parokonny AS (2003) Molecular systematics, GISH B-genome of Glycine (Leguminosae) revisited: and the origin of hybrid taxa in Nicotiana (Solanaceae). Annals of histone H3-D alleles versus chloroplast haplotypes. Molecular Botany 92, 107–127. doi:10.1093/AOB/MCG087 Biology and Evolution 16, 354–362. Chase MW, Soltis DE, Olmstead, RG, Morgan D, Les DH (1993) Doyle JJ, Doyle JL, Brown AHD (1999b) Origins, colonization, and Phylogenetics of seed plants: an analysis of nucleotide sequences lineage recombination in a widespread perennial soybean polyploid from the plastid gene rbcL. Annals of the Missouri Botanical complex. Proceedings of the National Academy of Sciences of the Garden 80, 528–580. United States of America 96, 10741–10745. Clegg MT, Cummings MP, Durbin ML (1997) The evolution of plant doi:10.1073/PNAS.96.19.10741 nuclear genes. Proceedings of the National Academy of Sciences of Doyle JJ, Doyle JL, Brown AHD, Pfeil BE (2000) Confirmation of the United States of America 94, 7791–7798. shared and divergent genomes in the Glycine tabacina polyploid doi:10.1073/PNAS.94.15.7791 complex (Leguminosae) using histone H3-D sequences. Systematic Corriveau JL, Coleman AW (1988) Rapid screening method to detect Botany 25, 437–448. potential biparental inheritance of plastid DNA and results for over Doyle JJ, Doyle JL, Brown AHD, Palmer RG (2002) Genomes, multiple 200 angiosperm species. American Journal of Botany 75, origins, and lineage recombination in the Glycine tomentella 1443–1458. (Leguminosae) polyploid complex: histone H3-D gene sequences. Crawford DJ (1985) Electrophoretic data and plant speciation. Evolution 56, 1388–1402. Systematic Botany 10, 405–416. Drouin G, Prat F, Ell M, Clarke GDP (1999) Detecting and Crawford DJ (2000) Plant macromolecular systematics in the past 50 characterizing gene conversions between multigene family years: one view. Taxon 49, 479–501. members. Molecular Biology and Evolution 16, 1369–1390. Cronn R, Wendel JF (1998) Simple methods for isolating homoeologous loci from allopolyploid genomes. Genome 41, Elder JF, Turner BJ (1995) Concerted evolution of repetitive DNA 756–762. doi:10.1139/GEN-41-6-756 sequences in eukaryotes. The Quarterly Review of Biology 70, Cronn RC, Zhao X, Paterson AH, Wendel JF (1996) Polymorphism and 297–320. doi:10.1086/419073 concerted evolution in a tandemly repeated gene family: 5S Emshwiller E, Doyle JJ (1999) Chloroplast-expressed glutamine ribosomal DNA in diploid and allopolyploid cottons. Journal of synthetase (ncpGS): potential utility for phylogenetic studies with Molecular Evolution 42, 685–705. an example from Oxalis (Oxalidaceae). Molecular Phylogenetics Cronn RC, Small RL, Wendel JF (1999) Duplicated genes evolve and Evolution 12, 310–319. doi:10.1006/MPEV.1999.0613 independently after polyploid formation in cotton. Proceedings of Emshwiller E, Doyle JJ (2002) Origins of domestication and polyploidy the National Academy of Sciences of the United States of America in Oca (Oxalis tuberosa: Oxalidaceae). 2. Chloroplast-expressed 96, 14406–14411. glutamine synthetase data. American Journal of Botany 89, Cronn R, Cedroni M, Haselkorn T, Grover C, Wendel JF (2002a) 1042–1056. PCR-mediated recombination in amplification products derived Endrizzi JD, Turcotte EL, Kohel RJ (1985) Genetics, cytology, and from polyploid cotton. Theoretical and Applied Genetics 104, evolution of Gossypium. Advances in Genetics 23, 271–375. 482–489. doi:10.1007/S001220100741 Evans RC, Campbell CS (2002) The origin of the apple subfamily Cronn RC, Small RL, Haselkorn T, Wendel JF (2002b) Rapid (Maloideae; Rosaceae) is clarified by DNA sequence data from diversification of the cotton genus (Gossypium: Malvaceae) duplicated GBSSI genes. American Journal of Botany 89, revealed by analysis of sixteen nuclear and chloroplast genes. 1478–1484. American Journal of Botany 89, 707–725. Evans RC, Alice LA, Campbell CS, Kellogg EA, Dickinson TA (2000) doi:10.1093/AOB/MCF125 The granule-bound starch synthase (GBSSI) gene in the Rosaceae: Cronn RC, Small RL, Haselkorn T, Wendel JF (2003) Cryptic repeated Multiple loci and phylogenetic utility. Molecular Phylogenetics and genomic recombination during speciation in Gossypium Evolution 17, 388–400. doi:10.1006/MPEV.2000.0828 gossypioides. Evolution 57, 2475–2489. Eyre-Walker A (2000) Do mitochondria recombine in humans? Doyle JJ (1991) Evolution of higher-plant glutamine synthetase genes: Philosophical Transactions of the Royal Society of London. Series tissue specificity as a criterion for predicting orthology. Molecular B, Biological Sciences 355, 1573–1580. Biology and Evolution 8, 366–377. doi:10.1098/RSTB.2000.0718 Use of nuclear genes in plant phylogeny Australian Systematic Botany 167

Eyre-Walker A, Gaut RL, Hilton H, Feldman DL, Gaut BS (1998) Goff SA, Ricke D, Lan T-H, Presting G, Wang R, et al. (2002) A draft Investigation of the bottleneck leading to domestication of maize. sequence of the rice genome (Oryza sativa L. ssp. japonica). Proceedings of the National Academy of Sciences of the United Science 296, 92–100. doi:10.1126/SCIENCE.1068275 States of America 95, 4441–4446. doi:10.1073/PNAS.95.8.4441 Gottlieb LD (1977) Electrophoretic evidence and plant systematics. Ferguson D, Sang T (2001) Speciation through homoploid Annals of the Missouri Botanical Garden 64, 161–180. hybridization between allotetraploids in (Paeonia). Gottlieb LD, Ford VS (1996) Phylogenetic relationships among the Proceedings of the National Academy of Sciences of the United sections of Clarkia (Onagraceae) inferred from the nucleotide States of America 98, 3915–3919. doi:10.1073/PNAS.061288698 sequences of PgiC. Systematic Botany 21, 45–62. Filatov DA, Charlesworth D (1999) DNA polymorphism, Grant D, Cregan P, Shoemaker RC (2000) Genome organization in structure and balancing selection in the Leavenworthia PgiC locus. dicots: genome duplication in Arabidopsis and synteny between Genetics 153, 1423–1434. soybean and Arabidopsis. Proceedings of the National Academy of Ford VS, Gottlieb LD (1999) Molecular characterization of PgiC in a Sciences of the United States of America 97, 4168–4173. tetraploid plant and its diploid relatives. Evolution 53, 1060–1067. doi:10.1073/PNAS.070430597 Ford VS, Gottlieb LD (2002) Single mutations silence PGIC2 genes in Grassly NC, Holmes EC (1997) A likelihood method for the detection two very recent allotetraploid species of Clarkia. Evolution 56, of selection and recombination using sequence data. Molecular 699–707. Biology and Evolution 14, 239–247. Freudenstein JV, Chase MW (2001) Analysis of mitochondrial nad1b-c Hamby RK, Zimmer EA (1988) Ribosomal RNA sequences for intron sequences in : utility and coding of inferring phylogeny within the grass family (Poaceae). Plant length-change characters. Systematic Botany 26, 643–657. Systematics and Evolution 160, 29–37. Fryxell PA (1968) A redfinition of the tribe Gossypieae. Botanical Hamby RK, Zimmer EA (1992) Ribosomal RNA as a phylogenetic tool Gazette 129, 296–308. doi:10.1086/336448 in plant systematics. In ‘Molecular systematics of plants’. (Eds PS Fryxell PA (1979) ‘The natural history of the cotton tribe.’ (Texas A&M Soltis, DE Soltis, JJ Doyle) pp. 50–91. (Chapman & Hall: New University Press: College Station) Yo r k ) Fryxell PA (1992) A revised taxonomic interpretation of Gossypium L. Hanson MA, Gaut BS, Stec AO, Fuerstenberg SI, Goodman MM, Coe (Malvaceae). Rheedea 2, 108–165. EH, Doebley JF (1996a) Evolution of anthocyanin biosynthesis in Fu H, Dooner HK (2002) Intraspecific violation of genetic colinearity maize kernels: the role of regulatory and enzymatic loci. Genetics and its implications in maize. Proceedings of the National Academy 143, 1395–1407. of Sciences of the United States of America 99, 9573–9578. Hanson RE, Islam-Faridi M, Percival EA, CF, Ji Y, McKnight Galloway GL, Malmberg RL, Price RA (1998) Phylogenetic utility of TD, Stelly DM, Price HJ (1996b) Distribution of 5S and 18S–28S the nuclear gene arginine decarboxylase: an example from rDNA loci in a tetraploid cotton (Gossypium hirsutum L.) and its Brassicaceae. Molecular Biology and Evolution 15, 1312–1320. putative diploid ancestors. Chromosoma 105, 55–61. Gaut BS (1998) Molecular clocks and nucleotide substitution rates in doi:10.1007/S004120050159 higher plants. Evolutionary Biology 30, 93–120. Hartmann S, Nason JD, Bhattacharya D (2001) Extensive ribosomal Gaut BS (2001) Patterns of chromosomal duplication in maize and their DNA genic variation in the columnar Lophocereus. Journal implications for comparative maps of the grasses. Genome of Molecular Evolution 53, 124–134. Research 11, 55–66. doi:10.1101/GR.160601 Henikoff S, Greene EA, Pietrokovski S, Bork P, Attwood TK, Hood L Gaut BS, Clegg MT (1991) Molecular evolution of alcohol (1997) Gene families: the taxonomy of protein paralogs and dehydrogenase 1 in members of the grass family. Proceedings of the chimeras. Science 278, 609–614. National Academy of Sciences of the United States of America 88, doi:10.1126/SCIENCE.278.5338.609 2060–2064. Hilton H, Gaut BS (1998) Speciation and domestication in maize and Gaut BS, Clegg MT (1993) Molecular evolution of the Adh1 locus in its wild relatives: evidence from the Globulin-1 gene. Genetics 150, the genus Zea. Proceedings of the National Academy of Sciences of 863–872. the United States of America 90, 5095–5099. Hodges SA, Arnold ML (1994) Columbines: a geographically Gaut BS, Doebley JF (1997) DNA sequence evidence for the segmental widespread species flock. Proceedings of the National Academy of allotetraploid origin of maize. Proceedings of the National Academy Sciences of the United States of America 91, 5129–5132. of Sciences of the United States of America 94, 6809–6814. Hong RL, Hamaguchi L, Busch MA, Weigel D (2003) Regulatory doi:10.1073/PNAS.94.13.6809 elements of the floral homeotic gene AGAMOUS identified by Gaut BS, Morton BR, McCaig BC, Clegg MT (1996) Substitution rate phylogenetic footprinting and shadowing. The Plant Cell 15, comparisons between grasses and palms: synonymous rate 1296–1309. doi:10.1105/TPC.009548 differences at the nuclear gene Adh parallel rate differences at the Hughes AL, Yeager M (1998) Natural selection at major plastid gene rbcL. Proceedings of the National Academy of Sciences histocompatibility complex loci of vertebrates. Annual Review of of the United States of America 93, 10274–10279. Genetics 32, 415–435. doi:10.1146/ANNUREV.GENET.32.1.415 doi:10.1073/PNAS.93.19.10274 Innan H, Tajima F (1997) The amounts of nucleotide variation within Gaut BS, Peek AS, Morton BR, Clegg MT (1999) Patterns of genetic and between allelic classes and the reconstruction of the common diversification within the Adh gene family in the grasses (Poaceae). ancestral sequence in a population. Genetics 147, 1431–1444. Molecular Biology and Evolution 16, 1086–1097. Jakobsen IB, Easteal S (1996) A program for calculating and displaying Ge S, Sang T, Lu B-R, Hong D-Y (1999) Phylogeny of rice genomes compatibility matrices as an aid in determining reticulate evolution with emphasis on origins of allotetraploid species. Proceedings of in molecular sequences. CABIOS 12, 291–295. the National Academy of Sciences of the United States of America Kawabe A, Miyashita NT (1999) DNA variation in the basic chitinase 96, 14400–14405. doi:10.1073/PNAS.96.25.14400 locus (ChiB) region of the wild plant Arabidopsis thaliana. Genetics Gielly L, Yuan Y-M, Küpfer P, Taberlet P (1996) Phylogenetic use of 153, 1445–1453. noncoding regions in the genus Gentiana L.: chloroplast trnL Kawabe A, Innan H, Terauchi R, Miyashita NT (1997) Nucleotide (UAA) intron versus nuclear ribosomal internal transcribed spacer polymorphism in the acidic chitinase locus (ChiA) region of the sequences. Molecular Phylogenetics and Evolution 5, 460–466. wild plant Arabidopsis thaliana. Molecular Biology and Evolution doi:10.1006/MPEV.1996.0042 14, 1303–1315. 168 Australian Systematic Botany R. L. Small et al.

Klein J, Sato A, Nagl S, O’huigin C (1998) Molecular trans-species Mayol M, Rossello JA (2001) Why nuclear ribosomal DNA spacers polymorphism. Annual Review of Ecology and Systematics 29, (ITS) tell different stories in Quercus. Molecular Phylogenetics and 1–21. doi:10.1146/ANNUREV.ECOLSYS.29.1.1 Evolution 19, 167–176. doi:10.1006/MPEV.2001.0934 Kuittinen H, Aguade M (2000) Nucleotide variation at the CHALCONE Millar AA, Dennis ES (1996) The alcohol dehydrogenase genes of ISOMERASE locus in Arabidopsis thaliana. Genetics 155, cotton. Plant Molecular Biology 31, 897–904. 863–872. Miyashita NT, Innan H, Terauchi R (1996) Intra- and interspecific Kuzoff RK, Sweere JA, Soltis DE, Soltis PS, Zimmer EA (1998) The variation of the alcohol dehydrogenase locus region in wild plants phylogenetic potential of entire 26S rDNA sequences in plants. Arabis gemmifera and Arabidopsis thaliana. Molecular Biology and Molecular Biology and Evolution 15, 251–263. Evolution 13, 433–436. Lavin M, Eshbaugh E, Hu J-M, Matthews S, Sharrock RA (1998) Moniz de Sá M, Drouin G (1996) Phylogeny and substitution rates of Monophyletic subgroups of the tribe Millettieae (Leguminosae) as angiosperm actin genes. Molecular Biology and Evolution 13, revealed by phytochrome nucleotide sequence data. American 1198–1212. Journal of Botany 85, 412–433. Morton BR, Gaut BS, Clegg MT (1996) Evolution of alcohol Levy F, Antonovics J, Boynton JE, Gillham NW (1996) A population dehydrogenase genes in the Palm and Grass families. Proceedings genetic analysis of chloroplast DNA in Phacelia. Heredity 76, of the National Academy of Sciences of the United States of America 143–155. 93, 11735–11739. Linder CR, Goertzen LR, Heuvel BV, Francisco-Ortega J, Jansen RK Muir G, Fleming CC, Schlotterer C (2001) Three divergent rDNA (2000) The complete external transcribed spacer of 18S–26S rDNA: clusters predate the species divergence in Quercus petraea (Matt.) amplification and phylogenetic utility at low taxonomic levels in Liebl. and Quercus robur L. Molecular Biology and Evolution 18, Asteraceae and closely allied families. Molecular Phylogenetics 112–119. and Evolution 14, 285–303. doi:10.1006/MPEV.1999.0706 Myerhans A, Vartanian J-P, Wain-Hobon S (1990) DNA recombination Liu F, Zhang L, Charlesworth D (1998) Genetic diversity in during PCR. Nucleic Acids Research 18, 1687–1691. Leavenworthia populations with different inbreeding levels. Olmstead RG, Palmer JD (1994) Chloroplast DNA systematics: a Proceedings of the Royal Society of London. Series B. Biological review of methods and data analysis. American Journal of Botany Sciences 265, 293–301. doi:10.1098/RSPB.1998.0295 81, 1205–1224. Liu Q, Brubaker CL, Green AG, Marshall DR, Sharp PJ, Singh SP Olsen KM (2002) Population history of Manihot esculenta (2001) Evolution of the FA D 2 - 1 fatty acid desaturase 5′ UTR intron (Euphorbiaceae) inferred from nuclear DNA sequences. Molecular and the molecular systematics of Gossypium (Malvaceae). Ecology 11, 901–911. doi:10.1046/J.1365-294X.2002.01493.X American Journal of Botany 88, 92–102. Lynch M, Conery JS (2000) The evolutionary fate and consequences of Olsen KM, Schaal BA (1999) Evidence on the origin of cassava: duplicate genes. Science 290, 1151–1155. phylogeography of Manihot esculenta. Proceedings of the National doi:10.1126/SCIENCE.290.5494.1151 Academy of Sciences of the United States of America 96, 5586–5591. doi:10.1073/PNAS.96.10.5586 Malcomber ST (2002) Phylogeny of Gaertnera Lam. (Rubiaceae) based on multiple DNA markers: evidence of a rapid radiation in a Palmer JD (1992) Mitochondrial DNA in plant systematics: widespread, morphologically diverse genus. Evolution 56, 42–57. applications and limitations. In ‘Molecular systematics of plants’. Marshall HD, Newton C, Ritland K (2001) Sequence-repeat (Eds PS Soltis, DE Soltis, JJ Doyle) pp. 36–49. (Chapman and Hall: polymorphisms exhibit the signature of recombination in New York) Lodgepole Pine chloroplast DNA. Molecular Biology and Palmer JD, Adams KL, Cho Y, Parkinson CL, Qiu Y-L, Song K (2000) Evolution 18, 2136–2138. Dynamic evolution of plant mitochondrial genomes: mobile genes Mason-Gamer RJ (2001) Origin of North American Elymus (Poaceae: and introns and highly variable mutation rates. Proceedings of the Triticeae) allotetraploids based on granule-bound starch synthase National Academy of Sciences of the United States of America 97, gene sequences. Systematic Botany 26, 757–768. 6960–6966. doi:10.1073/PNAS.97.13.6960 Mason-Gamer RJ, Holsinger KE, Jansen RK (1995) Chloroplast DNA Perry DJ, Furnier GR (1996) Pinus banksiana has at least seven haplotype variation within and among populations of Coreopsis expressed alcohol dehydrogenase genes in two linked groups. grandiflora (Asteraceae). Molecular Biology and Evolution 12, Proceedings of the National Academy of Sciences of the United 371–381. States of America 93, 13020–13023. Mason-Gamer RJ, Weil CF, Kellogg EA (1998) Granule-bound starch Purugganan MD, Suddith JI (1998) Molecular population genetics of synthase: structure, function, and phylogenetic utility. Molecular the Arabidopsis CAULIFLOWER regulatory gene: nonneutral Biology and Evolution 15, 1658–1673. evolution and naturally occurring variation in floral homeotic Mathews S, Sharrock RA (1996) The phytochrome gene family in function. Proceedings of the National Academy of Sciences of the grasses (Poaceae): a phylogeny and evidence that grasses have a United States of America 95, 8130–8134. subset of the loci found in dicot angiosperms. Molecular Biology doi:10.1073/PNAS.95.14.8130 and Evolution 13, 1141–1150. Qiu Y-L, Cho Y, Cox JC, Palmer JD (1998) The gain of three Mathews S, Tsai RC, Kellogg EA (2000) Phylogenetic structure in the mitochondrial introns identifies liverworts as the earliest land grass family (Poaceae): evidence from the nuclear gene plants. Nature 394, 671–674. doi:10.1038/29286 Phytochrome B. American Journal of Botany 87, 96–107. Qiu Y-L, Lee J, Bernasconi-Quadroni F, Soltis DE, Soltis PS, Zonis M, Mathews S, Spangler RE, Mason-Gamer RJ, Kellogg EA (2002) Zimmer EA, Chen Z, Sauolainen V, Chase MW (1999) The earliest Phylogeny of Andropogoneae inferred from Phytochrome B, angiosperms: evidence from mitochondrial, plastid, and nuclear GBSSI, and NDHF. International Journal of Plant Sciences 163, genomes. Nature 402, 404–407. doi:10.1038/46536 441–450. doi:10.1086/339155 Rauscher JT, Doyle JJ, Brown AHD (2002) Internal transcribed spacer Mayer MS, Soltis PS (1999) Intraspecific phylogeny analysis using ITS repeat-specific primers and the analysis of hybridization in the sequences: insights from studies of the Streptanthus glandulosus Glycine tomentella (Leguminosae) polyploid complex. Molecular complex (Cruciferae). Systematic Botany 24, 47–61. Ecology 11, 2691–2702. doi:10.1046/J.1365-294X.2002.01640.X Use of nuclear genes in plant phylogeny Australian Systematic Botany 169

Reboud X, Zeyl C (1994) Organelle inheritance in plants. Heredity 72, Small RL, Wendel JF (2000a) Copy number lability and evolutionary 132–140. dynamics of the Adh gene family in diploid and tetraploid cotton Richman AD, Kohn JR (1999) Self-incompatibility alleles from (Gossypium). Genetics 155, 1913–1926. Physalis: implications for historical inference from balanced Small RL, Wendel JF (2000b) Phylogeny, duplication, and intraspecific genetic polymorphisms. Proceedings of the National Academy of variation of Adh sequences in New World diploid cottons Sciences of the United States of America 96, 168–172. (Gossypium, Malvaceae). Molecular Phylogenetics and Evolution doi:10.1073/PNAS.96.1.168 16, 73–84. doi:10.1006/MPEV.1999.0750 Richman AD, Uyenoyama MK, Kohn JR (1996) Allelic diversity and Small RL, Wendel JF (2002) Differential evolutionary dynamics of gene genealogy at the self-incompatibility locus in the Solanaceae. duplicated paralogous Adh loci in allotetraploid cotton Science 273, 1212–1216. (Gossypium). Molecular Biology and Evolution 19, 597–607. Rieseberg LH (1997) Hybrid origins of plant species. Annual Review of Small RL, Ryburn JA, Cronn RC, Seelanan T, Wendel JF (1998) The Ecology and Systematics 28, 359–389. and the hare: choosing between noncoding plastome and doi:10.1146/ANNUREV.ECOLSYS.28.1.359 nuclear Adh sequences for phylogenetic reconstruction in a recently Rieseberg LH, Soltis DE (1991) Phylogenetic consequences of diverged plant group. American Journal of Botany 85, 1301–1315. cytoplasmic gene flow in plants. Evolutionary Trends in Plants 5, Small RL, Ryburn JA, Wendel JF (1999) Low levels of nucleotide 65–84. diversity at homoeologous Adh loci in allotetraploid cotton Rieseberg LH, Wendel JF (1993) Introgression and its consequences in (Gossypium L.). Molecular Biology and Evolution 16, 491–501. plants. In ‘Hybrid zones and the evolutionary process’. (Ed. RG Soltis DE et al. (1997) Angiosperm phylogeny inferred from 18S Harrison) pp. 70–109. (Oxford University Press: New York) ribosomal DNA sequences. Annals of the Missouri Botanical Rokas A, Williams BL, King N, Carroll SB (2003) Genome-scale Garden 84, 1–49. approaches to resolving incongruence in molecular phylogenies. Soltis DE, Soltis PS (1998a) Choosing an approach and an appropriate Nature 425, 798–804. doi:10.1038/NATURE02053 gene for phylogenetic analysis. In ‘Molecular systematics of plants Sambrook J, Fritsch EF, Maniatis T (1989) ‘Molecular cloning: a II: DNA sequencing’. (Eds DE Soltis, PS Soltis, JJ Doyle) pp. 1–42. laboratory manual.’ 2nd edn. (Cold Spring Harbor Laboratory (Kluwer Academic Publishers: Boston) Press) Soltis PS, Soltis DE (1998b) Molecular evolution of 18S rDNA in Sanderson MJ, Doyle JJ (1992) Reconstruction of organismal and gene angiosperms: implications for character weighting in phylogenetic phylogenies from data on multigene families: concerted evolution, analysis. In ‘Molecular systematics of plants II: DNA sequencing’. homoplasy, and confidence. Systematic Biology 41, 4–17. (Eds DE Soltis, PS Soltis, JJ Doyle) (Kluwer Academic Publishers: Sang T (2002) Utility of low-copy nuclear gene sequences in plant Boston, MA) phylogenetics. Critical Reviews in Biochemistry and Molecular Soltis PS, Soltis DE, Chase MW (1999) Angiosperm phylogeny Biology 37, 121–147. inferred from multiple genes as a tool for comparative biology. Sang T, Zhang D (1999) Reconstructing hybrid speciation using Nature 402, 402–404. doi:10.1038/46528 sequences of low copy nuclear genes: hybrid origins of five Paeonia Steele KP, Holsinger KE, Jansen RK, Taylor DW (1991) Assessing the species based on Adh gene phylogenies. Systematic Botany 24, reliability of 5S ribosomal-RNA sequence data for phylogenetic 148–163. analysis in green plants. Molecular Biology and Evolution 8, Sang T, Crawford DJ, Stuessy TF (1997a) Chloroplast phylogeny, 240–248. reticulate evolution, and of Paeonia (Paeoniaceae). American Journal of Botany 84, 1120–1136. Stephens JC (1985) Statistical methods of DNA sequence analysis: Sang T, Donoghue MJ, Zhang D (1997b) Evolution of alcohol detection of intragenic recombination or gene conversion. dehydrogenase genes in peonies (Paeonia): phylogenetic Molecular Biology and Evolution 2, 539–556. relationships of putative nonhybrid species. Molecular Biology and Strand AE, Leebens-Mack J, Milligan BG (1997) Nuclear DNA-based Evolution 14, 994–1007. markers for plant evolutionary biology. Molecular Ecology 6, Sanjur OI, Piperno DR, Andres TC, Wessel-Beaver L (2002) 113–118. doi:10.1046/J.1365-294X.1997.00153.X Phylogenetic relationships among domesticated and wild species of Suh Y, Thien LB, Reeve HE, Zimmer EA (1993) Molecular evolution Cucurbita (Cucurbitaceae) inferred from a mithochondrial gene: and phylogenetic implications of internal transcribed spacer implications for crop plant evolution and areas of origin. sequences of ribosomal DNA in Winteraceae. American Journal of Proceedings of the National Academy of Sciences of the United Botany 80, 1042–1055. States of America 99, 535–540. doi:10.1073/PNAS.012577299 Swofford DL (2002) ‘PAUP*. Phylogenetic analysis using parsimony Sawyer S (1989) Statistical tests for detecting gene conversion. (*and other methods).’ (Sinauer Associates: Sunderland, MA) Molecular Biology and Evolution 6, 526–536. Taberlet P, Gielly L, Pautou G, Bouvet J (1991) Universal primers for Senchina DS, Alvarez I, Cronn RC, Liu B, Rong J, Noyes RD, Paterson amplification of three non-coding regions of chloroplast DNA. AH, Wing RA, Wilkins TA, Wendel JF (2003) Rate variation among Plant Molecular Biology 17, 1105–1109. nuclear genes and the age of polyploidy in Gossypium. Molecular Tank DC, Sang T (2001) Phylogenetic utility of the Biology and Evolution 20, 633–643. glycerol-3-phosphate acyltransferase gene: evolution and doi:10.1093/MOLBEV/MSG065 implications in Paeonia (Paeoniaceae). Molecular Phylogenetics Simmons MP, Savolainen V, Clevinger CC, Archer RH, Davis JI (2001) and Evolution 19, 421–429. doi:10.1006/MPEV.2001.0931 Phylogeny of the Celastraceae inferred from 26S nuclear ribosomal The Arabidopsis Genome Initiative (2000) Analysis of the genome DNA, phytochrome B, rbcL, atpB, and morphology. Molecular sequence of the flowering plant Arabidopsis thaliana. Nature 408, Phylogenetics and Evolution 19, 353–366. 796–815. doi:10.1038/35048692 doi:10.1006/MPEV.2001.0937 Thornton JW, DeSalle R (2000) Gene family evolution and homology: Small RL (2004) Phylogeny of Hibiscus sect. Muenchhusia genomics meets phylogenetics. Annual Review of Genomics and (Malvaceae) based on chloroplast rpL16 and ndhF, and nuclear ITS Human Genetics 1, 41–73. and GBSSI sequences. Systematic Botany, in press. doi:10.1146/ANNUREV.GENOM.1.1.41 170 Australian Systematic Botany R. L. Small et al.

Wagner A (1998) The fate of duplicated genes: loss or new function? Wendel JF, Cronn RC, Johnston JS, Price HJ (2002) Feast and famine BioEssays 20, 785–788. in plant genomes. Genetica 115, 37–42. doi:10.1002/(SICI)1521-1878(199810)20:103.3.CO;2-D doi:10.1023/A:1016020030189 Wagner A (2001) Birth and death of duplicated genes in completely White SE, Doebley JF (1999) The molecular evolution of terminal sequenced eukaryotes. Trends in Genetics 17, 237–239. ear1, a regulatory gene in the genus Zea. Genetics 153, 1455–1462. doi:10.1016/S0168-9525(01)02243-0 Whitten WM, Williams NH, Chase MW (2000) Subtribal and generic Wagner A, Blackstone N, Cartwright P, Dick M, Misof B, Snow P, relationships of Maxillarieae (Orchidaceae) with emphasis on Wagner GP, Barttels J, Murtha M, Pendleton J (1994) Surveys of Stanhopeinae: combined molecular evidence. American Journal of gene families using polymerase chain reaction: PCR selection and Botany 87, 1842–1856. PCR drift. Systematic Biology 43, 250–261. Wolfe KH, Li W-H, Sharp PM (1987) Rates of nucleotide substitution Wang R-L, Stec A, Hey J, Lukens L, Doebley J (1999) The limits of vary greatly among plant mitochondrial, chloroplast, and nuclear selection during maize domestication. Nature 398, 236–239. DNAs. Proceedings of the National Academy of Sciences of the doi:10.1038/18435 United States of America 84, 9054–9058. Waters ER (1995) The molecular evolution of the small heat-shock Wray GA, Hahn MW, Abouheif E, Balhoff JP, Pizer M, Rockman MV, proteins in plants. Genetics 141, 785–795. Romano L (2003) The evolution of transcriptional regulation in Wendel JF (1989) New World tetraploid cottons contain Old World eukaryotes. Molecular Biology and Evolution 20, 1377–1419. cytoplasm. Proceedings of the National Academy of Sciences of the doi:10.1093/MOLBEV/MSG140 United States of America 86, 4132–4136. Yu J, Hu S, Wang J, Wong GK-S, Li S et al. (2002) A draft sequence of Wendel JF, Albert VA (1992) Phylogenetics of the cotton genus the rice genome (Oryza sativa L. ssp. indica). Science 296, 79–92. (Gossypium L.): character-state weighted parsimony analysis of doi:10.1126/SCIENCE.1068037 chloroplast DNA restriction site data and its systematic and Zhang L, Pond SK, Gaut BS (2001) A survey of the molecular biogeographic implications. Systematic Botany 17, 115–143. evolutionary dynamics of twenty-five multigene families from four Wendel JF, Doyle JJ (1998) Phylogenetic incongruence: window into grass taxa. Journal of Molecular Evolution 52, 144–156. genome history and molecular evolution. In ‘Molecular systematics Zimmer EA, Martin SL, Beverly SM, Kan W, Wilson AC (1980) Rapid of plants II. DNA sequencing’. (Eds D Soltis, P Soltis, J Doyle). duplication and loss of genes coding for the α chains of hemoglobin. (Kluwer Academic Publishing) Proceedings of the National Academy of Sciences of the United Wendel JF, Schnabel A, Seelanan T (1995a) Bidirectional interlocus States of America 77, 2158–2162. concerted evolution following allopolyploid speciation in cotton (Gossypium). Proceedings of the National Academy of Sciences of the United States of America 92, 280–284. Wendel JF, Schnabel A, Seelanan T (1995b) An unusual ribosomal DNA sequence from Gossypium gossypioides reveals ancient, cryptic, intergenomic introgression. Molecular Phylogenetics and Evolution 4, 298–313. doi:10.1006/MPEV.1995.1027 Manuscript received 12 June 2003, accepted 28 January 2004

http://www.publish.csiro.au/journals/asb