<<

ARTICLE IN PRESS

Organisms, Diversity & Evolution 5 (2005) 275–283 www.elsevier.de/ode

Allopolyploid evolution in Geinae (Colurieae: ) – building reticulate species trees from bifurcating gene trees Jenny E.E. Smedmarka,b,Ã, Torsten Erikssona,b, Birgitta Bremera,b aBergius Foundation, Royal Swedish Academy of Sciences, Box 50017, SE-104 05 Stockholm, Sweden bDepartment of Botany, Stockholm University, SE-106 91 Stockholm, Sweden

Received17 September 2004; accepted27 December 2004

Abstract

A previous phylogenetic study of paralogous nuclear low-copy granule-bound starch synthase (GBSSI) gene sequences from polyploidanddiploidspecies in Geinae indicatedthatthe cladehas experiencedtwo major allopolyploidevents in its history. These were estimatedto have occurredseveral million years ago. In this extended study we test if the reticulate phylogenetic hypothesis for Geinae can be maintained when additional sequences are added. The results are compatible with the hypothesis and strengthen it in minor aspects. We also attempt to identify extant members of one of the inferredancestral lineages of the allopolyploids. On the basis of previous molecular phylogenies, one specific group has been proposed to be the descendants of this taxon. However, none of the additional paralogues belong to this ancestral lineage. A general methodis proposedfor converting a bifurcating gene tree, with multiple paralogous low-copy gene sequences from allopolyploidtaxa, into a reticulate species tree. r 2005 Gesellschaft fu¨ r Biologische Systematik. Publishedby Elsevier Gmbh. All rights reserved.

Keywords: Geinae; Rosaceae; Allopolyploidy; Reticulate phylogeny; GBSSI

Introduction ploidspeciation with single- or low-copy nuclear gene sequences, paralogous copies from a putative allopoly- The group Geinae Schulze-Menz has been suggested ploidspecies are analysedphylogenetically together with to have been shapedby allopolyploidy( Gajewski 1957, sequences from closely relatedspecies. This method 1958), an evolutionarily important process among provides a direct reconstruction of phylogenetic rela- (Stebbins 1971; Grant 1981) involving interspe- tionships of parental lineages of allopolyploidtaxa cific hybridisation followed by chromosome doubling. (Sang andZhang 1999 ). In an allopolyploid, there are To address this question, a phylogenetic study of low- homoeologous copies of the gene, contributedby copy granule-boundstarch synthase (GBSSI) gene different ancestral taxa. These gene copies will be sister sequences from species in this clade was performed to orthologues in the ancestral taxa in a phylogenetic (Smedmark et al. 2003). When reconstructing allopoly- tree, rather than to each other. Autopolyploids, on the other hand, have paralogous loci that will be more closely relatedto each other than to orthologous ÃCorresponding author. Current address: Swedish Museum of Natural History, Department of Phanerogamic Botany, Box 50007, sequences in closely relatedspecies. Several studies have SE-104 05 Stockholm, Sweden. usedthis methodsuccessfully to infer complex reticulate E-mail address: [email protected] (J.E.E. Smedmark). relationships among species, e.g., in Paeonia (Sang

1439-6092/$ - see front matter r 2005 Gesellschaft fu¨ r Biologische Systematik. Publishedby Elsevier Gmbh. All rights reserved. doi:10.1016/j.ode.2004.12.003 ARTICLE IN PRESS 276 J.E.E. Smedmark et al. / Organisms, Diversity & Evolution 5 (2005) 275–283 andZhang 1999 ), Gossypium (Cronn andWendel 2004 ), methodthat we have used,a methodthat seems to be Glycine (Doyle et al. 2004), and Elymus (Mason-Gamer generally applicable to this type of problems. 2004). The study of Geinae also provided strong evidence for reticulate historical relationships between lineages (Smedmark et al. 2003). A hypothesis suggest- ing two allopolyploidisation events with major impact Materials and methods on extant species diversity in the group was formulated, basedon the GBSSI-1 gene tree. The analysis included Samples,DNA extraction,amplification,cloning, paralogous copies from seven polyploidspecies in and sequencing Geinae, along with copies from two diploid species in this group. These diploids had been shown to belong to Four species in Colurieae andtwo Rosoideaespecies the same clade (Smedmark and Eriksson 2002), which is outside this clade were selected (Table 1)tobe the only one in Geinae known to contain extant incorporatedin an existing data set ( Smedmark et al. diploids. This previous study only identified one of the 2003). One species from the suggestedpaternal lineage two original ancestral lineages of the polyploids. was included, the decaploid Oncostylus leiospermus,as Although the phylogenetic position of the other lineage well as the closest relatives of Geinae, Sieversia Willd. can be inferredfrom the gene tree, it turnedout that no and Fallugia Endl., and the tetraploid Novosieversia extant representatives were included in the study. With glacialis. Basedon data from the trnL–trnF region reference to a previous phylogenetic study with wider (Smedmark and Eriksson 2002), the last of the above taxon sampling (Smedmark and Eriksson 2002), a species was placedin a group that wouldcorrespondto specific clade, the sister group of the remainder of the hypothesisedallopolyploidclade. Geinae, was suggested to be potential descendants of DNA extraction, PCR amplification, andcloning this second parental lineage. In this extended study we followed the procedures described by Smedmark et al. test whether the hypothesis proposedby Smedmark et (2003). Sampling was limitedto available living speci- al. (2003) about reticulate relationships within Geinae mens due to difficulties in amplifying GBSSI using holds true with increased sampling of GBSSI para- extractions from dried material. The amplification of logues. We also attempt to identify the unknown GBSSI-1 for N. glacialis, O. leiospermus, and Sieversia ancestral lineage of the allopolyploids. pentapetala was carriedout with the primers 1F1C and It is worth noting that, despite the fact that low-copy 9R1C (Smedmark et al. 2003), whereas Fallugia para- gene sequences have been usedsuccessfully in several doxa, Filipendula vulgaris, and Fragaria vesca were studies to infer the ancestry of polyploids, no method amplifiedwith the primers 1F and9R ( Alice 1997). for converting bifurcating gene trees into reticulate Sequencing reactions were performedwith a DYE- species tree has been published. We here describe the namic ET termination cycle sequencing premix kit

Table 1. List of taxa

Species Voucher Origin Clone No. clones EMBL accession screened

Fallugia paradoxa (D. T. Eriksson No. 796 USA (Colorado) F. paradoxa 2–2 7 AJ871485 Don) Endl. (SBT) Sieversia pentapetala (L.) T. Eriksson No. 749 Unknown S. pentapetala 2–2 5 AJ871484 Greene (SBT); cult. Go¨ teborg Botanic Garden Novosieversia glacialis A. Batten USA (Alaska) N. glacialis 1–2 22 AJ871488 (Adams) F.Bolle Oncostylus leiospermus M. Chase; cult. Royal New Zealand O. leiospermus 1–3, O. 14 AJ871490, AJ871489, (Petrie) F. Bolle Botanic Gardens Kew leiospermus 1–6, O. AJ871491 leiospermus 2–2 Fragaria vesca L. T. Eriksson & J.E.E. Sweden F. vesca 3–2 2 AJ871486 Smedmark 43 (SBT) Filipendula vulgaris T. Eriksson 821 Sweden F. vulgaris 3–3 1 AJ871487 Moench. (SBT)

The list follows the classification of Bolle (1933) and includes information on vouchers, origins, clones included in the analyses, numbers of GBSSI-1 clones screened, and EMBL accession numbers of the sequences (for information on Rosa multiflora and Rubus odoratus see Evans et al. 2000, for the remaining sequences see Smedmark et al. 2003). ARTICLE IN PRESS J.E.E. Smedmark et al. / Organisms, Diversity & Evolution 5 (2005) 275–283 277

Table 2. Sequencing primers usedfor Fragaria vesca and PHYML from 10,000 pseudo-replicate data sets, which Filipendula vulgaris were assembledby SEQBOOT in PHYLIP (v3.5; Felsenstein 1993). The bootstrap majority-rule consen- Primer Sequence (50–30) sus trees were constructedby CONSENSE (also Ros1_4F GGACAACCAACTTAGATTCAG PHYLIP). This process was automatedusing the Ros1_4R GCTGAATCTAAGTTGGTTGTCC BootPHYML (Nylander 2003), which ties the bootstrap Ros1_7F GCTTACCAAGGCAGATTTGC analysis together. The same model and parameter Ros1_7R AATGCAAATCTGCCTTGG settings as above were used. Bayesian posterior probabilities for clades (PPs; Larget andSimon 1999 ) were estimatedwith MrBayes. The Monte Carlo–Markov chain was run for 1,000,000 (Amersham Pharmacia Biotech), on a MegaBACE 1000 generations, andsampledevery 10 generations. Log- capillary machine (Amersham Pharmacia Biotech). likelihoodvalues were consideredto be stable after 7560 Protocols followedthose providedby the manufacturer. generations, andtherefore the last 99,244 sampledtrees The four Colurieae species were sequencedwith the six were usedto construct a majority-rule consensus tree primers usedby Smedmark et al. (2003). For the two andestimate Bayesian PPs. To make sure that the other species new sequencing primers were constructed Markov chain did not get stuck in one region of basedon GBSSI-1 sequences from Waldsteinia, Rubus, parameter space in this analysis, failing to visit other Rosa, Physocarpus, and Prunus (Table 2). regions where the posterior is as high or higher, we performedtwo additionalanalyses with the same settings, each one starting from a random tree. To Phylogenetic analyses retrieve the single sampledtree with the highest poster- ior probability, the output parameter andtree files were The Staden package (Staden 1996) was usedfor scannedusing a script that is available from the second sequence editing and assembly. Multiple sequence author. alignment was performedusing ClustalX (v1.81; Thompson et al. 1997) with the default settings, followedby corrections madeby eye in the sequence Results alignment editor Se–Al (v2.0a11; Rambaut 1996). All alignedpositions were includedin the analyses. Maxi- mum likelihoodanalyses were performedwith PHYML DNA sequences (v2.0.3; Guindon and Gascuel 2003). In addition, data Six new GBSSI-1 sequences from four Colurieae were analysedusing a Bayesian approach with MrBayes species andtwo from other Rosoideaespecies ( Table 1) (v3.0; Huelsenbeck andRonquist 2001 ). The model of were analysedtogether with 24 sequences from a sequence evolution was chosen basedon results from previous study (Smedmark et al. 2003). The sequences Modeltest (v3.06; Posada and Crandall 1998). The Hierarchical LikelihoodRatio Tests andAkaike In- rangedin length from 1754 to 1913 base pairs. The exons were free of stop codons, except for two of the O. formation Criterion both selectedthe Hasegawa–Kishi- leiospermus sequences (clones 1–3 and2–2), in which no–Yano (HKY; Hasegawa et al. 1985) nucleotide deletions in the exon sequence have damaged the substitution model. Among-site rate variation was reading frame. The aligned sequence length was 2116 allowedto follow a gamma distribution ( Yang 1996); base pairs, of which 0.2% was scoredas missing data, in all analyses the default setting of the different and1261 were variable sites. programs with four rate categories was used. All trees were rootedon the branch leading to Filipendula vulgaris, basedon the results by Morgan et al. (1994). Phylogenetic analyses A search for optimal trees was performedwith PHYML. The algorithm implementedin this program The best tree from the PHYML analysis (ln-like- starts from an initial neighbour-joining tree andadjusts lihood–14243.289) is shown in Fig. 1. The one tree with tree topology andbranch lengths simultaneously during the highest posterior probability that was sampledin the hill-climbing, which makes it fast. The model para- three Bayesian analyses (ln-likelihood–14252.489) hada meters, transition/transversion ratio; andgamma shape topology identical to that of the optimal PHYML-tree. parameter were estimatedduringthe hill-climbing Nonparametric BPs basedon 10,000 pseudo-replicates process. in PHYML are shown in Fig. 2. The three Bayesian Support for each node was assessed using bootstrap analyses foundthe same nodeswith high PPs, andthe andBayesian analyses. Nonparametric bootstrap pro- topologies of the majority-rule consensus trees were portions (BPs; Felsenstein 1985) were estimatedwith identical to that from the bootstrap analysis in ARTICLE IN PRESS 278 J.E.E. Smedmark et al. / Organisms, Diversity & Evolution 5 (2005) 275–283

Geum rivale 1-9 G. urbanum 1-2 urbanum 1-3 98 73 1.00 G. rivale 1-3 Geum montanum 5-1 0.96 94 G. montanum 5-4 Erythrocoma triflora 2-4 6x C 1.00 G. vernum 5-7 Geum vernum 5-1 100 56 0.68 Geum reptans 1-1 1.00 E. triflora 1-4 Geum heterocarpum 1-1 G. reptans 1-6

Geum rivale 1-3 I 91 G. urbanum 1-3 1-2 99 100 1.00 G. rivale 1-9 1.00 1.00 Geum montanum 5-4 G. montanum 5-1 6x J K Erythrocoma triflora 1-4 52 64 G. vernum 5-1 0.79 Geum vernum 5-7 B 0.83 100 E. triflora 2-4 Geum reptans 1-6 1.00 G. reptans 1-1 Geum urbanum 1-1 G. heterocarpum 1-1 4x Geum rivale 1-2 100 G. rivale 1-2 Geum montanum 5-2 99 1.00 G. urbanum 1-1 Erythrocoma triflora 5-1 Geinae 100 1.00 Geum vernum 5-5 1.00 64 G. montanum 5-2 1.00 6x H G. vernum 5-5 Geum reptans 1-5 99 58 1.00 Geum heterocarpum 3-1 G 1.00 E. triflora 5-1 58 G. reptans 1-5 Coluria geoides 5-2 E 1.00 Waldsteinia geoides 2-3 48 G. heterocarpum 3-1 4x 0.99 F Oncostylus leiospermus 1-6 98 100 W. geoides 2-3 2x 1.00 Oncostylus leiospermus 1-3 A 1.00 79 C. geoides 5-2 2x Oncostylus leiospermus 2-2 1.00 80 O. leiospermus 1-3 10x Novosieversia glacialis 1-2 98 1.00 D O. leiospermus 1-6 10x Colurieae 100 1.00 Fallugia paradoxa 2-2 1.00 96 O. leiospermus 2-2 10x 1.00 Sieversia pentapetala 2-2 N. glacialis 1-2 4x Fragaria vesca 3-2 100 F. paradoxa 2-2 4x Rosa multiflora 1.00 S. pentapetala 2-2 2x Rubus odoratus Roperculina 100 Fragaria vesca 3-2 Filipendula vulgaris 3-3 1.00 Rosa multiflora 0.1 substitutions / site Rubus odoratus Fig. 1. The best tree from the PHYML analysis (ln-likelihood Filipendula vulgaris 3-3 –14243.288), with branch lengths drawn in proportion to the Fig. 2. The optimal tree, with support values indicated for amount of change. Numbers after species names are clone each node. Bootstrap proportions obtained with PHYML, identifiers. basedon 10,000 pseudo-replicates,are shown above branches, and Bayesian posterior clade probabilities below. Nodes discussed in the text are marked with capital letters. Numbers PHYML. All nodes but five in the majority-rule after species names are clone identifiers, followed by ploidy consensus trees hada posterior probability of 1.00 in level. the three analyses. The support for these five nodes differed by 0.01–0.05 between the separate runs. The values from one of the analyses are shown in Fig. 2. hexaploidspecies. Within cladeA, the three paralogues In the optimal tree from the PHYML analysis (Figs. 1 from O. leiospermus, together with the one from N. and2 ), Rubus odoratus is the sister of a clade comprising glacialis, form the sister-group (clade D) of the the two lineages Colurieae andRoperculina. Within remainder of the clade (clade E). Colurieae (Fig. 2), S. pentapetala is sister to the rest of the clade, in contrast to previous analyses (Smedmark andEriksson 2002 ) that suggested Fallugia as the sister Discussion to the remainder of the group. In Geinae (Fig. 2) there are three major clades of GBSSI-1 clones: A, B and C. Allopolyploidy in Geinae Clade A contains paralogues from all included Geinae species, whereas the other two, which are sister groups, The topology of the GBSSI-1 phylogeny with only consist of additional paralogues from polyploids. extended species sampling is compatible with the Paralogues from each of the six hexaploidspecies can be hypothesis about reticulate species history in Geinae found in all of the three major clades. Clades C, H, and J (Smedmark et al. 2003). Two of the loci present in the contain only copies from hexaploidspecies. Two of hexaploids are more closely related to paralogues from these have identical topologies (C and H), whereas the other species than they are to each other (Fig. 2, nodes thirdis less resolvedbut congruent with the others. The H andJ). Homoeologues of the tetraploid Geum two paralogues from the tetraploidspecies, G. hetero- heterocarpum are sister to each of these. Therefore, carpum, also appear in different parts of the tree (clades two of the subgenomes of the hexaploids were hypothe- B andG). Each one is sister to a cladeof copies from sisedto have been inheritedfrom the ancestral lineage of ARTICLE IN PRESS J.E.E. Smedmark et al. / Organisms, Diversity & Evolution 5 (2005) 275–283 279 s Sieversia pentapetala Sieversia Fallugia paradoxa Fallugia Oncostylus leiospermu Coluria geoides Geum heterocarpum Waldsteinia geoides Waldsteinia Geum rivale ? Geum montanum Geum reptans Geum urbanum Novosieversia glacialis Novosieversia Erythrocoma triflora Geum vernum

2x 4x10x 4x2x 2x 4x 6x 6x 6x 6x 6x 6x 2x

F D

C, H, J

B, G

E

A

Geinae

Colurieae

Fig. 3. Hypothesis of reticulate historical relationships within Colurieae. Thick grey lines represent organismal lineages, thin black lines the nuclear low-copy gene GBSSI-1. The phylogeny includes two instances where new lineages have arisen as a result of allopolyploidy. Capital letters refer to clades in Fig. 2.

G. heterocarpum (Fig. 3). The only group within Geinae likelihoodtree ( Fig. 2, nodes G, J, and K) which were known to include diploids, here represented by Wald- not supportedin the previous analysis ( Smedmark et al. steinia geoides and Coluria geoides,(Fig. 2, node F) is 2003), andone nodepresent in that analysis is missing the sister group of one of these polyploid clades (node here. All of these changes strengthen the hypothesis G). Hence its ancestral lineage was hypothesisedto have about allopolyploidspeciation. been one of the parents in an initial hybridisation that Basedon the evidencepresentedhere, it is not possible gave rise to the tetraploidlineage, representedby G. to identify any extant members of the other of the two heterocarpum (Fig. 3). The other two loci of the diploid lineages involved in the origin of the allopoly- hexaploids, clades C and J, were hypothesised to have ploids. The hypothesis suggesting that this ancestor originated in an unidentified diploid ancestral lineage belongedto the Oncostylus lineage (Smedmark et al. (Fig. 3). Three nodes are present in the maximum 2003), which is the sister group of the remainder of ARTICLE IN PRESS 280 J.E.E. Smedmark et al. / Organisms, Diversity & Evolution 5 (2005) 275–283

Geinae (Smedmark and Eriksson 2002), can, however, polyploid Acomastylis has been shown to be polyphy- not be fully dismissed. In the previous GBSSI study letic (Smedmark and Eriksson 2002), the diploid Coluria (Smedmark et al. 2003), one clade (corresponding to is likely to be monophyletic basedon morphological Fig. 2, node E) was congruent with the tree based on the evidence. It remains possible that some species classified trnL/trnF region of the chloroplast (Smedmark and in one of these groups actually belong to the inferred Eriksson 2002), andthus was assumedto reflect the diploid ancestral lineage. phylogeny of homoeologues of maternal descent. An- All the analyses in this paper give strong support other clade (Fig. 2, node I) comprised two other GBSSI- (Fig. 2; BP 98, PP 1.00) to a topology that differs from 1 loci. These were suggestedto be derivedfromthe previously reportedresults in the respect that Sieversia, pollen parent lineage involvedin the allopolyploidisa- rather than Fallugia, is the sister of the remainder of tions. If included in the analysis, a paralogue from the Colurieae. The trnL/trnF andITS regions both render paternal lineage wouldbe the sister of either cladeI or C moderate or low support for the monotypic Fallugia (Figs. 2 and3 ), depending on whether the species being the sister of Colurieae, whereas Sieversia, consist- diverged before or after the first allopolyploidisation ing of the two species S. pentapetala and S. pusilla,is event. Instead, the Oncostylus paralogues in the present resolvedas the secondbranch of extant species in study are found in clade D, which is the sister-group of Colurieae (Smedmark and Eriksson 2002). The second clade E (Figs. 2 and3 ). Only three distinct paralogues major paralogue of GBSSI foundin Rosaceae ( Evans et were foundof this species, although the species is al. 2000), GBSSI-2, also supports this latter topology decaploid and thus might be expected to have five (Smedmark, unpublished data). The different topology GBSSI-1 loci. The fact that they form a group indicates presentedin this paper is also not retrievedwhen introns that the species, at least in part, is of autopolyploid are prunedfrom the dataset (not shown). The two origin. In the same clade the only identified paralogue of Sieversia species are diploid, whereas F. paradoxa is the tetraploid N. glacialis is also found. As a conse- tetraploid, but only one paralogue each of GBSSI-1 and quence of the fact that all possible paralogues of O. GBSSI-2 has been foundandsequencedfor the latter. leiospermus or N. glacialis were not found, the hypoth- Seven GBSSI-1 clones from F. paradoxa were screened, esis can neither be falsifiednor verified.There is a in an attempt to findthe inferredsecondparalogue. This possibility that there are homoeologous copies of the corresponds to a 1.6% probability of not sequencing gene from these species belonging to the other ancestral both paralogues by chance. However, all seven clones lineage of the allopolyploids. If instead all five putative were identical. The divergent result of GBSSI-1 pre- paralogues of the decaploid O. leiospermus, or the two sentedhere is difficult to explain basedon available of the tetraploid N. glacialis, hadbeen foundto belong evidence. Visual comparison of the GBSSI-1 paralogue to clade D, the ancestral lineage of these species would with the GBSSI-2 paralogue from S. pentapetala shows have been shown not to have been involvedin the origin that, throughout the sequence, each one is more similar of the allopolyploids. Considerable effort was put into to the corresponding paralogue of F. paradoxa than they finding additional paralogues. For N. glacialis, 22 clones are to each other. This indicates that recombination from six separate PCRs were screened. If both between the two S. pentapetala paralogues has not taken paralogues amplify andclone equally well, sampling place. Possible scenarios, which cannot be addressed in 22 colonies results in a 4.8 10–7 probability of not detail here, may involve, e.g., hybridisation or lineage sequencing both paralogues by chance. For O. leiosper- sorting. mus, 14 clones from five PCRs were screened, which gives a probability of 1.0 10–7 of not finding all five paralogues, provided that they amplify and clone A method for converting a bifurcating gene tree into equally well. There are many possible explanations to a reticulate tree why not all putative paralogues were found. For example, they may have diverged too much for the Commonly usedalgorithms for phylogenetic infer- primers to match, in which case they wouldnot be ence reconstruct strictly diverging relationships among amplified; they also could have been deleted from the ancestral lineages. Therefore the process of converting a genome, undergone homogenisation, or we may have gene tree, with multiple paralogous loci from polyploids, failedto sample them by chance. into a reticulate taxon tree may appear to be somewhat Although the seconddiploidancestral lineage may be arbitrary. Here we describe, step by step, the method extinct, there remains a possibility that it just has not that we have usedto accomplish this. been sampledfor this study.If the latter is the case, we The first step is to determine whether a reticulate tree suggest that possible candidates may be found among is the best description of species relationships. To do the species that have been classifiedin Coluria or this, the different gene copies from each polyploid taxon Acomastylis. Distinction between these two genera has are located. If copies from the same species form a not always been straightforward( Bolle 1933). While the group, they may either represent allelic variation or ARTICLE IN PRESS J.E.E. Smedmark et al. / Organisms, Diversity & Evolution 5 (2005) 275–283 281

2x1 6x 2x2 6x4x 6x 4x 2x3

A B

a

2x1 6x 2x2 6x 4x 2x3

C D

AB

b

2x 2x 2x1 6x 2 4x 3

C D

A B

c

Fig. 4. Graphical representation of the methodusedfor transforming a gene tree of allopolyploidsinto a reticulate organismal tree. In (a), the nodes A and B have been identified as connection points (lowermost lineage splits of primary polyploid with lower ploidy level). In (b), these are joined by a dashed line, and the clades including the primary polyploid are combined and moved onto the connection. Also, nodes C and D are indicated as the next points of connection as in (a) (lowermost lineage splits of secondary polyploid with lower ploidy level). In (c), nodes C and D have been joined by a dashed line and the hexaploid clade moved onto the connection, completing the reticulate tree. ARTICLE IN PRESS 282 J.E.E. Smedmark et al. / Organisms, Diversity & Evolution 5 (2005) 275–283 duplicate loci. The latter may be due to either gene ing taxa of the same ploidy level. Thereafter, proceed duplication or genome duplication, autopolyploidy. It is with taxa of the next ploidy level (secondary polyploids) sometimes difficult to distinguish alleles from loci, but as before. Findthe point where the lineages of the copies the variation between loci is normally considerably of a secondary polyploid branch off from lower ploidy greater than the variation foundamong alleles. A levels (Fig. 4b, nodes C and D). Connect these with a Southern blot can usually show whether several loci line andmove the combinedpolyploidtaxon onto the are present in the taxon, but further sampling may also connecting branch (Fig. 4b, c). Repeat this process until help to sort out relationships among multiple copies. all multiple copies have been replacedby taxa. For a review on gene level processes, acting on alleles In some cases, as in the present study, representatives andparalogues, that may leadto erroneous conclusions of ancestral lineages may be missing from the analysis. about taxon phylogeny, see Doyle andDavis (1998) .If This makes the reconstruction of the taxon tree more copies from the same species insteadappear in different difficult. However, in such cases the points of reticula- parts of the tree, it is first determined whether they occur tion (i.e. where to attach the connecting hybridisation in congruent clades. If this is the case, and the clades are lines) may be inferredfrom the points of attachment of sister groups, they represent different paralogues that the putative allopolyploidterminal. For example, in the have originatedin a gene or genome duplication. If the gene tree in Fig. 2, node E should be connected with clades, on the contrary, are found to be more closely node I, and the congruent clades G and B (including the relatedto copies from differentspecies, the paralogues primary polyploid) combined. To continue with the have originatedin separate lineages andhave been secondary polyploid, move to the next ploidy level split, brought together in the common ancestor of the group, the now combined node H/J, and connect it to node C. e.g., as a result of hybridisation. The same is true if the The last step is to combine this thirdcongruent cladeof copies are not in congruent clades, but each one is found copies from the secondary (hexaploid) taxa with the to be more closely relatedto taxa of lower ploidylevel other two (nodes H/J/C). This leads to the conclusion than to each other. Both cases indicate that allopoly- that a diploid taxon donated the genome C, and this ploidy has taken place, in which case historical relation- inferred lineage may be added to the species tree for ships among taxa are best representedby a reticulate clarity, as in Fig. 3. phylogeny. We have testedthis methodon fairly complicated To construct a reticulate tree, start by selecting a examples of hypothetical gene trees derived from known species, or group of species, identified as being of reticulate taxon trees, andit has recoveredthe correct potential allopolyploidorigin according to the reasoning taxon tree in all cases. Therefore, the described method above. This taxon should be of the second-lowest ploidy may be of general relevance for solving these kinds of level present in the gene tree. We call this a primary problems. polyploid. To begin tracing the ancestry of this primary polyploidtaxon, start at the root of the tree andfollow the branches upwards to the point where the lineage of a Acknowledgements copy from the primary polyploidfirst branches off from a lineage of lower ploidy level (Fig. 4a, node A). Note The authors wouldlike to thank the Go ¨ teborg that both the branch leading to the primary polyploid Botanic Garden for providing material of Sieversia and its sister group may include copies of higher ploidy pentapetala, Alan R. Batten for material of Novosiever- levels. Thus, it is the split between the least inclusive sia glacialis, andNahidHeidari for help in the lab. clades including a primary polyploid copy, and that including a copy of lower ploidy level, that should be found. Return to the root of the tree and follow the References branches upwards as before, to a second point where another copy of the primary polyploidbranches off Alice, L.A., 1997. Molecular phylogenetic systematics of from a copy of lower ploidy level (Fig. 4a, node B). Rubus (Rosaceae). Ph.D. dissertation, University of Maine, Connect these two points by a line, indicating a Orono, Maine. hybridisation event (Fig. 4a, dashed line). The two Bolle, F., 1933. Eine U¨ bersicht u¨ ber die Gattung Geum L. und branches leading to the primary polyploid are then die ihr nahestehenden Gattungen. Feddes Repert. Beih. 72, combinedinto a single one, andmovedonto the 1–119. Cronn, R., Wendel, J.F., 2004. Cryptic trysts, genomic connecting hybridisation line. Should the copies of a mergers, andplant speciation. New Phytol. 161, 133–142. primary polyploidtaxon be foundin congruent groups Doyle, J.J., Davis, J.I., 1998. Homology in molecular that include copies of higher ploidy level, the entire phylogenetics: a parsimony perspective. In: Soltis, D.E., groups are mergedandmovedonto the connecting line Soltis, P.S., Doyle, J.J. (Eds.), Molecular Systematics of (Fig. 4b, node D). When all copies of the first primary Plants II. Kluwer Academic Publishers, Massachusetts, polyploidtaxon are treated,continue with any remain- pp. 101–131. ARTICLE IN PRESS J.E.E. Smedmark et al. / Organisms, Diversity & Evolution 5 (2005) 275–283 283

Doyle, J.J., Doyle, J.L., Rauscher, J.T., Brown, A.H.D., 2004. Morgan, D.R., Soltis, D.E., Robertson, K.R., 1994. Systema- Diploidandpolyploidreticulate evolution throughout the tic andevolutionary implications of rbcL sequence varia- history of the perennial soybeans (Glycine subgenus tion in Rosaceae. Am. J. Bot. 81, 890–903. Glycine). New Phytol. 161, 121–132. Nylander, J.A.A., 2003. BootPHYML. Computer program Evans, R.C., Alice, L.A., Campbell, C.S., Kellogg, E.A., distributed by the author. Evolutionary Biology Centre, Dickinson, T.A., 2000. The granule-boundstarch synthase Uppsala University, Uppsala, Sweden. (GBSSI) gene in the Rosaceae: multiple loci andphyloge- Posada, D., Crandall, K.A., 1998. MODELTEST: testing the netic utility. Mol. Phylogenet. Evol. 17, 388–400. model of DNA substitution. Bioinformatics 14, 817–818. Felsenstein, J., 1985. Confidence limits on phylogenies: an Rambaut, A., 1996. Se-Al: Sequence Alignment Editor, approach using the bootstrap. Evolution 39, 783–791. version 2.0a11. Computer program, available at http:// Felsenstein, J., 1993. PHYLIP, version 3.5. Computer evolve.zoo.ox.ac.uk/4 program available at http://evolution.gs.washington.edu/ Sang, T., Zhang, D., 1999. Reconstructing hybridspeciation phylip.html4 using sequences of low copy nuclear genes: hybridorigins Gajewski, W., 1957. A cytogenetic study on the Geum. of five Paeonia species basedon Adh gene phylogenies. Syst. Monogr. Bot. 4, 3–414. Bot. 24, 148–163. Gajewski, W., 1958. Evolution in the genus Geum. Evolution Smedmark, J.E.E., Eriksson, T., 2002. Phylogenetic rela- 13, 378–388. tionships of Geum (Rosaceae) andrelatives inferred Grant, V., 1981. Plant Speciation. Columbia University Press, from the nrITS and trnL–trnF regions. Syst. Bot. 27, New York, pp. 193–347. 303–317. Guindon, S., Gascuel, O., 2003. A simple, fast, and accurate Smedmark, J.E.E., Eriksson, T., Evans, R.C., Campbell, C.S., algorithm to estimate large phylogenies by maximum 2003. Ancient allopolyploidspeciation in Geinae (Rosa- likelihood. Syst. Biol. 52, 696–704. ceae): evidence from nuclear granule-bound starch synthase Hasegawa, M., Kishino, H., Yano, T., 1985. Dating the (GBSSI) gene sequences. Syst. Biol. 52, 374–385. human–ape split by a molecular clock of mitochondrial Staden, R., 1996. The Staden sequence analysis package. Mol. DNA. J. Mol. Evol. 22, 160–174. Biotech. 5, 233–241. Huelsenbeck, J.P., Ronquist, F.R., 2001. MRBAYES: Baye- Stebbins, G.L., 1971. Chromosomal Evolution in Higher sian inference of phylogeny. BioInformatics 17, 754–755. Plants. Edward Arnold, London. Larget, B., Simon, D., 1999. Markov chain Monte Carlo Thompson, J.D., Gibson, T.J., Plewniak, F., Jeanmougin, F., algorithms for the Bayesian analysis of phylogenetic trees. Higgins, D.G., 1997. The ClustalX windows interface: Mol. Biol. Evol. 16, 750–759. flexible strategies for multiple sequence alignment aided by Mason-Gamer, R.J., 2004. Reticulate evolution, introgression, quality analysis tools. Nucleic Acids Res. 25, 4876–4882. andintertribal gene capture in an allohexaploidgrass. Syst. Yang, Z, 1996. Among-site variation andits impact on Biol. 53, 25–37. phylogenetic analyses. Trends Ecol. Evol. 11, 367–371.