<<

SAJB-00968; No of Pages 12 South African Journal of xxx (2013) xxx–xxx

Contents lists available at SciVerse ScienceDirect

South African Journal of Botany

journal homepage: www.elsevier.com/locate/sajb

Phylogenetic utility of 19 low copy nuclear genes in closely related genera and species of caesalpinioid legumes

Marielle Babineau ⁎,EdelineGagnon,AnneBruneau

Institut de Recherche en Biologie Végétale and Département de Sciences biologiques, Université de Montréal, 4101 rue Sherbrooke est, Montréal, Québec, Canada H1X 2B2 article info abstract

Available online xxxx Numerous studies have identified low copy nuclear genes (LCNG) with phylogenetic potential throughout angiosperms, several specifically focused on Leguminosae. However, phylogenetic resolu- Edited by B-E Van Wyk tion at the species- to subspecies-level is often inferred based only on a small subset of taxa scattered throughout the higher level study group. This study aims to reassess the phylogenetic resolution of 19 Keywords: previously published nuclear regions in Leguminosae using 18 species from two clades of the Caesalpineae clade representing both distantly related genera and closely related species. Nuclear regions s.l fi Delonix were ampli ed and aligned throughout the sampled taxa. Sequences were cloned when polymor- – Duplication phism was noted. The plastid loci matK, rps16, trnL, trnD trnT and the nuclear ribosomal ITS regions Low copy nuclear genes were also analyzed for comparison. Phylogenetic analyses using parsimony were performed on indi- Phylogeny vidual matrices. Three nuclear regions were eliminated due to the non-specificity of the primers (RNAH, PTSB, GI). Four regions showed lower resolution than predicted from previous studies (MMK1, CYB6, RBPCO, EFGC), while three revealed greater resolution than anticipated (SQD1, AT103, EIF3E). Three other markers indicated previously unidentified duplication events for the Caesalpinia s.l. (ATCP and AROB) and at the base of the Caesalpinia and Peltophorum clades (CALTL). Phylogenies reconstructed from the intron-spanning regions AIGP, SHMT, AT103 and EIF3E are congru- ent with ITS and plastid data and show the best phylogenetic potential for studies of closely related species of caesalpinioid legumes. We present a screening strategy for the evaluation of LCNG for phy- logenetic studies. © 2013 SAAB. Published by Elsevier B.V. All rights reserved.

1. Introduction 2000; Soltis et al., 2009). Low copy nuclear genes (LCNG) in are being developed rapidly (e.g., Armisen et al., 2008; Chapman One of the important criteria in choosing loci for molecular et al., 2007; Choi et al., 2006; Doyle et al., 1999, 2002; Duarte phylogenetic studiesishavingsufficient levels of variation to distin- et al., 2010; Ilut and Doyle, 2012; Li et al., 2008; Steele et al., guishandgrouptaxa,butnottoomuchsoastocauseproblemsof 2008). However, most of these studies examine relationships homology. Because of the inadequacy of the chloroplast genome to based upon a few scattered speciesofthesamefamily,orinafew accurately resolve relationships among closely related genera studies, a diversity of species from distantly related angiosperm and species in some groups, the last decade has been focused on families: rarely have these LCNG been tested for their phylogenetic generating, evaluating and testing low copy nuclear loci for species value outside of the group for which they were developed. There- identification and evolutionary studies. The use of nuclear loci in fore the assessment of genus to subspecies level resolution is indi- phylogenetic studies is especially crucialinlightofthelargenum- rect and often speculative because of the lack of sampling of sister ber of species thought to be directly implicated in, or evolved genera and species. In addition, these indirect estimations can be in- from, multiparental hybridization, ancient and recent polyploidy, accurate due to specific properties of the nuclear genome such as and lateral gene transfer (Ishikawa et al., 2009; Rieseberg et al., hybridization, genome or gene duplication, levels of sequence vari- ation of exons and introns, and incomplete lineage sorting. These phenomena can lead to an under or overestimation of the level of resolution. Hence, preliminary studies of nuclear genes are essential to evaluate the level of sequence variation of a specificlocusfor ⁎ Corresponding author. Tel.: +1 514 343 6111x82356. phylogenetic reconstruction and to estimate copy number and pos- E-mail addresses: [email protected] (M. Babineau), [email protected] (E. Gagnon), [email protected] sible dynamics of gene duplication (Hughes et al., 2006; Soltis and (A. Bruneau). Soltis, 1998).

0254-6299/$ – see front matter © 2013 SAAB. Published by Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.sajb.2013.06.018

Please cite this article as: Babineau, M., et al., Phylogenetic utility of 19 low copy nuclear genes in closely related genera and species of caesalpinioid legumes, South African Journal of Botany (2013), http://dx.doi.org/10.1016/j.sajb.2013.06.018 2 M. Babineau et al. / South African Journal of Botany xxx (2013) xxx–xxx

Here we assess the utility of 19 published LCNG for phylogenetic 2.2. Screening strategy for LCNG markers analysis, by comparing sequences across 18 species of caesalpinioid Leguminosae whose phylogenetic relationships are poorly resolved An initial round of amplification was carried out on all taxa to using chloroplast loci. Our test group includes two distantly related screen for markers that produced a clear single band from the PCR clades in the Caesalpineae clade, one for testing generic level varia- reaction. These were sequenced directly. If the PCR was successful tion, the other for examining species level variation. Caesalpinia s.l. and free of polymorphisms, it was assumed that a single copy gene consists of seven, possibly nine, distinct genera (Gagnon et al., in was amplified, with a single allele. If a single double-peaked nucleo- press; Lewis, 2005). The second group comprises a set of closely re- tide was detected on the sequencing chromatogram, the presence lated species of the genus Delonix, and three related genera in the of two allelic sequences was deduced. If multiple double-peaked Peltophorum clade. The complementary structure of these two nucleotides were detected on direct sequencing chromatograms, groups establishes a three-tier sampling strategy that allows us to in- this was interpreted as the presence either of two different alleles vestigate the level of variation at the species, the generic, and the or of more than one gene copy. To distinguish these two possibilities, clade level for several low copy nuclear genes and to determine the samples were cloned, and three to six transformed colonies were their phylogenetic utility. sequenced.

2. Materials and methods 2.3. Molecular methods

2.1. Selection of taxa and markers Extraction of DNA was carried out using a modified CTAB pro- tocol, as described by Joly et al. (2006), or the QIAGEN DNeasy The first group sampled consists of ten species belonging to Plant Mini Kit (Mississauga, ON), following the manufacturer's Caesalpinia s.l. that have been proposed to belong to different generic instructions. lineages (but several species have not been published under the All PCR amplifications for nuclear and plastid loci were done in new generic combination): Caesalpinia pulcherrima (genus Caesalpinia reaction volumes of 25 μl, with 1× Taq DNA polymerase buffer without sensu stricto), bonduc, Coulteria pumila (genus Coulteria), MgCl2 (Roche Diagnostics, Indianapolis, IN), 3.0 mM of MgCl2,200μM Mezoneuron scortechinii, Libidibia paraguariensis, Caesalpinia mimosifolia of each dNTP (Fermentas, Burlington, ON), 0.4 μM of each primer, 3 μg (genus Erythrostemon)andPoincianella palmeri, Tara spinosa,aswellas of bovine serum albumin (New England Biolabs, Ipswich, MA), 0.03% Caesalpinia decapetala and Caesalpinia madagascariensis, whose generic of tween-20, 3% of pure DMSO, one unit of Taq polymerase, and 50– affiliations are uncertain. The second group consists of five closely relat- 300 ng of genomic DNA. ed species of Delonix (Delonix pumila, Delonix floribunda, Delonix Plastid and nuclear ribosomal DNA regions were amplified boiviniana, Delonix velutina and ), and racemosa, according to previously published PCR cycling protocols: the trnL which studies using chloroplast sequences have found to occur as nested (UAA) intron was obtained following Bruneau et al. (2001) with within the Delonix clade (Haston et al., 2005; Hawkins et al., 2007; primers trnL-C and trnL-D (Taberlet et al., 1991). The matK gene Simpson et al., 2003). Included within this second test group are the and 3′-trnK intron were amplified as described in Bruneau et al. closely related Conzattia multiflora and the more distantly related (2008), with primers trnK685F (Hu et al., 2000), trnK4La Parkinsonia aculeata, see Supplementary Data for voucher information. (Wojciechowski et al., 2004), trnK2R*(Wojciechowski et al., 2004) A total of 19 nuclear regions were chosen to be tested across and KC6 (Bruneau et al., 2008). The trnD–trnT region was amplified our set of taxa. Thirteen of these (RNAH, PTSB, MMK1, AIGP, as described in Shaw et al. (2005), with primers trnD, trnT, trnE tRALs, CTP, CALTL, ATCP, SHMT, EFGC, UDPGD, CYB6, RBPCO)are and trnY, while the rps16 region was obtained using primers from Choi et al. (2006) and were specifically designed for the rps16F and rps2R from Oxelman et al. (1997). Primers AB101 and family Leguminosae. Another six markers (AGT1, GI, AROB, EIF3E, AB102 from Douzery et al. (1999) were used to amplify the ITS AT103, SQD1) are COSII genes that were selected from Li et al. region. (2008) and which were developed to amplify across the angio- For the LCNG markers designed by Li et al. (2008), we followed sperms. Both intron-spanning regions and exon-only loci were PCR cycling conditions similar to those used by the authors, but selected. with a longer extension step (1 min 30 s). For markers designed by Sequences for these nuclear regions also were retrieved from Choi et al. (2006), we used an initial denaturing step at 94 °C for GenBank in order to evaluate the degree of resolution. 3 min, followed by 35 cycles of three steps at 94 °C for 30 s, 53 °C for max, Glycine tomentella, Phaseolus vulgaris, Pisum sativum, Medicago 30 s, and 72 °C for 2 min, concluding with a final extension step at truncatula, Vigna radiata, tenuis, var. japonicus, 72 °C for 5 min. Wisteria sp. and Wisteria sinensis were selected from the Choi et al. All PCR amplification products were subsequently purified (2006) study. Sequences for P. aculeata and D. regia from Choi et al. using a 20% PEG protocol (Joly et al., 2006)andsequencedin (2006) also were used in a subset of nuclear loci to verify that the both directions using Big Dye Terminator 3.1 chemistry (Applied correct sequences were indeed obtained in our own study. Sequences Biosystems, Carlsbad, CA), as described in Bruneau et al. (2008). of Prunus avium and of Ranunculus bulbosus were selected from the After an ethanol–sodium acetate precipitation followed with study of Li et al. (2008). G. max or G. tomentella were used as outgroup two 70% ethanol washes, sequencing products were run on an taxa to root the for the nuclear loci developed by Choi et al. ABI 3100-Avant automated sequencer (Applied Biosystems). The (2006), P. avium or M. truncatula were used to root the gene trees chromatograms were assembled and visually inspected using for markers from Li et al. (2008),andP. aculeata was used to root Sequencher (versions 3.1–4.6, Gene Codes Corporation, Ann Arbor, trees for the comparative matrices. MI). To assess the phylogenetic signal of these loci, results from After direct sequencing, sequences that needed cloning were theLCNGanalyseswerecompared with results from a dataset of amplified in triplicate reactions to help reduce PCR recombinants plastid markers (trnL intron, matK gene and trnK-3′ intron, rps16, and Taq induced errors (Cronn et al., 2002; Joly et al., 2006; Judo trnD–trnT intergenic spacer) and a second dataset consisting of et al., 1998). Cloning was performed using the CloneJET PCR cloning nuclear ribosomal internal transcribed spacer sequences (ITS). The kit (Fermentas), following manufacturer instructions, but with complete dataset was assembled using 121 published sequences lower concentrations of reagents for the sticky-end protocol and li- available from GenBank and 314 sequences generated in this study gation reactions. All sequenced transformed colonies were visually (Table A in Appendix). inspected upon alignment; sequences that were suspected to be

Please cite this article as: Babineau, M., et al., Phylogenetic utility of 19 low copy nuclear genes in closely related genera and species of caesalpinioid legumes, South African Journal of Botany (2013), http://dx.doi.org/10.1016/j.sajb.2013.06.018 M. Babineau et al. / South African Journal of Botany xxx (2013) xxx–xxx 3 chimeras were submitted to a BLAST search (www.ncbi.nlm.nih.gov/ (C. multiflora, C. racemosa, D. regia and D. velutina). Although re- blast) and eliminated if they did not correspond to sequences pub- duced in its taxonomic sampling, the AGT1 region was kept for lished in GenBank. further analysis because it nonetheless could be of interest for relationships among closely related species. The 16 successfully 2.4. Phylogenetic analyses amplified regions were either exon-spanning regions that were easy to align with the outgroup taxa or intron-spanning regions Sequences were aligned using the default settings in Muscle (Edgar, for which the alignments with the outgroup taxa were more ambig- 2004), and visually inspected and manually corrected using BioEdit uous. There were no major length differences between the two test 7.3.1 (Hall, 1999). We noted the variation in sequence length for groups (Table 1). each marker, as well as the final length of the aligned sequences in the Nine markers (ATCP, EFGC, EIF3, AROB, AT103, CALTL, RBPCO, matrix. When present, we verified the number of introns for each SQD1, tRALs) showed polymorphism and were cloned for some locus, their positions, and compared their lengths in the Caesalpinia or all of the samples. In the Delonix test group, five markers and Delonix groups. (ATCP, EIF3E, AROB, CALTL, RBPCO) were cloned for all the species, Because our end goal was to quickly assess the utility of these whereas in the Caesalpinia s.l. test group, two markers were cloned markers, we carried out relatively fast and basic parsimony analy- for all species (ATCP, RPBCO) and the remaining seven were cloned ses. The program Seqstate (Müller, 2006) was used to create for particular species showing polymorphism (EFGC for T. spinosa, Nexus files; for simplicity, we did not code any indels. The parsimo- C. madagascariensis, C. pumila; EIF3E for Poincianella palmeri; ny analyses were performed in PAUP* (Swofford, 2002)usingatwo- AROB for L. paraguariensis, P. palmeri, G. bonduc, C. mimosifolia, C. step analysis (Davis et al., 2004) to improve the search for optimal madagascarienis, M. scortechinii; AT103 for C. madagascariensis, C. trees. The first heuristic search consisted of 1000 replicates of mimosifolia, C. decapetala; CALTL for L. paraguariensis, C. pulcherrima, C. random addition sequence, bisection–reconnection branch- madagascariensis, M. scortechinii, C. mimosifolia, P. palmeri, C. pumila; swapping, retaining five most parsimonious trees at each replicate. SQD1 for C. mimosifolia, P. palmeri, L. paraguariensis; tRALs for P. palmeri, This was followed by a second heuristic search with the same G. bonduc, M. scortechinii). settings from all trees in memory, but with a maximum of 5000 For the ten intron-spanning markers, the same number of in- trees kept in memory. A total of 5000 bootstrap replicates were trons as reported by Choi et al. (2006) and Li et al. (2008) was performed to assess branch support, with also a maximum of 5000 found in our two test groups (Table 1). Intron lengths are the trees kept in memory per replicate. A strict consensus tree for same for MMK1and tRALs intron 1 in the two test groups. In the each marker was produced. Delonix test group, introns are shorter than in the Caesalpinia s.l. We assessed the resolution of each nuclear gene based on the tree group for tRALs intron 2, SHMT intron 1, AT103 and AROB,whereas topology characteristics obtained from each matrix. Markers which they are longer for AIGP and SHMT intron2.TheDelonix group yielded a topology with one large polytomy for each of the two test showed high variability in length for the first intron of EIF3E and groups were identified as having a family to tribe level of resolution. Caesalpinia s.l. group species showed high variability in intron Markers that resolved relationships among Caesalpinia s.l. species, but length for CALTL, ATCP and EIF3E intron2 (Table 1). The intron for presented a polytomy within the Delonix test group were considered ATCP could not be aligned and therefore was removed from the sub- as resolving relationships at the generic level. Markers that revealed sequent analyses. bifurcating dichotomous branches among species in both test groups and that showed the two groups as separate clades were identified 3.2. Phylogenetic analyses as useful for resolving relationships among closely related species. By assessing the monophyly or non-monophyly of the cloned copies from For the phylogenetic analyses of the successfully amplified 16 the same sample in a tree and by evaluating the topologies obtained, LCNG, plastid and nuclear ribosomal datasets, the percentage of we deduced whether more than one gene copy was present for a partic- resolved clades did not seem to be correlated with the percentage ular marker. of parsimony informative characters (Table 1). The phylogenies We concatenated the matrices for the loci that showed species-level based on ITS and the concatenation of the plastid loci are fully resolution and performed a phylogenetic analysis as described above. resolved and show generally congruent relationships (Fig. 1). A strict consensus tree was generated in order to assess whether The LCNG presented here show much higher percentage of parsi- the use of concatenated multiple low copy nuclear regions is effective mony informative characters than the plastid regions and are in recovering the species tree (Sang, 2002). We were confident that more similar to the percentage found in ITS (Fig. 2). Of the 16 suc- we obtained the species tree if the consensus tree showed a well- cessful LCNG tested in our study, the regions ATCP, CALTL,and resolved topology that was more congruent with plastid and ITS data SHMT showed the highest percentage of parsimony informative than each of the individual trees obtained from the independent loci. characters (Fig. 2). In general, exon-spanning markers had higher The individual matrices had to be altered in order to be concatenated levels of homoplasy (mean CI 0.56, RI 0.80) than intron-spanning and to run the analysis: we omitted the outgroup taxa and when a markers (mean CI 0.67, RI 0.86) and yielded a lower proportion of taxon lacked the sequence for a particular locus we retained the taxon resolved clades (mean 68% vs. 83% for intron-spanning loci). The with missing values in the Nexus file. clones for EFGC, EIF3E, AT103, SQD1 and tRALs formed monophyletic groups in both test groups. However, for the highly polymorphic 3. Results CALTL, AROB, ATCP and RBPCO loci, the clones sequenced did not re- solve into monophyletic groups (Fig. 4, Appendix Fig. A), indicating 3.1. Amplification, cloning and alignment thepossiblepresenceofmultipledivergentallelesorofmultiple gene copies. Amplifications failed for RNAH and PTSB in both test groups. The resulting topologies from the phylogenetic analyses al- Amplifications were difficult for GI and AGT1.TheGI region resulted lowed us to divide the LCNG into four categories based on their in an amplification product for the Delonix test group after doubling level of resolution (Table 1, Fig. 2). The markers CYB6, RBPCO, the number of cycles in the PCR protocol, but it was eliminated EFGC,andSQD1 showed a family to tribe level resolution, where- because the sequences obtained matched mitochondrial sequences as UDPGD, CTP, MMK1, AGT1 and tRALs resolved relationships as revealed by a BLASTn similarity search in the GenBank database. among distantly related genera (Appendices Figs. A and B respec- For AGT1, only four species amplified in the Delonix test group tively). Four regions, AIGP, SHMT, AT103 and EIF3E,showedgood

Please cite this article as: Babineau, M., et al., Phylogenetic utility of 19 low copy nuclear genes in closely related genera and species of caesalpinioid legumes, South African Journal of Botany (2013), http://dx.doi.org/10.1016/j.sajb.2013.06.018 4 M. Babineau et al. / South African Journal of Botany xxx (2013) xxx–xxx

Table 1 Sequence characteristics and statistics for parsimony analyses of each of the matrices analyzed in the phylogenetic evaluation of the 16 successfully amplified low copy nuclear loci. Length, aligned length and number of trees retained are in base pairs calculated excluding the outgroup taxa. The percentage of unresolved clades is calculated on strict consensus tree excluding the root node. Symbols (+ and −) in length and intron length represent differences compared to corresponding lengths from Choi et al. (2006) and Li et al. (2008):+or− = ≤100 bp, ++ or −− =101–200 bp, +++ or −−− =201–300 bp, ++++ or −−−− = ≥900 bp.

Marker Length (bp) Aligned No. and length No. of CI RI % resolved No. of copies Resolution from Resolution length of introns trees clades found this study according retained in this study to Choi or Li

Exon UDPGD 191–205 206 5 0.6 0.76 87 1 Genus–distant spp Fam–genus CYB6 194–242 242 8 0.73 0.82 50 1 Family–subfamily Fam–genus RBPCO 222–320 321 5000 0.34 0.87 62 1 Family–subfamily Fam–genus CTP 261–314 314 153 0.53 0.71 81 1 Tribe–distant genus Fam–genus EFGC 134–213 213 1120 0.61 0.84 54 1 Family–subfamily Fam–genus SQD1 185–275 276 5000 0.55 0.8 72 1 Tribe Fam Mean 0.56 0.8 68 Intron MMK1 244–296 (−)4911)71–76 (−) 15 0.66 0.77 89 1 Genus–distant spp Fam–subsp spanning AIGP 317–420 481 1) 82–103 1 0.74 0.85 100 1 Species Fam–sp tRALs 276–356 371 2) 60–102, 62–76 24 0.67 0.88 86 1 Genus–distant spp Fam–sp CALTL 521–599 738 3) 73–91, 85–115, 5000 0.56 0.91 78 2 Genus–species (dup) Fam–subsp 100–144 ATCP 341–476 550 1) 90–225 5000 0.47 0.83 72 2 Species (dup) Trib–subsp SHMT 399–616 825 2) 98–159, 4 0.77 0.82 94 1 Species Trib–subsp 99–302 (−−) AT103 343–749 (+++) 1048 1) 151–587 534 0.74 0.9 76 1 Species Fam (−/+++) AGT1 717–835 (−−−) 850 1) 526–528 1 0.89 0.88 100 1 Genus Genus AROB 416–535 (−−−−) 560 1) 282–365 5000 0.63 0.96 74 Possibly 3 Distant spp (dup) Genus EIF3E 688–794 902 2) 78–113, 107–211 428 0.65 0.82 76 1 Species Fam Mean 0.67 0.86 83 Comparative matk 1552–1590 1600 1 9 0.68 0.86 84 1 Genus–distant spp matrices trnL 297–512 524 0 6 0.9 0.98 67 1 Tribe–genus trnD–trnT 1122–1351 1454 3 51 0.71 0.83 91 1 Species rps16 789–808 818 1 380 0.66 0.82 25 1 Family–subfamily Combined cp 3961–4221 4402 5 1 0.67 0.84 100 1 Species ITS 524–663 743 2 2 0.57 0.66 93 1 Species Mean 0.70 0.83 77

resolution among closely related species in both the Delonix and treated as a distinct category (Fig. 4). CALTL shows one duplication Caesalpinia s.l. test groups (Fig. 3). The other three markers, CALTL, event, with duplicated copies present in both test groups, suggesting AROB and ATCP, indicated the presence of duplicated copies and are that the duplication occurred in the most recent common ancestor

93 99 70 C. pumila 56 96 C.madagascariensis 60 C. mimosifolia 99 60 P. palmeri 91 50 60 68 L. paraguariensis 62

M. scortechinii 40 56 69 67 100 C. decapetala 90 100 G. bonduc 30 C. pulcherrima

D. boiviniana 20 99 88 100 D. regia 69 D. floribunda 10

95 70 97 % Parsimony informative characters 91 D. pumila

100 D. velutina 58 0 81 ITS

Colvillea trnL CTP matK AIGP rps16 CYB6 AGT1 ATCP SQD1 tRALs EIF3E EFGC SHMT AROB MMK1 AT103 Conzattia CALTL RBPCO UDPGD trnD-trnT combined cp

Fig. 1. Strict consensus trees from the phylogenetic analysis of concatenated plastid Fam-tribe Distant genera Species Duplicated Comparative loci regions rps16, trnL, matK and trnD–trnT, as well as ribosomal ITS, serving as comparative matrices, for the Delonix and Caesalpinia s.l. test groups in the reassessment of low copy Fig. 2. Histogram of percentage of parsimony informative characters for the 16 successfully nuclear genes. Bootstrap values over 50% from parsimony analyses are indicated above amplified low copy nuclear markers and for the comparative matrices trnL, rps16, matK, the branches. trnD–trnT and ITS.

Please cite this article as: Babineau, M., et al., Phylogenetic utility of 19 low copy nuclear genes in closely related genera and species of caesalpinioid legumes, South African Journal of Botany (2013), http://dx.doi.org/10.1016/j.sajb.2013.06.018 M. Babineau et al. / South African Journal of Botany xxx (2013) xxx–xxx 5

Glycine max Phaseolus vulgaris 100 Phaseolus vulgaris Vigna radiata 98 Vigna radiata Lotus corniculatus Medicago truncatula 100 Medicago truncatula 100 96 Pisum sativum Pisum sativum P. palmeri Parkinsonia aculeata D. boiviniana 94 L. paraguariensis 100 100 D. pumila M. scortechinii 100 80 97 C. decapetala D. velutina 63 D. floribunda 90 C. madagascariensis 100 78 D. regia 100 88 T. spinosa 88 64 C. pulcherrima L. paraguariensis Parkinsonia aculeata 97 P. palmeri 95 Conzattia multiflora C. mimosifolia D. regia 95 87 61 M. scortechinii Colvillea racemosa 56 85 C. decapetala 92 D. boiviniana 100 G. bonduc AIGP 71 D. velutina SHMT C. madagascariensis 70 D. floribunda C. pumila 73 D. pumila T. spinosa Ranunculus bulbosus Wisteria sinensis Ranunculus bulbosus Prunus avium 97 L. paraguariensis Prunus avium Parkinsonia aculeata 94 P. palmeri 96 100 Conzattia multiflora C. mimosifolia 95 100 Colvillea racemosa C. pulcherrima 94 D. pumila 70 C. decapetala 68 D. floribunda G. bonduc 97 D. velutina 100 66 T. spinosa D. boiviniana 92 D. regia Parkinsonia aculeata(3) C. madagascariensis(6) 99 Conzattia multiflora(4) L. paraguariensis 97 94 Colvillea racemosa(2) 88 C. palmeri 98 C. mimosifolia(4) 98 D. regia(3) 91 T. spinosa 61 D. boiviniana(4) 76 C. pulcherrima D. velutina(3) 87 G. bonduc EIF3E 82 AT103 D. floribunda(3) 66 M. scortechinii 88 79 D. pumila(5) C. decapetala(6)

Fig. 3. Strict consensus of phylogenetic analyses for low copy nuclear intron-spanning regions AIGP, EIF3E, SHMT and AT103, which show resolution among closely related species in two groups of the Caesalpineae clade. Bootstrap values over 50% from parsimony analysis are indicated above the branches.

of the Caesalpinia s.l. and Peltophorum clades. The ATCP phylogeny sug- 4. Discussion gests a duplication in the common ancestor of the Caesalpinia s.l. clade and another independent duplication in the common ances- 4.1. Phylogenetic utility of the LCNG tor of the Delonix clade (Fig. 4). The AROB phylogeny suggests a pat- tern with one or possibly two duplication events in some species of Our study reveals differences in the phylogenetic utility of the the Caesalpinia s.l. clade. One duplication seems to have taken place nuclear regions previously tested by Choi et al. (2006), Li et al. at the base of the Caesalpinia s.l. clade where clones of representa- (2008) and with other studies which have analyzed the same nu- tives of the ten species of Caesalpinia s.l. sampled divide into two clear loci (e.g., Barkley et al., 2008; Duminil et al., 2010; Friesen distinct clades. Another duplication is hypothesized to have taken et al., 2010; Keeling, 2009; Krak et al., 2012; Richards et al., place for two species of the Caesalpinia s.l. clade, which occur in 2006; Scherson et al., 2005; Tripp and Fatimah, 2012). Of the the Delonix clade: L. paraguariensis and P. palmeri appear to have 16 remaining LCNG tested, nine were found to resolve relation- three copies of AROB. The pattern seen for these latter species ships at the taxonomic level specified in previous studies, but might also be explained by allelic variation or incomplete lineage only at the highest taxonomic level predicted in those studies. sorting. For example, CYB6 was predicted by Choi et al. (2006) to resolve The strict consensus topology obtained from the concatenated relationships from family to species level, but our results suggest matrix of all nuclear loci that showed potential for species level the degree of taxonomic resolution to be at the family to subfam- resolution (EIF3E, AIGP, AT103,andSHMT)iscompletelyresolved ily level at best. In our analysis, the loci MMK1, CYB6, RBPCO and (Appendix Fig. C). In this concatenated analysis, generic and spe- EFGC resolved relationships at higher taxonomical levels and cies phylogenetic relationships are not entirely congruent with SQD1, AT103 and EIF3E were able to resolve phylogenetic rela- plastid and ITS data and are a mixture of the four different consen- tionships at lower taxonomic levels than predicted by the studies sus trees obtained from the individual species-level resolution of Choi et al. (2006) and Li et al. (2008).Two(RBPCO and AROB) markers (Fig. 3). showed phylogenetic relationships that were incongruent with

Please cite this article as: Babineau, M., et al., Phylogenetic utility of 19 low copy nuclear genes in closely related genera and species of caesalpinioid legumes, South African Journal of Botany (2013), http://dx.doi.org/10.1016/j.sajb.2013.06.018 6 M. Babineau et al. / South African Journal of Botany xxx (2013) xxx–xxx

Glycine max Glycine max Medicago truncatula Phaseolus vulgaris Phasolus vulgaris P. palmeri A(2) 98 Vigna radiata 79 Vigna radiata L. paraguariensis A Pisum sativum Lotus corniculatus 69 C. mimosifolia A(7) 93 Medicago truncatula Pisum sativum C. pulcherrima A Lotus corniculatus 100 Medicago truncatula 53 T. spinosa A Parkinsonia aculeata A L. paraguariensis A 70 C. pumila A 85 Colvillea racemosa A C. madagascariensis A(7) T. spinosa A D. regia A 100 100 73 C. decapetala A C. pulcherrima A 65 D. boiviniana A(2) 100 G. bonduc A(6) C. madagascariensis A(3) Colvillea racemosa B(2) M. scortechinii A(7) G. bonduc A Parkinsonia aculeata B(2) 89 P. palmeri 1B 91100 C. decapetala A Conzattia multiflora B(4) 89 89 M. scortechinii A(4) D. floribunda 1B G. bonduc B 93 L. paraguariensis 1B Parkinsonia aculeata A(3) D. velutina B(3) 58 Parkinsonia aculeata 1B(2) Conzattia multiflora A D. floribunda 3B(3) 66 C. paraguariensis 2B D. boiviniana A D. regia B(2) 100 C. palmeri 2B D. regia A(2) D. boiviniana B 84 D. floribunda 2B C. paraguariensis 5B D. floribunda A(3) D. pumila B(3) 73 Parkinsonia aculeata 3B 86 D. pumila A(3) 71 L. paraguariensis A(2) C. paraguariensis 3B 90 D. boiviniana A(2) 73 P. palmeri A C. paraguariensis 4B 97 D. velutina A(3) 67 C. mimosifolia A(3) Parkinsonia aculeata 2B Conzattia multiflora B(2) C. madagascariensis A(3) Conzattia multiflora(4) D. boiviniana B T. spinosa A(2) 99 D. velutina(2) C. pumila A(2) D. velutina B(2) D. velutina 59 M. scortechinii A(4) 76 D. floribunda B(2) D. velutina(2) 99 C. decapetala A(4) 59 D. pumila B(3) D. regia(3) L. paraguariensis B(5) 100 100 Colvillea racemosa B(3) C. palmeri 1C C. mimosifolia B(3) 99 D. regia B(3) 59 Colvillea racemosa(4) 53 P. palmeri B(3) L. paraguariensis B(5) C. madagascariensis B(3) D. boiviniana(5) C. mimosifolia B(4) 76 T. spinosa B(4) D. floribunda 98100 P. palmeri B(6) G. bonduc B(4) D. pumila(3) 73 C. pumila B(6) 87 C. decapetala B(2) C. palmeri 2C 87 M. scortechinii B(2) M. scortechinii B(2) C. paraguariensis 1C 65 AROB CALTL 55 C. madagascariensis B(3) ATCP C. pumila B(4) C. palmeri 3C 58 C. pulcherrima B(5) C. pulcherrima B(5) D. floribunda(2)

Fig. 4. Strict consensus of phylogenetic analyses for low copy nuclear intron-spanning regions AROB, ATCP and CALTL, which show duplication events in the two Caesalpinieae test groups. Bootstrap values over 50% from parsimony analysis are indicated above the branches. Duplication events are indicated by black symbols; circle, duplication in Caesalpinia s.l. group; triangle, duplication in Delonix group. Letters following the species names indicate the putative copy type. When two or more clones for the same individual together lacked any significant structure, the clade was collapsed and the number of clones sequenced is indicated in parentheses.

plastid and ITS sequence data and yielded complex patterns of derived species representing five tribes within the Leguminosae polymorphism; they are therefore considered to be of little or (Wojciechowski et al., 2004). Conversely, we expected the markers no phylogenetic utility. from Li et al. (2008) to be more difficult to amplify and more variable The six exon-spanning markers do not resolve clades at the pre- in their assessment of the taxonomic level of usefulness since dicted taxonomic level: they resolve clades at higher taxonomic they were evaluated from 25 distantly related species across 15 levels (e.g., subfamilial) and therefore have a lower phylogenetic evolutionarily diverse angiosperm families. However, when dif- resolution in our test groups than previously predicted for the ferences were noted the reverse expectations were observed: Papilionoideae. The low levels of nucleotide differences detected in markers from Choi et al. (2006) lacked phylogenetic resolution at exons of the Caesalpineae clade species sampled, and which resulted low taxonomic level and those from Li et al. (2008) showed better in low levels of phylogenetic resolution, suggest that, at least for the phylogenetic resolution at low taxonomical levels. This suggests groups and loci studied, Caesalpinieae exons are more conserved that evaluation of levels of phylogenetic resolution for closely relat- than Papilionoideae exons. Although this may be indirect evidence ed species may be more accurate when tested, not with distantly re- for a more active genomic history in Papilionoideae (Citerne et al., lated species of the same family (method of Choi et al., 2006), but 2003; Doyle et al., 1995, 1997, 2003; Hougaard et al., 2008), for the rather on a very large subset comprising many species from differ- purposes of our study it clearly indicates that the level of phyloge- ent families, as undertaken by Li et al. (2008). Alternatively, this netic resolution for a specific nuclear marker may be variable within could simply indicate that the more species sampled in an analysis, the large Leguminosae family. regardless of the family, the better the evaluation of phylogenetic We expected the markers from Choi et al. (2006) to more accu- resolution. rately correspond to levels of phylogenetic resolution assessed here Our study confirms the phylogenetic utility for closely related because they had been designed and tested on seven recently genera and species of Caesalpinieae of four LCNG (Fig. 3). The

Please cite this article as: Babineau, M., et al., Phylogenetic utility of 19 low copy nuclear genes in closely related genera and species of caesalpinioid legumes, South African Journal of Botany (2013), http://dx.doi.org/10.1016/j.sajb.2013.06.018 M. Babineau et al. / South African Journal of Botany xxx (2013) xxx–xxx 7

Fig. 5. Screening methodology for preliminary studies of the phylogenetic utility of low copy nuclear genes for closely related genera and species. Modified from the screening methodology proposed by Small et al. (2004) and Sang (2002).

differences in species relationships among these four markers, species-level resolution (Appendix Fig. C)supportstheideathat and in comparison with the ITS and plastid topologies, may result using multiple nuclear regionstogetherinananalysismaybe from the history or function of these genes. The phylogenetic an appropriate approachincomplextaxatoachievebetterphy- relationships of these four LCNG are assumed to represent the logenetic results, higher congruence with other markers, better species relationship (versus gene histories) as they are individually resolved clades, and possibly a topology much closer to the spe- congruent with aspects of the species biology (morphology, habitat, cies tree than to a gene tree (Brumfield et al., 2008; Maddison, pollinators), previously published phylogenies (Bruneau et al., 1997; Page, 2000; Sang, 2002). Matrix concatenation of multiple 2008; Haston et al., 2005; Simpson et al., 2003) and preliminary independent loci is an efficient way to extract cryptic phyloge- unpublished data from our laboratory. The surprisingly high vari- netic information from individual loci that do not resolve, or ability in sequence and intron length for AT103 suggests a strong po- support, certain clades, and loci which show minor to intermedi- tential of this marker for phylogenetic studies among closely related ate congruence problems (Gadagkar et al., 2005; McMahon and species. Sanderson, 2006; Page and Charleston, 1997; Rokas and Carroll, Our success in obtaining a well resolved and congruent spe- 2005;butseeCarstens and Knowles, 2007; Kubatko and Degnan, cies tree by concatenating the four markers that demonstrated 2007).

Please cite this article as: Babineau, M., et al., Phylogenetic utility of 19 low copy nuclear genes in closely related genera and species of caesalpinioid legumes, South African Journal of Botany (2013), http://dx.doi.org/10.1016/j.sajb.2013.06.018 8 M. Babineau et al. / South African Journal of Botany xxx (2013) xxx–xxx

The increased use of LCNG in developing legume phylogenies remains tentative and further sampling would be necessary to is one of the next logical steps in the systematics of this group evaluate the biological and/or methodological causes for the topo- (LPWG, 2013), and for systematics of angiosperms as a whole. logical pattern. In contrast, our study did not recover the cryptic These markers are able to resolve relationships at lower taxonomic duplication for tRALs found by Scherson et al. (2005) in Astralagus. levels than most plastid regions previously used, and help Allelic variation was identified in our group but the clones are untangle relationships in clades where species-level relationships species-monophyletic (Appendix Fig. B) and therefore give no in- have remained problematic, such as in the Caesalpinia s.l. and dication of a duplication. Peltophorum clades (Haston et al., 2005; Simpson et al., 2003). When multiple gene copies or diverging alleles are found, the Low copy nuclear genes also are an alternative to the traditional design of specific primers is an interesting option in order to amplify nuclear ribosomal ITS sequences that have been severely criticized only one copy/allele (Small et al., 2004). Our analyses suggest that (Álvarez and Wendel, 2003; Feliner and Rosselló, 2007; Small exon copy-specific primers could easily be designed for the different et al., 2004). The methodological and interpretation issues associ- copy types (between two and three) of AROB for the Caesalpinia s.l. ated with the nuclear genome are complex and can seem over- species. It also would be possible to differentiate the two ATCP exon whelming (recombination, orthology assessment of gene copies, copies for the genus Caesalpinia s.l. However, the design of specific incomplete lineage sorting, instraspecies/populational/individual primers for the two copies of CALTL seems difficult, and because polymorphism, gene tree versus species tree). Many studies have the two copies are of similar sizes, differentiation by gel migration suggested new phylogenetic analytical methods to facilitate is unlikely. incorporation of data from the nuclear genome in systematics Although sometimes considered less interesting candidates for (Chapman et al., 2007; Choi et al., 2006; Creer, 2007; Duarte phylogenetic use, duplicated nuclear genes can be useful in un- et al., 2010; Hughes et al., 2006; Ilut and Doyle, 2012; Li et al., derstanding gene function and expression, as well as morphology 2008; Maureira-Butler et al., 2008; Small et al., 2004; Steele and chromosomal duplication (Doyle et al., 2003). Identification et al., 2008; Strand et al., 1997). As more studies begin to use of duplication events can also be useful for rooting phylogenetic the same LCNG in different taxonomic groups, a better assessment trees in cases where no outgroup taxa are available or if the of their level of taxonomic resolution, and of phylogenetic/ phylogenetic position of the taxa under study is unclear or expression/functional information content will be obtained. The isolated (Sang, 2002). In addition, high amounts of phylogenetic development of new LCNG requires extensive molecular sampling information can be recovered in duplicated markers (Fig. 2), implicating the availability of more DNA material sometimes not emphasizing the fact that duplicated loci have a strong potential possible if samples are from herbarium specimens. It also requires for phylogeny reconstruction (Sanderson and McMahon, 2007; a trial and error step in order to design, create, amplify, sequence, Simmons et al., 2000). Given the methods and tools available and analyze a multitude of primers, which can be a long and ex- to handle processing and interpretation of duplicated loci, we pensive process. Therefore, a less costly, and less time consuming believe they should not be automatically ignored in phylogenetic alternative is the testing pre-existing LCNG available for a particu- studies. lar taxonomic group.

4.2. Duplication events 4.3. Screening strategy for pilot study of nuclear genes in phylogenetic analysis The two clades seen in each test group in the tree topologies of ATCP, CALTL and AROB (Fig. 4) are likely to be the result of In light of our study, we propose a screening protocol for orthologous gene copies rather than of allelic variation. First, similar choosing nuclear genes, including laboratory-related decisions, sequences from distant taxa cluster together systematically and, data analysis methods and plausible interpretation avenues, and when sufficient phylogenetic resolution is present, each of the dupli- to estimate more accurately the level of taxonomic resolution for cated clades shows species relationships congruent with plastid and LCNG (Fig. 5). This screening strategy is partly based on the ITS data. Second, exon sequences across all gene copies are very screening methodology proposed by Small et al. (2004), adding similar, whereas intron sequences vary substantially among clades, important steps based on our own observations and those of but are highly similar within clades. Sang (2002). It provides a precise methodology to facilitate and These duplications were not found in the studies of Choi et al. correctly evaluate the accuracy of the taxonomic resolution and (2006) and Li et al. (2008), suggesting either that they are unique phylogenetic utility of LCNG among taxa. As such, this screening to the Caesalpinieae clade or that only a dominant copy was se- strategy will improve and make the use of multiple LCNG more ef- quenced in the two previous studies. A population genetic study ficient in molecular ecology, systematics, and population genetics. of Medicago confirms the single copy assessment of Choi et al. It is a good starting point for a researcher wanting to conduct a (2006) in papilionoids for CALTL (Friesen et al., 2010). However, thorough preliminary study leading to the use of new nuclear Scherson et al. (2005) reported double peaks in the chromato- loci in any group of species. grams of the papilionoid genus Astralagus for ATCP and CALTL, indi- Along with the screening strategy presented, we believe our cating the possibility that some papilionoid genera may have the results can be used to rapidly, and efficiently, produce nuclear duplicated loci. AROB, or 3-dehydroquinate synthase, is one of genome-based phylogenies of Leguminosae at different taxonomic many closely related enzymes in the ARO pathway of aromatic bio- levels (subspecies to subfamily), all of which will help to better un- synthesis (Keeling, 2009; Richards et al., 2006). We have identified derstand the evolution of this ecologically and economically impor- (BLASTn) the two to three hypothesized copies of the Caesalpinia tant family of angiosperms. s.l. species recovered in our study all as AROB homologues, suggestingthattheremaybemultiplegenecopiesinatleastone of the closely related enzymes of the ARO pathway. The AROB Acknowledgements pattern is complex for the Caesalpinia s.l. test group because some species show one duplication (two copy types) and some show This study was supported financially by a grant from the Natural two possible duplications (three copy types) suggesting either Sciences and Engineering Research Council of Canada to AB and a Fonds many copy losses in multiple taxa, alternate gain/loss of copies, or de Recherche Québécois sur la Nature et les Technologies (FRQNT) allelic variation within copies. Thus the duplication assessment fellowship to EG.

Please cite this article as: Babineau, M., et al., Phylogenetic utility of 19 low copy nuclear genes in closely related genera and species of caesalpinioid legumes, South African Journal of Botany (2013), http://dx.doi.org/10.1016/j.sajb.2013.06.018 M. Babineau et al. / South African Journal of Botany xxx (2013) xxx–xxx 9

Appendix A

Glycine max Glycine max Vigna radiata Phaseolus vulgaris 97 Phaseolus vulgaris 75 Vigna radiata Wisteria sp. Lotus corniculatus Pisum sativum Medicago truncatula 85 Medicago truncatula Wisteria sp. Lotus corniculatus Pisum sativum 62 C. paraguariensis 56 T. spinosa (4) C. madagascariensis C. madagascariensis (3) 71 T. spinosa 53 C. pumila (4) M. scortechinii M. scortechinii C. pulcherrima C. pulcherrima P. palmeri 67 C. madagascariensis C. mimosifolia C. decapetala C. pumila 58 G. bonduc C. decapetala L. paraguariensis G. bonduc P. palmeri Parkinsonia aculeata* C. mimosifolia 52 D. regia* Parkinsonia aculeata* Conzattia multiflora Parkinsonia aculeata D. pumila 79 D. velutina Colvillea racemosa Colvillea racemosa D. floribunda D. regia D. boiviniana D. floribunda D. velutina 87 D. regia* a Parkinsonia aculeata c D. pumila D. regia D. boiviniana Glycine max Vigna radiata Wisteria sinensis 69 Phaseolus vulgaris Medicago truncatula Prunus avium 91 Pisum sativum Lotus corniculatus 100 Ranunculus bulbosus C. mimosifolia A 92 G. bonduc A Parkinsonia aculeata 57 C. madagascariensis A P. palmeri A(2) D. floribunda D. floribunda A D. boiviniana A(2) D. boiviniana D. regia (4) D. velutina (3) Colvillea racemosa D. pumila A(2) Parkinsonia aculeata A(3) 61 D. pumila P. palmeri B D. regia 50 Conzattia multiflora A(3) Colvillea racemosa (5) Conzattia multiflora 57 C. pulcherrima A D. floribunda B(3) 84 Conzattia multiflora B D. velutina Parkinsonia aculeata B 83 T. spinosa D. pumila B(2) T. spinosa A C. pulcherrima L. paraguariensis A 86 D. boiviniana B C. madagascariensis Conzattia multiflora C C. madagascariensis B C. pumila C. pumila A(2) 100 C. mimosifolia B C. decapetala L. paraguariensis B(2) 72 C. pulcherrima B C. mimosifolia A(2) G. bonduc B(2) 60 61 M. scortechinii 78 C. decapetata A(3) 83 M. scortechinii A(4) 78 G. bonduc C(3) G. bonduc C. madagascariensis C(4) C. decapetala B(3) L. paraguariensis A(3) 93 M. scortechinii B(3) C. palmeri A(3) C. pulcherrima C(6) C. mimosifolia C(3) L. paraguariensis B(3) L. paraguariensis C(3) P. palmeri C(4) d P. palmeri B(4) C. pumila B(4) 71 b T. spinosa B(6) C. mimosifolia (4)

Fig. A. Strict consensus of phylogenetic analyses for low copy nuclear loci that show a family to subfamily level of resolution among Leguminosae and . Bootstrap values above 50% from parsimony analysis are indicated under the branches. When two sequences for the same species are available, * indicates the sequence retrieved from GenBank. a) CYB6, b) RBPCO,c)EFGC,andd)SQD1.

Please cite this article as: Babineau, M., et al., Phylogenetic utility of 19 low copy nuclear genes in closely related genera and species of caesalpinioid legumes, South African Journal of Botany (2013), http://dx.doi.org/10.1016/j.sajb.2013.06.018 10 M. Babineau et al. / South African Journal of Botany xxx (2013) xxx–xxx

Glycine max Phaseolus vulgaris Glycine tomentella 97 Vigna radiata Vigna radiata Lotus corniculatus Phaseolus vulgaris 53 Medicago truncatula Lotus corniculatus 99 Pisum sativum 51 Pisum sativum Conzattia multiflora 95 97 Medicago truncatula D. pumila Wisteria sp. D. boiviniana 95 Conzattia multiflora 100 61 D. velutina Parkinsonia aculeata 62 D. floribunda 100 Parkinsonia aculeata* Colvillea racemosa 91 D. boiviniana 67 D. regia 94 D. regia 95 D. regia* Colvillea racemosa Parkinsonia aculeata 98 D. velutina 100 Parkinsonia aculeata* 100 95 D. floribunda C. pulcherrima 85 D. pumila C. madagascariensis L. paraguariensis P. palmeri 99 99 P. palmeri (5) 83 L. paraguariensis 100 C. madagascariensis 66 C. mimosifolia T. spinosa T. spinosa 98 C. mimosifolia 89 C. pumila 87 G. bonduc (6) M. scortechinii 61 C. decapetala C. decapetala a 94 d 67 65 M. scortechinii (6) G. bonduc

Glycine max Phaseolus vulgaris Vigna radiata Vigna radiata Phaseolus vulgaris Lotus corniculatus Wisteria sp. 71 Medicago truncatula 54 Lotus tenuis 100 Pisum sativum Glycine tomentella Wisteria sp. 67 L. paraguariensis Pisum sativum 75 88 Medicago truncatula P. palmeri Parkinsonia aculeata 81 C. mimosifolia D. pumila 84 M. scortechinii Colvillea racemosa 71 C. pumila D. floribunda C. decapetala 93 D. velutina 61 G. bonduc 51 D. regia C. pulcherrima D. regia* 95 78 C. madagascariensis Conzattia multiflora D. regia* 85 D. boiviniana 55 T. spinosa L. paraguariensis 97 Conzattia multiflora C. madagascariensis 100 Colvillea racemosa 67 C. pulcherrima 62 D. floribunda C. mimosifolia D. boiviniana P. palmeri D. regia T. spinosa 77 D. pumila M. scortechinii D. velutina b G. bonduc e Parkinsonia aculeata C. decapetala 54 Parkinsonia aculeata*

Phaseolus vulgaris Prunus avium Conzattia multiflora

100 Colvillea racemosa c D. regia D. velutina

Fig. B. Strict consensus of phylogenetic analyses for low copy nuclear loci which show resolution among genera and distantly related species within two groups of the Caesalpineae clade. Bootstrap values over 50% from parsimony analysis are indicated under the branches. When two sequences for the same species are available, * indicates the sequence retrieved from GenBank. a) tRALs,b)UDPDG,c)AGT1,d)MMK1, and e) CTP.

Please cite this article as: Babineau, M., et al., Phylogenetic utility of 19 low copy nuclear genes in closely related genera and species of caesalpinioid legumes, South African Journal of Botany (2013), http://dx.doi.org/10.1016/j.sajb.2013.06.018 M. Babineau et al. / South African Journal of Botany xxx (2013) xxx–xxx 11

Parkinsonia aculeata Doyle, J.J., Doyle, J.L., Brown, A.H.D., 1999. Incongruence in the diploid B-genome species Conzattia multiflora complex of Glycine (Leguminosae) revisited: histone H3-D alleles versus chloroplast haplotypes. Molecular Biology and Evolution 16, 354–362. D. regia Doyle, J.J., Doyle, J.L., Brown, A.H.D., Palmer, R.G., 2002. Genomes, multiple origins, and 99 76 Colvillea racemosa lineage recombination in the Glycine tomentella (Leguminosae) polyploidy complex: – 100 D. boiviniana histone H3-D gene sequences. Evolution 56, 1388 1402. Doyle, J.J., Doyle, J.L., Harbison, C., 2003. Chloroplast-expressed glutamine synthetase in 57 D. velutina Glycine and related Leguminosae: phylogeny, gene duplication, and ancient polyploidy. 85 D. floribunda Systematic Botany 28, 567–577. 91 D. pumila Duarte, J.M., Wall, P.K., Edger, P.P., Landherr, L.L., Ma, H., Pires, J.C., Leebens-Mack, J., fi L. paraguariensis dePamphilis, C.W., 2010. Identi cation of shared single copy nuclear genes in Arabidopsis, Populus, Vitis and Oryza and their phylogenetic utility across various P. palmeri taxonomical levels. BMC Evolutionary Biology 10, 61–79. 100 85 C. mimosifolia Duminil, J., Heuertz, M., Doucet, J.L., Bourland, N., Cruaud, C., Gavory, F., fi C. madagascariensis Doumenge, C., Navascues, M., Hardy, O.J., 2010. CpDNA-based species identi - cation and phylogeography: application to African tropical tree species. Molecular 66 C. pulcherrima Ecology 19, 5469–5483. 70 T. spinosa Edgar, R.C., 2004. MUSCLE: a multiple sequence alignment method with high accuracy – M. scortechinii and high throughput. Nucleic Acids Research 32, 1792 1797. 67 Feliner, G.N., Rosselló, J.A., 2007. Better the devil you know? Guidelines for insightful 100 C. decapetala utilization of nrDNA ITS in species-level evolutionary studies in plants. Molecular 70 G. bonduc Phylogenetics and Evolution 44, 911–919. Friesen, M.L., Cordeiro, M.A., Penmetsa, R.V., Badri, M., Huguet, T., Aouani, M.E., Cook, D.R., Nuzhdin, S.V., 2010. Population genomic analysis of Tunisian Medicago truncatula Fig. C. Strict consensus of the phylogenetic analysis for the four concatenated low copy reveals candidates for local adaptation. The Plant Journal 63, 623–635. nuclear loci, EIF3E, AIGP, SHMT, AT103, which show species-level resolution among Gadagkar, S.R., Rosenberg, M.S., Kumar, S., 2005. Inferring species phylogenies from two groups of the Caesalpineae clade. Bootstrap values over 50% from parsimony analysis multiple genes: concatenated sequence tree versus consensus gene tree. The Journal are indicated under the branches. of Experimental Zoology 304B, 64–74. Gagnon, E., Lewis, G.P., Sotuyo, S., Hughes, C.E., Bruneau, A., 2013w. A molecular phylogeny of Caesalpinia sensu lato: increased sampling reveals new insights and more genera than expected. South African Journal of Botany (in press). Appendix B. Supplementary data Hall, T.A., 1999. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows. Nucleic Acids Symposium Series 41, 95–98. Haston, E.M., Lewis, G.P., Hawkins, J.A., 2005. A phylogenetic reappraial of the Supplementary data to this article can be found online at http://dx. Peltophorum group (Caesalpinieae: Leguminosae) based on the chloroplast trnL-F, doi.org/10.1016/j.sajb.2013.06.018. rbcL and rps16 sequence data. American Journal of Botany 92, 1359–1371. Hawkins, J.A., Boutaoui, N., Cheun, K.Y., Klinken, R.D., Hughes, C.E., 2007. Intercontinental dispersal prior to human translocation revealed in a cryptogenic invasive tree. New Phytologist 175, 575–587. References Hougaard, B.K., Madsen, L.H., Sandal, N., de Carvalho Morentzsohn, M., Fredslund, J., Schauser, L., Nielsen, A.M., Rohde, T., Sato, S., Tabata, S., Bertiolo, D.J., Stougaard, J., Álvarez, I., Wendel, J.F., 2003. Ribosomal ITS sequences and plant phylogenetic inference. 2008. Legume anchor markers link syntenic regions between Phaseolus vulgaris, Molecular Phylogenetics and Evolution 29, 417–434. Lotus japonicus, Medicago truncatula and Arachis.Genetics179,2299–2312. Armisen, D., Lecharny, A., Aubourg, S., 2008. Unique genes in plants: specificities and Hu, J.-M., Lavin, M., Wojciechowski, M.F., Sanderson, M.J., 2000. Phylogenetic systematics conserved features throughout evolution. BMC Evolutionary Biology 8, 280–300. of the tribe Millettieae () based on chloroplast trnK/matK sequences and its Barkley, N.A., Wang, M.L., Gillaspie, A.G., Dean, R.E., Pederson, G.A., Jenkins, T.M., 2008. implications for evolutionary patterns in Papilionoideae. American Journal of Botany Discovering and verifying DNA polymorphisms in a mung bean [V. radiata (L.) 87, 418–430. R. Wilczek] collection by EcoTILLING and sequencing. BMC Research Notes 1, 28. Hughes, C.E., Eastwood, R.J., Donovan Bailey, C., 2006. From famine to feast? Selecting Brumfield, R.T., Liu, L., Lum, D.E., Edwards, S.V., 2008. Comparison of species tree methods nuclear DNA sequence loci for plant species-level phylogeny reconstruction. for reconstructing the phylogeny of bearded Manakins (Aves: Pipridae: Manacus)from Philosophical Transactions of the Royal Society B 361, 211–225. multilocus sequence data. Systematic Biology 57, 719–731. Ilut, D.C., Doyle, J.J., 2012. Selecting nuclear sequences for fine detail molecular phylogenetics Bruneau, A., Forest, F., Herendeen, P.S., Klitgaard, B.B., Lewis, G.P., 2001. Phylogenetic studies in plants: a computational approach and sequence repository. Systematic Botany relationships in the Caesalpinioideae (Leguminosae) as inferred from chloroplast trnL 37, 7–14. intron sequences. Systematic Botany 26, 487–514. Ishikawa, N., Yokoyama, J., Tsukaya, H., 2009. Molecular evidence of reticulate evolu- Bruneau, A., Mercure, M., Lewis, G.P., Herendeen, P.S., 2008. Phylogenetic patterns and tion in the subgenus Plantago (Plantaginaceae). American Journal of Botany 96, diversification in the Caesalpinioid legumes. Botany 86, 697–718. 1627–1635. Carstens, B.C., Knowles, L.L., 2007. Estimating species phylogeny from gene tree probabilities Joly, S., Starr, J.R., Lewis, W.H., Bruneau, A., 2006. Polyploid and hybrid evolution in roses despite incomplete lineage sorting: an example from Melanoplus grasshoppers. east of the Rocky Mountains. American Journal of Botany 93, 412–425. Systematic Biology 56, 400–411. Judo, M.S.B., Wedel, A.B.., Wilson, C., 1998. Stimulation and suppression of PCR-mediated Chapman, M.A., Chang, J., Weisman, D., Kesseli, R.V., Burke, J.M., 2007. Universal markers recombination. Nucleic Acids Research 26, 1819–1825. for comparative zapping and phylogenetic analysis in the Asteraceae (Compositae). Keeling, P.J., 2009. Role of horizontal gene transfer in the evolution of photosynthetic Theoretical and Applied Genetics 115, 747–755. eukaryotes and their plastids. In: Gogarten, M.B., et al. (Ed.), Horizontal Gene Transfer: Choi, H.K., Luckow, M., Doyle, J., Cook, D.R., 2006. Development of nuclear gene-derived Genomes in Flux, vol. 532. Humana Press, New York, pp. 501–515. molecular markers linked to legume genetic maps. Molecular Genetics and Genomics Krak, K., Alvarez, I., Caklova, P., Costa, A., Chrtek, J., Fehrer, J., 2012. Development of novel 276, 56–70. low-copy nuclear markers for Hieraciinae (Asteraceae) and their perspective for other Citerne, H.L., Luo, D., Pennington, R.T., Coen, E., Cronk, Q.C.B., 2003. Aphylogenomic tribes. American Journal of Botany Primer Notes and Protocols in the Plant sciences 99, investigation of CYCLOIDEA-like TCP genes in the Leguminosae. Plant Physiology 131, e74–e77. 1042–1053. Kubatko, L.S., Degnan, J.H., 2007. Inconsistency of phylogenetic estimates from concatenated Creer, S., 2007. Choosing and using introns in molecular phylogenetics. Evolutionary Bio- data under coalescence. Systematic Biology 56, 17–24. informatics 3, 99–108. Lewis, G.P., 2005. Caesalpinieae. In: Lewis, G.P., Schrire, B., Mackinder, B., Lock, M. (Eds.), Cronn, R., Cedroni, M., Haselkorn, T., Grover, C., Wendel, J.F., 2002. PCR-mediated Legumes of the World. Royal Botanic Gardens, Kew, Richmond, UK, pp. 127–161. recombination in amplification products derived from polyploidy cotton. Theoretical Li, M., Wunder, J., Bissoli, G., Scarponi, E., Gazzani, S., Barbaro, E., Saedler, H., Varotto, C., and Applied Genetics 104, 482–489. 2008. Development of COS genes as universally amplifiable markers for phyloge- Davis, J.I., Stevenson, D.W., Petersen, G., Seberg, O., Campbell, L.M., Freudenstein, J.V., netics reconstructions of closely related plant species. Cladistics 24, 727–745. Goldman, D.H., Hardy, C.R., Michelangeli, F.A., Simmons, M.P., Specht, C.D., Vergara- LPWG (Legume Phylogeny Working Group), 2013. Legume phylogeny and classification Silva, F., Gandolfo, M., 2004. A phylogeny of the monocots, as inferred from rbcL in the 21st century: progress, prospects and lessons for other species-rich clades. and atpA sequence variation, and a comparison of methods for calculating jackknife Taxon 62, 217–248. and bootstrap values. Systematic Botany 29, 467–510. Maddison, W.P., 1997. Gene trees in species trees. Systematic Biology 46, 523–536. Douzery, E.J.P., Pridgeon, A.M., Kores, P., Linder, H.P., Kurzweil, H., Chase, M., 1999. Maureira-Butler, I.J., Pfeil, B.E., Muangprom, A., Osborn, T.C., Doyle, J.J., 2008. The reticulate Molecular phylogenetics of Disea (Orchidaceae): a contribution from nuclear ribosomal history of Medicago (Fabaceae). Systematic Biology 57, 466–482. ITS sequences. American Journal of Botany 86, 887–899. McMahon, M.M., Sanderson, M.J., 2006. Phylogenetic supermatrix analysis of GenBank Doyle, J.J., Doyle, J.L., Palmer, J., 1995. Multiple independent losses of two genes and one sequences from 2228 papilionoid legumes. Systematic Biology 55, 818–836. intron from legume chloroplast genomes. Systematic Botany 3, 272–294. Müller, K., 2006. Incorporating information from length-mutational events into phylogenetic Doyle, J.J., Doyle, J.L., Ballenger, J.A., Dickson, E.E., Kajita, T., Ohashi, H., 1997. Aphylogeny analysis. Molecular Phylogenetics and Evolution 38, 667–676. of the chloroplast gene RBCL in the Leguminosae: taxonomic correlations and insight Oxelman, B., Liden, M., Berglund, D., 1997. Chloroplast rps16 intron phylogeny of the tribe into the evolution of nodulation. American Journal of Botany 84, 541–554. Sileneae (Caryophyllaceae). Plant Systematics and Evolution 206, 393–410.

Please cite this article as: Babineau, M., et al., Phylogenetic utility of 19 low copy nuclear genes in closely related genera and species of caesalpinioid legumes, South African Journal of Botany (2013), http://dx.doi.org/10.1016/j.sajb.2013.06.018 12 M. Babineau et al. / South African Journal of Botany xxx (2013) xxx–xxx

Page, R.D.M., Charleston, M.A., 1997. From gene to organismal phylogeny: reconciled B.B., Bruneau, A. (Eds.), Advances in Legumes Systematics, Part 10,Higher Level trees and the gene tree/species tree problem. Molecular Phylogenetics and Evolution Systematics. Royal Botanical Garden,Kew, pp. 123–148. 7, 231–240. Small, R.L., Cronn, R.C., Wendel, J.F., 2004. Use of nuclear genes for phylogeny reconstruction Page, R.D.M., 2000. Extracting species trees from complex gene trees: reconciled trees and in plants. Australian Systematic Botany 17, 145–170. vertebrate phylogeny. Molecular Phylogenetics and Evolution 14, 89–106. Soltis, D.E., Soltis, P.S., 1998. Choosing an approach and an appropriate gene for phyloge- Richards,T.A.,Dacks,J.B.,Campbell,S.A.,Blanchard,J.L.,Foster,P.G.,McLeod,R.,Roberts,C.W., netic analysis. In: Soltis, D.E., Soltis, P.S., Doyle, J.J. (Eds.), DNA Sequencing. Kluwer 2006. Evolutionary origin of the eukaryotic Shikimate pathway: gene fusions, horizontal Academic, Dordrecht, pp. 1–42. gene transfer, and endosymbiotic replacements. Eukaryotic Cell 5, 1517–1531. Soltis, D.E., Albert, V.A., Leebens-Mack, J., Bell, C.D., Paterson, A.H., Zheng, C., Sankoff, D., Rieseberg, L.H., Baird, S.J.E., Gardner, K.A., 2000. Hybridization, introgression, and linkage dePamphilis, C.W., Wall, P.K., Soltis, P.S., 2009. Polyploidy and angiosperm diversifica- evolution. Plant Molecular Biology 42, 205–224. tion. American Journal of Botany 96, 336–348. Rokas,A.,Carroll,S.B.,2005.More genes or more taxa? The relative contribution of gene Steele, P.R., Guisinger-Bellian, M., Linder,C.R.,Jansen,R.K.,2008.Phylogenetic utility number and taxon number to phylogenetic accuracy. Molecular Biology and Evolution of 141 low-copy nuclear regions in taxa at different taxonomical levels in two 22, 1337–1344. distantly related families of . Molecular Phylogenetics and Evolution 48, Sanderson, M.J., McMahon, M.M., 2007. Inferring angiosperm phylogeny from EST data 1013–1026. with widespread gene duplication. BMC Evolutionary Biology 7 (Suppl.1), S3. Strand, A.E., Leebens-Mack, J., Milligan, B.G., 1997. Nuclear DNA-based markers for plant Sang, T., 2002. Utility of low-copy nuclear gene sequence in plant phylogenetics. evolutionary biology. Molecular Ecology 6, 113–118. Critical Reviews in Biochemistry and Molecular Biology 37, 121–147. Swofford, D.L., 2002. PAUP* Phylogenetic Analysis Using Parsimony (*and Other Methods), Scherson, R.A., Choi, H.K., Cook, D.R., Sanderson, M.J., 2005. Phylogenetics of New World Fourth edition. Sinauer, Sunderland, Massachusetts, USA. Astralagus: screening of novel nuclear loci for the reconstruction of phylogenies at Taberlet, P., Gielly, L., Pautou, G., Bouvet, J., 1991. Universal primers for amplification low taxonomic levels. Brittonia 57, 354–366. of three non-coding regions of chloroplast DNA. Plant Molecular Biology 17, Shaw, J., Lickey, E.B., Beck, J.T., Farmer, S.B., Liu, W., Miller, J., Siripun, K., Winder, C.T., 1105–1109. Schilling, E.E., Small, R.L., 2005. ThetortoiseandthehareII:relativeutilityof21 Tripp, E.A., Fatimah, S., 2012. Comparative anatomy, morphology, and molecular noncoding chloroplast DNA sequences for phylogenetic analysis. American Journal phylogenetics of the African genus Satanocrater (Acanthaceae). American Journal of of Botany 92, 142–166. Botany 99, 967–982. Simmons, M.P., Donovan Bailey, C., Nixon, K.C., 2000. Phylogeny reconstruction using Wojciechowski, M.F., Lavin, M., Sanderson, M.J., 2004. A phylogeny of legumes duplicate genes. Molecular Biology and Evolution 17, 469–473. (Leguminosae) based on analysis of the plastid matk gene resolves many Simpson, B.B., Larkin, L.L., Weeks, A., 2003. Progress toward resolving the relationships of well-supported subclades within the family. American Journal of Botany 91, the Caesalpinia group (Caesalpinieae: Caesalpinioideae: Leguminosae). In: Klitgaard, 1846–1862.

Please cite this article as: Babineau, M., et al., Phylogenetic utility of 19 low copy nuclear genes in closely related genera and species of caesalpinioid legumes, South African Journal of Botany (2013), http://dx.doi.org/10.1016/j.sajb.2013.06.018