
Molecular Ecology Resources (2011) 11, 185–195 doi: 10.1111/j.1755-0998.2010.02893.x

MOLECULAR DIAGNOSTICS AND DNA TAXONOMY

Molecular identification of roots from a grassland community using size differences in fluorescently labelled PCR amplicons of three cpDNA regions

JOHN M. TAGGART, JAMES F. CAHILL JR, GORDON G. MCNICKLE and JOCELYN C. HALL
Department of Biological Sciences, University of Alberta, Edmonton, Alberta, T6G 2E9

Abstract

Elucidating patterns of root growth is essential for a better understanding of the functioning of plant-dominated ecosystems. To this end, reliable and inexpensive methods are required to determine the species composition of root samples containing multiple species. Previous studies have used a range of PCR-based approaches, but none has examined a species pool greater than 10 or 30 when evaluating mixed and single species samples, respectively. We present a method that evaluates size differences in fluorescently labelled PCR amplicons (fluorescent fragment length polymorphism) of the trnL intron and the trnT-trnL and trnL-trnF intergenic spacers. Amplification success of the trnT-trnL spacer was limited, but variation in the trnL intron and the trnL-trnF spacer was sufficient to distinguish over 80% of the 95 species (97% of the 77 genera) evaluated from a diverse fescue grassland community. Moreover, we identified species known to be present in mixed samples of 4, 8, 12, and 16 species on average 82% of the time. However, this approach is prone to detecting species known to be absent (false positives) when using our key of 95 species. Comparing unknowns to a limited species pool ameliorates this problem, comparable to a researcher using prior knowledge of what species could be found in a sample to constrain the identification of species. Comparisons to other methods and future improvements are discussed. This method is efficient, cost-effective and broadly applicable to many ecosystems.

Keywords: community composition, fluorescent fragment length polymorphism, mixed samples, roots, species identification, trnL-trnF

Received 9 April 2010; revision received 7 May 2010; accepted 26 May 2010

Introduction

Individual plants live both aboveground and belowground, with interactions among roots, soil, fungi, microbes, herbivores, shoots, and the atmosphere driving patterns of plant growth (Wardle 2002). Although connected at the soil surface, plants encounter different ecological challenges aboveground and belowground. For example, most plants capture more than 20 different resources belowground compared to only two aboveground. Doing so involves unique foraging strategies (Kembel & Cahill 2005) resulting in highly asymmetric root systems (e.g., Brisson & Reynolds 1994), and in a rooting breadth much greater than the corresponding breadth of the canopy aboveground (Schenk & Jackson 2002). Even at a more basic level, in most systems plants allocate more biomass belowground than aboveground (Jackson et al. 1996). This combination of observations of natural history should lead to the conclusion that elucidating patterns of root growth in natural systems is of primary importance for understanding biodiversity and functioning of plant-dominated systems. Of course, the tradition in plant ecology is opposite, and descriptions of plant communities are typically based upon shoot, rather than whole-plant, distributions (e.g., Tilman & Pacala 1993).

The factors that typically limit the inclusion of belowground plant distributions in studies of plant ecology have been methodological, rather than conceptual. The fine roots of many plant species are visually indistinguishable, and thus it has been historically difficult to accurately measure belowground diversity without using excavations of entire plots (Brisson & Reynolds 1994), using tracer injections (Jackson et al. 1996), or dyeing individual root systems (Holzapfel & Alpert 2003). To allow for more detailed understanding of belowground patterns of plant diversity, plant ecologists need reliable, high-throughput, and inexpensive methods to determine species identity and distributions of plant roots in natural systems.

Correspondence: James F. Cahill Jr, Fax: +1 780 492 9234; E-mail: [email protected]

2010 Blackwell Publishing Ltd 186 MOLECULAR DIAGNOSTICS AND DNA TAXONOMY

To this end, a range of techniques has been developed for species identification of roots: PCR-RFLP (restriction fragment length polymorphism) analysis of rbcL (Bobowski et al. 1999), the trnL intron (Brunner et al. 2001; Ridgway et al. 2003) and the internal transcribed spacer (ITS; Moore & Field 2005); sequencing of ITS (Jackson et al. 1999; Linder et al. 2000); PCR assay with species-specific primers (McNickle et al. 2008; Mommer et al. 2008); and fluorescent fragment length polymorphism (FFLP) analysis of the trnL intron (Frank et al. 2010; Ridgway et al. 2003). These studies highlight the power of using PCR-driven strategies for species identification, but the utility of these approaches has been limited to low numbers of species. To date, the highest number of species investigated was 30 tree species from the Alps using the PCR-RFLP method (Brunner et al. 2001). However, most plant communities have a species pool much larger than 30. As a result, to understand belowground species diversity patterns in natural systems, methods are needed that readily discriminate among a greater diversity of species.

The standard method for harvesting roots in an ecological study is to take a series of root cores of a given depth and diameter. These cores are then washed free of soil, leaving a mass of tangled and morphologically indistinguishable roots of an unknown number of species. Thus, to use molecular methods for species determination following this standard ecological sampling protocol, the technological hurdle of having DNA from multiple species in a single sample (mixed or pooled samples) must be overcome. Existing methods for determining plant species in mixed samples often rely on species-specific markers (McNickle et al. 2008; Mommer et al. 2008). While highly accurate, these approaches are designed for pot/mesocosm experiments where species pools are small and tightly controlled by the researcher. To date, such studies have included a maximum of ten species (McNickle et al. 2008) but more commonly four (Mommer et al. 2008; Moore & Field 2005). If species information is not already available in public databases such as GenBank, then developing species-specific markers for communities with large species richness is not feasible, as it would be time-consuming and potentially require significant DNA sequencing. Moreover, the expensive and labour-intensive approach of cloning is required for sequencing-based identification of mixed samples. In order for belowground plant community ecology to move forward, alternative methods must be able to discern species within a large species pool as well as identify species in mixed samples.

Here, we describe a method for species identification of roots that evaluates size differences in fluorescently labelled PCR amplicons of the trnL intron and the trnT-trnL and trnL-trnF intergenic spacers. Variability in size of PCR amplicons of these regions was sufficient to distinguish 82 out of 95 species in a diverse, fescue grassland community. Fluorescently labelled primers are cost-effective, accurate and efficient because they permit precise sizing of three regions simultaneously on a capillary sequencer. Moreover, this method avoids the need for procedures downstream of PCR, such as restriction digests, and generates digital, easily processed results. We developed a relational key based on size variation in 95 species, representing a significant percentage of species diversity within the community. We then examined the efficacy of our key in correctly identifying species in mixed samples of 4, 8, 12, and 16 species. Benefits and limitations of this approach are discussed with emphasis on comparison with other methods.

Materials and methods

Focal community

The methods we developed should apply to any focal community; however, we focus on a specific fescue grassland in western Canada. The field site was located at the Kinsella Research Ranch, a 6000-ha research facility run by the University of Alberta in central Alberta. The ranch is located in the Aspen Parkland ecoregion, a savanna consisting of aspen (Populus tremuloides) stands in the lowlands, and rough fescue (Festuca hallii) grasslands in the uplands. We concentrated on the undisturbed fescue grassland regions of the ranch. This community has been studied for 10 years by the Cahill lab, and the species pool and basic ecology are well understood (e.g., Coupe et al. 2009; Lamb & Cahill 2008). In general, this is a species-rich assemblage, with typical plots of 20 × 50 cm containing 3–12 vascular species (pers. obs.) aboveground. Over 100 vascular species were found throughout the fescue regions of the ranch, including many representatives of the species-rich families Asteraceae, Fabaceae, and Poaceae. We believe this system is well suited for development of a community-level molecular key, and the species diversity allows for a good test of this method's ability to discriminate within families and within genera.

Sampling and DNA extractions

We sampled 215 individuals, comprising 105 species from 29 angiosperm families, collected from the University of Alberta Research Ranch near Kinsella, Alberta, Canada (53°5′N, 111°33′W). Plants were chosen through a series of walks (totalling over 11 km) from 2005 to 2006, with the intention of collecting multiple individuals of all species that were found. Of these samples, we collected size information from 205 individuals and 95 species (27 angiosperm families; Table 1). Specimen vouchers were deposited at the University of Alberta


Herbarium. Some species were rare and, as such, only a small number of individuals were collected. DNA was extracted from multiple representatives (2–10 individuals) of 49 species (Table 1) to evaluate possible intraspecific variation. For the remaining species, DNA was extracted from only one individual per species.

To ensure species identification, total genomic DNA was extracted from tissue of all 215 individuals using Qiagen DNeasy (Qiagen Inc, Mississauga, ON, Canada). These genomic DNA samples were analysed to generate the molecular identification key. To test the ability of our key to identify species from mixed samples, we extracted DNA from cleaned roots of sixteen species and made mixtures of known species composition of 4, 8, 12, or 16 species. Roots are high in polyphenolic and secondary compounds that can interfere with the DNA extraction process (Linder et al. 2000). Thus, we extracted DNA from roots of these 16 species using a modified CTAB extraction method (Brunner et al. 2001) for mixed DNA experiments.

Development of size profiles for all species

Three chloroplast (cpDNA) non-coding regions were independently amplified with fluorescently labelled universal primers to produce size profiles for all species (Taberlet et al. 1991): (i) the trnT-trnL intergenic spacer with primers A (5′-CATTACAAATGCGATGCTCT) and B (5′-TCTACCGATTTCGCCATATC), (ii) the trnL intron with primers C (5′-CGAAATCGGTAGACGCTACG) and D (5′-GGGGATAGAGGGACTTGAAC), and (iii) the trnL-trnF intergenic spacer with primers E (5′-GGTTCAAGTCCCTCTATCCC) and F (5′-ATTTGAACTGGTGACACGAG). PCR reactions with a total volume of 20 µL were conducted with Qiagen taq (Qiagen Inc), following the manufacturer's protocol: 1X Q-solution, 1X Qiagen PCR buffer, 15 mM MgCl2, 10 mM dNTPs, 20 mM of each primer, 0.25 U taq polymerase and 2–30 ng of genomic DNA. We used touchdown PCR on an Eppendorf Mastercycler Pro gradient thermal cycler (Model 6321; Eppendorf Canada, Mississauga, ON, Canada) for accuracy and specific primer amplification. Different conditions were used for each set of primers: (i) trnT-trnL intergenic spacer, 94 °C for 5 min, 2 cycles of 94 °C for 45 s, 56 °C for 60 s, 72 °C for 80 s, followed by 33 cycles of 94 °C for 45 s, 61.5 °C decreasing by 0.3 °C/cycle for 60 s, 72 °C for 80 s and a final extension of 72 °C for 30 min; (ii) trnL intron, 94 °C for 5 min, 2 cycles of 94 °C for 60 s, 60 °C for 60 s, 72 °C for 80 s, followed by 33 cycles of 94 °C for 60 s, 59.6 °C decreasing by 0.4 °C/cycle for 60 s, 72 °C for 80 s and a final extension of 72 °C for 30 min; (iii) trnL-trnF intergenic spacer, 94 °C for 5 min, 2 cycles of 94 °C for 60 s, 60 °C for 60 s, 72 °C for 80 s, followed by 33 cycles of 94 °C for 60 s, 63 °C decreasing by 0.4 °C/cycle for 60 s, 72 °C for 80 s and a final extension of 72 °C for 30 min.

Differently coloured fluorescently labelled primers were used in PCR for each primer pair (primer A: FAM; primer C: VIC; primer F: NED; Integrated DNA Technologies), enabling simultaneous visualization of all three regions. Prior to fragment analysis, the PCR products from all three regions were mixed together and diluted 200×; 0.2 µL of the final dilution was then combined with 0.25 µL of the size standard and 8 µL of Hi-Di formamide. The final dilution was then denatured at 94 °C for 2 min and cold-snapped into single-strand form with ice packs. Pooled products from the PCR reactions were resolved on a capillary sequencer (ABI 3730 DNA analyzer; Applied Biosystems, Foster City, CA, USA) and then sized using GeneMapper (Applied Biosystems) with the GeneScan 1200 LIZ size standards (Applied Biosystems). Amplification and analysis were replicated three times per individual to ensure consistency and accuracy of amplicon sizes.

Sizing with the capillary sequencer allowed for high precision, and fragment sizes were rounded to the nearest bp (Table 1). Peaks per reaction were recorded at a range of relative fluorescent units (RFU; 300–31274), otherwise known as peak heights, which is a measurement of fragment intensity. Because of the sensitivity of this method in picking up all amplicons per reaction, multiple peaks were observed from single samples, likely because of primer mismatch during PCR. One large peak per region was recovered for the vast majority of the species; however, multiple large peaks were observed for a small number (Table 1). Because our goal was to find unique identifiers per species, not to ascertain homology of bands, multiple peaks were deemed potentially useful for species identification and were recorded if they consistently showed up in all three replicates, had an RFU of greater than 300, and were present in all individuals of a given species for which multiple representatives were examined. Given the range of sizes for these non-coding regions across angiosperms (Shaw et al. 2005), we are unable to ascertain by size alone which peak represents the targeted region.

Mixed DNA sample experiments with DNA extracted from roots

Once the fragment size key was generated, we tested its efficacy for identifying species in mixed samples of DNA extracted from roots of 16 species. Competition for universal primers becomes more intense as the number of species in a mixture increases. As a result of this competition, we expected there to be an upper limit to the total number of species that could be detected in mixed samples. Specifically, we wanted to establish the maximum number of species that could be correctly identified


Table 1 Species included in study and their amplicon size fragments for each of the three regions. Species with bolded names were included in mixed DNA experiments

Family | Species (# individuals sampled) | trnT-trnL intergenic spacer (bp) | trnL intron (bp) | trnL-trnF intergenic spacer (bp)
Apiaceae | Zizia aptera (Gray) Fern (3) | 852 | 572 | 445
Asteraceae | L. (3) | 567 | 491/262 | 426
Asteraceae | Antennaria neglecta Greene (2) | x | 522 | 440
Asteraceae | Antennaria parvifolia Nutt. (2) | 617 | 522 | 440
Asteraceae | Artemisia campestris L. (1) | x | 495 | 440
Asteraceae | Artemisia frigida Willd. (1) | 565 | 495 | 439
Asteraceae | Artemisia ludoviciana Nutt. (3) | x | 495 | 440
Asteraceae | Cirsium arvense (L.) Scop. (5) | x | 590 | 456
Asteraceae | Crepis tectorum L. (4) | 627-630 | 507 | 420
Asteraceae | caespitosus Nutt. (2) | x | 514 | 452
Asteraceae | Erigeron glabellus Nutt. (2) | x | 453 | 458
Asteraceae | Gaillardia aristata Pursh (2) | 576 | 513 | 426
Asteraceae | Gutierrezia sarothrae (Pursh) Britt. & Rusby (1) | 598 | 504 | 441
Asteraceae | Heterotheca villosa (Pursh) Shinners var. hispida (1) | x | 496 | 432
Asteraceae | Lygodesmia juncea (Pursh) D. Don ex Hook. (2) | 647 | 512 | 434
Asteraceae | cana (Hook.) W.A. Weber & A. Löve (= Senecio canus Hook.) (3) | 598 | 507 | 429
Asteraceae | Solidago rigida L. subsp. humilus (Porter) S.B. Heard & Semple (1) | x | 500 | 446
Asteraceae | Solidago missouriensis Nutt. (1) | x | 500 | 446
Asteraceae | Symphyotrichum laeve (L.) A. & D. Löve var. laeve (1) | x | 504 | 432
Asteraceae | Taraxacum officinale F.H. Wiggers (1) | x | 522 | 440
Asteraceae | Tragopogon dubius Scop. (2) | 637 | 520 | 434
Asteraceae | Xanthisma spinulosum (Pursh) D.R. Morgan & R.L. Hartman (= Haplopappus spinulosus (Pursh) DC.) (2) | x | 504 | 432
Brassicaceae | Arabis divaricarpa A. Nels. (2) | 822 | 575 | 476
Brassicaceae | Descurainia pinnata (Walt.) Britt. (1) | 852 | 581 | 628
Brassicaceae | Draba nemorosa L. (3) | x | 393 | 432
Brassicaceae | Erysimum inconspicuum (S. Wats.) MacM (5) | 770 | 575 | 708
Brassicaceae | Lepidium densiflorum Schrad. (2) | 885 | 590 | 596
Brassicaceae | Lesquerella arenosa (Richards.) Rydb. var. arenosa (3) | x | 599 | 666
Caprifoliaceae | Symphoricarpos occidentalis Hook. (3) | 807 | 587 | 447
Caryophyllaceae | Cerastium arvense L. (2) | x | 590 | 456
Chenopodiaceae | Axyris amaranthoides L. (2) | 854 | 697 | 457
Chenopodiaceae | Chenopodium album L. (1) | 836 | 589 | 416
Cyperaceae | Carex siccata Dewey (1) | 638 | 694 | 444
Cyperaceae | Carex stenophylla Wahl. ssp. eleocharis (Bailey) Hult. (10) | 617 | 695 | 446
Elaeagnaceae | Elaeagnus commutata Bernh. ex Rydb (1) | 734 | 589 | 437
Fabaceae | agrestis Dougl. ex G. Don (5) | x | 629 | 185
Fabaceae | Astragalus drummondii Dougl. ex Hook. (1) | x | 606 | 183
Fabaceae | Astragalus flexuosus Dougl. ex G. Don (1) | x | 606 | 183
Fabaceae | Astragalus laxmanii Jacq. var. rubustior (Hook.) Barneby & Welsh (1) | x | 622 | 184
Fabaceae | Hedysarum alpinum L. (3) | x | x | 313
Fabaceae | Lathyrus ochroleucus Hook. (1) | x | 511 | 176
Fabaceae | Melilotus officinalis (L.) Lam. (1) | x | 320 | 217
Fabaceae | Oxytropis campestris (L.) DC. (2) | x | 612 | 187
Fabaceae | Petalostemon purpureum (Vent.) Rydb. (1) | 868 | 597 | 197
Fabaceae | Psoralea argophylla Pursh (1) | 999 | 579 | 489
Fabaceae | Psoralea esculenta Pursh (1) | 985 | 574 | 491
Fabaceae | Thermopsis rhombifolia (Nutt. ex Pursh) Nutt. ex Richards. (4) | x | 587 | 464
Fabaceae | Vicia americana Muhl. (5) | x | 522 | 179
Iridaceae | Sisyrinchium montanum Greene (1) | 675 | 552 | 308
Lamiaceae | Stachys palustris L. ssp. pilosa (Nutt.) Epling (1) | 744 | 559 | 375
Linaceae | Linum lewisii Pursh. (3) | 476 | 588 | 400
Liliaceae | Lilium philadelphicum L. (1) | 815 | 608 | 255
Malvaceae | Sphaeralcea coccinea (Pursh) Rydb. (2) | x | 634 | 465


Table 1 Continued

Family | Species (# individuals sampled) | trnT-trnL intergenic spacer (bp) | trnL intron (bp) | trnL-trnF intergenic spacer (bp)
Onagraceae | Gaura coccinea Pursh (1) | x | 589 | 448
 | luteus Nutt. (3) | 802 | 560 | 431
Plantaginaceae | Linaria vulgaris Hill. (2) | 607 | 555 | 402
Plantaginaceae | Penstemon gracilis Nutt. (3) | 750 | 568 | 405
Plantaginaceae | Penstemon procerus Dougl. ex Graham (1) | 753 | 568 | 405
Poaceae | Agropyron cristatum (L.) Gaertn. (1) | 666 | 620/637 | 426/444
Poaceae | Agrostis scabra Willd. (1) | 873 | 494 | 421
Poaceae | Anthoxanthum nitens (Weber) Y. Schouten & Veldkamp (= Hierochloe odorata (L.) Beauv.) (2) | 866 | 607 | 426
Poaceae | Bouteloua gracilis (Kunth) Lag. ex Griffiths (1) | 463 | 626 | 301/333
Poaceae | Bromus porteri (J.M. Coult.) Nash (1) | 680 | 620/648 | 443
Poaceae | Calamovilfa longifolia (Hook.) Scribn. (1) | 879 | 644 | 394/439
Poaceae | Dactylis glomerata L (1) | 882 | 621 | 433
Poaceae | Elymus glaucus Buckley ssp. glaucus (1) | 673 | 645 | 394/431
Poaceae | Elymus trachycaulus (Link.) Gould ex Skinners ssp. trachycaulus (1) | 675 | 649/512 | 394/432
Poaceae | Festuca hallii (Vasey) Piper (1) | 881 | 616 | 424
Poaceae | Hesperostipa comata (Trin. & Rupr.) Barkworth (1) | 182 | 598/512 | 164/394/427/435
Poaceae | Hesperostipa curtiseta (Hitchc.) Barkworth (1) | 569 | 598 | 194/394/425
Poaceae | Hordeum jubatum L. (2) | 666 | 635 | 394/406
Poaceae | (Ledeb.) Schultes (5) | 848 | 407 | 418
Poaceae | Muhlenbergia cuspidata (Torr.) Rydb. (3) | 462 | 638 | 394/470
Poaceae | Pascopyrum smithii (Rydb.) A. Löve (1) | 675 | 644 | 394/430
Poaceae | Poa palustris L. | 887 | 620 | 444
 | hoodii Richards. (3) | 758 | 613 | 443
Polygonaceae | Polygonum convolvulus L. (1) | 678 | 645 | 394/430
Primulaceae | Androsace septentrionalis L. (5) | 852 | 574 | 475
Ranunculaceae | Pulsatilla patens (L.) P. Mill. ssp. multifida (Pritz.) Zamels (2) | x | 571 | 507
Ranunculaceae | Ranunculus pedatifidus J.E. Smith ssp. affinis (R.Br.) Hult. (1) | x | 581 | 458
Ranunculaceae | Ranunculus rhomboideus Goldie (1) | x | 581 | 458
Ranunculaceae | Thalictrum venulosum Trel. (1) | 752 | 610 | 470
Rosaceae | Amelanchier alnifolia Nutt. (2) | x | 586 | 484
Rosaceae | Fragaria virginiana Duchesne (1) | x | 490 | 497
Rosaceae | Potentilla anserina L. (3) | x | 501 | 478/484
Rosaceae | Potentilla arguta Pursh (2) | x | 498 | 500
Rosaceae | Potentilla bipinnatifida (Pursh) D. Don ex Hook (2) | x | 601 | 484
Rosaceae | Potentilla concinna Richards. (3) | x | 599 | 472
Rosaceae | Potentilla hippiana Lehm. (7) | x | 591 | 472
Rosaceae | Potentilla norvegica L. (1) | 823 | 633 | 447
Rosaceae | Rosa arkansana Porter (1) | x | x | 482
Rubiaceae | Galium boreale L. (3) | x | 607 | 483
Ruscaceae | Maianthemum stellatum (L.) Link (1) | x | 602 | 433
Santalaceae | Comandra umbellata (L.) Nutt. (4) | 657 | 573 | 183
Violaceae | Viola adunca J.E.Smith (4) | 412 | 577 | 443
Violaceae | Viola pedatifida G. Don (1) | 411 | 547 | 437

in mixed DNA samples of known species composition of 4, 8, 12, and 16 species (Table 1). These species numbers per sample are consistent with a recent study that found an average of 2.1–5.7 species per root core (Frank et al. 2010). DNA was extracted from roots of each species individually, quantified, and then diluted to 20 ng/µL. Five random combinations of individual extractions were determined for 4, 8, 12, and 16 species. Extractions were combined and 1 µL of this mixture was then used as template DNA in three separate PCR reactions, one for each set of primers, following procedures described earlier. The PCR product from all three reactions was then pooled for fragment analysis. As with development of the size key, amplification and size analysis were replicated three times to ensure accuracy and consistency. Because we were unable to reliably amplify the trnT-trnL intergenic spacer, this region was excluded from analyses.
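The mixture design described above (five random combinations at each of four richness levels, drawn from the 16 root-extracted species) can be sketched in a few lines. This is only an illustration under stated assumptions, not the procedure the authors used: the labels `sp01` to `sp16`, the function name `draw_mixtures` and the fixed random seed are all hypothetical placeholders.

```python
import random


def draw_mixtures(pool, richness_levels=(4, 8, 12, 16), n_per_level=5, seed=0):
    """Draw random species combinations for each richness level.

    pool: list of species labels (hypothetical placeholders in this sketch).
    Returns a dict mapping richness -> list of sorted species lists.
    """
    rng = random.Random(seed)  # fixed seed so the design can be reproduced
    design = {}
    for k in richness_levels:
        if k > len(pool):
            raise ValueError(f"cannot draw {k} species from a pool of {len(pool)}")
        # sample() draws without replacement, so no species repeats within a mix
        design[k] = [sorted(rng.sample(pool, k)) for _ in range(n_per_level)]
    return design


# Stand-ins for the 16 species whose roots were extracted
pool = [f"sp{i:02d}" for i in range(1, 17)]
design = draw_mixtures(pool)

for k, mixes in design.items():
    print(k, mixes[0])
```

Note that with a pool of 16, every 16-species draw is necessarily the full pool, so the five mixtures at that level act as replicates rather than distinct combinations.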


We determined the number of species identified by matching peak profiles from mixed samples with the PCR amplicon sizes using a relational key (Lucid Builder v3.3, 2009; Centre for Biological Information Technology, Brisbane, Australia). The key was constructed by treating each region as a ‘feature’ in the key, and the peak sizes for each region as a categorical ‘state’ of a given feature (Table 1). To account for potential error in sizing by the capillary sequencer, we entered each fragment length with a range of the reported value ± 1 bp. We recorded a feature as ‘uncertain’ for a species that had no observed peak size in Table 1. Once the amplicon sizes were determined from a mixed sample, we selected the corresponding values using Lucid Player (Lucid Builder v3.3, 2009; Centre for Biological Information Technology). A smaller cut-off for recording peaks (25 RFU) was used because pooled samples are known to have problems with PCR bias and interactions between different sets of DNA (Avis et al. 2010; Dickie & FitzJohn 2007). Because of the competitive nature of PCR, we did not expect every species to be amplified in every PCR replicate. To account for this, all peaks were counted regardless of the number of replicates in which they were observed (e.g., if a peak was observed in one of three replicates, it was recorded).

A number of different criteria can be used to indicate the potential presence of a species (Table 2). To explore the consequences of different assumptions, we used the two most extreme: (i) liberal, which identified a species as potentially present in the sample if a peak of the ‘right size’ (Table 1) was found for at least one of the two gene regions, and (ii) conservative, which identified a species only if appropriate peak sizes were present for both gene regions. We further explored the accuracy of our methods for species detection by using two different ‘species pools’. In the first, we assumed no prior knowledge about which species might be found in a mixed sample, and thus recorded any species found to be potentially present within the key of 95 species. In the second, we assumed that the researcher would have more specific information about the samples, such as the list of species found aboveground in the vicinity of the root samples (Table 2). Here, that corresponded to only recording the presence/absence of the species known to be present within the samples (16 species).

Table 2 The steps involved in estimating belowground plant community composition using our identification key. Four different identification rules are presented

1. Collect root core. Record species present in a quadrat centered on root core location. Note: Size of root core and quadrat will depend on species–area relationships in the system and should be determined empirically
2. Wash soil from roots. Dry in silica gel, or freeze at −80 °C for long-term storage
3. Grind root core to fine powder. Extract DNA from tissue using a modified CTAB method (Brunner et al. 2001)
4. Amplify each target region separately using PCR
5. Mix together all PCR products, and size using a capillary sequencer
6. Analyze and record peak diversity from root core sample (25 relative fluorescent units cutoff)
7. Apply Lucid key to peak diversity data to estimate species diversity. This can be carried out in four different ways
7.a. Liberal, unconstrained. Only one known species peak must be detected to assign species presence. The species pool is the entire community
7.b. Liberal, constrained. Only one known species peak must be detected to assign species presence. The species pool is limited to only those species found near the root core
7.c. Conservative, unconstrained. All known species peaks must be detected to assign species presence. The species pool is the entire community
7.d. Conservative, constrained. All known species peaks must be detected to assign species presence. The species pool is limited to only those species found near the root core

Results

Single species: PCR amplicon size variation and identification key

We were not able to amplify all three regions with equal consistency. We were unable to get amplification of any target regions for ten species: Anemone canadensis L. (Ranunculaceae), Cirsium undulatum (Nutt.) Spreng. (Asteraceae), Geum aleppicum Jacq. (Rosaceae), Geum triflorum Pursh (Rosaceae), pauciflorus Nutt. subsp. subrhomboideus (Rydb.) O. Spring & E.E. Schilling (Asteraceae), Heuchera richardsonii R. Br. (Saxifragaceae), Lithospermum incisum (Boraginaceae), Monarda fistulosa L. (Lamiaceae), Mulgedium pulchellum (= Lactuca tatarica; Asteraceae), and Senecio eremophilus Richardson (Asteraceae). Though troubleshooting would probably have allowed for amplification from these species, the goals of this project were not dependent upon using any particular number of species, and as such these species were not considered further. We thus use 95 as the size of our ‘species pool’.

Amplicon sizes were recovered from 95 (100%), 94 (98%), and 55 (58%) of the 95 species sampled for the trnL-trnF intergenic spacer, trnL intron, and trnT-trnL intergenic spacer, respectively (Table 1). In general, sizes of PCR amplicons varied among almost all 95 taxa (Table 1). Identical sizes of both the trnL intron and trnL-trnF intergenic spacer were obtained from six sets of congenerics and/or confamilials (Table 1): (i) Antennaria neglecta, A. parvifolia and Taraxacum officinale (Asteraceae), (ii) Artemisia campestris and A. ludoviciana (Asteraceae; note A. frigida is only 1 bp different for the trnL-trnF intergenic spacer), (iii) Solidago missouriensis and S. rigida


(Asteraceae), (iv) Astragalus drummondii and A. flexuosus (Fabaceae), (v) Ranunculus pedatifidus and R. rhomboideus (Ranunculaceae), and (vi) Penstemon gracilis and P. procerus (Plantaginaceae). Amplicon sizes are not available for the trnT-trnL intergenic spacer for these taxa, with one exception. The trnT-trnL intergenic spacer has a different size profile for the Penstemon species, demonstrating the potential value of this region if amplification is successful (but see mixed samples below). In contrast, different sizes were obtained for the trnL intron and trnL-trnF intergenic spacer in congenerics of Carex (Cyperaceae), Elymus (Poaceae), Erigeron (Asteraceae), Hesperostipa (Poaceae), Potentilla (Rosaceae), Psoralea (Fabaceae), and Viola (Violaceae). The remaining 65 genera are represented by only one species in this community. Multiple peaks were recovered for four and twelve species for the trnL intron and trnL-trnF intergenic spacer, respectively (Table 1). One species, Potentilla anserina, displayed infraspecific variation in the size of the trnL-trnF intergenic spacer.

Mixed samples

False negatives – inability to detect species known to be present. Across all 95 species and all samples, we were able to detect the presence of a species known to be present in the sample 82% of the time (an 18% false-negative rate). However, false negatives were primarily limited to the grasses, with Festuca hallii never detected, Agropyron cristatum and Poa palustris each detected 50% of the time, and Hordeum jubatum with only 67% frequency. We subsequently refer to these monocots as ‘difficult’ species, a term molecular biologists share with field biologists who routinely struggle with morphological identification of these taxa.

Using the liberal identification criteria (a known peak size was found for at least one of the fragments), false-negative rates were 6%, 9%, 19%, and 27% at known diversities of 4, 8, 12, and 16 species. These rates were higher for the conservative method (requiring same size peaks in both gene regions) (Fig. 1a). The general increase in false negatives appears to be because of the probability of having more of the ‘difficult’ species in samples of higher species diversity, as false-negative rates for the remaining species were uniformly low across all diversity treatments. In other words, if a species could be regularly detected when diversity = 4, it was equally likely to be detected when diversity = 16.

False positives – detection of species known to be absent. When using the full species list (95 species), both the liberal and conservative species selection criteria result in frequent false positives (Fig. 1b). The liberal method gives a much higher number of false positives than the conservative method, even at low actual diversity. This result appears to arise because a single amplicon from only one region does not allow us to distinguish between every species (e.g., many species have the same length amplicon for the trnL intron; Table 1). The conservative method greatly reduces the number of false positives (Fig. 1b).

Although the conservative method reduced false-positive rates, they remained higher than would be desired for a typical study in plant community ecology (‘measured diversity’ would be 25–50% greater than known diversity because of false positives). However, false positives were essentially eliminated when we used a restricted species pool (16 species), combined with the conservative method of species determination (Fig. 1c). When only species that existed within the 16 species ‘pool’ were scored as present/absent, there were never any false positives when known diversity was four, and a single false positive when diversity was eight (we did not analyse when diversity = 12, as we felt that was biased towards not finding false positives, as only four species in the species pool were not put in each mix). Thus, if the researcher has prior knowledge about what species could be found (e.g., lists of local aboveground diversity; seeds sown into a planted mesocosm study), the conservative method of species determination is highly successful in producing both low rates of false positives and false negatives.

Discussion

This study provides a novel method for species determination from mixed root samples, with many potential applications in plant ecology. The specific assumptions one makes about whether a species is ‘found’, and the size of the species pool, will strongly influence false-positive and false-negative rates. The current approach was particularly effective for eudicots, with mixed results among monocots included in this study.

Amplicon size differences from the trnL intron and the trnT-trnL and trnL-trnF intergenic spacers differentiate 85% of the 95 species examined (Table 1). Much of the remaining ambiguity occurred among congenerics, where this method fails to distinguish among almost 50% of genera with more than one species sampled (Table 1). Thus, size variation in these three regions alone is not sufficient to determine the identity of all species in our community. However, differences in amplicon sizes resolve identification at the generic level 97% of the time. Although direct comparisons with other studies are difficult since we examined minimally three times as many species, finding differences among congenerics appears to be a general limitation with molecular approaches for identifying plant roots. For example, comparison of ITS variation from 24 species in 12 families reveals that


Fig. 1 Results from mixed sample tests of species identification, presented as box plots. In a box plot, the ‘box’ represents the range from the 1st to the 3rd quartile, the horizontal line within the box is the median, and the upper and lower lines represent the maximum and minimum values. (a) Box plots for detection of species present in mixed samples using two approaches: liberal (white), which identifies species based on size comparison of either fragment (trnL intron or trnL-trnF intergenic spacer), and conservative (black), which counts a species only when sizes matched for both regions. (b) Box plots for the number of ‘extra’ species (out of 95) detected in a sample using the trnL intron (white), the trnL-trnF intergenic spacer (black), or both regions (grey). (c) Box plots for the number of ‘extra’ species detected using the trnL intron (white), the trnL-trnF intergenic spacer (black), or both regions (grey) when amplicon sizes were compared to a restricted species pool including only 16 focal species.
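The liberal and conservative criteria described in the Fig. 1 caption, and the effect of restricting the species pool, can be illustrated with a small lookup against a size key. The sketch below is illustrative only and is not the relational key used in this study. It assumes a hypothetical key mapping each species to known amplicon sizes (bp) for the trnL intron and the trnL-trnF spacer, and an assumed matching window of ±1 bp. All species names, sizes, and function names are invented placeholders.

```python
from typing import Dict, Optional, Set

# Hypothetical size key: species -> region -> set of known amplicon sizes (bp).
# Names and sizes are illustrative placeholders, not data from Table 1.
SIZE_KEY: Dict[str, Dict[str, Set[int]]] = {
    "species_A": {"trnL_intron": {310}, "trnL_trnF": {420}},
    "species_B": {"trnL_intron": {310}, "trnL_trnF": {455}},
    "species_C": {"trnL_intron": {352, 360}, "trnL_trnF": {420}},
    "species_D": {"trnL_intron": {298}, "trnL_trnF": {401}},
}

TOLERANCE_BP = 1  # assumed matching window; not a value taken from the paper


def region_match(known: Set[int], observed: Set[int], tol: int = TOLERANCE_BP) -> bool:
    """Return True if any known size lies within `tol` bp of any observed peak."""
    return any(abs(k - o) <= tol for k in known for o in observed)


def identify(peaks: Dict[str, Set[int]],
             key: Dict[str, Dict[str, Set[int]]] = SIZE_KEY,
             conservative: bool = False,
             pool: Optional[Set[str]] = None) -> Set[str]:
    """Return the species called present in a mixed sample.

    Liberal: a size match in at least one region.
    Conservative: a size match in every region.
    `pool` optionally restricts the comparison to a subset of the key.
    """
    called = set()
    for species, regions in key.items():
        if pool is not None and species not in pool:
            continue
        hits = [region_match(sizes, peaks.get(region, set()))
                for region, sizes in regions.items()]
        if all(hits) if conservative else any(hits):
            called.add(species)
    return called


def error_counts(called: Set[str], truly_present: Set[str]) -> Dict[str, int]:
    """Count missed species (false negatives) and extra species (false positives)."""
    return {"false_negatives": len(truly_present - called),
            "false_positives": len(called - truly_present)}


# A mixed sample known to contain species_A and species_D.
sample_peaks = {"trnL_intron": {310, 298}, "trnL_trnF": {420, 401}}
truth = {"species_A", "species_D"}

liberal = identify(sample_peaks)
strict = identify(sample_peaks, conservative=True)
restricted = identify(sample_peaks, pool={"species_A", "species_B", "species_D"})
```

In this toy key the liberal call returns all four species (two false positives), whereas the conservative call returns only the two species truly present. Restricting the pool removes species_C from consideration even under the liberal criterion.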

sequences were identical within Ilex and Quercus (Linder et al. 2000). Also, PCR-RFLP analysis of the trnL intron was unable to discern species of Populus, Quercus, and Salix when studying 30 species of both gymnosperms (two families) and angiosperms (eight families; Brunner et al. 2001). Using FFLP of the trnL intron alone, Ridgway et al. (2003) differentiated seven of 14 species of Poaceae. PCR-RFLP analysis of rbcL was most effective at discriminating at the generic level when comparing 15 species from one gymnosperm and nine angiosperm families (Bobowski et al. 1999). In sum, no existing method is able to discern 100% of the plant species examined, but the approach presented here is able to identify the vast majority of species from the largest plant species pool studied to date.

Our method is reliant on the successful amplification of three regions, although comparison of two of the three regions (trnL intron and the trnL-trnF intergenic spacer) may be sufficient for many plant species (Table 1). The approach presented here is similar to ARISA (automated rRNA intergenic spacer analysis; Fisher & Triplett 1999), which examines size variation of a single intergenic spacer to assess diversity of microbial communities. An advantage of our method over microbial applications of ARISA is knowledge of our potential species pool and species identification rather than simple bulk richness. Because we are determining size profiles for multiple regions for the vast majority of species in a community, which is not currently feasible for bacterial studies, we can distinguish among species that share an amplicon length for a single region. Thus, an important caveat to our method is that it relies on having a comprehensive database of all species within a community. Such a comprehensive list is not generally available for microbial communities. However, without size profiles of all species, diversity may be missed because some species share size profiles.
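To illustrate why profiling more than one region helps separate species that share an amplicon length in a single region, the sketch below groups species by size profile and reports which groups remain indistinguishable. It is an illustration under stated assumptions, not the procedure used in this study: the key is hypothetical, each species is given a single size per region for simplicity, and all names and sizes are placeholders.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical key: species -> (trnL intron size, trnL-trnF spacer size) in bp.
# Placeholder values only; not data from Table 1.
KEY: Dict[str, Tuple[int, int]] = {
    "species_A": (310, 420),
    "species_B": (310, 455),
    "species_C": (352, 420),
    "species_D": (352, 420),
}


def ambiguous_groups(key: Dict[str, Tuple[int, ...]],
                     regions: Sequence[int]) -> List[List[str]]:
    """Group species with identical sizes over the chosen region indices.

    Only groups containing more than one species are returned, i.e. the
    species that cannot be told apart using those regions.
    """
    groups = defaultdict(list)
    for species, sizes in key.items():
        profile = tuple(sizes[i] for i in regions)
        groups[profile].append(species)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


def fraction_resolved(key: Dict[str, Tuple[int, ...]],
                      regions: Sequence[int]) -> float:
    """Fraction of species in the key with a unique profile over `regions`."""
    unresolved = sum(len(g) for g in ambiguous_groups(key, regions))
    return (len(key) - unresolved) / len(key)


intron_only = ambiguous_groups(KEY, regions=(0,))   # trnL intron alone
both = ambiguous_groups(KEY, regions=(0, 1))        # intron plus spacer
```

With this toy key, the intron alone leaves every species in an ambiguous pair, while adding the spacer separates species_A from species_B. species_C and species_D stay unresolved, which mirrors the situation described for congenerics that share sizes in all regions examined.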
We advocate that size profiles be generated for each focal community, even where there is species

overlap with taxa presented in this study. For those species that are found in other communities, there is no certainty that similar lengths will also be found. Drift alone would be expected to result in differing band lengths among separated populations, and could also result in additional or fewer bands. That is, the potential for intraspecific differences still needs additional investigation. Furthermore, the usefulness of given bands for a species is dependent upon what other species are also in that community, and their band length distributions.

Despite the restriction of requiring amplification of at least two regions, this method is accurate, efficient, and cost-effective. First, there is the benefit of avoiding downstream reactions required after PCR. Both PCR-RFLP and sequence-based methods entail additional reactions after initial amplification. Further, restriction fragment analysis requires changes in restriction sites among species (Linder et al. 2000). Determination of appropriate cutters may necessitate sequence knowledge of the species pool, which is unrealistic for large numbers of species. Second, use of fluorescent primers allows for precise measurement of fragment sizes using a capillary machine, although PCR-RFLP can also take advantage of the capillary machine (Dickie & FitzJohn 2007). Using differently labelled primers also permits combination of multiple PCR reactions for data analysis, which increases throughput and decreases cost. Finally, this approach is broadly applicable because it utilizes universal primers that have been shown to amplify across many plant groups (Taberlet et al. 1991), an advantage shared with other approaches (Brunner et al. 2001; Frank et al. 2010; Linder et al. 2000; Ridgway et al. 2003).

The method described here picks up false positives because of finding unexpected amplicon sizes in mixed samples. Previous methods analysing mixed DNA samples from roots examined pools of either four (Mommer et al. 2008; Moore & Field 2005) or ten (McNickle et al. 2008) species and were 100% effective at identifying all species in a mixed sample. Two of these approaches relied on species-specific primers (McNickle et al. 2008; Mommer et al. 2008), while the last used PCR-RFLP (Moore & Field 2005). Because these studies only examine a limited species pool, these approaches were not designed for field-based research, but rather for mesocosm or quantification experiments. It would be prohibitively time-consuming and costly to develop species-specific primers for 90+ species and then have to conduct 90+ PCR reactions per sample in order to estimate species diversity (one per species-specific primer).

Future improvements

Our method was generally ineffective at discriminating among congenerics, thus greatly increasing the frequency of false positives. However, using fluorescently labelled primers easily permits the addition of more regions that could be analysed simultaneously. That is, we examined three regions, but a fourth region using an additional colour could be included with little additional cost and effort. Furthermore, the difficulties amplifying the trnT-trnL intergenic spacer, a region that has some problems with amplification because of primer A (see Shaw et al. 2005), suggest additional markers are needed for even this pool of species. Recent investigations (Shaw et al. 2005, 2007) compared 34 potentially variable cpDNA regions across angiosperms. Although directed towards finding non-coding regions informative for phylogenetic studies, Shaw et al. (2005, 2007) provide essential information for finding additional markers amenable to this method including: (i) universal primers tested across a range of angiosperms, (ii) lists of species for which amplification was not successful, (iii) length range across examined species, and (iv) information on structural rearrangements and composition (e.g., A-T rich). Thus, there are numerous readily available markers potentially suited for the approach presented here.

Like all methods examining roots, we are limited by the quality of DNA template. DNA extraction from belowground tissue is difficult and less likely to result in successful amplification than extraction from aboveground tissue (Bobowski et al. 1999), mostly because of varying chemical compositions of large species pools. Different roots will produce variable amounts of DNA independent of initial biomass (Mommer et al. 2008). Also, plants collected later in the season may be more difficult to extract because of an increase in polyphenolic compounds accumulated throughout the season (Linder et al. 2000). Moreover, root extraction protocols have been optimized on eudicots (Brunner et al. 2001; Linder et al. 2000). Although monocots were successfully amplified in single-species samples, they were less frequently detected in the mixed species samples. One explanation is the quality of DNA extraction relative to other compounds. Thus, additional optimization of extraction methods focusing on monocots may be necessary.

Our method shares difficulties inherent with examining mixed DNA samples of potentially rich species pools using either ARISA or PCR-RFLP (Avis et al. 2006, 2010; Fisher & Triplett 1999; Popa et al. 2009). These approaches have been mostly used to examine microbial or fungal communities, which have issues specific to those taxa (Avis et al. 2006). Both PCR-RFLP and ARISA analyses pick up unexpected multiple peaks and are challenged by discerning noise from signal within electropherograms (Avis et al. 2006). Our single species results reveal multiple peaks for a number of species (Table 1). Because these additional peaks are reproducible in our laboratory, we believe they are real, not artifacts, and

should be included in our key. In fact, removing these additional peaks from consideration reduces the number of identifiers per species and, thus, the efficacy of discerning taxa in the species pool. However, the presence of additional peaks further highlights the necessity for a size key to be developed for every focal community using this method. We determined a cut-off of 25 RFU for distinguishing between signal and noise in analysing results from mixed samples, which is consistent with previous studies that acknowledge this phenomenon (Avis et al. 2010; Popa et al. 2009). Most PCR protocols are designed for a single template and it is clear that PCR bias may influence results, making some peaks much smaller than others (Acinas et al. 2005; Avis et al. 2010). Some authors advocate adjusting the number of PCR cycles (Avis et al. 2010), whereas we replicated PCR reactions.

Of all the difficulties reported, the most significant for field biologists is the high rate of false positives when using the global species pool. This would be greatly reduced through better discrimination among congenerics, perhaps through the use of additional markers. Another solution is to compare unknown samples to a relatively small species pool (Fig. 1c). If a researcher is able to constrain the potential diversity that could be found in the soil, perhaps by including only those species they find aboveground in the area, this method would work effectively. We suggest that researchers constrain the potential species pool by collecting information about the species present in the aboveground community in a quadrat surrounding their root core. Furthermore, an ecologist is unlikely to take a core and find one of 100 species, as <20 probably occur in any given local area (2 × 2 m or so). As an extension, smaller cores could be used, as expected diversity (and false-positive rates) will be low. Subsampling and repeated analyses within single root cores could be an effective check on whether detections are limited by species diversity. Thus, a constrained list is a likely solution, though it will require accurate species lists aboveground and will miss cryptic species. The size of the root core, and the area which must be sampled aboveground for the constrained method, will depend highly on the system in question. For example, grasslands differ from forests in species–area relationships. Thus, the researcher will need to use prior knowledge of such relationships when developing a field sampling protocol for aboveground and belowground community samples. Even with these limits, this is a great improvement over currently available methods.

Here, we used a relational key to compare results from mixed samples to our known amplicon sizes per species. However, the data generated using this method are also amenable to analyses using other types of databases. For example, programs designed for PCR-RFLP data, such as TRAMPR (FitzJohn & Dickie 2007) or T-REX (Culman et al. 2009), permit the automatic analysis of many samples (see also Kent et al. 2003).

Conclusion

Size differences in three cpDNA regions are able to discern amongst the majority of species in a diverse community. Moreover, this efficient and high throughput approach can identify over 80% of the species in mixed samples. This technique is a significant improvement over other studies examining mixed DNA samples in roots in that it is amenable to examining larger species pools.

Acknowledgements

This work was funded by Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery grants to JCH and JFC, and an NSERC Accelerator Supplement to JFC. We thank C. Davis and the Molecular Biology Service Unit, Department of Biological Sciences, University of Alberta for technical advice with fragment analysis, V. Bowles for assistance in using GeneMapper, D. Fabijan for help with species identifications, and J. Bennett for useful conversations throughout this project.

References

Acinas SG, Sarma-Rupavtarm R, Klepac-Ceraj V, Polz MF (2005) PCR-induced sequence artifacts and bias: insights from comparison of two 16S rRNA clone libraries constructed from the same sample. Applied and Environmental Microbiology, 71, 8966–8969.
Avis PG, Dickie IA, Mueller GM (2006) A ‘dirty’ business: testing the limitations of terminal restriction fragment length polymorphism (TRFLP) analysis of soil fungi. Molecular Ecology, 15, 873–882.
Avis PG, Branco S, Tang Y, Mueller GM (2010) Pooled samples bias fungal community descriptions. Molecular Ecology Resources, 10, 135–141.
Bobowski BR, Hole D, Wolf PG, Bryant L (1999) Identification of roots of woody species using polymerase chain reaction (PCR) and restriction fragment length polymorphism (RFLP) analysis. Molecular Ecology, 8, 485–491.
Brisson J, Reynolds JF (1994) The effect of neighbors on root distribution in a Creosotebush (Larrea tridentata) population. Ecology, 75, 1693–1702.
Brunner I, Brodbeck S, Buchler U, Sperisen C (2001) Molecular identification of fine roots of trees from the Alps: reliable and fast DNA extraction and PCR-RFLP analyses of plastid DNA. Molecular Ecology, 10, 2079–2087.
Coupe MD, Stacey JN, Cahill JF (2009) Limited effects of above- and belowground insects on community structure and function in a species-rich grassland. Journal of Vegetation Science, 20, 121–129.
Culman SW, Bukowski R, Gauch HG, Cadillo-Quiroz H, Buckley DH (2009) T-REX: software for the processing and analysis of T-RFLP data. BMC Bioinformatics, 10, 10.
Dickie IA, FitzJohn RG (2007) Using terminal restriction fragment length polymorphism (T-RFLP) to identify mycorrhizal fungi: a methods review. Mycorrhiza, 17, 259–270.
Fisher MM, Triplett EW (1999) Automated approach for ribosomal intergenic spacer analysis of microbial diversity and its application to freshwater bacterial communities. Applied and Environmental Microbiology, 65, 4630–4636.
FitzJohn RG, Dickie IA (2007) TRAMPR: an R package for analysis and matching of terminal-restriction fragment length polymorphism (TRFLP) profiles. Molecular Ecology Notes, 7, 583–587.


Frank DA, Pontes AW, Maine EM et al. (2010) Grassland root communities: species distributions and how they are linked to aboveground abundance. Ecology, in press.
Holzapfel C, Alpert P (2003) Root cooperation in a clonal plant: connected strawberries segregate roots. Oecologia, 134, 72–77.
Jackson RB, Canadell J, Ehleringer JR et al. (1996) A global analysis of root distributions for terrestrial biomes. Oecologia, 108, 389–411.
Jackson RB, Moore LA, Hoffmann WA, Pockman WT, Linder CR (1999) Ecosystem rooting depth determined with caves and DNA. Proceedings of the National Academy of Sciences of the United States of America, 96, 11387–11392.
Kembel SW, Cahill JF (2005) Plant phenotypic plasticity belowground: a phylogenetic perspective on root foraging trade-offs. American Naturalist, 166, 216–230.
Kent AD, Smith DJ, Benson BJ, Triplett EW (2003) Web-based phylogenetic assignment tool for analysis of terminal restriction fragment length polymorphism profiles of microbial communities. Applied and Environmental Microbiology, 69, 6768–6776.
Lamb EG, Cahill JF (2008) When competition does not matter: grassland diversity and community composition. American Naturalist, 171, 777–787.
Linder CR, Moore LA, Jackson RB (2000) A universal molecular method for identifying underground plant parts to species. Molecular Ecology, 9, 1549–1559.
McNickle GG, Cahill JF, Deyholos MK (2008) A PCR-based method for the identification of the roots of 10 co-occurring grassland species in mesocosm experiments. Botany, 86, 485–490.
Mommer L, Wagemaker CAM, De Kroon H, Ouborg NJ (2008) Unravelling below-ground plant distributions: a real-time polymerase chain reaction method for quantifying species proportions in mixed root samples. Molecular Ecology Resources, 8, 947–953.
Moore LA, Field CB (2005) A technique for identifying the roots of different species in mixed samples using nuclear ribosomal DNA. Journal of Vegetation Science, 16, 131–134.
Popa R, Mashall MJ, Nguyen H, Tebo BM, Brauer S (2009) Limitations and benefits of ARISA intra-genomic diversity fingerprinting. Journal of Microbiological Methods, 78, 111–118.
Ridgway KP, Duck JM, Young JPW (2003) Identification of roots from grass swards using PCR-RFLP and FFLP of the plastid trnL (UAA) intron. BMC Ecology, 3, 8.
Schenk HJ, Jackson RB (2002) Rooting depths, lateral root spreads and below-ground/above-ground allometries of plants in water-limited ecosystems. Journal of Ecology, 90, 480–494.
Shaw J, Lickey EB, Beck JT et al. (2005) The tortoise and the hare II: relative utility of 21 noncoding chloroplast DNA sequences for phylogenetic analysis. American Journal of Botany, 92, 142–166.
Shaw J, Lickey EB, Schilling EE, Small RL (2007) Comparison of whole chloroplast genome sequences to choose noncoding regions for phylogenetic studies in angiosperms: the tortoise and the hare III. American Journal of Botany, 94, 275–288.
Taberlet P, Gielly L, Pautou G, Bouvet J (1991) Universal primers for amplification of three non-coding regions of chloroplast DNA. Plant Molecular Biology, 17, 1105–1109.
Tilman D, Pacala S (1993) The maintenance of species richness in plant communities. In: Species Diversity in Ecological Communities (eds Ricklefs RE & Schluter D), pp. 13–25. University of Chicago Press, Chicago, IL, USA.
Wardle DA (2002) Communities and ecosystems: linking the aboveground and belowground components. Princeton University Press, Princeton, NJ, USA.
