A Test of Sequence-Matching Algorithms for a DNA Barcode Database of Invasive Grasses
DNA BARCODES
Research Article • DOI: 10.2478/dna-2012-0002 • DNA • 2012 • 19–26

Syme, A.E.*, Udovicic, F., Stajsic, V., Murphy, D.J.

National Herbarium of Victoria, Royal Botanic Gardens Melbourne, Birdwood Avenue, South Yarra, Victoria, 3141, Australia

Abstract

Stipoid grasses (Poaceae, tribe Stipeae) include many species that are highly invasive. In Australia, several species are problematic environmental and economic weeds, degrading pastures, injuring livestock, and invading native grasslands. An accurate means of identification is the first line of defense against importation and establishment of potentially invasive stipoid grasses. Traditional morphological identification relies on floret characters, and because these characters are rarely available in juvenile or fragmentary material, DNA barcodes provide an alternative and rapid means of identification. Although barcodes themselves are tested to ensure appropriate discriminatory variation for identifying query sequences, there are few studies that report the testing of sequence matching algorithms. This limits the utility of sequence databases for DNA barcoding purposes. Therefore, in this study, we tested the efficacy of three sequence matching algorithms for stipoid grasses to determine the method and barcode that worked best. Using several sequence matching algorithms - BLAST, Neighbour Joining and Bayesian Likelihood - we assessed the success of identifying an “unknown” query sequence against a reference database of 206 specimens. The highest accuracy was achieved using the ITS (internal transcribed spacers) barcode region and the BLAST algorithm. The poorest performing barcode and analysis were rbcL and the Bayesian Likelihood analysis. However, the BLAST method was only slightly more successful than Neighbour Joining.
Increasing the number of query sequences would further indicate whether this trend is significant for stipoid grasses.

Keywords
Stipoid grasses • Invasive • Weed • Stipeae • Barcoding • BLAST • DNA barcodes

Received 13 August 2012
Accepted 22 November 2012

© Versita Sp. z o.o.

* E-mail: [email protected]

Introduction

Stipoid grasses (Spear-grasses; Poaceae, tribe Stipeae) comprise almost 600 species [1]. Several mainly South American species have become naturalized and highly invasive in Australia and in many other parts of the world. For example, Serrated Tussock, Nassella trichotoma (Nees) Hack. ex Arechav., is one of the 32 Weeds of National Significance in Australia (WONS) [2,3]. These weeds are regarded as potentially the worst weeds in Australia because of their invasiveness, capacity to spread, and economic and environmental impacts. Nassella trichotoma is also a weed in New Zealand [4], and is declared a federal noxious weed in the USA [5]. Chilean Needle-grass, N. neesiana (Trin. & Rupr.) Barkworth, is another significant weed in Australia [6], and in New Zealand [7]. Climate modeling predicts that it will also become a significant problem in Europe [8].

Because of the devastating potential of these and other exotic stipoid grasses, it has become increasingly important to rapidly and accurately identify suspect plant material, particularly for quarantine purposes at border entry points. Identifications based on morphology alone can be difficult, in part due to the scarcity of taxonomic expertise, or when only vegetative or incomplete specimen material is available. The use of DNA sequences, or “barcodes” [9], is an alternative way to accurately identify specimens, and this method has been successfully applied to identify invasive species in many taxonomic groups [10]. This approach is now being applied to stipoid grass weeds.

After a recent incursion of a potentially devastating new stipoid grass weed, Mexican Feather-grass (N. tenuissima), a large project was instigated at the Royal Botanic Gardens Melbourne (RBG) for molecular identification of stipoid grasses. This required the construction of a database of DNA barcodes from several genetic regions. In comparison to other DNA-based methods, DNA barcoding is commonly regarded as the best way to identify unknown specimens (reviewed by [11]). The advent of the Barcoding of Life Database (BOLD) [12] is a practical development further aiding the utility of DNA barcoding, in particular by supporting the sequence databases with links to vouchered specimens, and by the provision of analytical tools for identification of samples, effectively creating a one-stop shop for DNA barcoding.

Although various regions have been advocated as DNA barcodes for particular taxonomic groups, for land plants a combination of two plastid markers has been recommended: maturase K (matK) and ribulose-1,5-bisphosphate carboxylase/oxygenase (rbcL) [13]. These two barcodes are usually broadly useful across plant groups, but additional markers may need to be added for differentiating taxa in some genera [14]. The addition of markers inherited independently of the two chloroplast barcodes, such as nuclear genes, can improve the success of matching sequences to a barcode database [15]. In this study, we used nuclear internal transcribed spacers (ITS) sequences, in addition to rbcL and matK barcodes.

Although it is routine to test the relative success of DNA barcode regions, particularly in the construction of a new taxonomic database, the application of sequence-matching algorithms is considerably less well characterized. Such algorithms include measures of basic sequence similarity, phylogenetic reconstruction (both population- and species-level), and statistical classification methods. One of the most commonly used programs is the Basic Local Alignment Search Tool (BLAST), a heuristic method that approximates the Smith-Waterman algorithm for local sequence alignment [16]. The algorithm matches the query sequence to local alignments of particular size, followed by extension (to a set threshold) and an evaluation of significance. The BLAST program can be used on the web to search genetic databases, such as GenBank (http://BLAST.ncbi.nlm.nih.gov), and can be downloaded to local desktop computers.

Sequence matching can also be performed in a phylogenetic framework, where the nearest sister match to an individual or clade identifies the query sequence. Neighbour joining (NJ) uses an algorithm of minimum evolution to assess the best branching pattern between taxa based on their genetic distances [17]. Genetic distance is inferred from observed distances by using a model of nucleotide evolution that includes the relative rate of change between bases and corrects for multiple hits at one site. Maximum Likelihood (ML) and Bayesian Likelihood (BL) methods optimise the phylogenetic topology under various models of nucleotide evolution [18]. Both NJ and ML/BL methods require that the reference and query sequences are aligned first.

A range of other sequence matching algorithms have been developed, but at present are less commonly used. A character-based approach examines nucleotide substitutions at each site in the sequence alignment [19]. Oligonucleotide frequencies

Different selections of these algorithms have been tested with real and simulated datasets. Little and Stevenson [25] found that BLAST performed better (in terms of precision and accuracy) than parsimony and NJ methods. In contrast, Ross et al. [26] showed similar success from BLAST, distance and distance-based NJ methods, as did Austerlitz et al. [15], finding that even under a variety of simulations, success was similar for NJ, ML and statistical classification methods. Under changing parameters, the most consistently performing method was the distance-based 1-nearest-neighbour method, despite its relative simplicity. Munch et al. [27] reported that BL outperformed BLAST, but despite this, their results show that correct species-level assignments were made for only 51% of Liliopsida sequences, which may not be an acceptable result in practical application. They also found that BL was equivalent to a faster NJ plus bootstrapping approach [28].

From all these studies, however, no general consensus emerges regarding the best method to use for sequence matching: one that would accurately and precisely match a barcode sequence to the correct species name. This is unsurprising given the variability in taxa, barcodes, and possible algorithm parameters, but does suggest that DNA barcode datasets should be individually examined for different taxonomic groups in this context. Therefore, we tested the sequence-matching algorithms for DNA barcodes of stipoid grasses, determining the most suitable barcode and method for this group and maximizing the utility of the barcode database resource.

Methods

Construction of DNA barcoding database

To create a barcode reference database, we obtained DNA sequences (barcodes) for native Australian and exotic stipoid grasses. The primary biosecurity aim was to be able to distinguish exotic stipoid species from Australian native stipoid grasses. Consequently, the majority of species included were from the native genus Austrostipa, with additional species added from Nassella, the genus containing serious weeds, and also from other stipoid genera: Amelichloa, Anatherostipa, Celtica, Jarava, Piptatherum, Piptochaetium, and Stipa. All sequences were matched to expertly identified plant material (Supplementary Table 1).

Following preliminary testing of various loci, the barcoding