Highly Variable Chloroplast Markers for Evaluating Plant Phylogeny at Low Taxonomic Levels and for DNA Barcoding

Wenpan Dong1,2, Jing Liu1,3, Jing Yu1,3, Ling Wang2, Shiliang Zhou1*

1 State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing, China, 2 College of Landscape Architecture, Northeast Forestry University, Harbin, China, 3 Graduate University of Chinese Academy of Sciences, Beijing, China

Abstract

Background: At present, plant molecular systematics and DNA barcoding techniques rely heavily on the use of chloroplast gene sequences. Because of the relatively low evolutionary rates of chloroplast genes, there are very few choices suitable for molecular studies on angiosperms at low taxonomic levels, and for DNA barcoding of species.

Methodology/Principal Findings: We scanned the entire chloroplast genomes of 12 genera to search for highly variable regions. The sequence data of 9 genera were from GenBank and 3 genera were of our own. We identified nearly 5% of the most variable loci from all variable loci in the chloroplast genomes of each genus, and then selected 23 loci that were present in at least three genera. The 23 loci included 4 coding regions, 2 introns, and 17 intergenic spacers. Of the 23 loci, the most variable (in order from highest variability to lowest) were intergenic regions ycf1-a, trnK, rpl32-trnL, and trnH-psbA, followed by trnS(UGA)-trnG(UCC), petA-psbJ, rps16-trnQ, ndhC-trnV, ycf1-b, ndhF, rpoB-trnC, psbE-petL, and rbcL-accD. Three loci, trnS(UGA)-trnG(UCC), trnT-psbD, and trnW-psaJ, showed very high nucleotide diversity per site (π values) across three genera. Other loci may have strong potential for resolving phylogenetic and species identification problems at the species level. The loci accD-psaI, rbcL-accD, rpl32-trnL, rps16-trnQ, and ycf1 are absent from some genera.
To amplify and sequence the highly variable loci identified in this study, we designed primers from their conserved flanking regions. We tested the applicability of the primers to amplify target sequences in eight species representing basal angiosperms, monocots, eudicots, rosids, and asterids, and confirmed that the primers amplified the desired sequences of these species.

Significance/Conclusions: Chloroplast genome sequences contain regions that are highly variable. Such regions are the first consideration when screening for suitable loci to resolve closely related species or genera in phylogenetic analyses, and for DNA barcoding.

Citation: Dong W, Liu J, Yu J, Wang L, Zhou S (2012) Highly Variable Chloroplast Markers for Evaluating Plant Phylogeny at Low Taxonomic Levels and for DNA Barcoding. PLoS ONE 7(4): e35071. doi:10.1371/journal.pone.0035071

Editor: Ahmed Moustafa, American University in Cairo, Egypt

Received June 25, 2011; Accepted March 13, 2012; Published April 12, 2012

Copyright: © 2012 Dong et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This study was supported by National Natural Science Foundation of China (30930010, 30872062) and the Research Fund for the Large-scale Scientific Facilities of the Chinese Academy of Sciences (Grant No. 2009-LSF-GBOWS-01). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

* E-mail: [email protected]

Introduction

At present, techniques for studying the molecular phylogeny of plants rely heavily on chloroplast genome sequence data. This is because the chloroplast genome has a simple and stable genetic structure, it is haploid, there is no (or very rare) recombination, it is generally uniparentally transmitted, and universal primers can be used to amplify target sequences. Another important reason is the ease of PCR amplification and sequencing of chloroplast genes, despite some intrinsic problems similar to those encountered when using animal mitochondrial DNA [1]. Many fragments of coding regions, introns, and intergenic spacers, including atpB, atpB-rbcL, matK, ndhF, rbcL, rpl16, rps4-trnS, rps16, trnH-psbA, trnL-F, trnS-G, etc., have been used for phylogenetic reconstructions at various taxonomic levels [2,3,4,5,6,7]. Unfortunately, these regions often lack variation in closely related species, especially those that have diverged recently in evolution. Therefore, a concatenation of many individual genes must be used to improve the resolution of the phylogenetic analysis, and to obtain reasonable results. Such extra investments could be avoided if more variable locations were identified and universal primers were available.

Some regions of the chloroplast genome, for example, atpF-H, matK, psbK-I, rbcL, rpoB, rpoC1 and trnH-psbA, have been relied upon heavily for development of candidate markers for plant DNA barcoding [8,9,10,11,12]. The aim of DNA barcoding is to solve species identification problems, but some regions such as rbcL, rpoB, and rpoC1 are useful for identification at the family rather than species level. Recently, candidate loci and some other loci frequently used in phylogenetic analyses were critically evaluated for several flowering plant groups, including Amomum [13], Carex [14], Meteoriaceae [15], Cycadales [16], Compsoneura [17], Panax [18], peach [19] and tree peonies [20]. It seems that matK and trnH-psbA are the two most promising choices of chloroplast regions. The matK gene is one of the most versatile candidates so far, because it is useful for identification at family, genus, and even species levels. However, it is difficult to amplify and sequence this region from certain taxa, and additional universal primers and optimization of PCR reactions are necessary [21,22]. trnH-psbA is the most variable region in the chloroplast genome across a wide range of groups. However, there are some exceptions, and long mononucleotide repeats (poly-structures or single nucleotide microsatellites) can cause sequencing problems. Another problem is the presence of inversions in the middle of the sequence, which can lead to incorrect alignments [23].

Most of the regions that are commonly used for phylogenetic analyses were first identified in the 1990s, before entire genome sequences were available. Shaw et al. [24] summarized and evaluated the most frequently used chloroplast regions in seed plants, which significantly helped beginning researchers. Currently, about 191 entire chloroplast genomes are available, and some genera have two or more completely sequenced chloroplast genomes. Therefore, it is timely to reevaluate the variability of chloroplast regions at low taxonomic levels. Identification of variable loci in chloroplast genomes will be extremely useful for molecular systematics and DNA barcoding. Many plant species evolved via adaptive radiations or explosive patterns of speciation, and have evolutionary histories of only a few million years. The very short evolutionary histories result in low sequence divergence. The limited sequence variation is usually harbored in a few hotspots, and most of the loci available to researchers based on previous research provide very few informative characters.

To solve phylogenetic problems at the species level, or to identify species using DNA sequences, we need to identify regions with very high evolutionary rates. Greater availability of such regions will increase our ability to resolve such identification problems. Utilization of a larger number of regions of genes or sequences can minimize the noise of the evolutionary heterogeneity of genes or parts of a gene. Therefore, searching for more regions with high evolutionary rates is very important for plant phylogenetic analyses and for DNA barcoding. Fortunately, there are now many complete chloroplast genome sequences available, even for different species in the same genera. This information allows the identification of the most variable regions between or among species. In this paper, we summarize the results of comparative studies on chloroplast genomes of congeners of flowering plants. Our aim was to find the most variable regions that are common across many genera. Such regions can be used to resolve phylogenies and for DNA barcoding of closely related flowering plant species.

Table 1. Angiosperm genera in which complete chloroplast genomes have been determined in two or more species.

Genus         Species                       Family          Smax   Mean  Stdev
Acorus        A. americanus                 Acoraceae          3   0.82   0.32
              A. calamus
Aethionema    Ae. cordifolium               Brassicaceae      49   9.67   9.00
              Ae. grandiflorum
Calycanthus   C. chinensis                  Calycanthaceae    10   1.51   1.77
              C. floridus var. glaucus
Chimonanthus  Ch. nitens                    Calycanthaceae    10   1.32   1.49
              Ch. praecox
Eucalyptus    E. globulus subsp. globulus   Myrtaceae         10   1.08   1.52
              E. grandis
Gossypium     G. barbadense                 Malvaceae         28   1.44   2.59
              G. hirsutum
Nicotiana     N. sylvestris                 Solanaceae        16   4.00   3.49
              N. tabacum
              N. tomentosiformis
Oenothera     Oe. argillicola               Onagraceae        42   2.17   3.94
              Oe. biennis
              Oe. glazioviana
              Oe. parviflora
Oryza         O. nivara                     Poaceae           11   0.82   1.43
              O. sativa subsp. indica
Paeonia       P. brownii                    Paeoniaceae       31   8.04   5.82
              P. obovata
              P. suffruticosa
Populus       P. alba                       Salicaceae        18   2.02   2.53
              P. trichocarpa
Solanum       S. bulbocastanum              Solanaceae        26   5.03   4.67
              S. lycopersicum
              S. tuberosum

The maximum number of polymorphic sites (Smax), the mean number of polymorphic sites, and the standard deviation of polymorphic sites are shown for each genus.
doi:10.1371/journal.pone.0035071.t001

Results

Identification of most variable loci in chloroplast genomes

[...] variable loci present in at least one genus, 29 were shared by two or more genera, 23 by three or more genera, 11 by four or more genera, 10 by 5 or more genera, and only 5 by 6 or more genera. To provide reasonable choices, we further analyzed 23 loci (Table S1). Among them, ndhF, trnK (containing matK), ycf1-a, and ycf1-b are largely coding regions, clpP and ndhA are introns, and the other 17 are intergenic spacers (Fig.
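The screening logic described in the abstract and in the counts above (rank loci by variability within each genus, keep roughly the top 5% of variable loci, then retain loci shared by at least three genera) can be illustrated with a minimal Python sketch. This is not the authors' pipeline, and the software they used is not stated in this section. All function names are hypothetical, sequences are assumed to be pre-aligned uppercase strings of equal length, and any character other than A, C, G, or T is treated as missing data.

```python
import math
from collections import Counter
from itertools import combinations

VALID = set("ACGT")  # assumption: anything else (gaps, N) is ignored


def polymorphic_sites(seqs):
    """Count alignment columns that contain more than one distinct base."""
    count = 0
    for column in zip(*seqs):
        bases = {b for b in column if b in VALID}
        if len(bases) > 1:
            count += 1
    return count


def nucleotide_diversity(seqs):
    """Mean proportion of differing sites over all sequence pairs (pi)."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two aligned sequences")
    total = 0.0
    for a, b in pairs:
        compared = diffs = 0
        for x, y in zip(a, b):
            if x in VALID and y in VALID:
                compared += 1
                diffs += x != y
        total += diffs / compared if compared else 0.0
    return total / len(pairs)


def top_variable_loci(pi_by_locus, fraction=0.05):
    """Return the most variable loci among those with pi > 0 in one genus."""
    variable = {locus: pi for locus, pi in pi_by_locus.items() if pi > 0}
    if not variable:
        return set()
    keep = max(1, math.ceil(len(variable) * fraction))
    ranked = sorted(variable, key=variable.get, reverse=True)
    return set(ranked[:keep])


def shared_loci(top_by_genus, min_genera=3):
    """Return loci that appear in the top set of at least min_genera genera."""
    counts = Counter(l for loci in top_by_genus.values() for l in loci)
    return {locus for locus, n in counts.items() if n >= min_genera}
```

In use, one would call `nucleotide_diversity` on the aligned sequences of each locus within a genus, pass the resulting per-locus values to `top_variable_loci`, and then combine the per-genus sets with `shared_loci`. The 5% cutoff and the three-genera threshold come from the abstract. How ties, indels, and ambiguous bases are handled here is an illustrative choice only.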