<<

Annual Research & Review in Biology

34(1): 1-12, 2019; Article no.ARRB.53419 ISSN: 2347-565X, NLM ID: 101632869

Bioinformatics Analysis on DNA Sequences for : A Review

Huyen-Trang Vu1,2 and Ly Le2*

1Faculty of Biotechnology, Nguyen Tat Thanh University, 298A-300A Nguyen Tat Thanh Street, Ward 13, Ho Chi Minh City, District 4, 72820, Vietnam. 2Faculty of Biotechnology, International University - Vietnam National University, Vietnam.

Authors’ contributions

This work was carried out in collaboration between both authors. Author HTV designed the study, performed the statistical analysis, wrote the protocol, managed the literature searches and wrote the first draft of the manuscript. Author LL managed the analyses of the study. Both authors read and approved the final manuscript.

Article Information

DOI: 10.9734/ARRB/2019/v34i130142 Editor(s): (1) Dr. Xiao-Xin Yan, Professor, Department of Anatomy and Neurobiology, Xiangya School of Medicine (CSU-XYSM), Central South University, China. Reviewers: (1) Bhaba Amatya, Tribhuvan University, Nepal. (2) Bipinchandra B. Kalbande, Rashtrasant Tukadoji Maharaj Nagpur University, India. Complete Peer review History: http://www.sdiarticle4.com/review-history/53419

Received 12 October 2019 Accepted 22 December 2019 Review Article Published 28 December 2019

ABSTRACT

Classification of organisms is the primary step in management of biodiversity, breeding, conservation and development of populations and distinguishing adulterant objects. There are many approaches in taxonomic identification, from morphological, PCR-based to sequence-based techniques. Molecular methods give more accurate results than morphological comparisons and are independent of stages. PCR-based methods are low-cost but their limited information gives less reproducibility and can only distinguish samples among determined groups. In contrast, in sequence-based methods each nucleotide site is considered as genetic information hence a sequence of nucleotide represents large data, which is highly specific and more stable than PCR bands. Establishment of worldwide DNA library for barcoding is essential. There were previous reviews on screenings and applications of in different taxa. In this review we discussed common analyses as well as some new improved techniques relying on barcoding approaches.

______

*Corresponding author: E-mail: [email protected];

Vu and Le; ARRB, 34(1): 1-12, 2019; Article no.ARRB.53419

Keywords: Molecular classification; bioinformatics tools; DNA barcoding; sequence analysis; identification technique.

ABBREVIATIONS evaluating genetic relationship of a new query sample based on the available sequence library AFLP : Amplified Fragment Length and thus allow to consult origin of the unknown Polymorphism; homologous . DNA polymorphism may ARMS : Amplification Refractory Mutation even provide more information than proteins due System; to the degradation of genetic code and the BA : Bayesian; presence of large non-coding stretches [7]. DNA BLAST : Basic Local Alignment Search Tool; fragments represented for the organism can be Cp : Complete; used as an identifying sequence like the human cpDNA : DNA; fingerprint (Fig. 1). This is the most common IR : Inverted Repeat; method in molecular identification strategy today. LSC : Large single-copy region; In , the highly of MAP : Maximum a posteriori; mitochondrial cytochrome c oxidase subunit I ML : Maximum Likelihood; (COI), which relates to oxidative phosphorylation MP : Maximum Parsimony; for metabolism, was effectively used as barcode NGS : Next Generation ; in diverse animal species [8,9]. Nevertheless NJ : Neighbor-joining; there was no such effective barcode for . OUT : Operational Taxonomic Units; PCR : Polymerase Chain Reaction; RADP : Random Amplified Polymorphic DNA; RFLP : Restriction Fragment Length Polymorphism; SCAR : Sequence Characterized Amplified Region; SNP : Single Nucleotide Polymorphism; SSC : Small single-copy region; Fig. 1. DNA barcoding scheme SSR : Simple Sequence Repeat; (https://en.wikipedia.org/wiki/DNA_barcoding)

UPGMA : Unweighted Pair Group Method with A numerous studies of barcoding for plants have Arithmetic mean. been conducted. To date, sequencing techniques 1. INTRODUCTION for determining species using next generation sequencing (NGS) in plants are limited primarily Collection of genetic information, for looking up to agricultural crops [10-13] due to their high cost the origin of a wide range of organisms linked all and time. Therefore short sequences are still over the world, is an advanced and essential considered as convenient and effective toosl due idea for the protection of species, phylogenetic to the quick and accurate sequencing [14]. A inference, management and development of number of studies have reviewed on finding and genetic diversity [1,2]. Morphological methods applying of DNA barcoding in which the show limitations of accuracy and high reliance on efficiency of different biomarkers have been reproductive organs. PCR-based methods can summarized [15-22]. Others discussed the overcome these above problems with just a small effectiveness of different barcoding techniques, piece of sample. However, amplification criteria and measurements [7,23,24]. In this techniques can only be effectively applied to review, we discussed some bioinformatics tools samples of a defined group using RADP, RFLPs, in identification analysis of barcoding studies. We AFLPs [3] or to samples containing specific also presented some prospects and developed using species-specific PCR [4-6]. An techniques relied on barcoding approaches. unknown taxon cannot be determined using PCR-band techniques. 2. COMMON BIOINFORMATICS ANALYSES IN IDENTIFICATION Since each site of sequence is considered as a TECHNIQUES USING SEQUENCES character in bioinformatics analysis, the sequence-based method gives more variable Bioinformatics procedure for species information. This approach allows the read of identification using sequences comprises two every nucleotide among the samples. Specific basic steps: First, the sequence alignment and stable features of monomers are useful in should be conducted on the basis of a

2

Vu and Le; ARRB, 34(1): 1-12, 2019; Article no.ARRB.53419

comparative step. This alignment can be In the alignment step, sequences can be aligned performed by two methods. The novel across their entire length (global alignment) or sequences are pairwise aligned against the only in certain regions (local alignment). For available sequences in certain databases BLAST (Basic Local Alignment Search Tool) (similarity search) [25,26] or studied sequences (https://blast.ncbi.nlm.nih.gov/Blast.cgi) and are aligned against each other in a specific set of FASTA [40] programs, each examined sequence data (many-against-each other search) [2,27,28]. is considered a query sequence. A local pairwise Following, based on these alignment data, alignment is running between two biological similarity or variation are investigated. This step sequences: the query sequence and each of is performed by evaluating such parameters as database sequences. In contrast to similarity GC% content [27,29], genetic distance [30-33], search, alignment process in many-against-each variable sites [32,34], indel appearances [35], or other search is based on two stages, the monophyletic clusters [36-38], which can indicate pairwise alignments followed by multiple typical characters of the examined sequences. alignments. In multiple alignment, whether These measurements vary from different studies global alignment or local alignment should be as they depend on alignment and identification applied depending on similarity level and methods (Table 1). Final target of this step is to difference in sequence lengths. ClustalW is a decide whether the species are different or not. fully automatic program for global multiple In some studies, species resolution was alignment of DNA and protein sequences[41]. calculated by counting the number of identified However, for MAFFT [42] and MUSCLE [43] species out of the total number of examined programs, users can select suitable aligning species [2,31,39]. strategies. Global alignment is effective

Table 1. List of common sequence-based identification methods and identification criteria

Alignment methods Species Identification methods Identification criteria Similarity search Blast Correct Identification Ambiguous Identification Incorrect Identification Many-against-each other Genetic distance- Nearest Correct Identification search based distance Ambiguous Identification Incorrect Identification Many-against-each other Best match Correct (match) search Ambiguous Incorrect (mismatch) Best close Correct (match) match Ambiguous Incorrect (mismatch) Nomatch (under Threshold (%) All species Correct (match) barcodes Ambiguous Incorrect (mismatch) Many-against-each other Barcoding gap Intra-specific genetic distance search Inter-specific genetic distance Barcoding gap Many-against-each other Tree-based Monophylectic search Paraphylectic Polyphylectic Many-against-each other Character-based Variable sites search (nucleotide polymorphism) Indels Sequence lengths GC% contents

3

Vu and Le; ARRB, 34(1): 1-12, 2019; Article no.ARRB.53419

when the input sequences share global Another method is the use of nucleotide homology, and the similarity level is high. On the polymorphism features, such as variable sites, other hand, with fragmentary and divergent sequence length variation, indel information, sequences, local alignment would be the better GC% content [4,19,32,53,54] as tools of option [44]. For “BLAST” method, the target is identification. This approach is known as searching for the best homologous sequences character-based method in which each (best hits) from the available databases nucleotide is consider as the fifth character (GenBank, BOLD, others). Based on this beside four traditional characters A, T, C and G. background, “Correct Identification” means that: the best hit is the sequence with species name Regardless of which method is used, the core as expected. “Ambiguous Identification”: the best property of the identification strategy is that all hits are some sequences belong to different conspecific sequences should be grouped species including the expected species. together without blending with any other species. “Incorrect Identification”: the best hit does not To address this issue, the tree-based method is match with expected species [25,26,28,45]. a simple and visualized approach that is most However, because “BLAST” depends on common in such classification studies [2,25- available sequences reported in databases, 27,29,34,36-39,45,46,48,49,55]. Two species are query species must be already included in a completely separated when all sequences of one database otherwise the result will be a failure, species are clustered in a monophyletic branch therefore a negative “Incorrect Identification” will [56]. There are some problems that should be occur. noticed to avoid errors in identification process. Small sample size [4,26,29,36,51,57-59] or a For genetic distance-based methods, the “best wide range but less con-level of taxonomic hit” feedback of “Correct”, “Ambiguous” and samples [27,60,61] may lead to over-fitting [56]. “Incorrect” criteria are similar to that in “BLAST”, but based on the smallest genetic distances 3. COMMON BIOINFORMATICS TOOLS (“Nearest Distance”) [28] or the most similarity IN IDENTIFICATION TECHNIQUES (“Best match”, “Best close match”) [46-49]. The query sequence is compared with each other in USING SEQUENCES the given data set. “Best close match” differs from “Best match”. A similarity threshold value For various measurements, different (e.g. 95%) of all intraspecific distances is bioinformatics tools could be used also. The established to determine how similar a barcode BLAST method is presented by BLASTn program match needs to be and the results under this from NCBI. Intra- and inter-specific genetic similar value (No match) would be removed distances, matching sequences and clustering before identification step. sequences based on pairwise distances can be calculated in “Best match”, “Best close match” For “All species barcodes” methods, a list of and “All species barcodes” methods by Species sequences sorted by similarity to each query Identifier tool using TaxonDNA software. When using the same threshold as for “Best close multiple regions are compared for selection of match” are assembled. If the query is followed by optimal biomarkers, TaxonGap program is used all conspecific barcodes with at least two ones to infer intra- and inter-specific distance for then the species is identified. If the query is “Nearest” method in high-throughput sequencing followed by only one or some of conspecific researches [62]. barcodes then the identification is ambiguous. If the query is followed by none of conspecific For tree-based reconstruction, probability model barcodes but other species then misidentification should be estimated. Neighbor-joining (NJ), occurs [47]. Maximum Likelihood (ML), Maximum Parsimony (MP), Bayesian (BA) or Unweighted Pair Group Barcoding gap method analyses the divergence Method with Arithmetic mean (UPGMA) are between intra-specific and inter-specific genetic common phylogenetic algorithms inferred along distances of each query versus other conspecific with suitable models. Which method should be and hetero-specific sequences in a data set [25- used in different studies is still a problem that 28, 30-33,50,51]. However this method may be need to be noticed for obtaining the most not precise in case of using mean instead of accurate results. Some studies performed smallest inter-specific distances versus largest comparisons on different methods [27,45]. intra-specific distances [52]. If barcoding gap Although algorithms are inferred from exists, the species is successfully identified. suppositions, having a thorough understanding of

4

Vu and Le; ARRB, 34(1): 1-12, 2019; Article no.ARRB.53419

our algorithms and data is the best way to accurate and reliable results, PAUP* and achieve the highest efficiency. The standard MRBAYES are often used although it takes much principal of tree building is to examine all more time. Maximum Likelihood and Maximum possible topologies or certain topologies that Parsimony can also be calculated using PAUP* represent the true structure. Neighbor-joining [70]. Data was estimated for revolutionary performance [39,46] is a time-consuming method models using software jModelTest [71] before in reconstructing phylogenetic trees. It finds pairs running in PAUP*. The software MRBAYES of operational taxonomic units (OTUs) that analyses the Bayesian Inference [72]. However, minimize the total branch length at each stage of the use of PAUP* and MRBAYES needs OTUs clustering and starts with a star-like tree bioinformatics skill to run the program, which is [63]. However the reliability of Neighbor-joining quite complex for some researchers. tree is a problematic issue [47] as it quickly generates only one while 4. RELATION BETWEEN MOLECULAR others may be more fitting. In contrast, Maximum IDENTIFICATION AND APPLICATION Likelihood (ML) accounts the probability for all FOR PHYLOGENETIC STUDY events that can happen simultaneously and the best tree is supported at a higher For discrimination of species, tree-based method probability. Hence ML become a powerful and seems to be the most common technique in professional method in phylogenetic algorithm many different studies on variety taxa. [64,65] although it requires significant running Phylogenetic tree can be presented for both time for optimal tree especially with large data phylogenetic and barcoding studies. However [66,67]. phylogenetic analysis represents the measurement and estimation of evolutionary The Maximum parsimony method is based on past. Whereas barcoding analysis is used to the least character state changes required to identify taxa in certain taxonomic groups infer a tree. In case of heterogeneous evolution, [2,39,48] or to determine a new taxon [60,73] by maximum parsimony (MP) is strongly biased DNA sequence comparison. In barcode- towards recovering an incorrect tree. However phylogenetic tree, the differences between this method outperforms ML over a wide range nucleotide characters are more important than of conditions, including low and moderate the way they form through the evolutionary time. heterogeneity [68]. This means that taxa of the same species should be clustered in a monophyletic branch and the While both ML and MP use the probabilities different ones should be distributed in separated called likelihood, the Bayesian (BA) technique [45,49]. The length of branches and the represents the posteriori probability (Bayes’ members of clusters are not strictly evaluated. rule). A known theory called prior is Therefore, barcode-phylogenetic tree is not really implemented. The posteriori is in direct a phylogenetic tree. However, as the trees are proportion with the product of likelihood and prior built based on specific DNA sequences, [67]. Bayesian posterior probability gives more- information from them can be used for generous estimates of subtree reliability than the phylogenetic investigations. Barcoding and maximum likelihood analysis, particularly when phylogenetic relationships of species have also using the gamma distribution modeling [65,67]. been studied in combination previously When the size of data is small, the probabilities [28,29,36,74]. For phylogenetic relationship inferred from ML may be over-fitting. Maximum analyses, homologous variations at each a posteriori (MAP) can be taken accounts to alignment site are considered. In this case, solve this problem. Maximum Likelihood or Maximum Parsimony is used instead of Neighbor-joining [27]. Regarding genetic distance and tree-based methods, MEGA is the most popular software Yang et al. (2013) used six data sets including used due to its friendly interface and optimal sequences of whole complete (cp) genomes, analysis time. Since genetic variations have to be protein-coding exons, large single-copy region, inferred from evolutionary distance matrices, small single-copy region, inverted repeat region, MEGA versions have integrated these and spacers for phylogenetic tree evolutionary models into their program along with establishments to indicate congruent among different algorithms, i.e. Neighbor-joining, different data partitions (Fig. 2). Only cp genome Maximum Likelihood, Maximum Parsimony and tree and the and intergenic spacer tree UPGMA [69]. However, the number of models in gave genetic similarity. The relationships MEGA is pruned for its convenience. For more between seven species C. danzaiensis,

5

Vu and Le; ARRB, 34(1): 1-12, 2019; Article no.ARRB.53419

Fig. 2. Phylogenetic trees constructed from different data partitions from whole chloroplast genome, with all clades were absolutely separated by high genetic variation [75] (CP Genome: complete genome; LSC: large single-copy regions; SSC: small single-copy regions; IR Region: inverted repeat region; numbers above the lines on the left indicate the maximum parsimony bootstrap of each >50%; numbers above the lines on the right indicate the Bayesian posterior probabilities; numbers below each branch are the maximum likelihood bootstrap of each clade >50%)

C. pitardii, C. impressinervis, C. cuspidata, C. levels but still have some limitations with closely taliensis 7, C. taliensis 8 and C. yunnanensis related taxa. Some researchers have suggested were not consistent in other trees [75]. This result the way of serving complete chloroplast genome posed a question: whether a partial DNA as a single barcode in plants [31,76]. This sequence region is sufficient to represent method was called ultra-barcoding by Kane et al. species and the relationship between them. The when they compared nine complete plastid great data of whole genome might be more genotypes of Theobroma cacao and one related reliable for estimating the evolution compared to species T. grandiflorum with a control GenBank shorter barcodes. accession [77]. The complete chloroplast DNA (cpDNA) could absolutely separate all examined 5. THE USE OF COMPLETE intra-species in the study. This approach promise CHLOROPLAST GENOME AS ULTRA new effective applications in identification at BARCODE (SUPER-BARCODE) species and below-species level [78]. Fritillaria, a popular in China, Short DNA sequences can solve most questions experienced some efforts in characterizing in identification at species and above-species closely related species using universal

6

Vu and Le; ARRB, 34(1): 1-12, 2019; Article no.ARRB.53419

molecular markers (ITS, trnL-trnF ect.) but could techniques only react upon annealing to specific not be distinguished entirely [79-81]. Using DNA sequences and give more specific and complete chloroplast sequences, phylogenetic reproducible results than random amplifications. analyses of eight Fritillaria species were well SCAR (sequence characterized amplified resolved in the study of Bi et al. [82]. In other regions) technique succeeded in authentication studies, even no potential regions in cp genome three species of Paphiopedilum armeniacum, were proposed but the complete genome itself Paphiopedilum micranthum, Paphiopedilum had a capability in distinguishing samples as a delenatii and their hybrids by developing three single barcode [83]. The interesting thing is that species-specific primer pairs from ITS sequences you can also use this meta-data to develop [6]. Some studies focused on developing and potential mini-barcodes which are high variability comparing simple sequence repeat (SSR) for quick authentication of certain taxa [84-88]. system among screened taxa [86,91,92]. Kim et al. designed new primers for ARMS This genomic strategy not only allows the use of (amplification refractory mutation system) complete cpDNA as a single barcode, but also technique based on specific insertion in facilitate the utilization of other data partitions to sequence of Cypripedium macranthos and SNPs identify plants. Protein-coding exons, large (single nucleotide polymorphisms) in sequences single-copy region, small single-copy region, of Cypripedium japonicum and Cypripedium inverted repeat region, introns and intergenic formosanum) located inside atpF-atpH barcode. spacers (Fig. 2) [75], pseudogenes [89] or the These three primer pairs can be used in differences between size, number of annotated combination to distinguish four Cypripedium genes [86] could be taken into accounts in species with different-size bands on the phylogenetic analyses. Length-variations which electrophoresis gel [4]. RFLP (restriction based on the differences of indels (insertions and fragment length polymorphism) technique is a deletions) at certain locations of cpDNA genome type of random PCR-based method in were also considered as effective barcodes identification of subjects. However, RFLP based [35,90]. on the combination of ITS and some chloroplast sequences gave more reproducible and Genetic information of this super-data is successful identification of native Dendrobium definitely great, enough to avoid the analytical species in Thailand by Peyachoknagula et al. limitation by different bioinformatics methods [93]. Therefore using known barcodes to develop such as Maximum Likelihood (ML), Maximum other time- and cost-saving methods can also Parsimony (MP) or Bayesian (BA). This means support molecular identification. that the topologies of phylogenetic trees based on these three algorithms, which be usually Furthermore, using high-through congruent using short DNA sequences, are put sequencing can help identify a variety of highly similar in the case [75]. species in multiple samples. A detection of species was performed simultaneously in 55 Since the cost for whole genome sequencing has commercial salep products based on ITS significantly decreased, from $2.7 billion in 2003 barcode. Each sample was found to contain 1-55 for the first human genome to $300,000 in 2006 species [94]. RbcL, matK and ITS were also and from there to $1,000 in 2016, a series of used in combination to determine 16 orchid studies on plant barcoding was published in the species as components of a common food next two years 2017 and 2018. Along with named Chikanda in Zambia [95]. Based on this sequencing and assembling techniques, whole- foundation, the authors alerted the over- plastome barcode may offer more informative harvesting condition and called for the sites and is considered as accurate and effective conservation of these rare orchids. This single barcode for identification in plants. technique also opened a new trend in application of DNA barcodes. 6. APPLICATION OF BARCODES TO DEVELOP OTHER IDENTIFICATION 7. CONCLUSIONS TECHNIQUES Barcoding technique had been shown to be Nucleotide information can be used to develop useful in many practical applications and some species-specific amplification markers for classification studies. This method promises quick and cheap identification of specific more optimal results than the PCR method subjects. PCR primers derived from these especially when the price of the sequencing is

7

Vu and Le; ARRB, 34(1): 1-12, 2019; Article no.ARRB.53419

decreasing. In barcoding technique using papua’s endemic orchids based on RAPD sequences, the chosen loci and algorithms are markers. Natural Science. 2017;9:377-385. directly related to the identification results. 4. Kim JS, Kim HT, Son S-W, Kim J-H. Molecular identification of endangered Whole genome comparison based on next Korean lady’s slipper orchids generation sequencing and high-throughput (Cypripedium, Orchidaceae) and related sequencing has been shown to be more taxa. . 2015;93(9):603-610. convenient and contain greater data than 5. Peyachoknagul S, Mongkolsiriwatana C, traditional barcoding. However, in terms of price, Wannapinpong S, Huehne PS, Srikulnath it is about $200 per sample up to now, which is K. Identification of native dendrobium 20 times more expensive than mini-barcodes species in Thailand by PCR-RFLP of ($11-13). In terms of technology, a powerful rDNA-ITS and chloroplast DNA. Science computer, which is not available in small Asia. 2014;40:113–120. laboratories, is needed to manage large genome 6. Sun YW, Liao YJ, Hung YS, Chang JC, data. Besides, it also takes a considerable time Sung JM. Development of ITS from the sequencing to the analyzing. For those sequence based SCAR markers for reasons, traditional barcoding methods are still discrimination of Paphiopedilum popular and effective in many cases. Hence armeniacum, Paphiopedilum micranthum, these two barcoding trends can be performed Paphiopedilum delenatii and their hybrids. according to the demands and conditions of Scientia Horticulturae. 2011;127(3):405- different laboratories. In parallel, the sequencing 410. techniques and analyzing tools should be 7. Pereira F, Carneiro J, Amorim A. improved to make it simpler and more convenient Identification of species with DNA-based for researchers. The next orientation of technology: Current progress and identification should be a species identification challenges. Recent Patents on DNA and gadget that is portable and requires no Sequences. 2008;2(3):187-99. amplification step. 8. Yang F, Ding F, Chen H, He M, Zhu S, Ma X, Jiang L, Li H. DNA barcoding for the ACKNOWLEDGEMENT identification and authentication of animal species in traditional medicine. Evidence- The study was supported by Air Force Office of Based Complementary and Alternative Scientific Research and Asian Office of Medicine. 2018;2018:18. Aerospace Research and Development 9. Waugh J. DNA barcoding in animal (grant number FA23861514119). All authors species: Progress, potential and pitfalls. declared no other competing financial Bioessays. 2007;29(2):188-97. interests. 10. Li X, Wu L, Wang J, Sun J, Xia X, Geng X, Wang X, Xu Z, Xu Q. Genome sequencing COMPETING INTERESTS of rice subspecies and genetic analysis of recombinant lines reveals regional yield- Authors have declared that no competing and quality-associated loci. BMC Biology. interests exist. 2018;16(1):102. 11. Ray S, Satya P. Next generation REFERENCES sequencing technologies for next generation plant breeding. Frontiers in plant science. 2014;5:367-367. 1. Kress WJ, Wurdack KJ, Zimmer EA, Weigt 12. Thottathil GP, Jayasekaran K, Othman AS. LA, Janzen DH. Use of DNA barcodes to Sequencing Crop Genomes: A Gateway to identify flowering plants. Proceedings of Improve Tropical Agriculture. Tropical life the National Academy of Sciences of the sciences research. 2016;27(1):93-114. United States of America. 2005;102(23): 8369-8374. 13. Zhou X, Bai X, Xing Y. A Rice Genetic 2. Kim HM, Oh SH, Bhandari GS, Kim CS, Improvement Boom by Next Generation Park CW. DNA barcoding of Orchidaceae Sequencing. Current Issues in Molecular in Korea. Molecular Resources. Biology. 2018;27:109-126. 2014;14(3):499-507. 14. Hebert PD, Cywinska A, Ball SL, deWaard 3. Abbas B, Dailami M, Listyorini FH, Munarti. JR. Biological identifications through DNA Genetic variations and relationships of barcodes. Proceedings of the Royal

8

Vu and Le; ARRB, 34(1): 1-12, 2019; Article no.ARRB.53419

Society. B, Biological sciences. 2003;270 Peninsular Malaysia. Phytotaxa. 2019;387 (1512):313-21. (2):94-104. 15. Das S, Deb B. DNA barcoding of fungi 27. Chattopadhyay P, Banerjee G, Banerjee N. using Ribosomal ITS Marker for genetic Distinguishing orchid species by DNA diversity analysis: A Review. International barcoding: Increasing the resolution of Journal of Pure and Applied Bioscience. population studies in plant biology. Omics. 2015;3(3):160-167. 2017;21(12):711-720. 16. Selvaraj D, Park JI, Chung MY, Cho YG, 28. Feng S, Jiang Y, Wang S, Jiang M, Chen Ramalingam S, Nou I-S, Utility of DNA Z, Ying Q, Wang H. Molecular Barcoding for Plant Biodiversity Conserva- Identification of Dendrobium Species tion. 2013;1. (Orchidaceae) based on the DNA barcode 17. Techen N, Parveen I, Pan Z, Khan IA. ITS2 region and Its application for DNA barcoding of medicinal plant material phylogenetic study. International journal of for identification. Current Opinion in molecular sciences. 2015;16(9):21975- Biotechnology. 2014;25:103-110. 21988. 18. Teixeira da Silva JA, Jin X, Dobranszki J, 29. Wu CT, Gupta SK, Wang AZM, Lo SF, Kuo Lu J, Wang H, Zotz G, Cardoso JC, Zeng CL, Ko YJ, Chen CL, Hsieh CC, Tsay HS. S. Advances in dendrobium molecular Internal transcribed spacer sequence research: Applications in genetic variation, based identification and phylogenic identification and breeding. Molecular relationship of Herba Dendrobii. Journal of and Evolution. 2016;95:196- Food and Drug Analysis. 2012;20(1):143- 216. 151. 19. Vu H-T, Huynh P, Tran H-D, Le L. In silico 30. Ginibun FC, Saad MRM, Hong TL, Othman study on molecular sequences for RY, Khalid N, Bhassu S. Chloroplast DNA identification of Paphiopedilum species. barcoding of Spathoglottis species for Evolutionary Bioinformatics. 2018;14: genetic conservation. Acta Hortuculturae. 117693431877454. 2010;878:453-460. 20. Fiser Pecnikar Z, Buzan EV. 20 years 31. Singh HK, Parveen I, Raghuvanshi S, since the introduction of DNA barcoding: Babbar SB. The loci recommended as From theory to application. Journal of universal barcodes for plants on the basis Applied Genetics. 2014;55(1):43-52. of floristic studies may not work with 21. Hollingsworth PM. Refining the DNA congeneric species as exemplified by DNA barcode for land plants. Proceedings of the barcoding of Dendrobium species. BMC National Academy of Sciences of the Research Notes. 2012;5:42. United States of America. 2011;108(49): 32. Tanee T, Chadmuk P, Sudmoon R, 19451-2. Chaveerach A, Noikotr K. Genetic analysis 22. Saddhe AA, Kumar K. DNA barcoding of for identification, genomic template stability plants: Selection of core markers for in hybrids and barcodes of the Vanda taxonomic groups. Plant Science Today. species (Orchidaceae) of Thailand. African 2018;5:1. Journal of Biotechnology. 2012;11(55): 23. Duan H, Chen F, Liu W, Zhou C, Zhou Y. 11772-11781. Research and applications of DNA 33. Yao H, Song JY, Ma XY, Liu C, Li Y, Xu barcode in identification of plant species. HX, Han JP, Duan LS, Chen SL. Research in Plant Biology. 2014;4(3). Identification of Dendrobium species by a 24. Ganie SH, Upadhyay P, Das S, Prasad candidate DNA barcode sequence: The Sharma M. Authentication of medicinal chloroplast psbA-trnH intergenic region. plants by DNA markers. Plant Gene. 2015; Planta Medica. 2009;75(6):667-9. 4:83-99. 34. Givnish TJ, Spalink D, Ames M, Lyon SP, 25. Parveen I, Singh HK, Malik S, Hunter SJ, Zuluaga A, Iles WJD, Clements Raghuvanshi S, Babbar SB. Evaluating MA, Arroyo MTK, Leebens-Mack J, Lorena five different loci (rbcL, rpoB, rpoC1, E, Ricardo K, Kurt MN, Whitten WM, Norris matK, and ITS) for DNA barcoding of HW, Kenneth MC. Orchid Indian orchids. Genome. 2017;60(8):665- and multiple drivers of their extraordinary 671. diversification. Proceedings of the Royal 26. Rajaram MC, Yong C, Azlan GJ, Go R. Society B. 2015;282:20151553. DNA barcoding of endangered 35. Santos C and Pereira F. Identification of Paphiopedilum species (Orchidaceae) of plant species using variable length

9

Vu and Le; ARRB, 34(1): 1-12, 2019; Article no.ARRB.53419

chloroplast DNA sequences. Forensic in Venus Slipper (Paphiopedilum). PLOS Science International: Genetics. 2018;36: ONE. 2016;11(1):e0146880. 1-12. 47. Meier R, Shiyang K, Vaidya G, Ng PK. 36. Asahina H, Shinozaki J, Masuda K, DNA barcoding and in Diptera: A Morimitsu Y, Satake M. Identification of tale of high intraspecific variability and low medicinal Dendrobium species by identification success. Systematic Biology. phylogenetic analyses using matK and 2006;55(5):715-28. rbcL sequences. Journal of Natural 48. Xiang XG, Hu H, Wang W, Jin XH. DNA Medicines. 2010;64(2):133-8. barcoding of the recently evolved genus 37. Tang Y, Yukawa T, Bateman RM, Jiang H, Holcoglossum (Orchidaceae: Aeridinae): A Peng H. Phylogeny and classification of test of DNA barcode candidates. Molecular the East Asian Amitostigma alliance Ecology Resources. 2011;11(6):1012-21. (Orchidaceae: Orchideae) based on six 49. Xu S, Li D, Li J, Xiang X, Jin W, Huang W, DNA markers. BMC . Jin X, Huang L. Evaluation of the DNA 2015;15:96. Barcodes in Dendrobium (Orchidaceae) 38. Tran DD, Khuat HT, La TN, Nguyen TTT, from Mainland Asia. PLOS ONE. 2015; Pham BH, Nguyen TK, Tran HD, Do MT, 10(1):e0115168. Tran DK. Identification of vietnamese 50. Gao T, Yao H, Song J, Liu C, Zhu Y, Ma X, native Dendrobium species based on Pang X, Xu H, Chen S. Identification of ribosomal DNA internal transcribed spacer medicinal plants in the family Fabaceae sequence. Advanced Studies in Biology. using a potential DNA barcode ITS2. 2018;10(1):1-12. Journal of Ethnopharmacology. 2010;130 39. Poovitha S, Stalin N, Balaji R, Parani M. (1):116-21. Multi-locus DNA barcoding identifies matK 51. Siripiyasing P. DNA barcoding of the as a suitable marker for species Cymbidium species (Orchidaceae) in identification in Hibiscus L. Genome. 2016; Thailand. African journal of agricultural 59(12):1150-1156. research. 2012;7(3):393-404. 40. EMBL-EBI. FASTA Help and 52. Meier R, Zhang G, Ali F. The use of mean Documentation; 2019. instead of smallest interspecific distances Available:https://www.ebi.ac.uk/seqdb/conf exaggerates the size of the “Barcoding luence/display/JDSAT/FASTA+Help+and+ Gap” and leads to misidentification. Documentation. Systematic Biology. 2008;57(5):809-813. 41. Lloyd A. The Clustal W WW server at the 53. Guo YY, Luo YB, Liu ZJ, Wang XQ. EBI; 1997. Evolution and biogeography of the slipper Available:http://www.ebi.ac.uk/embnet.new orchids: Eocene vicariance of the s/vol4_3/clustalw1.html. conduplicate genera in the old and new 42. Katoh K, Asimenos G, Toh H. Multiple World Tropics. PLOS ONE. 2012;7(6): alignment of DNA sequences with MAFFT. e38788. Methods in molecular biology (Clifton, 54. Shaw J, Lickey EB, Schilling EE, Small RL. N.J.). 2009;537:39-64. Comparison of whole chloroplast genome 43. Edgar RC. MUSCLE: Multiple sequence sequences to choose noncoding regions alignment with high accuracy and high for phylogenetic studies in angiosperms: throughput. Nucleic Acids Res. 2004;32 the tortoise and the hare III. American (5):1792-7. Journal of Botany. 2007;94(3):275-88. 44. Katoh K, Rozewicki J, Yamada KD. 55. Trung KH, Khanh TD, Ham LH, Duong TD, MAFFT online service: Multiple sequence Khoa T. Molecular phylogeny of the alignment, interactive sequence choice endangered Vietnamese Paphiopedilum and visualization. Briefings in species based on the internal transcribed Bioinformatics. 2017;20(4):1160-1166. spacer of the nuclear ribosomal DNA. 45. Ghorbani A, Gravendeel B, Selliah S, Advanced Studies in Biology. 2013;5(7): Zarre S, de Boer H. DNA barcoding of 337 - 346. tuberous Orchidoideae: A resource for 56. Meyer CP, Paulay G. DNA barcoding: identification of orchids used in Salep. Error rates based on comprehensive Molecular Ecology Resources. 2017;17 sampling. PLoS biology. 2005;3(12):e422- (2):342-352. e422. 46. Guo YY, Huang LQ, Liu ZJ, Wang XQ. 57. Gigot G, Van Alphen-Stahl J, Bogarin D, Promise and challenge of DNA barcoding Warner J, Chase MW, Savolainen V.

10

Vu and Le; ARRB, 34(1): 1-12, 2019; Article no.ARRB.53419

finding a suitable DNA barcode for 68. Kolaczkowski B and Thornton JW. Mesoamerican orchids. Lankesteriana Performance of maximum parsimony and International Journal on Orchidology. 2007; likelihood phylogenetics when evolution 7(1-2):200-203. is heterogeneous. Nature. 2004;431: 58. Parveen I, Singh HK, Raghuvanshi S, 980. Pradhan UC, Babbar SB. DNA barcoding 69. Kumar S, Nei M, Dudley J, Tamura K. of endangered Indian Paphiopedilum MEGA: A biologist-centric software for species. Molecular Ecology Resources. evolutionary analysis of DNA and protein 2012;12(1):82-90. sequences. Briefings in Bioinformatics. 59. Yukawa T, Kinoshita1 A, Tanaka N. 2008;9(4):299-306. Molecular identification resolves taxonomic 70. Swofford DL. PAUP*. Phylogenetic confusion in Grammatophyllum speciosum analysis using parsimony and other Complex (Orchidaceae). Bulletin of the methods. Version 4.0; 2003. National Museum of Nature and Science, 71. Posada D. jModelTest: Phylogenetic model Series B. 2013;39(3):137–145. averaging. Mol Biol Evol. 2008;25(7):1253- 6. 60. Chen LJ, Liu ZJ, Li YY, Li LQ. A new 72. Huelsenbeck JP, Ronquist F. MRBAYES: orchid Paphiopedilum guangdongense and Bayesian inference of phylogenetic trees. its molecular evidence. Journal of Bioinformatics. 2001; 17(8):754-5. and Evolution. 2010;48(5): 73. Xu Q, Zhang G-Q, Liu Z-J, Luo Y-B. Two 350-355. new species of Dendrobium (Orchidaceae: 61. Lahaye R, van der Bank M, Bogarin D, Epidendroideae) from China: Evidence Warner J, Pupulin F, Gigot G, Maurin O, from morphology and DNA. Phytotaxa. Duthoit S, Barraclough TG, Savolainen V. 2014;174(3):15. DNA barcoding the floras of biodiversity 74. Xiang XG, Jin WT, Li DZ, Schuiteman A, hotspots. Proceedings of the National Huang WC, Li JW, Jin XH, Li ZY. Academy of Sciences of the United States Phylogenetics of tribe collabieae of America. 2008;105(8):2923-8. (Orchidaceae, Epidendroideae) based on 62. Slabbinck B, Dawyndt P, Martens M, De four chloroplast genes with morphological Vos P, De Baets B. Taxon Gap: A appraisal. PLOS ONE. 2014;9(1):e87625. visualization tool for intra- and inter- 75. Yang JB, Yang SX, Li HT, Yang J, Li DZ. species variation among individual Comparative chloroplast genomes of biomarkers. Bioinformatics. 2008;24(6): Camellia species. PloS One. 2013;8(8): 866-7. e73053-e73053. 63. Nei M and Saitou N. The neighbor-joining 76. Nock CJ, Waters DL, Edwards MA, Bowen method: a new method for reconstructing SG, Rice N, Cordeiro GM, Henry RJ. phylogenetic trees. Molecular Biology and Chloroplast genome sequences from total Evolution. 1987;4(4):406-425. DNA for plant identification. Plant Biotechnology Journal. 2011;9(3):328- 64. Felsenstein J. Evolutionary trees from 33. DNA sequences: A maximum likelihood 77. Kane N, Sveinsson S, Dempewolf H, Yang approach. Journal of Molecular Evolution. JY, Zhang D, Engels JM, Cronk Q. Ultra- 1981;17(6):368-76. barcoding in cacao (Theobroma spp.; 65. Penny D. Inferring phylogenies — Joseph Malvaceae) using whole chloroplast Felsenstein. 2003. Sinauer Associates, genomes and nuclear ribosomal DNA. Sunderland, Massachusetts. Systematic American Journal of Botany. 2012;99(2): Biology. 2004;53(4):669-670. 320-9. 66. Felsenstein J. Confidence limits on 78. Parker J, Helmstetter AJ, Devey D, phylogenies: An approach using the Wilkinson T, Papadopulos AST. Field- bootstrap. Evolution. 1985;39(4):783- based species identification of closely- 791. related plants using real-time nanopore 67. Mar JC, Harlow TJ, Ragan MA. Bayesian sequencing. Scientific Reports. 2017;7(1): and maximum likelihood phylogenetic 8345. analyses of protein sequence data under 79. Day PD, Berger M, Hill L, Fay MF, Leitch relative branch-length differences and AR, Leitch IJ, Kelly LJ. Evolutionary model violation. BMC Evolutionary Biology. relationships in the medicinally important 2005;5:8-8. genus Fritillaria L. (Liliaceae). Molecular

11

Vu and Le; ARRB, 34(1): 1-12, 2019; Article no.ARRB.53419

Phylogenetics and Evolution. 2014;80:11- Lee YM. The complete chloroplast genome 9. sequence of Abies nephrolepis (Pinaceae: 80. Turktas M, Aslay M, Kaya E, Ertugrul F. Abietoideae). Journal of Asia-Pacific Molecular characterization of phylogenetic Biodiversity. 2016;9(2):245-249. relationships in Fritillaria species inferred 89. Zhang Y, Guan W, Zhang X, Li L. The from chloroplast trnL-trnF sequences. complete chloroplast genomes of Turkish Journal of Biology. 2012;36(5): species. Research & reviews: 552-560. Journal of Botanical Sciences. 2016;5(1): 81. Khourang M, Babaei A, Sefidkon F, 24-28. Naghavi MR, Asgari D, Potter D. 90. Jheng CF, Chen TC, Lin JY, Chen TC, Wu Phylogenetic relationship in Fritillaria spp. WL, Chang CC. The comparative of Iran inferred from ribosomal ITS and chloroplast genomic analysis of chloroplast trnL-trnF sequence data. photosynthetic orchids and developing Biochemical Systematics and Ecology. DNA markers to distinguish Phalaenopsis 2014;57:451-457. orchids. Plant Science. 2012;190:62- 82. Bi Y, Zhang MF, Xue J, Dong R, Du YP, 73. Zhang XH. Chloroplast genomic resources 91. Lin JY, Lin BY, Chang CD, Liao SC, Liu for phylogeny and DNA barcoding: A case YC, Wu WL, Chang CC. Evaluation of study on Fritillaria. Scientific Reports. chloroplast DNA markers for distinguishing 2018;8(1):1184. Phalaenopsis species. Scientia 83. Chen X, Zhou J, Cui Y, Wang Y, Duan B, Horticulturae. 2015;192:302-310. Yao H. Identification of herbs 92. Yu XQ, Drew BT, Yang JB, Gao LM, Li DZ. using the complete chloroplast genome as Comparative chloroplast genomes of a super-barcode. frontiers in pharma eleven Schima (Theaceae) species: cology. 2018;9:695-695. Insights into DNA barcoding and 84. Lee J, Chon J, Lim J, Kim EK, Nah G. phylogeny. PLOS One. 2017;12(6): Characterization of complete chloroplast e0178026. genome of Allium victorialis and Its 93. Peyachoknagul S, Mongkolsiriwatana C, Application for Barcode Markers. Plant Wannapinpong S, Huehne P, Srikulnath K. Breeding and Biotechnology. 2017;5(3): Identification of native dendrobium species 221-227. in Thailand by PCR-RFLP of rDNA-ITS 85. Dong W, Liu H, Xu C, Zuo Y, Chen Z, and chloroplast DNA. ScienceAsia. 2014; Zhou S. A chloroplast genomic strategy for 40(2):113–120. designing taxon specific DNA mini- 94. Boer HJ, Ghorbani A, Manzanilla V, barcodes: A case study on ginsengs. BMC Raclariu AC, Kreziou A, Ounjai S, genetics. 2014;15:138-138. Osathanunkul M, Gravendeel B. DNA 86. Zhou Y, Nie J, Xiao L, Hu Z, Wang B. metabarcoding of orchid-derived products Comparative chloroplast genome analysis reveals widespread illegal orchid of rhubarb botanical origins and the trade. Proceedings of the Royal development of specific identification Society B: Biological Sciences. 2017;284 markers. Molecules. 2018;23(11). (1863). 87. Curci PL, De Paola D, Danzi D, Vendramin 95. Veldman S, Kim SJ, van Andel TR, Bello GG, Sonnante G. Complete chloroplast Font M, Bone RE, Bytebier B, Chuba D, genome of the multifunctional crop globe Gravendeel B, Martos F, Mpatwa G, Ngugi artichoke and comparison with other G, Vinya R, Wightman N, Yokoya K, de Asteraceae. PLoS One. 2015;10(3): Boer HJ. Trade in Zambian edible orchids- e0120589. DNA barcoding reveals the Use of 88. Yi DK, Choi K, Joo M, Yang JC, Mustafina Unexpected Orchid Taxa for Chikanda. FU, Han JS, Son DC, Chang KS, Shin CH, Genes (Basel). 2018;9(12). ______© 2019 Vu and Le; This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Peer-review history: The peer review history for this paper can be accessed here: http://www.sdiarticle4.com/review-history/53419

12