DNA Barcoding: How It Complements Taxonomy, Molecular Phylogenetics and Population Genetics
Opinion, TRENDS in Genetics Vol.23 No.4

Mehrdad Hajibabaei¹, Gregory A.C. Singer², Paul D.N. Hebert¹ and Donal A. Hickey³

¹ Biodiversity Institute of Ontario, Department of Integrative Biology, University of Guelph, Guelph, Ontario N1G 2W1, Canada
² Human Cancer Genetics Program, The Ohio State University, Columbus, OH 43210, USA
³ Department of Biology, Concordia University, 7141 Sherbrooke Street, Montreal, Quebec H4B 1R6, Canada

Corresponding author: Hajibabaei, M. ([email protected]).
Available online 20 February 2007.
www.sciencedirect.com 0168-9525/$ – see front matter © 2007 Elsevier Ltd. All rights reserved. doi:10.1016/j.tig.2007.02.001

DNA barcoding aims to provide an efficient method for species-level identifications and, as such, will contribute powerfully to taxonomic and biodiversity research. As the number of DNA barcode sequences accumulates, however, these data will also provide a unique ‘horizontal’ genomics perspective with broad implications. For example, here we compare the goals and methods of DNA barcoding with those of molecular phylogenetics and population genetics, and suggest that DNA barcoding can complement current research in these areas by providing background information that will be helpful in the selection of taxa for further analyses.

Introduction

Because of advances in sequencing and computational technologies, DNA sequences have become the major source of new information for advancing our understanding of evolutionary and genetic relationships. The footprints of comparative sequence analysis are now apparent in almost all areas of the biological sciences, from development to epidemiology [1]. However, two branches of biology have developed the tools and applications employed to assess biological relationships with DNA sequences: molecular phylogenetics, and population genetics. These disciplines focus on different levels of organization. Studies in molecular phylogenetics typically deal with evolutionary relationships among deeper clades, whereas those in population genetics target variation within and among populations of a single species. By comparison, DNA barcoding occupies a middle ground as it seeks comprehensive coverage for species, but focuses on their delineation rather than their relationships (Figure 1).

DNA barcoding is based on the premise that a short standardized sequence can distinguish individuals of a species because genetic variation between species exceeds that within species [2]. Pilot projects have now established the effectiveness of this approach in several large groups of animals, such as birds [3], fish [4], cowries [5], spiders [6], and several arrays of Lepidoptera [7–9]. In addition, DNA barcoding systems are now being established for other groups of organisms, including plants [10], macroalgae [11], fungi [12], protists [13] and bacteria [14]. DNA barcoding datasets are essentially composed of short DNA sequences from several individuals of a large number of species (typically five to ten individuals per species, but these numbers will increase in the future) (Figure 1).

Here, we discuss the role of DNA barcodes in advancing the taxonomic enterprise and its potential to provide a contextual framework for both building phylogenies and for population genetics. In particular, we argue that barcode results can be of high value in aiding the selection of species for more detailed analysis, and demonstrate that DNA barcoding can broaden our understanding of both phylogenetic signal and population-level variation.

Glossary

Clade: a group of organisms descended from a common evolutionary ancestor; a branch of the Tree of Life.
Homoplasy: a shared character that was not inherited from a common evolutionary ancestor but, rather, that arose through convergent evolution.
Phylogeny: the evolutionary history of species relationships, often visualized as a phylogenetic tree (e.g. the Tree of Life).
Taxon: a group of organisms.
Taxonomy: the science of classification. Biological taxonomy creates a hierarchical classification of biological taxa, usually reflecting evolutionary relationships.

Figure 1. DNA barcodes lie within the gray area between phylogenetics and population genetics. This diagram demonstrates the position of DNA barcode data relative to both population genetics and phylogenetics data. Each small square represents an individual. Different colors are used to represent different species and within-species variation is shown by varying shades of color.

The DNA barcoding workflow

Species identification through barcoding is usually achieved by the retrieval of a short DNA sequence – the ‘barcode’ – from a standard part of the genome (i.e. a specific gene region) from the specimen under investigation. The barcode sequence from each unknown specimen is then compared with a library of reference barcode sequences derived from individuals of known identity (Figure 2). A specimen is identified if its sequence closely matches one in the barcode library. Otherwise, the new record can lead to a novel barcode sequence for a given species (i.e. a new haplotype or geographical variant), or it can suggest the existence of a newly encountered species (see below).

Figure 2. Major components of the Barcode of Life projects and their contribution to taxonomy, reconstruction of molecular phylogenies and population genetics investigations. This diagram shows how DNA barcoding libraries can support the conventional taxonomic workflow by high-throughput identification of unknown specimens and by helping to draw attention to new and cryptic species. Barcode sequences and collateral data for each specimen are accessible through a global online data base (e.g. BOLD: http://www.barcodinglife.org). This information can be useful in other contexts, such as phylogenetics (Tree of Life projects) and population-level studies. In addition, archival DNA and tissue specimens collected in barcoding projects provide an excellent resource for other investigations. Butterfly images are taken from the database of Daniel Janzen and Winnie Hallwachs (http://janzen.sas.upenn.edu/).

Various gene regions have been employed for species-level biosystematics (Table 1); however, DNA barcoding advocates the adoption of a ‘global standard’, and a 650-base fragment of the 5′ end of the mitochondrial gene cytochrome c oxidase I (COI, cox1) [2] has gained designation as the barcode region for animals. This fragment size has been selected so that a reliable sequence read can be obtained by a single sequence pass in conventional cycle-sequencing platforms. Shorter fragments of COI have also been shown to be effective for the identification of specimens with degraded DNA, however, where a 650-base sequence is not easily obtainable [15]. In addition, the usability and robustness of COI in a standard high-throughput barcoding analysis have been extensively assessed [16].

Table 1. Common species-level molecular markers

                                  Number of sequences
Geneᵃ          Genomic location   Animals    Plants    Protists   Fungi
COI-barcodeᵇ   Mitochondria       195 777    520       1931       410
16S-rDNA       Mitochondria       41 381     221       2059       285
cytb           Mitochondria       88 324     165       1920       1084
ITS1-rDNA      Nucleus            12 175     57 693    68 839     56 675
ITS2-rDNA      Nucleus            13 923     58 065    67 332     56 349
18S-rDNA       Nucleus            21 063     17 121    32 290     33 327
rbcL           Plastid            NAᶜ        30 663    37 328     NA

ᵃ Gene abbreviations: COI, cytochrome c oxidase I; cytb, cytochrome b; ITS, internal transcribed spacer; rbcL, large subunit of ribulose 1,5-bisphosphate carboxylase/oxygenase.
ᵇ COI-barcode statistics are retrieved from Barcode of Life Data systems (http://www.barcodinglife.org). Statistics for other loci are retrieved from GenBank.
ᶜ NA, not applicable.

Other researchers have suggested that alternate loci might also serve as a basis for species identification. For example, 18S rDNA has been used for the identification of soil nematodes and other small organisms in an approach known as ‘DNA taxonomy’ [17]. This approach differs from DNA barcoding in that it does not aim to link the genetic entities recognised through sequence analysis with Linnaean species. As such, it is most useful for groups of organisms that lack detailed taxonomic systems. Alternate markers have also been used where COI sequences have not been produced robustly or are shown to be divergent within species [18] or as further molecular

photographs and locality information, and assembling the barcode library (i.e. a 650-base segment of the COI gene) [16]. The analysis of DNA barcoding data is usually performed by a clustering method, such as distance-based neighbor-joining (NJ) [25], and by evaluating genetic distances within and between species (e.g. [8]). More complex methodologies for data analysis are under development, including statistical tests for species assignment [26,27], and character-based clustering methods [28].

DNA barcoding and taxonomy

Although its role in identifying specimens to a species level is an important aid for taxonomic workflow (Figure 2), barcoding is no replacement for comprehensive taxonomic analysis. For example, when an unknown specimen does not return a close match to existing records in the barcode library, the barcode sequence does not qualify the unknown specimen for designation as a new species. Instead, such specimens are flagged for thorough taxonomic analysis. When viewed in the context of the traditional taxonomic