Zeng et al. Plant Methods (2018) 14:43
https://doi.org/10.1186/s13007-018-0300-0

METHODOLOGY Open Access

Genome skimming herbarium specimens for DNA barcoding and phylogenomics

Chun‑Xia Zeng1†, Peter M. Hollingsworth2†, Jing Yang1, Zheng‑Shan He1, Zhi‑Rong Zhang1, De‑Zhu Li1* and Jun‑Bo Yang1*

*Correspondence: [email protected]; [email protected]
†Chun-Xia Zeng and Peter M. Hollingsworth contributed equally to this work
1 Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming 650201, Yunnan, China
Full list of author information is available at the end of the article

© The Author(s) 2018. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Abstract

Background: The world’s herbaria contain millions of specimens, collected and named by thousands of researchers, over hundreds of years. However, this treasure has remained largely inaccessible to genetic studies, because of both the generally limited success of DNA extraction and the challenges associated with PCR-amplifying highly degraded DNA. In today’s next-generation sequencing (NGS) world, opportunities and prospects for historical DNA have changed dramatically, as most NGS methods are actually designed for taking short fragmented DNA molecules as templates.

Results: As a practical test of routine recovery of rDNA and plastid genome sequences from herbarium specimens, we sequenced 25 herbarium specimens up to 80 years old from 16 different Angiosperm families. Paired-end reads were generated, yielding successful plastid genome assemblies for 23 species and nuclear rDNAs for 24 species. These data showed that genome skimming can be used to generate genomic information from herbarium specimens as old as 80 years and using as little as 500 pg of degraded starting DNA.

Conclusions: Routine plastome sequencing from herbarium specimens is feasible and cost-effective (compared with Sanger sequencing or plastome-enrichment approaches), and can be performed with limited sample destruction.

Keywords: Degraded DNA, Herbarium specimens, Genome skimming, Plastid genome, rDNA, DNA barcoding

Background

Herbaria are collections of preserved plant specimens stored for scientific study. There are approximately 3400 herbaria in the world, containing around 350 million specimens, collected over the past 400 years (http://sciweb.nybg.org/science2/indexHerbariorum.asp). These collections cover most of the world’s plant species, including many rare and endangered local endemics, and species collected from places that are currently expensive or difficult to access [1]. The recovery of DNA from this vast resource of already collected, expertly verified herbarium specimens represents a highly efficient way of building a DNA-based identification resource of the world’s plant species (DNA barcoding) and increasing knowledge of phylogenetic relationships.

The ‘unlocking’ of preserved natural history specimens for DNA barcoding/species discrimination is of particular relevance. In the first decade of DNA barcoding, it became clear that obtaining material from expertly verified specimens is a key rate-limiting step in the construction of a global DNA reference library [2]. The millions of samples that are required for this endeavor, each needing corresponding voucher specimens and meta-data, create a strong impetus for making best use of previously collected material.

DNA degradation in herbarium samples, followed by subsequent diffusion from the sample, creates challenges for DNA recovery [3]. In addition, different preservation methods can negatively affect the ability to extract, amplify and sequence DNA [4–6]. PCR amplification of historical DNA is, therefore, generally restricted to short amplicons (< 200 bp) and is further vulnerable to contamination by recent DNA and PCR products from the study species. The cumulative damage to the DNA can also cause incorrect bases to be inserted during enzymatic amplification. The main sources for these alterations are single nucleotide misincorporations [7, 8]. Overall, using PCR-based Sanger sequencing of herbarium samples to generate standard DNA barcodes can be challenging. A recent large-scale study by Kuzmina et al. [9] examined 20,816 specimens representing 5076 of 5190 vascular plant species in Canada. Kuzmina et al. found that specimen age and method of preservation had significant effects on sequence recovery for all barcode markers. However, massively parallel short-read next-generation sequencing (NGS) protocols have the potential to greatly increase the success of herbarium sequencing projects, as many new sequencing approaches do not rely on large, intact DNA templates and instead are well suited for sequencing low concentrations of short (100–400 bp) fragmented molecules [3, 10].

Straub et al. [11] described how “genome skimming”, involving a shallow-pass genome sequence using NGS, could recover highly repetitive genome regions such as rDNA or organelle genomes, and yield highly useful sequence data at relatively low sequence depth; these regions include the usual suite of DNA barcoding markers [12, 13]. The genome skimming approach using NGS has been used to recover plastid DNA and rDNA sequences from 146 herbarium specimens [14]; to produce the entire nuclear genome of a 43-year-old Arabidopsis thaliana herbarium specimen [15]; to recover the complete plastome, the mitogenome, nuclear ribosomal DNA clusters, and partial sequences of low-copy genes from a herbarium specimen of an extinct species of Hesperelaea [16, 17]; and to recover the complete plastome, nuclear ribosomal DNA clusters, and partial sequences of low-copy genes from three grass herbarium specimens [18].

However, sequencing small, historical specimens may be especially challenging if a specimen is unique, or nearly so, with no alternative specimens available for study should the first specimen fail. Methods used to extract and prepare DNA for sequencing must both be more or less guaranteed to work and, in many cases, allow for preservation of DNA for future study [19]. Recent studies report successful sequencing of historical specimens from 1 ng to 1 μg of input DNA (for example, up to 1 μg in Bakker et al. [14]; ~600 ng in Staats et al. [15]; 33 ng in Zadane et al. [17]; 8.25–537 ng in Kanda et al. [20]; 5.8–200 ng in Blaimer et al. [21]; less than 10 ng in Besnard et al. [18]; 1–10 ng in Sproul and Maddison [19]). However, a number of studies also report abandoning a subset of specimens for which too little input DNA was available (i.e. below 10 ng in Kanda et al. [20]; below 5 ng in Blaimer et al. [21]). To better understand suitable approaches to sample preparation for specimens with minimal DNA, we intentionally limited DNA input to 500 pg per specimen.

In this paper we provide a further practical test of the genome skimming methodology applied to herbarium specimens. As part of the China Barcode of Life project, and our wider phylogenomic studies, our aim was to assess whether the success reported in these early genome skimming studies could be repeated in other laboratories.

We evaluated the success and failure rates of rDNA and plastid genome sequencing from genome skims of 25 different species from herbarium specimens, and explored the impacts of parameters such as amount of input DNA and PCR cycle numbers.

Methods

Specimen sampling

Twenty-five herbarium specimens were selected from 16 Angiosperm families covering 22 genera, with specimen ages up to 80 years old. All 25 species were taken from the specimens housed in the Herbarium of the Institute of Botany, Chinese Academy of Sciences (KUN). The samples were selected to represent the major clades of the APG III system (Table 1).

DNA extraction

Approximately 1 cm² sections of leaf or 20 mg of leaf tissue were used for each DNA extraction. Genomic DNA was extracted using the Tiangen DNAsecure Plant Kit (DP320). Yield and integrity (size distribution) of genomic DNA extracts were quantified by fluorometric quantification on the Qubit (Invitrogen, Carlsbad, California, USA) using the dsDNA HS kit, as well as by visual assessment on a 1% agarose gel.

Library preparation

All samples were subsequently built into blunt-end DNA libraries in the laboratories using the NEBNext Ultra II DNA Library Prep Kit for Illumina (New England Biolabs), which has been optimized for as little as 5 ng starting DNA, and Illumina-specific adapters [22]. The library protocol was performed as per the manufacturer’s instructions with four modifications: (i) 500 pg of input DNA was selected to accommodate low starting DNA quantities; (ii) DNA was not fragmented by sonication because the DNA was highly degraded; (iii) the NEBNext library was generated without any size selection; (iv) DNA libraries were then amplified in an indexing PCR, which barcoded each library and discriminated each

Table 1 List of the specimen materials and DNA yields used in our study
Sample