A Multi-Gene Region Targeted Capture Approach to Detect Plant DNA in Environmental Samples
Total Page:16
File Type:pdf, Size:1020Kb
bioRxiv preprint doi: https://doi.org/10.1101/2021.07.03.450983; this version posted July 26, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 1 A multi-gene region targeted capture approach to detect plant DNA in 2 environmental samples: A case study from coastal environments 3 Nicole R. Foster1*, Kor-jent van Dijk1, Ed Biffin2, Jennifer M. Young3, Vicki Thomson1, 4 Bronwyn M. Gillanders1, Alice Jones1,4, Michelle Waycott1,2 5 6 Abstract 7 Metabarcoding of plant DNA recovered from environmental samples, termed environmental DNA 8 (eDNA), has been used to detect invasive species, track biodiversity changes and reconstruct past 9 ecosystems. The P6 loop of the trnL intron is the most widely utilized gene region for metabarcoding 10 plants due to the short fragment length and subsequent ease of recovery from degraded DNA, which 11 is characteristic of environmental samples. However, the taxonomic resolution for this gene region is 12 limited, often precluding species level identification. Additionally, targeting gene regions using 13 universal primers can bias results as some taxa will amplify more effectively than others. To increase 14 the ability of DNA metabarcoding to better resolve flowering plant species (angiosperms) within 15 environmental samples, and reduce bias in amplification, we developed a multi-gene targeted capture 16 method that simultaneously targets 20 chloroplast gene regions in a single assay across all flowering 17 plant species. Using this approach, we effectively recovered multiple chloroplast gene regions for 18 three species within artificial DNA mixtures down to 0.001 ng/µL of DNA. We tested the detection 19 level of this approach, successfully recovering target genes for 10 flowering plant species. Finally, 20 we applied this approach to sediment samples containing unknown compositions of environmental 21 DNA and confidently detected plant species that were later verified with observation data. Targeting 22 multiple chloroplast gene regions in environmental samples enabled species-level information to be 23 recovered from complex DNA mixtures. Thus, the method developed here, confers an improved level 24 of data on community composition, which can be used to better understand flowering plant 25 assemblages in environmental samples. 26 27 1 Introduction 28 Environmental DNA (eDNA) is a rapidly growing field of research and has been applied 29 extensively to monitor site-based vegetation change over periods of hundreds to thousands of years 30 using samples from soil cores (Willerslev et al., 2003; Willerslev et al., 2014; Parducci et al., 2013; 31 Zimmermann et al., 2017; Del Carmen Gomez Cabrera et al., 2019). As plants are sedentary, they are 32 a reliable reflection of their environment at a specific point in time (Yoccoz et al., 2012), therefore, 33 reconstruction of plant communities through time can represent environmental conditions and how 34 they have changed at the sampled location. Such information can be used to predict future changes, 35 inform management (Fordham et al., 2016; Balint et al., 2018; Brown & Blois, 2001) and provide 36 insight to uncover species extinctions and/or introductions, elucidate past climate conditions and 37 assess ecosystem trajectories (Thomsen & Willerslev, 2015; Ruppert, Kline & Rahman, 2019). In 38 addition, changes in plant community composition can shed light on historical human impacts and 39 agricultural practices (Giguet-Covex et al., 2014; Pansu et al., 2015). 40 The ability to take an environmental sample and accurately determine the plant community 41 composition contained within, as remnant fragments of tissue or DNA, is most commonly achieved 42 through the process of DNA metabarcoding. Metabarcoding involves Polymerase Chain Reaction bioRxiv preprint doi: https://doi.org/10.1101/2021.07.03.450983; this version posted July 26, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 43 (PCR) amplification of a short, but highly variable, gene region that is flanked by conserved regions. 44 The design of PCR amplification primers target these conserved regions, which allows amplification 45 of the variable region across all plant taxa present in an environmental sample (Murchie et al., 2020). 46 The variable region that is often used to recover plant DNA in environmental samples is the P6 loop 47 of the chloroplast trnL (UAA) intron (Taberlet et al., 2006). This gene region was adopted due to 48 being short enough to amplify DNA in environmental samples (10–143 bp), whilst still possessing 49 enough discriminatory information to distinguish many groups of plant taxa. Unfortunately, reliance 50 on a single, short, gene region creates limitations to generating accurate results, as species can only 51 be detected if this particular region remains present in the genomic DNA extracted, and if the primer 52 binding sites are intact to allow successful amplification. Additionally, targeting only one gene 53 region using a single pair of universal primers creates bias in the results as primers may preferentially 54 bind to certain taxa, creating an unreliable representation of the vegetation community (Pedersen et 55 al., 2015). Further, recovering the short section of the trnL gene region does not ensure 56 taxonomically high-resolution results, given the lowered discrimination ability of this region due to 57 the short length. Other commonly used plant barcodes such as rbcL and matK, which have a greater 58 discriminatory ability than trnL (Hollingsworth et al., 2011), may provide increased species-level 59 resolution. However, these regions are much longer and are difficult to recover from degraded 60 samples (Fahner et al., 2016), in addition to being subject to the same limitations of relying on a 61 single gene region. These shortcomings highlight that a more effective approach is needed to fully 62 utilise plant DNA from environmental samples and obtain detailed information on plant community 63 composition. 64 Targeted capture (also referred to as hybridisation capture) offers an alternative way to 65 overcome the limitations of metabarcoding and bypass the issues associated with PCR amplification 66 (Lemmon & Lemmon, 2013). Targeted capture uses biotinylated RNA molecules called ‘baits’ that 67 are specifically designed to bind to target DNA regions which are then separated from non-target 68 sequences using a magnet These baits eliminate the need for universal primers and, compared to 69 methods such as genome skimming or shotgun sequencing, the targeted nature enhances recovery of 70 organisms of interest within a DNA mixture. This approach reduces overall sequencing costs and 71 increases the amount of information that can be recovered compared to metabarcoding (Foster et al., 72 2020). Targeted capture approaches have been shown to recover more taxa from environmental 73 samples compared to metabarcoding for a range of different organisms (Dowle et al., 2016; Shokralla 74 et al., 2016; Murchie et al., 2020), and given the potential to recover a large amount of sequence data 75 from multiple gene regions, this approach is also likely to improve taxonomic assignment of 76 sequences from various plant communities. 77 Here, we employed a targeted capture approach to characterise flowering plant communities 78 in environmental samples, using a bait set designed to simultaneously capture 20 chloroplast regions 79 across all flowering plants (angiosperms). By incorporating multiple gene regions, we aimed to 80 accurately reconstruct plant communities in environmental samples to species level identification, 81 removing the reliance on a single, short barcode. The bait set utilised in this study has not previously 82 been applied to environmental samples and it is rare that studies are undertaken targeting multiple 83 gene regions in a single assay from complex DNA mixtures (one that we know of; Murchie et al., 84 2020) and never with this many gene regions. We conducted sensitivity and discriminatory power 85 tests on artificial DNA mixtures to assess the threshold of detection and the level of taxonomic 86 resolution offered by the approach before analysing soil samples from a study site where we could 87 verify results using observations from the area. 88 2 Methods 2 bioRxiv preprint doi: https://doi.org/10.1101/2021.07.03.450983; this version posted July 26, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 89 Three trials were conducted to determine the sensitivity, discriminatory power and 90 demonstrate a proof-of-concept for the multi-gene region targeted capture of chloroplast DNA 91 proposed in this study. Each test consisted of a separate experimental setup (see section 2.1) and data 92 analysis (section 2.5) but the same library preparation (section 2.2), multi-gene region bait capture 93 (section 2.3) and read processing and mapping (section 2.4) were conducted for all three trials 94 (Figure 1). Control samples (blanks) were run with each trial sample