Minimizing Polymerase Biases in Metabarcoding
Total Page:16
File Type:pdf, Size:1020Kb
Received: 27 November 2017 | Revised: 12 February 2018 | Accepted: 1 March 2018 DOI: 10.1111/1755-0998.12895 FROM THE COVER Minimizing polymerase biases in metabarcoding Ruth V. Nichols1 | Christopher Vollmers2 | Lee A. Newsom3 | Yue Wang4 | Peter D. Heintzman1,5 | McKenna Leighton2 | Richard E. Green2 | Beth Shapiro1 1Department of Ecology and Evolutionary Biology, University of California Santa Cruz, Abstract Santa Cruz, California DNA metabarcoding is an increasingly popular method to characterize and quantify 2Department of Biomolecular Engineering, biodiversity in environmental samples. Metabarcoding approaches simultaneously University of California Santa Cruz, Santa Cruz, California amplify a short, variable genomic region, or “barcode,” from a broad taxonomic 3Department of Social Sciences, Flagler group via the polymerase chain reaction (PCR), using universal primers that anneal College, St. Augustine, Florida to flanking conserved regions. Results of these experiments are reported as occur- 4Department of Geography, University of Wisconsin-Madison, Madison, Wisconsin rence data, which provide a list of taxa amplified from the sample, or relative abun- 5Tromsø University Museum, UiT – The dance data, which measure the relative contribution of each taxon to the overall Arctic University of Norway, Tromsø, Norway composition of amplified product. The accuracy of both occurrence and relative abundance estimates can be affected by a variety of biological and technical biases. Correspondence Ruth V. Nichols and Beth Shapiro, For example, taxa with larger biomass may be better represented in environmental Department of Ecology and Evolutionary samples than those with smaller biomass. Here, we explore how polymerase choice, Biology, University of California Santa Cruz, Santa Cruz, CA. a potential source of technical bias, might influence results in metabarcoding experi- E-mails: [email protected]; ments. We compared potential biases of six commercially available polymerases [email protected] using a combination of mixtures of amplifiable synthetic sequences and real sedi- Funding information mentary DNA extracts. We find that polymerase choice can affect both occurrence University of California Office of the President, Grant/Award Number: and relative abundance estimates and that the main source of this bias appears to 20160713SC; National Science Foundation, be polymerase preference for sequences with specific GC contents. We further Grant/Award Number: ARC-1203990; Gordon and Betty Moore Foundation, recommend an experimental approach for metabarcoding based on results of our Grant/Award Number: GBMF-3804 synthetic experiments. KEYWORDS bias, eDNA, environmental DNA, metabarcoding, soil, trnL P6 loop 1 | INTRODUCTION (Haile et al., 2009; Jerde, Mahon, Chadderton, & Lodge, 2011; Ped- ersen et al., 2016; Rees, Baker, Gardner, Maddison, & Gough, 2017). Metabarcoding, which is erroneously described as barcoding or Additionally, metabarcoding has been used to calculate differences in metagenomics in some literature, is the technique in which a univer- haplotype or allele frequency between populations of the same spe- sal primer pair is used to amplify multiple templates from a mixture cies (Sigsgaard et al., 2016) and to link changes in community com- of many different taxa or haplotypes. Metabarcoding is often used in position over time to climatic shifts (Haile et al., 2007; Willerslev conjunction with environmental DNA (eDNA), or DNA that is col- et al., 2003, 2007, 2014). These latter examples analyse both the lected from environmental sources such as water, sediment, air and occurrence and relative abundance of each unique sequence in the faeces (Deiner et al., 2017). Metabarcoding is an increasingly popular amplification product, where abundance is estimated as the propor- tool in ecological and palaeoecological research, mainly due to its tion of the total number of sequences generated matching each simplicity and low cost. eDNA can be used, for example, to charac- taxon or haplotype. terize biodiversity of a particular taxonomic group (Ushio et al., While metabarcoding is a promising approach to characterize 2017) or to estimate the ranges of rare, extinct or cryptic species biodiversity both quickly and inexpensively, few studies have | Mol Ecol Resour. 2018;1–13. wileyonlinelibrary.com/journal/men © 2018 John Wiley & Sons Ltd 1 2 | NICHOLS ET AL. validated the method experimentally by, for example, testing the number of target loci may vary between taxa and tissue type. extent to which the true community or population is reconstructed. Chloroplast DNA, for example, is a common target for metabarcod- It is generally accepted that taxon occurrence can be inferred via ing, but can differ in copy number between taxa, individuals and cell metabarcoding, provided that a sufficient number of PCR replicates tissue types within the same plant (Morley & Nielsen, 2016). Tapho- —amplifying DNA multiple times from the same soil extract using nomic factors may also influence DNA preservation, for example by the same amplification conditions—are performed (Pinol,~ Mir, affecting the rate of degradation. Lignified structures in plants may Gomez-Polo, & Agustı, 2015; Shaw et al., 2016) and false positives slow the rate of DNA degradation (Yoccoz et al., 2012), as may have been accounted for (Lahoz-Monfort, Guillera-Arroita, & Tingley, anoxic environments (Corinaldesi, Barucca, Luna, & Dell’Anno, 2011). 2016). The first eDNA metabarcoding studies used replication In some environments, soil leaching and postdepositional mixing may (Cooper & Poinar, 2000), where DNA extraction and amplification move DNA up or down sediment columns or horizontally over space were both replicated, to help confirm their results (Willerslev et al., (Andersen et al., 2012; Anderson-Carpenter et al., 2011; Pedersen 2003), but many subsequent studies did not replicate experiments et al., 2015; Rawlence et al., 2014). (Soininen et al., 2009; Sønstebø et al., 2010; Valentini et al., 2009). Technical biases can be introduced during DNA extraction and After a detailed exploration of the utility of replication in metabar- PCR amplification. DNA extraction protocols can be more or less coding (Darling & Mahon, 2011), the use of replication increased, optimized for soil chemistry, which can influence the extent to which but the number of replicates performed per experiment varied DNA is recovered (Zielinska et al., 2017). Soils rich in clays or humic widely. Most studies used between two and five PCR replicates per acids may bind DNA, for example, reducing DNA recovery (Direito, sample (Andersen et al., 2012; De Barba et al., 2014; Jørgensen Marees, & Roling,€ 2012). PCR is a highly stochastic process, which is et al., 2012; Willerslev et al., 2014) and some as many as eight further complicated by the presence of variable templates, with (Giguet-Covex et al., 2014). Recently, the use of site occupancy many opportunities for the introduction of bias (Aird et al., 2011; models has been proposed as a tool to estimate how many replicates Pinto & Raskin, 2012; Polz & Cavanaugh, 1998; Suzuki & Giovan- are needed; with most recommendations ranging from six to 12 noni, 1996). Although the universal primers used in metabarcoding replicates per sample (Ficetola et al., 2015; Lahoz-Monfort et al., are designed to anneal to conserved genomic regions, slight variation 2016; Schmidt, Kery, Ursenbacher, Hyman, & Collins, 2013), depend- in binding site sequences may affect primer binding efficiency, ing on the number and abundance of rare taxa. Another approach to resulting in bias (Elbrecht & Leese, 2015; Pinol~ et al. 2015). For estimate the amount of replication required is rarefaction, whereby example, Fahner, Shokralla, Baird, and Hajibabaei (2016) used four the number of new taxa identified per replicate PCR is used to esti- plant-specific primers to infer community composition from the same mate the probability that most rare taxa have been recovered (Hsieh, soil samples and found that each primer pair produced a different Ma, & Chao, 2016; Sanders, 1968). result. This result may also be related to amplicon length whereby Whether relative abundance can be estimated accurately from shorter amplicons amplify more readily than longer amplicons. Tem- metabarcoding data is a more contentious issue. Some researchers plate secondary structures can also bias PCR when molecules with routinely interpret the relative abundance of sequences post-PCR as secondary structures bind to themselves and inhibit their own ampli- indicative of real relative biomass estimates (Kowalczyk, Taberlet, fication. In addition, templates with suboptimal GC contents can be Kaminski, & Wojcik, 2011; Niemeyer, Epp, Stoof- Leichsenring, Pes- disfavoured during amplification, although some polymerases are tryakova, & Herzschuh, 2017; Willerslev et al., 2014). Others argue known to have reduced GC bias and additives such as dimethyl against this approach, citing challenges that include differential DNA sulphoxide (DMSO) for GC-rich templates or betaine for AT-rich degradation, different primer binding efficiencies and sequencing templates can reduce this bias (Baskaran et al., 1996; van Dijk, errors as confounding factors that might influence the utility of rela- Jaszczyszyn, & Thermes, 2014; Kozarewa et al., 2009). Finally, the tive abundance data collected from metabarcoding loci (Deagle, Tho- number of PCR cycles has also been shown to influence results: mas, Shaffer, Trites, & Jarman, 2013;