Introduction to DNA Metabarcoding
Pierre Taberlet
Laboratoire d'Ecologie Alpine, CNRS UMR 5553, Université Grenoble Alpes, Grenoble, France
Porto, 1-5 May 2017

Introduction to DNA metabarcoding
• Definitions
• Technical context
• Which marker for DNA metabarcoding?
• The importance of bioinformatics
• Key studies
  – For diet analysis
  – For current biodiversity surveys
  – For reconstructing past ecosystems
• The future

Definitions

Environmental DNA
• First reference in 1987
• Microbiology: from 2000
• Plants and animals: from 2003
• Environmental DNA: DNA that can be extracted from environmental samples (such as soil, water, or air), without first isolating any target organisms
• Complex mixture of genomic DNA from many different organisms, possibly degraded
• Contains intracellular and extracellular DNA

[Figure: Overview of the emergence of eDNA studies]

Taxonomic identification from environmental DNA: terminology
[Figure: Suggested terminology]
• Terms: DNA barcoding, DNA metabarcoding, eDNA metabarcoding; DNA barcoding (sensu lato), DNA metabarcoding (sensu lato)
• Identification level and type of markers: species level (standardized barcodes); genus, family, or order level (other markers)
• Complexity of DNA extract: single specimen; multiple specimens (bulk sample); environmental sample (air, water, soil, feces)
Taberlet et al. (2012) Molecular Ecology, 21, 1789-1793.
[Figure: "metabarcoding" in Web of Science, 28 April 2017]

DNA metabarcoding
The metabarcoding approach: bioinformatics, field, bench, bioinformatics
• In silico analysis: design and test the most efficient metabarcodes for the target group
• Sampling in the field to obtain a DNA extract representative of the local biodiversity
• DNA amplification and sequencing
• Sequence analysis and taxa identification
  – OBITools (metabarcoding.org/obitools)
  – Problem of amplification/sequencing errors

DNA metabarcoding
• Sampling in the field (soil, water, feces, etc.)
• DNA extraction
• DNA amplification with barcode primers
• Sequencing of the PCR products on next-generation sequencers
• Identification of taxa using a reference database (or identification of MOTUs)

DNA metabarcoding is not DNA barcoding
• Same objective (to identify taxa), but ...
• Different methodology: metabarcoding relies on high-throughput systems for high-throughput taxon identification
• Not the same constraints when working with environmental DNA -> different markers might be used

[Figure: The main steps of an eDNA study, showing the three possible approaches: single-species identification, metabarcoding, and metagenomics]

Technical context

DNA sequencing
• 2005: Capillary electrophoresis
  – 500-1000 bp per sequencing reaction
  – 12 x 96 reactions per day (≈ 1 Mb per day)
• 2016: Next-generation sequencers
  – Roche 454: ≈ 0.8 Gb per day
  – HiSeq 4000: ≈ 400 Gb per day
= 400,000-fold increase in sequencing capacity in about 10 years

[Figure: Traditional versus next-generation sequencing. Steps compared: sampling and DNA extraction, DNA amplification, sequencing, bioinformatics, results]

454 GS FLX™
• Company: Roche Diagnostics®
• Website: www.454.com
• Fragment length: 700-800 bases
• Number of reads per run: 1 × 10^6
• Total output per run: 0.7-0.8 Gb
• Time per run: 23 hours

Ion Torrent
• Company: Life Technologies
• Website: www.iontorrent.com
• Fragment length: 100, 200, 400 bases
• Number of reads per run: 0.1, 1, 8 × 10^6
• Time per run: 2 hours

HiSeq 4000
• Company: Illumina®
• Website: www.illumina.com
• Fragment length: 150 bases (2x150 paired-end)
• Number of reads per run: 8.6-10 billion
• Total output per run: 1.3-1.5 Tb
• Time per run: 3.5 days

An idea of the HiSeq 4000 production per run
• 10 billion reads of 150 bp
• 6 lines per read
• 55 lines per page (font 11)
• 1,090,909,091 pages (10 billion reads × 6 lines ÷ 55 lines per page)
• 324,000 km long
• 122.4 km high
• More than 5,000 tons of paper

MiSeq
• Company: Illumina®
• Website: www.illumina.com
• Fragment length: 300 bases (2x300 paired-end)
• Number of reads per run: 2 × 25 × 10^6
• Total output per run: 14 Gb
• Time per run: 27 hours

MiniSeq
• Company: Illumina®
• Website: www.illumina.com
• Fragment length: 150 bases (2x150 paired-end)
• Number of reads per run: 2 × 25 × 10^6
• Total output per run: 7.5 Gb
• Time per run: 24 hours

MinION
• Company: Oxford Nanopore Technologies Ltd
• Website: www.nanoporetech.com
• Fragment length: ultra-long reads (up to 300 kb)
• Total output: 6 Gb per day

Which marker for DNA metabarcoding?

Standard barcodes: COI, rbcL, matK
• Advantages
  – Standard reference libraries can be used
  – High taxonomic resolution
• Drawbacks
  – Primers for standard barcodes are designed on protein-coding genes, and therefore cannot be highly conserved (the third nucleotide of each codon is variable)
  – Too long for use with degraded environmental DNA

New barcodes for analyzing environmental DNA
• Very short marker (usually less than 100 bp)
• Highly conserved primers, to amplify the different target sequences equally
• Problem of taxonomic resolution when using very short barcodes

The mirage of standard "minibarcodes"
• Hajibabaei M, Smith MA, Janzen DH, Rodriguez JJ, Whitfield JB, Hebert PDN (2006) A minimalist barcode can identify a specimen whose DNA is degraded. Molecular Ecology Notes, 6, 959-964.
• Meusnier I, Singer GAC, Landry JF, Hickey DA, Hebert PDN, Hajibabaei M (2008) A universal DNA mini-barcode for biodiversity analysis. BMC Genomics, 9, 214.
• Hajibabaei M, Shokralla S, Zhou X, Singer GAC, Baird DJ (2011) Environmental barcoding: a next-generation sequencing approach for biomonitoring applications using river benthos. PLoS ONE, 6, e17497.
• Hajibabaei M, Spall JL, Shokralla S, van Konynenburg S (2012) Assessing biodiversity of a freshwater benthic macroinvertebrate community through non-destructive environmental barcoding of DNA from preservative ethanol. BMC Ecology, 12, 28.

[Figure: COI (Metazoa, "meusnier", 7 errors); axes: forward errors, reverse errors. Meusnier et al. (2008) BMC Genomics, 9, 214.]
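
Counting primer mismatches: a minimal sketch. It assumes that the "forward errors" and "reverse errors" in these figures are the numbers of mismatches between each primer and its binding site, which is the quantity that the ecoPCR option -e caps. The function name count_mismatches and the IUPAC table are illustrative and are not part of OBITools or ecoPCR. The example sequences are the two primers and one matched site from the ecoPCR run shown later in this deck.

```python
# Illustrative helper (hypothetical name, not an OBITools/ecoPCR function).
# IUPAC nucleotide codes: each primer symbol maps to the bases it accepts.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def count_mismatches(primer: str, site: str) -> int:
    """Count positions where the site base is not accepted by the primer.

    Both sequences are read 5'->3' in the primer's orientation and must
    have the same length. The site is assumed to contain only A, C, G, T;
    any other site symbol is counted as a mismatch.
    """
    if len(primer) != len(site):
        raise ValueError("primer and site must have the same length")
    return sum(
        1
        for p, s in zip(primer.upper(), site.upper())
        if s not in IUPAC[p]
    )


# Primers and matched sites copied from the ecoPCR output in this deck.
forward_primer = "GGGCAATCCTGAGCCAA"
reverse_primer = "CCATTGAGTCTCTGCACCTATC"
forward_site = "GGGCAATCCTGAGCCAA"
reverse_site = "CCATCGAGTCTCTGCACCTATC"

print(count_mismatches(forward_primer, forward_site))  # 0
print(count_mismatches(reverse_primer, reverse_site))  # 1
```

The two counts agree with the error columns ecoPCR reports for that Achillea alpina sequence (0 for the forward primer, 1 for the reverse primer). A real primer evaluation would also scan for the best binding position and might weight mismatches near the 3' end more heavily; this sketch does neither.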
The mirage of standard "minibarcodes"
[Figures: forward errors versus reverse errors, one panel per marker]
• COI (Metazoa, "meusnier", 7 errors): Meusnier et al. (2008) BMC Genomics, 9, 214.
• 18S (Eukaryota, "18s_hardy", 5 errors): Hardy et al. (2010) Molecular Ecology, 19, 197-212.
• COI (Arthropoda, "zbj", 7 errors): Zeale et al. (2011) Molecular Ecology Resources, 11, 236-244.
• 16S (Insecta, "ins", 5 errors): Unpublished

The chloroplast trnL (UAA) intron
The g/h primers target highly conserved regions
[Figure: cpDNA trnL (UAA) intron (Spermatophyta, "gh", 3 errors); axes: forward errors, reverse errors]

The ideal metabarcode
• The primer targets must be perfectly conserved (no mismatch at all)
• Must be short, but highly informative
• Must amplify the whole target taxonomic group, but not the other groups
• The reference database must be comprehensive
• Unfortunately, such an ideal marker does not exist
[Figure: target taxonomic group ("gh", 3 errors); axes: forward errors, reverse errors]

The importance of bioinformatics

Bioinformatic tools for designing new markers for DNA metabarcoding
http://metabarcoding.org/obitools (Riaz et al. 2011, NAR)
Look for conserved regions that flank variable regions
Many whole genome sequences

ecoPCR: fully integrated taxonomy
http://metabarcoding.org/obitools
ecoPCR -d ebpln96 -l50 -L150 -k -e3 GGGCAATCCTGAGCCAA CCATTGAGTCTCTGCACCTATC
#
# ecoPCR version 0.1
# direct strand oligo1 : GGGCAATCCTGAGCCAA ; oligo2c : GATAGGTGCAGAGACTCAATGG
# reverse strand oligo2 : CCATTGAGTCTCTGCACCTATC ; oligo1c : TTGGCTCAGGATTGCCC
# max error count by oligonucleotide : 3
# database : arctic_01_02_2008
# amplifiat length between [5,300] bp
# output in kingdom mode
#
0240g | 495 | 10000726 | subspecies | 282718 | Achillea alpina | 13328 | Achillea | 4210 | Asteraceae | 33090 | Viridiplantae | D | GGGCAATCCTGAGCCAA | 0 | CCATCGAGTCTCTGCACCTATC | 1 | 90 | ATCACGTTTTCCGAAAACAAACAAAGGTTCAGAAAGCGAAAAGAAAAAAAA |
1043o | 496 | 10000724 | subspecies | 282718 | Achillea alpina | 13328 | Achillea | 4210 | Asteraceae |