Max-Planck-Institut für Menschheitsgeschichte Max Planck Institute for the Science of Human History
Ancient DNA – methods, quality and acceptance criteria
Alexander Herbig
Department of Archaeogenetics
Max Planck Institute for the Science of Human History, Jena, Germany
03.06.2016
© by author ESCMID Online Lecture Library
Application of Molecular Diagnostics in Forensic Microbiology
2 - 3 June 2016, Leuven, Belgium

Why ancient DNA?
Historical Questions / Evolutionary Questions

Denisovans
First hominin discovered by geneticists
[Figure: tree of Modern Human, Neandertal, Denisovan and Chimpanzee with labels 350K and 200K; slide text "Have fun!"]
Krause et al., 2010

Ancient Pathogen Genome Evolution
Medieval Yersinia pestis (Bos et al., 2016)
Pre-Columbian Mycobacterium tuberculosis (Bos et al., 2014)
Medieval Mycobacterium leprae (Schuenemann et al., 2013)
Helicobacter pylori from the Iceman (Maixner et al., 2016)

Ancient DNA: Properties and Challenges
Low amount of endogenous DNA
DNA highly fragmented
Contamination by humans (clean room facilities)
Contamination by environmental DNA

Targeted Enrichment
Fraction of target organism's DNA is low
Enrichment of target DNA using array capture
Array design has to account for expected diversity
Target organism might be only distantly related to known organisms
[Figure: targeted DNA enrichment; labels Plant, Bacteria, Human, Fungal, Pathogen]

Targeted Enrichment: Capture Arrays
[Figure labels: Target molecules, Array hybridisation, Target molecules]
Stoneking and Krause, 2011
Collection of genomes of appropriate reference organisms
Whole-genome alignment
Identification of conserved and variable regions

Capture Arrays: Probe Design
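The step from a whole-genome alignment to conserved and variable regions can be sketched as follows. This is a minimal illustration, not the probe design procedure used in the lecture: the function names are hypothetical, the alignment is assumed to be a list of equal-length strings, and gap characters ("-") are simply ignored.

```python
from typing import List, Tuple


def variable_columns(alignment: List[str]) -> List[bool]:
    """Mark each alignment column as variable (True) or conserved (False)."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    flags = []
    for col in range(length):
        # Assumption: gaps carry no information about variability.
        bases = {row[col] for row in alignment if row[col] != "-"}
        flags.append(len(bases) > 1)
    return flags


def regions(flags: List[bool]) -> List[Tuple[int, int, str]]:
    """Collapse per-column flags into (start, end, label) runs, end exclusive."""
    runs = []
    start = 0
    for i in range(1, len(flags) + 1):
        if i == len(flags) or flags[i] != flags[start]:
            label = "variable" if flags[start] else "conserved"
            runs.append((start, i, label))
            start = i
    return runs


if __name__ == "__main__":
    toy_alignment = ["ACGTACGT", "ACGAACGT", "ACGTACGT"]
    for run in regions(variable_columns(toy_alignment)):
        print(run)
```

Regions labelled "variable" are the ones for which additional probes covering the observed variants would be needed.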

Capture Arrays: Challenges
More probes for variable regions needed
Maximal number of probes limited (more than one array can be used)
Melting temperature and other factors have to be taken into account
Repetitive regions have to be excluded

Next Generation Sequencing: Data Processing
[Figure labels: Related Bacteria, Pathogen]
DNA from related or unrelated bacteria is still present
Stricter filtering criteria have to be used to avoid processing data from related species
But we do not want to be too strict. Where is the sweet spot?

Data Processing Pipeline

Data Processing Pipeline: Available Software
EAGER: efficient ancient genome reconstruction
Peltzer et al. 2016, Genome Biology
http://it.inf.uni-tuebingen.de

Characterization of ancient and modern genomes by SNP detection and phylogenomic and metagenomic analysis using PALEOMIX
Schubert et al. 2014, Nature Protocols
http://geogenetics.ku.dk/publications/paleomix

Data Processing: Read Merging
[Figure labels: Sequencing Reads, DNA Fragment]
Overlapping read pairs due to short DNA fragments
No standard paired-end mapping possible
Overlap can be used to compensate for sequencing errors
Merging of overlapping read pairs

Data Processing: Read Merging
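A minimal sketch of merging one overlapping read pair. It is an illustration only, not the implementation used by any particular tool: adapters are assumed to be clipped already, qualities are plain integer lists, the function names and the `min_overlap` and `max_mismatches` parameters are hypothetical, and at each overlapping position the base with the higher quality is kept.

```python
from typing import List, Optional, Tuple

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(
    fwd: str,
    fwd_q: List[int],
    rev: str,
    rev_q: List[int],
    min_overlap: int = 5,
    max_mismatches: int = 1,
) -> Optional[Tuple[str, List[int]]]:
    """Merge a read pair whose forward suffix overlaps the reverse read.

    Returns (merged sequence, merged qualities), or None if no overlap fits.
    """
    # Bring the reverse read into the orientation of the forward read.
    rc, rc_q = reverse_complement(rev), rev_q[::-1]
    # Try the longest candidate overlap first.
    for overlap in range(min(len(fwd), len(rc)), min_overlap - 1, -1):
        offset = len(fwd) - overlap
        mismatches = sum(fwd[offset + i] != rc[i] for i in range(overlap))
        if mismatches > max_mismatches:
            continue
        seq = list(fwd[:offset])
        qual = list(fwd_q[:offset])
        for i in range(overlap):
            # Consensus: keep the base call with the higher quality.
            if fwd_q[offset + i] >= rc_q[i]:
                seq.append(fwd[offset + i])
                qual.append(fwd_q[offset + i])
            else:
                seq.append(rc[i])
                qual.append(rc_q[i])
        seq.extend(rc[overlap:])
        qual.extend(rc_q[overlap:])
        return "".join(seq), qual
    return None
```

In the overlapping region every position ends up with the better of two base calls, which is why the merged read has effectively higher quality there.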
Construction of consensus sequence in overlapping region
Selection of nucleotides with higher sequencing quality
Results in merged reads with effectively higher sequencing quality in overlapping region

Data Processing: De novo Assembly
[Figure labels: Sequencing Reads, Contig]
Determine overlap between reads and combine them to create contigs
Gaps occur due to low coverage
No scaffolding possible (no 'real paired-end reads')

De novo Assembly: Ancient Leprosy
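The idea of determining overlaps between reads and combining them into contigs can be illustrated with a toy greedy assembler. This is a sketch under strong assumptions (error-free reads, exact suffix/prefix overlaps, hypothetical function names), not the assembler used for the results shown here. Where coverage is missing, no overlap is found and several contigs remain.

```python
from typing import List


def overlap_length(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b, or 0."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: List[str], min_overlap: int = 4) -> List[str]:
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i == j:
                    continue
                n = overlap_length(a, b, min_overlap)
                if n > best_len:
                    best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs  # nothing left to merge: gaps stay as separate contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    toy_reads = ["ACGTTGCA", "TTGCAAGGC", "GATTACA", "TACAGAT"]
    print(greedy_assemble(toy_reads))
```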
De novo assembly of ancient M. leprae strain resulted in 169 contigs
Coverage of more than 97% of the reference genome
Most repetitive regions could not be resolved
Metagenomic assembly consists of 2354 contigs
Schuenemann et al., 2013

Phylogenetic Analyses: Ancient Tuberculosis
~1000-year-old M. tuberculosis genomes from Peru
Most closely related to strains found in seals and sea lions
Bos et al., 2014

Early History of Diseases: Bronze Age Plague
Y. pestis genomes from Bronze Age remains
Ancestral to all forms identified so far
Insights into the evolution of Y. pestis virulence and the early history of plague
Rasmussen et al., 2015

The Human Microbiome
Ottman et al., 2012

The Human Microbiome: Co-Evolution

MALT
MALT - MEGAN ALignment Tool (Herbig et al., under review) http://ab.inf.uni-tuebingen.de/software/malt
Preprocessed Reads: adapter clipping etc.
Spaced Seeds: compensating for mismatches, high sensitivity
Hash Index: fast seed matching
Precise Alignment: for in-depth downstream analyses
Taxonomic Binning: for metagenomic analysis

The MALT Pipeline
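How spaced seeds and a hash index work together can be shown with a toy example. This is a generic sketch of the technique, not MALT's implementation: the seed shape "1101011", the function names, and the use of a Python dictionary as the hash index are all assumptions made for illustration. Positions marked "0" in the shape are ignored, so a mismatch there still produces a seed hit.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def seed_key(kmer: str, shape: str) -> str:
    """Keep only the characters at the '1' positions of the seed shape."""
    return "".join(c for c, s in zip(kmer, shape) if s == "1")


def build_index(reference: str, shape: str) -> Dict[str, List[int]]:
    """Hash index: spaced-seed key -> reference start positions."""
    index: Dict[str, List[int]] = defaultdict(list)
    width = len(shape)
    for pos in range(len(reference) - width + 1):
        index[seed_key(reference[pos:pos + width], shape)].append(pos)
    return index


def seed_hits(read: str, index: Dict[str, List[int]], shape: str) -> Set[Tuple[int, int]]:
    """Return (read offset, reference position) pairs with a matching seed."""
    hits = set()
    width = len(shape)
    for off in range(len(read) - width + 1):
        key = seed_key(read[off:off + width], shape)
        for pos in index.get(key, []):
            hits.add((off, pos))
    return hits


if __name__ == "__main__":
    shape = "1101011"  # hypothetical seed shape
    reference = "ACGTTGCAAGGCTTAACC"
    index = build_index(reference, shape)
    # The read differs from reference[4:11] at a '0' position of the shape.
    print(seed_hits("TGTAAGG", index, shape))
```

Seed hits like these would then be extended into full alignments, which is the step the pipeline lists as precise alignment.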
Integration with the interactive metagenomics analysis software MEGAN (Huson et al. 2011)

MEGAN

MALT: Runtime Performance
Comparison to BLAST (Altschul et al. 1990) and lambda (Hauswedell et al. 2014)
[Figure: runtime [s] plotted against number of reads; 82% sensitivity at significance level 10^-5 (E-value)]

The Tyrolean Iceman
• 5,300-year-old Copper Age glacier mummy
• Early farmer
• Various well-preserved soft tissues

Two Tissue Samples
• Gingiva tissue (oral cavity)
• Lung tissue

The Tyrolean Iceman: Oral Cavity vs. Lung
[Figure panels: Oral Cavity, Lung]

The Tyrolean Iceman: Oral Cavity vs. Lung

The Tyrolean Iceman: The Iceman’s Original Microbiomes
                           Oral Cavity        Lung
Total Reads                 12,486,937   8,503,240
Assigned Reads               1,098,791   1,867,705
Streptococcus                    5,551          55
Streptococcus mutans               159           0
Staphylococcus                   4,109          27
Staphylococcus aureus              341           0
Treponema denticola                 78           0
Filifactor alocis                  561           0
Fusobacterium nucleatum          1,941           0
Lactococcus lactis                 843           0
Haemophilus                        224          66
Klebsiella pneumoniae              172         375

The Tyrolean Iceman: The Iceman’s Original Microbiomes
Streptococcus mutans       Tooth decay
Filifactor alocis          Periodontal disease
Fusobacterium nucleatum    Periodontal plaque
Haemophilus influenzae     Bacterial meningitis
Klebsiella pneumoniae      Pneumonia

The Tyrolean Iceman: H. pylori from the Stomach
Reconstructed H. pylori genome from the Iceman’s stomach
Comparative analysis with contemporary strains
Original European population?
More recent African admixture?
Maixner et al., 2016

Authentication
Is the recovered DNA of ancient origin?
Can we differentiate species?
Influence of contamination?
Multiple infections?

Authentication Criteria
• Ancient DNA? • DNA Damage patterns

Authentication: DNA Damage Patterns
Deamination: Cytosine → Uracil → Thymine
[Figure: C→T substitutions by position from the 5’ end; mapDamage 2.0 (Jónsson et al. 2013)]

Authentication: DNA Damage Patterns
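The damage signal can be illustrated by counting C→T mismatches per position from the 5’ end. The sketch below is a simplified stand-in, not mapDamage itself: it assumes ungapped (read, reference) string pairs already oriented 5’ to 3’ along the read, and the function name and the `max_pos` parameter are hypothetical.

```python
from typing import Iterable, List, Tuple


def c_to_t_by_position(pairs: Iterable[Tuple[str, str]], max_pos: int = 10) -> List[float]:
    """Fraction of reference C positions read as T, per distance from the 5' end."""
    c_total = [0] * max_pos
    c_to_t = [0] * max_pos
    for read, ref in pairs:
        for i in range(min(max_pos, len(read), len(ref))):
            if ref[i] == "C":
                c_total[i] += 1
                if read[i] == "T":
                    c_to_t[i] += 1
    # Positions without any reference C are reported as 0.0.
    return [t / n if n else 0.0 for t, n in zip(c_to_t, c_total)]


if __name__ == "__main__":
    toy_pairs = [("TGCA", "CGCA"), ("CGCA", "CGCA"), ("ACCA", "ACCA")]
    print(c_to_t_by_position(toy_pairs, max_pos=3))
```

For authentic ancient DNA the expectation is an elevated C→T frequency at the first positions that decays towards the interior of the read.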

Authentication
• Ancient DNA? • DNA Damage patterns
• Correct species? • Evenness of target sequence coverage

Cross-Mapping: Evenness of Coverage
Distribution of reads across the reference genome
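One way to quantify how evenly reads are distributed is to compute per-position depth and summarise it. This is a sketch under assumptions: reads are given as 0-based, end-exclusive intervals, the function names are hypothetical, and the coefficient of variation is only one of several possible evenness measures.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def coverage(read_intervals: List[Tuple[int, int]], genome_length: int) -> List[int]:
    """Per-position read depth for 0-based, end-exclusive intervals."""
    depth = [0] * genome_length
    for start, end in read_intervals:
        for pos in range(max(start, 0), min(end, genome_length)):
            depth[pos] += 1
    return depth


def evenness(depth: List[int]) -> Tuple[float, float]:
    """Return (fraction of positions covered, coefficient of variation of depth)."""
    covered = sum(d > 0 for d in depth) / len(depth)
    avg = mean(depth)
    cv = pstdev(depth) / avg if avg else float("inf")
    return covered, cv


if __name__ == "__main__":
    print(evenness(coverage([(0, 5), (5, 10)], 10)))  # reads spread out
    print(evenness(coverage([(0, 5), (0, 5)], 10)))   # reads piled up
```

Reads that pile up in a few conserved regions while the rest of the reference stays uncovered give a low covered fraction and a high coefficient of variation, which hints at cross-mapping from another species.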
[Figure panels: even, uneven]

Authentication
• Ancient DNA? • DNA Damage patterns
• Correct species? • Evenness of target sequence coverage
• Multiple species? • Distributions of similarity scores

Cross-Mapping: Similarity Distributions
Distributions of %identity values for aligned reads
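A distribution of %identity values can be built as sketched here. The example assumes ungapped (read, reference) pairs of equal length; the function names and the `bin_width` parameter are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def percent_identity(read: str, ref: str) -> float:
    """Percentage of matching positions in an ungapped alignment."""
    if not read or len(read) != len(ref):
        raise ValueError("read and reference must be non-empty and equally long")
    matches = sum(a == b for a, b in zip(read, ref))
    return 100.0 * matches / len(read)


def identity_histogram(pairs: Iterable[Tuple[str, str]], bin_width: int = 5) -> Dict[int, int]:
    """Count aligned reads per %identity bin (key = lower bin edge)."""
    hist: Counter = Counter()
    for read, ref in pairs:
        hist[int(percent_identity(read, ref) // bin_width) * bin_width] += 1
    return dict(sorted(hist.items()))


if __name__ == "__main__":
    toy_pairs = [
        ("ACGTACGTAC", "ACGTACGTAC"),
        ("ACGTACGTAA", "ACGTACGTAC"),
        ("ACGTACGTAC", "ACGTACGTAC"),
    ]
    print(identity_histogram(toy_pairs))
```

A single peak close to 100% is what one would expect if only the target species is present; a second peak at lower identity suggests reads from a related species.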
[Figure labels: Positive, Negative, Background, Foreground]

Authentication
• Ancient DNA? • DNA Damage patterns
• Correct species? • Evenness of target sequence coverage
• Multiple species? • Distributions of similarity scores
• Multiple strains? • Distributions of allele frequencies

Cross-Mapping: Allele Frequencies
Only one strain present
Multiple strains present in equal/different proportions

Cross-Mapping: Allele Frequencies
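The allele frequency check can be sketched as follows. The pileup format (position mapped to a string of observed bases), the function name and the `min_depth` threshold are assumptions for illustration, not part of any specific tool.

```python
from collections import Counter
from typing import Dict


def minor_allele_frequencies(pileup: Dict[int, str], min_depth: int = 5) -> Dict[int, float]:
    """Minor allele frequency at every sufficiently covered variable site."""
    freqs = {}
    for pos, bases in pileup.items():
        if len(bases) < min_depth:
            continue  # too few reads to estimate a frequency
        counts = Counter(bases).most_common()
        if len(counts) > 1:
            # Frequency of the second most common base at this site.
            freqs[pos] = counts[1][1] / len(bases)
    return freqs


if __name__ == "__main__":
    toy_pileup = {
        10: "AAAAAAAAAA",  # no variation
        20: "AAAAAAACCC",  # minor allele at 30%
        30: "GGGGGTTTTT",  # two alleles at 50% each
        40: "AC",          # below min_depth, ignored
    }
    print(minor_allele_frequencies(toy_pileup))
```

With a single strain, few sites should show intermediate frequencies. Many sites clustered around one frequency suggest a second strain in that proportion, while scattered intermediate frequencies can also result from cross-mapping of related bacteria.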
Only one strain present
...with cross-mapping from other related bacteria

Cross-Mapping: Allele Frequencies
Default parameters
Stricter mapping and filtering
Bos et al. 2014, Chan et al. 2013

Authentication Criteria
• Ancient DNA? • DNA Damage patterns
• Correct species? • Evenness of target sequence coverage
• Multiple species? • Distributions of similarity scores
• Multiple strains? • Distributions of allele frequencies

Summary
Steps to ancient DNA retrieval and analysis
• Sampling and DNA extraction (Clean Room)
• Screening
  • Targeted (e.g. PCR-based)
  • Non-Targeted (e.g. MALT)
• Authentication
• Genome Capture
  • Probe Design
• Sequencing and Data Analysis
  • Comparative Phylogenomics
  • Virulence Factors
  • …

Acknowledgements
The Archaeogenetics Department of the MPI for SHH, Jena
Johannes Krause, Åshild Vågene, Kirsten Bos, Michal Feldman, Verena Schuenemann, Maria Spyrou, Aida Andrades Valtueña, James Fellows Yates, Aditya Kumar Lankapalli, Florian Aldehoff
Daniel Huson, Frank Maixner, Benjamin Buchfink, Albert Zink