Max-Planck-Institut für Menschheitsgeschichte Max Planck Institute for the Science of Human History

Ancient DNA – methods, quality and acceptance criteria

Alexander Herbig
Department of Archaeogenetics
Max Planck Institute for the Science of Human History, Jena, Germany
03.06.2016

© by author, ESCMID Online Lecture Library
Application of Molecular Diagnostics in Forensic Microbiology
2 - 3 June 2016, Leuven, Belgium

Why ancient DNA?

Historical Questions / Evolutionary Questions

Denisovans

First Hominin discovered by Geneticists

[Figure labels: "Have fun!", 350K, 200K; Modern Human, Neandertal, Denisovan, Chimpanzee]

Krause et al., 2010

Ancient Pathogen Genome Evolution

Medieval (Bos et al., 2016)

Pre-Columbian Mycobacterium tuberculosis (Bos et al., 2014)

Medieval Mycobacterium leprae (Schuenemann et al., 2013)

Helicobacter pylori from the Iceman (Maixner et al., 2016)

Ancient DNA
Properties and Challenges

• Low amount of endogenous DNA

• DNA highly fragmented

• Contamination by humans (clean room facilities)

• Contamination by environmental DNA

Targeted Enrichment

• Fraction of target organism's DNA is low

• Enrichment of target DNA using array capture

• Array design has to account for expected diversity

• Target organism might be only distantly related to known organisms

[Figure labels: Plant, Bacteria, Human, Fungal, Pathogen, Targeted DNA, Enrichment]

Targeted Enrichment
Capture Arrays

[Figure labels: Target molecules, Array hybridisation, Target molecules]

Stoneking and Krause, 2011

• Collection of genomes of appropriate reference organisms

• Whole-genome alignment

• Identification of conserved and variable regions

Capture Arrays
Probe Design

Capture Arrays
Challenges

• More probes for variable regions needed

• Maximal number of probes limited (more than one array can be used)

• Melting temperature and other factors have to be taken into account

• Repetitive regions have to be excluded

Next Generation Sequencing
Data Processing
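A minimal sketch of the capture-array probe constraints listed above: probes are tiled along a reference sequence, then filtered by an estimated melting temperature and a crude repeat check. The function names, probe length, step size, temperature window, homopolymer-run check and the simple GC-based Tm formula are illustrative assumptions, not the design procedure used for the arrays in this lecture.

```python
from typing import List, Tuple


def melting_temp(probe: str) -> float:
    """Rough Tm estimate from GC content (basic formula for oligos > 13 nt)."""
    gc = probe.count("G") + probe.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(probe)


def has_long_run(probe: str, max_run: int = 8) -> bool:
    """Crude stand-in for repeat filtering: flag long homopolymer runs."""
    run = 1
    for prev, cur in zip(probe, probe[1:]):
        run = run + 1 if prev == cur else 1
        if run > max_run:
            return True
    return False


def tile_probes(reference: str, probe_len: int = 60, step: int = 5,
                tm_range: Tuple[float, float] = (65.0, 85.0)
                ) -> List[Tuple[int, str]]:
    """Return (start, probe) pairs that pass the Tm and repeat filters."""
    probes = []
    for start in range(0, len(reference) - probe_len + 1, step):
        probe = reference[start:start + probe_len]
        if has_long_run(probe):
            continue  # skip low-complexity / repetitive sequence
        if tm_range[0] <= melting_temp(probe) <= tm_range[1]:
            probes.append((start, probe))
    return probes
```

A smaller `step` in variable regions gives denser tiling there, at the cost of a higher probe count, which is where the limit on probes per array comes in.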

[Figure labels: Related Bacteria, Pathogen]

• DNA from related or unrelated bacteria is still present after enrichment

• Stricter filtering criteria have to be used to avoid processing of data from related species

• But we do not want to be too strict. Where is the sweet spot?

Data Processing Pipeline

Data Processing Pipeline
Available Software

EAGER: efficient ancient genome reconstruction
Peltzer et al. 2016, Genome Biology
http://it.inf.uni-tuebingen.de

Characterization of ancient and modern genomes by SNP detection and phylogenomic and metagenomic analysis using PALEOMIX
Schubert et al. 2014, Nature Protocols
http://geogenetics.ku.dk/publications/paleomix

Data Processing
Read Merging

[Figure labels: Sequencing Reads, DNA Fragment]

• Overlapping read pairs due to short DNA fragments

• No standard paired-end mapping possible

• Overlap can be used to compensate for sequencing errors

• Merging of overlapping read pairs

Data Processing
Read Merging

• Construction of consensus sequence in overlapping region

• Selection of nucleotides with higher sequencing quality

• Results in merged reads with effectively higher sequencing quality in overlapping region

Data Processing
De novo Assembly

[Figure labels: Sequencing Reads, Contig]

• Determine overlap between reads and combine them to create contigs

• Gaps occur due to low coverage

• No scaffolding possible (no 'real paired-end reads')

De novo Assembly
Ancient Leprosy

• De novo assembly of ancient M. leprae strain resulted in 169 contigs

• Coverage of more than 97% of the reference genome

• Most repetitive regions could not be resolved

• Metagenomic assembly consists of 2354 contigs

Schuenemann et al., 2013

Phylogenetic Analyses
Ancient Tuberculosis

~1000 years old M. tuberculosis genomes from

Most closely related to strains found in seals and sea lions

Bos et al., 2014

Early History of Diseases
Bronze Age

Y. pestis genomes from Bronze Age remains

Ancestral to all forms identified so far

Insights into the evolution of Y. pestis virulence and the early history of plague

Rasmussen et al., 2015

The Human Microbiome


Ottman et al., 2012

The Human Microbiome
Co-Evolution

MALT

MALT - MEGAN ALignment Tool (Herbig et al., under review)
http://ab.inf.uni-tuebingen.de/software/malt

• Preprocessed Reads: adapter clipping etc.

• Spaced Seeds: compensating for mismatches, high sensitivity

• Hash Index: fast seed matching

• Precise Alignment: for in-depth downstream analyses

• Taxonomic Binning: for metagenomic analysis

The MALT Pipeline
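To illustrate how spaced seeds and a hash index work together in the pipeline stages listed above, here is a small sketch. It is not MALT's implementation: the seed shape `1101011`, the function names and the in-memory dictionary index are assumptions chosen for brevity. Positions marked `0` in the shape are ignored, so a mismatch there still produces a seed hit.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

SEED_SHAPE = "1101011"  # hypothetical shape: 1 = must match, 0 = ignored


def spaced_kmer(seq: str, start: int, shape: str = SEED_SHAPE) -> str:
    """Extract the characters at the 'care' positions of the seed shape."""
    return "".join(seq[start + i] for i, bit in enumerate(shape) if bit == "1")


def build_index(references: Dict[str, str], shape: str = SEED_SHAPE
                ) -> Dict[str, List[Tuple[str, int]]]:
    """Hash index: spaced k-mer -> list of (reference name, position)."""
    index = defaultdict(list)
    for name, seq in references.items():
        for pos in range(len(seq) - len(shape) + 1):
            index[spaced_kmer(seq, pos, shape)].append((name, pos))
    return index


def seed_hits(read: str, index: Dict[str, List[Tuple[str, int]]],
              shape: str = SEED_SHAPE) -> Set[Tuple[str, int]]:
    """Return (reference name, implied read start on reference) for each hit."""
    hits = set()
    for pos in range(len(read) - len(shape) + 1):
        for name, ref_pos in index.get(spaced_kmer(read, pos, shape), []):
            hits.add((name, ref_pos - pos))
    return hits
```

In a full pipeline each seed hit would then be extended into a precise alignment before taxonomic binning; that step is omitted here.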

Integration with the interactive metagenomics analysis software MEGAN (Huson et al. 2011)

MEGAN

MALT
Runtime Performance

Comparison to BLAST (Altschul et al. 1990) and lambda (Hauswedell et al. 2014)

[Figure: Runtime [s] versus Number of Reads]

82% sensitivity at significance level 10^-5 (E-value)

The Tyrolean Iceman

• 5,300-year-old Copper Age glacier mummy

• Early farmer

• Various well preserved soft tissues

Two Tissue Samples

• Gingiva tissue (oral cavity)

• Lung tissue

The Tyrolean Iceman
Oral Cavity vs. Lung

[Figure panels: Oral Cavity, Lung]

The Tyrolean Iceman
The Iceman’s Original Microbiomes

                           Oral Cavity        Lung
Total Reads                 12,486,937   8,503,240
Assigned Reads               1,098,791   1,867,705
Streptococcus                    5,551          55
Streptococcus mutans               159           0
Staphylococcus                   4,109          27
Staphylococcus aureus              341           0
Treponema denticola                 78           0
Filifactor alocis                  561           0
Fusobacterium nucleatum          1,941           0
Lactococcus lactis                 843           0
Haemophilus                        224          66
Klebsiella pneumoniae              172         375

The Tyrolean Iceman
The Iceman’s Original Microbiomes

Streptococcus mutans       Tooth decay
Filifactor alocis          Periodontal disease
Fusobacterium nucleatum    Periodontal plaque
Haemophilus influenzae     Bacterial meningitis
Klebsiella pneumoniae      Pneumonia

The Tyrolean Iceman
H. pylori from the Stomach

Reconstructed H. pylori genome from the Iceman’s stomach

Comparative analysis with contemporary strains

Original European population?

More recent African admixture?

Maixner et al., 2016

Authentication

Is the recovered DNA of ancient Origin?

Can we differentiate Species?

Influence of Contamination?

Multiple Infections?

Authentication
Criteria

• Ancient DNA?
  • DNA Damage patterns

Authentication
DNA Damage Patterns

Deamination: Cytosine → Uracil → Thymine

[Figure: mapDamage 2.0 (Jónsson et al. 2013); C→T against position from 5’ end]

Authentication
DNA Damage Patterns
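The damage signal described above, C→T changes caused by cytosine deamination and plotted against the position from the 5’ end, can be computed in a few lines. This is a minimal sketch, not mapDamage itself: it assumes gap-free alignments supplied as (reference segment, read) string pairs with the read in 5’ to 3’ orientation, and `c_to_t_by_position` is a hypothetical helper name.

```python
from typing import Iterable, List, Tuple


def c_to_t_by_position(alignments: Iterable[Tuple[str, str]],
                       max_pos: int = 10) -> List[float]:
    """C->T frequency at each of the first max_pos positions from the 5' end."""
    c_count = [0] * max_pos    # reference C observed at this read position
    ct_count = [0] * max_pos   # ... and the read shows T there
    for ref, read in alignments:
        for i, (r, q) in enumerate(zip(ref[:max_pos], read[:max_pos])):
            if r == "C":
                c_count[i] += 1
                if q == "T":
                    ct_count[i] += 1
    # Positions without any reference C are reported as 0.0.
    return [ct / c if c else 0.0 for ct, c in zip(ct_count, c_count)]
```

For authentic ancient DNA one would expect the first values to be elevated and to fall off with distance from the read end, whereas modern contaminant reads should give a flat, near-zero profile.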

Authentication

• Ancient DNA?
  • DNA Damage patterns

• Correct species?
  • Evenness of target sequence coverage

Cross-Mapping
Evenness of Coverage

Distribution of reads across the reference genome

[Figure panels: even, uneven]

Authentication
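One simple way to quantify the even versus uneven read distributions contrasted above is to count reads per fixed-size window and summarise the counts. The sketch below is an assumption about how this could be done, not the lecture's procedure: `coverage_evenness`, the window size and the two summary values (fraction of windows with reads, coefficient of variation) are illustrative choices.

```python
import statistics
from typing import Iterable, Tuple


def coverage_evenness(read_starts: Iterable[int], genome_len: int,
                      window: int = 100) -> Tuple[float, float]:
    """Return (fraction of windows with reads, coefficient of variation)."""
    n_windows = (genome_len + window - 1) // window
    counts = [0] * n_windows
    for start in read_starts:
        counts[start // window] += 1  # assign each read to its start window
    mean = statistics.mean(counts)
    covered = sum(c > 0 for c in counts) / n_windows
    cv = statistics.pstdev(counts) / mean if mean > 0 else float("inf")
    return covered, cv
```

Reads spread over the whole reference give a covered fraction near 1 and a low coefficient of variation; reads piling up in a few conserved regions, as expected from cross-mapping of related species, give the opposite.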

• Ancient DNA?
  • DNA Damage patterns

• Correct species?
  • Evenness of target sequence coverage

• Multiple species?
  • Distributions of similarity scores

Cross-Mapping
Similarity Distributions

Distributions of %identity values for aligned reads

[Figure labels: Positive, Negative; Background, Foreground]

Authentication
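The distribution of %identity values for aligned reads mentioned above can be tabulated as a simple histogram. This sketch assumes gap-free alignments given as (reference segment, read) pairs of equal length; the function names and the 5% bin width are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def percent_identity(ref: str, read: str) -> float:
    """Percentage of aligned positions at which read and reference agree."""
    matches = sum(a == b for a, b in zip(ref, read))
    return 100.0 * matches / len(read)


def identity_histogram(alignments: Iterable[Tuple[str, str]],
                       bin_width: int = 5) -> Dict[int, int]:
    """Count reads per identity bin, keyed by the lower bin edge."""
    hist = Counter()
    for ref, read in alignments:
        pid = percent_identity(ref, read)
        # Put 100% into the top bin instead of opening a bin of its own.
        lower = min(int(pid // bin_width) * bin_width, 100 - bin_width)
        hist[lower] += 1
    return dict(sorted(hist.items()))
```

A histogram that peaks in the highest bin is what one would expect when the reads come from the target species; a peak at lower identity hints at reads from a related species mapping to the same reference.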

• Ancient DNA?
  • DNA Damage patterns

• Correct species?
  • Evenness of target sequence coverage

• Multiple species?
  • Distributions of similarity scores

• Multiple strains?
  • Distributions of allele frequencies

Cross-Mapping
Allele Frequencies

Only one strain present

Multiple strains present in equal/different proportions

Cross-Mapping
Allele Frequencies
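The single-strain versus multiple-strain cases above differ in how allele frequencies are distributed across variable sites. A rough sketch, assuming each site is given as a string of observed bases (a pileup) plus its reference base: the helper names and the 0.1 and 0.9 cut-offs are hypothetical and chosen only for illustration.

```python
from typing import Iterable


def alt_allele_frequency(pileup: str, ref_base: str) -> float:
    """Fraction of reads at one site that carry a non-reference base."""
    return sum(base != ref_base for base in pileup) / len(pileup)


def intermediate_site_fraction(frequencies: Iterable[float],
                               low: float = 0.1, high: float = 0.9) -> float:
    """Share of variable sites whose frequency lies strictly between low and high."""
    variable = [f for f in frequencies if f > 0.0]
    if not variable:
        return 0.0
    return sum(low < f < high for f in variable) / len(variable)
```

With a single strain, variable sites should sit close to a frequency of 1, so the intermediate share stays near zero; a substantial share of intermediate frequencies is compatible with several strains or with cross-mapping from related bacteria.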

Only one strain present

...with cross-mapping from other related bacteria

Cross-Mapping
Allele Frequencies

Default parameters

Stricter mapping and filtering

Bos et al. 2014; Chan et al. 2013

Authentication
Criteria

• Ancient DNA?
  • DNA Damage patterns

• Correct species?
  • Evenness of target sequence coverage

• Multiple species?
  • Distributions of similarity scores

• Multiple strains?
  • Distributions of allele frequencies

Summary

Steps to ancient DNA retrieval and analysis

• Sampling and DNA extraction (Clean Room)

• Screening
  • Targeted (e.g. PCR-based)
  • Non-Targeted (e.g. MALT)

• Authentication

• Genome Capture
  • Probe Design

• Sequencing and Data Analysis
  • Comparative Phylogenomics
  • Virulence Factors
  • …

Acknowledgements

The Archaeogenetics Department of the MPI for SHH, Jena:
Johannes Krause, Åshild Vågene, Kirsten Bos, Michal Feldman, Verena Schuenemann, Maria Spyrou, Aida Andrades Valtueña, James Fellows Yates, Aditya Kumar Lankapalli, Florian Aldehoff

Daniel Huson, Frank Maixner, Benjamin Buchfink, Albert Zink