HOPS: Automated Detection and Authentication of Pathogen DNA in Archaeological Remains

HOPS: automated detection and authentication of pathogen DNA in archaeological remains The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters. Citation Hübler, Ron et al. "HOPS: automated detection and authentication of pathogen DNA in archaeological remains." Genome Biology 20 (Dec. 2019): 280 doi 10.1186/s13059-019-1903-0 ©2019 Author(s) As Published 10.1186/s13059-019-1903-0 Publisher BioMed Central Version Final published version Citable link https://hdl.handle.net/1721.1/126318 Terms of Use Creative Commons Attribution Detailed Terms https://creativecommons.org/licenses/by/4.0/ Hübler et al. Genome Biology (2019) 20:280 https://doi.org/10.1186/s13059-019-1903-0 METHOD Open Access HOPS: automated detection and authentication of pathogen DNA in archaeological remains Ron Hübler1†, Felix M. Key1,2,3*†, Christina Warinner1, Kirsten I. Bos1, Johannes Krause1 and Alexander Herbig1* Abstract High-throughput DNA sequencing enables large-scale metagenomic analyses of complex biological systems. Such analyses are not restricted to present-day samples and can also be applied to molecular data from archaeological remains. Investigations of ancient microbes can provide valuable information on past bacterial commensals and pathogens, but their molecular detection remains a challenge. Here, we present HOPS (Heuristic Operations for Pathogen Screening), an automated bacterial screening pipeline for ancient DNA sequences that provides detailed information on species identification and authenticity. HOPS is a versatile tool for high-throughput screening of DNA from archaeological material to identify candidates for genome-level analyses. Keywords: Ancient DNA, Archaeogenetics, Pathogen detection, Metagenomics, Paleopathology, Ancient bacteria, Microbial archaeology Background Genome-level investigations of ancient bacterial patho- High-throughput DNA sequencing enables large-scale gens have provided valuable information about the evo- metagenomic analyses of environmental samples and lution of Yersinia pestis [11–18], Mycobacterium leprae host tissues and provides an unprecedented understand- [19, 20], Mycobacterium tuberculosis [21, 22], pathogenic ing of life’s microbial diversity. Examples of coordinated Brucella species [23, 24], Salmonella enterica [25, 26], efforts to quantify this diversity include the Human and Helicobacter pylori [27], with others surely on the Microbiome Project [1], the Tara Ocean Project [2], and horizon. Notably, most studies to date have leveraged the Earth Microbiome Project [3]. Metagenomic data paleopathological evidence or historical context to pin- from human archaeological remains (e.g., bones, teeth, point a priori involvement of a specific bacterial patho- or dental calculus) provide a window into the individ- gen. However, the vast majority of infectious diseases do uals’ metagenomic past and are an unprecedented tem- not lead to the formation of distinct and characteristic poral dimension added to the wide landscape of bone lesions, and most remains are found in contexts microbial diversity now being explored. While many an- that lack clear associations with a particular disease. cient DNA (aDNA) studies focus on the analysis of hu- Consequently, studies of ancient pathogens must con- man endogenous DNA isolated from ancient specimens sider a long list of candidate microbes. Given the sizes [4–8], co-recovery of metagenomic aDNA permits quer- and availability of current aDNA datasets, there is clear ies that provide information related to endogenous mi- benefit for the development of an automated computa- crobial content at death, with applications ranging from tional screening tool that both detects and authenticates characterizing the natural constituents of the microbiota true pathogen genetic signals in ancient metagenomic to identifying infectious diseases [9, 10]. data. Ideally, this tool also is able to distinguish pathogens from the dominant and diverse microbial background of archaeological and other decomposed * Correspondence: [email protected]; [email protected] material, a consideration typically not required for tools †Ron Hübler and Felix M. Key contributed equally to this work. developed for clinical applications. 1Max Planck Institute for the Science of Human History, Jena, Germany Full list of author information is available at the end of the article © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Hübler et al. Genome Biology (2019) 20:280 Page 2 of 13 To save computational time and effort, most available when only trace amounts of species-specific DNA are metagenomic profiling tools focus only on individual present, and (iii) authentication of its ancient origin. No genes, such as the 16S rRNA gene used by QIIME [28], software currently exists that fulfills all requirements for or panels of marker genes, such as those used by reliable screening of metagenomic aDNA. Here, we MetaPhlAn2 [29] and MIDAS [30], that are easy to re- introduce HOPS (Heuristic Operations for Pathogen trieve and sufficiently specific. However, these genes Screening), an automated computational pipeline that make up only a small proportion of a bacterial genome screens metagenomic aDNA data for the presence of (the 16S rRNA gene, for example, accounts for only ~ bacterial pathogens and assesses their authenticity using 0.2% of a bacterial genome and is usually present in established criteria. We test HOPS on experimental and multiple copies), and if a pathogen is present at low simulated data and compare it to common metagenomic abundance compared to host and environmental DNA, profiling tools. We show that HOPS outperforms avail- these genes are likely to be missed in routine metage- able tools, is highly specific and sensitive, and can per- nomic sequencing screens. Although these tools can form taxonomic identification and authentication with have high specificity, they lack the sensitivity required as few as 50 species-derived reads present. for ancient pathogen screening from shallow but highly complex metagenomic datasets. Screening techniques Results that accommodate queries of whole genomes are of clear HOPS workflow benefit for archaeological studies since alignment to a HOPS consists of three parts (Fig. 1): (i) a modified ver- full reference genome offers greater chances for detec- sion of MALT [25, 35] that includes optional PCR dupli- tion when data for a given taxon are sparse [25]. While cate removal and optional deamination pattern tolerance some algorithms, such as Kraken [31], have been devel- at the ends of reads; (ii) the newly developed program oped to query databases that contain thousands of MaltExtract that provides statistics for the evaluation of complete reference genomes using k-mer matching, this species identification as well as aDNA authenticity cri- approach does not produce the alignment information teria for an arbitrarily extensive user-specified set of bac- necessary to further evaluate species identification accur- terial pathogens, with additional functionality to filter acy or authenticity. the aligned reads by various measures such as read In addition to taxonomic classification [32], it is also length, sequence complexity, or percent identity; and helpful to distinguish ancient bacteria from modern con- (iii) a post-processing script that provides a summary taminants as early as the initial screening [9, 10]. Genu- overview for all samples and potential bacterial patho- ine aDNA, especially pathogen bacterial DNA, is usually gens that have been identified. only present in small amounts and can be distinguished from modern DNA contamination by applying an estab- MALT lished set of authenticity criteria [9, 10], the most im- MALT (Megan Alignment Tool) [25, 35] is an alignment portant of which is the assessment of DNA damage. In and taxonomic binning tool for metagenomic data that ancient DNA, cytosine deamination accumulates over aligns DNA reads to a user-specified database of refer- time at DNA fragment termini [9, 10, 33, 34], thus lead- ence sequences. Reads are assigned to taxonomic nodes ing to a specific pattern of nucleotide misincorporation by the naïve Lowest Common Ancestor (LCA) algorithm during amplification. The evaluation of additional au- [36, 37] and are thus assigned to different taxonomic thenticity criteria such as edit distances (number of mis- ranks based on their specificity. The default version of matches between read and reference) and the MALT is intended for the analysis of metagenomic data- distribution of mapped reads across the reference are sets derived from modern DNA, and thus, it was not de- also recommended to circumvent database bias artifacts signed to accommodate the specific requirements of and to further validate taxonomic assignments [9, 10]. aDNA analyses. In particular, aDNA damage that mani- While manual evaluation of species identification and fests as misincorporated nucleotides in sequenced prod- aDNA

HOPS: Automated Detection and Authentication of Pathogen DNA in Archaeological Remains

Research on Ancient DNA in the Near East Mateusz Baca*1, Martyna Molak2 1 Center for Precolumbian Studies, University of Warsaw, Ul

CHRISTINA “TINA” WARINNER (Last Updated October 18, 2018)

The Global History of Paleopathology

Course Outline of Record Los Medanos College 2700 East Leland Road Pittsburg CA 94565 (925) 439-2181

Curriculum Vitae Johannes Krause

AIA Bulletin, Fiscal Year 2005

Ancient DNA in Physical Anthropology: a Review Jacqueline E Broida

European Meeting of the Paleopathology Association

Jochen Holger Schutkowski

EAA2021 Sessions 14 July-1.Pdf

The Fourth Genetime Workshop 'Domestication of Plants & Animals'

Osteoporosis and Paleopathology: a Review