Addressing Challenges of Ancient DNA Sequence Data Obtained with Next Generation Methods

DISSERTATION

submitted in fulfillment of the requirements for the degree Doctorate of Natural Science (doctor rerum naturalium) at the Faculty of Biology, Johannes Gutenberg University Mainz

by Christian Sell
born 12.12.1980 in Merzig, Germany

Mainz, 26.04.2017

Abstract

This thesis addresses challenges in the bioinformatic analysis of palaeogenomes generated by Next Generation Sequencing of highly degraded ancient DNA from archaeological skeletal remains. It establishes a pipeline that incorporates a correction for postmortem damage as well as sequencing errors, to facilitate comparison with sequence data from modern specimens. By applying the pipeline to published ancient genomes from the Aegean Neolithic and comparing the results to data from the 1000 Genomes project, it could be shown that an excess of cytosine-to-thymine transitions, linked to deamination during postmortem degradation of the DNA, can be reverted by bioinformatic processing. In another attempt to address the complexity and scarcity of DNA from prehistoric specimens, an in-solution hybridization enrichment was designed. This method can counteract the relatively low endogenous DNA content in samples from prehistoric human skeletal remains by selectively enriching specifically designed regions. The developed capture array was tested on 21 human skeletal remains from a Bronze Age battlefield, resulting in an average read depth of 1.71x over the whole genome. The statistical analysis of data produced by this approach enables genomic inferences similar to those based on full genomes. Third, the thesis addresses the false assignment of individual barcode indices to sequenced samples. In a data set comprising 38 capture-enriched mitochondrial genomes from prehistoric human remains, it could be shown that this sequencing error can mimic a cross-contamination event during lab work. By identifying and removing affected reads, false positive variants could be reduced from ∼38% to 0%.

Contents

Abstract
Acknowledgments
1 Introduction
  1.1 aDNA characteristics
  1.2 Methods introduction
    1.2.1 Sequencing library preparation
    1.2.2 Capture enrichment approach
    1.2.3 Illumina HiSeq sequencing
  1.3 Motivation
2 Pipeline
  2.1 Methods
    2.1.1 Raw data processing
    2.1.2 Alignment generation and processing
    2.1.3 Alignment refinement
    2.1.4 Detecting variants
    2.1.5 Contamination assessment and/or authenticity of aDNA
    2.1.6 Method testing
    2.1.7 Sample description
  2.2 Results
    2.2.1 Filtering results
    2.2.2 Results of recalibration methods
  2.3 Discussion
    2.3.1 Modern data set
    2.3.2 Filtering and recalibration
    2.3.3 General pipeline
  2.4 Conclusion
3 Nuclear capture enrichment approach
  3.1 The motivation for the nuclear capture
  3.2 Methods
    3.2.1 Selection of conservative neutral regions
    3.2.2 Workflow relaxed neutral regions
    3.2.3 Bait design
    3.2.4 PCA
  3.3 Results
    3.3.1 Capture development
    3.3.2 Captured samples
    3.3.3 PCA
  3.4 Discussion
4 Case study Welzin
  4.1 Introduction
    4.1.1 Sample background
    4.1.2 Genetic history of Bronze Age Germany
    4.1.3 Archaeological background of the Bronze Age
  4.2 Methods
    4.2.1 ATLAS
    4.2.2 Relatedness
    4.2.3 Plink and the reference data set
    4.2.4 Admixtools
    4.2.5 Admixture
  4.3 Results
    4.3.1 Reference overlap
    4.3.2 PCA
    4.3.3 F3 and D-statistics
    4.3.4 Admixture
  4.4 Discussion
    4.4.1 Interpretation of the results in context of population history and archaeology
    4.4.2 Data quality
  4.5 Conclusion
5 On lane contamination
  5.1 Samples
  5.2 Methods
    5.2.1 Lab methods
    5.2.2 Bioinformatics
  5.3 Results
    5.3.1 Capture efficiency
    5.3.2 Correction for index mis-identification
    5.3.3 SNP calling
    5.3.4 Blast
    5.3.5 ContaMix
  5.4 Discussion
    5.4.1 Capture efficiency & correction for index mis-identification
    5.4.2 SNP calling
..
