Supplementary Materials For
Total Page:16
File Type:pdf, Size:1020Kb
Supplementary Materials for A high coverage genome sequence from an archaic Denisovan individual Matthias Meyer, Martin Kircher, Marie-Theres Gansauge, Heng Li, Fernando Racimo, Swapan Mallick, Joshua G. Schraiber, Flora Jay, Kay Prüfer, Cesare de Filippo, Peter H. Sudmant, Can Alkan, Qiaomei Fu, Ron Do, Nadin Rohland, Arti Tandon, Michael Siebauer, Richard E. Green, Katarzyna Bryc, Adrian W. Briggs, Udo Stenzel, Jesse Dabney, Jay Shendure, Jacob Kitzman, Michael F. Hammer, Michael V. Shunkov, Anatoli P. Derevianko, Nick Patterson, Aida M. Andrés, Evan E. Eichler, Montgomery Slatkin, David Reich, Janet Kelso, Svante Pääbo correspondence to: [email protected] This PDF file includes: Materials and Methods Figs. S1 to S38 Tables S1 to S43, S45 to S47, S51 to S53, S56 to S58 Other Supplementary Materials for this manuscript includes the following: 1224344s2.xls (Tables S44, S48, S49, S50, S54 and S55) 1 Materials and methods Table of Contents Note 1: Sampling and extraction of side fractions .......................................................................... 3 Note 2: Library preparation and sequencing ................................................................................... 4 Note 3: Comparison of the single-stranded and double-stranded library preparation methods ..... 9 Note 4: Processing and mapping raw sequence data from Denisova ........................................... 12 Note 5: Sequencing and processing of 11 present-day human genomes ...................................... 14 Note 6: Extended Variant Call Format files ................................................................................. 16 Note 7: Estimates of present-day human contamination in Denisova .......................................... 21 Note 8: Assessing genome sequence quality ................................................................................ 25 Note 9: Estimates of sequence divergence of Denisovans and present-day humans .................... 27 Note 10: Date of population divergence of modern humans from Denisovans and Neandertals . 32 Note 11: D-statistics and interbreeding between archaic and modern humans ............................ 36 Note 12: Relationship between Denisova and 11 present-day human genomes .......................... 47 Note 13: Segmental duplication and copy number variation analysis .......................................... 49 Note 14: Chromosome Two Fusion Site ....................................................................................... 53 Note 15: Estimating heterozygosity in Denisova and 11 present-day human genomes ............... 55 Note 16: The Denisovan genome lacks a signal of recent inbreeding .......................................... 58 Note 17: Inferred population size changes in the history of Denisovans ..................................... 59 Note 18: Less effective selection in Denisovans than modern humans due to a smaller population size ................................................................................................................................................ 61 Note 19: A complete catalog of features unique to the human genome ....................................... 64 Note 20: Catalog of features unique to the Denisovan genome .................................................... 73 Supplementary Figures ................................................................................................................. 75 Supplementary Tables ................................................................................................................. 101 2 Note 1: Sampling and extraction of side fractions Marie-Theres Gansauge, Qiaomei Fu and Matthias Meyer* * To whom correspondence should be addressed ([email protected]) In 2010, two extracts (E236 and E245), each with a volume of 100 µl, were prepared from 40mg of the Denisovan phalanx (2) (see Figure S1 for an overview of bone material usage). For initial screening of the bone, sequencing libraries were prepared without prior excision of deoxyuracils from the template strands. When analysis of captured mtDNA fragments revealed that the sample came from a previously unknown hominin (4) and that the DNA fragments contained patterns of nucleotide misincorporations typical of ancient DNA (9), fractions of the extracts were treated with uracil-DNA-glycosylase (UDG) and endonuclease VIII (29) to produce two libraries (SL3003 and SL3004) that were used to generate the nuclear DNA sequences that were analyzed and published earlier (2). We routinely freeze all relevant side fractions arising during DNA extraction. The silica-based DNA extraction process used (30) involves three steps: (i) sample lysis in an EDTA/proteinase K buffer, (ii) DNA binding to a silica suspension in the presence of chaotropic salts (iii) ethanol wash and elution in low-salt buffer. From each extraction, the following side fractions are kept: (i) any undissolved bone powder; (ii) binding supernatant; (iii) the silica pellet. The bone powder from the phalanx was completely dissolved. We used the binding supernatant and the silica pellet of E245 in an attempt to recover additional molecules. In a falcon tube, 30 µl of new silica suspension were added to 5 ml of ‘left-over’ binding supernatant. After rotating for three hours in the dark, the silica was pelleted by centrifugation and washed twice in washing buffer. DNA was eluted in 28 µl TE buffer and transferred to the old silica pellet for re-elution, thereby combining the molecules recovered from both side fractions (ii and iii) of E245. 3 Note 2: Library preparation and sequencing Marie-Theres Gansauge, Qiaomei Fu and Matthias Meyer* * To whom correspondence should be addressed ([email protected]) Method summary A schematic overview of the single-stranded library preparation method as well as the two double-stranded methods previously used for library preparation from ancient DNA (31, 32) is provided in Figure S2. In more detail, the single-stranded method involves the following steps: (i) Input DNA is treated with a mixture of uracil-DNA-glycosylase (UDG) and endonuclease VIII, which removes uracils from DNA strands and converts resulting abasic sites into single-nucleotide gaps. This treatment is optional. (ii) To this reaction, a heat-labile phosphatase is added, which removes both 5’- and 3’- phosphates from DNA ends if present. (iii) The double-stranded molecules are heat-denatured and again treated with phosphatase to complete dephosphorylation. (iv) After heat-inactivation of the phosphatase, adaptor oligonucleotides of ten bases, which carry 5’-phosphates and 3’-biotin-linker arms, are ligated in the presence of polyethylene glycol (PEG) to the 3’-ends of the input molecules using CircLigase IITM, an enzyme that preferentially circularizes single-stranded DNA (33), but when presented with donor and acceptor molecules with only one ligatable end, efficiently achieves end-to-end ligation (34). (v) Ligation products as well as excess adaptors are subsequently bound to streptavidin- coated magnetic beads. (vi) A 5’-tailed primer, which carries a priming site for later amplification, is hybridized to the adaptor oligonucleotide, and the original template strand is copied using Bst polymerase (large fragment). (vii) T4 DNA polymerase is used to remove 3’-overhangs generated during the primer extension reaction. (viii) T4 DNA ligase is used to ligate the 5’-phosphorylated end of a double stranded adaptor to the 3’-end of the newly copied strand. The 3’-end of the adaptor carries a dideoxy modification to prevent adaptor self-ligation. Between all enzymatic reactions, beads are intensely washed, also at elevated temperature, to remove extension primers hybridized to excess adaptor oligonucleotides. 4 (ix) Finally, the original template molecules and the library molecules, i.e. copies with short adaptor sequences attached to each end, are released by heat-destruction of the beads. Libraries were prepared from extracts E236 and E245 as well as from the molecules recovered from the side fractions of extraction E245 using the single-stranded method (Table S1). In addition, water controls were carried through the library preparation process. The number of molecules in each library was estimated by qPCR (35, 36). According to these estimates, the sample extracts produced at least one order of magnitude more molecules than the water controls, indicating that less than 10% of library molecules are derived from artifacts. Full-length adaptor sequences, carrying sample-specific sequence tags (indexes), were then added to both ends of the library molecules by amplification with 5’-tailed primers (11). For a more efficient use of sequencing capacity, these amplified libraries were separated on polyacrylamide gels, and the fractions of library molecules with inserts larger than ~40 bp were excised. Sequence data was produced from amplified libraries both with and without size-fractionation, using a protocol for double-index sequencing described elsewhere (35). A special sequencing primer was used for the first read, because the P5 adaptor sequence was truncated by five bases compared to the design described previously (11) to avoid interactions between the highly self-complementary P5 and P7 adaptors during single-strand library preparation. In addition to sequencing the libraries prepared with the new single-stranded method, deeper sequencing was performed for libraries prepared in the previous study with the double-stranded method (SL3003 and SL3004) (2). Method details Sequences