SUPPLEMENTARY INFORMATION Supplementary Methods
Fosmid library construction

High molecular weight genomic DNA (gDNA) was extracted from 5 ml EDTA blood of MP1 (Stratagene, 200600). 16 µg of gDNA was mechanically sheared in disposable 1 ml syringes with 23-gauge, 1¼-inch needles in 800 µl TE for 50 seconds. Sheared DNA was end-repaired as per protocol (Epicentre, ER81050) and separated by pulsed-field gel electrophoresis (Bio-Rad CHEF-DR II) at 6 V/cm with an initial sweep time of 0.5 s (final sweep time 2.0 s) for 20-22 hours in a 1% low-melting agarose gel (Sigma, A9414). For size selection, the gel was stained with SYBR Gold (Invitrogen, S11494) and DNA bands were visualized on a Dark Reader, avoiding the removal and realignment of the marker lane that ethidium bromide staining requires. A band of 30-38 kb was excised, the gel slice was digested with GELase (Epicentre, G09050), and the DNA was recovered by sodium acetate precipitation.

Purified blunt-end DNA was ligated to the linearized EpiFOS-5 vector (Epicentre, FOS0901) using 250 ng of input DNA and a vector-to-insert molar ratio of 10:1, according to the manufacturer’s protocols. The ligated DNA was packaged into phage particles (Epicentre, MaxPlax Lambda packaging extract) and the reaction was filled to 1 ml with Phage Dilution Buffer (PDB) as per protocol.

A single colony of EPI100TR cells was expanded in 50 ml LB with 10 mM MgSO4 and 12.5 µg/ml chloramphenicol. The titer of packaged phages was determined by transfecting 100 µl EPI100 cells with 10 µl of phage particles from a 10⁻² and a 10⁻³ dilution. Transfected E. coli cells were then plated onto agar plates and, after overnight incubation, the colonies were counted to calculate the titer. Based on this titer, the required amount of packaged phage solution was filled to 3 ml and used for mass transfection of 27 ml EPI100 cells, which were subsequently grown to OD600 = 0.99 in a 37°C shaking incubator for 20 minutes.
This amount of mass transfection was estimated to generate a fosmid library with a complexity of ~1.44 × 10⁶ clones. To create fosmid pools of ~5,000 clones each, 100 µl aliquots of mass-transfected cells were dispensed into the wells of three 96-well deep-well plates, each well containing 1.5 ml LB with 10 mM MgSO4 and 12.5 µg/ml chloramphenicol. For quality control, 2.5 µl and 5.0 µl of the mass transfection reaction from randomly selected wells were plated and incubated at 37°C overnight. These controls yielded 145 and 310 colony forming units (cfu), respectively, which translated to a content of ~5,000 fosmids per 100 µl aliquot, or pool. After sealing the three deep-well plates with a breathable foil (Greiner, 676051), fosmid pools were incubated in a shaking incubator at 200 rpm and 37°C for 20 hours. In total, 288 pools were generated and stored as glycerol-bacterial stocks to ensure long-term availability of the library.

To facilitate efficient processing of the fosmid pools in subsequent steps, we combined the pools in sets of three into super-pools of ~15,000 fosmids each, yielding a single 96-well plate.

Fosmid library complexity and evenness of genome representation were checked by PCR of 10 different chromosomal loci on all pools. The numbers of positive PCR signals were in good agreement with prior expectations of genome coverage. Further, restriction digestion with NotI (NEB) of a selected number of fosmid pools showed a distinct band of 8 kb (vector backbone) and a smear of digested insert DNA.

For fosmid DNA isolation, approximately 10 µl of frozen glycerol-bacterial stock per clone pool was scraped into 1 ml LB medium, plated onto 22 cm × 22 cm LB agar and incubated at 37°C overnight. Fosmid clones were collected and circular fosmid DNA was isolated using the Large Construct Kit (Qiagen, 12163) as per standard protocol, including an exonuclease digestion step to minimize the chromosomal E. coli DNA carried over into subsequent NGS.
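The pool arithmetic described above can be checked with a short calculation. This is a minimal sketch, not part of the original protocol: the function name `cfu_per_aliquot` is hypothetical, and the extrapolation assumes that the plated 2.5 µl and 5.0 µl volumes were taken undiluted from the same suspension as the 100 µl aliquots, which the text does not state explicitly.

```python
# Figures taken from the text above.
POOLS = 288                  # three 96-well deep-well plates
CLONES_PER_POOL = 5_000      # approximate target per 100 µl aliquot
POOLS_PER_SUPER_POOL = 3


def cfu_per_aliquot(cfu, plated_ul, aliquot_ul=100.0):
    """Linearly scale a plate count to a full aliquot.

    Assumption: the plated volume is undiluted and representative
    of the aliquot it is being scaled to.
    """
    return cfu / plated_ul * aliquot_ul


library_complexity = POOLS * CLONES_PER_POOL                 # total clones
super_pools = POOLS // POOLS_PER_SUPER_POOL                  # wells on the final plate
clones_per_super_pool = POOLS_PER_SUPER_POOL * CLONES_PER_POOL

# Quality-control plates: 145 cfu from 2.5 µl and 310 cfu from 5.0 µl.
qc_estimates = [cfu_per_aliquot(145, 2.5), cfu_per_aliquot(310, 5.0)]

print(f"library complexity:    {library_complexity:,} clones")
print(f"super-pools:           {super_pools}")
print(f"clones per super-pool: {clones_per_super_pool:,}")
print("QC estimates per 100 µl:", [round(x) for x in qc_estimates])
```

The first three values reproduce the ~1.44 × 10⁶ clones, 96 super-pools and ~15,000 fosmids per super-pool given in the text. Under the stated assumption, the two quality-control plates extrapolate to a few thousand clones per aliquot, the same order as the ~5,000 quoted.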
SOLiD sequencing library production

SOLiD sequencing libraries were prepared for 67 unique fosmid super-pools. For each pool used in single-fragment (50 bp) or paired-end (50 bp + 25 bp) sequencing, ~3 µg of isolated fosmid pool DNA was sheared, a band of 100-150 bp was size-selected, and the DNA was ligated to SOLiD adaptor sequences. We slightly modified the P2 adaptor, which contained a unique barcode identifier, to increase the ligation efficiency and allow for a second size selection of ligated fragments between 150 and 200 bp.

Mate-pair libraries were prepared for varying insert lengths of up to 10 kb. After shearing (Hydroshear) the fosmid DNA to the targeted insert size (800 bp, 1.5 kb, 2.5 kb, 3.5 kb, 8 kb and 10 kb), size-selected DNA was ligated to CAP adaptors, which allowed the two ends of the insert to be joined and resulted in circularization of the fosmid DNA. After enzymatic EcoP15I digestion of the circularized molecules, only the ends (25 bp each) of the insert DNA remained as tags for mate-pair sequencing. The update to SOLiD V3 allowed sequencing of 2 × 50 bp tags; accordingly, longer mate-pair tags were generated by nick translation followed by T7 exonuclease and S1 nuclease digestion as per protocol.

Barcoded SOLiD sequencing libraries were subjected to limited PCR amplification (<5 cycles) with a Hi-Fidelity PCR Supermix (Invitrogen, 12531-016). The corresponding PCR band (250-275 bp) was excised from a 4% NuSieve agarose gel (Lonza, 50094), purified with the QIAquick gel extraction kit (Qiagen, 28706) and quantified with the Qubit system (Invitrogen). Up to 16 barcoded super-pool libraries were then multiplexed for subsequent emulsion PCR, which resulted in monoclonal amplification of sequencing library molecules onto magnetic beads.
After butanol breakage of the emulsion, templated magnetic beads were recovered, denatured to single-stranded molecules, enriched with P2-covered polystyrene beads, and prepared for deposition by modifying the 3′ ends with terminal transferase according to SOLiD protocols. This allowed sequencing of up to 16 barcoded super-pools on one slide in a single flow-cell of the SOLiD system. Multiplexing of up to 8 mate-pair libraries was enabled by depositing the beads into eight physically separated chambers on a single flow-cell. Approximately 80% of all sequencing reads were unambiguously assigned to unique barcodes, allowing separation of multiplexed sequencing reads into their original fosmid pool.

SOLiD sequencing

Sequencing-by-ligation (SOLiD) was carried out on 67 unique fosmid super-pools (15,000 fosmids per pool) and, of these, mate-pair sequencing (2 × 50 bp and 2 × 25 bp) was performed on 16 super-pools. Paired-end sequencing (50 + 25 bp) was performed for a subset of 16 fosmid pools. In addition, genomic DNA of MP1 was sequenced to a total genome coverage of 30x using mate-pair (insert sizes 1.5 kb and 2.0 kb) and paired-end sequencing. The quality of the generated templated beads (on-axis, P2:P1 ratio, satay plot) was checked by workflow analysis (WFA) prior to full sequencing of up to 650 million templated beads per flow-cell (SOLiD V3+/V4). Sequencing was initially performed on the SOLiD V2 system, which generated 2-2.5 Gb of mappable data per flow-cell; throughput has since increased substantially with upgraded versions of the SOLiD system. With the current SOLiD V4, a 10- to 20-fold larger amount of mappable data (25-50 Gb) can be generated with one flow-cell.

Fosmid detection

The first step towards haplotyping is to identify the regions of the genome where fosmids are present in the sample. This was done by examining the coverage pile-up of mapped reads for each pool, generated using SAMtools (Li et al. 2009).
The expected coverage of each pool was calculated as the ratio between the total mapped bases and the expected amount of DNA present in a pool. For each pool, three steps were implemented: (i) pile up the reads; (ii) perform an allele call for each covered SNP; (iii) identify covered regions of suitable length as fosmids. The first two steps are performed in a single traversal of the mapped reads. To pile up the reads, we divided the genome into bins of 1 kb in length and assigned each mapped read to the appropriate bin. For each possible allele at each SNP locus, the number of bases called was counted. After processing the mapped reads, we performed consolidated allele calls for the pool by calling the majority allele at each covered SNP. This procedure assumes that all bases differing from the majority base are sequencing errors, which is reasonable given the haploid nature of the fosmids inside a pool. An allele was left uncalled if the count for the majority base was less than one fifth of the average coverage.

Heterozygous SNPs, indicating a collision within a single pool of two complementary fosmids from both copies of a chromosome, were defined by the following two conditions: (i) the number of allele calls for the alternative allele is more than half the average coverage of the pool; (ii) the number of allele calls for the alternative allele is more than half the number of calls for the majority base. When a heterozygous SNP was called, the bin was tagged and excluded from the subsequent determination of fosmid contigs and their genomic coordinates. Adjacent bins above a specified coverage threshold (> 1/5 of the average coverage) and validated to be haploid were selected for analysis.
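The three steps above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the representation of reads as `(start, sequence)` tuples on a single chromosome, the assignment of each read to a bin by its start coordinate, the definition of bin coverage as bases per bin divided by bin length, and the `min_len` parameter (the text says only "suitable length") are all hypothetical choices made here.

```python
from collections import Counter, defaultdict

BIN_SIZE = 1000  # 1 kb bins, as in the text


def expected_pool_coverage(total_mapped_bases, expected_pool_dna_bp):
    """Expected coverage: total mapped bases / expected DNA in the pool."""
    return total_mapped_bases / expected_pool_dna_bp


def call_snp(counts, avg_cov):
    """Return (majority_base_or_None, is_heterozygous) for one SNP."""
    ranked = counts.most_common(2)
    major_base, major_n = ranked[0]
    alt_n = ranked[1][1] if len(ranked) > 1 else 0
    if major_n < avg_cov / 5:        # too little support: leave uncalled
        return None, False
    # Both conditions from the text must hold for a heterozygous call.
    is_het = alt_n > avg_cov / 2 and alt_n > major_n / 2
    return major_base, is_het


def detect_fosmids(reads, snp_positions, avg_cov, min_len=BIN_SIZE):
    """reads: iterable of (start, sequence), 0-based, one chromosome.
    snp_positions: set of 0-based SNP coordinates.
    min_len: hypothetical minimum region length (placeholder value)."""
    bin_bases = defaultdict(int)
    snp_counts = defaultdict(Counter)
    for start, seq in reads:         # single traversal: pile-up and counts
        bin_bases[start // BIN_SIZE] += len(seq)   # assumption: bin by start
        for offset, base in enumerate(seq):
            if start + offset in snp_positions:
                snp_counts[start + offset][base] += 1

    calls, het_bins = {}, set()
    for pos, counts in snp_counts.items():
        base, is_het = call_snp(counts, avg_cov)
        if is_het:
            het_bins.add(pos // BIN_SIZE)          # tag and exclude the bin
        elif base is not None:
            calls[pos] = base

    good = sorted(b for b, n in bin_bases.items()
                  if n / BIN_SIZE > avg_cov / 5 and b not in het_bins)

    regions = []                     # merge runs of adjacent passing bins
    for b in good:
        if regions and regions[-1][1] == b * BIN_SIZE:
            regions[-1][1] = (b + 1) * BIN_SIZE
        else:
            regions.append([b * BIN_SIZE, (b + 1) * BIN_SIZE])
    fosmids = [(s, e) for s, e in regions if e - s >= min_len]
    return fosmids, calls, het_bins
```

Because a tagged bin is dropped before merging, a heterozygous SNP splits what would otherwise be one contiguous region into two, which is one possible reading of "excluded from the subsequent determination of fosmid contigs". In practice `min_len` would be set closer to the expected fosmid insert size than to a single bin.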