Accurate SNV Detection in Single Cells by Transposon-Based Whole-Genome Amplification of Complementary Strands

Accurate SNV detection in single cells by transposon-based whole-genome amplification of complementary strands Dong Xinga,1,2,3, Longzhi Tana,1,4, Chi-Han Changa, Heng Lib,c,5, and X. Sunney Xied,e,5 aDepartment of Chemistry and Chemical Biology, Harvard University, Cambridge, MA 02138; bDepartment of Data Sciences, Dana-Farber Cancer Institute, Boston, MA 02215; cDepartment of Biomedical Informatics, Harvard Medical School, Boston, MA 02215; dInnovation Center for Genomics, Peking University, 100871 Beijing, China; and eBiomedical Pioneering Innovation Center, Peking University, 100871 Beijing, China Contributed by X. Sunney Xie, November 29, 2020 (sent for review July 8, 2020; reviewed by Peter A. Sims and Christopher A. Walsh) Single-nucleotide variants (SNVs), pertinent to aging and disease, Results occur sporadically in the human genome, hence necessitating META-CS for SNV Identification. META-CS is based on our previ- single-cell measurements. However, detection of single-cell SNVs ously developed WGA method, multiplexed end-tagging ampli- suffers from false positives (FPs) due to intracellular single- fication (8). Briefly, genomic DNA from single-cell lysates was stranded DNA damage and the process of whole-genome amplifi- fragmented by Tn5 transposition, and each fragment was ran- cation (WGA). Here, we report a single-cell WGA method termed domly tagged with 2 out of 16 unique transposon sequences multiplexed end-tagging amplification of complementary strands which served as priming sites in the following reactions (Fig. 1B). (META-CS), which eliminates nearly all FPs by virtue of DNA com- The use of multiple transposon sequences provided fragment- plementarity, and achieved the highest accuracy thus far. We val- specific barcodes and reduced the amplification loss associated idated META-CS by sequencing kindred cells and human sperm, with the intramolecular hairpin structure that may form when a and applied it to other human tissues. Investigation of mature fragment is tagged by the same sequence on both ends (SI Ap- single human neurons revealed increasing SNVs with age and po- pendix, Fig. S1). Next, DNA fragments were melted by heating, tentially unrepaired strand-specific oxidative guanine damage. We and the two single strands were preamplified by two sequential determined SNV frequencies along the genome in differentiated polymerase extension reactions to obtain strand-specific labeling GENETICS single human blood cells, and identified cell type-dependent mu- (Fig. 1C). Excess primers were removed by exonuclease I after tational patterns for major types of lymphocytes. single-cell sequencing | mutations | false positives | Tn5 transposition | Significance complementary DNA strands The boom of single-cell sequencing technologies in the past enome-wide determination of single-nucleotide variants (SNVs) decade has profoundly expanded our understanding of fun- Gin single cells has been challenging due to false positives damental biology. Today, tens of thousands of cells can be (FPs) from two sources. First, polymerases used for amplification measured by single-cell RNA-seq in one experiment. However, make errors. The error rates of base substitution during in vitro single-cell DNA-sequencing studies have been limited by false − DNA synthesis for most DNA polymerases range from 10 4 to positives and cost. Here we report META-CS, a single-cell − 10 6 (1), indicating that thousands of FPs can be generated in the whole-genome amplification method that takes advantage of first cycle of amplification for a human genome of 6 billion bp. the complementary strands of double-stranded DNA to filter Second, the harsh conditions of cell lysis and amplification cause out false positives and reduce sequencing cost. META-CS DNA damage, which, together with damage that occurred natu- achieved the highest accuracy in terms of detecting single- rally in the live cell (e.g., deamination and oxidative damage), can nucleotide variations, and provided potential solutions for the identification of other genomic variants, such as insertions, be misrecognized by DNA polymerases and turned into errors. deletions, and structural variations in single cells. For example, it has been reported that deamination of cytosine resulted in the artifact of C>T transitional mutation in single-cell Author contributions: D.X., L.T., C.-H.C., and X.S.X. designed research; D.X. and L.T. per- multiple displacement amplification (MDA) (2). These FPs usu- formed research; D.X., C.-H.C., and H.L. contributed new reagents/analytic tools; D.X., L.T., ally significantly outnumbered naturally occurring SNVs, posing a and H.L. analyzed data; and D.X., L.T., H.L., and X.S.X. wrote the paper. limitation on single-cell genomics. Reviewers: P.A.S., Columbia University Medical Center; and C.A.W., Boston Children’s While a true SNV has to be on the same position of both Hospital . strands, both polymerase errors and DNA damage occur only on Competing interest statement: D.X., L.T., C.-H.C., and X.S.X. are investors on Patent one strand of DNA. Therefore, FPs can be filtered out through WO2018217912A1 filed by the president and Fellows of Harvard College that checking the complementarity of the two strands after se- covers META-CS. quencing (Fig. 1A). Current methods to achieve this, involving This open access article is distributed under Creative Commons Attribution-NonCommercial- NoDerivatives License 4.0 (CC BY-NC-ND). ligation of strand-specific adapters (3, 4) or physical separation 1 of the two strands (5), are labor-intensive and suffer from sig- D.X. and L.T. contributed equally to this work. 2 nificant sample loss of single cells. In silico SNV-calling algo- Present address: Innovation Center for Genomics, Peking University, 100871 Beijing, China. rithms, on the other hand, rely on heterozygous single-nucleotide 3Present address: Biomedical Pioneering Innovation Center, Peking University, 100871 polymorphisms and the assumption that single strands are am- Beijing, China. plified uniformly, which still leads to FPs when one of the strands 4Present address: Department of Bioengineering, Stanford University, Stanford, fails to be amplified (6, 7). Here, we report a whole-genome CA 94305. amplification (WGA) method termed multiplexed end-tagging 5To whom correspondence may be addressed. Email: [email protected] or amplification of complementary strands (META-CS), which is [email protected]. able to separately label and amplify the two strands of DNA in a This article contains supporting information online at https://www.pnas.org/lookup/suppl/ one-tube reaction and accurately identify de novo SNVs from a doi:10.1073/pnas.2013106118/-/DCSupplemental. single cell. Published February 15, 2021. PNAS 2021 Vol. 118 No. 8 e2013106118 https://doi.org/10.1073/pnas.2013106118 | 1of8 Downloaded by guest on September 25, 2021 A Double-stranded DNA the amplification product. We observed four discordant strands Denature out of 173,635 reads, corresponding to a strand conversion rate of −5 Forward strand Reverse strand 2.3 × 10 . This suggested a high specificity of the strand-labeling strategy, although different templates and other combinations of Amplification Amplification META primers might result in different strand conversion rates. Characterization of Detection Accuracy. To evaluate the accuracy of META-CS, we clonally expanded a single ancestor cell from a human haploid cell line (eHAP) for 5 d to a few hundred cells A Complementarity check (Fig. 2 ). Ten single cells were picked by mouth pipetting and successfully amplified by META-CS (SI Appendix, Fig. S3). The rest of the cells were used for a second expansion of another 8 d True mutation DNA damage Polymerase error Damage induced error to millions of cells for bulk DNA extraction to represent the ancestor cell’s genome. Both single-cell and bulk samples were mapped to the human reference genome and de novo SNVs B were identified in single cells compared with the ancestor cell. We first determined the minimum number of sequencing reads required to confidently identify a SNV. We used the letter Tn5 transposase Transposon DNA (n=16) Transposome “a” (for allelic depth) to represent the minimum number of total reads, and the letter “s” (for stranded allelic depth) to represent Transposition the minimum number of strand-specific reads. To filter out se- 5’ quencing errors, we started with a total number of no less than Genomic DNA 5’ four reads, requiring at least two reads from each strand (a4s2). Tn5 removal and gap-fillin We then examined the robustness of our calling with respect to the thresholds. Compared with FPs that occur on only one strand C of DNA, true positives existing on both strands of DNA gener- Denature ally have higher allelic frequencies after amplification, and are Forward strand Reverse strand thus less sensitive to the changing of threshold. Increasing the 5’ threshold of the strand filter from a4s2 to a8s4, we found that the 5’ META-R1 priming number of SNVs detected from the eHAP single cells only de- ∼ SI Appendix 5’ R1 5’ creased 2-fold (95/48) ( , Fig. S4). In comparison, 5’ R1 5’ for callings without the strand filter, increasing the threshold Extension from a4s0 to a8s0 led to an ∼6.4-fold decrease (11,153/1,754). 5’ R1 5’ Notably, after adjustment for detection efficiency, the number of 5’ R1 5’ META-R2 priming SNVs remained almost unchanged for callings with the strand 5’ R1 5’ R2 filter but decreased continuously for callings without the strand R2 5’ R1 5’ filter (Fig. 2B). This suggested that META-CS was able to Extension measure mutation rates with as few as four sequencing reads. 5’ R1 5’ R2 In day-5 eHAP single cells, META-CS achieved an average R2 5’ R1 5’ (±SD) of 50.9 ± 12.2% (n = 10) genome coverage and detected PCR amplification ± 5’ R1 5’ R2 9.5 5.9 autosomal de novo SNVs per cell, which corresponded to 70 ± 25 SNVs after correction with a strand-specific detection efficiency of 15.9 ± 6.1% (Fig.

Accurate SNV Detection in Single Cells by Transposon-Based Whole-Genome Amplification of Complementary Strands

Specific Binding of Transposase to Terminal Inverted Repeats Of

A Sleeping Beauty Mutagenesis Screen Reveals a Tumor Suppressor Role for Ncoa2/Src-2 in Liver Cancer

RNA As a Source of Transposase for Sleeping Beauty-Mediated Gene Insertion and Expression in Somatic Cells and Tissues

Licensing RNA-Guided Genome Defense

Exploring the Interplay of Telomerase Reverse Transcriptase and Β-Catenin in Hepatocellular Carcinoma

A Survey of Transposable Element Classification Systems Â

Characterization of the First Cultured Representative of Verrucomicrobia Subdivision 5 Indicates the Proposal of a Novel Phylum

Molecular Architecture of a Eukaryotic DNA Transposase

The Molecular Coupling Between Substrate Recognition and ATP Turnover in A

Measuring Error Rates of High-Fidelity DNA Polymerases by NGS Determining Fidelity of Phusion Plus and Phusion Hot-Start II DNA Polymerases

Adaptive Laboratory Evolution of Corynebacterium Glutamicum

DNA Transposons and the Evolution of Eukaryotic Genomes