RNA Sequencing by Direct Tagmentation of RNA/DNA Hybrids
Total Page:16
File Type:pdf, Size:1020Kb
RNA sequencing by direct tagmentation of RNA/DNA hybrids Lin Dia,1, Yusi Fua,1,2, Yue Suna, Jie Lib,c, Lu Liub,c, Jiacheng Yaob,c, Guanbo Wangd,e, Yalei Wuf, Kaiqin Laof, Raymond W. Leef, Genhua Zhengf, Jun Xug, Juntaek Ohg, Dong Wangg, X. Sunney Xiea,3, Yanyi Huanga,e,h,i,j,3, and Jianbin Wangb,c,j,k,3 aBeijing Advanced Innovation Center for Genomics (ICG), Biomedical Pioneering Innovation Center (BIOPIC), School of Life Sciences, Peking University, 100871 Beijing, China; bSchool of Life Sciences, Tsinghua University, 100084 Beijing, China; cTsinghua-Peking Center for Life Sciences, Tsinghua University, 100084 Beijing, China; dSchool of Chemistry and Materials Science, Nanjing Normal University, 210046 Nanjing, China; eInstitute for Cell Analysis, Shenzhen Bay Laboratory, 518132 Shenzhen, China; fXGen US Co, South San Francisco, CA 94080; gDepartment of Cellular and Molecular Medicine, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA 92093; hCollege of Engineering, Peking University, 100871 Beijing, China; iCollege of Chemistry and Molecular Engineering, Peking University, 100871 Beijing, China; jChinese Institute for Brain Research (CIBR), 102206 Beijing, China; and kBeijing Advanced Innovation Center for Structural Biology, Tsinghua University, 100084 Beijing, China Edited by David A. Weitz, Harvard University, Cambridge, MA, and approved December 31, 2019 (received for review November 11, 2019) Transcriptome profiling by RNA sequencing (RNA-seq) has been type of approach is to barcode each cell’s transcripts and hence widely used to characterize cellular status, but it relies on second- bioinformatically assign identity to the sequencing data that is strand complementary DNA (cDNA) synthesis to generate initial linked to each cell and each molecule (20–23). However, the material for library preparation. Here we use bacterial transposase detectability and reproducibility of such approaches are still not Tn5, which has been increasingly used in various high-throughput ideal (24). An easy and versatile RNA-seq method is needed that DNA analyses, to construct RNA-seq libraries without second- works with input from single cells to bulk RNA. strand synthesis. We show that Tn5 transposome can randomly Bacterial transposase Tn5 (25) has been employed in next- bind RNA/DNA heteroduplexes and add sequencing adapters onto generation sequencing, taking advantage of the unique “tagmen- RNA directly after reverse transcription. This method, Sequencing tation” function of dimeric Tn5, which can cut double-stranded HEteRo RNA-DNA-hYbrid (SHERRY), is versatile and scalable. DNA (dsDNA) and ligate the resulting DNA ends to specific SHERRY accepts a wide range of starting materials, from bulk adaptors. Genetically engineered Tn5 is now widely used in se- APPLIED BIOLOGICAL SCIENCES RNA to single cells. SHERRY offers a greatly simplified protocol quencing library preparation for its rapid processivity and low and produces results with higher reproducibility and GC unifor- sample input requirement (26, 27). For general library prepara- mity compared with prevailing RNA-seq methods. tion, Tn5 directly reacts with naked dsDNA. This is followed by PCR amplification with sequencing adaptors. Such a simple single cell | RNA-seq | Tn5 transposase Significance ranscriptome profiling through RNA sequencing (RNA-seq) Thas become routine in biomedical research since the popu- RNA sequencing is widely used to measure gene expression in larization of next-generation sequencers and the dramatic fall in biomedical research; therefore, improvements in the simplicity the cost of sequencing. RNA-seq has been widely used in and accuracy of the technology are desirable. All existing RNA addressing various biological questions, from exploring the sequencing methods rely on the conversion of RNA into pathogenesis of disease (1, 2) to constructing transcriptome double-stranded DNA through reverse transcription followed maps for various species (3, 4). RNA-seq provides informative by second-strand synthesis. The latter step requires additional assessments of samples, especially when heterogeneity in a enzymes and purification, and introduces sequence-dependent complex biological system (5, 6) or time-dependent dynamic bias. Here, we show that Tn5 transposase, which randomly – processes are being investigated (7 9). A typical RNA-seq ex- binds and cuts double-stranded DNA, can directly fragment periment requires a DNA library generated from messenger and prime the RNA/DNA heteroduplexes generated by reverse RNA (mRNA) transcripts. The commonly used protocols con- transcription. The primed fragments are then subject to PCR tain a few key steps, including RNA extraction, poly-A selection amplification. This provides an approach for simple and accu- or ribosomal RNA depletion, reverse transcription, second- rate RNA characterization and quantification. strand complementary DNA (cDNA) synthesis, adapter liga- tion, and PCR amplification (10–12). Author contributions: Y.F., K.L., X.S.X., Y.H., and J.W. designed research; L.D., Y.F., Y.S., Although many experimental protocols, combining chemistry J.L., L.L., G.W., J.X., J.O., and D.W. performed research; L.D., Y.F., L.L., J.Y., Y.W., R.W.L., and processes, have recently been invented, RNA-seq is still a G.Z., Y.H., and J.W. analyzed data; and L.D., X.S.X., Y.H., and J.W. wrote the paper. challenging technology to apply. On one hand, most of these Competing interest statement: XGen US Co. has applied for a patent related to this work. – X.S.X., K.L., and Y.W. are shareholders of XGen US Co. K.L., Y.W., R.W.L., and G.Z. are protocols are designed for conventional bulk samples (11, 13 15), employees of XGen US Co. which typically contain millions of cells or more. However, many This article is a PNAS Direct Submission. cutting-edge studies require transcriptome analyses of very small This open access article is distributed under Creative Commons Attribution-NonCommercial- amounts of input RNA, for which most large-input protocols do NoDerivatives License 4.0 (CC BY-NC-ND). not work. The main reason for this incompatibility is because the Data deposition: The sequence reported in this paper has been deposited in the Genome purification operations needed between the main experimental Sequence Archive database, http://gsa.big.ac.cn/ (accession no. CRA002081). steps cause inevitable loss of the nucleic acid molecules. 1L.D. and Y.F. contributed equally to this work. On the other hand, many single-cell RNA-seq protocols have 2Present address: Department of Molecular and Human Genetics, Baylor College of Med- been invented in the past decade (16–19). However, for most of icine, Houston, TX 77030. these protocols it is difficult to achieve both high throughput and 3To whom correspondence may be addressed. Email: [email protected], yanyi@pku. high detectability. One type of single-cell RNA-seq approach, edu.cn, or [email protected]. such as Smart-seq2 (17), is to introduce preamplification to ad- This article contains supporting information online at https://www.pnas.org/lookup/suppl/ dress the low-input problem, but such an approach is likely to doi:10.1073/pnas.1919800117/-/DCSupplemental. introduce bias and to impair quantification accuracy. Another www.pnas.org/cgi/doi/10.1073/pnas.1919800117 PNAS Latest Articles | 1of8 Downloaded by guest on October 2, 2021 one-step tagmentation reaction has greatly simplified the experi- Results mental process, shortening the workflow time and lowering costs. RNA-Seq Strategy Using RNA/DNA Hybrid Tagmentation. Because of Tn5 tagmentation has also been used for the detection of chro- its nucleotidyl transfer activity, transposase Tn5 has been widely matin accessibility, high-accuracy single-cell whole-genome se- used in recently developed DNA sequencing technologies. Pre- quencing, and chromatin interaction studies (28–32). For RNA-seq, vious studies (36, 37) have identified a catalytic site within its the RNA transcripts have to undergo reverse transcription and DDE domain (SI Appendix, Fig. S1A). Indeed, when we mutated second-strand synthesis, before the Tn5 tagmentation of the one of the key residues (D188E) (38) in pTXB1 Tn5, its frag- resulting dsDNA (33, 34). mentation activity on dsDNA was notably impaired (SI Appendix, In this paper we present a RNA-seq method using Tn5 Fig. S1 A and B). Increased amounts of the mutated enzyme transposase to directly tagment RNA/DNA hybrids to form an showed no improvement in tagmentation, verifying the impor- amplifiable library. We experimentally show that, as an RNase H tant role of the DDE domain in Tn5 tagmentation. Tn5 is a superfamily member (35), Tn5 binds to RNA/DNA heteroduplex member of the ribonuclease H-like (RNHL) superfamily, to- similarly as to dsDNA and effectively fragments and then ligates gether with RNase H and Moloney Murine Leukemia Virus the specific amplification and sequencing adaptor onto the hy- (MMLV) reverse transcriptase (35, 39, 40); therefore, we hy- pothesized that Tn5 is capable of processing not only dsDNA but brid. This method, named Sequencing HEteRo RNA-DNA-hYbrid also RNA/DNA heteroduplex. Sequence alignment between (SHERRY), greatly improves the robustness of low-input RNA-seq these three proteins revealed a conserved domain (two Asps and with a simplified experimental procedure. We also show that one Glu) within their active sites, termed the RNase H-like SHERRY works with various amounts of input sample, from single domain (Fig. 1A). The two Asp residues (D97 and D188) in cells to bulk RNA, with a dynamic range spanning six orders of the Tn5 catalytic core were structurally similar to those of the magnitude. SHERRY shows superior cross-sample robustness and other two enzymes (Fig. 1B). Moreover, divalent ions, which are comparable detectability for both bulk RNA and single cells com- important for stabilizing substrate and catalyzing reactions, also pared with other commonly used methods and provides a unique occupy similar positions in all three proteins (Fig. 1B) (39). We solution for small bulk samples that existing approaches struggle to determined the nucleic acid substrate binding pocket of Tn5 handle.