Nanopore Sequencing of Single-Cell Transcriptomes with Sccolor-Seq

Total Page:16

File Type:pdf, Size:1020Kb

Nanopore Sequencing of Single-Cell Transcriptomes with Sccolor-Seq BRIEF COMMUNICATION https://doi.org/10.1038/s41587-021-00965-w Nanopore sequencing of single-cell transcriptomes with scCOLOR-seq Martin Philpott1,2, Jonathan Watson3, Anjan Thakurta2,4,5, Tom Brown Jr 3, Tom Brown Sr 3,6, Udo Oppermann 1,2,7 ✉ and Adam P. Cribbs 1,2 ✉ Here we describe single-cell corrected long-read sequencing sequencing error rate of up to 10% (Fig. 1c). In single-cell sequenc- (scCOLOR-seq), which enables error correction of barcode ing, UMIs allow the deduplication of single transcripts that are and unique molecular identifier oligonucleotide sequences detected multiple times in sequencing, arising from PCR copies pro- and permits standalone cDNA nanopore sequencing of single duced during library preparation. The directional network-based cells. Barcodes and unique molecular identifiers are synthe- method first proposed by UMI-tools6 was modified to correct for sized using dimeric nucleotide building blocks that allow error UMI sequence duplication (Fig. 1d and Supplementary Fig. 3). detection. We illustrate the use of the method for evaluating Using simulated data, we show that the dimer-synthesized UMIs barcode assignment accuracy, differential isoform usage in can be used to deduplicate UMIs even with a sequencing error rate myeloma cell lines, and fusion transcript detection in a sar- above 10% (Fig. 1e,f and Supplementary Fig. 4). coma cell line. scCOLOR-seq was validated using human HEK293T and mouse Long-read sequencing technologies such as PacBio 3T3 single-cell Drop-seq libraries from approximately 1,200 cells (at single-molecule real-time (SMRT) sequencing1 or Oxford Nanopore a 50:50 mouse:human cell ratio), followed by Illumina short-read sequencing2 enable the sequencing of full-length transcripts. The sequencing. Overall, 68% of all reads show complete dinucleotide application of PacBio SMRT to single-cell sequencing has been block complementarity across the full barcode sequence. This hindered by a low sequencing capacity (four million reads per flow suggests that the theoretical base-calling accuracy for the bar- cell), which means that a single run can report on only 40–133 cells code should be 98.4%, which aligns with the reported accuracy of at a comparable read depth to short-read approaches, or on thou- Illumina sequencing. The dimer-correction approach was evaluated sands of cells by sacrificing quantitative information3. Zeng et al.4 by measuring the proportion of human, mouse and mixed spe- have improved the platform for single-cell sequencing workflows by cies cells identified following increased edit distances (that is, the concatenating multiple full-length cDNAs into a single insert. This Levenshtein distance) between the error-sequenced and the accu- approach led to a return of 10 million reads from a single SMRT cell, rately sequenced barcodes (Supplementary Fig. 5). An edit distance providing eight times more data output than the standard PacBio of 4 was found to result in accurate assignment of both mouse and sequencing protocol. Alternatively, nanopore sequencing provides human reads (Supplementary Fig. 6), enabling the recovery of an up to 250 million reads per PromethION flow cell, but its main draw- extra 8% of total reads. Although further reads could be recov- back is its high error rate compared with both PacBio long-read and ered using an edit distance of 5, this was obtained at the expense of Illumina short-read sequencing (5–15% compared with less than increased numbers of mixed species cells (Supplementary Fig. 5i). 1%)5. To overcome the low base-calling accuracy of Oxford Nanopore Application of scCOLOR-seq to nanopore sequencing identified sequencing, here we describe single-cell corrected long-read the presence of a poly(A) sequence in 40% (range, 24–62%) of all sequencing (scCOLOR-seq), in which the barcode and unique nanopore sequencing reads and detected 12.9% (range, 9–15%) of molecular identifier (UMI) regions of the oligonucleotide-barcoded these reads with dual nucleotide complementarity across the full RNA-capture microbeads are synthesized using homodimeric nucle- barcode sequence (Supplementary Fig. 7). This suggests that the oside phosphoramidite building blocks (Supplementary Fig. 1), theoretical base-calling accuracy of single-cell nanopore sequenc- which provides a means for sequencing-error detection and correc- ing should be 91.8%. However, barcodes were observed to often tion of the barcode and UMI (Fig. 1a). contain more than one error per barcode, which has the effect of To correctly assign barcodes to cells, a computational strategy reducing the overall measurable base-calling accuracy to 86% was developed in which barcodes were identified in a two-pass (Supplementary Fig. 8). Naive collapsing of barcodes and UMIs assignment method (Fig. 1b and Supplementary Fig. 2). First, bar- sequenced into single-base sequences without error correction led codes without errors were identified on the basis of nucleotide pair to only 81 recovered cells (Supplementary Fig. 9). The dimer cor- complementarity across the full length of the barcode. Next, these rection approach using an edit distance of 6 led to the recovery of accurate barcodes were used as a guide to correct the remaining 54% (range, 43%–68%) of barcodes containing sequencing errors erroneous barcodes. Using simulated data, we show that this strat- (Fig. 2a,b and Supplementary Fig. 7c). Increasing the edit distance to egy is capable of correcting erroneous barcodes with a high sequenc- 7 increased recovery to 82% (range, 79.8%–83.6%), at the expense of ing error rate, with 96% of barcodes recovered with a barcode slightly increased numbers of mixed cells (Supplementary Fig. 10). 1Botnar Research Centre, Nuffield Department of Orthopedics, Rheumatology and Musculoskeletal Sciences, National Institute of Health Research Oxford Biomedical Research Unit (BRU), University of Oxford, Oxford, UK. 2Oxford Centre for Translational Myeloma Research University of Oxford, Oxford, UK. 3ATDBio, Oxford, UK. 4Radcliffe Department of Medicine, Oxford University, Oxford, UK. 5Translational Medicine, Bristol Myers Squibb, Summit, NJ, USA. 6Chemistry Research Laboratory, Department of Chemistry, University of Oxford, Oxford, UK. 7Centre for Medicines Discovery, University of Oxford, Oxford, UK. ✉e-mail: [email protected]; [email protected] NaturE BIotECHnoloGY | www.nature.com/naturebiotechnology BRIEF COMMUNICATION NATURE BIOTECHNOLOGY a Barcode and UMI synthesized using homodimeric blocks of reverse phosphoramidites Mononucleotide Block nucleotide PCR handle Barcode UMI Poly(A) site Bead Original barcode and UMI: Sequenced read: Error Error b c Barcode blocks 100 Barcode have a mismatch 75 Guide 50 Barcode blocks agree across the 25 full length Cell barcodes correctly Barcodes recovered (%) 0 assigned –3 –2 –1 log10-transformed sequencing error rate d e f UMI UMI block mismatch. 0.20 UMI split and collapsed 7,500 into single nucleotides 0.15 5,000 UMI nucleotide 0.10 blocks agree Difference 2,500 across the full length. Directional 0.05 UMI collapsed into Coefficient of variation UMI-tools count (corrected versus truth) 0 single correction nucleotides –3 –2 –1 –3 –2 –1 log10-transformed log10-transformed sequencing error rate sequencing error rate No correction Directional Directional paired Fig. 1 | Developing a strategy to error-correct barcode and UMI sequences from droplet-based sequencing. a, Schematic bead and oligonucleotide structure using dimer blocks of nucleotides for Buc-seq. b, Cell barcode-assignment strategy. c, UMI deduplication strategy. d, Simulated data showing the number of barcodes recovered with increasing simulated sequencing error rates. e,f, Simulated data showing the difference and coefficient of variation between the deduplicated UMIs and the ground truth. Correction of the UMI counts was performed using a basic directional network-based approach after accounting for sequencing errors within homodimeric blocks of nucleotides. However, filtering on the basis of at least the presence of 200 fea- within these cells. We next searched for differentially regulated tures per cell removed a substantial proportion of mixed cells transcripts between cell types and clusters. In this experiment, (Supplementary Fig. 11). Cells were then projected into two cell-type-specific usage was observed for 359 genes and 416 dif- dimensions using uniform manifold approximation and projec- ferentially expressed isoforms. Differential transcript usage was tion (UMAP) and a clear separation of the mouse and human cell particularly apparent for the marker CD74 (Fig. 2e–h), which is a populations was observed, with 1,077 cells recovered using an edit potential therapeutic target in multiple myeloma7. Furthermore, in distance of 6 and 1,064 cells recovered using an edit distance of 7 agreement with the literature and the biology of plasma cells8, a dif- (Supplementary Fig. 10). ferential expression of both immunoglobulin κ and λ light-chain scCOLOR-seq was applied to a mixture (1:1:1 ratio) of human isoform use between the different myeloma cell lines was observed NCI-H929, JJN3 and DF15 myeloma cell lines and approximately (Supplementary Figs. 16, 17). 500 cells were sequenced using a MinION flow cell and 1,200 cells Long read sequencing permits the measurement of fusion tran- sequenced using a PromethION flow cell. After filtering with a scripts that are often key drivers of tumor development. To illus- minimum of 200 features per cell (Supplementary Figs. 12, 13), we trate the principle, Ewing’s sarcoma was selected, which harbors show that nanopore sequencing can resolve the different myeloma the t(11:22)(q24:q212) translocation that generates EWS-FLI, a cell types at both the gene level (Fig. 2c and Supplementary Fig. 14) fusion between EWSR1 (Ewing’s sarcoma breakpoint region 1) and and the transcript level (Fig. 2d and Supplementary Fig. 14b–f). the ETS transcription factor FLI1 (Friend leukemia integration 1) There was also a good correlation between Oxford Nanopore and genes9. The EWS–FLI protein regulates the expression of numerous Illumina gene counts per cell (R = 0.67) and the number of UMIs target genes that promote cancer survival and drug resistance10,11.
Recommended publications
  • Recovery of Small Plasmid Sequences Via Oxford Nanopore Sequencing
    bioRxiv preprint doi: https://doi.org/10.1101/2021.02.21.432182; this version posted February 22, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license. Recovery of small plasmid sequences via Oxford Nanopore sequencing Ryan R. Wick1*, Louise M. Judd1 , Kelly L. Wyres1 and Kathryn E. Holt1,2 1. Department of Infectious Diseases, Central Clinical School, Monash University, Melbourne, VIC, 3004, Australia 2. Department of Infection Biology, London School of Hygiene & Tropical Medicine, London, WC1E 7HT, UK * [email protected] Abstract Oxford Nanopore Technologies (ONT) sequencing platforms currently offer two approaches to whole-genome native-DNA library preparation: ligation and rapid. In this study, we compared these two approaches for bacterial whole-genome sequencing, with a specific aim of assessing their ability to recover small plasmid sequences. To do so, we sequenced DNA from seven plasmid-rich bacterial isolates in three different ways: ONT ligation, ONT rapid and Illumina. Using the Illumina read depths to approximate true plasmid abundance, we found that small plasmids (<20 kbp) were underrepresented in ONT ligation read sets (by a mean factor of ~4) but were not underrepresented in ONT rapid read sets. This effect correlated with plasmid size, with the smallest plasmids being the most underrepresented in ONT ligation read sets. We also found lower rates of chimeric reads in the rapid read sets relative to ligation read sets. These results show that when small plasmid recovery is important, ONT rapid library preparations are preferable to ligation-based protocols.
    [Show full text]
  • Nanopore Sequencing of Long Ribosomal DNA Amplicons Enables
    bioRxiv preprint first posted online Jun. 29, 2018; doi: http://dx.doi.org/10.1101/358572. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license. Nanopore sequencing of long ribosomal DNA amplicons enables portable and simple biodiversity assessments with high phylogenetic resolution across broad taxonomic scale Henrik Krehenwinkel1,4, Aaron Pomerantz2, James B. Henderson3,4, Susan R. Kennedy1, Jun Ying Lim1,2, Varun Swamy5, Juan Diego Shoobridge6, Nipam H. Patel2,7, Rosemary G. Gillespie1, Stefan Prost2,8 1 Department of Environmental Science, Policy and Management, University of California, Berkeley, USA 2 Department of Integrative Biology, University of California, Berkeley, USA 3 Institute for Biodiversity Science and Sustainability, California Academy of Sciences, San Francisco, USA 4 Center for Comparative Genomics, California Academy of Sciences, San Francisco, USA 5 San Diego Zoo Institute for Conservation Research, Escondido, USA 6 Applied Botany Laboratory, Research and development Laboratories, Cayetano Heredia University, Lima, Perú 7 Department of Molecular and Cell Biology, University of California, Berkeley, USA 8 Research Institute of Wildlife Ecology, Department of Integrative Biology and Evolution, University of Veterinary Medicine, Vienna, Austria Corresponding authors: Henrik Krehenwinkel ([email protected]) and Stefan Prost ([email protected]) Keywords Biodiversity, ribosomal, eukaryotes, long DNA barcodes, Oxford Nanopore Technologies, MinION Abstract Background In light of the current biodiversity crisis, DNA barcoding is developing into an essential tool to quantify state shifts in global ecosystems.
    [Show full text]
  • Genomic Sequencing of SARS-Cov-2: a Guide to Implementation for Maximum Impact on Public Health
    Genomic sequencing of SARS-CoV-2 A guide to implementation for maximum impact on public health 8 January 2021 Genomic sequencing of SARS-CoV-2 A guide to implementation for maximum impact on public health 8 January 2021 Genomic sequencing of SARS-CoV-2: a guide to implementation for maximum impact on public health ISBN 978-92-4-001844-0 (electronic version) ISBN 978-92-4-001845-7 (print version) © World Health Organization 2021 Some rights reserved. This work is available under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 IGO licence (CC BY-NC-SA 3.0 IGO; https://creativecommons.org/licenses/by-nc-sa/3.0/igo). Under the terms of this licence, you may copy, redistribute and adapt the work for non-commercial purposes, provided the work is appropriately cited, as indicated below. In any use of this work, there should be no suggestion that WHO endorses any specific organization, products or services. The use of the WHO logo is not permitted. If you adapt the work, then you must license your work under the same or equivalent Creative Commons licence. If you create a translation of this work, you should add the following disclaimer along with the suggested citation: “This translation was not created by the World Health Organization (WHO). WHO is not responsible for the content or accuracy of this translation. The original English edition shall be the binding and authentic edition”. Any mediation relating to disputes arising under the licence shall be conducted in accordance with the mediation rules of the World Intellectual Property Organization (http://www.wipo.int/amc/en/mediation/rules/).
    [Show full text]
  • Using Long Nanopore Reads to Delineate Structural Variants (Svs)
    Using long nanopore reads to delineate structural variants (SVs) in the human genome SVs, including large deletions, duplications, inversions, translocations and copy-number changes are abundant in large genomes, and require long reads for precise characterisation Contact: [email protected] More information at: www.nanoporetech.com and publications.nanoporetech.com Unique Repeat Unique Repeat Unique a) b) a) b) sequence 1 1 sequence 2 2 sequence 3 1,000 60 Short reads Insertions Long A B C D E Reference chromosome 1 40 reads 800 Deletions > 50 bp Short-read assembly 20 Collapsed repeat consensus Unique contig 1 Long-read Bases sequenced (Mb) assembly 600 0 Unique contig 1 Unique contig 3 0 10 20 30 40 V W X Y Z Reference chromosome 2 Single, fully-resolved contig Count Read length (kb) c) > 50 bp 400 chr7 (q33) 7p21.3 15.321.1 15.3 7p14.3 7p14.1 13 11.2 11.21 11.22 11.23 7q21.11 q21.3 7q22.1 7q31.1 7q33 7q34 7q35 36.1 36.3 Scale 50 kb hg38 chr7: 134,550,000 134,600,000 134,650,000 134,700,000 Inversion A D C B E GENCODE v24 comprehensive transcript set (only Basic displayed by default) 200 AKR1B10 AKR1B15 BGPM CALD1 AKR1B15 BGPM Deletion BGPM A B C E AC009276.4 Duplication A B C C C D E 0 1,000 10,000 20,000 30,000 Translocation V W C D E + A B X Y Z Event size (bp) Adapted from Huddleston, J. et al. Discovery and genotyping of structural variation from long-read haploid genome sequence data.
    [Show full text]
  • High-Fidelity Nanopore Sequencing of Ultra-Short DNA Sequences
    bioRxiv preprint doi: https://doi.org/10.1101/552224; this version posted February 16, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Title: High-Fidelity Nanopore Sequencing of Ultra-Short DNA Sequences Authors: Brandon D. Wilson1, Michael Eisenstein2,3, H. Tom Soh2,3,4* Affiliations: 1Department of Chemical Engineering, Stanford University, Stanford, CA 94305, USA. 2Department of Electrical Engineering, Stanford University, Stanford, CA 94305, USA. 3Department of Radiology, Stanford University, Stanford, CA 94305, USA. 4Chan Zuckerberg Biohub, San Francisco, CA 94158, USA. * Correspondence to [email protected] One Sentence Summary: We introduce a simple method of accurately sequencing ultra-short (<100bp) target DNA on a nanopore sequencing platform. Abstract Nanopore sequencing offers a portable and affordable alternative to sequencing-by-synthesis methods but suffers from lower accuracy and cannot sequence ultra-short DNA. This puts applications such as molecular diagnostics based on the analysis of cell-free DNA or single- nucleotide variants (SNV) out of reach. To overcome these limitations, We report a nanopore-based sequencing strategy in Which short target sequences are first circularized and then amplified via rolling-circle amplification to produce long stretches of concatemeric repeats. These can be sequenced on the Oxford Nanopore Technology’s (ONT) MinION platform, and the resulting repeat sequences aligned to produce a highly-accurate consensus that reduces the high error-rate present in the individual repeats.
    [Show full text]
  • Nanopore Sequencing Is a Credible Alternative to Recover Complete Genomes of Geminiviruses
    microorganisms Article Nanopore Sequencing Is a Credible Alternative to Recover Complete Genomes of Geminiviruses Selim Ben Chehida 1 , Denis Filloux 2,3, Emmanuel Fernandez 2,3, Oumaima Moubset 2,3, Murielle Hoareau 1, Charlotte Julian 2,3, Laurence Blondin 2,3, Jean-Michel Lett 1, Philippe Roumagnac 2,3 and Pierre Lefeuvre 1,* 1 CIRAD, UMR PVBMT, F-97410 St Pierre, La Réunion, France; [email protected] (S.B.C.); [email protected] (M.H.); [email protected] (J.-M.L.) 2 CIRAD, PHIM, F-34398 Montpellier, France; [email protected] (D.F.); [email protected] (E.F.); [email protected] (O.M.); [email protected] (C.J.); [email protected] (L.B.); [email protected] (P.R.) 3 PHIM Plant Health Institute, University Montpellier, CIRAD, INRAE, Institut Agro, IRD, F-34398 Montpellier, France * Correspondence: [email protected] Abstract: Next-generation sequencing (NGS), through the implementation of metagenomic protocols, has led to the discovery of thousands of new viruses in the last decade. Nevertheless, these protocols are still laborious and costly to implement, and the technique has not yet become routine for everyday virus characterization. Within the context of CRESS DNA virus studies, we implemented two alternative long-read NGS protocols, one that is agnostic to the sequence (without a priori knowledge of the viral genome) and the other that use specific primers to target a virus (with a priori). Agnostic Citation: Ben Chehida, S.; Filloux, D.; and specific long read NGS-based assembled genomes of two capulavirus strains were compared to Fernandez, E.; Moubset, O.; Hoareau, those obtained using the gold standard technique of Sanger sequencing.
    [Show full text]
  • Managing Infectious Disease Outbreaks Through Rapid Pathogen Genome Sequencing
    BRIEFING PAPER Managing infectious disease outbreaks through rapid pathogen genome sequencing February 2021 oxfordnanoporetech.com 1 Introduction With over two million attributed deaths to Global health security depends on the rapid potential association of novel pathogen variants recognition and containment of infectious diseases (such as the COVID-19 B1.1.7 and B1.351 variants date and a projected economic cost of and no government can afford to be complacent originally identified in the UK and South Africa) with $28 trillion1, the COVID-19 pandemic has about the risks posed to population health, changes to disease severity, transmission, and economic, political, and social stability and diagnostic and therapeutic efficacy. refocused global attention on the acute, wellbeing. It is possible to be prepared to prevent ever-present threat of infectious disease. and control such threats by investing in intelligent This briefing paper describes when, where, and how and agile public health tools to monitor for potential genomic epidemiology can offer critical and timely risks, enabling responses at appropriate speed and insights for infectious disease experts, public health scale to problems as they appear. professionals, and policy-makers to stay a step ahead of infectious disease threats, responding with Executive summary Genomic epidemiology is a crucial weapon in the maximal effect. public health fight against infectious diseases, • The threat of infectious disease is ever present — providing rapid identification and complete This briefing
    [Show full text]
  • Direct RNA Nanopore Sequencing of Full-Length Coronavirus Genomes Provides Novel Insights Into Structural Variants and Enables Modification Analysis
    Downloaded from genome.cshlp.org on September 28, 2021 - Published by Cold Spring Harbor Laboratory Press Method Direct RNA nanopore sequencing of full-length coronavirus genomes provides novel insights into structural variants and enables modification analysis Adrian Viehweger,1,2,5 Sebastian Krautwurst,1,2,5 Kevin Lamkiewicz,1,2 Ramakanth Madhugiri,3 John Ziebuhr,2,3 Martin Hölzer,1,2 and Manja Marz1,2,4 1RNA Bioinformatics and High-Throughput Analysis, Friedrich Schiller University Jena, 07743 Jena, Germany; 2European Virus Bioinformatics Center, Friedrich Schiller University Jena, 07743 Jena, Germany; 3Institute of Medical Virology, Justus Liebig University Gießen, 35390 Gießen, Germany; 4Leibniz Institute on Aging–Fritz Lipmann Institute, 07743 Jena, Germany Sequence analyses of RNA virus genomes remain challenging owing to the exceptional genetic plasticity of these viruses. Because of high mutation and recombination rates, genome replication by viral RNA-dependent RNA polymerases leads to populations of closely related viruses, so-called “quasispecies.” Standard (short-read) sequencing technologies are ill-suit- ed to reconstruct large numbers of full-length haplotypes of (1) RNA virus genomes and (2) subgenome-length (sg) RNAs composed of noncontiguous genome regions. Here, we used a full-length, direct RNA sequencing (DRS) approach based on nanopores to characterize viral RNAs produced in cells infected with a human coronavirus. By using DRS, we were able to map the longest (∼26-kb) contiguous read to the viral reference genome. By combining Illumina and Oxford Nanopore sequencing, we reconstructed a highly accurate consensus sequence of the human coronavirus (HCoV)-229E genome (27.3 kb). Furthermore, by using long reads that did not require an assembly step, we were able to identify, in infected cells, diverse and novel HCoV-229E sg RNAs that remain to be characterized.
    [Show full text]
  • De Novo Sequencing and Variant Calling with Nanopores Using Poreseq
    De novo Sequencing and Variant Calling with Nanopores using PoreSeq Tamas Szalay1 & Jene A. Golovchenko1,2* 1School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts 02138 USA 2Department of Physics, Harvard University, Cambridge, Massachusetts 02138 USA *Corresponding author, email: [email protected] 1 1 The single-molecule accuracy of nanopore sequencing has been an area of rapid academic 2 and commercial advancement, but remains challenging for the de novo analysis of genomes. 3 We introduce here a novel algorithm for the error correction of nanopore data, utilizing 4 statistical models of the physical system in order to obtain high accuracy de novo sequences 5 at a range of coverage depths. We demonstrate the technique by sequencing M13 6 bacteriophage DNA to 99% accuracy at moderate coverage as well as its use in an assembly 7 pipeline by sequencing E. coli and ࣅ DNA at a range of coverages. We also show the 8 algorithm’s ability to accurately classify sequence variants at far lower coverage than 9 existing methods. 10 DNA sequencing has proven to be an indispensable technique in biology and medicine, 11 greatly accelerated by the technological developments that led to multiple generations of low 12 cost and high throughput tools1,2. Despite these advances, however, most existing sequencing-by- 13 synthesis techniques remain limited to short reads using expensive devices with complex sample 14 preparation procedures3. 15 Initially proposed two decades ago by Branton, Deamer, and Church4, nanopore 16 sequencing has recently emerged as a serious contender in the crowded field of DNA 17 sequencing.
    [Show full text]
  • A Systematic Benchmark of Nanopore Long Read RNA Sequencing for Transcript Level Analysis in Human Cell Lines
    bioRxiv preprint doi: https://doi.org/10.1101/2021.04.21.440736; this version posted April 22, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license. A systematic benchmark of Nanopore long read RNA sequencing for transcript level analysis in human cell lines Authors: Ying Chen1, Nadia M. Davidson2,3, Yuk Kei Wan1, Harshil Patel4, Fei Yao1, Hwee Meng Low1, Christopher Hendra1,5, Laura Watten1, Andre Sim1, Chelsea Sawyer4, Viktoriia Iakovleva1,6, Puay Leng Lee1, Lixia Xin1, Hui En Vanessa Ng7, Jia Min Loo1, Xuewen Ong8, Hui Qi Amanda Ng1, Jiaxu Wang1, Wei Qian Casslynn Koh1, Suk Yeah Polly Poon1, Dominik Stanojevic1,9, Hoang-Dai Tran1, Kok Hao Edwin Lim1, Shen Yon Toh10, Philip Andrew Ewels11, Huck-Hui Ng1, N.Gopalakrishna Iyer10,12, Alexandre Thiery13, Wee Joo Chng6,7,14, Leilei Chen7,15, Ramanuj DasGupta1, Mile Sikic1,9, Yun-Shen Chan1, Boon Ooi Patrick Tan1,7,8, Yue Wan1, Wai Leong Tam1,7,16,17, Qiang Yu1, Chiea Chuan Khor1,12,18, Torsten Wüstefeld1,10,14, Ploy N. Pratanwanich1,19,20, Michael I. Love21,22, Wee Siong Sho Goh1, 23, Sarah B. Ng1, Alicia Oshlack2,24, Jonathan Göke1,10*, SG-NEx consortium Affiliations: 1.Genome Institute of Singapore, Singapore, Singapore 2.Peter MacCallum Cancer Centre, Melbourne, VIC, Australia 3.School of BioSciences, The University of Melbourne, Parkville, Victoria, Australia 4.Bioinformatics and Biostatistics, The
    [Show full text]
  • Biological Nanopores: Engineering on Demand
    life Review Biological Nanopores: Engineering on Demand Ana Crnkovi´c*, Marija Srnko and Gregor Anderluh National Institute of Chemistry, Hajdrihova 19, 1000 Ljubljana, Slovenia; [email protected] (M.S.); [email protected] (G.A.) * Correspondence: [email protected] Abstract: Nanopore-based sensing is a powerful technique for the detection of diverse organic and inorganic molecules, long-read sequencing of nucleic acids, and single-molecule analyses of enzymatic reactions. Selected from natural sources, protein-based nanopores enable rapid, label-free detection of analytes. Furthermore, these proteins are easy to produce, form pores with defined sizes, and can be easily manipulated with standard molecular biology techniques. The range of possible analytes can be extended by using externally added adapter molecules. Here, we provide an overview of current nanopore applications with a focus on engineering strategies and solutions. Keywords: nanopores; pore-forming toxins; sensing; aptamers; oligomerization 1. Introduction Nanopore-based sensing is an emerging technology with great potential for the de- tection of diverse organic molecules, sequencing of nucleic acids, and single-molecule analyses of enzymatic reactions and protein folding. Conceptually, nanopore biosensing belongs to the so-called resistive-pulse methods. A classic example of such a method is the Coulter counter, originally developed in the 1950s to count blood cells [1]. The instrument contains a capillary, which is divided into two parts by a wall containing a 20 µm–2 mm aperture. The capillary is filled with an electrolyte solution and an applied electric field causes ions to move through the opening, creating a constant ionic current. As the blood Citation: Crnkovi´c,A.; Srnko, M.; cells move through the narrow aperture, they partially block the aperture, causing a de- Anderluh, G.
    [Show full text]
  • Nanopore Sequencing the Advantages of Long Reads for Genome Assembly
    WHITE PAPER Nanopore sequencing The advantages of long reads for genome assembly SEPTEMBER 2017 nanoporetech.com nanoporetech.com/publications OXFORD NANOPORE TECHNOLOGIES | THE ADVANTAGESAPPLICATION ANDOF LONG ADVANTAGES READS FOR OF GENOMELONG-READ ASSEMBLY NANOPORE SEQUENCING TO STRUCTURAL VARIATION ANALYSIS Contents 1 De novo assembly 2 Advantages of long-read sequencing for genome assembly 3 Genome assembly tools 4 Case studies 5 Summary 6 About Oxford Nanopore Technologies 7 References OXFORD NANOPORE TECHNOLOGIES | THE ADVANTAGES OF LONG READS FOR GENOME ASSEMBLY Introduction Over the last decade, improvements in Whole genome assembly – next generation DNA sequencing solving the puzzle technology have transformed the field Traditional technologies have required of genomics, making it an essential tool users to sequence short lengths of DNA, in modern genetic and clinical research which must then be reassembled back laboratories. The facility to sequence into their original order as accurately as whole genomes or specific genomic possible. Such short-read sequencing regions of interest is delivering new insights technologies, however, present a number into a variety of applications such as of challenges, particularly the difficulty of human health and disease, metagenomics, accurately analysing repetitive regions and antimicrobial resistance, evolutionary large structural variations.1 biology and crop breeding. This means that many reference genomes that were created using short-read For applications such as the analysis of sequencing are highly fragmented, which larger structural variation, or de novo in turn introduces bias into any alignments assembly, whole genome sequencing made against that reference2. This review shows how these challenges are now (WGS) is typically the technique of choice.
    [Show full text]