cDNA SEQUENCING WITH OXFORD NANOPORE

Getting Started

Introduction

The ability to sequence RNA has provided invaluable insight into the study of living things across numerous applications, shedding light on the dynamics of the transcriptome, from single cells to whole tissues.

In medicine, the identification of differentially spliced isoforms and fusion transcripts can inform disease diagnosis and treatment. RNA sequencing provides a rapid, effective method of virus identification, and in developmental biology the ability to track transcriptional changes over time helps to resolve the developmental mechanisms at play. Transcriptomics also has applications in environmental and agricultural science, such as in strategies for pest management.

Despite the numerous advances made in transcriptome analysis using short-read cDNA sequencing technologies, these methods have several shortcomings.

• Transcripts are generally several kilobases long: the typical human gene contains 12 exons, each with an average length of 236 base pairs [1], and alternative splicing has been observed in 95% of human genes [2].
• Short reads only partially cover a transcript’s length, making accurate isoform assembly a difficult process reliant on computational reconstruction; short reads also exhibit high rates of multimapping.

With nanopore sequencing, there is no upper read length limit: size selection is not necessary and whole transcripts can be sequenced end-to-end in single reads, enabling simple, accurate assembly, with the ability to distinguish between highly similar isoforms, identify novel transcripts, and detect fusions. Furthermore, rates of multimapping are significantly lower (see figure 1).

Accurate isoform quantification is attainable with nanopore sequencing, as demonstrated using spike-in RNA standards (ERCC) (see figure 2).

Figure 1. Multimapping reads (%): short read versus ONT cDNA.

Figure 2. Log10 observed counts versus log10 expected counts. a) Standard PCR-cDNA: Spearman r = 0.989; p < 0.001. b) Strand-specific PCR-cDNA: Spearman r = 0.96; p < 0.001. c) Direct cDNA: Spearman r = 0.989; p < 0.001. d) Direct RNA: Spearman r = 0.97; p < 0.001.
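The comparison summarised in figure 2 can be illustrated with a short sketch. This is not the analysis used to produce the figure: it simply shows one way to compute a Spearman rank correlation between log10 expected and log10 observed spike-in counts. The function names, the pseudocount of 1 and the five-row example table are assumptions made for illustration.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical spike-in table: expected abundance and observed read count.
expected = [1.0, 10.0, 100.0, 1000.0, 10000.0]
observed = [2, 9, 130, 800, 12000]

log_expected = [math.log10(v) for v in expected]
log_observed = [math.log10(v + 1) for v in observed]  # pseudocount (assumption)
rho = spearman(log_expected, log_observed)
```

Because the Spearman coefficient depends only on ranks, the log transform does not change its value; it is kept here only to mirror the axes of figure 2.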

References

1. NCBI: https://www.ncbi.nlm.nih.gov/Web/Newsltr/Spring03/human.html
2. NCBI: https://www.ncbi.nlm.nih.gov/Web/Newsltr/Spring03/human.html

Figure 3. Log count versus GC content for ONT PCR-cDNA, ONT direct cDNA, ONT direct RNA and short-read cDNA.

GC bias is virtually absent in nanopore data (see figure 3) compared to short-read data, whilst PCR bias is minimal in PCR-cDNA sequencing and absent in direct cDNA sequencing (see figure 4). Furthermore, unlike with traditional sequencing methods, where data analysis begins after completion of a sequencing run, nanopore sequencing runs in real-time: reads can be basecalled and analysed as sequencing progresses, significantly reducing sample-to-answer time. Runs can also be stopped as soon as sufficient data has been generated, for example once a coverage target is reached.

Oxford Nanopore offers highly flexible, scalable methods of long-read RNA sequencing. In a sequencing first, the Direct RNA Sequencing Kit provides the only method of sequencing full-length native RNA strands without the need for conversion to cDNA, allowing for the analysis of both sequence and base modifications.

For high-depth sequencing of full-length transcripts, we offer both direct and PCR-based cDNA sequencing kits, enabling both high-confidence identification and accurate quantification of transcripts within a single experiment. Here, we present our newly updated Direct and PCR cDNA Sequencing Kits, providing greater enrichment for full-length transcripts from lower sample inputs and giving the highest yields yet for high-depth transcriptome sequencing.

For high-throughput cDNA sequencing without PCR bias, the Direct cDNA Sequencing Kit provides an amplification-free preparation method. If RNA is limited, the PCR-cDNA Sequencing Kit is ideal for generating the highest sequencing output from as little as 1 ng of starting material.

Figure 4. Relative PCR bias for PCR-cDNA libraries prepared from 1 ng input with 10, 14 and 18 PCR cycles, compared with a 50 ng direct cDNA library.
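To check for GC bias in your own data, one simple approach is to group transcripts by GC content and compare their counts, in the spirit of figure 3. The sketch below is a minimal illustration under stated assumptions: the function names, the equal-width binning and the toy sequences and counts are all hypothetical, and it is not the method used to produce the figure.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(1 for base in seq if base in "GC") / len(seq)


def mean_count_by_gc_bin(transcripts, counts, n_bins=5):
    """Mean read count per equal-width GC bin; None marks an empty bin."""
    totals = [0.0] * n_bins
    sizes = [0] * n_bins
    for name, seq in transcripts.items():
        # A GC fraction of exactly 1.0 is placed in the last bin.
        bin_index = min(int(gc_fraction(seq) * n_bins), n_bins - 1)
        totals[bin_index] += counts.get(name, 0)
        sizes[bin_index] += 1
    return [totals[i] / sizes[i] if sizes[i] else None for i in range(n_bins)]


# Toy data (hypothetical): three transcripts with low, medium and high GC.
transcripts = {"t1": "ATATATAT", "t2": "ATGCATGC", "t3": "GCGCGCGC"}
counts = {"t1": 10, "t2": 12, "t3": 11}
profile = mean_count_by_gc_bin(transcripts, counts)
```

A roughly flat profile across bins would be consistent with low GC bias; a strong trend would suggest that counts depend on GC content.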

A new generation of cDNA sequencing

Introducing the newly updated Direct cDNA and PCR-cDNA Sequencing Kits: with multiple improvements to our chemistry, in combination with upgrades to our sequencing platforms and software, nanopore cDNA sequencing now delivers full-length transcripts at our best-yet sequencing throughput, from dramatically reduced RNA input.

WHAT’S NEW?

• Lower input
• Longer transcripts
• Higher throughput

As the strand-switching method of cDNA preparation relies on generating strands which are end-to-end reverse transcribed, the efficiency of the reverse transcriptase used is a major factor in successfully obtaining full-length cDNA strands for sequencing. By updating our reverse transcriptase to Maxima H Minus (ThermoFisher), we have observed greater efficiency in generating full-length cDNA strands, with a marked increase in representation of longer transcripts in sequencing data. This switch has facilitated a dramatic reduction in the amount of starting poly-A+ RNA required to generate a cDNA library: 2.5x less for direct cDNA and 50x less for PCR-cDNA samples.

Improved detection of full-length PCR-cDNA reads

Optimisation of the cDNA PCR primers (cPRM) used to amplify full-length transcripts in the PCR-cDNA library preparation has allowed for greater detection of end-to-end reads by our software, again increasing full-length transcript coverage.

Greater stability

Improvements to the formulation of our sequencing adapters have ensured that they are now freeze-thaw resistant, giving greater stability and further improving sequencing throughput.

Finally, when combined with improvements across our devices and software, this improved efficiency has significantly boosted sequencing throughput capabilities for both direct and PCR cDNA libraries: whole transcriptomes can be sequenced to high coverage on single flow cells.

cDNA meets PromethION

Both cDNA kits are now fully PromethION-enabled, facilitating the generation of very high yields of transcriptomic data for high-resolution rare isoform discovery, identification of low copy number transcripts and de novo sequencing of entire transcriptomes to high coverage. Multiplexing options for both direct and PCR-cDNA protocols also allow the sequencing of multiple transcriptomes on single PromethION Flow Cells.

Start here: which kit do I choose?

1. Already prepared cDNA library? YES: Direct cDNA Sequencing Kit. NO: go to 2.
2. Interested in modifications? YES: Direct RNA Sequencing Kit. NO: go to 3.
3. Poly-A enriched? YES: go to 4. NO: PCR-cDNA Sequencing Kit.
4. ≥100 ng sample? YES: Direct cDNA Sequencing Kit. NO: PCR-cDNA Sequencing Kit.

                          Direct RNA Sequencing Kit    PCR-cDNA Sequencing Kit          Direct cDNA Sequencing Kit
Preparation time          115 mins                     165 mins                         275 mins
Input requirement         500 ng RNA (poly-A+)         1 ng RNA (poly-A+)               100 ng RNA (poly-A+)
RT required               Optional                     Yes                              Yes
PCR required              No                           Yes                              No
Read length               Equal to RNA length          Enriched for full-length cDNA    Enriched for full-length cDNA
Typical throughput        ● ● ●                        ● ● ●                            ● ● ●
Typical number of reads   1 million                    7–12+ million                    5–10 million
Multiplexing options      In development               Yes                              Yes

Buy now
Direct RNA Sequencing Kit: https://store.nanoporetech.com/direct-rna-sequencing-kit.html
PCR-cDNA Sequencing Kit: https://store.nanoporetech.com/cdna-pcr-sequencing-kit.html
Direct cDNA Sequencing Kit: https://store.nanoporetech.com/direct-cdna-sequencing-kit.html
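The kit-selection questions above can also be written as a small function. This is a sketch under stated assumptions: the function and parameter names are hypothetical, and the order of the branches is our reading of the flowchart and of the input requirements in the table, not an official selection tool.

```python
def choose_kit(has_cdna_library, wants_modifications, poly_a_enriched, input_ng):
    """Suggest a kit from the four flowchart questions (illustrative only)."""
    # A cDNA library that has already been prepared goes to the direct cDNA kit.
    if has_cdna_library:
        return "Direct cDNA Sequencing Kit"
    # Base modifications can only be analysed on native RNA strands.
    if wants_modifications:
        return "Direct RNA Sequencing Kit"
    # Amplification-free cDNA prep: poly-A enriched sample of at least 100 ng.
    if poly_a_enriched and input_ng >= 100:
        return "Direct cDNA Sequencing Kit"
    # Otherwise amplify: lower inputs or samples that are not poly-A enriched.
    return "PCR-cDNA Sequencing Kit"


# Example calls with hypothetical samples.
low_input = choose_kit(False, False, True, 1)
high_input = choose_kit(False, False, True, 250)
native_rna = choose_kit(False, True, True, 500)
```

The function checks only the questions asked in the flowchart; it does not verify, for instance, that a sample meets the 500 ng input listed for the Direct RNA Sequencing Kit.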

From sample to answer

How do I design my protocol?

The interactive Oxford Nanopore Protocol Builder enables you to generate your own application-specific protocol. Advice starts at the beginning with RNA extraction, taking you through library prep, sequencing and finally data analysis.

The Protocol Builder enables you to create your own protocol:
https://community.nanoporetech.com/knowledge/protocolbuilder/

Nanopore Community Knowledge

What do I need to consider when working with RNA?

The preservation of high-quality RNA requires great care in handling and storage. The Knowledge section of the Community provides in-depth guidance for best practice when working with RNA samples, with tips tailored towards preserving full-length mRNAs to make the most of long-read nanopore sequencing.

Read more about working with RNA:
https://community.nanoporetech.com/knowledge

Can I do gene-specific sequencing?

The strand-switching method can be tailored to select for transcripts of interest, allowing for targeted PCR-cDNA sequencing. The VN primer (VNP) supplied in the PCR-cDNA Sequencing Kit hybridises to any poly(A)-tailed RNA and is tailed with a primer site for subsequent amplification. By tailing your custom gene-specific primers with this primer site sequence, you can ensure that first-strand cDNA synthesis is initiated only for targets of interest. Visit the protocol for more information.

Sequence-specific sequencing protocol:
https://community.nanoporetech.com/protocols/ss-cdna-pcr-sequencingsqk-pcs109/v/pcs109ssv1
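Tailing a gene-specific primer amounts to joining two sequences in the right orientation. The sketch below illustrates the idea only: the primer-site sequence shown is a made-up placeholder (the real sequence must be taken from the protocol), and the function names are hypothetical. It assumes the primer is written 5' to 3', with the primer site at the 5' end followed by the reverse complement of the targeted mRNA region.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse complement of an upper-case DNA sequence."""
    return dna.translate(COMPLEMENT)[::-1]


def tail_primer(primer_site, target_mrna_region):
    """Build a tailed reverse-transcription primer, written 5' to 3'.

    primer_site: tail sequence used later for amplification.
    target_mrna_region: sense-strand region of the transcript to anneal to.
    """
    # Write the RNA target as DNA, then take the strand that anneals to it.
    target_dna = target_mrna_region.upper().replace("U", "T")
    return primer_site.upper() + reverse_complement(target_dna)


# Placeholder values for illustration only; not sequences from any kit.
EXAMPLE_PRIMER_SITE = "ACGTACGT"
EXAMPLE_TARGET = "AUGGCC"
tailed = tail_primer(EXAMPLE_PRIMER_SITE, EXAMPLE_TARGET)
```

Primer design considerations such as melting temperature, specificity and length are outside the scope of this sketch.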

Can I start from total RNA?

For best results, we recommend that poly-A enriched RNA is used as a starting material in PCR-cDNA library prep; however, cDNA libraries can also be successfully prepared from total RNA. When working with human samples, we have observed that 50 ng total RNA produces both good amplification yields and sequencing throughput, but this is an approximate guide: the proportion of poly-A+ RNA within a total RNA sample varies between organisms and tissues, and some optimisation may be required.

Overview of the protocol:
https://community.nanoporetech.com/protocols/cdna-pcr-sequencingsqk-pcs109/v/pcs9085v109revb04feb2019

How do I multiplex samples?

Barcoding options are available for both cDNA library prep methods: the Native Barcoding Expansion Packs can be used for direct cDNA multiplexing, whilst the PCR Barcoding Kit can be used to multiplex PCR-cDNA samples. We are currently developing integrated cDNA barcoding kits for both direct and PCR-cDNA, enabling end-to-end multiplexed cDNA library preparation.

Overview of the direct barcoding protocol:
https://community.nanoporetech.com/protocols/direct-cdna-nativebarcodingsqk-dcs109/v/dcs109barcodingv1/overview-of-the-strand-swi

Can I start from pre-prepared libraries?

cDNA libraries which have already been synthesised using a third-party kit can be prepared for sequencing using the Direct cDNA Sequencing Kit: as the kit has been specifically optimised for cDNA libraries, this will give greater performance in sequencing than when using the Ligation Sequencing Kit. To ensure that your workflow is best tailored to nanopore sequencing, we strongly recommend the use of Oxford Nanopore’s cDNA kits for end-to-end sample prep from RNA: our primers and adapters are purified to a very high standard, and the strand-switching method of enriching for full-length transcripts ensures the best results from long-read sequencing technology.

Overview of the direct cDNA protocol:
https://community.nanoporetech.com/protocols/direct-cdna-sequencingsqk-dcs109/v/dcs109v1/overview-of-the-strand-swi

Which device do I need?

The chemistry underpinning nanopore sequencing can be scaled up or down: our devices make full use of this, ranging from the miniature, portable Flongle and MinION through to the flexible GridION and up to the very high-throughput benchtop PromethION platform. To help choose a sequencing device that fits your application, visit the product comparison page.

Product details can be found here:
https://nanoporetech.com/products/comparison

How much data do I need?

Determining the extent of sequencing required before transcript discovery plateaus ensures that samples can be sequenced until sufficient data is generated to cover transcripts of interest. To help identify where this plateau could occur, we sequenced the Drosophila melanogaster transcriptome using the PCR-cDNA Sequencing Kit. At 10 million reads, approximately 30,000 transcripts were identified; a further 20 million reads enabled the identification of only 500 more. Whilst more data may be required depending on sample type or when looking for rarer transcripts or isoforms, 10 million reads is an approximate guide; this can be achieved using the direct or PCR-cDNA sequencing kit on a single MinION Flow Cell.

Find the right kit for your experiment:
https://store.nanoporetech.com/cdna-and-direct-rna/
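The plateau described above can be expressed as a discovery rate, and a similar curve can be estimated for your own sample by subsampling reads. The sketch below is illustrative only: the function names and toy read assignments are assumptions, and the subsampling scheme (nested prefixes of one shuffled read list) is one simple choice among several.

```python
import random


def marginal_rate(new_transcripts, extra_reads_millions):
    """Newly identified transcripts per additional million reads."""
    return new_transcripts / extra_reads_millions


def discovery_curve(read_assignments, depths, seed=0):
    """Distinct transcripts seen at each depth in one shuffled read list.

    read_assignments: one transcript identifier per aligned read.
    depths: read counts at which to evaluate the curve.
    """
    rng = random.Random(seed)
    shuffled = list(read_assignments)
    rng.shuffle(shuffled)
    # Each depth uses a prefix of the same shuffle, so the curve never drops.
    return [(depth, len(set(shuffled[:depth]))) for depth in depths]


# Figures quoted in the text above: first 10 million reads, next 20 million.
first_10m = marginal_rate(30_000, 10)  # transcripts per million reads
next_20m = marginal_rate(500, 20)      # transcripts per million reads

# Toy assignments (hypothetical): one abundant, one moderate, one rare transcript.
reads = ["A"] * 50 + ["B"] * 5 + ["C"]
curve = discovery_curve(reads, [10, 30, 56])
```

With the figures quoted above, the discovery rate falls from 3,000 to 25 transcripts per million reads, which is why 10 million reads is given as an approximate guide.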

How can I begin analysing my data?

We have a suite of transcriptomic data analysis tools available on GitHub:

• Pychopper: identification of full-length cDNA reads
• Pinfish: genome annotation using long-read transcriptomics data
• pipeline-pinfish-analysis: a Snakemake pipeline using minimap2 and pinfish to annotate genomes using long reads

We are also developing application-specific tutorials, providing complete step-by-step instructions for analysing your data. These can be found in the Knowledge section of the Community.

Complete guides are available in the new Bioinformatics section on the Community:
https://community.nanoporetech.com/knowledge/bioinformatics
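To give a feel for what identifying full-length cDNA reads involves, the sketch below checks whether a read carries a primer at both ends. It is a deliberately simplified illustration and not the algorithm used by any of the tools listed above: it uses exact string matching, whereas real reads contain errors and need tolerant matching. The primer sequences, function names and window size are all hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse complement of an upper-case DNA sequence."""
    return dna.translate(COMPLEMENT)[::-1]


def is_full_length(read, start_primer, end_primer, window=40):
    """True if both primers flank the read, in either strand orientation."""
    head, tail = read[:window], read[-window:]
    # Forward: start primer at the head, end primer reverse-complemented at the tail.
    forward = start_primer in head and reverse_complement(end_primer) in tail
    # Reverse: the same molecule read from the opposite strand.
    reverse = end_primer in head and reverse_complement(start_primer) in tail
    return forward or reverse


# Made-up primers for illustration; not the sequences supplied in any kit.
START_PRIMER = "GATTACAGG"
END_PRIMER = "CCATGGTCA"

insert = "ATGAAACCC"
full_read = START_PRIMER + insert + reverse_complement(END_PRIMER)
truncated_read = START_PRIMER + insert
```

Reads that pass such a check in the reverse orientation can be reverse-complemented so that all full-length reads share the same strand before alignment.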


cDNA and the Nanopore Community

CASE STUDY 1

Differential gene expression in an agricultural pest: cDNA sequencing of the developing olive fruit fly embryo

In his talk at the Nanopore Community Meeting 2018, Anthony Bayega described how he used nanopore cDNA sequencing to delineate the transcripts associated with embryonic development of the olive fly, an important pest for cultivated and wild olive fruits. Sex determination occurs within the first 6 hours of olive fly embryo development, and is mediated through alternative splicing events; however, the male-determining factor remains unknown. In a time series experiment, cDNA libraries from olive fly embryo total RNA taken at 1-6 hours post-oviposition were prepared and sequenced. Compared to data from a short-read technology, 40x fewer nanopore reads and 7.9x fewer bases were required to detect the same number of genes, whilst gene counts were concordant between the two.

The transcriptional expression profiles were compared over the 6 hours, identifying clear transitions in the developmental processes taking place. Temporal clustering of transcripts using absolute expression profiles revealed clusters of genes involved in early development becoming less abundant over time, whilst those involved in later development increased over time. The long reads were also used to improve the annotation of the doublesex gene, initially generated using short-read sequencing; Anthony explained how targeting this gene via CRISPR-Cas9 in caged mosquitoes had enabled complete population suppression, suggesting doublesex could make an excellent target for olive fly population suppression. Finishing on a description of his updated methods, Anthony noted that the new PCR-cDNA kit had doubled the yield from his libraries with a simpler workflow.

View Anthony’s full talk here:
https://nanoporetech.com/resource-centre/anthony-bayega-transcriptome-landscape-developing-olive-fruit-fly-embryo-delineated

CASE STUDY 2

Rapid oncogenic gene fusion testing with cDNA sequencing

William Jeck has demonstrated how nanopore cDNA sequencing can be used to detect clinically relevant fusion transcripts in research patient samples. In his talk at the Nanopore Community Meeting 2018, he used clinical cases to highlight the challenges of identifying translocations responsible for diseases, such as acute promyelocytic leukaemia and sarcoma, using traditional methods: immunohistochemistry testing can be imprecise, and outsourcing sequencing can take a week or more. With time to treatment often crucial, William investigated whether these fusion transcripts could be identified more rapidly by sequencing samples on the MinION.

The team used Anchored Multiplex PCR (AMP) to generate targeted cDNA libraries; these were then sequenced in multiplex on the MinION. Reads were aligned to a synthetic transcript library, enabling both the identification and quantification of fusion transcripts from the same dataset. The pipeline was first used to successfully identify a BCR-ABL1 fusion in an erythroleukaemia cell line, with the first fusion reads generated within seconds of starting the run; furthermore, long reads enabled resolution of long-range exon structures across the fusion. A PML-RARA fusion was identified in a leukaemia sample, whilst a HAS2-PLAG1 fusion was called in an FFPE lipoblastoma patient specimen as a part of the research project. An EWSR1-FLI1 gene fusion was also successfully identified in a sample by sequencing on the Flongle device, showing Flongle’s potential as a future single-sample, cost-effective diagnostic tool.

View William’s full talk here:
https://nanoporetech.com/resource-centre/william-jeck-nanopore-sequencing-and-rapid-fusion-testing-killer-app-molecular

Oxford Nanopore Technologies
Phone: +44 (0)845 034 7900
Email: [email protected]
Twitter: @nanopore
www.nanoporetech.com

Oxford Nanopore Technologies, the Wheel icon, EPI2ME, Flongle, GridION, MinION, MinIT, PromethION and VolTRAX are registered trademarks of Oxford Nanopore Technologies in various countries.

© 2019 Oxford Nanopore Technologies.

Oxford Nanopore Technologies products are currently for Research Use Only. Not for use in diagnostic procedures.

GS_1010(EN)_V1_13Feb2019