Advancing Human Genetics Research with Nanopore Sequencing

HUMAN GENETICS RESEARCH WHITE PAPER

Advancing human genetics research with nanopore sequencing

nanoporetech.com/publications

nanoporetech.com OXFORD NANOPORE TECHNOLOGIES | ADVANCING HUMAN GENETICS RESEARCH WITH NANOPORE SEQUENCING

Contents

1 The advantages of nanopore sequencing for human genetics research

2 Case studies

3 Summary

4 About Oxford Nanopore Technologies

5 References

Introduction

High-throughput sequencing technologies have revolutionised the field of human genetics, allowing researchers to more easily investigate and understand biological processes and their impact. Using these technologies, researchers can analyse entire genomes or specific targeted regions of interest, with further functional insights garnered through the characterisation and quantification of RNA transcripts and isoforms. Together, these capabilities have provided unprecedented insight into human genetic diversity and its implications in health and disease.

Despite offering significant advancement in terms of speed and resolution over the older techniques of Sanger sequencing and microarrays respectively, there are still a number of limitations inherent to traditional short-read, high-throughput sequencing platforms. From closing genome gaps to characterising full-length transcripts, this review will present how real-time, long-read, high-throughput nanopore sequencing technology is being used to address these limitations, resulting in new biological insights. Specific case studies reveal how researchers are applying the benefits of nanopore technology to a variety of sequencing techniques, including whole genome, targeted and RNA sequencing.


The advantages of nanopore sequencing for human genetics research

Whole genome sequencing without the holes1

To date, the majority of large genomes sequenced have utilised short-read technologies, which require the DNA to be fragmented into small (typically 150–300 bp) lengths prior to sequencing. While these technologies allow rapid and relatively cost-effective genome analysis compared to older sequencing methodologies, their inherent read-length limitations preclude the analysis of vast stretches of DNA corresponding to repetitive regions and large structural variations1,2,3. For these reasons, approximately 8% of the human genome remains intractable to assembly and interpretation4.

The required DNA fragmentation step effectively loses relative positional information, meaning that during genome assembly, these small fragments must be pieced back together through overlap with other short fragments. For repetitive regions, this can be particularly challenging and may result in the collapsing of potentially long regions of repeats down to much shorter lengths, leaving gaps in the assembly (Figure 1)5.

In the same way, some structural variants, such as deletions, insertions, duplications, inversions and translocations, may be missed when using short sequencing reads alone. It is well established that some repeat regions and structural variants are associated with human health and disease (e.g. ageing6, triplet expansion diseases7, autism8, epilepsy9 and cancer10,11), making their routine characterisation highly advantageous.
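The read-length argument above can be made concrete with a little arithmetic. A read resolves a repeat unambiguously only if it covers the whole repeat plus some unique flanking sequence on both sides, so that it can be anchored in the assembly. The minimal Python sketch below counts the start positions from which a read of a given length achieves this; the `anchor` flank length and the 6 kb repeat size are illustrative assumptions, not values taken from the studies cited here.

```python
def spans_repeat(read_start: int, read_len: int,
                 repeat_start: int, repeat_len: int, anchor: int = 0) -> bool:
    """True if the read covers the repeat plus `anchor` bp of unique sequence on each side."""
    read_end = read_start + read_len
    return (read_start <= repeat_start - anchor
            and read_end >= repeat_start + repeat_len + anchor)


def spanning_start_positions(read_len: int, repeat_len: int, anchor: int = 0) -> int:
    """Number of distinct start positions from which a read spans the repeat and both anchors."""
    window = repeat_len + 2 * anchor
    return max(0, read_len - window + 1)


if __name__ == "__main__":
    # Hypothetical 6 kb repeat, requiring 100 bp of unique anchor sequence on each side.
    for read_len in (150, 300, 20_000):
        print(read_len, spanning_start_positions(read_len, 6_000, anchor=100))
```

Under these assumptions, neither a 150 bp nor a 300 bp read has any placement that spans the repeat (0 positions), so an assembler must rely on overlaps between near-identical fragments, whereas a 20 kb read spans it from 13,801 start positions. The `spans_repeat` helper gives the same count by brute force and is included as a cross-check.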

Figure 1 Schematic highlighting the advantages of long reads in de novo assembly of repetitive regions. Long read lengths are more likely to incorporate the whole repetitive region (blue boxes), allowing more accurate assembly. Image adapted from Kellog12.
[Figure 1 panels: genome sequence; short reads; short-read consensus; long reads; long-read consensus]


Unlike short-read sequencing technologies, nanopore-based sequencing processes the entire length of the DNA fragment that is presented to the pore. Complete fragments of thousands of kb are routinely processed and ultra-long read lengths over 2 Mb have been achieved13. Such long reads are more likely to span entire regions of repetitive DNA and structural variation. As a result, nanopore sequencing provides a more complete view of genetic variation.

‘Nanopore sequencing allows same-day detection of structural variants, point mutations, and methylation profiling using a single device with negligible capital cost’10.

Using nanopore sequencing, highly contiguous genome assemblies have been generated for many large and complex organisms that had previously been deemed inaccessible to modern sequencing methods14,15. Researchers recently used nanopore sequencing to deliver the most complete human genome ever assembled with a single technology (Case study 1) and the first complete and accurate sequence of a human centromere (Case study 2). Furthermore, as will be discussed later, the long, direct sequencing reads delivered by nanopore technology also permit the phasing of variants and modified bases.

Rapid analysis of targeted regions

For researchers wishing to study specific genomic loci, a targeted sequencing approach is commonly employed. The facility to focus on the regions most likely to provide relevant data reduces cost, allows a higher depth of coverage and simplifies analysis.

A range of targeted sequencing methodologies are available with nanopore sequencing, from bespoke assays to targeted panels and whole exome approaches. Nanopore sequencing has been successfully utilised with both PCR- and hybrid capture-based targeted enrichment strategies, including whole exome enrichment16,17.

As discussed for whole genome sequencing, the long reads provided by nanopore technology offer a range of advantages for researchers interested in targeted sequencing. For example, it is possible to sequence much larger regions in a single read, which allows improved characterisation of highly repetitive regions and/or structural variants (see Case study 4). Furthermore, long sequencing reads allow the phasing of alleles and variants, which is extremely challenging when using short-read sequencing18.
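Why long reads make phasing tractable can be seen in a simplified read-backed approach for two heterozygous variants, such as a pair of mutations that must be shown to lie on different chromosomes (in trans) to establish compound heterozygosity. Only reads that cover both sites are informative, and once the variants are more than a few hundred bases apart, short reads rarely do. The Python sketch below is illustrative only: the `min_reads` and `min_fraction` thresholds are hypothetical, and dedicated phasing tools such as WhatsHap solve a more general optimisation over many variants at once rather than this pairwise vote.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def phase_pair(reads: Iterable[Tuple[str, str]],
               min_reads: int = 5,
               min_fraction: float = 0.8) -> Optional[str]:
    """Classify two heterozygous variants as 'cis' or 'trans' from reads spanning both sites.

    Each read is (allele at site 1, allele at site 2), each encoded as 'ref' or 'alt'.
    Returns None when there is too little or too conflicting evidence.
    """
    valid = {"ref", "alt"}
    # Ignore reads with an unclear allele (e.g. a sequencing error) at either site.
    counts = Counter(r for r in reads if r[0] in valid and r[1] in valid)
    cis = counts[("ref", "ref")] + counts[("alt", "alt")]    # both alt alleles on one molecule
    trans = counts[("ref", "alt")] + counts[("alt", "ref")]  # alt alleles on different molecules
    informative = cis + trans
    if informative < min_reads:
        return None
    if cis / informative >= min_fraction:
        return "cis"
    if trans / informative >= min_fraction:
        return "trans"
    return None


if __name__ == "__main__":
    spanning = [("ref", "alt")] * 9 + [("alt", "ref")] * 8 + [("alt", "alt")]
    print(phase_pair(spanning))  # trans (17 of 18 informative reads)
```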


Figure 2 The Reference Alignment tool provides easy, real-time visualisation of targeted or whole genome sequencing coverage, allowing researchers to easily assess the efficiency of their sequence targeting.

A number of researchers are now also investigating the potential of CRISPR/Cas9 to enrich for long, targeted DNA molecules (see Case study 5). Initial results have shown significant promise for such techniques, which may offer faster, more streamlined workflows and, through direct sequencing of the target molecule, enable the analysis of base modifications.

‘The challenge of aligning short reads to regions with high homology is often not fully appreciated’18.

A unique feature of nanopore technology is the facility to utilise real-time analysis to selectively sequence or reject DNA molecules as they pass through the pore. Rejection of molecules that fall outside of the chosen criteria is achieved through reversing the current applied to the individual pore, thereby freeing up the pore to sequence an alternative DNA fragment. Although this potential real-time enrichment strategy is in its infancy, researcher-developed tools such as Read Until19 and RUBRIC20 are available.

Oxford Nanopore provides researchers with a cloud-based analysis platform, EPI2ME. Among the workflows available is the Human Alignment analysis tool, which aligns nanopore reads to the human GRCh38 reference genome, generating real-time coverage plots at the chromosome or gene level (Figure 2). This tool allows researchers to easily assess the efficiency of their sequence targeting.
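The sequence-or-reject decision described above can be sketched as a simple interval lookup. This is not the Read Until or RUBRIC API: it assumes a hypothetical upstream step has already mapped the first part of a read to a (chromosome, position) pair, and that target regions are non-overlapping, half-open intervals. The labels `"sequence"`, `"unblock"` and `"wait"` are illustrative names for continuing to sequence, reversing the pore current, or collecting more signal before deciding.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple

Interval = Tuple[str, int, int]  # hypothetical target region: (chromosome, start, end), half-open


def build_target_index(targets: List[Interval]) -> Dict[str, List[Tuple[int, int]]]:
    """Group target intervals by chromosome and sort them by start coordinate."""
    index: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in targets:
        index.setdefault(chrom, []).append((start, end))
    for intervals in index.values():
        intervals.sort()
    return index


def decide(prefix_mapping: Optional[Tuple[str, int]],
           index: Dict[str, List[Tuple[int, int]]]) -> str:
    """Decide what to do with a read given where its prefix mapped (None = not yet mapped)."""
    if prefix_mapping is None:
        return "wait"  # not enough sequence to place the read yet
    chrom, pos = prefix_mapping
    intervals = index.get(chrom, [])
    # Find the last interval whose start is <= pos (valid because intervals do not overlap).
    i = bisect_right(intervals, (pos, float("inf"))) - 1
    if i >= 0 and intervals[i][0] <= pos < intervals[i][1]:
        return "sequence"  # on target: keep reading this molecule
    return "unblock"       # off target: eject the molecule and free the pore


if __name__ == "__main__":
    idx = build_target_index([("chr1", 1_000, 5_000), ("chr1", 10_000, 20_000)])
    print(decide(("chr1", 1_500), idx))  # sequence
    print(decide(("chr1", 7_000), idx))  # unblock
    print(decide(None, idx))             # wait
```

A real implementation would also cap how long an unmapped read is allowed to "wait" before being rejected; that policy is left out here for brevity.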


Full-length RNA transcripts, isoform characterisation and accurate quantification

Due to the fragmentation required by most traditional sequencing methods, accurate assembly of complete transcripts is exceptionally challenging, especially in instances where a read maps to more than one location. With nanopore technology, entire RNA molecules are processed regardless of their length, allowing the sequencing of complete transcripts in single reads.

In addition to reducing multiple-locus alignment issues, long, full-length reads provide a significant advantage in the analysis and correct identification of alternative splicing (Figure 3). When using short reads, different transcript isoforms have to be computationally reconstructed; however, a study by Steijger et al.21 revealed that automated transcript assembly methods fail to identify all constituent exons in over half of the transcripts analysed. Furthermore, of those transcripts with all exons identified, over half were incorrectly assembled. Such complications are further compounded where reads from highly similar transcripts, such as those of paralogous genes, are under investigation. Rare isoforms could also remain altogether undetected22.

Figure 3 Alternative splicing can give rise to numerous mRNA isoforms per gene, which in turn can alter protein composition and function. The short reads generated by traditional RNA sequencing techniques lose positional information, making the correct assembly of alternative mRNA isoforms challenging. Long nanopore reads can span full-length transcripts, simplifying their identification.
[Figure 3 panels: gene/precursor mRNA (exons 1 to 9); short-read RNA-Seq; nanopore long-read RNA sequencing]

Describing the additional insight gained from nanopore sequencing, Dr Christopher Vollmers at the University of California, Santa Cruz, comments: ‘It’s rare to find an isoform that perfectly matches the annotation, and that’s in the human genome which is already super-curated. Once in a while, you find an exon that nobody has annotated mostly because it may contain an Alu element or other kind of repeat, so short reads won’t align to it. Most splice junctions [identified by short-read technology] are correct but transcription start sites and poly-A sites, that’s the wild west – there is so much going on that is not on any annotations’.

Recently, a team of researchers from the University of Oxford and Earlham Institute utilised nanopore sequencing to investigate expression of the neuropsychiatric disease-associated gene CACNA1C23. The long nanopore reads enabled the identification of 38 putative novel exons and 90 transcript isoforms, of which only 7 had been previously identified. Interestingly, 9 of the top 10 expressed isoforms were novel, previously unknown transcripts. This and other studies have also demonstrated how nanopore sequencing provides highly accurate measurement of transcript abundance23,24.

The advantages of nanopore technology are now also being applied to single cell transcriptome studies, delivering further, more detailed insight into gene expression and function (see Case study 7).

Direct RNA sequencing

Until recently, sequence-based analysis of RNA required the conversion of RNA to complementary DNA (cDNA), a process that can introduce bias through reverse transcription or amplification25. These issues can be exacerbated by the use of traditional short-read sequencing technologies, which are known to exhibit GC bias, where sequences with low or high levels of GC content are underrepresented (Figure 4). The amplification step required to generate cDNA also loses all modified base information. Such base modifications are known to have a role in modulating the activity and stability of RNA and are therefore of increasing interest to researchers. Nanopore sequencing addresses these challenges through direct RNA sequencing, which delivers unbiased, full-length, strand-specific RNA sequences26. The longest transcript processed by direct RNA sequencing currently stands at over 20 kb in length27.
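Because a single nanopore read, whether from cDNA or direct RNA, can cover a complete transcript, its splice structure can be compared directly with annotated isoforms rather than inferred from fragments. The Python sketch below reduces each read and each annotated isoform to its chain of splice junctions and checks for an exact match, loosely mirroring the categories used by isoform classification tools such as SQANTI. The exon coordinates are hypothetical, reads are assumed to be already aligned and split into exon blocks, and the category names are illustrative.

```python
from typing import Dict, List, Tuple

Exon = Tuple[int, int]  # hypothetical (start, end) genomic coordinates of an aligned exon block


def junction_chain(exons: List[Exon]) -> Tuple[Tuple[int, int], ...]:
    """Splice junctions implied by an exon chain: (end of one exon, start of the next)."""
    ordered = sorted(exons)
    return tuple((left[1], right[0]) for left, right in zip(ordered, ordered[1:]))


def classify_read(read_exons: List[Exon], annotation: Dict[str, List[Exon]]) -> str:
    """Compare a full-length read's junction chain with annotated isoforms."""
    chain = junction_chain(read_exons)
    if not chain:
        return "unspliced"
    # Transcript ends are ignored: only the internal splice structure must match exactly.
    for name, exons in annotation.items():
        if junction_chain(exons) == chain:
            return "full_splice_match:" + name
    known = {j for exons in annotation.values() for j in junction_chain(exons)}
    if all(j in known for j in chain):
        return "known_junctions_only"  # known junctions in an unannotated combination or subset
    return "novel_junction"            # at least one splice junction absent from the annotation


if __name__ == "__main__":
    annotation = {"T1": [(100, 200), (300, 400), (500, 600)], "T2": [(100, 200), (500, 600)]}
    print(classify_read([(120, 200), (300, 400), (500, 580)], annotation))  # full_splice_match:T1
    print(classify_read([(100, 200), (250, 400)], annotation))              # novel_junction
```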


A further benefit of direct RNA sequencing is the ability to accurately measure poly-A tail length (see Case study 6)27,28. In eukaryotes, messenger RNA (mRNA) is augmented with a series of adenosine bases at the 3’ end known as the poly-A tail. These tails can vary in size, with the largest being over 250 nucleotides in length and therefore beyond the typical analysis capabilities of short-read sequencing technologies27,28. Research suggests that poly-A tail length is an important factor in post-transcriptional regulation, and further study may provide new insights into gene expression and disease27,29.

‘We believe that direct RNA sequencing will become a versatile tool for transcriptome analysis in the “complete genome era” of the future’50.

Figure 4 Sequencing workflows that incorporate amplification are vulnerable to sequence-specific biases. Yeast transcriptome libraries were prepared using three nanopore sequencing techniques (PCR-cDNA, direct cDNA and direct RNA) and a typical short-read cDNA technique. In all cases, GC bias in the nanopore data sets was lower than in the short-read data set26.
[Figure 4 panels: a) log count versus GC content and b) log count versus length (kb), each annotated with Pearson r and p values, for Oxford Nanopore PCR-cDNA, Oxford Nanopore direct cDNA, Oxford Nanopore direct RNA and short-read cDNA]
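The GC-bias comparison in Figure 4 relates each transcript's GC content to its read count and summarises the relationship with a Pearson correlation. A minimal Python version of that kind of check is sketched below. It assumes per-transcript sequences and read counts are already available, uses log10 counts and drops transcripts with zero counts; these choices are assumptions rather than the published analysis. A correlation close to zero suggests little GC-dependent over- or under-representation.

```python
import math
from typing import Dict, Sequence


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a DNA or RNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def gc_count_correlation(sequences: Dict[str, str], counts: Dict[str, int]) -> float:
    """Correlate transcript GC content with log10 read count (zero-count transcripts dropped)."""
    names = [name for name in sequences if counts.get(name, 0) > 0]
    gc = [gc_fraction(sequences[name]) for name in names]
    log_counts = [math.log10(counts[name]) for name in names]
    return pearson_r(gc, log_counts)
```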


Analysis of base modifications

The importance of base modifications such as 5-methylcytosine (5mC) and N6-methyladenosine (m6A) for gene expression and function is becoming increasingly apparent. For example, 5mC has been linked to many human diseases, including neurological disorders30 and cancer31, and may offer significant potential as a diagnostic and prognostic indicator.

‘Methylation data can directly be obtained from the same WGS data set which makes time-consuming bisulfite conversion and specialized methylation assays (sequencing or hybridization-based) expendable’10.

The requirement for nucleic acid amplification in traditional short-read sequencing technology erases these base modifications, meaning they cannot be detected without additional time-consuming and often inefficient sample processing methods10,32. Nanopore sequencing does not require amplification or strand synthesis, allowing both the base and its modification to be detected in the same sequencing run (Figure 5). To date, researchers have utilised nanopore sequencing to detect a number of modified bases from both DNA and RNA, including pseudouridine27, N6-methyladenosine (m6A)25, 5-methylcytosine (5mC)25, and 7-methylguanosine (m7G)32.
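Per-read modification calls are commonly summarised as a per-site methylation frequency: the fraction of confidently called reads that carry the modification at each position. The Python sketch below shows one way to do this, assuming each call is a (chromosome, position, log-likelihood ratio) record in which positive values favour methylation, similar in spirit to the per-read output of tools such as nanopolish. The 2.0 cut-off for discarding ambiguous calls is an illustrative assumption, not a recommended setting.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Call = Tuple[str, int, float]  # assumed record: (chromosome, position, log-likelihood ratio)


def methylation_frequency(calls: Iterable[Call],
                          threshold: float = 2.0) -> Dict[Tuple[str, int], float]:
    """Per-site fraction of confidently called reads that are methylated.

    Calls with |LLR| below `threshold` are treated as ambiguous and skipped; sites with
    no confident calls are omitted from the result.
    """
    tallies: Dict[Tuple[str, int], List[int]] = defaultdict(lambda: [0, 0])  # [methylated, called]
    for chrom, pos, llr in calls:
        if abs(llr) < threshold:
            continue
        site = tallies[(chrom, pos)]
        site[1] += 1
        if llr > 0:
            site[0] += 1
    return {site: methylated / called for site, (methylated, called) in tallies.items()}


if __name__ == "__main__":
    calls = [("chr1", 100, 3.5), ("chr1", 100, -4.0), ("chr1", 100, 2.5),
             ("chr1", 100, 0.5), ("chr1", 200, -1.0)]
    freqs = methylation_frequency(calls)
    print(round(freqs[("chr1", 100)], 3))  # 0.667 (2 of 3 confident calls methylated)
```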

Figure 5 The Tombo data analysis package allows identification of a range of modified bases from raw nanopore signal.
[Figure 5 panels: raw reads, showing nanopore signal against consensus and reference sequences, with pseudouridine (Y) and 7-methylguanosine (m7G) sites marked; Tombo prediction, showing the proportion of reads modified along human SSU rRNA (X03205.1), with 2’-O-methyl A (Am), 2’-O-methyl G (Gm), 2’-O-methyl U (Um) and pseudouridine (Y) sites marked]

Phasing of variants

Determining the maternal or paternal inheritance of an allele can deliver insights into genome function, health, disease and evolution. It is also becoming an increasingly important avenue of research for the advancement of precision medicine33. However, the nature of short-read sequencing technologies makes it extremely challenging to unambiguously assign parental origin for variants separated over large genomic regions.

The long reads delivered by nanopore sequencing simplify the phasing of alleles. In one recent study, researchers were able to phase the entire 4 Mb human major histocompatibility complex (see Case study 1)1. In addition to the phasing of SNVs, nanopore technology also allows the phasing of SVs and base modifications, providing unprecedented depth of genomic characterisation.

Cost-effective, scalable and on-demand analysis in real-time

Oxford Nanopore provides a range of devices that offer cost-effective, fully-scalable and on-demand sequencing to suit all research requirements. Unlike traditional sequencing platforms that require large capital investments (>$50k–1M)34, significant infrastructure and calibration by trained engineers, the MinION™ Starter Pack from Oxford Nanopore costs just $1000 (including two flow cells and sequencing reagents) and is powered by the USB port on a laptop or the MinIT™ accessory (Figure 6). With current yields of up to 30 Gb per flow cell, the uniquely transportable and affordable MinION provides any researcher with access to the benefits of long-read, real-time sequencing technology.

‘We’ve gone from a situation where you can only do genome sequencing for a huge amount of money in well-equipped labs to one where we can have genome sequencing literally in your pocket just like a mobile phone’43.

GridION™ X5 and PromethION™ offer respectively 5 and 300 times the yield of the MinION, providing users with the facility to cost-effectively scale their research to meet the demands of routine whole genome sequencing (for example, the characterisation of cancer cell lines) or high-depth RNA sequencing of a large number of samples (Figures 7 and 8). Both devices are available with no capital expenditure and deliver a comparable cost-per-base to traditional sequencing platforms. In addition, the facility to use flow cells independently allows other projects to be run concurrently with no need for sample batching, delivering rapid access to results.

‘PromethION generates such a lot of data in such a consistent way that we can more easily access any genome’49.

Oxford Nanopore is also developing Flongle™, a flow cell adapter designed to provide even more cost-effective analysis of smaller, more frequently performed tests and experiments.

A range of dedicated, streamlined library preparation kits are available to suit all experimental requirements and include the facility for sample multiplexing.

Figure 6 MinION: a pocket-sized, portable device. Each flow cell can run up to 512 channels at a time.

Figure 7 GridION X5: 5 independent flow cells with integrated data processing.

Figure 8 PromethION: 48 independent flow cells, each of which can run up to 3,000 channels at a time. Integrated data processing for high-throughput sequencing.


2 Case studies


Case study 1 The most complete human genome ever assembled with a single technology

Although short-read sequencing technologies have improved considerably over the last decade in terms of yield and turnaround times, according to Jain and coworkers: ‘assembling human genomes with high accuracy and completeness remains challenging’1. At ~3.1 Gb, the human genome is not only large, it also contains regions of uneven nucleotide composition, high levels of repetitive content (up to 69%35) and large segmental duplications. As a result, most human genome assemblies are highly fragmented and contain gaps that limit both their structural integrity and subsequent biological interpretation. Furthermore, short sequencing reads prevent the assignment of alleles or variants to their original chromosome. Such ‘phasing’ information provides significantly more insight into gene expression and function and is of particular importance when studying genetic disease.

To assess the potential of long nanopore sequencing reads to overcome these issues and deliver more contiguous, complete genomes, a team comprising researchers from the UK, USA and Canada used the MinION to sequence the well-characterised human reference genome NA12878. The team deployed a standard kit-based DNA extraction method together with the Ligation Sequencing Kit to generate long sequence reads. In total, 91.2 Gb of data was generated, which is equivalent to 30x genome coverage.

Using the assembly tool Canu, a highly contiguous assembly was produced, comprising 2,886 contigs with an NG50* contig size of ~3 Mb. The superior genome contiguity offered by nanopore sequencing was exemplified by the inclusion of the highly repetitive (and thereby notoriously difficult to assemble) HLA class I region in a single contig.

To investigate the impact of increasing read length on assembly contiguity, the team further used a modified phenol:chloroform extraction technique together with the streamlined Rapid Sequencing Kit to generate ultra-long reads. Approximately 18 Gb of ultra-long read data was obtained (equivalent to 5x genome coverage), with the longest mapped read being 882 kb. More recently, researchers from the University of Nottingham have obtained a human ultra-long read in excess of 2 Mb, a new record for a single contiguous DNA sequence36.

At over 2 Mb, the longest sequencing read set a new record for a single contiguous DNA sequence.
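The coverage figures in this case study follow from dividing the total sequence yield by the genome size; for example, 91.2 Gb over a ~3.1 Gb genome gives roughly 29.4x, in line with the 30x quoted above. The Python sketch below performs that calculation and adds an idealised Lander–Waterman (Poisson) estimate of how much of the genome would be covered at a given depth. The Poisson model assumes reads are placed uniformly at random, which real data only approximate, so the second function is best treated as a rough planning aid.

```python
import math


def mean_depth(total_bases: float, genome_size: float) -> float:
    """Average sequencing depth: total sequenced bases divided by genome size."""
    return total_bases / genome_size


def expected_fraction_covered(depth: float, min_reads: int = 1) -> float:
    """Idealised Poisson estimate of the fraction of bases covered by at least `min_reads` reads."""
    below = sum(math.exp(-depth) * depth ** k / math.factorial(k) for k in range(min_reads))
    return 1.0 - below


if __name__ == "__main__":
    depth = mean_depth(91.2e9, 3.1e9)
    print(round(depth, 1))  # 29.4
```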

* The NG50 value represents the longest contig such that contigs of this length or greater sum to at least half of the haploid genome size.
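The NG50 definition in the footnote above, and the closely related N50 used in Case study 2, can both be computed by sorting contig (or read) lengths from longest to shortest and walking down the list until the running total reaches half of a reference size. For NG50 that reference is the genome size; for N50 it is the total length of the assembly or read set. A minimal Python sketch follows; the contig lengths in the usage lines are invented for illustration.

```python
from typing import Iterable


def length_at_half(lengths: Iterable[int], reference_total: float) -> int:
    """Length at which the running total (longest first) first reaches half of reference_total.

    Returns 0 if the sequences together never reach half of reference_total.
    """
    half = reference_total / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0


def n50(lengths: Iterable[int]) -> int:
    """N50: half of the total sequence length is in pieces at least this long."""
    lengths = list(lengths)
    return length_at_half(lengths, sum(lengths))


def ng50(lengths: Iterable[int], genome_size: float) -> int:
    """NG50: as N50, but measured against the (haploid) genome size."""
    return length_at_half(lengths, genome_size)


if __name__ == "__main__":
    contigs = [80, 70, 50, 40, 30, 20]  # hypothetical contig lengths (total 290)
    print(n50(contigs))        # 70: 80 + 70 = 150 >= 145
    print(ng50(contigs, 400))  # 50: 80 + 70 + 50 = 200 >= 200
```

Because NG50 uses a fixed genome size rather than the assembly's own total, it allows assemblies of different completeness to be compared on an equal footing.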

Long nanopore sequencing reads allowed phasing of the entire 4 Mb MHC region.

The additional ultra-long reads not only doubled the assembly contiguity (NG50 ~6.4 Mb) but also significantly improved the facility to phase alleles. For example, it was possible to phase the entire 4 Mb major histocompatibility complex (MHC) that was contained within a single 16 Mb contig (Figure 9). As stated by the researchers: ‘The increased single-molecule read length that we report here, obtained using a MinION nanopore sequencer, enabled us to analyse regions of the human genome that were previously intractable with state-of-the-art sequencing methods’1. This sentiment was further reflected through the utilisation of the nanopore data to close 12 large (>50 kb) gaps in the GRCh38 reference genome, which corresponded to 83,980 bp of previously unknown euchromatic sequence.

Unlike short-read technology, nanopore sequencing also allows the direct detection of DNA modifications alongside the nucleotide sequence. In this study, the levels of 5-methylcytosine (5mC) detected were highly concordant with results obtained using alternative methylation analysis techniques.

Data from this study is available at: github.com/nanopore-wgs-consortium/NA12878

Figure 9 Phasing of the entire 4 Mb MHC region. The long read lengths delivered by nanopore sequencing allowed the creation of haplotigs (contigs derived from the same chromosome) enabling phasing of genes and variants – providing potential new insights into gene expression. Blue box signifies the MHC class II region.
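Once heterozygous variants have been phased, as in the MHC haplotigs above, individual long reads can be assigned to a haplotype by checking which phased allele they carry at each site they cover; a long read that crosses many such sites gives a much clearer vote than a short read crossing one or none. The Python sketch below is a simplified illustration in the spirit of read haplotagging. The positions, alleles and `min_margin` setting are hypothetical, and reads are assumed to be already aligned with their alleles extracted.

```python
from typing import Dict, Optional, Tuple

PhasedSites = Dict[int, Tuple[str, str]]  # position -> (haplotype 1 allele, haplotype 2 allele)


def assign_haplotype(read_alleles: Dict[int, str], phased: PhasedSites,
                     min_margin: int = 1) -> Optional[int]:
    """Return 1 or 2 for the haplotype a read best matches, or None if it is ambiguous."""
    votes = {1: 0, 2: 0}
    for pos, base in read_alleles.items():
        if pos not in phased:
            continue  # site not phased: uninformative
        hap1_allele, hap2_allele = phased[pos]
        if base == hap1_allele:
            votes[1] += 1
        elif base == hap2_allele:
            votes[2] += 1
        # any other base (e.g. a sequencing error) casts no vote
    if votes[1] - votes[2] >= min_margin:
        return 1
    if votes[2] - votes[1] >= min_margin:
        return 2
    return None


if __name__ == "__main__":
    phased = {100: ("A", "G"), 5_000: ("C", "T"), 90_000: ("G", "A")}
    print(assign_haplotype({100: "A", 5_000: "C", 90_000: "G"}, phased))  # 1
    print(assign_haplotype({100: "A", 5_000: "T"}, phased))              # None (tied)
```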


Case study 2 Comprehensive and cost-effective characterisation of the Y chromosome

Mammalian Y chromosomes are often neglected from genomic analysis due to their inherent assembly difficulties caused by a high level of repetitive DNA and palindromes. To date, just a single reference-quality human Y chromosome, of European ancestry, is available, thereby increasing the potential for reference bias and overlooking the significant genomic variation present in other populations37.

While isolation of the Y chromosome using flow cytometry can simplify the assembly challenge (reducing the overlap of repetitive DNA with that on other chromosomes), traditional short-read sequencing technologies offer an imperfect solution due to their limited read lengths, amplification bias and removal of epigenetic modifications.

To overcome the limitations of short-read technology and address the lack of reference-quality Y chromosomes, researchers from Spain developed a novel strategy to sequence native, unamplified, flow-sorted DNA of African ancestry using the MinION. Approximately 9 million Y chromosomes were sorted from a lymphoblastoid cell line (HG02982), whose haplogroup (A0) represents one of the earliest known human lineages.

Nanopore sequencing of the flow-sorted chromosomes generated 2.3 Gb of data with a read N50* of ~18 kb. De novo assembly and sequence polishing were carried out using the Canu38 and nanopolish39 tools respectively, with further sequence polishing using short-read data performed using Pilon40. The final assembly totalled 21.5 Mb in length and comprised 35 contigs with a contig N50 of 1.46 Mb.

The researchers commented that this technique: ‘…constitutes a significant improvement over comparable previous methods, increasing continuity by more than 800%’ (Figure 10)37.

Comparing the assembly against the GRCh38 reference, the team were able to identify extensive genic copy number variation, with expansions in 5 of the 9 multi-copy genes, four of which are implicated in male infertility. They also identified 347 structural variants of over 50 bp in size between the two assemblies.

Figure 10 Assembly contiguity comparison between the human HG02982 and gorilla Y chromosomes. The size of each rectangle corresponds to the size of a contig within each assembly. The HG02982 assembly, which combined long nanopore reads with short-read sequencing, displayed significantly higher contiguity than the recently published gorilla assembly, which utilised an alternative ‘long’-read sequencing technology combined with short-read sequencing. Both sequencing data sets were derived from flow-sorted Y chromosomes.
[Figure 10 panels: HG02982_chrY; Gorilla_chrY]

* The N50 value represents the fragment length where half of the data are contained in fragments of this length and greater.

The team were also able to detect the epigenetic modification 5-methylcytosine alongside the nucleotide sequence, demonstrating good correlation with data obtained using whole genome bisulfite sequencing.

This study highlights how long nanopore sequencing reads can be used to deliver new insights into complex genomic regions which have previously proven challenging to analyse using traditional sequencing technology. Commenting on this research, the team suggest that: ‘Given the current developments in sequencing throughput, a single MinION flowcell should now be sufficient to assemble a whole human Y chromosome. Furthermore, it is becoming clear that the upper read length boundary is only delimited by the integrity of the DNA, suggesting the possibility that complete Y chromosome assemblies, including full resolution of amplicons, might be possible in the near future’37.

Expanding on this possibility, recent research led by Dr. Karen Miga at the University of California, Santa Cruz demonstrated the use of nanopore technology to deliver the first complete and accurate sequence of a human centromere41. Human centromeres are composed of long tracts of near-identical tandem repeats, making them intractable to assembly using short-read sequencing technology. Using the long reads generated by nanopore technology, the team sequenced eight BAC clones that together spanned the Y chromosome centromere. In total, the team generated over 3,500 reads that were greater than 150 kb in length. Consensus sequence polishing was performed using the BLASR tool, and variants were validated against short-read sequencing data. These informative markers, together with structural variants, allowed alignment of the BAC consensus sequences, revealing the centromere to be 365 kb in length (Figure 11). According to the researchers, their assembly: ‘enables the precise number of repeats in an array to be robustly measured and resolves the order, orientation, and density of both repeat-length variants across the full extent of the array. This work could potentially advance studies of centromere evolution and function and may aid ongoing efforts to complete the human genome’41. The team are now optimising the methodology to sequence centromeres directly from whole genomic DNA without the requirement for BACs.

Figure 11 Assembly of the human Y chromosome centromere. Eight BAC clones covering the entire centromere were ordered using sequence variants. The centromere is dominated by 5.8 kb higher order repeats (HOR) (light blue boxes) interspersed by HOR variants (purple boxes). Highly divergent monomeric alpha satellite is indicated in dark blue. Figure adapted from Jain et al.41


Case study 3 Mapping and phasing structural variation

Accounting for a far greater number of variable bases than single nucleotide variations (SNVs), structural variation (SV) is an important class of genetic variation that has been implicated in a wide range of genetic disorders. To address the limitations of short-read sequencing technology to accurately and cost-effectively characterise SV, an international research team led by Dr. Wigard Kloosterman of the University Medical Centre Utrecht assessed the performance of long-read sequencing delivered by nanopore technology3. The team performed whole genome sequencing of two DNA samples using both the MinION and a short-read sequencing technology. The samples were obtained from individuals with congenital disease resulting from complex chromothripsis, which is characterised by dozens of locally clustered genomic rearrangements affecting one or a few chromosome(s) in a cell.

In one sample, nanopore sequencing at 16x depth allowed the detection of all of the previously validated de novo chromothripsis breakpoint junctions. For the second sample, which was sequenced at 11x depth, the nanopore data allowed 24 of 29 previously validated breakpoints to be detected; however, further investigation revealed that two of the 5 undetected breakpoints represented a complex combination of joined segments which had been incorrectly assigned in the long-insert mate pair validated data set, further highlighting the benefits of long sequencing reads (Figure 12). Detection of the remaining breakpoint junctions was hampered by insufficient depth of coverage. In total, in the second sample, nanopore sequencing at 11x depth allowed the detection of 29 of 31 (91%) breakpoint junctions, which compared favourably to the 22 (69%) detected using short-read sequencing at 30x coverage. Furthermore, four validated breakpoint junctions were only detected using nanopore sequencing and were not found in either the long-insert mate pair or short-read data set. By subsampling their data, the team were able to identify 14x depth of coverage as the minimum required to detect all breakpoint junctions using nanopore sequencing.
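The subsampling experiment described above can be approximated analytically. If a breakpoint junction is supported by n reads at full depth, and each read is retained independently with probability equal to the ratio of target to full depth, the number of surviving supporting reads follows a binomial distribution. The Python sketch below uses that model to estimate how many junctions would remain detectable at a lower depth. The independence assumption, the minimum-support threshold and any support counts passed in are illustrative assumptions, not the settings of NanoSV or of the study itself.

```python
from math import comb
from typing import Iterable


def detection_probability(support_reads: int, keep_fraction: float, min_support: int) -> float:
    """P(at least `min_support` of `support_reads` survive when each read is kept with prob. keep_fraction)."""
    if not 0.0 <= keep_fraction <= 1.0:
        raise ValueError("keep_fraction must lie between 0 and 1")
    return sum(
        comb(support_reads, k) * keep_fraction ** k * (1 - keep_fraction) ** (support_reads - k)
        for k in range(min_support, support_reads + 1)
    )


def expected_detected(supports: Iterable[int], full_depth: float, target_depth: float,
                      min_support: int = 2) -> float:
    """Expected number of junctions still detectable after subsampling to `target_depth`."""
    keep = target_depth / full_depth
    return sum(detection_probability(n, keep, min_support) for n in supports)


if __name__ == "__main__":
    print(detection_probability(4, 0.5, 2))  # 0.6875
```

Running `expected_detected` over a range of target depths gives a curve that can be compared against an empirical subsampling result such as the 14x minimum reported above.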


Figure 12 Nanopore sequencing accurately detects more chromothripsis breakpoints than alternative sequencing approaches. a) Circos plot of all breakpoint junctions in a complex chromothripsis sample. b) Comparison of different sequencing approaches to genotype breakpoint junctions. SVs were detected in short-read and nanopore data using the Delly and NanoSV tools respectively. Figure adapted from Cretu Stancu et al.3
[Figure 12 panels: a) Circos plot (chr1, chr5, chr9) with deletion, duplication, inversion and interchromosomal junctions; b) junctions detected or not detected by short-read, nanopore and long-insert mate-pair data]

Phasing of all chromothripsis breakpoints demonstrated paternal origin.

An important benefit of long nanopore sequencing reads is the facility for phasing. It had previously been hypothesised that germline chromothripsis originates from paternal chromosomes; however, this was based on only a few breakpoint junction sequences or deletions. Using nanopore sequencing, the team were able to phase all of the chromothripsis breakpoints detected, identifying their paternal origin and thereby providing further weight to the earlier hypothesis.

In the course of this research, the performance of a number of long-read SV calling tools was assessed, with the team demonstrating that their in-house developed tool, NanoSV, provided superior performance over a range of experimental parameters.

Summarising their research, the team suggest that their work: 'demonstrates the potential of long-read, portable sequencing technology for human genomics research and clinical applications'.*

* Nanopore devices are currently for research use only.
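A simplified sketch of how breakpoint-spanning reads can be assigned to a parental haplotype is shown below. It assumes, as an illustration rather than a description of the study's pipeline, that parental genotypes are available, so that the paternal and maternal allele at each informative heterozygous SNP is known; each read then votes for a parent according to the alleles it carries. The input format is hypothetical.

```python
from collections import Counter


def parental_origin(read_alleles, paternal, maternal):
    """Assign a breakpoint junction to a parental haplotype by majority vote.

    read_alleles: one dict per junction-spanning read, mapping a heterozygous
        SNP position to the base observed in that read (hypothetical input).
    paternal / maternal: dicts mapping SNP position to the allele inherited
        from each parent; assumes parental genotypes are available and that
        only informative sites (paternal allele != maternal allele) are given.
    """
    votes = Counter()
    for alleles in read_alleles:
        for pos, base in alleles.items():
            if base == paternal.get(pos):
                votes["paternal"] += 1
            elif base == maternal.get(pos):
                votes["maternal"] += 1
            # Bases matching neither parent (e.g. sequencing errors) are ignored.
    if votes["paternal"] == votes["maternal"]:
        return "undetermined"
    return "paternal" if votes["paternal"] > votes["maternal"] else "maternal"


paternal = {100: "A", 250: "G"}
maternal = {100: "C", 250: "T"}
reads = [{100: "A", 250: "G"}, {100: "A"}, {250: "T"}]
print(parental_origin(reads, paternal, maternal))
```

Long reads matter here because a single read must reach from the breakpoint to at least one informative SNP; with short reads, many junctions have no nearby heterozygous site to vote with.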


Case study 4 Analysing SNVs, SVs and phasing using targeted nanopore sequencing

Gaucher disease (GD), the most common lysosomal storage disorder, is caused by homozygous or biallelic mutations in the GBA gene. Heterozygous mutations in this gene are also a significant risk factor for Parkinson's disease and other disorders42. The complex structure of the genomic region incorporating GBA, which includes multiple pseudogenes (with up to 96% homology), complicates analysis using PCR and traditional short-read DNA sequencing techniques. Long sequencing reads offer an alternative, more streamlined solution for complete analysis of the GBA gene. To assess the validity of this approach, a team of researchers from the UK and USA utilised the MinION in combination with long-range PCR to amplify and sequence the entire ~8 kb gene18.

Long nanopore reads allowed more accurate characterisation of a 55 bp exonic deletion.

The team amplified the GBA gene from brain or saliva samples taken from different individuals with GD. All 10 samples were multiplexed and run on a single MinION flow cell, delivering 150-500x coverage. Reads were aligned to a human reference genome (hg19) using both the GraphMap and NGMLR tools prior to single nucleotide variant (SNV) calling using nanopolish. All previously characterised coding missense mutations were correctly identified. In addition, a number of non-coding SNVs were detected, and the rare false positives could be easily identified and excluded. The team found that the NGMLR alignment tool provided optimal results and recommended a coverage of >300x for accurate determination of zygosity. Using the Sniffles and nanopolish tools for structural variant analysis, the team detected a single 55 bp exonic deletion in one of the samples.

The long reads provided by nanopore sequencing allowed more accurate characterisation of this SV than was possible using previous short-read based methodology, indicating a different site of recombination than previously thought. A further advantage of the long nanopore reads was the facility to phase the variants, which helps overcome the requirement to analyse relatives in clinical research samples. Using the WhatsHap tool, the team were able to confirm compound heterozygosity in all relevant samples.
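The relationship between depth and zygosity calls can be sketched as follows. The variant allele fraction (VAF) cut-offs are hypothetical, and the binomial model ignores sequencing error and PCR allele bias, so this only illustrates why low coverage makes heterozygous calls unreliable; it does not reproduce the >300x recommendation.

```python
from math import comb


def classify_zygosity(ref_count, alt_count, het_range=(0.2, 0.8)):
    """Call zygosity from allele counts using hypothetical VAF cut-offs."""
    depth = ref_count + alt_count
    if depth == 0:
        return "no coverage"
    vaf = alt_count / depth  # variant allele fraction
    if vaf < het_range[0]:
        return "homozygous reference"
    if vaf > het_range[1]:
        return "homozygous alternate"
    return "heterozygous"


def prob_het_miscalled(depth, het_range=(0.2, 0.8), p=0.5):
    """Chance that a true heterozygote falls outside het_range by sampling alone.

    Models alt-read counts as Binomial(depth, p); ignores sequencing error and
    PCR allele bias, which in practice make deeper coverage even more useful.
    """
    lo, hi = het_range
    return sum(
        comb(depth, k) * p**k * (1 - p) ** (depth - k)
        for k in range(depth + 1)
        if k / depth < lo or k / depth > hi
    )


for depth in (10, 30, 300):
    print(depth, prob_het_miscalled(depth))
```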


Commenting on this research, the team state: 'The rapid evolution of specific bioinformatic methods, and the improvements in accuracy and data yield, combined with the minimal footprint and capital investment, make the MinION a suitable platform for long-read sequencing of difficult genes such as GBA, both in the diagnostic and research environments'18.*

The team now plan to amplify the whole genomic region incorporating the GBA gene and pseudogene (located 20 kb downstream) to fully characterise structural variation across the entire region of interest.

Data analysis

Figure 13 Data analysis workflow¹. Only downstream analysis tools recommended by the authors are presented; however, other tools were assessed. More information can be found in the full publication.

¹ The analysis workflow below has been updated from that recommended by Leija-Salazar et al.18 to take advantage of the improved basecalling and functionality offered by the Albacore basecaller.

BASECALLING & DEMULTIPLEXING: Raw data. Reads base called and demultiplexed (Albacore now recommended platform). FASTQ file output. Only 'pass' reads further analysed.

MAP TO REFERENCE: Reads mapped to reference genome (hg19) using NGMLR. Coverage calculated using bedtools.

VARIANT CALLING: SNVs detected using Nanopolish and structural variants detected using Sniffles.

PHASING: True variants phased using WhatsHap.
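WhatsHap solves a more general read-based phasing problem across many variants; the sketch below reduces the idea to two heterozygous variants covered by the same reads, which is enough to show how long amplicon reads distinguish compound heterozygosity (variants in trans) from two variants on the same allele (in cis). The input format is hypothetical and the simple vote is an illustrative stand-in, not WhatsHap's algorithm.

```python
def cis_or_trans(reads):
    """Decide whether two heterozygous variants sit on the same or opposite alleles.

    reads: list of (carries_variant_1, carries_variant_2) booleans, one pair per
        read spanning both positions. Reads with both or neither variant support
        'cis'; reads with exactly one support 'trans'.
    """
    cis = sum(1 for v1, v2 in reads if v1 == v2)
    trans = sum(1 for v1, v2 in reads if v1 != v2)
    if cis == trans:
        return "ambiguous"
    return "cis" if cis > trans else "trans (compound heterozygous)"


# Hypothetical reads over an ~8 kb amplicon carrying two variants.
reads = [(True, False)] * 40 + [(False, True)] * 38 + [(True, True)] * 2
print(cis_or_trans(reads))
```

Because each amplicon read spans the whole gene, both variant positions are observed on the same molecule, which is what makes this comparison possible without sequencing relatives.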

References

1. Leija-Salazar, M. et al. (2018) Detection of GBA missense mutations and other variants using the Oxford Nanopore MinION. bioRxiv 288068.
2. Nacheva, E. et al. (2017) DNA isolation protocol effects on nuclear DNA analysis by microarrays, droplet digital PCR, and whole genome sequencing, and on mitochondrial DNA copy number estimation. PLoS One 12:e0180467.
3. Sedlazeck, F.J. et al. (2017) Accurate detection of complex structural variations using single molecule sequencing. bioRxiv 169557.
4. Quinlan, A.R. and Hall, I.M. (2010) BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26(6):841-2.
5. Loman, N.J., Quick, J. and Simpson, J.T. (2015) A complete bacterial genome assembled de novo using only nanopore sequencing data. Nature Methods 12(8):733-735.
6. Martin, M. et al. (2016) WhatsHap: fast and accurate read-based phasing. bioRxiv 85050.

Find out more about real-time, long-read amplicon sequencing at www.nanoporetech.com.

* Nanopore devices are currently for research use only.

Oxford Nanopore Technologies, the Wheel icon and MinION are registered trademarks of Oxford Nanopore Technologies in various countries. All other brands and names contained are the property of their respective owners. © 2018 Oxford Nanopore Technologies. All rights reserved. The MinION is for research use only.

Case study 5 Targeted, amplification-free DNA sequencing using CRISPR/Cas9

Traditional amplification-based targeted sequencing approaches can be limited by a number of factors, including base composition (e.g. GC-rich content) and bias (e.g. allele bias). In addition, while long-range PCR approaches can, with careful optimisation, generate fragments in the region of 20 kb, such fragment sizes may not cover all genes or regions of interest and do not fully exploit the ultra-long read length capabilities provided by nanopore sequencing. Furthermore, the process of amplification removes all information on base modifications, thereby losing a potentially informative source of variation. In order to address these challenges, researchers are now investigating the potential of CRISPR/Cas9 techniques to enrich for specific regions of interest.

At Johns Hopkins University, USA, Timothy Gilpatrick is researching the methylation profiles of known cancer driver genes44. In order to cost-effectively target specific regions of the genome whilst maintaining epigenetic marks, Timothy employed a CRISPR/Cas9 enrichment methodology together with nanopore sequencing. Following a successful pilot study in E. coli, where 20,000-fold coverage of a 5 kb targeted fragment was observed, Timothy moved on to human DNA, where he targeted the hTERT gene, which encodes a core protein component of the telomerase complex. Telomerase, which acts to maintain the telomeric sequence at the ends of chromosomes, is inactive in most somatic cells; however, in cancer cells, telomerase activity is commonly turned on. This telomerase activity may be associated with methylation of the hTERT gene promoter.

The repetitive nature and high GC content of the hTERT gene region make it difficult to analyse using conventional PCR amplicons. Using the CRISPR/Cas9 approach on a thyroid cancer cell line, Timothy was able to demonstrate a 50-fold increase in coverage of the targeted 2.4 kb region. Subsequent analysis of the nanopore sequencing data allowed identification of the epigenetic modification 5-methylcytosine with high concordance to alternative bisulfite sequencing approaches. A further benefit of the long nanopore reads is the facility for phasing and the identification of methylation patterns across both parental alleles.
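Fold enrichment is commonly expressed as the on-target depth divided by the depth that would be expected if the same sequencing output were spread evenly across the genome. The source does not state how the enrichment figures above were calculated, so the sketch below shows one such definition; the numbers in the usage line are hypothetical, not the study's data.

```python
def fold_enrichment(on_target_bases, target_length, total_bases, genome_length):
    """Observed on-target depth divided by the depth expected without enrichment.

    Assumes, for illustration, that an unenriched run would spread its bases
    uniformly over the genome.
    """
    on_target_depth = on_target_bases / target_length
    expected_depth = total_bases / genome_length
    return on_target_depth / expected_depth


# Hypothetical run: 100x over a 2.4 kb target, 6 Gb sequenced against a 3 Gb genome.
print(fold_enrichment(240_000, 2_400, 6_000_000_000, 3_000_000_000))
```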


Figure 14 Distal, allele-specific DNA methylation patterns in the TERT locus of a BCPAP thyroid cancer cell line identified using nanopore sequencing. Black lines indicate points of CpG methylation. The alleles (reference allele and alternate allele) are distinguishable by a point mutation (green line). Panel labels: TERT; 150 bp scale; mutation.
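The allele-specific methylation analysis in Figure 14 relies on a single long read covering both the distinguishing point mutation and the surrounding CpG sites. A minimal sketch, assuming per-read CpG methylation calls are already available from a methylation caller and stored in a hypothetical dictionary format, groups reads by the base at the mutation and computes per-allele methylation frequencies.

```python
from collections import defaultdict


def allele_methylation(reads):
    """Per-allele methylation frequency at each CpG position.

    reads: list of dicts (hypothetical structure) with
        'snp'  -> base at the distinguishing point mutation, or absent if the
                  read does not cover it;
        'cpgs' -> {cpg_position: True if called methylated, else False}.
    Returns {allele: {cpg_position: fraction of reads methylated}}.
    """
    tallies = defaultdict(lambda: defaultdict(lambda: [0, 0]))  # [methylated, total]
    for read in reads:
        allele = read.get("snp")
        if allele is None:
            continue  # cannot assign this read to an allele
        for pos, methylated in read["cpgs"].items():
            tally = tallies[allele][pos]
            tally[0] += int(methylated)
            tally[1] += 1
    return {
        allele: {pos: m / n for pos, (m, n) in sites.items()}
        for allele, sites in tallies.items()
    }


reads = [
    {"snp": "G", "cpgs": {10: True, 20: False}},
    {"snp": "G", "cpgs": {10: True}},
    {"snp": "A", "cpgs": {10: False, 20: False}},
    {"cpgs": {10: True}},  # does not reach the mutation, so it is skipped
]
print(allele_methylation(reads))
```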

The team at Johns Hopkins are also examining the potential of CRISPR/Cas9 enrichment to analyse the methylation status of a panel of gene promoters which, they believe, may lead to diagnostic, prognostic and therapeutic applications.

At Tel Aviv University, Israel, researchers are applying Cas9-assisted targeting of chromosome segments (CATCH) to characterise the entire BRCA1 gene45. Mutations in BRCA1 are associated with significantly increased risk of breast, ovarian and other cancers. Due to the large size of this gene (~80 kb), current sequencing methodologies focus on the coding sequence, neglecting potentially important intronic and regulatory regions. In addition, the BRCA1 genomic region is highly repetitive (50%), which contributes to genetic instability and genomic rearrangements, and can be difficult to analyse using short-read sequencing.

A 200 kb region containing the entire 80 kb BRCA1 gene was enriched.

In brief, the CATCH methodology uses Cas9 to excise the targeted region, which is then separated using pulsed field gel electrophoresis, prior to amplification and sequencing. Cas9 allows enrichment of a target region without prior knowledge of its sequence; only the sequence of the flanking regions needs to be known. Applying this technique, the team enriched a 200 kb region containing the entire 80 kb BRCA1 gene, regulatory elements and non-coding regions. The target was enriched 237-fold and sequenced at up to 70x coverage on a single MinION flow cell. The data revealed a deletion of three di-nucleotide blocks in a 44 bp repeat array which was not detected using short-read sequencing. In addition, the read lengths provided by nanopore sequencing were sufficient for analysis of structural variation. The team commented that this approach: '…may shed light on the mechanisms of disease onset and progression'45. They are now investigating the potential for direct sequencing without the requirement for amplification, in order to retain and study epigenetic marks.
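CATCH depends on designing Cas9 guides against sequence flanking the target. For the commonly used Streptococcus pyogenes Cas9, a target site is a 20 nt protospacer immediately followed by an NGG PAM, so candidate cut sites in a flanking sequence can be listed with a simple scan. The sketch below is only a starting point under that assumption: real guide design also weighs off-target matches and cutting efficiency, and the source does not describe the guides used.

```python
import re


def find_cas9_sites(seq, protospacer_len=20):
    """Candidate SpCas9 target sites on the forward strand of seq.

    A site is a protospacer of protospacer_len nt immediately followed by an
    NGG PAM. Returns (start, protospacer, pam) tuples with 0-based starts.
    Only the forward strand is scanned; a full search would also scan the
    reverse complement.
    """
    seq = seq.upper()
    # The lookahead lets overlapping candidate sites all be reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


# Hypothetical flanking sequence.
flank = "ttgacctgaagtcatgcaatcgaTGGccatgcaagtcaggaacgtAGGt"
for start, protospacer, pam in find_cas9_sites(flank):
    print(start, protospacer, pam)
```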


Case study 6 Native RNA sequencing of human polyadenylated transcripts

According to Dr. Angela Brooks of the University of California, Santa Cruz: 'Short-read sequencing has revolutionised our understanding of the transcriptome […] but there is a huge limitation'28. These limitations include the loss of positional information and base modifications through the requirement to fragment and amplify the RNA molecules, respectively. In addition, amplification leads to bias, which can negatively impact results. Angela and her colleagues at the University of California, Santa Cruz are part of the Nanopore RNA Consortium, which comprises laboratories from six leading universities. The aim of the Consortium is to generate a reference dataset for the human transcriptome that has been sequenced in its native form, sharing methods and data with the scientific community.

The Nanopore RNA Consortium provides methods and data to the scientific community.

The Consortium sequenced mRNA from the GM12878 cell line using both native nanopore RNA sequencing and cDNA sequencing, generating ~13 million and ~24 million reads respectively28.

Initial analysis showed the median native RNA read length to be longer than the median cDNA read length, which may be due to PCR bias in the cDNA preparation. Good correlation of gene expression levels was observed between the two techniques and also with data obtained from short-read sequencing of the same cell line, confirming the validity of the nanopore data set. The reproducibility of the native RNA sequencing technique was also demonstrated through the delivery of highly concordant data across all Consortium laboratories.

The team are now using orthogonal data to build a high-confidence set of full-length isoforms. The longest isoform detected by Angela and the Consortium was for Sorl1, a gene implicated in Alzheimer's disease, with a >10 kb read spanning 48 exons.

The Consortium also demonstrated the potential of long-read RNA sequencing to detect allele-specific expression. Examining the coverage data for a number of nucleotide positions across the Xist gene, which is located on the X chromosome, allowed the identification of paternal expression bias (Figure 15).

Another area of interest for the Consortium is the detection of poly-A tail length, which has been shown to play a role in post-transcriptional regulation. As the nanopore sequencing adapter sits at the 3' end of the poly-A tail, it is possible to use specific signals in the raw data, such as dwell times, to estimate the poly-A tail length.
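The dwell-time idea can be sketched in a few lines. Assuming, for illustration, that the poly-A tail passes through the pore at the same average rate as the rest of the read, its length is roughly the number of raw-signal samples in the tail segment divided by the samples per nucleotide observed over the read body; a least-squares line fitted against spike-in controls of known length can then correct systematic bias. Segmenting the tail from the raw signal, the hard part in practice, is not shown, and this is not the Consortium's implementation.

```python
def samples_per_nt(body_samples, body_length_nt):
    """Average raw-signal samples per nucleotide, estimated from the read body."""
    return body_samples / body_length_nt


def tail_length_nt(tail_samples, rate):
    """Estimate poly-A length from how long the tail segment dwells in the pore.

    Assumes the tail translocates at the same average rate as the read body.
    """
    return tail_samples / rate


def fit_calibration(estimated, known):
    """Least-squares line (slope, intercept) mapping estimated onto known lengths."""
    n = len(estimated)
    mean_x = sum(estimated) / n
    mean_y = sum(known) / n
    sxx = sum((x - mean_x) ** 2 for x in estimated)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(estimated, known))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical read: 60,000 body samples over 2,000 nt, 3,000 tail samples.
rate = samples_per_nt(60_000, 2_000)
print(tail_length_nt(3_000, rate))
# Hypothetical spike-ins: estimated versus known tail lengths.
print(fit_calibration([28, 55, 104], [30, 60, 110]))
```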


Figure 15 Nanopore sequencing allows identification of allele-specific paternal expression. Panel: coverage (0 to 350) and reads at three Xist variant positions: 2487834 (paternal C, maternal T), 2487835 (paternal T, maternal C) and 2487837 (paternal A, maternal T). Figure courtesy of the Nanopore RNA Consortium.
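Allele-specific expression at a heterozygous site such as those in Figure 15 can be assessed by asking whether the split between paternal and maternal reads departs from the 50:50 expected for balanced expression. A minimal sketch using an exact two-sided binomial test is below; the choice of test is an assumption, as the source does not describe the Consortium's statistics, and the read counts are hypothetical.

```python
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial test: total probability of outcomes no more
    likely than the observed count k out of n."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # A small relative tolerance treats floating-point near-ties as ties.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))


def allele_specific_expression(paternal_reads, maternal_reads):
    """Paternal read fraction and p-value against balanced (50:50) expression."""
    n = paternal_reads + maternal_reads
    return paternal_reads / n, binomial_two_sided_p(paternal_reads, n)


# Hypothetical read counts at one heterozygous site.
print(allele_specific_expression(paternal_reads=90, maternal_reads=10))
```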

The accuracy of this poly-A tail length estimation was confirmed using spike-in controls with known tail lengths.

Direct RNA sequencing also allows the analysis of base modifications, which are lost when using alternative sequencing approaches. By synthesising and sequencing RNA transcripts containing only a specific, known modification and comparing these with sequences from molecules without this modification, the team were able to show clear shifts in the nanopore signal. The team are now using these model training datasets to enhance the basecalling algorithms, allowing detection of both the position and type of modification in native RNA molecules.

The Nanopore RNA Consortium data is available at: github.com/nanopore-wgs-consortium/NA12878


Case study 7 High-throughput analysis of single cell transcriptomes

Traditionally, gene expression experiments are undertaken on samples containing millions of cells. While this allows the identification of differentially expressed genes and transcripts in distinct cell populations, many subtle differences between cells in the same sample can be overlooked. Recent advances in high-throughput cell separation and transcriptomic analysis techniques have spawned the burgeoning field of single-cell transcriptomics, which allows much more detailed analysis of gene expression.

The R2C2 technique delivers highly accurate, full-length transcripts.

At the University of California, Santa Cruz, Dr. Christopher Vollmers and his team are using single-cell transcriptomics to investigate gene expression in B cells. As Dr. Vollmers points out: 'Each B cell makes a unique antibody transcript, so you really have to go at the single cell level to understand what B cells do'46.

Obtaining full-length transcripts is a significant challenge when using short-read sequencing technology, so the team developed a novel amplification strategy which, when combined with the long reads delivered by nanopore sequencing, provided highly accurate, full-length reads. This Rolling Circle to Concatemeric Consensus (R2C2) method allows the generation of a consensus sequence for each transcript, thereby increasing base accuracy (Figure 16).

Utilising this technique allowed transcriptome analysis of 96 individual B cells, delivering over 400,000 full-length cDNA reads with a median base accuracy of 94%47. Using an updated version of their Mandalorion data analysis pipeline, the team could use these reads to identify high-confidence RNA transcript isoforms. A key finding of their study was that many of the B cells analysed, which were obtained from a healthy individual, express isoforms of the CD19 gene that lack the epitope targeted by CAR T-cell therapy, a discovery which may have significant implications for cancer treatment. This finding would not have been possible using short-read sequencing or without single-cell analysis.

According to the team: 'The R2C2 method generates a larger number of accurate reads of full-length RNA transcript isoforms than any other available long-read sequencing method'46. They further comment that they: '...believe that R2C2 has the potential to replace short-read RNA-seq and its shotgun approach to transcriptome analysis entirely, especially considering the […] wide release of the high-throughput PromethION sequencer'47.
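The core of the R2C2 idea, that several noisy copies of one molecule yield a more accurate consensus, can be shown with a toy example. The sketch splits a concatemeric read at exact copies of a hypothetical splint sequence and takes a column-wise majority vote. This is a deliberate simplification: real reads contain insertions and deletions, so the published pipeline finds repeats by alignment and builds the consensus with poaV2 and racon (Figure 16).

```python
from collections import Counter


def split_subreads(raw_read, splint):
    """Split a concatemeric read into subreads at exact copies of the splint.

    Exact matching is a simplification; real reads need alignment-based
    repeat finding because of sequencing errors.
    """
    return [part for part in raw_read.split(splint) if part]


def majority_consensus(subreads):
    """Column-wise majority vote; assumes subreads are already aligned
    (equal length, substitution errors only)."""
    length = min(len(s) for s in subreads)
    return "".join(
        Counter(s[i] for s in subreads).most_common(1)[0][0] for i in range(length)
    )


splint = "GGGGCCCC"  # hypothetical splint sequence
# Four copies of one insert, two carrying a different single-base error.
raw = splint.join(["ACGTACGT", "ACGAACGT", "ACGTACTT", "ACGTACGT"])
subreads = split_subreads(raw, splint)
print(subreads)
print(majority_consensus(subreads))
```

Each error appears in only one copy, so the vote recovers the original insert; this is why accuracy rises with the number of passes around the circle.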


Figure 16 Schematic of the R2C2 method. Following cDNA circularisation, rolling circle amplification creates multiple joined copies of the transcript. After sequencing, each read is split into its constituent subreads, which are then aligned to generate an accurate consensus sequence. Workflow steps shown: cDNA molecule and DNA splint joined by Gibson assembly; rolling circle amplification (RCA); rapid library prep; nanopore sequencing; inaccurate raw read; SW repeat finder; subread alignment (poaV2); consensus calling (racon); accurate full-length cDNA sequence. Figure courtesy of Dr. Christopher Vollmers, University of California, Santa Cruz.


Summary

Our knowledge of the human genome, its genes and their function has advanced considerably since the publication of the first human genome sequence in 2003. These advances have been supported by the rapid development of genomic analysis technologies, allowing faster, more detailed and more affordable genetic analyses. However, the inherent challenges of traditional short-read sequencing technologies limit their ability to fully characterise the whole spectrum of genetic variation.

The long, direct and real-time sequencing reads offered by nanopore technology deliver a step-change in human genetics research, allowing routine and complete characterisation of highly important genomic events such as structural variation, repetitive regions, phasing, RNA isoforms and base modifications. Using nanopore sequencing, researchers are now unlocking the secrets of the genome, from characterising complete centromeres to discovering new, highly expressed transcript isoforms. As stated by Dr. Winston Timp at Johns Hopkins University, USA, 'Nanopore sequencing is a tool that can let us look at things that we couldn't otherwise see'48.


About Oxford Nanopore Technologies

Oxford Nanopore Technologies introduced the world's first nanopore DNA sequencer, the MinION (a portable, real-time, long-read, low-cost device), followed by the larger GridION X5 and PromethION, and the smaller Flongle. The long reads offered by nanopore sequencing deliver a complete understanding of human genetic variation, allowing enhanced characterisation of structural variation, repetitive regions, haplotype phasing, RNA splice variants, isoforms and fusion transcripts.

A range of platforms are available to meet the coverage and throughput requirements for all sequencing applications (Table X).

Read length (all platforms): fragment length = read length. Longest read now >2 Mb.

| | Flongle | MinION | GridION X5 | PromethION (1 flow cell) | PromethION (48 flow cells) |
|---|---|---|---|---|---|
| Run time¹ | 1 min - 16 hrs | 1 min - 48 hrs | 1 min - 48 hrs | 1 min - 64 hrs | 1 min - 64 hrs |
| Theoretical maximum 1D yield | Up to 3.3 Gb | Up to 40 Gb | Up to 200 Gb | Up to 315 Gb | Up to 15 Tb |
| Current 1D yield range | Early access to start ASAP; commercial target 1 Gb | Up to 30 Gb | Up to 150 Gb | Up to 150 Gb (Rev D Chip) | (Rev D Chip) |
| Multiplexing enabled | Not in first release | Kits for 96 samples | Kits for 96 samples | Coming soon | Coming soon |
| Number of channels available for sequencing | Up to 126 | Up to 512 | Up to 2,560 | Up to 3,000 | Up to 144,000 |
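The theoretical maximum yields in the table appear consistent with a simple product: channels × bases per second per channel × run time. The sketch below assumes a translocation speed of 450 bases per second per channel, a value not stated in the table, and that every channel sequences for the full maximum run time.

```python
BASES_PER_SECOND = 450  # assumed per-channel speed; not stated in the table


def theoretical_yield_bases(channels, run_hours, bases_per_second=BASES_PER_SECOND):
    """Upper bound: every channel sequences continuously for the whole run."""
    return channels * bases_per_second * run_hours * 3600


# (channels, maximum run time in hours) taken from the table above.
PLATFORMS = {
    "Flongle": (126, 16),
    "MinION": (512, 48),
    "GridION X5": (2_560, 48),
    "PromethION (1 flow cell)": (3_000, 64),
    "PromethION (48 flow cells)": (144_000, 64),
}

for name, (channels, hours) in PLATFORMS.items():
    gb = theoretical_yield_bases(channels, hours) / 1e9
    print(f"{name}: {gb:,.1f} Gb")
```

Under these assumptions the script gives about 3.3, 39.8, 199.1 and 311.0 Gb for the Flongle, MinION, GridION X5 and a single PromethION flow cell, and roughly 14.9 Tb for 48 PromethION flow cells. These are close to the table's figures; the small differences suggest the published values were rounded or used slightly different inputs.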

For the latest information about applying long-read nanopore sequencing to your human genetics research, visit www.nanoporetech.com/applications.


References

1. Jain, M. et al. Nanopore sequencing and assembly of a human genome with ultra-long reads. Nat Biotechnol. 36(4):338-345 (2018).
2. Conrad, D. F. et al. Origins and functional impact of copy number variation in the human genome. Nature 464, 704–712 (2010).
3. Cretu Stancu, M. et al. Mapping and phasing of structural variation in patient genomes using nanopore sequencing. Nat Commun. 8(1):1326 (2017).
4. Miga, K.H., Eisenhart, C. and Kent, W.J. Utilizing mapping targets of sequences underrepresented in the reference assembly to reduce false positive alignments. Nucleic Acids Research 43(20): e133-e133 (2015).
5. Cao, M.D. et al. Scaffolding and completing genome assemblies in real-time with nanopore sequencing. Nat Commun. 8:14515 (2017).
6. Cawthon, R.M., Smith, K.R., O'Brien, E., Sivatchenko, A., and Kerber, R.A. Association between telomere length in blood and mortality in people aged 60 years or older. Lancet. 361(9355): 393–5 (2003).
7. De Roeck, A. Human genome sequencing on PromethION to investigate tandem repeats in dementia. Presentation. Available at: https://nanoporetech.com/resource-centre/human-genome-sequencing-promethion-investigate-tandem-repeats-dementia [Accessed: 18 August 2018]
8. Brandler, W.M. et al. Paternally inherited cis-regulatory structural variants are associated with autism. Science. 360(6386):327-331 (2018).
9. Ishiura, H. Expansions of intronic TTTCA and TTTTA repeats in benign adult familial myoclonic epilepsy. Nat Genet. 50(4):581-590 (2018).
10. Euskirchen, P. et al. Same-day genomic and epigenomic diagnosis of brain tumors using real-time nanopore sequencing. Acta Neuropathol. 134(5):691-703 (2017).
11. Gong, L. et al. Picky comprehensively detects high-resolution structural variants in nanopore long reads. Nat Methods 15(6):455-460 (2018).
12. Kellog, E.A. Genome sequencing: Long reads for a short plant. Nature Plants 1, 15169 (2015).
13. Payne, A., Holmes, N., Rakyan, V. and Loose, M. Whale watching with BulkVis: A graphical viewer for Oxford Nanopore bulk fast5 files. bioRxiv 312256 (2018).
14. Jansen, H. The beauty and the beast. Presentation. Available at: https://nanoporetech.com/resource-centre/talk/beauty-and-beast [Accessed: 15 June 2018]
15. Salzberg, S. Assembly of large genomes using Oxford Nanopore and Illumina data. Presentation. Available at: https://nanoporetech.com/resource-centre/assembly-large-genomes-using-oxford-nanopore-and-illumina-data [Accessed: 30 July 2018]
16. Oxford Nanopore Technologies. Incorporating sequence capture into library preparation for MinION, GridION and PromethION. Online. Available at: https://nanoporetech.com/resource-centre/incorporating-sequence-capture-library-preparation-minion-gridion-and-promethion [Accessed: 01 August 2018]
17. Agilent. Use of Agilent SureSelect to perform targeted long-read nanopore sequencing. Online. Available at: https://www.agilent.com/cs/library/applications/5991-8056EN-2%20Sure%20Select%20App%20Note.pdf [Accessed: 1 August 2018]
18. Leija-Salazar, M. et al. Detection of GBA missense mutations and other variants using the Oxford Nanopore MinION. bioRxiv 288068 (2018).
19. Loose, M., Malla, S., and Stout, M. Real-time selective sequencing using nanopore technology. Nat Methods. 13(9):751-4 (2016).
20. Bartsch, M. et al. Real-time selective sequencing with RUBRIC. Presentation. Available at: https://nanoporetech.com/resource-centre/real-time-selective-sequencing-rubric [Accessed: 30 July 2018]
21. Steijger, T. et al. Assessment of transcript reconstruction methods for RNA-seq. Nature Methods 10, 1177–1184 (2013).
22. Martin, J. A. and Wang, Z. Next-generation transcriptome assembly. Nature Reviews Genetics 12, 671-682 (2011).
23. Clark, M. et al. Long-read sequencing reveals the splicing profile of the calcium channel gene CACNA1C in human brain. bioRxiv 260562 (2018).


24. Oikonomopoulos, S., Wang, Y. C., Djambazian, […] of the Oxford Nanopore MinION sequencing for quantitative and qualitative assessment of cDNA […]
25. Garalde, D. R. et al. Highly parallel direct RNA […] nanopores. Nature Methods 15(3): 201–206 (2018).
26. Oxford Nanopore Technologies. Low bias RNA-seq: PCR-cDNA, PCR-free direct cDNA and direct RNA sequencing. Poster. Available at: https://nanoporetech.com/resource-centre/low-bias-rna-seq-pcr-cdna-pcr-free-direct-cdna-and-direct-rna-sequencing [Accessed: 20 August 2018]
27. Timp, W. and Jain, M. Direct RNA cDNA sequencing of the human transcriptome. Presentation. Available at: https://nanoporetech.com/resource-centre/videos/direct-rna-cdna-sequencing-human-transcriptome [Accessed: 20 August 2018]
28. Brooks, A. Native RNA sequencing of polyadenylated transcripts. Available at: https://nanoporetech.com/resource-centre/native-rna-sequencing-human-polyadenylated-transcripts [Accessed: 1 August 2018]
29. Jalkanen, A.L., Coleman, S.J. and Wilusz, J. Determinants and implications of mRNA poly(A) tail size – Does this protein make my tail look big? Semin Cell Dev Biol. 34: 24–32 (2014).
30. Weng, Y.L., An, R., Shin, J., Song, H., and Ming, G.L. DNA modifications and neurological disorders. Neurotherapeutics. (4):556-67 (2013).
31. Simpson, J.T. et al. Detecting DNA cytosine methylation using nanopore sequencing. Nature Methods 14: 407–410 doi:10.1038/nmeth.4184 (2017).
32. Smith, A.M. et al. Reading canonical and modified nucleotides in 16S ribosomal RNA using nanopore direct RNA sequencing. bioRxiv 132274 (2017).
33. Ammar, R. et al. Long read nanopore sequencing for detection of HLA and CYP2D6 variants and haplotypes. F1000Research 4, 17 (2015).
34. Norris, A.L. et al. Nanopore sequencing detects structural variants in cancer. Cancer Biol Ther 17(3): 246-253 (2016).
35. de Koning, A.P., Gu, W., Castoe, T.A., Batzer, M.A., and Pollock, D.D. Repetitive elements may comprise over two-thirds of the human genome. PLoS Genet. 7(12):e1002384 (2011).
36. Payne, A., Holmes, N., Rakyan, V. and Loose, M. Whale watching with BulkVis: A graphical viewer for Oxford Nanopore bulk fast5 files. bioRxiv 312256 (2018).
37. Kuderna, L.F.K. et al. Selective single molecule sequencing and assembly of a human Y chromosome of African origin. bioRxiv 342667 (2018).
38. GitHub. CANU. Available at: […] [Accessed: 20 August 2018]
39. GitHub. Nanopolish. Available at: […] [Accessed: 20 August 2018]
40. GitHub. Pilon. Available at: […] broadinstitute/pilon [Accessed: 20 August 2018]
41. Jain, M. Linear assembly of a human centromere on the Y chromosome. Nat Biotechnol. 36(4):321-323 (2018).
42. Proukakis, C. Detection of GBA missense mutations and other variants using Oxford Nanopore MinION. Presentation. Available at: https://nanoporetech.com/resource-centre/detection-gba-missense-mutations-and-other-variants-using-oxford-nanopore-minion [Accessed: 1 August 2019]
43. BBC. Handheld device sequences human genome. Online. Available at: https://www.bbc.co.uk/news/health-42838821 [Accessed: 01 August 2018]
44. Gilpatrick, T. Cas9 targeted enrichment for nanopore profiling of methylation at known cancer drivers. Presentation. Available at: https://nanoporetech.com/resource-centre/cas9-targeted-enrichment-nanopore-profiling-methylation-known-cancer-drivers [Accessed: 1 August 2018]
45. Gabrieli, T. et al. Selective nanopore sequencing of human BRCA1 by Cas9-assisted targeting of chromosome segments (CATCH). Nucleic Acids Res. gky411 (2018).
46. Vollmers, C. Improving MinION read accuracy to enable the high-throughput analysis of single cell transcriptomes. Presentation. Available at: https://nanoporetech.com/resource-centre/improving-minion-read-accuracy-enable-high-throughput-analysis-single-cell [Accessed: 1 August 2018]
47. Volden, R. et al. Improving nanopore read accuracy with the R2C2 method enables the sequencing of highly-multiplexed full-length single-cell cDNA. Proc Natl Acad Sci U S A. doi: 10.1073/pnas.1806447115 (2018).
48. Timp, W. Direct RNA-seq project shows nanopore sequencing can reveal new insights into basic biology. Podcast. Available at: https://mendelspod.com/podcasts/direct-rna-seq-project-shows-nanopore-sequencing-can-reveal-new-insights-basic-biology [Accessed: 17 August 2018]
49. Jansen, H. The beauty and the beast. Presentation. Available at: https://nanoporetech.com/resourcecentre/talk/beauty-and-beast [Accessed: 01 August 2018]
50. Jenjaroenpun, P. Complete genomic and transcriptional landscape analysis using third-generation sequencing. Nucleic Acids Res. 46(7):e38 (2018).

Oxford Nanopore Technologies phone +44 (0)845 034 7900 twitter @nanopore www.nanoporetech.com

Oxford Nanopore Technologies, the Wheel icon, GridION, Flongle, Metrichor, MinION, MinIT, MinKNOW, PromethION, SmidgION and VolTRAX are registered trademarks of Oxford Nanopore Technologies in various countries. All other brands and names contained are the property of their respective owners. © 2018 Oxford Nanopore Technologies. All rights reserved. Flongle, GridION, MinION, PromethION and VolTRAX are currently for research use only.

HG_W1010_v1_revA_03Oct2018