HUMAN GENETICS RESEARCH WHITE PAPER
Advancing human genetics research with nanopore sequencing
nanoporetech.com/publications
nanoporetech.com OXFORD NANOPORE TECHNOLOGIES | ADVANCING HUMAN GENETICS RESEARCH WITH NANOPORE SEQUENCING
Contents
1 The advantages of nanopore sequencing for human genetics research
2 Case studies
3 Summary
4 About Oxford Nanopore Technologies
5 References
Introduction
High-throughput sequencing technologies have revolutionised the field of human genetics, allowing researchers to more easily investigate and understand biological processes and their impact. Using these technologies, researchers can analyse entire genomes or specific targeted regions of interest, with further functional insights garnered through the characterisation and quantification of RNA transcripts and isoforms. Together, these capabilities have provided unprecedented insight into human genetic diversity and its implications in health and disease.
Despite offering significant advancement in terms of speed and resolution over the older techniques of Sanger sequencing and microarrays respectively, there are still a number of limitations inherent to traditional short-read, high-throughput sequencing platforms. From closing genome gaps to characterising full-length transcripts, this review will present how real-time, long-read, high-throughput nanopore sequencing technology is being used to address these limitations, resulting in new biological insights. Specific case studies reveal how researchers are applying the benefits of nanopore technology to a variety of sequencing techniques, including whole genome, targeted and RNA sequencing.
The advantages of nanopore sequencing for human genetics research
Whole genome sequencing without the holes
To date, the majority of large genomes sequenced have utilised short-read technologies, which require the DNA to be fragmented into small (typically 150-300 bp) lengths prior to sequencing. While these technologies allow rapid and relatively cost-effective genome analysis compared to older sequencing methodologies, their inherent read-length limitations preclude the analysis of vast stretches of DNA corresponding to repetitive regions and large structural variations [1,2,3]. It is for these reasons that approximately 8% of the human genome is intractable to assembly and interpretation [4].
The required DNA fragmentation step effectively loses relative positional information, meaning that during genome assembly, these small fragments must be pieced back together through overlap with other short fragments. For repetitive regions, this can be particularly challenging and may result in the collapsing of potentially long regions of repeats down to much shorter lengths, leaving gaps in the assembly (Figure 1) [5].
In the same way, the presence of some structural variants such as deletions, insertions, duplications, inversions and translocations may be missed when using short sequencing reads alone. It is well established that some repeat regions and structural variants are associated with human health and disease (e.g. ageing [6], triplet expansion diseases [7], autism [8], epilepsy [9] and cancer [10,11]), making their routine characterisation highly advantageous.
Figure 1 Schematic highlighting the advantages of long reads in de novo assembly of repetitive regions. Long read lengths are more likely to incorporate the whole repetitive region (blue boxes), allowing more accurate assembly. Image adapted from Kellog [12]. Panel labels: Genome sequence; Short reads; Short-read consensus; Long reads; Long-read consensus.
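The collapse of repeats described above can be illustrated with a toy example. The following is a minimal sketch, not an assembler: the flank sequences, the CAG repeat unit, the read lengths and the function names are all assumptions chosen for illustration, and reads are treated as error-free substrings. It shows that two genomes differing only in repeat copy number yield exactly the same set of short reads, whereas reads long enough to span the repeat and both flanks tell them apart.

```python
def toy_genome(repeat_copies: int) -> str:
    """Build a toy sequence: unique flank + tandem repeat + unique flank."""
    left_flank = "ATGCCGTTAGC"   # hypothetical unique sequence
    right_flank = "TGACCTAGGCA"  # hypothetical unique sequence
    return left_flank + "CAG" * repeat_copies + right_flank


def read_set(genome: str, read_length: int) -> set[str]:
    """Return every distinct error-free read of the given length."""
    last_start = len(genome) - read_length
    return {genome[i:i + read_length] for i in range(last_start + 1)}


if __name__ == "__main__":
    ten_copies = toy_genome(10)     # 30-base repeat
    thirty_copies = toy_genome(30)  # 90-base repeat

    # Reads shorter than the repeat: both genomes look identical.
    print(read_set(ten_copies, 12) == read_set(thirty_copies, 12))

    # Reads that span the shorter repeat and both flanks: genomes differ.
    print(read_set(ten_copies, 45) != read_set(thirty_copies, 45))
```

With these toy sequences, the 12-base read sets are identical while the 45-base read sets differ, which is why an assembler given only short reads cannot recover the repeat length.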
Unlike short-read sequencing technologies, nanopore-based sequencing processes the entire length of the DNA fragment that is presented to the pore. Complete fragments of thousands of kb are routinely processed and ultra-long read lengths over 2 Mb have been achieved [13]. Clearly, such long reads are more likely to span entire regions of repetitive DNA and structural variation. As a result, nanopore sequencing provides a more complete view of genetic variation.
‘Nanopore sequencing allows same-day detection of structural variants, point mutations, and methylation profiling using a single device with negligible capital cost’ [10].
Using nanopore sequencing, highly contiguous genome assemblies have been generated for many large and complex organisms that had previously been deemed inaccessible to modern sequencing methods [14,15]. Researchers recently used nanopore sequencing to deliver the most complete human genome ever assembled with a single technology (Case study 1) and the first complete and accurate sequence of a human centromere (Case study 2). Furthermore, as will be discussed later, the long, direct sequencing reads delivered by nanopore technology also permit the phasing of variants and modified bases.
Rapid analysis of targeted regions
For researchers wishing to study specific genomic loci, a targeted sequencing approach is commonly employed. The facility to focus on the regions most likely to provide relevant data reduces cost, allows a higher depth of coverage and simplifies analysis.
A range of targeted sequencing methodologies are available with nanopore sequencing, from bespoke assays to targeted panels and whole exome approaches. Nanopore sequencing has been successfully utilised with both PCR- and hybrid capture-based targeted enrichment strategies, including whole exome enrichment [16,17].
As discussed for whole genome sequencing, the long reads provided by nanopore technology provide a range of advantages for researchers interested in targeted sequencing. For example, it is possible to sequence much larger regions in a single read, which allows improved characterisation of highly repetitive regions and/or structural variants (see Case study 4). Furthermore, long sequencing reads allow the phasing of alleles and variants, which is extremely challenging when using short-read sequencing [18].
Figure 2 The Reference Alignment tool allows easy, real-time visualisation of targeted or whole genome sequencing coverage, allowing researchers to easily assess the efficiency of their sequence targeting.
A number of researchers are now also investigating the potential of CRISPR/Cas9 to enrich for long, targeted DNA molecules (see Case study 5). Initial results have shown significant promise for such techniques, which may offer faster, more streamlined workflows and, through direct sequencing of the target molecule, enable the analysis of base modifications.
‘The challenge of aligning short reads to regions with high homology is often not fully appreciated’ [18].
A unique feature of nanopore technology is the facility to utilise real-time analysis to selectively sequence or reject DNA molecules as they pass through the pore. Rejection of molecules that fall outside of the chosen criteria is achieved through reversing the current applied to the individual pore, thereby freeing up the pore to sequence an alternative DNA fragment. Although this potential real-time enrichment strategy is in its infancy, researcher-developed tools such as Read Until [19] and RUBRIC [20] are available.
Oxford Nanopore provides researchers with a cloud-based analysis platform, EPI2ME. Among the workflows available is the Human Alignment analysis tool, which aligns nanopore reads to the human GRCh38 reference genome, generating real-time coverage plots at the chromosome or gene level (Figure 2). This unique tool allows researchers to easily assess the efficiency of their sequence targeting.
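The accept-or-reject decision behind real-time selective sequencing can be sketched in a few lines. The example below is an illustrative assumption only: it is not the Read Until or RUBRIC interface, and the names `Region` and `decide_read`, the `margin` parameter and the coordinates are hypothetical. It assumes that the first portion of a read has already been mapped to a chromosome and position by some other component, and simply checks that position against a list of target regions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A target locus; coordinates are 0-based, end-exclusive (assumed)."""
    chrom: str
    start: int
    end: int


def decide_read(chrom: str, position: int, targets: list[Region],
                margin: int = 0) -> str:
    """Return 'sequence' if the mapped position falls in or near a target.

    `margin` widens each target so that long reads starting just outside
    a region, and likely to run into it, are still kept.
    """
    for target in targets:
        same_chrom = target.chrom == chrom
        in_window = target.start - margin <= position < target.end + margin
        if same_chrom and in_window:
            return "sequence"
    # In a real run, this is the point at which the pore current would be
    # reversed to eject the molecule.
    return "reject"


if __name__ == "__main__":
    targets = [Region("chr1", 1_000, 2_000)]  # hypothetical target
    print(decide_read("chr1", 1_500, targets))
    print(decide_read("chr2", 1_500, targets))
```

The decision must be made while the molecule is still in the pore, so in practice the mapping step that feeds a function like this dominates the time budget rather than the comparison itself.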
Full-length RNA transcripts, isoform characterisation and accurate quantification
Due to the fragmentation required by most traditional sequencing methods, accurate assembly of complete transcripts is exceptionally challenging, especially in instances where a read maps to more than one location. With nanopore technology, entire RNA molecules are processed regardless of their length, allowing the sequencing of complete transcripts in single reads.
In addition to reducing multiple-locus alignment issues, long, full-length reads provide a significant advantage in the analysis and correct identification of alternative splicing (Figure 3). When using short reads, different transcript isoforms have to be computationally reconstructed; however, a study by Steijger et al. [21] revealed that automated transcript assembly methods fail to identify all constituent exons in over half of the transcripts analysed. Furthermore, of those transcripts with all exons identified, over half were incorrectly assembled. Such complications are further compounded where reads from highly similar transcripts, such as those of paralogous genes, are under investigation. Rare isoforms could also remain altogether undetected [22].
Describing the additional insight gained from nanopore sequencing, Dr Christopher Vollmers at the University of California, Santa Cruz, comments: ‘It’s rare to find an isoform that perfectly matches the annotation, and that’s in the human genome which is already super-curated. Once in a while, you find an exon that nobody has annotated mostly because
Figure 3 Alternative splicing can give rise to numerous mRNA isoforms per gene, which in turn can alter protein composition and function. The short reads generated by traditional RNA sequencing techniques lose positional information, making the correct assembly of alternative mRNA isoforms challenging. Long nanopore reads can span full-length transcripts, simplifying their identification. Panel labels: Gene / precursor mRNA (Exons 1 to 9); Short-read RNA-Seq; Nanopore long-read RNA sequencing.
it may contain an Alu element or other kind of repeat, so short reads won’t align to it. Most splice junctions [identified by short-read technology] are correct but transcription start sites and poly-A sites, that’s the wild west – there is so much going on that is not on any annotations’.
Recently, a team of researchers from the University of Oxford and Earlham Institute utilised nanopore sequencing to investigate expression of the neuropsychiatric disease-associated gene CACNA1C [23]. The long nanopore reads enabled the identification of 38 putative novel exons and 90 transcript isoforms, of which only 7 had been previously identified. Interestingly, 9 of the top 10 expressed isoforms were novel, previously unknown transcripts. This and other studies have also demonstrated how nanopore sequencing provides highly accurate measurement of transcript abundance [23,24].
The advantages of nanopore technology are now also being applied to single cell transcriptome studies, delivering further, more detailed insight into gene expression and function (see Case study 7).
Direct RNA sequencing
Until recently, sequence-based analysis of RNA required the conversion of RNA to complementary DNA (cDNA), a process that can introduce bias through reverse transcription or amplification [25]. These issues can be exacerbated by the use of traditional short-read sequencing technologies which are known to exhibit GC bias, where sequences with low or high levels of GC content are underrepresented (Figure 4). The amplification step required to generate cDNA also loses all modified base information. Such base modifications are known to have a role in modulating the activity and stability of RNA and are therefore of increasing interest to researchers. Nanopore sequencing overcomes all of these challenges through the facility for direct RNA sequencing, delivering unbiased, full-length, strand-specific RNA sequences [26]. The longest transcript processed by direct RNA sequencing currently stands at over 20 kb in length [27].
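GC bias, as mentioned in the discussion of direct RNA sequencing, is assessed against the GC content of each sequence, which is simple to compute. The following is a minimal sketch; the function names and the window size are assumptions for illustration and do not correspond to any particular analysis tool. It works for DNA or RNA, since only G and C are counted.

```python
def gc_fraction(sequence: str) -> float:
    """Return the fraction of bases in the sequence that are G or C."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    upper = sequence.upper()
    gc_count = sum(1 for base in upper if base in "GC")
    return gc_count / len(upper)


def windowed_gc(sequence: str, window: int) -> list[float]:
    """GC fraction of consecutive, non-overlapping windows.

    A trailing partial window is ignored so that every value is
    computed over the same number of bases.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    full_windows = len(sequence) // window
    return [
        gc_fraction(sequence[i * window:(i + 1) * window])
        for i in range(full_windows)
    ]


if __name__ == "__main__":
    transcript = "AUAUAUAUGGCCGGCC"  # hypothetical RNA fragment
    print(gc_fraction(transcript))
    print(windowed_gc(transcript, 8))
```

Comparing read depth across windows grouped by GC fraction is one common way to check whether low-GC or high-GC sequences are underrepresented in a data set.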
Figure 4 [Panel a) labels: Oxford Nanopore PCR-cDNA; Oxford Nanopore direct cDNA; Oxford Nanopore direct RNA; short-read cDNA.] Sequencing workflows that incorporate amplification are vulnerable to sequence-