Incorporating sequence capture into library preparation for MinION™, GridION™ and PromethION™

Hybrid sequence capture allows users to select thousands of loci of interest simultaneously prior to sequencing, making more efficient use of the sequencing run

Contact: [email protected]
More information at: www.nanoporetech.com and publications.nanoporetech.com

[Fig. 1 schematic. Workflow steps: gDNA; fragmentation; end-prep; size-selection; barcoded PCR-adapter ligation; PCR with universal primers; pooling and sequence capture (hybridisation to biotinylated capture probes, then capture of probe-template duplexes onto beads); elution and PCR; amplicon-sequencing protocol (elution, amplification, adapter-ligation and sequencing). Panel labels: a) Randomly fragmented ~6 kb library; b) Human exome report containing regions of interest.]

[Fig. 2 report, with tabs OVERVIEW and PER GENE.]

Reads by exit status (showing the exit status of all the reads analysed):
  Workflow successful: 465,170 reads (82.8% successful)
  Alignment quality too low: 96,424 reads

Alignments - key figures: 465,170 (alignments), 17,702, 86.6% (average accuracy)
[Accuracy distribution plot: count against accuracy (%), axis range 80 to 95]

Alignments per gene (select a gene in the table to show summary; page 1 of 1768):

  Gene        Estimated coverage   Alignment count
  USP17L8     114 X                350
  USP17L7     87 X                 127
  USP17L10    81 X                 105
  USP17L12    73 X                 90
  POM121L7    67 X                 115
  CTAGE9      62 X                 90
  CTAGE1      61 X                 148
  FAM90A26    58 X                 197
  MST1        58 X                 190
  TSPY2       57 X                 94

Gene summary panel: Ubiquitin-specific peptidase 17-like family member 10; NCBI gene ID 100287144; Ensembl: ENSG00000231396; 105 alignments, 81 X estimated coverage, 86.6% average accuracy.
[Gene coverage plot: coverage against reference sequence position, 0 to 2,000]

Fig. 1 Long-read sequence capture a) workflow overview b) multiplexed sequence capture

Fig. 2 Analysis report of sequence-capture data for the human exome

Sequence capture uses complementary probes to enrich for targets of interest

Sequence capture is a technique which allows the enrichment of specific regions of interest from a genome. It is useful when:
i) the user is not interested in analysing the entire genome
ii) the genome is too large for the throughput of the sequencer
iii) the user wishes to save money and time on sequencing and analysis
iv) the regions are longer than can be amplified by PCR, or too many PCRs would be required.

Sequence capture is performed during library preparation by hybridising the library fragments to probes which are specific to the regions of interest (Fig. 1).

Resequencing analysis workflow for sequence-capture experiments

We have released an updated analysis workflow for sequence-capture experiments. To illustrate this, we captured the human exome using Agilent’s SureSelect Human All Exon V6 panel, and generated ~2.35 Gb of 1D sequence data from a MinION run, representing ~40x average coverage. We analysed the data using the resequencing analysis workflow (Fig. 2). Following basecalling, reads are mapped to the human exome reference sequence. Individual gene information is displayed, including coverage and read-accuracy distribution at that position. Future releases of this application will support the uploading of target regions, highlight known SNPs in the target regions and allow those SNPs to be displayed with a confidence value.
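As a rough check on the figures quoted for the exome run, mean coverage can be estimated as sequenced bases divided by target size, and the proportion of successful reads follows from the two exit-status counts shown in the Fig. 2 report. The sketch below is illustrative only: the ~60 Mb target size is an assumption (it is not stated in this poster), and treating every sequenced base as on-target makes the coverage value an upper bound. The function names are hypothetical.

```python
def mean_coverage(total_bases: float, target_size_bases: float) -> float:
    """Upper-bound mean coverage: sequenced bases divided by target size."""
    if target_size_bases <= 0:
        raise ValueError("target size must be positive")
    return total_bases / target_size_bases


def success_fraction(successful: int, failed: int) -> float:
    """Fraction of analysed reads whose exit status was 'workflow successful'."""
    total = successful + failed
    if total == 0:
        raise ValueError("no reads to analyse")
    return successful / total


SEQUENCED_BASES = 2.35e9      # ~2.35 Gb of 1D data, from the text
ASSUMED_TARGET_BASES = 60e6   # ASSUMPTION: ~60 Mb capture target, not from the poster

coverage = mean_coverage(SEQUENCED_BASES, ASSUMED_TARGET_BASES)
passed = success_fraction(465_170, 96_424)  # read counts from the Fig. 2 report

print(f"mean coverage (upper bound): {coverage:.1f}x")
print(f"successful reads: {passed:.1%}")
```

Under these assumptions the estimate lands close to the ~40x quoted in the text, and the success fraction matches the 82.8% shown in the report.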

[Fig. 3 panels. a) BRCA1, position: chr17:41569479–41650296, band: 17q21.31, 80,818 bp. b) Read alignments at Tyr1563Ter; wild type is C and SNP is G. c) Read alignments at Arg1443Gly; wild type is C and SNP is G.]

[Fig. 4 genome-browser view. Region 1: 16,536,698–16,639,185, band p36.13, forward strand; gene track of the surrounding 1.00 Mb contig, including NBPF1; 40 kb scale bar.]

Fig. 3 Plots showing reads enriched for the BRCA1 gene and SNP-calling

Fig. 4 Long reads mapping to exons 1–27 of NBPF1 after whole exome sequence capture

Enrichment of Comprehensive Cancer panel genes, allowing detection of SNPs

We evaluated Agilent’s ClearSeq Comprehensive Cancer panel using DNA from two BRCA1 SNP-carriers with a positive family history of breast cancer (NA13708 and NA13710, Coriell NIGMS Human Genetic Cell Repository). We performed sequence capture as described, sequenced the library and basecalled reads using the 1D workflow. We aligned all reads to the reference sequence from Agilent, using BWA with standard parameters, and visualised reads with Savant (Fig. 3a). Figs. 3b and 3c show alignments of reads from exon 16 and exon 13, respectively, of the BRCA1 gene. Sequencing errors are distributed randomly, allowing SNPs to be seen clearly in the data.

Long fragment capture protocol generates reads which can span multiple exons

We performed sequence capture on human genomic DNA with Agilent’s SureSelect Human All Exon V6, using a randomly fragmented library which had been size-selected using the Blue Pippin automated gel fractionation system, leaving only fragments above 6 kb. The library was then amplified by PCR prior to performing hybrid capture. The library was sequenced on a single run and the reads were mapped to the human reference genome (H19). Fig. 4 shows reads mapping to the NBPF1 (neuroblastoma breakpoint family, member 1) gene. Many of the reads span multiple exons, meaning that both exon- and intron-specific variations can be detected and phased haplotypes can be obtained.
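The observation that randomly distributed errors still leave SNPs visible can be illustrated by counting bases per alignment column: a random error affects only a few reads at any one position, whereas a heterozygous SNP appears in a large fraction of them. The following is a minimal sketch of that idea, not the analysis used to produce Fig. 3. The toy reads, the thresholds `min_alt_fraction` and `min_depth`, and the function names are all illustrative assumptions, and the reads are assumed to be already aligned to the reference with `*` marking a deletion.

```python
from collections import Counter


def column_counts(reads, position):
    """Count the bases seen at one alignment column, ignoring gaps."""
    return Counter(
        read[position]
        for read in reads
        if position < len(read) and read[position] not in "*-"
    )


def candidate_snps(reads, reference, min_alt_fraction=0.3, min_depth=4):
    """Return (position, ref_base, alt_base, alt_fraction) for columns where
    one non-reference base is seen in at least min_alt_fraction of reads."""
    calls = []
    for pos, ref_base in enumerate(reference):
        counts = column_counts(reads, pos)
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to say anything
        alt_counts = {b: n for b, n in counts.items() if b != ref_base}
        if not alt_counts:
            continue
        alt_base, alt_n = max(alt_counts.items(), key=lambda kv: kv[1])
        if alt_n / depth >= min_alt_fraction:
            calls.append((pos, ref_base, alt_base, alt_n / depth))
    return calls


# Toy data (hypothetical): a C>G SNP at position 1 carried by half the reads,
# plus isolated random errors and one deletion elsewhere.
reference = "ACCTGGAAT"
reads = [
    "ACCTGGAAT",
    "AGCTGGAAT",
    "ACCTGGAAT",
    "AGCTGGAAT",
    "ACCTGTAAT",  # random substitution at position 5
    "AGCTGGAAT",
    "ACC*GGAAT",  # deletion at position 3
    "AGCTGGCAT",  # SNP allele plus a random substitution at position 6
]

for pos, ref, alt, frac in candidate_snps(reads, reference):
    print(f"position {pos}: {ref}>{alt} in {frac:.0%} of reads")
```

In this toy example only the shared C>G change passes the threshold; the isolated substitutions each occur in a single read and are filtered out. Real variant calling would also weigh base qualities, strand and systematic error modes, which this sketch ignores.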

P17013 - Version 4.01 © 2017 Oxford Nanopore Technologies. All rights reserved.