Using Long Nanopore Reads to Delineate Structural Variants (SVs) in the Human Genome
SVs, including large deletions, duplications, inversions, translocations and copy-number changes, are abundant in large genomes and require long reads for precise characterisation.

Contact: [email protected]
More information at: www.nanoporetech.com and publications.nanoporetech.com

[Figure 1. Panel a: reference chromosome 1 (A B C D E) and reference chromosome 2 (V W X Y Z), with an inversion (A D C B E), a deletion (A B C E), a duplication (A B C C C D E) and a translocation (V W C D E + A B X Y Z). Panel b: count of insertions and deletions > 50 bp against event size (bp).]
Fig. 1 Structural variation a) classes b) variant size and frequency in the human genome
Adapted from Huddleston, J. et al. Discovery and genotyping of structural variation from long-read haploid genome sequence data. Genome Research 27 (5) 677-685 (2016).

Structural variation: large inversions, deletions, duplications and translocations

Structural variation (SV) refers to inversions, insertions, deletions and translocations > 50 bp in length (Fig. 1a). SV encompasses millions of bases of DNA per human genome, can span tens of kilobases containing entire genes and their regulatory regions (Fig. 1b) and contributes substantially to genome variation. SV can alter the copy number of dosage-sensitive genes, can unmask recessive alleles and can disrupt the integrity or regulation of a gene, all of which can cause genetic disease. The study of SVs is challenging because they frequently arise in repetitive regions of the genome and can have highly complex structures. Short-read sequencing technologies cannot span long SVs, leading to incomplete reference assemblies.

[Figure 2. Panel a: bases sequenced (Mb) against read length (kb), for short and long reads. Panel b: a short-read assembly yields unique contigs and a collapsed repeat consensus, whereas a long-read assembly yields a single, fully-resolved contig. Panel c: long reads mapped to chr7 (q33), hg38 chr7:134,550,000-134,700,000, shown with the GENCODE v24 transcript track.]
Fig. 2 Read length a) typical distribution b) assembly c) mapped long human MinION reads

Nanopore sequencing can give extremely long reads without size selection

The read length that can be obtained from nanopore sequencing is limited only by the integrity of the DNA extracted from the sample and the care taken during library preparation. The read-length distribution corresponds closely to the fragment-length distribution of the sample DNA. When starting with high-molecular weight genomic DNA, it is straightforward to obtain reads that are tens of kilobases in length (Fig. 2a). The longer the sequence read, the longer the repetitive region or SV that can be resolved, allowing the correct structure of the variant to be elucidated (Fig. 2b). Recent increases in throughput make it realistic to sequence whole human genomes on a MinION (Fig. 2c).
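The link between read length and the size of event that can be resolved can be illustrated with a minimal sketch. It assumes that a read "resolves" an event only if it covers the whole event plus some flanking unique sequence on both sides; the function names, the 500 bp flank and the example coordinates are all hypothetical and are not taken from the poster.

```python
from typing import List, Tuple

# (start, end) in reference coordinates, half-open; a hypothetical simplification
Interval = Tuple[int, int]


def spans(read: Interval, event: Interval, min_flank: int = 500) -> bool:
    """Return True if the read covers the event plus min_flank bases on each side."""
    read_start, read_end = read
    event_start, event_end = event
    return (read_start <= event_start - min_flank
            and read_end >= event_end + min_flank)


def count_spanning(reads: List[Interval], event: Interval,
                   min_flank: int = 500) -> int:
    """Count the reads that span the whole event with flanking sequence."""
    return sum(spans(read, event, min_flank) for read in reads)


# Hypothetical 10 kb event and three hypothetical reads
event = (100_000, 110_000)
reads = [
    (104_000, 104_150),  # 150 bp read lying entirely inside the event
    (95_000, 108_000),   # 13 kb read crossing only the left breakpoint
    (92_000, 125_000),   # 33 kb read spanning the event and both flanks
]
spanning = count_spanning(reads, event)
print(spanning)
```

Under these assumptions only the 33 kb read spans the event. Real SV callers work from alignments and use richer evidence, such as split and clipped reads, so this is only an illustration of why longer reads can resolve larger events.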
[Figure 3. hg19 view of 15q21.2 (chr15:52,265,000-52,285,000) at LEO1, showing reads from wild-type (_wt) and deletion (_del) amplicons for each of the three families, with Layered H3K27Ac, DNase clusters and Txn Factor ChIP tracks.]
Fig. 3 Confirmation of LEO1 breakpoints and parental origin with nanopore reads

Deletion of a regulatory element in autistic patients validated by long nanopore reads

To demonstrate the utility of long nanopore reads in resolving structural variants, we amplified and barcoded patient and wild-type alleles from three families with known deletions in the LEO1 locus on chromosome 15, and sequenced them on a flowcell. Deletion amplicons were approximately 10 kb in length, and the amplifiable wild-type amplicons spanned up to 20 kb. LEO1 encodes an RNA polymerase-associated protein which is expressed during foetal brain development. For the deletions, we created consensus reference haplotypes using Nanopolish and realigned reads to these references for SNP-calling with MUMmer. All three deletions, as well as the parental origin, were successfully validated by the nanopore reads (Fig. 3).

[Figure 4. Panel a: reads R1 to R14 mapped across three breakpoints on chr10 (hg19; 10q23.1 and 10q24.33, at CCSER2 and SH3PXD2A). Panel b: the resolved structure, labelled "Complex rearrangement or cut-and-paste transposition".]
Fig. 4 Detection of SVs by whole-genome sequencing a) mapped reads b) SV resolution

Figs. 3 and 4 adapted from Brandler, W. et al. Paternally inherited cis-regulatory structural variants are associated with autism. Science 360 (6386) 327-331 (2018).

Using long-read whole-genome sequencing to resolve SVs in the human genome

One individual who participated in the autism spectrum disorder study described in Fig. 3 had been diagnosed with depression/anxiety. She appeared to have an SV in chromosome 10 which had been identified as a complex break-end by Lumpy analysis of paired-end Illumina data. The SV was not found in the individual’s parents, so was taken to be de novo, but the precise structure was unclear. We performed whole-genome library prep using an LSK-108 kit, and sequenced the library on a FLO-MIN106 flowcell, generating approximately 24 Gb of sequence data. The long reads allowed us to fully resolve the variant, and nanopore data was phased using WhatsHap, revealing the individual’s mother to be the parent of origin of the SV.

P17009 - Version 5.0 © 2018 Oxford Nanopore Technologies. All rights reserved.
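As a rough guide to what the approximately 24 Gb of whole-genome data represents, the short sketch below converts sequencing yield into mean depth of coverage. The 3.1 Gb genome size is an assumed approximate figure for the human genome and is not stated on the poster; the function name is hypothetical.

```python
# Assumed approximate size of the human genome in gigabases (not from the poster)
HUMAN_GENOME_GB = 3.1


def mean_depth(yield_gb: float, genome_gb: float = HUMAN_GENOME_GB) -> float:
    """Mean depth of coverage = total bases sequenced / genome size."""
    return yield_gb / genome_gb


depth = mean_depth(24.0)
print(f"{depth:.1f}x")
```

With these assumptions the run corresponds to a mean depth of a little under 8-fold. Actual depth at any one locus varies with mappability and read-length distribution.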