NanoChannel Based Next-Generation Mapping for Interrogation of Clinically Relevant Structural Variations

A Hastie, Ž Džakula, A W Pang, E T Lam, T Anantharaman, M Saghbini, H Cao
BioNano Genomics, San Diego, CA, USA

Abstract

Structurally complex loci underlie many diseases. These loci can be very challenging to resolve by currently available methods such as karyotyping, clinical array, PCR-based tests, and next-generation sequencing. Next-generation mapping (NGM) by the BioNano Genomics Irys® System offers a high-throughput, genome-wide method able to interrogate genome structural differences in the range of two kilobase pairs to hundreds of kilobase pairs. The Irys System uses extremely long reads to span interspersed and even long tandem repeats, making it ideally suited for elucidating the structure and copy number of complex regions of the genome, such as complex pseudogene and paralogous families. Clinically relevant regions often contain genes with paralogs and other complex repetitive structures, complicating the interpretation of data and diagnosis of disease. We present several examples of genetic loci that can be easily interrogated with genome map data, including tandem repeats, paralogous gene families, and loci flanked by segmental duplications, in a single NGM run.

Some open reading frames or entire genes are amplified with variable copy number, such as tRNAs, kringle IV, and D4Z4. The LPA gene, for example, contains variable copies of a repeat, kringle IV, that results in different lengths of the resultant Lp(a) protein, and there is a direct correlation between the size of the protein and risk of coronary heart disease, cerebrovascular disease, atherosclerosis, thrombosis, and stroke. Another important variable-length tandem repeat is D4Z4, associated with facioscapulohumeral muscular dystrophy (FSHD). FSHD is strongly associated with a low copy number (< 10 units), occurring in 95% of FSHD cases. Copy number of tandem repeats is extremely hard to measure accurately with available methods, but we show that NGM on the Irys System can accurately measure the copy number of the kringle IV domain and D4Z4.

The second class of complex structural variation involves genes with paralogs, such as amylase and UGT2B17, two genes whose copy number has been shown to be involved in human health (testosterone and estradiol metabolism, osteopathic health, and graft versus host disease). We show deletions of UGT2B17 in a family trio and > 10 different structures at the amylase region.

The third class of genomic variation, which is very difficult to interrogate, comprises loci flanked by segmental duplications. These are especially important because spontaneous rearrangements are common between paralogous segmental duplications, causing copy number aberrations and translocations and thus resulting in developmental disorders, such as the 22q11.2 deletion syndrome mediated by segmental duplication rearrangements. We show the assembly of the region, including the normal and pathogenic alleles, using molecules that span and disambiguate the structure of the segmental duplications. We demonstrate that NGM using the Irys System is proving to be a highly accurate method for detection of clinically relevant structural variation.

Methods

1) IrysPrep®: Isolate High Molecular Weight DNA Molecules
2) IrysPrep®: Label DNA Molecules at Motif-Specific Locations
3) IrysChip®: Linearize DNA Molecules in NanoChannels
4) Irys®: Automatically Image Linearized DNA Molecules
5) IrysView® and IrysSolve®: Assemble, Visualize, and Analyze Genome Data

[Workflow figure labels: sample sources (blood, cell, tissue, microbes); labeling (nickase, recognition motif, nick site, polymerase, displaced strand); DNA conformation (free DNA in solution: Gaussian coil; DNA in a microchannel: partially elongated; DNA in a nanochannel: linearized)]
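The labeling step marks DNA at a sequence motif, so the label pattern expected from a reference sequence can be approximated by searching for that motif on both strands. The following is a minimal sketch under stated assumptions, not the IrysSolve implementation: the poster does not name the nickase, so the motif GCTCTTC (the recognition sequence of Nt.BspQI) is assumed here, and the function names are hypothetical. Positions are reported as motif start coordinates, not exact nick positions.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_label_sites(seq: str, motif: str = "GCTCTTC") -> list[int]:
    """Return sorted 0-based start positions of the motif on either strand.

    The default motif is an assumption (Nt.BspQI recognition sequence);
    the poster does not state which nickase was used.
    """
    seq = seq.upper()
    motif = motif.upper()
    sites = set()
    # Search the forward motif and its reverse complement, so that sites
    # on both strands are reported in forward-strand coordinates.
    for pattern in {motif, reverse_complement(motif)}:
        start = seq.find(pattern)
        while start != -1:
            sites.add(start)
            start = seq.find(pattern, start + 1)
    return sorted(sites)


def label_intervals(sites: list[int]) -> list[int]:
    """Return distances in bp between consecutive label sites."""
    return [b - a for a, b in zip(sites, sites[1:])]


if __name__ == "__main__":
    toy = "AAGCTCTTCAAAAGAAGAGCTT"  # invented toy sequence
    sites = find_label_sites(toy)
    print(sites)                   # [2, 13]
    print(label_intervals(sites))  # [11]
```

The list of intervals between consecutive sites is the kind of pattern that can be compared between a reference and an assembled genome map; real label spacing is on the order of kilobases, not the few bases of this toy sequence.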

[Figure panels]
• Amylase Structural Variants
• Gene Deletion in Mother and Son Genomes from the Ashkenazi Trio
• D4Z4 Array Length Measurement: Haplotype and Paralog Resolution
• Philadelphia – Balanced Translocation

[Surviving figure labels: reference genes YTHDC1, TMPRSS11E, UGT2B17, UGT2B15, UGT2B10; Father, Mother, and Son genome maps with molecules; haplotypes hap1, hap2, hap4A, hap4B; BNG based gene model and BNG assembly; 100 kb scale bar; 4qB, Chr4, Chr10, hg38 reference; arrays of 38, 34, and 14 units; Hg19 Chr 9 (ABL1), Hg19 Chr 22 (BCR), BioNano map, coordinates 133,653,967 and 23,631,928]

The amylase gene is polymorphic for structural variation. A previous study determined that copy number is associated with body mass index (BMI), but a more recent study questioned that conclusion. However, neither study investigated balanced structural variation in the population. At BioNano, through the analysis of numerous human genomes, we found that inversions occur frequently in the amylase locus, identifying new types of variants that may explain phenotypic observations. A formal study is needed to correlate our findings with biological outcome. The top graphic shows some of the copy number variants studied, while the bottom graphic shows a 2nd generation pedigree. In both cases, there are two alleles with the same copy number pattern but with different structures.

A 117 kbp deletion shows the missing UDP glucuronosyltransferase 2 family, polypeptide B17 gene (UGT2B17). Deletion of UGT2B17 has been reported to result in better quality of osteopathic health as well as higher testosterone and estradiol levels. UGT2B17 is believed to produce an important antigen involved in graft versus host disease (McCarroll).

D4Z4 repeat arrays occur at the subtelomeric regions of chromosomes 4q and 10q. 4q has two haplotypes, 4qA and 4qB; if the array length of 4qA falls below ~5 copies, facioscapulohumeral muscular dystrophy (FSHD) can occur. In order to diagnose a pathogenic repeat array, the array for each allele or paralog must be measured and differentiated. The figure shows the differentiation and measurement of each allele on chromosome 4 and of the paralogs on chromosome 10 (this is a non-pathologic measurement) in genome maps (blue bars), with single molecules shown below.

The genome map was assembled de novo from the blood of a CML patient. Among other mutations, a chromosome 9/22 translocation was identified that fused BCR to ABL to produce the Philadelphia chromosome. The blue BioNano map aligns to BCR on the left and ABL1 on the right. Single molecules containing the flanking regions of the breakpoint on chromosomes 9 and 22 are shown below the genome map. A reciprocal translocation is found as well, demonstrating that the translocation is balanced.
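Measuring a tandem repeat array such as D4Z4 reduces to simple arithmetic once the distance between the labels flanking the array is known from molecules that span it. The following is a minimal sketch of that arithmetic, not the method used by IrysSolve. The function name, the flank length, and the molecule measurements are hypothetical, and the 3.3 kb D4Z4 unit length is a commonly cited value that the poster itself does not state.

```python
from statistics import median

# Assumption: commonly cited D4Z4 repeat unit length (not given in the poster).
D4Z4_UNIT_BP = 3_300


def estimate_copy_number(spans_bp, flank_bp, unit_bp=D4Z4_UNIT_BP):
    """Estimate the repeat copy number of one allele.

    spans_bp: distances (bp) between the two labels flanking the array,
              one value per single molecule spanning the whole array.
    flank_bp: non-repeat sequence (bp) between those labels and the array
              edges (hypothetical; depends on the labeling pattern).
    unit_bp:  length of one repeat unit (bp).
    """
    if not spans_bp:
        raise ValueError("at least one spanning molecule is required")
    # The median is robust to an occasional mis-sized molecule.
    array_bp = median(spans_bp) - flank_bp
    if array_bp < 0:
        raise ValueError("flank length exceeds the measured span")
    return round(array_bp / unit_bp)


if __name__ == "__main__":
    # Invented measurements for two alleles, for illustration only.
    long_allele = [130_100, 129_400, 131_000]
    short_allele = [51_000, 51_500, 50_800]
    print(estimate_copy_number(long_allele, flank_bp=5_000))   # 38
    print(estimate_copy_number(short_allele, flank_bp=5_000))  # 14
```

Because each allele and paralog must be measured separately, molecules would first need to be grouped by allele (for example by their flanking label pattern) before this estimate is applied to each group.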

Conclusions

Next-generation mapping can be used for the detection of clinically relevant genomic features that are otherwise difficult to detect, including:

• Insertions/CNV (Amylase variants)
• SVs involving paralogous regions and segmental duplications (UGT2B17)
• Tandem repeat arrays (D4Z4)
• Balanced events (BCR/ABL and Amylase)

References

1) Cao, H., et al. Rapid detection of structural variation in a human genome using nanochannel-based genome mapping technology. Gigascience (2014); 3(1):34.
2) Hastie, A.R., et al. Rapid Genome Mapping in Nanochannel Arrays for Highly Complete and Accurate De Novo Sequence Assembly of the Complex Aegilops tauschii Genome. PLoS ONE (2013); 8(2): e55864.
3) Lam, E.T., et al. Genome mapping on nanochannel arrays for structural variation analysis and sequence assembly. Nature Biotechnology (2012); 10: 2303.
4) Xiao, M., et al. Rapid DNA mapping by fluorescent single molecule detection. Nucleic Acids Research (2007); 35:e16.
5) Usher et al. Structural forms of the human amylase locus and their relationships to SNPs, haplotypes, and obesity. Nature Genetics (2015); 47(8):921-5.
6) McCarroll, S.A., et al. Donor-recipient mismatch for common gene deletion polymorphisms in graft-versus-host disease. Nat Genet. (2009); 41(12):1341-4. doi: 10.1038/ng.490.
7) Menard, V., et al. Copy-number variations (CNVs) of the human sex steroid metabolizing genes UGT2B17 and UGT2B28 and their associations with a UGT2B15 functional polymorphism. Hum. Mutat. (2009); 30: 1310-1319.

©2015 BioNano Genomics. All rights reserved.
