Hum Genet (2016) 135:1269–1278 DOI 10.1007/s00439-016-1720-4

ORIGINAL INVESTIGATION

A 1.35 Mb DNA fragment is inserted into the DHMN1 locus on chromosome 7q34–q36.2

Alexander P. Drew1 · Anthony N. Cutrupi1,3 · Megan H. Brewer1,3 · Garth A. Nicholson1,2,3 · Marina L. Kennerson1,2,3

Received: 5 April 2016 / Accepted: 25 July 2016 / Published online: 3 August 2016 © Springer-Verlag Berlin Heidelberg 2016

Abstract Distal hereditary motor neuropathies predominantly affect the motor neurons of the peripheral nervous system, leading to chronic disability. Using whole genome sequencing (WGS) we have identified a novel structural variation (SV) within the distal hereditary motor neuropathy locus on chromosome 7q34–q36.2 (DHMN1). The SV involves the insertion of a 1.35 Mb DNA fragment into the DHMN1 disease locus. The source of the inserted sequence is 2.3 Mb distal to the disease locus at chromosome 7q36.3. The insertion involves the duplication of five genes (LOC389602, RNF32, LMBR1, NOM1, MNX1) and partial duplication of UBE3C. The genomic structure of genes within the DHMN1 locus is not disrupted by the insertion and no disease causing point mutations within the locus were identified. This suggests the novel SV is the most likely DNA mutation disrupting the DHMN1 locus. Due to the size and position of the DNA insertion, the gene(s) directly affected by the genomic re-arrangement remain elusive. Our finding represents a new genetic cause for hereditary motor neuropathies and highlights the growing importance of interrogating the non-coding genome for SV mutations in families which have been excluded for genome wide coding mutations.

Electronic supplementary material The online version of this article (doi:10.1007/s00439-016-1720-4) contains supplementary material, which is available to authorized users.

* Alexander P. Drew [email protected]
* Marina L. Kennerson [email protected]

1 Northcott Neuroscience Laboratory, ANZAC Research Institute, University of Sydney, Concord, Sydney 2139, Australia
2 Molecular Medicine Laboratory, Concord Hospital, Concord, Sydney 2139, Australia
3 Sydney Medical School, University of Sydney, Sydney, Australia

Introduction

The distal hereditary motor neuropathies (dHMN) are a group of progressive neurodegenerative disorders that primarily affect the motor neurons of distal limbs without affecting sensory neurons. The disorder is a length-dependent neuropathy in which the longest nerves are initially affected. Chronic denervation leads to paresis and atrophy of distal muscles, generally affecting lower limbs to a greater extent than upper limbs. Distal HMN is a genetically heterogeneous group of diseases broadly classified by mode of inheritance, age of onset or additional clinical features. Subtypes are defined by the specific gene mutated or a mapped disease locus (Drew et al. 2011; Rossor et al. 2012). Currently, there are 11 HMN types represented by 11 causative genes and five disease loci (Rossor et al. 2012).

Genetic linkage analysis in a large Australian family with dHMN (F-54) mapped a disease locus to a 12 Mb interval on chromosome 7q34–q36.2 (DHMN1) (Gopinath et al. 2007). The family is described as dHMN type I (DHMN1: OMIM %182960) or dHMN with pyramidal signs (Rossor et al. 2012). The neuropathy in this family is characterised by juvenile onset and autosomal dominant inheritance, with the lower limbs primarily affected.

Previous work to identify the pathogenic mutation in F-54 has examined coding sequences of positional candidate genes and copy number variations (CNV) in the DHMN1 locus. Initially, Sanger sequencing of exons and flanking intronic sequences ruled out mutations in 33 candidate genes (Drew 2012; Gopinath et al. 2007). Next generation sequencing of targeted capture of the DHMN1 locus with the 454 platform (<20× read depth) and the Illumina Solexa platform (>220× read depth) in one affected individual, and whole exome sequencing (WES) of an additional two affected individuals and one unaffected individual (40× depth), did not identify any disease candidate SNPs or INDELS (Drew 2012). Cytogenetic analysis using Trypsin (GTL) banding did not detect any microscopic structural variation (SV). Pathogenic copy number variation (CNV) within the DHMN1 locus was excluded with an Agilent SurePrint G3 custom CGH microarray specifically targeting the DHMN1 locus (Drew 2012). This combination of sequencing and CNV analysis ruled out SNP, INDEL and CNV mutations within the coding sequences of the DHMN1 locus and large microscopic SV.

Despite this extensive investigation of genes, the DHMN1 locus remained genetically unsolved. Here we present analysis using whole genome sequencing (WGS) with a focus on querying the patient DNA for SVs to identify the likely pathogenic mutation in family F-54 within the DHMN1 locus.

Materials and methods

Study subjects

Patient ascertainment and blood sample collection was performed with informed consent according to protocols approved by the Sydney Local Health District Human Ethics Review Committee, Concord Repatriation General Hospital, Sydney, Australia (HREC/11/CRGH/105). Genomic DNA was extracted from whole blood using the PureGene kit (Qiagen). Clinical history, neurological examination and neurophysiology studies for members of the kindred have been previously described (Gopinath et al. 2007). Four members from F-54 were selected for this study. The three affected individuals had different haplotypes for the normal chromosome at the DHMN1 locus and were therefore considered to be as distantly related as possible. An unaffected married-in parent of one of the affected individuals undergoing WGS was chosen by DNA availability.

Whole genome sequencing

WGS was outsourced to Macrogen (South Korea). Sample library construction was performed using the Illumina TruSeq Nano DNA sample preparation protocol. Libraries were sequenced using the Illumina HiSeq X Ten Sequencer to a target depth of 30×. Sequence reads were mapped to the Human Feb. 2009 (GRCh37/hg19) assembly using Isaac aligner (Illumina). Variant calls and annotations by Macrogen were made using Isaac variant caller (Illumina), CNVSeg (Ivakhno et al. 2010) and Manta (Chen et al. 2016). Variant reports from Macrogen for SNP/INDEL and CNV were further annotated using ANNOVAR software (Wang et al. 2010). We selected SNP/INDELS within the DHMN1 locus using the vcfintersect program from vcflib (https://github.com/vcflib/vcflib). For in-depth investigation of CNV we performed copy number variation analysis based on read depth of WGS data using the software CNVNator (Abyzov et al. 2011) with analysis of chromosome 7 using a bin size of 250 bp. CNV were visualised using CNVNator and ROOT (http://root.cern.ch).

CNV analysis from WES

Retrospective CNV analysis of historic WES data from F-54 was carried out using the software ExomeDepth (Plagnol et al. 2012). In brief, WES was carried out by Axeq Technologies (Korea) using the TruSeq Exome Library Prep Kit (Illumina) and sequenced using the HiSeq 2000 Sequencer (Illumina) as 100 bp paired-end reads to 40× read depth. Sequencing reads were assembled to the Human Feb. 2009 (GRCh37/hg19) assembly using BWA-MEM (Li 2013) and bam files were sorted and indexed using Samtools (Li et al. 2009a). Each F-54 patient exome was compared to a panel of six control exomes matched from the same sequencing batch as per the ExomeDepth guidelines.

PCR and Sanger sequencing

The genotyping assay using multiplex PCR was carried out in a 10 μL reaction volume containing 1× MyTaq HS Red Mix, 10 ng DNA template, 4 pmol each of forward and reverse primer (A, C, D and F) and 8 pmol of the dual forward/reverse primer (B and E). Primer sequences (5′–3′) were as follows. Assay 1: Primer A, TCATGAACGCTGTCGAATTT; Primer B, TCCAAAACCAAACAAGGACA; Primer C, GTCACATGGTGAAGGCAGTC. Assay 2: Primer D, TGTCCTTTGAATTCATCCATGT; Primer E, TCAGCCTCTCAGGTTCAGG; Primer F, AACGGGGTCTCCAACAAATA. PCR cycling was performed using a Mastercycler pro S (Eppendorf) and a touchdown protocol consisting of: initial denaturation at 95 °C for 5 min; 11 cycles of 95 °C for 30 s, 72 °C (reducing by 1.5 °C per cycle) for 30 s, and 72 °C for 30 s; 30 cycles of 95 °C for 30 s, 60 °C for 30 s, and 72 °C for 30 s; final extension at 72 °C for 5 min. PCR amplicons were prepared with standard PCR protocols. Samples amplifying both wild type (wt) and mutant PCR amplicons were gel extracted using the Isolate II PCR and gel kit (Bioline) and outsourced for sequence validation using the BigDye Terminator Cycle


Table 1 Rare nonsynonymous SNVs identified in the DHMN1 locus from WGS data after patient/control segregation analysis

Gene            Transcript ID   Nucleotide variation   Amino acid variation   Frequency 1000 Genomes   Frequency ExAC   dbSNP142
EPHB6           NM_004445       c.G364A                p.G122S                0.003                    0.007            rs8177173
GIMAP1–GIMAP5   NM_001199577    c.C499T                p.R167C                0.006                    0.006            rs9657892
TMEM176B        NM_014020       c.C538T                p.R180W                0.020                    0.033            rs17256042
ABP1            NM_001272072    c.C995T                p.S332F                0.053                    0.060            rs1049742
ASIC3           NM_004769       c.C865T                p.P289S                0.004                    0.009            rs114024820

Sequencing protocols at the ACRF Facility, Garvan Institute of Medical Research (Australia).

Agarose gel electrophoresis

Agarose gels [1.5 % (w/v)] were prepared in 1× TAE buffer with 6–10 % (v/v) SYBR Safe DNA gel stain. PCR products (3–5 μL) were size fractionated at 90 V for 40 min (40 V cm−1). DNA was visualised with a Safe Imager Transilluminator 2.0 (Invitrogen) and the gel image was recorded using a Canon PhotoShot S5-IS digital camera with Hoya O(G) filter.

Hi-C interactions

Hi-C interactions were examined with the Interactive Hi-C Data Browser (http://www.3dgenome.org) using data for the SK-N-DZ and SK-N-MC neuroblastoma cell lines. Hi-C data were available at a 40 kb resolution for these two cell lines.

Results

Whole genome sequencing in F-54

The genomes of three patients with DHMN1 (II:2, III:6 and III:11) and one unaffected married-in parent (II:5) from our previously published F-54 pedigree (Gopinath et al. 2007) were sequenced as 150 bp paired-end sequence reads with 118,065–126,354 Mb of total sequencing yield. Genome wide average mapped read depth was 32–35× for the four samples. Genome coverage for the four samples was 98.8–99.4 % at 1× read depth and 94.0–96.8 % at 20× read depth.

Analysis of SNV/INDELs in the DHMN1 locus using WGS

The WGS identified a total of 16,938–18,429 single nucleotide variants (SNVs) and 2378–2612 INDELs per sample within the DHMN1 locus. Based on their segregation pattern within the family WGS data, 2884 SNVs and 400 INDELs were selected. After filtering for rare variants with 1000 Genomes frequency of 0.05 or less, 199 SNVs and 146 INDELs remained. Of these, five variants had a putative pathogenic function (Table 1). However, these SNVs were all reported in the dbSNP142 and ExAC databases in multiple populations, making them too common and unlikely to be the pathogenic mutation in F-54.

Analysis of CNV in the DHMN1 locus using WGS data

Based on the CNV analysis using CNVSeg, a total of four CNV localising within the DHMN1 locus (Table S1) were identified in the three affected and one unaffected individuals. No CNV was common to all three affected individuals. Based on the CNV analysis using CNVNator, we identified 55–75 deletions and 15–18 duplications per person within the DHMN1 locus. Segregation analysis of the CNVs identified one duplication and three deletions common to the affected individuals and absent from the control (Table S2). All but one of the duplications and deletions were identified by both CNVSeg and CNVNator. The CNVNator analysis found two of the four CNVs identified by previous aCGH analysis (Drew 2012). The two CNV that were not identified by CNVNator (1.1 and 2.6 kb) were smaller than the CNVs identified (178 and 5.2 kb) and may be below the limit of detection with the 150 bp sequencing reads used here (Abyzov et al. 2011).

WGS identifies a novel structural variation in the chromosome 7q34–q36.2 disease locus

Genomic SVs within the DHMN1 locus that were identified by Manta software were examined as priority disease candidates. We identified nine SVs that were present in all three affected individuals and absent in the unaffected control individual (Table 2). Seven of these SVs were previously reported in the Database of Genomic Variants [DGV (MacDonald et al. 2014)] and considered to be nonpathogenic polymorphisms. The remaining two SVs (inversions 7 and 8, Table 2) had overlapping genomic coordinates and proximal break points adjacent to each other. We therefore predicted the SV calls were the result of different


Table 2 Summary of SV identified by Manta software within the 7q34–q36.2 DHMN1 locus

ID   SV type       Start position   End position   SV length (bp)   Present in DGV
1    Deletion      144,770,355      144,775,177    4822             Yes
2    Deletion      147,355,064      147,355,378    314              Yes
3    Deletion      150,463,049      150,463,390    341              Yes
4    Inversion     151,010,035      151,012,105    2070             Yes
5    Inversion     151,010,436      151,012,102    1666             Yes
6    Deletion      153,064,510      153,064,562    52               Yes
7    Inversion^a   153,333,424      156,992,436    3,659,012        No
8    Inversion^a   153,334,580      155,649,164    2,314,584        No
9    Deletion      153,756,228      153,759,636    3408             Yes

^a Novel unreported SV
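The way the candidate list in Table 2 narrows to a single event can be sketched in a few lines of Python. This is an illustration only, not the pipeline used in the study: the `SV` record and the helpers `novel_calls` and `overlaps` are names introduced here, and the coordinates are copied from Table 2. The sketch keeps the calls that are absent from DGV, then checks that the two remaining inversion calls overlap and that their proximal break points lie close together.

```python
from typing import NamedTuple


class SV(NamedTuple):
    sv_id: int
    sv_type: str
    start: int
    end: int
    in_dgv: bool


# Rows copied from Table 2 (chromosome 7, GRCh37/hg19 coordinates).
TABLE_2 = [
    SV(1, "deletion", 144_770_355, 144_775_177, True),
    SV(2, "deletion", 147_355_064, 147_355_378, True),
    SV(3, "deletion", 150_463_049, 150_463_390, True),
    SV(4, "inversion", 151_010_035, 151_012_105, True),
    SV(5, "inversion", 151_010_436, 151_012_102, True),
    SV(6, "deletion", 153_064_510, 153_064_562, True),
    SV(7, "inversion", 153_333_424, 156_992_436, False),
    SV(8, "inversion", 153_334_580, 155_649_164, False),
    SV(9, "deletion", 153_756_228, 153_759_636, True),
]


def novel_calls(calls):
    """Keep only the calls that are not reported in DGV."""
    return [sv for sv in calls if not sv.in_dgv]


def overlaps(a, b):
    """True if two closed intervals share at least one position."""
    return a.start <= b.end and b.start <= a.end


novel = novel_calls(TABLE_2)
first, second = novel

print([sv.sv_id for sv in novel])   # [7, 8]
print(overlaps(first, second))      # True
print(second.start - first.start)   # 1156 bp between proximal break points
```

Two overlapping calls whose proximal break points sit roughly 1.2 kb apart are consistent with one underlying event that was annotated twice, which is the interpretation tested with split and discordant reads.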

Table 3 Split-reads and discordant read pairs identified from F-54 WGS data. The split-reads located at the DHMN1 insertion break points were counted

Individual          Read type                     Sequence read number
                                                  Proximal breakpoint   Distal breakpoint
II:2 (affected)     Split-reads (of read total)   4 (37)                14 (35)
                    Discordant read pairs         6                     4
III:6 (affected)    Split-reads (of read total)   5 (34)                10 (39)
                    Discordant read pairs         6                     11
III:11 (affected)   Split-reads (of read total)   0 (32)                12 (35)
                    Discordant read pairs         10                    9
II:5 (unaffected)   Split-reads (of read total)   0 (43)                0 (44)
                    Discordant read pairs         0                     0

annotations of the same SV event. Both inversions 7 and 8 spanned the distal end of the disease locus.

Split and discordant reads identify an insertion into the DHMN1 locus

Sequence reads flanking the predicted inversion break points were assessed for discordant paired-end reads and split-reads to identify and interpret the correct SV structure. Split-reads and discordant read pairs were identified in the three affected individuals and absent in the unaffected control (Table 3). Manual annotation of inversions 7 and 8 confirmed they were the same SV event. The reads predicted that a nonreciprocal intra-chromosomal translocation of a 1.35 Mb DNA fragment into the distal end of the DHMN1 locus had occurred (Fig. 1a). The source of the inserted sequence is located 2.3 Mb distal to the disease locus at chromosome 7q36.3 and was inserted in the reverse orientation. A small 1.15 kb deletion (chr7:g.153333424_153334580del) occurs at the insertion site. The translocated DNA is duplicated from chr7:g.155649165_156992436 and inserted into the 1.15 kb deletion site. We will refer to this genomic arrangement as the DHMN1 insertion. The orientation and distance between the discordant paired-end reads supported the genomic orientation, location and source of the 1.35 Mb DNA insertion for a novel SV within the DHMN1 locus.

Sanger sequencing confirms the SV breakpoints of the 1.35 Mb DHMN1 insertion

PCR assays were designed to detect the proximal and distal ends of the DHMN1 insertion in patients. Each assay was a multiplex PCR using three primers which co-amplified the wild type and mutant sequences containing the insertion (Fig. 1a, b). Sanger sequencing of the translocation break points confirmed the genomic re-arrangement predicted from the WGS read alignments (Fig. 2a). The Sanger sequencing also showed extra bases of DNA inserted at each end of the breakpoint: 13 bases at the proximal end and 21 bases at the distal end (Fig. 2a). The source of these extra DNA sequences is unclear as each one had multiple chromosomal alignments.

The DHMN1 insertion segregates with the disease in family F-54 and is absent in neurologically normal controls

Both breakpoint PCR assays were used to test segregation of the DHMN1 insertion with the disease phenotype in


Fig. 1 Split and discordant reads identify a novel DNA insertion into the DHMN1 disease locus on chromosome 7q34–q36.2 caused by an intra-chromosomal translocation. a Schematic overview of insertion site at chromosome 7q36.2 with wild type (top) and mutant (bottom) alleles shown. The location of primers for PCR assay 1 (distal breakpoint) and assay 2 (proximal breakpoint) are displayed. Assay 1 primer A is deleted by the SV event, indicated by a crossed out primer for the mutant allele. The translocated region is depicted in orange, dotted red lines indicate the breakpoints. b PCR genotyping assays amplify both wild type sequence and mutant sequences spanning the proximal and distal breakpoints. Predicted amplicon sizes for unaffected (WT) and affected (Aff) individuals are depicted below the gel image. These assays confirmed segregation of the translocation insertion with the disease in family F-54 and excluded the SV genomic re-arrangement in control chromosomes

family F-54. The pedigree for family F-54 is unchanged from our previous work (Gopinath et al. 2007). The SV was present in all ten DHMN1 patients and absent in the 15 unaffected individuals in the family (Fig. S1). This confirmed the SV is carried on the chromosome 7q34–q36.2 DHMN1 disease haplotype previously defined by linkage analysis (Gopinath et al. 2007). A panel of 1054 neurologically normal control chromosomes (527 individuals) was also screened for the insertion breakpoints and only the wild type genotype was amplified. Therefore, the DHMN1 insertion into the disease locus is unique to family F-54.

The DHMN1 insertion changes the genomic landscape of the DHMN1 locus

The 1.35 Mb DHMN1 insertion leads to the duplication of five protein-coding genes (LOC389602, RNF32, LMBR1, NOM1, MNX1), partial duplication of one gene (UBE3C) and duplication of three long non-coding RNAs (LOC285889, LINC01006, MNX1-AS1) (Fig. 2b). Visual examination of ENCODE data using the UCSC Genome Browser shows the DHMN1 insertion also includes enhancers and promoters (Fig. 2b). Several of these are long range enhancer motifs for the SHH gene, the first gene adjacent to the donor region at 7q36.3, located between the RNF32 gene and the distal breakpoint (Anderson et al. 2014). The insertion site within the DHMN1 locus is a gene desert located between the genes DPP6 and ACTR3B (Fig. 2b). No genes within the DHMN1 locus are directly disrupted by the DHMN1 insertion. DPP6 is the closest gene, located approximately 250 kb distal to the insertion site, whilst ACTR3B is located approximately 775 kb proximal to the insertion site. The closest annotated genomic feature is the long non-coding RNA LINC01287, which is 225 kb proximal to the insertion site on the negative strand. Based on the Hi-C data from two neuroblastoma cell lines, the DHMN1 insertion site appears to be within a topological domain (Fig. S2). The inserted sequence has a high interaction frequency and is likely to greatly increase the topological domain size at the insertion site.

[Fig. 2, panel a: base calls recovered from the breakpoint sequence image]
CCATAGTAGGTTTAGGTGCGGGCTCCTAGCACTTTGGGAGGC
ATGCTTCCAGAGTATGGCTGTAAGTGACGTCTGCTAGTATT

Fig. 2 a Top panel: sequence alignment of a DHMN1 patient sequence with the chromosome 7q36.2 (insertion site; black text) and the chromosome 7q36.3 (donor sequence; orange text) at the proximal and distal ends of the insertion. Blue text indicates additional nucleotide bases of unknown origin that make up the proximal and distal breakpoints. Lower panel: Sanger sequencing traces spanning the proximal and distal end of the DHMN1 insertion showing the breakpoints depicted above. b Schematic diagram showing the gene re-arrangement due to the DHMN1 insertion into chromosome 7q36.2 (based on UCSC Genome Browser; http://genome.ucsc.edu/) (Kent et al. 2002). Top panel: genes (full height exon symbols) and non-coding RNAs (half height exon symbols) are coloured black for chromosome 7q36.2 or orange for the DHMN1 insertion. Arrows indicate the direction of gene transcription. Lower panels: ENCODE tracks (H3K27Ac Mark, DNaseI Hypersensitivity clusters and Transcription Factor ChIP-seq) showing evidence for active enhancers within the DHMN1 insertion and flanking regions (Rosenbloom et al. 2013)

[Fig. 3: four panels, each plotting total read depth (y axis) against chromosome 7 position from 154.5 to 157 × 10^6 (x axis)]

Fig. 3 CNV analysis across the chromosome 7q36.3 donor site of the DHMN1 insertion using CNVNator. The total read depth is plotted against chromosome 7 position (black) and normalised read depth is indicated by the green lines. Three affected individuals have an increase in read depth corresponding to an additional copy of the DHMN1 insertion (chr7:g.155649165_156992436) compared to the unaffected control. The DHMN1 disease locus is located approximately 2.3 Mb proximal to the duplicated region
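The read-depth reasoning behind Fig. 3 can be illustrated with a toy calculation. This is a minimal sketch and not the CNVNator algorithm: it assumes a diploid baseline and estimates copy number as twice the ratio of mean depth inside the region to mean depth in flanking bins. The per-bin depth values are invented for illustration, and `estimate_copy_number` is a hypothetical helper.

```python
from statistics import mean


def estimate_copy_number(region_depths, flank_depths, ploidy=2):
    """Scale the baseline ploidy by the region-to-flank depth ratio."""
    ratio = mean(region_depths) / mean(flank_depths)
    return ploidy * ratio


# Invented per-bin read depths, chosen only to illustrate the arithmetic.
flank = [33, 35, 32, 34]   # bins outside the donor interval
region = [50, 49, 52, 50]  # bins inside the donor interval

copy_number = estimate_copy_number(region, flank)
print(round(copy_number))  # 3
```

With these numbers the depth ratio is 1.5, which corresponds to three copies on a diploid background: two at the intact donor site on each chromosome 7 and one additional inserted copy.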

Copy number variation analysis confirms the source 7q36.3 DNA is duplicated

As the genomic location of the DHMN1 insertion source (chromosome 7q36.3) was located outside the disease locus, we predicted the DHMN1 insertion would be stable within the germline of family members in F-54. Duplication of the donor region was not identified by the Manta software, and this may have occurred because the donor region was included in the incorrect inversion call (Table 2). Additionally, the duplicated genes were not identified by CNVSeg software (Table S1). Therefore, we examined the copy number of the donor region based on the read depth of WGS data using the software CNVNator (Fig. 3). CNVNator was selected as the program is specifically designed to predict copy number variation rather than other classes of SV. The analysis showed there are three copies of the translocated 1.35 Mb interval. This suggests the donor region at 7q36.3 is intact. We found CNVNator to be more sensitive than Manta and CNVSeg for detecting the duplicated region.

We retrospectively examined our whole exome sequencing data from F-54 using the R software package ExomeDepth. The original WES data from Drew (2012) did not have suitable controls available for this analysis; however, alternative WES data for five patients from F-54 were obtained and duplication of some of the genes was identified (Table S3). The RNF32 and LMBR1 genes were found to be duplicated (three copies) in all five patients while NOM1 and UBE3C were duplicated in four of the five patients. The genes MNX1 and LOC389602 were not found to be duplicated. The MNX1 gene was poorly covered by WES and LOC389602 was not targeted by the exome capture kit used for library construction.

Discussion

We have identified a novel SV resulting in the insertion of a 1.35 Mb DNA fragment into the DHMN1 disease locus on chromosome 7q36.2. This SV is likely to have arisen from an intra-chromosomal translocation of

chromosome 7q36.3 into chromosome 7q36.2. Through historic homologous recombination in this family, only the DHMN1 insertion site is linked to the disease, with the donor site of the inserted DNA on chromosome 7q36.3 being normal in patients. No pathogenic SNVs, INDELs or CNVs within the disease locus were identified using WGS, thereby confirming our previous work using Sanger sequencing, WES, and array CGH (Drew 2012). Therefore, this novel SV is the most likely cause of disease within the chromosome 7 DHMN1 locus. The novel DHMN1 insertion expands the spectrum of genetic mutations causing dHMN, which up until now have included point mutations or small CNVs affecting the coding regions of genes (Drew et al. 2011; Rossor et al. 2012). Whilst the DNA re-arrangement has been identified in family F-54, we do not currently know which gene(s) are directly affected by this SV.

Unlike dHMN, other inherited peripheral neuropathies (IPN) are known to be caused by structural variation. The most common cause of Charcot-Marie-Tooth disease (CMT), a motor and sensory peripheral neuropathy, is the duplication of a 1.5 Mb genomic region on chromosome 17p11.2 (CMT1A; OMIM #118220) (Lupski et al. 1991). The cause of disease is a dosage effect due to the duplication of the PMP22 gene (Matsunami et al. 1992; Patel et al. 1992; Timmerman et al. 1992; Valentijn et al. 1992). The reciprocal deletion of 17p11.2 causes hereditary neuropathy with liability to pressure palsies (HNPP; OMIM #162500) (Chance et al. 1993) through monosomy of the PMP22 gene. Apart from the CMT1A and HNPP duplication/deletion, the contribution of SV to the disease burden of IPN remains relatively unstudied compared to the role of coding point mutations.

The genomic arrangement of the 7q34–q36.2 DHMN1 disease locus is now very different to the normal genome. We postulate that the 1.35 Mb DHMN1 insertion may cause disease by three possible disease mechanisms: (1) trisomy of the intact genes introduced by the insertion leading to altered gene dosage; (2) transcriptional dysregulation of the genes nearby the insertion by either a position effect or inserted regulatory elements; or (3) a fusion between the partial transcript UBE3C and an adjacent gene on chromosome 7q36.2.

A gene dosage effect, with over expression caused by trisomy of the genes introduced into the DHMN1 disease locus, is one potential mechanism of disease. This mechanism would be similar to the disease mechanism in CMT1A, with the duplication of PMP22 causing a gene dosage effect. Several of the five intact genes present in the inserted DNA fragment are normally expressed in neuronal tissues including brain and spinal cord, and therefore represent plausible candidates for a gene dosage mechanism (Wu et al. 2009).

A change in gene expression is also possible through a position effect, where the DHMN1 insertion alters the position of a nearby gene relative to its long range promoter/enhancer element(s). The closest gene to the DHMN1 insertion is DPP6, which is orientated in a direction that is likely to be disrupted by the 1.35 Mb insertion. DPP6 is strongly expressed in the brain and spinal cord, which are tissues relevant to the DHMN1 phenotype. Additionally, in genome wide association studies of an Irish population, DPP6 was associated with susceptibility to sporadic amyotrophic lateral sclerosis, a fatal form of motor neuron disease (Chen et al. 2012; Cronin et al. 2009; Fogh et al. 2011; Li et al. 2009b), and therefore is a high priority candidate gene for DHMN1.

Ectopic expression of genes within the disease locus is also a plausible mechanism. The DHMN1 insertion contains regulatory elements for the five intact genes and one partial gene. The inserted DNA also contains a large genomic region containing elements known to regulate expression of the SHH gene (Anderson et al. 2014). This region accounts for approximately half of the DHMN1 insertion. It is possible that these regulatory elements may be incorrectly driving neuronal expression of genes in the motor neurons of DHMN1 patients. Hi-C data suggest the DHMN1 insertion could possibly alter the topological domain structure, which may influence expression of nearby genes (ACTR3B and DPP6).

A fusion protein between the partial transcript UBE3C and a gene on chromosome 7q36.2 is a possible disease mechanism. The inserted UBE3C partial transcript would be transcribed from the negative strand. The nearest gene on the negative strand of chromosome 7q36.2 (XRCC2) is located 950 kb from the insertion site, and this genomic distance greatly exceeds the length of introns present in the human genome (Sakharkar et al. 2004). Therefore, this makes aberrant gene splicing between UBE3C and XRCC2 unlikely. A new UBE3C transcript may be formed by fusion with non-coding regions if the transcript does not undergo nonsense-mediated mRNA decay (Frischmeyer and Dietz 1999).

The size of the DHMN1 insertion makes the study of gene expression or other functional effects challenging. Gene expression is often tissue specific and access to the appropriate tissues for analysis can be limiting. Patient spinal cord can only be obtained from post-mortem tissue, so functional studies of neuropathy gene mutations are often performed using lymphoblasts (Zimon et al. 2012), fibroblasts (Echaniz-Laguna et al. 2013; Kennerson et al. 2010) or sural nerve biopsies (Higuchi et al. 2016). The DHMN1 insertion makes traditional mammalian expression systems unsuitable, as the genomic re-arrangement is physically too large for current cloning expression systems. Many of the candidate genes are highly expressed in brain

and spinal cord and not expressed in fibroblasts and lymphoblasts. Therefore, it is likely that studies using induced pluripotent stem cell (iPSC) derived motor neurons from patient and control fibroblasts will be required to identify the causative gene underlying the DHMN1 phenotype. Generating iPSC derived motor neurons will facilitate the identification of the disease gene by enabling the analysis of gene expression in a relevant tissue that maintains the disease-associated genetic background of the DHMN1 patients (Saporta et al. 2015).

Research into SV as a disease mechanism in IPN has been limited by the available techniques. WGS is a powerful tool that allows SV to be robustly analysed alongside SNVs/INDELs as a cause of disease. Small nuclear families with multiple affected members across several generations, in which genome wide coding mutations have been excluded, will be suitable unsolved candidate families to target for this type of analysis. With up to 33 % of peripheral neuropathy cases unsolved after testing for mutations in known genes (Saporta et al. 2011), SV is likely to be a more common cause of disease than currently realised.

In conclusion, we have identified a 1.35 Mb insertion into the DHMN1 locus as the likely genetic mutation causing disease in family F-54. However, due to the size and position of the inserted DNA fragment, the gene(s) affected that cause the disease remain unknown. Due to the tissue specificity of the candidate genes and the size of the SV involved, the use of a stem cell derived neuronal model will be necessary to further elucidate the disease mechanism. Our finding represents a new genetic cause for hereditary motor neuropathies and highlights the growing importance of interrogating the non-coding genome for SV mutations in families which have been excluded for genome wide coding mutations.

Acknowledgments We gratefully acknowledge the participation and contribution of family members throughout this study. This study was supported by an Australian National Health and Medical Research Project Grant (APP104668) awarded to M.L.K. and G.A.N.

References

Abyzov A, Urban AE, Snyder M, Gerstein M (2011) CNVnator: an approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing. Genome Res 21:974–984. doi:10.1101/gr.114876.110
Anderson E, Devenney PS, Hill RE, Lettice LA (2014) Mapping the Shh long-range regulatory domain. Development 141:3934–3943. doi:10.1242/dev.108480
Chance PF, Alderson MK, Leppig KA, Lensch MW, Matsunami N, Smith B, Swanson PD, Odelberg SJ, Disteche CM, Bird TD (1993) DNA deletion associated with hereditary neuropathy with liability to pressure palsies. Cell 72:143–151
Chen Y, Zeng Y, Huang R, Yang Y, Chen K, Song W, Zhao B, Li J, Yuan L, Shang HF (2012) No association of five candidate genetic variants with amyotrophic lateral sclerosis in a Chinese population. Neurobiol Aging 33(2721):e3–e5. doi:10.1016/j.neurobiolaging.2012.06.004
Chen X, Schulz-Trieglaff O, Shaw R, Barnes B, Schlesinger F, Kallberg M, Cox AJ, Kruglyak S, Saunders CT (2016) Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications. Bioinformatics 32:1220–1222. doi:10.1093/bioinformatics/btv710
Cronin S, Tomik B, Bradley DG, Slowik A, Hardiman O (2009) Screening for replication of genome-wide SNP associations in sporadic ALS. Eur J Hum Genet 17:213–218. doi:10.1038/ejhg.2008.194
Drew AP (2012) Genetics of distal hereditary motor neuropathies. PhD Thesis, University of Sydney
Drew AP, Blair IP, Nicholson GA (2011) Molecular genetics and mechanisms of disease in distal hereditary motor neuropathies: insights directing future genetic studies. Curr Mol Med 11:650–665
Echaniz-Laguna A, Ghezzi D, Chassagne M, Mayencon M, Padet S, Melchionda L, Rouvet I, Lannes B, Bozon D, Latour P, Zeviani M, Mousson de Camaret B (2013) SURF1 deficiency causes demyelinating Charcot-Marie-Tooth disease. Neurology 81:1523–1530. doi:10.1212/WNL.0b013e3182a4a518
Fogh I, D’Alfonso S, Gellera C, Ratti A, Cereda C, Penco S, Corrado L, Soraru G, Castellotti B, Tiloca C, Gagliardi S, Cozzi L, Lupton MK, Ticozzi N, Mazzini L, Shaw CE, Al-Chalabi A, Powell J, Silani V (2011) No association of DPP6 with amyotrophic lateral sclerosis in an Italian population. Neurobiol Aging 32:966–967. doi:10.1016/j.neurobiolaging.2009.05.014
Frischmeyer PA, Dietz HC (1999) Nonsense-mediated mRNA decay in health and disease. Hum Mol Genet 8:1893–1900
Gopinath S, Blair IP, Kennerson ML, Durnall JC, Nicholson GA (2007) A novel locus for distal motor neuron degeneration maps to chromosome 7q34–q36. Hum Genet 121:559–564. doi:10.1007/s00439-007-0348-9
Higuchi Y, Hashiguchi A, Yuan J, Yoshimura A, Mitsui J, Ishiura H, Tanaka M, Ishihara S, Tanabe H, Nozuma S, Okamoto Y, Matsuura E, Ohkubo R, Inamizu S, Shiraishi W, Yamasaki R, Ohyagi Y, Kira JI, Oya Y, Yabe H, Nishikawa N, Tobisawa S, Matsuda N, Masuda M, Kugimoto C, Fukushima K, Yano S, Yoshimura J, Doi K, Nakagawa M, Morishita S, Tsuji S, Takashima H (2016) Mutations in MME cause an autosomal-recessive Charcot-Marie-Tooth disease type 2. Ann Neurol. doi:10.1002/ana.24612
Ivakhno S, Royce T, Cox AJ, Evers DJ, Cheetham RK, Tavare S (2010) CNAseg–a novel framework for identification of copy number changes in cancer from second-generation sequencing data. Bioinformatics 26:3051–3058. doi:10.1093/bioinformatics/btq587
Kennerson ML, Nicholson GA, Kaler SG, Kowalski B, Mercer JF, Tang J, Llanos RM, Chu S, Takata RI, Speck-Martins CE, Baets J, Almeida-Souza L, Fischer D, Timmerman V, Taylor PE, Scherer SS, Ferguson TA, Bird TD, De Jonghe P, Feely SM, Shy ME, Garbern JY (2010) Missense mutations in the copper transporter gene ATP7A cause X-linked distal hereditary motor neuropathy. Am J Hum Genet 86:343–352. doi:10.1016/j.ajhg.2010.01.027
Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D (2002) The human genome browser at UCSC. Genome Res 12:996–1006. doi:10.1101/gr.229102 (Article published online before print in May)
Li H (2013) Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv:1303.3997v2 (q-bio.GN)
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R (2009a) The sequence alignment/map format and SAMtools. Bioinformatics 25:2078–2079. doi:10.1093/bioinformatics/btp352


Li XG, Zhang JH, Xie MQ, Liu MS, Li BH, Zhao YH, Ren HT, Cui LY (2009b) Association between DPP6 polymorphism and the risk of sporadic amyotrophic lateral sclerosis in Chinese patients. Chin Med J (Engl) 122:2989–2992
Lupski JR, de Oca-Luna RM, Slaugenhaupt S, Pentao L, Guzzetta V, Trask BJ, Saucedo-Cardenas O, Barker DF, Killian JM, Garcia CA, Chakravarti A, Patel PI (1991) DNA duplication associated with Charcot-Marie-Tooth disease type 1A. Cell 66:219–232
MacDonald JR, Ziman R, Yuen RK, Feuk L, Scherer SW (2014) The database of genomic variants: a curated collection of structural variation in the human genome. Nucleic Acids Res 42:D986–D992. doi:10.1093/nar/gkt958
Matsunami N, Smith B, Ballard L, Lensch MW, Robertson M, Albertsen H, Hanemann CO, Muller HW, Bird TD, White R et al (1992) Peripheral myelin protein-22 gene maps in the duplication in chromosome 17p11.2 associated with Charcot-Marie-Tooth 1A. Nat Genet 1:176–179. doi:10.1038/ng0692-176
Patel PI, Roa BB, Welcher AA, Schoener-Scott R, Trask BJ, Pentao L, Snipes GJ, Garcia CA, Francke U, Shooter EM, Lupski JR, Suter U (1992) The gene for the peripheral myelin protein PMP-22 is a candidate for Charcot-Marie-Tooth disease type 1A. Nat Genet 1:159–165. doi:10.1038/ng0692-159
Plagnol V, Curtis J, Epstein M, Mok KY, Stebbings E, Grigoriadou S, Wood NW, Hambleton S, Burns SO, Thrasher AJ, Kumararatne D, Doffinger R, Nejentsev S (2012) A robust model for read count data in exome sequencing experiments and implications for copy number variant calling. Bioinformatics 28:2747–2754. doi:10.1093/bioinformatics/bts526
Rosenbloom KR, Sloan CA, Malladi VS, Dreszer TR, Learned K, Kirkup VM, Wong MC, Maddren M, Fang R, Heitner SG, Lee BT, Barber GP, Harte RA, Diekhans M, Long JC, Wilder SP, Zweig AS, Karolchik D, Kuhn RM, Haussler D, Kent WJ (2013) ENCODE data in the UCSC Genome Browser: year 5 update. Nucleic Acids Res 41:D56–D63. doi:10.1093/nar/gks1172
Rossor AM, Kalmar B, Greensmith L, Reilly MM (2012) The distal hereditary motor neuropathies. J Neurol Neurosurg Psychiatry 83:6–14. doi:10.1136/jnnp-2011-300952
Sakharkar MK, Chow VT, Kangueane P (2004) Distributions of exons and introns in the human genome. In Silico Biol 4:387–393
Saporta AS, Sottile SL, Miller LJ, Feely SM, Siskind CE, Shy ME (2011) Charcot-Marie-Tooth disease subtypes and genetic testing strategies. Ann Neurol 69:22–33. doi:10.1002/ana.22166
Saporta MA, Dang V, Volfson D, Zou B, Xie XS, Adebola A, Liem RK, Shy M, Dimos JT (2015) Axonal Charcot-Marie-Tooth disease patient-derived motor neurons demonstrate disease-specific phenotypes including abnormal electrophysiological properties. Exp Neurol 263:190–199. doi:10.1016/j.expneurol.2014.10.005
Timmerman V, Nelis E, Van Hul W, Nieuwenhuijsen BW, Chen KL, Wang S, Ben Othman K, Cullen B, Leach RJ, Hanemann CO et al (1992) The peripheral myelin protein gene PMP-22 is contained within the Charcot-Marie-Tooth disease type 1A duplication. Nat Genet 1:171–175. doi:10.1038/ng0692-171
Valentijn LJ, Baas F, Wolterman RA, Hoogendijk JE, van den Bosch NH, Zorn I, Gabreels-Festen AW, de Visser M, Bolhuis PA (1992) Identical point mutations of PMP-22 in Trembler-J mouse and Charcot-Marie-Tooth disease type 1A. Nat Genet 2:288–291. doi:10.1038/ng1292-288
Wang K, Li M, Hakonarson H (2010) ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res 38:e164. doi:10.1093/nar/gkq603
Wu C, Orozco C, Boyer J, Leglise M, Goodale J, Batalov S, Hodge CL, Haase J, Janes J, Huss JW 3rd, Su AI (2009) BioGPS: an extensible and customizable portal for querying and organizing gene annotation resources. Genome Biol 10:R130. doi:10.1186/gb-2009-10-11-r130
Zimon M, Baets J, Almeida-Souza L, De Vriendt E, Nikodinovic J, Parman Y, Battaloglu E, Matur Z, Guergueltcheva V, Tournev I, Auer-Grumbach M, De Rijk P, Petersen BS, Muller T, Fransen E, Van Damme P, Loscher WN, Barisic N, Mitrovic Z, Previtali SC, Topaloglu H, Bernert G, Beleza-Meireles A, Todorovic S, Savic-Pavicevic D, Ishpekova B, Lechner S, Peeters K, Ooms T, Hahn AF, Zuchner S, Timmerman V, Van Dijck P, Rasic VM, Janecke AR, De Jonghe P, Jordanova A (2012) Loss-of-function mutations in HINT1 cause axonal neuropathy with neuromyotonia. Nat Genet 44:1080–1083. doi:10.1038/ng.2406
