The Role of Small In-Frame Insertions/Deletions in Inherited Eye
Total Page:16
File Type:pdf, Size:1020Kb
Sergouniotis et al. Orphanet Journal of Rare Diseases (2016) 11:125 DOI 10.1186/s13023-016-0505-0 RESEARCH Open Access The role of small in-frame insertions/ deletions in inherited eye disorders and how structural modelling can help estimate their pathogenicity Panagiotis I. Sergouniotis1,2, Stephanie J. Barton3, Sarah Waller3, Rahat Perveen3, Jamie M. Ellingford2,3, Christopher Campbell3, Georgina Hall3, Rachel L. Gillespie2,3, Sanjeev S. Bhaskar3, Simon C. Ramsden3, Graeme C. Black2,3* and Simon C. Lovell4 Abstract Background: Although the majority of small in-frame insertions/deletions (indels) has no or little effect on protein function, a subset of these changes has been causally associated with genetic disorders. Notably, the molecular mechanisms and frequency by which they give rise to disease phenotypes remain largely unknown. The aim of this study is to provide insights into the role of in-frame indels (≤21 nucleotides) in two genetically heterogeneous eye disorders. Results: One hundred eighty-one probands with childhood cataracts and 486 probands with retinal dystrophy underwent multigene panel testing in a clinical diagnostic laboratory. In-frame indels were collected and evaluated both clinically and in silico. Variants that could be modeled in the context of protein structure were identified and analysed using integrative structural modeling. Overall, 55 small in-frame indels were detected in 112 of 667 probands (16.8 %); 17 of these changes were novel to this study and 18 variants were reported clinically. A reliable model of the corresponding protein sequence could be generated for 8 variants. Structural modeling indicated a diverse range of molecular mechanisms of disease including disruption of secondary and tertiary protein structure and alteration of protein-DNA binding sites. Conclusions: In childhood cataract and retinal dystrophy subjects, one small in-frame indel is clinically reported in every ~37 individuals tested. The clinical utility of computational tools evaluating these changes increases when the full complexity of the involved molecular mechanisms is embraced. Keywords: Inherited eye disease, Retinal dystrophy, Childhood cataract, In-frame insertions/deletions, Homology modeling Background affecting gene expression [2]. A number of computational Small insertions/deletions (indels) are the second most tools that functionally annotate indels are available includ- abundant form of human genetic variation after single ing SIFT-indel [3], PROVEAN [4], DDG-in [5], CADD nucleotide variants (SNVs) [1]. These DNA changes [6], PriVar [7], PinPor [2], HMMvar [8], KD4i [9], and can influence gene products through multiple mecha- VEST-indel [10]. Although some of these tools are re- nisms, including altering amino acid sequence and ported to achieve relatively high sensitivity and specificity values [10], predicting the effect of protein-coding (frame- * Correspondence: [email protected] shifting, in-frame) and non-protein-coding indels in the 2Centre for Ophthalmology & Vision Sciences, University of Manchester, Manchester, UK clinical setting remains a formidable challenge [11]. 3Manchester Centre for Genomic Medicine, St Mary’s Hospital, Manchester Inherited eye disorders such as childhood cataracts Academic Health Sciences Centre, Manchester, UK (CC) and retinal dystrophies (RD) are a major cause of Full list of author information is available at the end of the article © The Author(s). 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Sergouniotis et al. Orphanet Journal of Rare Diseases (2016) 11:125 Page 2 of 8 blindness among children and working-age adults [12, phenotypic information was available for subjects re- 13]. Over the past decades, exciting progress has been ferred from the Central Manchester University Hospi- made in elucidating the genetic basis of these disorders. tals, Manchester, UK. Ethics committee approval was Hundreds of disease-causing genes have been identified obtained from the North West Research Ethics Commit- leading to the development of diagnostic tests that are tee (11/NW/0421 and 15/YH/0365) and all investigations now regularly used in clinical practice [14, 15]. The pre- were conducted in accordance to the tenets of the Declar- ferred testing method at present is panel-based genetic ation of Helsinki. diagnostic testing [16], although whole genome sequen- cing is increasingly being used in the clinical domain Genetic and bioinformatic analysis [17]. For these tests to have the greatest medical impact, Testing and analysis were performed at the Manchester it is necessary to be able to pinpoint the disease-causing Regional Genetic Laboratory Service, a United Kingdom variant(s) among the considerable background of de- Accreditation Service (UKAS) - Clinical Pathology Accre- tected rare changes that might be potentially functional dited (CPA) medical laboratory (CPA number 4015). DNA but not actually responsible for the phenotype under samples were processed using Agilent SureSelect (Agilent investigation [18]. Guidelines for assigning clinical sig- Technologies, Santa, Clara, CA, USA) target enrichment nificance to sequence variants have been developed [19] kits designed to capture all exons and 5 base pairs (bp) of and it is clear that, among protein-coding changes, in- the flanking intronic sequence of either frame indels present a unique challenge. When the phenotypic relevance of a protein-coding (i) 114 genes associated with CC and/or anterior variant is investigated, knowledge of the structure and segment developmental anomalies [14]or biochemistry of the associated protein can be very (ii)176 genes associated with RD. useful. Unfortunately, due to limitations of mainstream structural biology techniques (X-ray crystallography The genes were selected after interrogating publically [XRC], nuclear magnetic resonance [NMR], 3D elec- available databases (http://cat-map.wustl.edu and http:// tron microscopy [3DEM]), experimentally determined sph.uth.edu/retnet/) and the literature. A list of all the structures are available for only a small proportion of tested transcripts/genes can be found in Additional file proteins [20]. Recently, computational methods have 1: Table S1. been used to generate reliable structural models based After enrichment, the samples were sequenced on an on complementary experimental data and theoretic Illumina HiSeq 2500 system (Illumina Inc, San Diego, information [21]. Such integrative modeling approaches CA, USA; 100 bp paired-end reads) according to the can be utilised to evaluate protein-coding variants in manufacturer’s protocols. Sequence reads were subse- silico, on the basis of 3D structure and molecular quently demultiplexed using CASAVA v1.8.2 (Illumina dynamics [22]. Inc, San Diego, CA, USA) and aligned to the hg19 In this study, a variety of methods including integrative reference genome using the Burrows Wheeler Aligner modeling, are used to gain insights into the role of in- (BWA-short v0.6.2) [23]. Duplicate reads were removed frame indels in two genetically heterogeneous Mendelian using Samtools before base quality score recalibration disorders, CC and RD. Clinical genetic data (multigene and indel realignment using the Genome Analysis Tool panel testing) from 667 individuals are presented and 17 Kit (GATK-lite v2.0.39) [24]. The UnifiedGenotyper previously unreported in-frame indels are described. within GATK was used for SNV and indel discovery [25]; indels supported by <0.1 of the reads were discarded and Methods the quality metrics for keeping SNVs included read depth Clinical samples ≥50x and mean quality value (MQV) ≥45. Unrelated subjects with inherited eye disorders were Previous studies have shown that the number of indels retrospectively ascertained through the database of the called has a significant positive correlation with the Manchester Regional Genetic Laboratory Service, coverage depth [26–28]. Therefore, only samples in which Manchester, UK. Referrals were received between Oc- ≥99.5 % of the target region was covered to a minimum tober 2013 and December 2015 from multiple clinical depth of 50x were included. institutions in the UK and around the world, although Variant annotation and clinical variant interpretation a significant proportion of samples came from the was performed as previously described [14, 15]. North West of England. After obtaining informed con- Briefly, the Ensembl Variant Effect Predictor (VEP) sent from the affected individual/family, the referring was used to assign functional consequences to SNVs physician requested a multigene panel test. The reason and indels. Variants with allele frequency >1 % in in for referral was included in the clinical data completed large publically available datasets (National Heart, by the referring medical specialist. Extensive Lung, and Blood Institute Exome Sequencing Project Sergouniotis et al. Orphanet Journal of Rare Diseases (2016) 11:125 Page 3 of 8 − Exome Variant Server ESP6500 and dbSNP v135) were the