Improving the clinical interpretation of missense variants in X linked genes using structural analysis

Diagnostics. Original research.
J Med Genet: first published as 10.1136/jmedgenet-2020-107404 on 25 March 2021.

Shalaw Rassul Sallah,1,2 Jamie M Ellingford,1,2 Panagiotis I Sergouniotis,2 Simon C Ramsden,2 Nicholas Lench,3 Simon C Lovell,1 Graeme C Black1,2

1 Division of Evolution and Genomic Sciences, The University of Manchester Faculty of Biology, Medicine and Health, Manchester, UK
2 Manchester Centre for Genomic Medicine, St Mary’s Hospital, Manchester Academic Health Sciences Centre, Manchester, UK
3 Congenica Ltd, Biodata Innovation Centre, Wellcome Genome Campus, Hinxton, London, UK

Correspondence to Professor Graeme C Black; graeme.black@manchester.ac.uk

SCL and GCB are joint senior authors.

Additional material is published online only. To view please visit the journal online (http://dx.doi.org/10.1136/jmedgenet-2020-107404).

Received 19 August 2020
Revised 18 January 2021
Accepted 21 January 2021

© Author(s) (or their employer(s)) 2021. Re-use permitted under CC BY. Published by BMJ.

To cite: Sallah SR, Ellingford JM, Sergouniotis PI, et al. J Med Genet Epub ahead of print: [please include Day Month Year]. doi:10.1136/jmedgenet-2020-107404

Sallah SR, et al. J Med Genet 2021;0:1–8. doi:10.1136/jmedgenet-2020-107404

ABSTRACT

Background Improving the clinical interpretation of missense variants can increase the diagnostic yield of genomic testing and lead to personalised management strategies. Currently, due to the imprecision of bioinformatic tools that aim to predict variant pathogenicity, their role in clinical guidelines remains limited. There is a clear need for more accurate prediction algorithms and this study aims to improve performance by harnessing structural biology insights. The focus of this work is missense variants in a subset of genes associated with X linked disorders.

Methods We have developed a protein-specific variant interpreter (ProSper) that combines genetic and protein structural data. This algorithm predicts missense variant pathogenicity by applying machine learning approaches to the sequence and structural characteristics of variants.

Results ProSper outperformed seven previously described tools, including meta-predictors, in correctly evaluating whether or not variants are pathogenic; this was the case for 11 of the 21 genes associated with X linked disorders that met the inclusion criteria for this study. We also determined gene-specific pathogenicity thresholds that improved the performance of VEST4, REVEL and ClinPred, the three best-performing tools out of the seven that were evaluated; this was the case in 11, 11 and 12 different genes, respectively.

Conclusion ProSper can form the basis of a molecule-specific prediction tool that can be implemented into diagnostic strategies. It can allow the accurate prioritisation of missense variants associated with X linked disorders, aiding precise and timely diagnosis. In addition, we demonstrate that gene-specific pathogenicity thresholds for a range of missense prioritisation tools can lead to an increase in prediction accuracy.

INTRODUCTION

Advances in high-throughput DNA sequencing technologies have transformed how clinical diagnoses are made in individuals and families with Mendelian disorders. Genetics tests using these approaches are now widely used in the clinical setting, reducing diagnostic uncertainty and improving patient management [1]. Notably, results of these tests are often ambiguous and it is common for these investigations to yield a number of variants of uncertain significance (VUS). Interpreting these VUS is not a trivial task and numerous in silico prediction tools have been developed to filter and prioritise such changes for further analysis. However, these tools lack robustness and are commonly inconsistent in their predictions [2, 3] and their performance [4]. Taking this into account, the American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology guidelines for variant interpretation [5] have concluded that bioinformatics tools can provide only supporting evidence for pathogenicity. Improving the performance of these algorithms is expected to have significant implications for variant interpretation and ultimately for clinical decision making.

In a previous study, we integrated genetic and structural biology data to predict variant–disease association with high accuracy in the X linked gene CACNA1F (MIM: 300110); the area under the receiver operating characteristic (ROC) and precision–recall (PR) curves was 0.84; Matthews correlation coefficient (MCC) was 0.52 [6]. Here, we replicate the accuracy and robustness of this approach in several other disease-implicated X linked genes. Furthermore, we evaluate seven prediction tools and show that the meta-predictors REVEL (rare exome variant ensemble learner) [7], VEST4 (variant effect scoring tool 4.0) [8], and ClinPred [9] are generally the most accurate in predicting the impact of missense variants in this group of disorders. We also show that applying a gene-specific pathogenicity threshold when using these tools can improve their performance at least for some genes. More importantly, we demonstrate that the protein-specific variant interpreter (ProSper) that we developed as part of this study performs better than REVEL, VEST4 and ClinPred in 11 of the 21 studied genes. These insights can help clinicians and diagnostic laboratories better prioritise missense changes in these molecules.

METHODS

Missense variant data sets

The Human Gene Mutation Database (HGMD V.2019.4) [10] was used to retrieve missense variants that have been associated with disease (marked ‘DM’), that is, presumably pathogenic. The Genome Aggregation Database (gnomAD V.2.1.1) [11] was used to retrieve benign/likely benign missense variants reported in males. The variants present in gnomAD which were also present in HGMD as ‘DM?’, that is, disease association is dubious, or as ‘DM’ were filtered out to minimise the inclusion of possible misannotated variants. Missense changes reported in patients tested at the Manchester Genomic Diagnostic Laboratory (MGDL), a UK accredited genomic diagnostic laboratory (Clinical Pathology Accredited identifier no 4015), were also included; these were classified using the ACMG guidelines. The rare as well as the common variants reported in gnomAD were included in order for the model to differentiate the benign rare variants from the pathogenic changes [7]. We limited our analyses to X linked genes from HGMD that contained a minimum of 70 pathogenic missense variants as informed by earlier findings [6].

Protein structures and homology modelling

Experimentally determined three-dimensional (3D) structures were used to perform structural analysis, where available. Otherwise, a homologous model of the protein was generated using either SWISS-MODEL [12] or RaptorX [13]. These resources provide the results of alignments and sequence identity of multiple templates informing model selection; RaptorX also allows the production of multi-template models. The protein sequences of the transcript used in HGMD were obtained from UniProt database [14]. PyMOL [15] was used to visualise the structures/models.

Performance assessment of in silico tools

A number of prediction tools were evaluated using the two variant data sets assembled above. These included SIFT (Sorting Intolerant From Tolerant) [16], which uses sequence homology or conservation, and PolyPhen2 (Polymorphism Phenotyping v2) [17], which uses sequence homology combined with structural properties [18]. Other tools assessed in this study included the later generation meta-predictors REVEL [7], VEST4 [8], ReVe (a combi- [...]

[...] protein stability or protein–protein interaction was measured using predictions made by mCSM (mutation Cutoff Scanning Matrix) [29]. Variants predicted to be in intrinsically disordered regions of the protein were identified using IUPRED2A (an algorithm for predicting intrinsically unstructured/disordered proteins and domains) [30]. Variants involving residues with special physicochemical characteristics were predicted to affect protein structure and function, for example, the introduction of proline onto β-strands, the introduction/loss of glycine in the core or the introduction/loss of cysteine in extracellular regions possibly leading to the breakage of disulfide bridges. The features which could be retrieved and used for variant annotation in all genes were named ‘general features’; these included physicochemical changes, solvent accessibility, molecular goodness-of-fit and conservation. Other features which could only be retrieved and used for variant annotation in certain genes were named ‘gene-specific features’; these included variant clustering and protein information such as functional domains and binding-site regions, where available. These features are further described in online supplemental table S1. The scripts used in this study are available at the following GitHub repository: https://github.com/shalawsallah/CACNA1F-variants-analysis.

Machine learning and variant classification

The pathogenicity features that were used to train and validate our prediction model (ProSper) used three different classification algorithms (Hoeffding tree, logit boost and simple logistic) [...]
