Optimizing Genomic Medicine in Epilepsy Through a Gene-Customized Approach to Missense Variant Interpretation
Total Page:16
File Type:pdf, Size:1020Kb
Downloaded from genome.cshlp.org on October 7, 2021 - Published by Cold Spring Harbor Laboratory Press Method Optimizing genomic medicine in epilepsy through a gene-customized approach to missense variant interpretation Joshua Traynelis,1,7 Michael Silk,1,7 Quanli Wang,2 Samuel F. Berkovic,3 Liping Liu,4 David B. Ascher,5 David J. Balding,6 and Slavé Petrovski1,8 1Department of Medicine, The University of Melbourne, Austin Health and Royal Melbourne Hospital, Melbourne, Victoria 3010, Australia; 2Simcere Diagnostics, Nanjing, 210042, China; 3Epilepsy Research Centre, Department of Medicine, University of Melbourne, Austin Health, Heidelberg, Victoria 3084, Australia; 4Department of Mathematics, North Carolina A&T State University, Greensboro, North Carolina 27411, USA; 5Department of Biochemistry and Molecular Biology, The University of Melbourne, Parkville, Victoria 3010, Australia; 6Centre for Systems Genomics, School of BioSciences and School of Mathematics and Statistics, The University of Melbourne, Parkville, Victoria 3010, Australia Gene panel and exome sequencing have revealed a high rate of molecular diagnoses among diseases where the genetic ar- chitecture has proven suitable for sequencing approaches, with a large number of distinct and highly penetrant causal var- iants identified among a growing list of disease genes. The challenge is, given the DNA sequence of a new patient, to distinguish disease-causing from benign variants. Large samples of human standing variation data highlight regional varia- tion in the tolerance to missense variation within the protein-coding sequence of genes. This information is not well cap- tured by existing bioinformatic tools, but is effective in improving variant interpretation. To address this limitation in existing tools, we introduce the missense tolerance ratio (MTR), which summarizes available human standing variation data within genes to encapsulate population level genetic variation. We find that patient-ascertained pathogenic variants preferentially cluster in low MTR regions (P < 0.005) of well-informed genes. By evaluating 20 publicly available predictive tools across genes linked to epilepsy, we also highlight the importance of understanding the empirical null distribution of existing prediction tools, as these vary across genes. Subsequently integrating the MTR with the empirically selected bio- informatic tools in a gene-specific approach demonstrates a clear improvement in the ability to predict pathogenic missense variants from background missense variation in disease genes. Among an independent test sample of case and control mis- sense variants, case variants (0.83 median score) consistently achieve higher pathogenicity prediction probabilities than con- trol variants (0.02 median score; Mann-Whitney U test, P <1×10−16). We focus on the application to epilepsy genes; however, the framework is applicable to disease genes beyond epilepsy. [Supplemental material is available for this article.] Over the past decade, the rapid progress in genomic technologies recurring pathogenic mutation), or when the accompanying segre- has made research and clinical sequencing increasingly accessible. gation data includes confirmation that the variant arose de novo Exome and gene-panel sequencing data are routinely generated (germline/mosaic) in the affected child of unaffected (or mosaic and available to researchers, treating physicians, and patient fam- carrier) parents (Richards et al. 2015). For the latter, although iden- ilies. Occasionally, the clinical reports are definitive. Currently, tifying a de novo epilepsy gene variant in an epilepsy-ascertained however, variants of uncertain significance remain the over- patient carries great weight in variant interpretation, the back- whelming majority class of variants described in genetic reports ground mutation rate in every gene (including disease genes) im- (Richards et al. 2015). This is especially true in cases in which pa- plies that some de novo mutations in disease genes have no rental sequence is unavailable for interpretation (Lee et al. 2014; clinical relevance. Regarding the recurring mutations, variant cat- Farwell et al. 2015). Moreover, once the reports are shared the pa- alogs such as ClinVar (Landrum et al. 2016) and HGMD (Stenson tient’s physician, patient and family may generate their own im- et al. 2003) have proven invaluable in guiding variant classifica- pressions over the relevance of uncertain findings in known tions among known disease genes. In the absence of these data, disease genes (Fogel 2011). it is difficult to interpret a novel missense variant. Currently, the Two of the more reliable clues to pathogenicity are when the final variant interpretation process often reflects a cumulative as- precise variant has been previously reported in unrelated cases (a sessment of different sources of evidence, with the clinical genet- icist or genetic counselor’s previous experience influencing how much weight they choose to assign to each source. As a result, dif- fering knowledge bases between individuals interpreting the ge- 7These authors contributed equally to this work. 8Present address: Centre for Genomics Research, IMED Biotech Unit, netic data can make this process highly subjective, which can AstraZeneca, Cambridge SG8 6HB, UK Corresponding author: [email protected] Article published online before print. Article, supplemental material, and publi- © 2017 Traynelis et al. This article, published in Genome Research, is available cation date are at http://www.genome.org/cgi/doi/10.1101/gr.226589.117. under a Creative Commons License (Attribution 4.0 International), as described Freely available online through the Genome Research Open Access option. at http://creativecommons.org/licenses/by/4.0/. 27:1715–1729 Published by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/17; www.genome.org Genome Research 1715 www.genome.org Downloaded from genome.cshlp.org on October 7, 2021 - Published by Cold Spring Harbor Laboratory Press Traynelis et al. contribute to the variability reported in variant classification qualifying genetic support were that either the variant had been re- (Amendola et al. 2015, 2016; Rehm et al. 2015). Efforts from the ported to have arisen de novo (germline or somatic) or the variant American College of Medical Genetics and Genomics (ACMG) had been reported as pathogenic in a pedigree in which the variant and others have drastically reduced variability in classifications was present among all (and more than three) affected family mem- by systematizing the variant interpretation process (MacArthur bers that were genotyped and was not present in more than one of et al. 2014; Richards et al. 2015). Yet, one of the major current tasks the reported unaffected family members. This additional require- in medical genomics remains the reduction of the rate of Variants ment of segregation evidence was performed to refine the list of of Uncertain Significance (VUS) by better predicting disease poten- pathogenic reported variants to a subset that is enriched for clini- tial of novel variants found in known disease genes. This has im- cally relevant missense variants (Supplemental Data S1). We ac- portant implications for the integrity of precision diagnostics knowledge that some of the 606 qualified missense variants in with the rapid emergence of precision medicine opportunities, the 11 studied epilepsy genes may not be clinically relevant. particularly in the epilepsies (EpiPM Consortium 2015). Although population reference cohorts can influence wheth- With little promising diagnostic data emerging from large- er a variant was reported as pathogenic in ClinVar and HGMD, we scale common variant studies in epilepsy (International League did not directly use presence or frequency of an allele in the pop- Against Epilepsy Consortium on Complex Epilepsies 2014), next- ulation reference cohorts to qualify pathogenic-reported variants. generation sequencing has achieved high rates of molecular diag- Thus, we could evaluate the differences in population frequencies noses for epilepsy patients on the basis of a single clinically relevant of the 606 variants that we designated as qualified, versus the dominant mutation in a growing list of epilepsy genes, each with a 437 unqualified missense variants. Comparing against population growing allelic series (Epi4K Consortium and Epilepsy Phenome/ reference cohorts—ExAC v1, ExAC v2, ESP6500SI, the 1000 Genome Project 2013, 2017; EuroEPINOMICS-RES Consortium Genomes Project, and gnomAD—we find that nine (1.5%) of et al. 2014). Achieving precision diagnostics is important for preci- the 606 qualified variants were observed at least once in the popu- sion medicine trials in epilepsies as they will frequently be restrict- lation reference cohorts, with allelic count of at most two; among ed in sample size; thus, selection of the appropriate patients for the 437 unqualified variants, we observe a significantly higher rate enrollment in these trials will benefit from the continuous refine- of 44 variants (10.1%) reported at least once (Fisher’s exact test P = − ment of tools that facilitate accurate classification of pathogenic 5×10 10), with allelic count ranging up to 351 (Supplemental variants. Data S2). Only one of the 606 qualified variants was observed The increased sequencing of large and diverse populations more than once among the 277,264 chromosomes in the ExAC has also enhanced our understanding of the patterns typical of v2 and gnomAD data set,