Functional genomics
J Med Genet: first published as 10.1136/jmedgenet-2013-101917 on 17 October 2013. Downloaded from http://jmg.bmj.com/ on September 27, 2021 by guest. Protected copyright.

ORIGINAL ARTICLE

Paralogue annotation identifies novel pathogenic variants in patients with Brugada syndrome and catecholaminergic polymorphic ventricular tachycardia

Roddy Walsh,1 Nicholas S Peters,2 Stuart A Cook,2,3,4 James S Ware1,2

1 NIHR Royal Brompton Cardiovascular Biomedical Research Unit, Royal Brompton and Harefield NHS Trust, London, UK
2 National Heart and Lung Institute, Imperial College, London, UK
3 Cardiovascular & Metabolic Disorders, Duke-National University of Singapore, Singapore, Singapore
4 National Heart Centre Singapore, Singapore, Singapore

▸ Additional material is published online only. To view please visit the journal online (http://dx.doi.org/10.1136/jmedgenet-2013-101917).

Correspondence to Roddy Walsh, Cardiovascular Biomedical Research Unit, Royal Brompton and Harefield NHS Trust, London SW3 6NP, UK; [email protected]

Received 15 July 2013
Revised 6 September 2013
Accepted 23 September 2013
Published Online First 17 October 2013

Open Access

To cite: Walsh R, Peters NS, Cook SA, et al. J Med Genet 2014;51:35–44.

ABSTRACT

Background Distinguishing genetic variants that cause disease from variants that are rare but benign is one of the principal challenges in contemporary clinical genetics, particularly as variants are identified at a pace exceeding the capacity of researchers to characterise them functionally.

Methods We previously developed a novel method, called paralogue annotation, which accurately and specifically identifies disease-causing missense variants by transferring disease-causing annotations across families of related proteins. Here we refine our approach, and apply it to novel variants found in 2266 patients across two large cohorts with inherited sudden death syndromes, namely catecholaminergic polymorphic ventricular tachycardia (CPVT) or Brugada syndrome (BrS).

Results Over one third of the novel non-synonymous variants found in these studies, which would otherwise be reported in a clinical diagnostics setting as ‘variants of unknown significance’, are categorised by our method as likely disease causing (positive predictive value 98.7%). This identified more than 500 new disease loci for BrS and CPVT.

Conclusions Our methodology is widely transferable across all human disease genes, with an estimated 150 000 potentially informative annotations in more than 1800 genes. We have developed a web resource that allows researchers and clinicians to annotate variants found in individuals with inherited arrhythmias, comprising a referenced compendium of known missense variants in these genes together with a user-friendly implementation of our approach. This tool will facilitate the interpretation of many novel variants that might otherwise remain unclassified.

INTRODUCTION

Inherited arrhythmias such as long QT syndrome (LQTS), Brugada syndrome (BrS) and catecholaminergic polymorphic ventricular tachycardia (CPVT) are life-threatening diseases, caused predominantly by genetic variation in ion channels. In BrS, loss-of-function mutations in the SCN5A-encoded cardiac sodium channel (MIM 601144) have been shown to be responsible for 15–30% of cases,1 2 with mutations in other minor genes accounting for a proportion of remaining cases. In CPVT, gain-of-function mutations in the cardiac ryanodine receptor encoded by RYR2 (MIM 604772) are responsible for at least 50% of cases.3

With on-going developments in DNA sequencing technology, it is expected that clinical genetic testing will become very widely available. However, it is increasingly appreciated that many healthy individuals carry rare variants in disease-associated genes (3–6% for genes associated with inherited arrhythmia syndromes),4 5 and that additional information is required to determine whether a novel variant identified during genetic testing is pathogenic. This is particularly the case for missense variants (single amino acid substitutions) caused by single nucleotide variants.

In order to determine whether a rare variant found in a patient is disease causing, sufficiently powered segregation analysis or functional biochemical characterisation are ideally performed. However, these are often impractical due to cost and time constraints, or a lack of phenotypically characterised family members for segregation studies. Several in silico algorithms, such as SIFT and Polyphen,6 7 try to predict the effect of novel variants, based on the conservation and physicochemical properties of the variant amino acid, and variants in certain protein regions and domains have a high probability of pathogenicity.8 9 However, more evidence is needed to classify variants with sufficient confidence for clinical application.

We have recently proposed a new method to analyse novel missense variants and assess their likelihood of pathogenicity.10 This method, ‘paralogue annotation’, identifies functionally important residues that are intolerant of variation across families of evolutionarily related proteins (paralogues), using clinically ascertained human genotype–phenotype relationships. By aligning the protein sequences of members of these protein families, we can identify amino acids that are functionally equivalent across the different proteins. A variant that is known to be pathogenic in one member of a protein family can then be used to annotate the equivalent amino acid of other members of the family for which no clinical genetic information exists (figure 1). For example, if a missense variant in RYR1 alters protein function and causes malignant hyperthermia when expressed in skeletal muscle, then we hypothesise that a novel variant affecting the equivalent amino acid in RYR2, expressed in cardiac muscle, is likely to be disease causing in a patient with CPVT.

Figure 1 An overview of paralogue annotation. (1) Paralogues (evolutionarily related genes, with homologous sequence and protein domain structures) are identified for a gene of interest. A subset of paralogues for SCN5A is shown here for illustration. (2) Protein sequences of paralogues are aligned, identifying functionally equivalent amino acids across the protein family. (3) Disease-causing variants in paralogues are identified from previous literature reports, and their locations are mapped to the gene of interest. Variants at these sites have a high probability of pathogenicity.

This approach was developed and experimentally validated by application to a large set of known variants in eight LQTS genes, and was found to have a positive predictive value (PPV) of 98.4% in these genes.10 Here we present novel refinements to increase accuracy, and apply this approach in two large cohorts of patients with BrS or CPVT to determine whether it provides additional useful information in a clinical diagnostic setting. We also report a web application that allows researchers and clinicians to use paralogue annotation to interrogate novel variants in arrhythmia genes.

MATERIALS AND METHODS

Identification of variants in paralogues of RYR2 and SCN5A
Paralogues of RYR2 and SCN5A, that is, evolutionarily related genes with homology in sequence and structure, were identified using the IUPHAR database on receptor nomenclature11 and through homology searches (Blastp searches of the SCN5A and RYR2 protein sequences against the human Refseq protein database).12 The transcripts and protein isoforms used for these genes were RYR2: NM_001035/NP_001026 (Refseq), ENST00000366574/ENSP00000355533 (Ensembl), LRG_402t1/LRG_402p1 (Locus Reference Genomic) and SCN5A: NM_198056/NP_932173 (Refseq), ENST00000333535/ENSP00000328968 (Ensembl), LRG_289t1/LRG_289p1 (Locus Reference Genomic). RYR2 has two

…expressed, such as epilepsy, myotonia, pain disorders, night blindness and hemiplegic migraine. SCN5A paralogues and their disease associations are shown in table 1.

For each paralogue gene, variants reported as pathogenic were identified using the HGMD Professional V.2012.3.13 Only disease-causing missense mutations, that is, single nucleotide variants causing a single non-terminal amino acid change, were considered.

Multiple sequence alignment and paralogue annotation
The protein sequences of RYR2 and SCN5A and their respective paralogues were aligned using the T-Coffee/M-Coffee alignment packages.14 These packages combine the output of a number of alignment algorithms into a single consensus alignment and provide a consensus score at each point in the alignment (0–9), which is a measure of the reliability of the alignment at each amino acid residue. Using these alignments, each paralogue protein residue with a disease-causing variant was mapped onto the equivalent amino acid residue in RYR2 and SCN5A.
To distinguish aligned amino acids that are truly functionally equivalent from alignment artefacts, an amino acid alignment quality classification was devised. Mappings were classified as high quality if the reference amino acid was conserved between paralogues, and the alignment consensus score was greater than 3. Medium quality mappings required conservation of the reference amino acid or a consensus score greater than 3 or more