Comparison of Predicted and Actual Consequences of Missense Mutations
Total Page:16
File Type:pdf, Size:1020Kb
Comparison of predicted and actual consequences of PNAS PLUS missense mutations Lisa A. Miosgea,1, Matthew A. Fielda,1, Yovina Sontania, Vicky Choa,b, Simon Johnsona,b, Anna Palkovaa,b, Bhavani Balakishnanb, Rong Liangb, Yafei Zhangb, Stephen Lyonc, Bruce Beutlerc, Belinda Whittleb, Edward M. Bertramb, Anselm Endersd, Christopher C. Goodnowa,e,2,3, and T. Daniel Andrewsa,2,3 aImmunogenomics Laboratory, John Curtin School of Medical Research, Australian National University, Canberra City, ACT 2601, Australia; bAustralian Phenomics Facility, John Curtin School of Medical Research, Australian National University, Canberra City, ACT 2601, Australia; cCenter for Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX 75390; dRamaciotti Immunisation Genomics Laboratory, John Curtin School of Medical Research, Australian National University, Canberra City, ACT 2601, Australia; and eImmunogenomics Laboratory, Garvan Institute of Medical Research, Darlinghurst, NSW 2010, Australia Contributed by Christopher C. Goodnow, July 9, 2015 (sent for review February 24, 2015; reviewed by Jean-Laurent Casanova and Alain Fischer) Each person’s genome sequence has thousands of missense vari- Many tools now exist that use diverse information to make in- ants. Practical interpretation of their functional significance must ferences of the functional importance of single amino acid sub- rely on computational inferences in the absence of exhaustive stitutions (3, 4). The majority of better-performing tools use a experimental measurements. Here we analyzed the efficacy of protein multiple sequence alignment and judge the importance of these inferences in 33 de novo missense mutations revealed by residues depending on their conservation across available ho- sequencing in first-generation progeny of N-ethyl-N-nitrosourea– mologous sequences. Notably, these tools are sensitive to the treated mice, involving 23 essential immune system genes. Poly- sequence choice in the input alignment (5). Some examples of Phen2, SIFT, MutationAssessor, Panther, CADD, and Condel were commonly used tools that use a sequence conservation approach, ’ used to predict each mutation s functional importance, whereas with diverse methods to calculate conservation, are SIFT (6), the actual effect was measured by breeding and testing homozy- MutationAssessor (7), MAPP (8), AlignGVGD (9), PANTHER gotes for the expected in vivo loss-of-function phenotype. Only (10), and GERP (11). Protein structural information is also in- 20% of mutations predicted to be deleterious by PolyPhen2 (and cluded in other tools to judge whether an amino acid substitution INFLAMMATION 15% by CADD) showed a discernible phenotype in individual ho- IMMUNOLOGY AND may importantly alter protein stability or catalysis, but few com- mozygotes. Half of all possible missense mutations in the same monly used tools rely just on these data alone (3). 23 immune genes were predicted to be deleterious, and most of The widely used PolyPhen2 and CADD tools (12, 13) integrate these appear to become subject to purifying selection because few persist between separate mouse substrains, rodents, or primates. a number of different information sources, including sequence- Because defects in immune genes could be phenotypically masked and structure-based features (and in the case of CADD, the re- in vivo by compensation and environment, we compared infere- sults of other tools), and use a machine learning approach to nces by the same tools with the in vitro phenotype of all 2,314 categorize variants as benign or deleterious. It is worth noting that possible missense variants in TP53; 42% of mutations predicted by inference tools that integrate diverse information, such as PolyPhen2 PolyPhen2 to be deleterious (and 45% by CADD) had little measur- able consequence for TP53-promoted transcription. We conclude Significance that for de novo or low-frequency missense mutations found by genome sequencing, half those inferred as deleterious correspond Computational tools applied to any human genome sequence to nearly neutral mutations that have little impact on the clinical identify hundreds of genetic variants predicted to disrupt the phenotype of individual cases but will nevertheless become sub- function of individual proteins as the result of a single codon ject to purifying selection. change. These tools have been trained on disease mutations and common polymorphisms but have yet to be tested against an de novo mutation | immunodeficiency | evolution | nearly neutral | cancer unbiased spectrum of random mutations arising de novo. Here we perform such a test comparing the predicted and actual ef- he genome sequence of any particular person contains ex- fects of de novo mutations in 23 genes with essential functions Ttensive protein-altering genetic variation and de novo point for normal immunity and all possible mutations in the TP53 tu- mutations (1), of which only a minority are unambiguously dele- mor suppressor gene. These results highlight an important gap terious and introduce premature stop codons or disrupt normal in our ability to relate genotype to phenotype in clinical genome mRNA splicing. When considering a person with a suspected sequencing: the inability to differentiate immediately clinically genetic illness, the first clinical question is whether any of the relevant mutations from nearly neutral mutations. mutations identified in their genome sequence involve essential genes whose disruption is known to cause a phenotype resembling Author contributions: L.A.M., M.A.F., B. Beutler, B.W., E.M.B., A.E., C.C.G., and T.D.A. designed research; L.A.M., M.A.F., Y.S., V.C., S.J., A.P., B. Balakishnan, R.L., Y.Z., S.L., B.W., that of the patient in question. The most numerous class of pro- and T.D.A. performed research; S.L., B. Beutler, and T.D.A. contributed new reagents/analytic tein-altering mutations is missense mutations, where a single co- tools; L.A.M., M.A.F., Y.S., V.C., S.J., A.P., B. Balakishnan, R.L., Y.Z., S.L., B. Beutler, B.W., E.M.B., don is altered to encode a different amino acid. On average, 2% of A.E., C.C.G., and T.D.A. analyzed data; and M.A.F., C.C.G. and T.D.A. wrote the paper. people carry a missense mutation in any given gene (2). Hence, by Reviewers: J.-L.C., The Rockefeller University; and A.F., Imagine Institute, Hôpital Necker chance, missense mutations will often be found in genes that are Enfants Malades. seemingly relevant to a person’s disease phenotype, and the next The authors declare no conflict of interest. key clinical question is whether or not these substitutions alter the Freely available online through the PNAS open access option. function of the corresponding protein. Short of mutational studies 1L.A.M. and M.A.F. contributed equally to this work. of all possible amino acid substitutions coupled with comprehen- 2C.C.G. and T.D.A. contributed equally to this work. sive functional assays, the sheer number and diversity of missense 3To whom correspondence may be addressed. Email: [email protected] or dan. mutations present in each person’s genome means that their [email protected]. functional importance must presently be addressed primarily by This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10. computational inference. 1073/pnas.1511585112/-/DCSupplemental. www.pnas.org/cgi/doi/10.1073/pnas.1511585112 PNAS Early Edition | 1of10 Downloaded by guest on September 25, 2021 and SNAP (14), do not necessarily have efficacy better than sim- each gene in the mouse genome, we developed a resource of mice pler tools (3). Interestingly, there is further utility in integrating the with identified mutations and corresponding exome sequence data predictions of individual methods (15), even those that are very (databases.apf.edu.au/mutations and mutagenetix.org/incidental/ similar or that already integrate wide ranges of information, po- incidental_list.cfm). First generation (G1) offspring from male tentially due to minimization of outlier effects. mice treated with ENU have a mean of 45 nonsynonymous de Functional inferences of the severity of protein disruption only novo mutations (23), although the functional effect of each mu- weakly predict disease incidence, disease severity, or clinical tation is largely unknown. Mutations induced by exposure to ENU outcome. For example, functional inferences of missense vari- are likely to have diverse functional consequences, from benign to ants in the cystic fibrosis gene, CFTR, are not well correlated severely deleterious. This diversity replicates that of spontaneous with disease incidence or severity (16). Similarly, the functional nonsynonymous de novo mutations that arise in humans at an inference of mutation severity in the tumor suppressor gene, average rate of 0.75 mutations per child (24). It is also likely to TP53, does not correlate significantly with patient clinical out- replicate the functional spectrum of inherited nonsynonymous come (17). An emerging consensus has formed that functional variants in recessive genes that occur at frequencies below 1% in inference scores lack the sensitivity and specificity for their the human population. clinical use (18–20). The human genome of any particular indi- To gain an understanding of the functional effects of these vidual likely contains many apparently unambiguous disease- phenotypically unselected point mutations, we propagated 33 associated variants (1), and these very often occur in people without mutations within 23