Analysis and Interpretation of the Impact of Missense Variants in Cancer
Total Page:16
File Type:pdf, Size:1020Kb
International Journal of Molecular Sciences Review Analysis and Interpretation of the Impact of Missense Variants in Cancer Maria Petrosino 1, Leonore Novak 1, Alessandra Pasquo 2, Roberta Chiaraluce 1 , Paola Turina 3 , Emidio Capriotti 3,* and Valerio Consalvi 1,* 1 Dipartimento Scienze Biochimiche “A. Rossi Fanelli”, Sapienza University of Rome, 00185 Roma, Italy; [email protected] (M.P.); [email protected] (L.N.); [email protected] (R.C.) 2 ENEA CR Frascati, Diagnostics and Metrology Laboratory FSN-TECFIS-DIM, 00044 Frascati, Italy; [email protected] 3 Dipartimento di Farmacia e Biotecnologie (FaBiT), University of Bologna, 40126 Bologna, Italy; [email protected] * Correspondence: [email protected] (E.C.); [email protected] (V.C.) Abstract: Large scale genome sequencing allowed the identification of a massive number of genetic variations, whose impact on human health is still unknown. In this review we analyze, by an in silico-based strategy, the impact of missense variants on cancer-related genes, whose effect on protein stability and function was experimentally determined. We collected a set of 164 variants from 11 proteins to analyze the impact of missense mutations at structural and functional levels, and to assess the performance of state-of-the-art methods (FoldX and Meta-SNP) for predicting protein stability change and pathogenicity. The result of our analysis shows that a combination of experimental data on protein stability and in silico pathogenicity predictions allowed the identification of a subset of variants with a high probability of having a deleterious phenotypic effect, as confirmed by the Citation: Petrosino, M.; Novak, L.; significant enrichment of the subset in variants annotated in the COSMIC database as putative cancer- Pasquo, A.; Chiaraluce, R.; Turina, P.; driving variants. Our analysis suggests that the integration of experimental and computational Capriotti, E.; Consalvi, V. Analysis approaches may contribute to evaluate the risk for complex disorders and develop more effective and Interpretation of the Impact of treatment strategies. Missense Variants in Cancer. Int. J. Mol. Sci. 2021, 22, 5416. https://doi. org/10.3390/ijms22115416 Keywords: protein structure; protein stability; protein function; single amino acid variant; putative cancer driving variant; free-energy change Academic Editors: Giovanni Minervini and Emanuela Leonardi Received: 30 March 2021 1. Introduction Accepted: 17 May 2021 Recent advances in the high-throughput sequencing technologies have led to the detec- Published: 21 May 2021 tion of a large amount of genomic data, which are used for generating detailed catalogues of genetic variations in both diseased and healthy patients [1,2]. Such genetic differences Publisher’s Note: MDPI stays neutral are at the basis of distinctive traits associated with the susceptibility to specific disease with regard to jurisdictional claims in and/or drug response [3]. The most common genetic differences in the human genome are published maps and institutional affil- single nucleotide polymorphisms (SNPs), which are defined as single nucleotide variations iations. (SNVs) occurring with a frequency of more than 1% in the population [4,5]. These differ- ences occur on average once every 300–400 base pairs [6], either in coding or in non-coding regions (Figure1). SNVs may affect exon splicing or transcription [ 3], and are found more frequently than other types of genetic variations, such as differences in copy number, Copyright: © 2021 by the authors. insertions, deletions, duplications, and rearrangements. SNVs in protein-coding regions Licensee MDPI, Basel, Switzerland. have received the most attention, in spite of the fact that those regions account for only This article is an open access article about 2% of the total human genome [7]. SNVs in the coding region can be synonymous distributed under the terms and if no amino acid change is produced, or non-synonymous if the substitution leads to a conditions of the Creative Commons change in the protein sequence. The non-synonymous variants can be further divided into Attribution (CC BY) license (https:// two categories: missense mutations, which lead to single amino acid changes, or nonsense creativecommons.org/licenses/by/ mutations, which produce truncated or longer proteins (Figure1). 4.0/). Int. J. Mol. Sci. 2021, 22, 5416. https://doi.org/10.3390/ijms22115416 https://www.mdpi.com/journal/ijms Int. J. Mol. Sci. 2021, 22, x FOR PEER REVIEW 2 of 23 synonymous variants can be further divided into two categories: missense mutations, which lead to single amino acid changes, or nonsense mutations, which produce truncated or longer proteins (Figure 1). Missense mutations, that generate protein variants with a single amino acid variation (SAV), are of particular interest in biomedicine, since even just a single amino acid substitution may induce drastic structural alterations, which compromise the protein stability, or may induce crucial structural alterations able to perturb binding interfaces, to the point of impairing the protein function [8,9] (Figure 1). The huge numbers of variants identified over the past twenty years have been collected in several databases, the most representative of which are summarized in Table 1. These databases represent the main source of information for studying the effect of protein variants and understanding the genotype/phenotype relationship [10]. To maximize the impact of sequence technologies in clinical settings, the scientific community is defining standard protocols and guidelines for discriminating disease- causing variants from nonpathogenic ones [11]. As a consequence, a large variety of genetic alterations, including nonsynonymous single nucleotide variants (nsSNVs), were found to be associated with monogenic and multigenic diseases [12,13], such as type II diabetes mellitus [14], acute lymphoblastic leukemia [15], or cancer [16,17]. Nevertheless, for the vast majority of variants, their impact at phenotypic level is still unknown. In this framework, the structural and functional analysis of nsSNVs proteins by experimental approaches can significantly contribute to their phenotypic characterization [18]. In particular, it has been recognized that most disease-causing variants affect the protein thermodynamic stability, expressed as the difference in folding free energy between the native and the denatured state (ΔGf). The experimental determination of the difference in Int. J. Mol. Sci. 2021, 22, 5416 2 of 22 ΔG between the mutant and wild-type proteins (ΔΔGf) constitutes a first and essential analysis for the characterization of the protein variants. Figure 1. (a) Single nucleotide variations (SNVs) can occur in the coding or in the non-coding region. SNVs in the coding regionFigure can1. (a be) Single synonymous nucleotide if no variations amino acid (SNVs) changes can areoccur produced, in the co non-synonymousding or in the non-coding if the single region. nucleotide SNVs in substitution the coding inducesregion can changes be synonymous in the protein if no sequence. amino acid Usually, changes two are typesproduced, of non-synonymous non-synonymous changes if the si canngle be nucleotide described: substitution missense mutation,induces changes that produces in the anprotein amino sequence acid change. Usually, in the proteintwo types (SAV) of andnon-synonymous nonsense mutation changes which can produces be described: a truncated missense or a longermutation, protein. that (producesb) A single an nucleotide amino acid substitution change in canthe leadprotein to a(SAV) single and amino nonsense acid change mutation generating which aproduces protein variant a truncated with structuralor a longer and/or protein. functional (b) A single alterations nucleotide as shown substitution in the substitutioncan lead to a of single the residue amino Asp183acid change with generating a His in the a human protein frataxin variant proteinwith structural (PDB code: and/or 1EKG). functional alterations as shown in the substitution of the residue Asp183 with a His in the human frataxin protein (PDB code: 1EKG). Missense mutations, that generate protein variants with a single amino acid variation (SAV), are of particular interest in biomedicine, since even just a single amino acid substitu- tion may induce drastic structural alterations, which compromise the protein stability, or may induce crucial structural alterations able to perturb binding interfaces, to the point of impairing the protein function [8,9] (Figure1). The huge numbers of variants identified over the past twenty years have been collected in several databases, the most representative of which are summarized in Table1. These databases represent the main source of information for studying the effect of protein variants and understanding the genotype/phenotype relationship [10]. To maximize the impact of sequence technologies in clinical settings, the scientific com- munity is defining standard protocols and guidelines for discriminating disease-causing variants from nonpathogenic ones [11]. As a consequence, a large variety of genetic alter- ations, including nonsynonymous single nucleotide variants (nsSNVs), were found to be associated with monogenic and multigenic diseases [12,13], such as type II diabetes melli- tus [14], acute lymphoblastic leukemia [15], or cancer [16,17]. Nevertheless, for the vast majority of variants, their impact at phenotypic level is still unknown. In this framework,