
Journal of Genetics (2019) 98:104 © Indian Academy of Sciences

https://doi.org/10.1007/s12041-019-1153-7

RESEARCH ARTICLE

An in silico approach to characterize nonsynonymous SNPs and regulatory SNPs in human TOX3

MEHRAN AKHTAR1, TAZKIRA JAMAL1, JALAL UD DIN1, CHANDNI HAYAT2, MAMOONA RAUF3, SYED MANZOOR UL HAQ1, RAHAM SHER KHAN1, AFTAB ALI SHAH4, MUHSIN JAMAL5 and FAZAL JALIL1*

1 Department of Biotechnology, Abdul Wali Khan University, Mardan 23200, Pakistan
2 Department of Biochemistry, Abdul Wali Khan University, Mardan 23200, Pakistan
3 Department of Botany, Abdul Wali Khan University, Mardan 23200, Pakistan
4 Department of Biotechnology, University of Malakand, Chakdara 18800, Pakistan
5 Department of Microbiology, Abdul Wali Khan University, Mardan 23200, Pakistan
*For correspondence. E-mail: [email protected].

Received 25 June 2019; revised 6 August 2019; accepted 19 August 2019

Abstract. Cancer is one of the deadliest complex diseases. It has a multigene nature, and the role of single-nucleotide polymorphisms (SNPs) has been well explored in multiple genes. TOX high mobility group box family member 3 (TOX3) is one such gene, in which SNPs have been found to be associated with breast cancer. In this study, we examined the potentially damaging nonsynonymous SNPs (nsSNPs) in the TOX3 gene using the in silico tools PolyPhen2, SNPs&GO, PhD-SNP and PROVEAN; the predictions were then checked with I-Mutant, MutPred1.2 and ConSurf for their effects on stability, function and structure. The nsSNPs rs368713418 (A266D), rs751141352 (P273S, P273T) and rs200878352 (A275T) were found to be the most deleterious and may have a vital role in breast cancer. The SNPs that produce a premature stop codon, rs1335235301 (Q527STOP), rs1259790811 (G495STOP), rs1294465822 (S395STOP) and rs1335372738 (G8STOP), were also found to be of prime importance because they are predicted to yield a truncated, malfunctional protein. We also characterized regulatory SNPs for their potential effect on TOX3 gene regulation and found nine SNPs that may affect it. Further, we built 3D models of the wild-type and four mutant TOX3 proteins using I-TASSER. Our study concludes that these SNPs can be of prime importance when studying breast cancer and other associated diseases. They need to be studied in model organisms and cell cultures, and may have potential importance in personalized medicine and gene therapy.

Keywords. breast cancer; in silico analysis; single-nucleotide polymorphisms; protein modelling; TOX3 gene.

Introduction

Cancer is one of the deadliest diseases and one of the major causes of human death worldwide. Especially in developed countries, breast cancer is one of the most common causes of death in females (Jemal et al. 2011). Pathogenically and clinically, breast cancer has been found to be a diverse and heterogeneous disease. Many environmental and polygenic inheritance factors are involved in breast cancer, making it a clinically complex disease. Many studies have been carried out on genomic variations to improve breast cancer treatment and to develop diagnostic strategies. Breast cancer has been verified to be associated with mutations in the ATM, BRCA1 and BRCA2 genes (Tavtigian et al. 2009; Gabai-Kapara et al. 2014). Mutations in these moderate-penetrance and high-penetrance genes are clearly associated with breast cancer risk, although for scores of cancer-related genomic variations it is still unclear whether they are associated with disease risk, especially in low-penetrance genes. Many genomewide association studies (GWASs) have reported a strong association between single-nucleotide polymorphisms (SNPs) and the risk of breast carcinoma (Zheng et al. 2009; He et al. 2015; Zhang and Long 2015).

The TOX3 gene is located in the 16q12 chromosomal region and encodes a high mobility group (HMG) box protein (O’Flaherty and Kaye 2003). It regulates BRCA1 expression negatively by binding to its promoter (Shan et al. 2013). Migration, survival and proliferation of breast cancer cells were increased with abnormal TOX3 expression, which was also found to be associated with tumour progression in mouse models (Shan et al. 2013). In breast tumours, mutations in the TOX3 gene have been reported to account for 4.5% of the overall frequency of breast cancer (Jones et al. 2013). TOX3 has been found to act as a potential transcription factor (TF) in calcium-dependent transcription (Yuan et al. 2009). In Asian, European and African-American populations, many variants of the TOX3 gene have been found to be associated with susceptibility to breast cancer, as validated by epidemiological studies and GWASs (Easton et al. 2007; Stacey et al. 2007; Ruiz-Narvaez et al. 2010; Udler et al. 2010; Slattery et al. 2011; Low et al. 2013; Elematore et al. 2014; He et al. 2014).

nsSNPs lead to a single amino acid change in the protein sequence, which may be damaging or neutral. Many studies so far have shown that such SNPs can cause disease. For the TOX3 gene, no such reports are available for nsSNPs. This is the first study to identify and characterize nsSNPs and regulatory SNPs that may play a role in breast cancer and other associated diseases.

In this study, we characterized the SNPs in the TOX3 gene that may have a potential role in cancer risk association, polycystic ovary syndrome and restless leg syndrome.

Materials and methods

Selection of SNPs in human TOX3 gene

For the selection of SNPs in the TOX3 gene, the NCBI database of SNPs (dbSNP) was used (Bhagwat 2010) (accessed on 22 February 2019). A total of 25,544 SNPs were retrieved, from which only nsSNPs and regulatory SNPs were selected for initial analysis.

Damaging nsSNPs prediction

For the prediction of the most damaging nsSNPs in the TOX3 gene, four different tools were used: protein variation effect analyzer (PROVEAN) (Choi et al. 2012) (http://provean.jcvi.org/seq_submit.php), SNPs&GO (Capriotti et al. 2013) (http://snps.biofold.org/snps-and-go/snps-and-go.html), PhD-SNP (predictor of human deleterious SNP) (Capriotti et al. 2006) (http://snps.biofold.org/phd-snp/phd-snp.html) and PolyPhen2 (polymorphism phenotyping 2) (Adzhubei et al. 2010) (http://genetics.bwh.harvard.edu/pph2/). The list of amino acid changes was submitted to each tool along with the protein FASTA sequence. The nsSNPs predicted to be deleterious by all four tools were selected for further investigation.

Effect of nsSNPs on structure and function of TOX3 protein

As an nsSNP results in a single amino acid change in the protein sequence, it can have a damaging effect on protein structure and/or function. This effect was predicted using the web-based tool MutPred1.2 (Li et al. 2009) (http://mutpred.mutdb.org/), which predicts such effects on the basis of several functional and structural properties. The list of amino acid changes was submitted to MutPred1.2 along with the protein FASTA sequence.

nsSNPs and protein stability

I-Mutant2.0 (http://folding.biofold.org/i-mutant/i-mutant2.0.html), a support vector machine-based web server, was used to predict the stability of the protein carrying the mutated amino acid. It reports each result with a reliability index (RI) from 0 to 10, where 0 is the minimum and 10 the maximum reliability (Capriotti et al. 2005). The nsSNPs that decrease TOX3 protein stability were selected for further analysis. The protein FASTA sequence was submitted to I-Mutant2.0 along with the list of amino acid changes.

Evolutionary conservation profile of TOX3

The ConSurf tool (http://consurf.tau.ac.il/2016/), which works on phylogenetic relations between homologous sequences, was used to predict the conservation profile of every amino acid in the submitted protein sequence (Berezin et al. 2004). The protein FASTA sequence was submitted to the server.

Prediction of PTM sites

For each PTM type, i.e. methylation, phosphorylation and ubiquitylation, different tools were used to predict the sites. For methylation, GPS-MSP3.0 (http://msp.biocuckoo.org/online.php) was used (Deng et al. 2017). NetPhos3.1 (http://www.cbs.dtu.dk/services/NetPhos/) and GPS3.0 (http://gps.biocuckoo.org/online.php) were used for phosphorylation prediction (Blom et al. 1999; Xue et al. 2008). Similarly, two tools, UbPred (http://www.ubpred.org/) and BDM-PUB (http://bdmpub.biocuckoo.org/prediction.php), were used to predict ubiquitylation sites (Li et al. 2009; Radivojac et al. 2010). The protein FASTA sequence was submitted to each tool along with the list of amino acid changes.

3D-modelling of TOX3 protein

To build the 3D models of the TOX3 protein and its selected mutant forms, I-TASSER (https://zhanglab.ccmb.med.umich.edu/I-TASSER/), a leading tool for protein 3D modelling, was used (Zhang 2008; Roy et al. 2010; Yang et al. 2015). The resulting structures were then viewed using Chimera1.11 (Pettersen et al. 2004). TM-align (https://zhanglab.ccmb.med.umich.edu/TM-align/) was used to obtain the root mean square deviation (RMSD) of each mutant from the wild-type TOX3 protein, along with the TM score (template modelling score), as a measure of the effect of the nsSNPs on protein structure. The protein FASTA sequences of the wild-type and mutant proteins were submitted individually.

Effect of regulatory region SNPs

To predict the role of SNPs in the regulatory region of the TOX3 gene, we used MicroSNiPer (http://vm24141.virt.gwdg.de/services/microsniper/) (Bhattacharya et al. 2014) and the PolymiRTS database (http://compbio.uthsc.edu/miRSNP/). MicroSNiPer predicts whether a SNP in a UTR region will affect a miRNA target site. The PolymiRTS database is a web server that predicts miRNA seed and target sites affected by SNPs in UTR regions. Both transcripts, NM_001080430.4 and NM_001146188.2, were used as query terms. For both tools, the list of rs IDs of the SNPs was submitted.

For all the tools used in this study, GRCh38 was used as the reference genome. All tools were run with their default settings; no parameters were set manually. The parameters used for these tools were discussed in detail in our previous work (Akhtar et al. 2019).

Results

nsSNPs in human TOX3 gene

Searching NCBI dbSNP returned a total of 25,544 SNPs in the TOX3 gene region, of which 619 and 374 were located in the 3′ UTR and 5′ UTR, respectively. A total of 382 nsSNPs were found among these SNPs; 25 of them lacked sufficient data and were skipped. The remaining 357 nsSNPs were selected for further investigation. Other types of SNPs include intronic, splice site, synonymous, uncategorized etc. A graph of the total SNPs is provided in figure 1.

Damaging nsSNPs identification

Four tools, namely PROVEAN, SNPs&GO, PhD-SNP and PolyPhen2, were used to identify damaging nsSNPs. PROVEAN predicted 67 nsSNPs (18.76%) as damaging, while SNPs&GO and PhD-SNP predicted 11 (3.08%) and 63 nsSNPs (17.64%), respectively, to be damaging. These three tools predicted six nsSNPs (seven amino acid changes) in common, which were further screened in PolyPhen2 and were predicted to be damaging there as well. The detailed results for the six nsSNPs identified as most damaging by all four tools are provided in table 1. The percentage of predictions from each tool is given in figure 2. MutPred1.2 was used to predict the structural and functional effects of these selected nsSNPs. MutPred1.2 predictions with probability values are presented in table 2.
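The prediction percentages quoted for PROVEAN, SNPs&GO and PhD-SNP follow directly from the reported counts. As a quick arithmetic check, the sketch below recomputes them against the 357 analysed nsSNPs. It assumes the published figures were truncated (not rounded) to two decimals; that is inferred from the numbers themselves and is not stated by the authors.

```python
import math

# Counts taken from the Results text: nsSNPs called damaging by each tool,
# out of the 357 nsSNPs retained for analysis.
TOTAL_NSSNPS = 357
DAMAGING_CALLS = {"PROVEAN": 67, "SNPs&GO": 11, "PhD-SNP": 63}


def truncated_percent(count, total=TOTAL_NSSNPS, decimals=2):
    """Return count/total as a percentage, truncated (assumption: not rounded)."""
    factor = 10 ** decimals
    return math.floor(100.0 * count / total * factor) / factor


percentages = {tool: truncated_percent(n) for tool, n in DAMAGING_CALLS.items()}

for tool, pct in percentages.items():
    print(f"{tool}: {DAMAGING_CALLS[tool]}/{TOTAL_NSSNPS} = {pct:.2f}%")
```

With ordinary rounding the PROVEAN and PhD-SNP values would come out one hundredth higher, so the difference is cosmetic.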

Figure 1. Percentage of SNPs in different regions of TOX3 gene.

Table 1. Prediction tools results for six most damaging nsSNPs.

rs ID          Amino acid   PROVEAN score        PolyPhen2 (HumDiv) score   PhD-SNP   PhD-SNP probability   SNPs&GO   SNPs&GO probability
               change       (threshold = –2.5)   (threshold = 0.5)          RI        (threshold = 0.5)     RI        (threshold = 0.5)
rs368713418    A266D        –5.388               1                          7         0.834                 2         0.610
rs751141352    P273S        –7.540               0.999                      0         0.514                 1         0.538
               P273T        –7.540               0.975                      0         0.524                 1         0.538
rs200878352    A275T        –3.770               0.999                      3         0.656                 1         0.533
rs1310423573   Y310N        –8.492               1                          6         0.806                 3         0.659
rs371378216    A315V        –3.774               1                          2         0.580                 1         0.527
rs1397711483   Y317N        –8.492               1                          7         0.851                 4         0.719
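The consensus rule behind table 1 (a substitution is kept only if every tool calls it damaging) can be sketched as a threshold filter over the tabulated scores. The direction of each comparison is an assumption made for this illustration: PROVEAN scores at or below –2.5 are treated as deleterious, and the other three scores are treated as damaging when above 0.5. The variable and function names are hypothetical.

```python
# Rows transcribed from table 1:
# (rs ID, substitution, PROVEAN score, PolyPhen2 score,
#  PhD-SNP probability, SNPs&GO probability)
TABLE_1 = [
    ("rs368713418", "A266D", -5.388, 1.0, 0.834, 0.610),
    ("rs751141352", "P273S", -7.540, 0.999, 0.514, 0.538),
    ("rs751141352", "P273T", -7.540, 0.975, 0.524, 0.538),
    ("rs200878352", "A275T", -3.770, 0.999, 0.656, 0.533),
    ("rs1310423573", "Y310N", -8.492, 1.0, 0.806, 0.659),
    ("rs371378216", "A315V", -3.774, 1.0, 0.580, 0.527),
    ("rs1397711483", "Y317N", -8.492, 1.0, 0.851, 0.719),
]

PROVEAN_CUTOFF = -2.5      # assumed: at or below the cutoff means deleterious
PROBABILITY_CUTOFF = 0.5   # assumed: above the cutoff means damaging


def damaging_by_all_tools(row):
    """True when all four tabulated scores pass their thresholds."""
    _, _, provean, polyphen2, phd_snp, snps_go = row
    return (
        provean <= PROVEAN_CUTOFF
        and polyphen2 > PROBABILITY_CUTOFF
        and phd_snp > PROBABILITY_CUTOFF
        and snps_go > PROBABILITY_CUTOFF
    )


consensus = [row for row in TABLE_1 if damaging_by_all_tools(row)]
consensus_substitutions = [row[1] for row in consensus]
consensus_rs_ids = sorted({row[0] for row in consensus})

print(f"{len(consensus_substitutions)} substitutions from "
      f"{len(consensus_rs_ids)} nsSNPs pass all four thresholds")
```

Under these assumptions all seven substitutions (six rs IDs) pass, which matches the "six nsSNPs, seven amino acid changes" wording of the text.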

Table 3. Effect of selected nsSNPs on TOX3 protein stability.

SNP ID         Amino acid change   Stability   RI
rs368713418    A266D               Decrease    8
rs751141352    P273S               Decrease    8
               P273T               Decrease    7
rs200878352    A275T               Decrease    8
rs1310423573   Y310N               Increase    1
rs371378216    A315V               Increase    2
rs1397711483   Y317N               Increase    0
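The selection step that follows from table 3 (keep the substitutions predicted to decrease stability, and note which predictions carry a low reliability index) can be written as a small filter. The cut-off `LOW_RI = 2` is a hypothetical value chosen only for this illustration; the paper describes the RI values of the stability-increasing predictions as low without giving a numeric threshold.

```python
# Rows transcribed from table 3: (rs ID, substitution, predicted effect, RI)
TABLE_3 = [
    ("rs368713418", "A266D", "Decrease", 8),
    ("rs751141352", "P273S", "Decrease", 8),
    ("rs751141352", "P273T", "Decrease", 7),
    ("rs200878352", "A275T", "Decrease", 8),
    ("rs1310423573", "Y310N", "Increase", 1),
    ("rs371378216", "A315V", "Increase", 2),
    ("rs1397711483", "Y317N", "Increase", 0),
]

LOW_RI = 2  # hypothetical cut-off for a "low" reliability index (0-10 scale)

# Substitutions carried forward to 3D modelling: predicted to destabilise TOX3.
destabilising = [sub for _, sub, effect, _ in TABLE_3 if effect == "Decrease"]
destabilising_rs_ids = sorted({rs for rs, _, effect, _ in TABLE_3 if effect == "Decrease"})

# Predictions that deserve a cross-check because their RI is low.
low_confidence = [sub for _, sub, _, ri in TABLE_3 if ri <= LOW_RI]

print("Destabilising:", destabilising, "from", len(destabilising_rs_ids), "nsSNPs")
print("Low-RI predictions:", low_confidence)
```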

Figure 2. Nonsynonymous SNPs prediction by PROVEAN, PhD-SNP, SNPs&GO and PolyPhen2. nsSNPs structural and functional effect prediction.

Table 2. Probability scores predicted by MutPred1.2.

Mutation   P value
A266D      0.898
P273S      0.830
P273T      0.804
A275T      0.787
Y310N      0.881
A315V      0.712
Y317N      0.896

nsSNPs effect on TOX3 stability

To see whether the selected nsSNPs increase or decrease TOX3 protein stability, the I-Mutant server was used. According to I-Mutant, nsSNPs rs368713418, rs751141352 and rs200878352 decrease stability, which suggests that these nsSNPs can be of prime importance in breast cancer and other associated diseases. Results from I-Mutant are provided in table 3.

Conservation profile of TOX3 protein

ConSurf provided the evolutionary conservation profile of all the amino acids of the TOX3 protein. We focussed on our selected nsSNPs. Only P273 was highly conserved, exposed and functionally important, while the remaining residues (A266, A275, Y310, A315 and Y317) were highly conserved, buried and structurally important. For all these amino acids, the conservation score was nine.

Prediction of post-translational modifications in TOX3 protein

For methylation prediction, the GPS-MSP server was used; it did not show any site with methylation potential in the TOX3 protein.

Phosphorylation in the TOX3 protein was predicted with the NetPhos3.1 and GPS3.0 servers. GPS3.0 showed nine phosphorylation sites, of which three were threonine specific and six were serine specific. We did not find any tyrosine-specific phosphorylation site using GPS3.0. NetPhos3.1 predicted a much larger number of phosphorylation sites (n=71). Of these 71 sites, 23 were threonine specific, 47 were serine specific and only one was tyrosine specific. Seven of these phosphorylation sites were common to GPS3.0 and NetPhos3.1 (see table 4).

Ubiquitylation in the TOX3 protein was explored with two bioinformatics tools, BDM-PUB and UbPred. BDM-PUB predicted 20 potential ubiquitylation sites in the TOX3 protein, while UbPred predicted only three. The ubiquitylation results are provided in table 5.

Table 4. Prediction of phosphorylation sites in TOX3 protein using GPS3.0 and NetPhos3.1.

                         GPS3.0                             NetPhos3.1 (threshold 0.5)
Position   Peptide       Kinase          Score    Cutoff    Kinase       Score
204        ASKSATPSPSS   AGC/AKT/AKT1/   3.942    1.597     GSK3/cdk5    0.510
206        KSATPSPSSSI   AGC/AKT/AKT1    1.856    1.597     GSK3/cdk5    0.646
238        KKPKTPKKK     AGC/AKT/AKT1    1.76     1.597     PKC/cdk5     0.859
341        RSVQQTLASTN   AGC/AKT/AKT1    1.721    1.597     PKC          0.621
529        SPRQHSPVASQ   AGC/AKT/AKT1    3.663    1.597     Cdk5         0.577
209        TPSPSSSINEE   AGC/AKT/AKT2    6.571    6.504     CKII         0.538
344        QQTLASTNLTS   AGC/AKT/AKT2    7.571    6.504     Cdc2/PKC     0.566

Table 5. Prediction of ubiquitylation sites in TOX3 protein using BDM-PUB and UbPred.

Peptide           Position   BDM-PUB score (threshold 0.3)   UbPred score (threshold 0.62)
******MKCQPRSGA   2          2.82                            Not ubiquitylated
SPSPPASKSATPSPS   201        4.39                            0.70
ANRAIGEKRAAPDSG   226        3.77                            0.85
RAAPDSGKKPKTPKK   234        2.72                            Not ubiquitylated
AAPDSGKKPKTPKKK   235        2.48                            Not ubiquitylated
PDSGKKPKTPKKKKK   237        3.02                            Not ubiquitylated
GKKPKTPKKKKKKDP   240        4.22                            Not ubiquitylated
KKPKTPKKKKKKDPN   241        3.61                            Not ubiquitylated
KPKTPKKKKKKDPNE   242        2.87                            Not ubiquitylated
PKTPKKKKKKDPNEP   243        3.66                            Not ubiquitylated
KTPKKKKKKDPNEPQ   244        2.35                            Not ubiquitylated
TPKKKKKKDPNEPQK   245        2.32                            Not ubiquitylated
RDTQAAIKGQNPNAT   269        3.00                            Not ubiquitylated
DSLGEEQKQVYKRKT   296        0.90                            Not ubiquitylated
QKQVYKRKTEAAKKE   302        1.26                            Not ubiquitylated
KRKTEAAKKEYLKAL   307        2.38                            Not ubiquitylated
RKTEAAKKEYLKALA   308        2.47                            Not ubiquitylated
AAKKEYLKALAAYRA   312        3.11                            Not ubiquitylated
YRASLVSKAAAESAE   324        2.32                            0.85
LPRSIAPKPLTMRLP   381        1.70                            Not ubiquitylated
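Whether any predicted PTM site coincides with one of the damaging substitutions is a simple set-intersection question. The sketch below uses the residue positions from tables 4 and 5 and the positions of the six damaging nsSNPs from table 1; the variable names are illustrative only.

```python
# Residue positions of the seven damaging substitutions (six nsSNPs), table 1.
DAMAGING_POSITIONS = {266, 273, 275, 310, 315, 317}

# Phosphorylation sites common to GPS3.0 and NetPhos3.1, table 4.
PHOSPHORYLATION_SITES = {204, 206, 209, 238, 341, 344, 529}

# Ubiquitylation sites predicted by BDM-PUB, table 5.
BDM_PUB_SITES = {
    2, 201, 226, 234, 235, 237, 240, 241, 242, 243,
    244, 245, 269, 296, 302, 307, 308, 312, 324, 381,
}

# Subset of table 5 that UbPred also scores as ubiquitylated.
UBPRED_SITES = {201, 226, 324}

phospho_overlap = DAMAGING_POSITIONS & PHOSPHORYLATION_SITES
ubiquitylation_overlap = DAMAGING_POSITIONS & BDM_PUB_SITES

print("Damaging nsSNPs at phosphorylation sites:", sorted(phospho_overlap))
print("Damaging nsSNPs at ubiquitylation sites:", sorted(ubiquitylation_overlap))
```

Both intersections are empty for the tabulated positions, so none of the selected substitutions falls on a predicted phosphorylation or ubiquitylation site.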

3D-modelling of TOX3 protein

The three nsSNPs (corresponding to four amino acid substitutions) that were predicted by I-Mutant to decrease TOX3 protein stability were selected for final protein modelling. The protein sequences of the wild type and of the mutants with single amino acid substitutions were submitted individually to I-TASSER. The templates used by I-TASSER were 2nbiA (85% coverage of the threading alignment) and 2co9 (83% identity). I-TASSER used 10 threading programs, namely MUSTER, SPARKS-X, FFAS-3D, HHSEARCH-2, Neff-PPAS, HHSEARCH 1, HHSEARCH, pGenTHREADER, PROSPECT2 and wdPPAS, to develop the 3D protein structure. The resulting structures were then subjected to TM-align to calculate their TM-score and RMSD values (table 6). The protein structures were viewed using Chimera1.11 to study their molecular characterization (figure 3, a–e).

Table 6. TM-scores and RMSD values of the selected most damaging nsSNPs in TOX3 proteins with reference to the wild-type TOX3 protein.

SNP ID        Residual change   TM score   RMSD value
rs368713418   A266D             0.84238    2.15 Å
rs751141352   P273S             0.92780    2.67 Å
              P273T             0.94889    2.20 Å
rs200878352   A275T             0.81693    1.71 Å

Role of regulatory SNPs

RNA half-life and translation of mRNA can be affected by SNPs located in the UTR regions of a gene, as they may affect miRNA-binding sites. MicroSNiPer identified eight SNPs in the UTR region that may affect a miRNA-binding site, while the PolymiRTS database identified one additional SNP, giving a total of nine SNPs predicted to affect miRNA-binding sites. Twelve miRNA-binding sites were predicted to be abolished by seven SNPs. On the other hand, 14 miRNA sites were created by eight SNPs. The results of both tools are provided in table 7.
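The counts of abolished and created miRNA-binding sites can be recomputed from the entries of table 7. The sketch below assumes that, in each table row, the miRNAs listed before the separator are sites lost and those listed after it are sites gained; this reading is an interpretation of the table layout, and the dictionary name is hypothetical.

```python
# Entries transcribed from table 7: rs ID -> (sites lost, sites gained).
# "No pattern" entries are represented by empty lists.
TABLE_7 = {
    "rs80129475": (["hsa-miR-4328"], ["hsa-miR-216b-3p"]),
    "rs138057965": (["hsa-miR-203a"], []),
    "rs117905209": ([], ["hsa-miR-3908"]),
    "rs113577155": (
        ["hsa-miR-1226-3p", "hsa-miR-4733-3p", "hsa-miR-6879-3p"],
        ["hsa-miR-4633-5p", "hsa-miR-4715-3p", "hsa-miR-491-3p"],
    ),
    "rs149503838": (["hsa-miR-323a-3p"], ["hsa-miR-153-5p"]),
    "rs72805466": (
        ["hsa-miR-3919", "hsa-miR-4756-3p", "hsa-miR-6758-5p", "hsa-miR-6856-5p"],
        ["hsa-miR-4753-5p", "hsa-miR-6768-3p"],
    ),
    "rs144919658": (
        [],
        ["hsa-miR-101-5p", "hsa-miR-361-5p", "hsa-miR-374a-3p", "hsa-miR-5094"],
    ),
    "rs8049149": (["hsa-miR-4800-3p"], ["hsa-miR-4762-5p"]),
    "rs193018526": (["hsa-miR-7151-5p"], ["hsa-miR-4800-3p"]),
}

abolished_sites = sum(len(lost) for lost, _ in TABLE_7.values())
created_sites = sum(len(gained) for _, gained in TABLE_7.values())
snps_abolishing = sum(1 for lost, _ in TABLE_7.values() if lost)
snps_creating = sum(1 for _, gained in TABLE_7.values() if gained)

print(f"{abolished_sites} sites abolished by {snps_abolishing} SNPs")
print(f"{created_sites} sites created by {snps_creating} SNPs")
```

Under that reading the tallies agree with the text: twelve sites abolished by seven SNPs and 14 sites created by eight SNPs, across nine SNPs in total.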

Figure 3. (a) Wild-type TOX3 protein structure; (b) wild-type and A266D mutant protein structures superimposed with labelled substitutions; (c) wild-type and P273S mutant protein structures superimposed with labelled substitutions; (d) wild-type and P273T mutant protein structures superimposed with labelled substitutions; (e) wild-type and A275T mutant protein structures superimposed with labelled substitutions.
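The structural comparison in table 6 is read in this paper against an RMSD cut-off of 2 Å between each mutant model and the wild-type model. A minimal sketch of that comparison, using the tabulated values, is given below; the cut-off constant and variable names are written out here for illustration and are not part of TM-align's output.

```python
# Values transcribed from table 6: substitution -> (TM-score, RMSD in angstrom).
TABLE_6 = {
    "A266D": (0.84238, 2.15),
    "P273S": (0.92780, 2.67),
    "P273T": (0.94889, 2.20),
    "A275T": (0.81693, 1.71),
}

RMSD_CUTOFF = 2.0  # angstrom; the cut-off used in the paper's discussion

# Mutant models that deviate from the wild type by more than the cut-off.
above_cutoff = sorted(sub for sub, (_, rmsd) in TABLE_6.items() if rmsd > RMSD_CUTOFF)

# Largest and smallest deviation among the modelled substitutions.
largest_rmsd = max(TABLE_6, key=lambda sub: TABLE_6[sub][1])
smallest_rmsd = min(TABLE_6, key=lambda sub: TABLE_6[sub][1])

print("RMSD above cut-off:", above_cutoff)
print("Largest RMSD:", largest_rmsd, TABLE_6[largest_rmsd][1])
print("Smallest RMSD:", smallest_rmsd, TABLE_6[smallest_rmsd][1])
```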

Table 7. Prediction of effect of SNPs in regulatory region of TOX3 by MicroSNiPer and PolymiRTS database.

SNP                               miR ID                                                                                                                 Score change*
rs80129475                        hsa-miR-4328 → hsa-miR-216b-3p                                                                                         –0.205
rs138057965                       hsa-miR-203a → no pattern                                                                                              –0.143
rs117905209                       No pattern → hsa-miR-3908                                                                                              –0.241
rs113577155                       hsa-miR-1226-3p, hsa-miR-4733-3p, hsa-miR-6879-3p → hsa-miR-4633-5p, hsa-miR-4715-3p, hsa-miR-491-3p                   –0.289
rs149503838                       hsa-miR-323a-3p → hsa-miR-153-5p                                                                                       –0.036
rs72805466                        hsa-miR-3919, hsa-miR-4756-3p, hsa-miR-6758-5p, hsa-miR-6856-5p → hsa-miR-4753-5p, hsa-miR-6768-3p                     –0.197
rs144919658 (only by PolymiRTS)   No pattern → hsa-miR-101-5p, hsa-miR-361-5p, hsa-miR-374a-3p, hsa-miR-5094                                             –0.122
rs8049149                         hsa-miR-4800-3p → hsa-miR-4762-5p                                                                                      –0.307
rs193018526                       hsa-miR-7151-5p → hsa-miR-4800-3p                                                                                      –0.307

*A more negative score change means a greater chance that a miRNA target site is disrupted or created.

Discussion

Many previous studies of SNPs in the TOX3 gene showed association with diseases including polycystic ovary syndrome (Shi et al. 2012), restless leg syndrome (Winkelmann et al. 2011) and breast cancer (Zheng et al. 2009). However, differentiating functionally important, damaging SNPs from neutral ones is always challenging. There are many SNPs in the TOX3 gene region that may have a vital role in these diseases, especially breast cancer. Here, we analysed the SNPs in the TOX3 gene region for their potential effect on the TOX3 protein and on regulatory regions, which may play an important role in these diseases. The TOX3 gene forms two transcript variants: ENST00000219746.14 and ENST00000407228.7. All the nsSNPs were screened for transcript ENST00000219746.14. The identified nsSNPs were crosschecked and were found to have no role in alternative splicing of TOX3.

Of the total SNPs, only 1.40% were nsSNPs, while 1.40% and 2% were found in the 5′ UTR and 3′ UTR, respectively. The remaining SNPs were of other types (nonsense, intronic, uncategorized etc.) and were not included in this study. These data suggest that only a very small fraction of the SNPs can affect the TOX3 protein directly. All the tools for damaging nsSNP prediction agreed on six nsSNPs (seven amino acid substitutions) that were predicted to be the most damaging. Among these, PROVEAN gave Y310N and Y317N the most negative score (–8.492), while A275T (–3.770) and A315V (–3.774) had the least negative scores (table 1). PolyPhen2 gave four substitutions a score of exactly 1 on a scale of 0–1: A266D, Y310N, A315V and Y317N. PhD-SNP and SNPs&GO predicted Y317N as the most damaging, with the highest probability scores of 0.851 and 0.719, respectively. We also crosschecked these nsSNPs in Ensembl genome browser 96 (accessed 25 June 2019), which reports several additional tools such as CADD, REVEL, Mutation Assessor and MetaLR. All seven mutations (six nsSNPs) were predicted to be deleterious and damaging by CADD, REVEL and MetaLR. Mutation Assessor predicted six of the seven mutations to be damaging, while mutation A266D was predicted to be tolerated. All the CADD scores were between 26 and 31 (a CADD score of 20 means the variant is among the 1% most damaging SNPs in the human genome, a CADD score of 30 means it is among the 0.1% most damaging, and so on). The MutPred1.2 server predicts effects on the basis of many features such as loss of methylation, gain of acetylation, altered ordered interface, altered disordered interface and gain of intrinsic disorder. Mutation A266D has the highest P value of 0.898, followed by Y317N and Y310N with 0.896 and 0.881, respectively. Mutations A315V and A275T had the lowest P values of 0.712 and 0.787, respectively. These findings suggest that these nsSNPs can have major effects on the structure and function of the TOX3 protein.

For protein stability prediction, I-Mutant reports the predicted effect of each nsSNP with a reliability index (RI). RI values range from 0 to 10, where 0 is minimum reliability and 10 is maximum reliability. Three SNPs, corresponding to the four amino acid substitutions A266D, P273S, P273T and A275T, were predicted to decrease protein stability, while the remaining three SNPs, Y310N, A315V and Y317N, were predicted to increase protein stability. The reliability indexes for Y310N, A315V and Y317N were low (table 3). Therefore, to make our predictions more reliable, we crosschecked these mutations in the CUPSAT server (http://cupsat.tu-bs.de/). Predictions from the CUPSAT server were in complete agreement with the I-Mutant predictions.

ConSurf predicted the conservation profile of the TOX3 protein, indicating for every amino acid whether it is conserved, buried, and functionally or structurally important. Depending on their position in the protein, highly conserved amino acids were predicted to be either functionally or structurally important (Berezin et al. 2004). Highly conserved amino acids are thought to have many important functions in protein activities such as interactions. As stated by Miller and Kumar (2001), nsSNPs that lie in conserved regions are the most damaging. We focussed on our six selected nsSNPs, which were located in highly conserved regions. A266, A275, Y310, A315 and Y317 were buried and structurally important, while P273 was exposed and functionally important. This further supports the deleterious effect of these nsSNPs on the TOX3 protein.

We modelled the protein structures using I-TASSER. Only the protein FASTA sequences were used as input. Template selection and protein modelling were done by the tool itself. It used the 2nbiA and 2co9 templates, with 85% coverage and 83% identity, respectively. We calculated RAMPAGE values for the modelled protein structures using Ramachandran plot analysis (RAMPAGE server) (Lovell et al. 2002) (http://mordred.bioc.cam.ac.uk/rapper/rampage.php). The wild-type TOX3 protein had a RAMPAGE value of 81.2% favoured and allowed residues and 18.8% outlier residues. For the A266D mutant structure, 81.9% were favoured and allowed residues and 18.1% were outlier residues. For P273S, P273T and A275T, favoured and allowed residues were 83.8%, 80.7% and 85.4%, respectively, and 16.2%, 19.3% and 14.6% were outlier residues, respectively. Protein structures are considered better if their RAMPAGE values are greater than 80% (Morris et al. 1992).

The four amino acid substitutions showed appreciable root mean square deviation (RMSD) from the wild-type structure, which suggests that these nsSNPs are the most damaging SNPs in the TOX3 protein. nsSNP rs751141352 (P273S) has the highest RMSD value (2.67 Å), with a TM-score of 0.9278, suggesting that it has the largest effect on protein structure. rs200878352 (A275T) has the lowest RMSD value (1.71 Å) among the selected nsSNPs (table 6). An RMSD value greater than 2 Å means that the mutant and wild-type structures differ more strongly. Three of the mutant structures, A266D, P273S and P273T, have RMSD values >2 Å, which suggests that these nsSNPs can be highly damaging for the TOX3 protein.

Post-translational modifications (PTMs) are key factors that direct proteins to perform many important functions such as protein–protein interactions (PPIs) and cell signalling (Dai and Gu 2010; Shiloh and Ziv 2013). We therefore checked whether the TOX3 protein had potential PTM sites at the positions of these nsSNPs. Interestingly, none of the phosphorylation and ubiquitylation sites were located at any of the damaging nsSNP positions. We then searched for other nsSNPs located at the seven sites predicted to be phosphorylated by both GPS3.0 and NetPhos3.1 (table 4). We found six nsSNPs at four potential phosphorylation sites, including rs773384434 at amino acid position 238, rs747488842 and rs1000682430 at amino acid position 341 and rs771386648 at position 529. At the three potential ubiquitylation sites predicted by both BDM-PUB and UbPred, only one nsSNP, rs1404777742, was located, at amino acid position 201.

SNPs in miRNA target sites and seed regions can lead to disease susceptibility. Many studies have been performed on the association of regulatory SNPs in miRNA regions with disease susceptibility. SNPs in miRNA target sites of cancer genes such as BRCA1 and TGF-β have been reported to contribute to the likelihood of cancer (Nicoloso et al. 2010; Quann et al. 2015). For the TOX3 gene, there are no reports of such SNPs. SNPs located in miRNA target sites can alter gene expression and may contribute to pathological conditions. In this study, we predicted nine SNPs that can either create or abolish a miRNA target site, or create a target site for some miRNAs and abolish it for others. SNP rs117905209 was the only SNP predicted to create a miRNA target site for hsa-miR-3908 with a seed length of 10 bp. The target sites affected by the other SNPs had lengths of 6–8 bp. SNPs rs8049149 and rs193018526 had the highest score change of –0.307, followed by rs113577155 and rs117905209 with score changes of –0.289 and –0.241, respectively. Our study also suggested that the four nonsense SNPs, rs1335235301 (Q527STOP), rs1259790811 (G495STOP), rs1294465822 (S395STOP) and rs1335372738 (G8STOP), may form premature stop codons and ultimately lead to a truncated protein or even inactivation of the protein.

This study was carried out in detail and all the results were crosschecked to avoid ambiguity. Every study has certain limitations and so does ours. Although detailed, our study is based on web servers and computer tools driven by mathematical and statistical algorithms, whose predictions need experimental confirmation.

In conclusion, our study identified SNPs in the TOX3 gene that could be vital in breast cancer. These SNPs included nsSNPs and regulatory SNPs. The nsSNPs rs368713418 (A266D), rs751141352 (P273S, P273T) and rs200878352 (A275T) were predicted to be the most damaging. We also identified four nonsense SNPs, i.e. rs1335235301 (Q527STOP), rs1259790811 (G495STOP), rs1294465822 (S395STOP) and rs1335372738 (G8STOP), that lead to premature stop codons and are expected to produce truncated, nonfunctional proteins. The results, supported by high RMSD values and high MutPred1.2 P values, suggest that these nsSNPs are important for studies of breast cancer and other associated diseases. These SNPs may have a deleterious effect on the TOX3 protein and might play a vital role in these diseases. Besides nsSNPs, we also investigated regulatory SNPs for their potential effect on TOX3 regulation and found nine SNPs that possibly affect miRNA target sites and seed regions: rs80129475, rs138057965, rs117905209, rs113577155, rs149503838, rs72805466, rs144919658, rs8049149 and rs193018526. These three most damaging nsSNPs, four stop-gain SNPs and nine regulatory SNPs are suggested to have a possible role in breast cancer and may also have a potential role in other associated diseases and cancers. Our study may help in designing personalized medicine for breast cancer patients and may also be helpful for future risk prediction where there is a family history of breast cancer. Although our study was conducted in detail, it is an in silico study; cell cultures and animal models carrying these SNPs therefore need to be studied using wet lab practices.

References

Adzhubei I. A., Schmidt S., Peshkin L., Ramensky V. E., Gerasimova A., Bork P. et al. 2010 A method and server for predicting damaging missense mutations. Nat. Methods 7, 248–249.
Akhtar M., Jamal T., Jamal H., Din J. U., Jamal M., Arif M. et al. 2019 Identification of most damaging nsSNPs in human CCR6 gene: in silico analyses. Int. J. Immunogenet. Article ID. 12449.
Berezin C., Glaser F., Rosenberg J., Paz I., Pupko T., Fariselli P. et al. 2004 ConSeq: the identification of functionally and structurally important residues in protein sequences. Bioinformatics 20, 1322–1324.
Bhagwat M. 2010 Searching NCBI’s dbSNP database. Curr. Protoc. Bioinformatics 32, 1–18.
Bhattacharya A., Ziebarth J. D. and Cui Y. 2014 PolymiRTS Database 3.0: linking polymorphisms in microRNAs and their target sites with human diseases and biological pathways. Nucleic Acids Res. 42, D86–D91.
Blom N., Gammeltoft S. and Brunak S. 1999 Sequence and structure-based prediction of eukaryotic protein phosphorylation sites. J. Mol. Biol. 294, 1351–1362.
Capriotti E., Fariselli P. and Casadio R. 2005 I-Mutant2.0: predicting stability changes upon mutation from the protein sequence or structure. Nucleic Acids Res. 33, W306–W310.
Capriotti E., Calabrese R. and Casadio R. 2006 Predicting the insurgence of human genetic diseases associated to single point protein mutations with support vector machines and evolutionary information. Bioinformatics 22, 2729–2734.
Capriotti E., Calabrese R., Fariselli P., Martelli P. L., Altman R. B. and Casadio R. 2013 SNPs&GO: a web server for predicting the deleterious effect of human protein variants using functional annotation. BMC Genomics 3, S6.
Choi Y., Sims G. E., Murphy S., Miller J. R. and Chan A. P. 2012 Predicting the functional effect of amino acid substitutions and indels. PLoS One 7, e46688.
Dai C. and Gu W. 2010 p53 post-translational modification: deregulated in tumorigenesis. Trends Mol. Med. 16, 528–536.
Deng W., Wang Y., Ma L., Zhang Y., Ullah S. and Xue Y. 2017 Computational prediction of methylation types of covalently modified lysine and arginine residues in proteins. Brief. Bioinform. 18, 647–658.
Easton D. F., Pooley K. A., Dunning A. M., Pharoah P. D. P., Thompson D., Ballinger D. G. et al. 2007 Genome-wide association study identifies novel breast cancer susceptibility loci. Nature 447, 1087–1093.
Elematore I., Gonzalez-Hormazabal P., Reyes J., Blanco R., Bravo T., Octavio Peralta et al. 2014 Association of genetic variants at TOX3, 2q35 and 8q24 with the risk of familial and early-onset breast cancer in a South-American population. Mol. Biol. Rep. 41, 3715–3722.
Gabai-Kapara E., Lahad A., Kaufman B., Friedman E., Segev S., Renbaum P. et al. 2014 Population-based screening for breast and ovarian cancer risk due to BRCA1 and BRCA2. Proc. Natl. Acad. Sci. USA 111, 14205–14210.
He X., Yao G., Li F., Li M. and Yang X. 2014 Risk-association of five SNPs in TOX3/LOC643714 with breast cancer in southern China. Int. J. Mol. Sci. 15, 2130–2141.
Jemal A., Bray F., Center M., Ferlay J., Ward E. and Forman D. 2011 Global cancer statistics. CA Cancer J. Clin. 61, 69–90.
Jones J. O., Chin S. F., Wong-Taylor L. A., Leaford D., Ponder B. A., Caldas C. et al. 2013 TOX3 mutations in breast cancer. PLoS One 8, e74102.
Li B., Krishnan V. G., Mort M. E., Xin F., Kamati K. K., Cooper D. N. et al. 2009 Automated inference of molecular mechanisms of disease from amino acid substitutions. Bioinformatics 25, 2744–2750.
Lovell S. C., Davis I. W., Arendall III W. B., de Bakker P. I. W., Word J. M., Prisant M. G. et al. 2002 Structure validation by Calpha geometry: phi, psi and Cbeta deviation. Proteins 50, 437–450.
Low S., Takahashi A., Ashikawa K., Inazawa J., Miki Y., Kubo M. et al. 2013 Genome-wide association study of breast cancer in the Japanese population. PLoS One 8, e76463.
Miller M. P. and Kumar S. 2001 Understanding human disease mutations through the use of interspecific genetic variation. Hum. Mol. Genet. 10, 2319–2328.
Morris A. L., MacArthur M. W., Hutchinson E. G. and Thornton J. M. 1992 Stereochemical quality of protein structure coordinates. Proteins 12, 345–364.
Nicoloso M. S., Sun H., Spizzo R., Kim H., Wickramasinghe P., Shimizu M. et al. 2010 Single nucleotide polymorphisms inside microRNA target sites influence tumor susceptibility. Cancer Res. 70, 2789–2798.
O’Flaherty E. and Kaye J. 2003 TOX defines a conserved subfamily of HMG-box proteins. BMC Genomics 4, 13–22.
Pettersen E. F., Goddard T. D., Huang C. C., Couch G. S., Greenblatt D. M., Meng E. C. et al. 2004 UCSF Chimera – a visualization system for exploratory research and analysis. J. Comput. Chem. 13, 1605–1612.
Quann K., Jing Y. and Rigoutsos I. 2015 Post-transcriptional regulation of BRCA1 through its coding sequence by the miR-15/107 group of miRNAs. Front Genet. 6, 242.
Radivojac P., Vacic V., Haynes C., Cocklin R. R., Mohan A., Heyen J. W. et al. 2010 Identification, analysis and prediction of protein ubiquitination sites. Proteins 78, 365–380.
Roy A., Kucukural A. and Zhang Y. 2010 I-TASSER: a unified platform for automated protein structure and function prediction. Nat. Protoc. 5, 725–738.
Ruiz-Narvaez E., Rosenberg L., Cozier Y., Cupples L., Adams-Campbell L. and Palmer J. 2010 Polymorphisms in the TOX3/LOC643714 locus and risk of breast cancer in African-American women. Cancer Epidemiol. Biomarkers Prev. 19, 1320–1327.
Shan J., Dsouza S. P., Bakhru S., Al-Azwani E. K., Ascierto M. L., Sastry K. S. et al. 2013 TNRC9 downregulates BRCA1 expression and promotes breast cancer aggressiveness. Cancer Res. 73, 2840–2849.
Shi Y., Zhao H., Shi Y., Cao Y., Yang D. and Li Z. 2012 Genome-wide association study identifies eight new risk loci for polycystic ovary syndrome. Nat. Genet. 44, 1020–1025.
Shiloh Y. and Ziv Y. 2013 The ATM protein kinase: regulating the cellular response to genotoxic stress, and more. Nat. Rev. Mol. Cell Biol. 14, 197–210.
Slattery M., Baumgartner K., Giuliano A., Byers T., Herrick J. and Wolff R. 2011 Replication of five GWAS-identified loci and breast cancer risk among Hispanic and non-Hispanic white women living in the Southwestern United States. Breast Cancer Res. Treat. 129, 531–539.
Stacey S., Manolescu A., Sulem P., Rafnar T., Gudmundsson J., Gudjonsson S. A. et al. 2007 Common variants on chromosomes 2q35 and 16q12 confer susceptibility to estrogen receptor-positive breast cancer. Nat. Genet. 39, 865–869.
Tavtigian S., Oefner P., Babikyan D., Hartmann A., Healey S., Le Calvez-Kelm F. et al. 2009 Rare, evolutionarily unlikely missense substitutions in ATM confer increased risk of breast cancer. Am. J. Hum. Genet. 85, 427–446.
Udler M. S., Ahmed S., Healey C. S., Meyer K., Struwing J., Maranian M. et al. 2010 Fine scale mapping of the breast cancer 16q12 locus. Hum. Mol. Genet. 19, 2507–2515.
Winkelmann J., Czamara D., Schormair B., Knauf F., Schulte E. C., Trenkwalder C. et al. 2011 Genome-wide association study identifies novel restless legs syndrome susceptibility loci on 2p14 and 16q12.1. PLoS Genet. 7, e1002171.

Xue Y., Ren J., Gao X., Jin C., Wen L. and Yao X. 2008 GPS 2.0, a Zhang Y. 2008 I-TASSER server for protein 3D structure predic- tool to predict kinase-specific phosphorylation sites in hierarchy. tion. BMC Bioinformatics 9, 40. Mol. Cell Proteomics 7, 1598–1608. Zhang L. and Long X. 2015 Association of three SNPs in TOX3 Yang J., Yan R., Roy A., Xu D., Poisson J. and Zhang Y. 2015 The and breast cancer risk: evidence from 97275 cases and 128686 I-TASSER suite: protein structure and function prediction. Nat. controls. Sci. Rep. 5, 12773. Methods 12, 7–8. Zheng W., Long J., Gao Y., Li C., Zheng Y. and Xiang Y. Yuan S. H., Qiu Z. and Ghosh A. 2009 TOX3 regulates calcium- 2009 Genome-wide association study identifies a new dependent transcription in neurons. Proc. Natl. Acad. Sci. USA. breast cancer susceptibility locus at 6q25.1. Nat. Genet. 41, 106, 2909–2914. 324–328.

Corresponding editor: H. A. RANGANATH