Journal of Theoretical Biology ∎ (∎∎∎∎) ∎∎∎–∎∎∎

Contents lists available at ScienceDirect
Journal of Theoretical Biology
journal homepage: www.elsevier.com/locate/yjtbi

3D model for cancerous inhibitor of protein phosphatase 2A armadillo domain unveils highly conserved protein–protein interaction characteristics

Käthe M. Dahlström, Tiina A. Salminen*
Structural Bioinformatics Laboratory, Biochemistry, Faculty of Science and Engineering, Åbo Akademi University, Tykistökatu 6 A, FI-20520, Turku, Finland

* Corresponding author. E-mail address: tiina.salminen@abo.fi (T.A. Salminen).
http://dx.doi.org/10.1016/j.jtbi.2015.09.010
0022-5193/© 2015 Published by Elsevier Ltd.

HIGHLIGHTS
• 3D structural model for the armadillo domain of CIP2A.
• A centrally located groove is involved in protein–protein interactions.
• A highly conserved polar ladder can mediate peptide binding.
• The cancer-causing Arg229Gln mutation disrupts the surface charge distribution.

ARTICLE INFO
Article history: Received 17 June 2015; received in revised form 26 August 2015; accepted 14 September 2015.
Keywords: CIP2A; KIAA1524; Cancer; 3D structural modeling; Polar ladder

ABSTRACT
Cancerous Inhibitor of Protein Phosphatase 2A (CIP2A) is a human oncoprotein that exerts its cancer-promoting function through interactions with other proteins, for example Protein Phosphatase 2A (PP2A) and MYC. The lack of structural information for CIP2A severely hampers the design of anti-cancer therapeutics targeting this protein. To address this gap, we modeled the three-dimensional structure of the N-terminal domain (CIP2A-ArmRP), analyzed key areas and amino acids, and related the results to the existing literature. The model reliably shows a stable armadillo repeat fold with a positively charged groove. The presence of a conserved polar ladder supports the view that this conserved groove very likely binds peptides, because such a ladder is essential for the proper peptide-binding mode of armadillo repeat proteins. In addition, our results show that several known CIP2A interaction partners possess an ArmRP-binding consensus motif. Moreover, we show that Arg229Gln, which has been linked to the development of cancer, causes a significant change in the charge and surface properties of CIP2A-ArmRP. In conclusion, our results reveal that CIP2A-ArmRP shares the typical fold, protein–protein interaction site and interaction patterns with other natural armadillo proteins. They also suggest that several interaction partners bind into the central groove of the modeled CIP2A-ArmRP. By providing essential structural characteristics of CIP2A, the present study increases our knowledge of how CIP2A interacts with other proteins in cancer progression and how new therapeutics targeting CIP2A could be developed.
© 2015 Published by Elsevier Ltd.

1. Introduction
Protein Phosphatase 2A (PP2A) functions as a tumor suppressor by dephosphorylating MYC, which results in degradation of the latter. Cancerous Inhibitor of PP2A (CIP2A), also called KIAA1524 and p90, can disrupt this series of events by inhibiting PP2A. This in turn stabilizes the MYC protein and raises MYC levels (Junttila et al., 2007). The inhibition of PP2A dephosphorylation is thought to be indirect: CIP2A is likely to bind directly to MYC by recognizing Ser62, thereby shielding this dephosphorylation target residue from the

Please cite this article as: Dahlström, K.M., Salminen, T.A., 3D model for cancerous inhibitor of protein phosphatase 2 a armadillo domain unveils highly conserved protein–protein interaction.... J. Theor. Biol. (2015), http://dx.doi.org/10.1016/j.jtbi.2015.09.010i 2 K.M. Dahlström, T.A. Salminen / Journal of Theoretical Biology ∎ (∎∎∎∎) ∎∎∎–∎∎∎

PP2A activity (Junttila et al., 2007). Hence, overexpression of CIP2A enables immortalized human cells to transform and grow as malignant cells, which makes CIP2A a human oncoprotein. Furthermore, there are strong indications that the cellular transformation property of CIP2A results from its cooperation with other oncoproteins.

Over the years, it has been demonstrated that CIP2A overexpression can be used as a clinically relevant prognostic marker in many types of cancer (Khanna et al., 2013), indeed in most human cancer types (Khanna et al., 2011). These include ovarian cancer (Bockelman et al., 2011), breast cancer (Come et al., 2009), non-small cell lung cancer (Dong et al., 2011), gastric cancer (Khanna et al., 2009), bladder cancer (Xue et al., 2013), head and neck squamous cell carcinoma (HNSCC), colon cancer (Junttila et al., 2007), and hepatocellular carcinoma (HCC), i.e. liver cancer (Soo Hoo et al., 2002). The latter cancer type occurs more frequently in the Chinese Han population and is linked to polymorphisms in the CIP2A gene. In this gene, a T to C mutation causes an Arg229Gln substitution at the protein level and alters the expression of CIP2A (Li et al., 2012). Hepatitis B or C infection may accelerate the effect of this genetic variation and eventually give rise to and promote liver cancer (Li et al., 2012). Furthermore, CIP2A was recently found to enhance the activity of mammalian target of rapamycin complex 1 (mTORC1), which stimulates protein synthesis and inhibits autophagy, thereby driving cell growth and stabilizing CIP2A (Puustinen and Jaattela, 2014). Puustinen and Jäättelä showed for the first time that CIP2A is not a universal PP2A inhibitor. Instead, it acts together with a subset of phosphorylated PP2A substrates, for example MYC, E2F1, AKT, and death-associated protein kinase 1 (DAPK1) (Puustinen and Jaattela, 2014). This study further underpins CIP2A as an oncoprotein and as a potential anticancer target. This role was earlier hypothesized by He et al., 2012 and stated by Khanna et al., 2013, who also highlight the importance of solving the three-dimensional (3D) structure of CIP2A in order to determine whether it is a druggable protein.

The 3D structure of a protein determines its biological function and how the protein interacts with its ligands or other proteins. Knowledge of protein 3D structures is therefore vitally important for rational drug design. Although X-ray crystallography is a powerful tool for determining protein 3D structures, it is time-consuming and expensive. In addition, not all proteins can be successfully crystallized. For example, most membrane proteins do not dissolve in normal solvents and are difficult to crystallize, so very few membrane protein structures have been solved. NMR spectroscopy is another powerful tool for determining the 3D structures of membrane proteins (see, e.g. Berardi et al., 2011; OuYang et al., 2013), but it is also time-consuming and costly. Therefore, protein structure prediction is a common and important way to obtain a first picture of the essential structural features involved in biological function (Lopez et al., 2007). Structural studies of proteins are especially valuable in the development of therapeutics and can also aid the design of experiments to verify interaction sites with ligands or other proteins. Any structural data on CIP2A would therefore be essential for determining the mechanism behind its cancer-promoting function. Such data would also aid the development of new therapeutics aimed at preventing the cancer-causing mechanism.

In this study, we present a three-dimensional model for the N-terminal domain of human CIP2A and analyze its structural features. We show that the domain is all α-helical and possesses the armadillo repeat (ArmRP) fold, in which the α-helices twist around an axis to form a superhelix with a charged central groove. In CIP2A, this groove is positively charged. A multiple sequence alignment of CIP2A with homologous proteins indicates that the groove is important for the function of CIP2A and very likely mediates its protein–protein interactions by binding negatively charged amino acids in the interaction partner. This study reveals that several proteins previously linked to CIP2A possess a conserved motif that is essential for binding into the positively charged central groove of ArmRP proteins. Furthermore, we show that the Arg229Gln variant of CIP2A identified by Li et al., 2012, significantly decreases the positive charge on the electrostatic surface of the protein in this area. This change possibly affects the interaction with other proteins and, thus, promotes liver cancer. The CIP2A-ArmRP model gives valuable information for future crystallization efforts, especially for designing constructs that can be produced and crystallized. A construct produced with an intact folding pattern opens up many additional experimental possibilities, for example testing specific functions of the domains and identifying protein–ligand or protein–protein interaction sites. Finally, this study aids the design of point mutations to validate the functional role of the amino acids in the binding groove.

2. Materials and methods

2.1. Sequence analysis
All the studied sequences were acquired from UniProtKB. The human CIP2A sequence (UniProtKB Q8TCG1) was then submitted to NCBI BLAST (http://blast.ncbi.nlm.nih.gov/Blast.cgi) against the non-redundant database and the Protein Data Bank (PDB), to search for homologous sequences and for crystal structures of homologous proteins, respectively. No relevant crystal structures were found. The statistically significant homologous sequences (e-value < 0.0001) were acquired and aligned with the CIP2A sequence in the BODIL modeling environment (Lehtonen et al., 2004) using the program MALIGN (Johnson and Lehtonen, 2000) and the STRMAT110 scoring matrix with a gap penalty of 40. A picture of the alignment was made with ESPript 2.2 (Gouet et al., 1999). The domain architecture of the CIP2A sequence was analyzed with the Simple Modular Architecture Research Tool (SMART) (Letunic et al., 2012; Schultz et al., 1998) in normal mode to find regions or domains that could be modeled. Furthermore, secondary structure predictions for the CIP2A sequence were made with PSIPred (Jones, 1999), APSSP2 (Raghava, 2002), JPred (Cole et al., 2008) and PORTER (Pollastri and McLysaght, 2005). These predictions were compared to the SMART results (Letunic et al., 2012; Schultz et al., 1998) to check whether the folding pattern was similar to the fold of the predicted domain.

2.2. 3D structural modeling and evaluation
Amino acids 47–321 of the N-terminal CIP2A-ArmRP domain were modeled with the I-TASSER server for protein structure and function prediction (Roy et al., 2010, 2012; Zhang, 2008). I-TASSER uses the threading approach of the Local Meta-Threading-Server (LOMETS; Wu and Zhang, 2007) to identify structural templates from PDB and then assembles the template fragments into a full-length protein model. The sequence of the N-terminal CIP2A-ArmRP domain was also submitted to the protein homology/analogy recognition engine Phyre (Kelley and Sternberg, 2009) and to the homology detection and structure prediction server HHPred (Remmert et al., 2011; Soding, 2005; Soding et al., 2005). The best model was evaluated with PROCHECK (Laskowski et al., 1993), ProSA web (Sippl, 1993; Wiederstein and Sippl, 2007), ProQ (Wallner and Elofsson, 2003), Verify 3D (Eisenberg et al., 1997), ERRAT (Colovos and Yeates, 1993), QMEAN (Benkert et al., 2009) and ModFOLD4 (Buenavista et al., 2012; McGuffin and Roche, 2010; McGuffin et al., 2013). It was also manually inspected and compared to the template structure. Alternate conformations for


the loop consisting of residues 225–230 were predicted with the Loopy program in Jackal (http://wiki.c2b2.columbia.edu/honiglab_public/index.php/Software:Jackal). Pictures of the model were made with PyMOL (Version 1.4, Schrödinger, LLC).

2.3. Binding site and electrostatic surface studies
The ConSurf server (Ashkenazy et al., 2010; Celniker, 2013; Glaser et al., 2003; Landau et al., 2005) was used to identify functional regions in the CIP2A-ArmRP domain and to examine the distribution of conserved amino acids in the model. The server was allowed to generate a multiple sequence alignment with default parameters. In silico mutations (Arg229Gln) of the model were made in PyMOL. The electrostatic surfaces were also calculated in PyMOL with the APBS tool (Adaptive Poisson–Boltzmann Solver), using the PQR file, hydrogens and termini generated by PyMOL. The pockets and ligand binding sites were calculated with MetaPocket 2.0 (Huang, 2009; Zhang et al., 2011) and compared to the I-TASSER results. The BODIL modeling environment (Lehtonen et al., 2004) and the program MALIGN (Johnson and Lehtonen, 2000) were also used to analyze the amino acid sequences of the following CIP2A interaction partners: PP2A (UniProtKB Q13362 [56 kDa regulatory subunit γ], P67775 [catalytic subunit α], P30153 [65 kDa scaffolding subunit α], P30154 [65 kDa scaffolding subunit β]), MYC (UniProtKB P01106), H-Ras (UniProtKB P01112), E2F1 (UniProtKB Q010S4), AKT (UniProtKB P31749), DAPK1 (UniProtKB P53355) and the mTORC1 complex (UniProtKB P42345 [subunit mTOR], Q8N122 [subunit RPTOR], Q8TB45 [subunit DPTOR], Q9BVC4 [subunit MLST8], Q96B36 [subunit AKTS]).

2.4. Protein structure preparation for molecular dynamics simulations
Molecular dynamics simulations were performed to confirm that the 3D structural model of CIP2A-ArmRP is stable, without unfolding or any significant changes in secondary structure. The Protein Preparation Wizard in the Maestro molecular modeling software (v.9.6; Schrödinger, Inc.) was used to prepare the I-TASSER CIP2A-ArmRP model for the molecular dynamics (MD) simulations. The model included all hydrogen atoms from the start. However, the polar interactions of the His residues were manually checked, and their protonation states were selected to optimize the hydrogen bond network. The crystallized template structure did not contain any His residues whose protonation state would be important for the hydrogen bond network.

2.5. Molecular dynamics simulations
The AMBER package (v.12; Case et al., 2012) and the AMBER ff03 force field (Duan et al., 2003) were used for energy minimization, thermal equilibration and standard production simulations. Three parallel simulations were run for the modeled structure. All simulations were run in an octahedral box that extended 10.0 Å from the protein. The box was filled with explicit TIP3P water molecules (Jorgensen et al., 1983) and six neutralizing Na+ ions for the model, whereas 15 neutralizing Na+ ions were needed for the template structure. Periodic boundary conditions, particle-mesh Ewald electrostatics (Essmann et al., 1995) and a cut-off of 9 Å for non-bonded interactions were used. A 1 fs time step was applied for Langevin dynamics during equilibration and a 2 fs time step otherwise. Bonds to hydrogen atoms were constrained with the SHAKE algorithm (Ryckaert et al., 1977). A constant temperature of 300 K and a pressure of 1 bar were used during the 20 ns production simulations. The coupling constants for temperature and pressure (Berendsen et al., 1984) were 5.0 and 2.0 ps, respectively. A more detailed description of the stepwise energy minimization and equilibration protocols can be found in the Supplementary Information. VMD (Humphrey et al., 1996) and the ptraj module in AMBER were used to study the MD simulation trajectories, and the final structures were analyzed in PyMOL.

3. Results and discussion

3.1. Human CIP2A N-terminus folds into an ArmRP domain
Human CIP2A is a protein of 905 amino acids with a molecular mass of 102 kDa (UniProtKB Q8TCG1). The SMART analysis (Letunic et al., 2012; Schultz et al., 1998) showed that the domain architecture of CIP2A consists of two parts. The first is an ArmRP domain spanning amino acids 47–308 (CIP2A-ArmRP). The second is a coiled coil region that runs from amino acid 636 to amino acid 884. ArmRP proteins are built of ArmRP motifs, each consisting of 42 amino acids folded into three α-helices (H1, H2 and H3) (Reichen et al., 2014). Four to twelve ArmRP motifs stack together to form a right-handed superhelix. This rigid structure is highly conserved even though the sequence identity between different ArmRP proteins is low (Coates, 2003). In agreement with the domain prediction, the secondary structure predictions from several servers consistently indicate an all α-helical secondary structure profile for CIP2A-ArmRP. They support the conclusion that residues 47–308 of CIP2A adopt the ArmRP fold with six ArmRP repeats. Furthermore, the Structural Classification of Proteins (SCOP; Murzin et al., 1995) suggested that the CIP2A-ArmRP domain is likely to be similar to the ArmRP protein β-catenin (PDB 1JDH; Graham et al., 2001), with a significant E-value of 1.00e−04. β-Catenin is all α-helical with 12 ArmRP repeats, which together form a superhelix.

Structural bioinformatics tools such as homology modeling have been used to obtain 3D protein structures in a timely manner (see, e.g. Chou, 2005; Wang et al., 2007a, b, 2009b, and the comprehensive review by Chou, 2004b), and such models have proven very useful for drug development. We therefore adopted the same structural bioinformatics approach to build the protein 3D structure for the current study. To find a suitable crystal structure to use as a template for modeling human CIP2A, we used the amino acid sequence as bait in a BLAST search against PDB at the NCBI web server. This search did not give any relevant hits. We then repeated the search for the CIP2A-ArmRP domain only, but this approach did not give any valuable results either, since the known structures covered only the last 70 amino acids of the CIP2A-ArmRP sequence. For this type of situation, several highly sensitive methods have been developed (Soding, 2005) to detect remote homologs and produce alignments for inference of possible structure, function and evolution. Because no close homolog with known structure and function was available, we turned to structure prediction servers to produce a structural model. We tested the I-TASSER (Roy et al., 2010, 2012; Zhang, 2008), HHPred (Remmert et al., 2011; Soding, 2005; Soding et al., 2005) and Phyre (Kelley and Sternberg, 2009) servers on two inputs: the whole CIP2A sequence, and the CIP2A sequence up to amino acid 321, which encompasses the CIP2A-ArmRP domain. For the whole CIP2A sequence, the significant results modeled the entire protein as an ArmRP protein. This is not plausible in light of the SMART results, which indicate a coiled coil structure for the C-terminal part of CIP2A. The structure prediction for the CIP2A-ArmRP domain showed better confidence and higher reliability. We therefore chose the highest ranked CIP2A-ArmRP model from I-TASSER (Roy et al., 2010, 2012; Zhang, 2008) for further analysis. I-TASSER employs LOMETS (Wu and Zhang, 2007) to search for templates from PDB through threading approaches and


for CIP2A-ArmRP the crystallized synthetic protein OR329 arm8 (PDB 4HXT; Parmeggiani et al., 2014) was ranked as the best template. Its alignment had a normalized Z-score of 1.08 (a value > 1 indicates a confident alignment; Roy et al., 2010) and 88% coverage. Moreover, the C-score, which expresses confidence in the quality of the predicted models, is 0.36 for the CIP2A-ArmRP model. The C-score ranges from −5 to 2, with higher values being better (Roy et al., 2010). The TM-score, which shows the topological similarity between the template and the model, is 0.67 ± 0.13, where TM-score > 0.5 is the approximate cutoff value (Roy et al., 2010). Furthermore, I-TASSER has been shown to generate the best 3D structure predictions among all automated servers in the CASP7–10 experiments (Zhang, 2014), which adds to the credibility of the model. It is also worth noting that all of the protein structure prediction servers used listed the OR329 arm8 protein as a possible template. HHpred (Remmert et al., 2011; Soding et al., 2005; Soding, 2005) ranked it as the second-best template (probability 98.2% and E-value 3e−05). Phyre (Kelley and Sternberg, 2009) listed OR329 arm8 among the 20 highest ranked templates (number 19, with a confidence of 97.6%). This indicates a high probability that the CIP2A-ArmRP domain folds into a structure similar to the ArmRP fold of OR329 arm8.

The model was also validated with several evaluation programs and servers, as well as manually inspected and compared to the template structure, and all the results were acceptable. PROCHECK (Laskowski et al., 1993) placed 91.5% of the residues in the most favored regions of the Ramachandran plot and 8.5% in additional allowed regions. ProQ (Wallner and Elofsson, 2003) gave a predicted LG score of 5.206, where values > 4 indicate an extremely good model. It also gave a MaxSub score of 0.490, where values > 0.5 indicate a very good model. ProSA web (Sippl, 1993; Wiederstein and Sippl, 2007) gave a Z-score of −8.61, which is in the range of scores typically found for native proteins of similar size, and the average energy over a 40-amino-acid window stayed below the threshold. ModFOLD4 (Buenavista et al., 2012; McGuffin and Roche, 2010; McGuffin et al., 2013) gave a global model quality score of 0.6882, where values > 0.4 indicate confident models. It also reported a "confidence and P-value" of 7.052E−4; values < 0.001 mean there is less than a 1/1000 chance that the model is incorrect.

Many remarkable biological functions of proteins and DNA can be revealed by studying their internal motions, as elaborated in a comprehensive review by Chou (1988). Examples include switching between active and inactive states (Wang and Chou, 2009a), allosteric transitions (Wang et al., 2009c), intercalation of drugs into DNA (Chou and Mao, 1988) and assembly of microtubules (Chou et al., 1994). Likewise, understanding the mechanism of receptor–ligand binding requires considering not only the static structures but also the dynamic information obtained by simulating their internal motions. MD simulation is one feasible tool for this purpose. We therefore performed three parallel MD simulations to verify that the CIP2A-ArmRP model exhibits a stable fold that does not unfold or change significantly during the simulations. The results confirm that the model was stable, with no drift in energy or temperature. MD simulations were also performed for the OR329 arm8 crystal structure. The root mean square deviation (RMSD) of both the CIP2A-ArmRP and OR329 arm8 backbones was plotted over the simulation time after a least-squares fit to the initial structure (Fig. 1). The difference in deviation between the model and the crystal structure is 2 Å. Both are stable, since their secondary structure elements are rigid and do not unfold during the simulation. Most of the deviation was due to the flexibility of the N-terminus in both the model and the crystal structure. The RMSD between the initial CIP2A-ArmRP model and its final frame after the MD simulation is 2.4 Å, whereas the corresponding value for the crystal structure is 0.7 Å. The crystal structure may be more stable because it is a synthetic protein designed to be stable, but the MD simulation results also confirm that the N-terminal domain of CIP2A is likely to fold into an ArmRP structure.

Fig. 1. RMSD fluctuation of the protein backbones during molecular dynamics (MD) simulation. The RMSD fluctuations for the three parallel MD simulations of the modeled CIP2A-ArmRP domain, after a least-squares fit to the initial structure, are shown in red (MD run 1), purple (MD run 2) and cyan (MD run 3). The corresponding fluctuation for the crystal structure of OR329 arm8 is shown in black. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Based on the evaluation results, the model can be reliably used for structural and charge distribution studies, and for analysis of functional amino acids together with multiple sequence alignment. Manual evaluation showed that Arg229 was pointing into a hydrophobic environment in the protein. This is rarely seen for charged amino acids unless they form favorable salt bridges. We therefore searched for alternative conformations of this loop (amino acids 225–230) with the Loopy program in Jackal (http://wiki.c2b2.columbia.edu/honiglab_public/index.php/Software:Jackal). The program gave three low-energy conformations, and in the one we chose, Arg229 interacts with the solvent instead.

The model of the CIP2A-ArmRP domain exhibits the typical ArmRP fold, with a total of 18 helices forming six ArmRP motifs (Fig. 2). These motifs twist into a right-handed superhelix in which adjacent H3 helices create the concave central groove typical of these proteins (Reichen et al., 2014; Varadamsetty et al., 2012). Furthermore, I-TASSER identifies structural analogs in PDB. All of the top 10 analogs detected are ArmRP proteins with a TM-score > 0.8, whereas the cutoff for detecting structural analogs is 0.5 (Roy et al., 2010). We compared sequence logos deduced from naturally occurring and designed ArmRP proteins (Parmeggiani et al., 2014) with the CIP2A-ArmRP domain. This comparison shows that the highly conserved Leu-Val-X-Leu-Leu motif (positions 17–21 in the ArmRP repeats; Parmeggiani et al., 2014) is also present in the CIP2A-ArmRP sequence (positions 59–63). Additional conserved amino acids are Ile50, Leu56 and Asn67 (CIP2A-ArmRP numbering). Glu61 and Glu64 are also conserved when CIP2A-ArmRP is compared only to the designed ArmRP sequences (Parmeggiani et al., 2014). This further confirms that CIP2A has an ArmRP domain, which is likely to adopt the highly conserved ArmRP fold seen in crystallized ArmRP proteins.

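Fig. 1 reports backbone RMSD over simulation time after a least-squares fit to the initial structure. In the study, this was computed with VMD and the ptraj module of AMBER. The NumPy sketch below only illustrates the underlying calculation. It assumes each frame is an (N, 3) array of backbone coordinates in Å, with atoms in the same order in every frame. Each frame is superposed onto frame 0 with the Kabsch algorithm, and then the RMSD is taken. The function names and the synthetic coordinates are hypothetical, and nothing here reads AMBER trajectory files.

```python
import numpy as np


def kabsch_rmsd(mobile: np.ndarray, reference: np.ndarray) -> float:
    """RMSD (input units) after optimally superposing `mobile` onto `reference`."""
    P = np.asarray(mobile, dtype=float)
    Q = np.asarray(reference, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("both inputs must have shape (N, 3)")
    # Remove translation: center both coordinate sets on their centroids
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    # Kabsch: the SVD of the 3x3 covariance matrix gives the optimal rotation
    U, _, Vt = np.linalg.svd(P.T @ Q)
    # Flip the last axis if needed so that R is a proper rotation (no reflection)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def rmsd_trajectory(frames):
    """RMSD of every frame relative to the first frame (the initial structure)."""
    reference = frames[0]
    return [kabsch_rmsd(frame, reference) for frame in frames]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    start = rng.normal(scale=10.0, size=(100, 3))  # synthetic "backbone", not real data
    theta = 0.8
    rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                      [np.sin(theta), np.cos(theta), 0.0],
                      [0.0, 0.0, 1.0]])
    moved = start @ rot_z.T + np.array([5.0, -3.0, 2.0])     # rigid-body motion only
    noisy = moved + rng.normal(scale=0.5, size=start.shape)  # plus internal "fluctuation"
    print(rmsd_trajectory([start, moved, noisy]))
```

The reflection check matters. Without it, the SVD can return an improper rotation that superposes a structure onto its mirror image and understates the RMSD.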

Fig. 2. 3D structural model of the CIP2A-ArmRP domain. The model shows high confidence for the typical armadillo fold. Each armadillo repeat consists of three helices (indicated by separate colors), and the repeats twist into a superhelix that forms a central groove.
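Section 3.1 locates the Leu-Val-X-Leu-Leu motif at positions 59–63 of CIP2A-ArmRP; in ArmRP repeats, this motif is conserved at positions 17–21. Below is a minimal Python sketch of scanning a sequence for that motif with a regular expression. The sequences are made-up placeholders rather than the CIP2A sequence. The `first_residue` offset is a hypothetical convenience for reporting hits in full-length numbering. The strict pattern `LV.LL` is also a simplification: real ArmRP repeats tolerate substitutions that a sequence logo captures but an exact pattern does not.

```python
import re

# Leu-Val-X-Leu-Leu in one-letter code; the lookahead lets overlapping hits be reported
LVXLL = re.compile(r"(?=(LV.LL))")


def find_lvxll(sequence: str, first_residue: int = 1):
    """Return (start, end) residue numbers of every Leu-Val-X-Leu-Leu match.

    first_residue: number assigned to the first residue of `sequence`, so that
    hits in a domain fragment can be reported in full-length numbering.
    """
    hits = []
    for match in LVXLL.finditer(sequence.upper()):
        start = match.start() + first_residue
        hits.append((start, start + 4))
    return hits


if __name__ == "__main__":
    # Hypothetical toy sequences, for illustration only
    print(find_lvxll("MAAALVKLLGG"))                  # [(5, 9)]
    print(find_lvxll("LVLLLVALL"))                    # overlapping hits: [(1, 5), (5, 9)]
    print(find_lvxll("aalvqll", first_residue=10))    # offset numbering: [(12, 16)]
```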

Fig. 3. Binding pockets found with MetaPocket 2.0. MetaPocket 2.0 finds the central groove of the CIP2A-ArmRP model as two pockets (black spheres). These two pockets (green and cyan in the surface model on the left) share some residues (pink), indicating that they form a single continuous pocket and binding site in CIP2A-ArmRP. On the right is a close-up view of the pockets and the predicted functional residues (colored as in the surface model on the left), with strictly or highly conserved residues in bold. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

3.2. The central groove in CIP2A-ArmRP is a binding pocket
The concave central groove typical of ArmRP fold proteins is known to be the interaction site for target proteins (Reichen et al., 2014). This can be seen in the crystal structures of the typical ArmRP fold proteins importin α (PDB 3BTR; Cutress et al., 2008) and β-catenin (PDB 1JPP; Eklof Spink et al., 2001). This led us to investigate whether the central groove in CIP2A-ArmRP could be a binding pocket. In general, information on the ligand-binding pocket of a receptor is very important for drug design, particularly for conducting mutagenesis studies (Chou, 2004b). In the literature, the ligand-binding pocket of a protein receptor is usually defined as the set of residues with at least one heavy atom (i.e. an atom other than hydrogen) within 5 Å of a heavy atom of the ligand. This criterion was originally used to define the binding pocket of ATP in the Cdk5-Nck5a* complex (Chou et al., 1999). It later proved quite useful in identifying functional domains and stimulating the relevant truncation experiments. Similar approaches have also been used to define the binding pockets of many other receptor–ligand interactions important in drug design (Chou, 2004a; Huang et al., 2008; Li et al., 2011; Wang et al., 2007). We ran a MetaPocket 2.0 (Huang, 2009; Zhang et al., 2011) search for binding pockets in our model and, indeed, this area was found to be a possible binding site. It was, however, divided into two separate pockets (Fig. 3), which together span the whole central groove. One pocket is made up of residues Arg171, Ser210, Ser213, Ser214, Leu217, Lys252, Tyr253, Asp256, Met259 and Asp260. The other consists of residues Gln122, Gln125, Met160, Pro161, Gly164, Asn168, Arg171, Val206, Phe207, Ser210, Leu249 and Tyr253. The two pockets run into each other, since Arg171, Ser210 and Tyr253 are shared between them. This suggests that they form a single continuous pocket and binding site in the CIP2A-ArmRP groove. All the listed amino acids are conserved in the multiple sequence alignment produced by BODIL (Fig. 4), which further indicates a functional role for them. We also compared the results to the multiple sequence alignment generated by ConSurf (see Appendix A, Supplementary Information) (Ashkenazy et al., 2010; Celniker, 2013; Glaser et al., 2003; Landau et al., 2005). We ran ConSurf to analyze how the conserved areas are distributed in the CIP2A-ArmRP model. This alignment shows that Met160, Pro161, Phe207, Ser213, Leu249, Lys252, Tyr253 and Met259 are less well conserved than the rest of the residues, which are highly conserved. Residues Asn168, Ser214, Leu217 and Asp256 are strictly conserved and, hence, probably critical for the structure and/or function of CIP2A-ArmRP. In summary, the central groove in CIP2A-ArmRP is very likely to be a binding pocket for CIP2A's interaction partners, for example MYC and PP2A (Junttila et al., 2007).

3.3. CIP2A-ArmRP is involved in protein–protein interactions
ArmRP fold proteins are known to have differently charged central grooves. For example, the groove is negatively charged in importin α and positively charged in β-catenin (Reichen et al., 2014). To determine the charge of the central groove in CIP2A-ArmRP, we calculated the electrostatic surface of this domain. The calculation shows that the central groove is positively charged (Fig. 5A). Interestingly, the back of CIP2A-ArmRP is divided almost in half, with a negatively charged right side and a positively charged left side (Fig. 5B). The top and the bottom of CIP2A also show clearly defined areas of positive and negative charge (Fig. 5C, D). Furthermore, the ConSurf analysis (Ashkenazy et al., 2010; Celniker, 2013; Glaser et al., 2003; Landau et al., 2005) shows that the amino acids in the central groove are highly

Please cite this article as: Dahlström, K.M., Salminen, T.A., 3D model for cancerous inhibitor of protein phosphatase 2 a armadillo domain unveils highly conserved protein–protein interaction.... J. Theor. Biol. (2015), http://dx.doi.org/10.1016/j.jtbi.2015.09.010i 6 K.M. Dahlström, T.A. Salminen / Journal of Theoretical Biology ∎ (∎∎∎∎) ∎∎∎–∎∎∎

1 67 2 68 3 69 4 70 5 71 6 72 7 73 8 74 9 75 10 76 11 77 12 78 13 79 14 80 15 81 16 82 17 83 18 84 19 85 20 86 21 87 22 88 23 89 24 90 25 91 26 92 27 93 28 94 29 95 30 96 31 97 32 98 33 99 34 100 35 101 36 102 37 103 38 104 39 105 40 106 41 107 42 108 43 109 44 110 45 111 46 112 47 113 48 114 49 115 50 116 51 117 52 118 53 119 54 120 55 121 56 122 57 123 58 124 59 125 60 126 61 Fig. 4. Sequence alignment of the modeled human CIP2A-ArmRP domain and homologous proteins. Conserved amino acids are boxed in red and shown with bold letters. 127 62 Orange boxes show the functional residues predicted by MetaPocket 2.0, while the ligand binding residues predicted by I-TASSER are marked with black triangles. Pink boxes 128 63 indicate the polar ladder, while the positively charged amino acids, which could form ionic interactions with a peptide, are marked in cyan. The Arg229Gln mutation is 129 64 marked in yellow and with a black star. Also Gln122, Gln125, Asn168 (polar ladder) and Arg171 and Lys252 (positively charged residues) were found by MetaPocket 2.0, but 130 for clarity they are only marked with pink and cyan, respectively. (For interpretation of the references to color in this figure legend, the reader is referred to the web version 65 of this article.) 131 66 132

Please cite this article as: Dahlström, K.M., Salminen, T.A., 3D model for cancerous inhibitor of protein phosphatase 2 a armadillo domain unveils highly conserved protein–protein interaction.... J. Theor. Biol. (2015), http://dx.doi.org/10.1016/j.jtbi.2015.09.010i K.M. Dahlström, T.A. Salminen / Journal of Theoretical Biology ∎ (∎∎∎∎) ∎∎∎–∎∎∎ 7

conserved (Fig. 5E). This corresponds well to the X-ray structures of natural ArmRP proteins, where this groove, created by adjacent H3 α-helices in the ArmRP superhelix, binds peptides of various lengths (Reichen et al., 2014; Varadamsetty et al., 2012). The function of CIP2A-ArmRP in protein–protein interactions is further underpinned by the GO terms for function predicted by I-TASSER, which are based on the GO terms of similar proteins found in the PDB. When we evaluated the consistency of the terms among the top-scoring proteins, protein binding (GO:0005515) emerged as a common feature. The GO-Score was 0.79 (the score ranges from 0 to 1, with higher values indicating a more confident prediction and a significance cutoff of 0.5; Roy et al., 2010). This suggests that the CIP2A-ArmRP domain is involved in protein binding, more specifically in binding part of a larger protein in the central groove, like other natural ArmRP proteins.

Fig. 5. Electrostatic surface and ConSurf distribution of conserved amino acids in the CIP2A-ArmRP model.
(A) The central groove is positively charged (blue), with a small patch of negative charge (red) on the bottom right side.
(B) The back of the model shows an interesting division, with one half positively charged and the other negatively charged.
(C, D) The top (C) and the bottom (D) show both positive and negative charge, but in clearly separated areas.
The electrostatic surfaces were calculated with the APBS tool (Adaptive Poisson–Boltzmann Solver) in PyMOL, with the color range from −7 to 7.
(E) The central groove consists of highly conserved amino acids (pink), surrounded by amino acids with an average (gray) or low (green) conservation rate.
(F) On the back of the model, the residues corresponding to the positively charged area are highly conserved, while the negatively charged area is not as well conserved.
(G) On the top of the CIP2A-ArmRP model, the positively charged area is again very conserved, while other areas are more diverse.
(H) The bottom of the structure shows a high conservation rate for the negatively charged area, while the rest of the residues are of average or low conservation.
(For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Moreover, ConSurf indicates that the amino acids
forming the positively charged area on the back of the protein are highly conserved (compare Fig. 5B and F), while the rest of the amino acids are variable. Highly conserved regions are also found in two further places:
- at the top of the structure, in the area corresponding to the positively charged patch (compare Fig. 5C and G);
- at the bottom of the structure, where the electrostatic surface shows negative charge (compare Fig. 5D and H).

The high degree of conservation coupled to strongly charged regions indicates that all these areas could be important protein–protein interaction sites on CIP2A-ArmRP. They may even be involved in target binding by increasing affinity and specificity, as has also been proposed for additional interaction sites on β-catenin (Choi et al., 2006). Previous results show that CIP2A interacts with a range of proteins, for example PP2A, MYC, E2F1 (Junttila et al., 2007), AKT, DAPK1 and the mTORC1 complex (Puustinen and Jaattela, 2014). In light of this, it can be speculated that these proteins could bind to the central groove of the CIP2A-ArmRP domain.

3.4. CIP2A-ArmRP can form similar interactions to other ArmRP proteins

Crystallized ArmRP proteins exhibit a highly conserved binding mode (Reichen et al., 2014). In this mode, an extended peptide from the interacting protein is bound into the central groove of the ArmRP protein through a combination of electrostatic and backbone interactions (PDB 3QHE, 4OIH, 3L6X/3L6Y, 2JDQ, 1JDH; Graham et al., 2001; Ishiyama et al., 2010; Morishita et al., 2011; Roman et al., 2013; Tarendeau et al., 2007). The bound peptides run antiparallel to the ArmRP motifs. Hydrogen bonds between the conserved Asn residues in the H3 α-helices of ArmRP and the peptide backbone keep the peptide in an extended conformation (Andrade et al., 2001; Conti et al., 1998). Other residues in the binding groove confer specificity by interacting with the side chains of the bound peptide (Reichen et al., 2014).

I-TASSER compared the model with crystallized proteins whose binding sites resemble that of the modeled protein. Based on its suggestions, CIP2A-ArmRP would bind peptides in the same way as two complexes:
- mouse importin subunit α-2 with a human androgen receptor fragment (PDB 3BTR; Cutress et al., 2008), BS-Score 0.51;
- mouse β-catenin with a human adenomatous polyposis coli (APC) protein fragment (PDB 1JPP; Eklof Spink et al., 2001), BS-Score 0.90.

A BS-Score > 0.50 indicates a similar binding site (Roy et al., 2010). With these crystal structures as a base, I-TASSER lists the possible peptide-binding amino acids in the CIP2A-ArmRP model, and the list is the same whether the model is based on importin α or β-catenin (Fig. 4). Because of the higher confidence of a similar binding site (BS-Score 0.90), however, we chose the structure of the APC-β-catenin complex (PDB 1JPP; Eklof Spink et al., 2001) for further studies.

Both CIP2A-ArmRP and β-catenin have a positively charged binding groove. The crystal structure of the APC-β-catenin complex shows that a polar ladder binds the backbone of the APC protein fragment to the groove through salt bridges, hydrophobic contacts and hydrogen bonds (PDB 1JPP; Eklof Spink et al., 2001). This polar ladder is conserved in ArmRP proteins and essential for peptide binding (Andrade et al., 2001). Applying this knowledge to the modeled CIP2A-ArmRP domain, we found eleven residues forming a straight polar ladder down H3 of ArmRP motifs 2 and 3, continuing in a line along the bottom part of the modeled structure (Fig. 6):
- five Gln (Gln82, Gln119, Gln122, Gln125 and Gln311);
- five Asn (Asn130, Asn168, Asn173, Asn218 and Asn264);
- one His (His172).

The multiple sequence alignment of CIP2A-ArmRP and homologous sequences confirms that all the residues forming the polar ladder (except for Gln311) are important (Fig. 4):
- Gln82, Asn130, Asn264 and His172 are conserved;
- Gln122, Gln125, Asn173 and Asn218 are highly conserved;
- Gln119 and Asn168 are strictly conserved.

Fig. 6. Conserved polar ladder in the CIP2A-ArmRP domain. Several conserved Gln and Asn residues form a polar ladder (pink sticks) in the CIP2A-ArmRP domain. The ladder runs straight down H3 in ArmRP motifs 2 and 3 (green cylinders) and continues along the bottom part of the protein. Additionally, the central groove has conserved Arg and Lys residues (cyan sticks), which could be essential for peptide binding, as in β-catenin. Strictly or highly conserved residues are marked in bold, while the strictly conserved positive charge at position 126 is shown in italics. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

To analyze whether other residues could also be involved in peptide binding, we superimposed the CIP2A-ArmRP model on the crystal structure of the APC-β-catenin complex. We then analyzed the CIP2A-ArmRP residues within 4 Å of the peptide fragment. All the residues in the near vicinity were adjacent to, or identical with, the residues predicted by I-TASSER, and several of them also belong to the polar ladder (Table 1). Hence, most of the amino acids predicted to have a function in peptide binding are conserved or conservatively substituted, and four of them (Gln119, Tyr129, Asn168 and Leu169) are even strictly conserved. They include both charged and polar residues, which further supports the conclusion that CIP2A is likely to bind peptides with a typical ArmRP binding mode. Furthermore, five of these residues (Gln122, Met160, Pro161, Asn168 and Arg171) coincide with the functional residues predicted by MetaPocket 2.0.

It has been established that the peptides binding to ArmRP proteins have specific motifs, which depend on the charge of the concave groove (Reichen et al., 2014). β-Catenin, with its positively charged groove, interacts with peptides having a conserved Asp-X-Hp-Hp-X-Ar-X(2–7)-Glu motif. Here X is any amino acid, Hp is a hydrophobic amino acid, Ar is an aromatic amino acid, and X(2–7) is a stretch of two to seven arbitrary amino acids. The conserved Asp and Glu make strong ionic interactions with two Lys residues in the β-catenin binding groove (Graham et al., 2000; Sun and Weis, 2011; Xu and Kimelman, 2007). Similarly, positively charged residues are found in the binding groove of CIP2A-ArmRP (Fig. 6):
- Lys252 is conserved;
- Arg171 and Lys263 are highly conserved;
- Lys126 can be substituted by Arg, but the positive charge at this position is strictly conserved (Fig. 4).

Hence, the positively charged central groove of
both CIP2A-ArmRP and β-catenin, coupled with the ability of the amino acids to make the same type of interactions, indicates that the interaction mode between CIP2A-ArmRP and target peptides is highly similar to that seen in peptide–β-catenin complexes. Furthermore, the peptide interacting with the central groove of CIP2A-ArmRP seems likely to carry the same consensus motif, with a conserved Asp and Glu, as peptides that bind other ArmRP proteins with a positively charged groove. Several proteins have previously been reported to interact with CIP2A to some extent: PP2A, MYC (Junttila et al., 2007), E2F1, the mTORC1 complex (De et al., 2014), AKT, DAPK1 (Puustinen and Jaattela, 2014) and H-Ras (Wu et al., 2015). This led us to search for the conserved consensus motif in the human sequences of these proteins (Table 2).

MYC, H-Ras and AKT did not contain such a motif, but we found it in the 65 kDa scaffolding protein (known as the A or PR65 subunit) of PP2A. This scaffolding subunit exists as α and β isoforms in mammalian cells, with the α isoform being predominant (Xing et al., 2006), but both isoforms contain the conserved motif. We also found this motif in E2F1, DAPK1 and the DPTOR and RPTOR subunits of the mTORC1 complex. The mTOR subunit contained two motifs in which an Asp replaces the last conserved Glu. However, this replacement still maintains the negative charge, so we cannot rule out that the mTOR subunit interacts with the central groove of CIP2A-ArmRP. Hence, all of these are likely to be target peptides that bind to the positively charged groove of the CIP2A-ArmRP domain when CIP2A interacts with the motif-containing proteins. MYC, H-Ras and AKT lacked the conserved motif, which indicates that these proteins would bind somewhere other than the central groove when interacting with CIP2A. They could, for example, bind to the other highly conserved, charged regions on the CIP2A-ArmRP domain (Fig. 5).

Taken together, the results from MetaPocket 2.0, I-TASSER, the structural superimposition with β-catenin and the sequence analysis of possible CIP2A interaction partners strongly suggest that the central groove is involved in protein–protein interactions by binding an extended peptide fragment from the other protein. This is further supported by the fact that CIP2A-ArmRP also has the conserved polar ladder, which has been shown to be essential for proper peptide binding in natural ArmRPs (Reichen et al., 2014). In agreement with this, the conservation rate of possible peptide-binding amino acids in CIP2A-ArmRP is high, which further underlines the importance of this central, positively charged groove. Moreover, the possible CIP2A interaction partners show the conserved motif required for binding to ArmRP proteins with a positively charged groove.

Table 1
Conservation rate of possible ligand binding residues.

Strictly conserved: Gln119 (a, c), Tyr129, Asn168 (a, c), Leu169 (a)
Highly conserved: Leu79 (b), Asp91, Leu118, Gln122 (b, c), Leu123, Lys126, Leu127, Glu157, Pro161, Leu165, Arg171, His172 (c), Asn173 (c), Leu215
Conserved: Ser81, Gln82 (c), Arg90, Met160, Thr178
Variable: Val85

(a) Highly conserved in the ConSurf alignment.
(b) Conserved in the ConSurf alignment.
(c) Residues in the polar ladder.

Table 2
Conserved motif in CIP2A interaction partners.

Conserved Asp-X-Hp-Hp-X-Ar-X(2–7)-Glu motif (a) in CIP2A interaction partners:

MYC: no motif
H-Ras: no motif
AKT: no motif
PP2A – PR65 (α isoform): Asp282-Leu-Val-Pro-Ala-Phe-Gln-Asn-Leu-Met-Lys-Asp-Cys-Glu295
E2F1: Asp381-Ser-Leu-Leu-Glu-His-Val-Arg-Glu389
DAPK1: Asp161-Phe-Gly-Leu-Ala-His-Lys-Ile-Asp-Phe-Gly-Asn-Glu173
mTORC1-DPTOR: Asp71-Trp-Leu-Ile-Glu-His-Lys-Glu-Ala-Ser-Asp-Arg-Glu83
mTORC1-RPTOR: Asp1044-Ser-Ile-Cys-Phe-Trp-Asp-Trp-Glu1052/Glu1055
mTORC1-mTOR: Asp1458-Ala-Leu-Val-Ala-Tyr-Asp-Lys-Lys-Met-Asp1468; Asp2424-Pro-Leu-Leu-Asn-Trp-Arg-Leu-Met-Asp2433

(a) Hp designates hydrophobic amino acids; Ar is an aromatic amino acid.

Fig. 7. Effect of the Arg229Gln mutation on the electrostatic surface.
(A) The wild-type protein shows a small inward protruding cavity next to Arg229, which confers a strong positive charge to this area.
(B) In the Arg229Gln mutant, the inward protruding cavity is larger, and the positive charge is weaker than in the wild type and spreads over a much smaller area.
The electrostatic surfaces were calculated with the APBS tool (Adaptive Poisson–Boltzmann Solver) in PyMOL, with the color range from −7 to 7.

3.5. Arg229Gln mutation changes the electrostatic properties of the CIP2A-ArmRP domain

On the back of the CIP2A-ArmRP model, opposite to the central groove, there is an area with a strong positive charge (Fig. 5B), largely due to residue Arg229. The positive charge at this position is conserved in the multiple sequence alignment produced with BODIL (Fig. 4) and also according to ConSurf (Fig. 5F). Li et al. (2012) reported that an Arg to Gln mutation at this position of the CIP2A-ArmRP domain alters CIP2A expression. Together with other factors, this makes the Chinese Han population more prone to develop liver cancer. This prompted us to investigate the effect of the Arg229Gln mutation on the electrostatic surface potential of the CIP2A-ArmRP domain. The results show that the Arg229Gln mutant has a slightly different surface shape and a markedly smaller positively charged area than wild-type CIP2A-ArmRP. The wild type has a positively charged area around Arg229 with a small inward protruding cavity next to Arg229
(Fig. 7A), whereas the corresponding area is considerably less charged in the Arg229Gln mutant and the inward protruding cavity is bigger (Fig. 7B). Taken together with the conclusions drawn by Li et al. (2012), these results suggest that the positive charge and shape of this surface area are important for the expression and function of CIP2A. They also suggest that this area is possibly involved in the protein–protein complexes formed between CIP2A and other proteins, for example other oncoproteins.

4. Conclusions

We have demonstrated that CIP2A has an ArmRP domain, which folds into the typical, all-α-helical ArmRP fold with a central, positively charged groove. We coupled the information on natural ArmRP proteins to the 3D model of CIP2A-ArmRP and to its predicted binding pocket and function. On this basis, we show that the conserved amino acids in the central binding groove are required for peptide binding and that they can form interactions typical of natural ArmRP–peptide complexes. Hence, CIP2A-ArmRP follows both the folding and the interaction pattern seen among the known ArmRP fold proteins.

Furthermore, we were able to couple our model to existing data on interaction partners. The following proteins all contain the Asp-X-Hp-Hp-X-Ar-X(2–7)-Glu motif, which is known to bind into positively charged grooves of ArmRP proteins:
- the 65 kDa scaffolding subunit of PP2A;
- E2F1;
- DAPK1;
- the DPTOR and RPTOR subunits of the mTORC1 complex.

In this motif, X is any amino acid, X(2–7) is a stretch of two to seven such residues, Hp is a hydrophobic amino acid and Ar is an aromatic amino acid. MYC, AKT and H-Ras, by contrast, possibly bind another part of CIP2A.

Most of the studied amino acids in the positively charged groove of CIP2A-ArmRP are conserved, but four highly conserved residues stood out by appearing in multiple analyses:
- Gln122 and Asn168 are involved in ligand binding and in the formation of the polar ladder;
- Gln125 belongs to the polar ladder;
- Arg171 mediates ligand binding and also maintains the positive charge in the central groove, which is essential for binding the consensus motif of interaction partners.

Additionally, we were able to show how the Arg229Gln mutation, which has been linked to liver cancer in the Chinese Han population, acts on the structure. It does not affect the central binding groove of CIP2A but rather the charge and surface properties on the opposite side of the groove, and it may therefore affect protein–protein interactions in this area.

A series of recent publications (see, e.g., Chen et al., 2013; Jia et al., 2015; Lin et al., 2014; Liu et al., 2015; Xu et al., 2013) and a recent review (Chou, 2015) emphasize that user-friendly and publicly accessible web servers represent the future direction for developing practically more useful models, simulation methods and predictors, and for presenting new structures. We shall therefore make efforts in our future work to provide a web server for the findings presented in this paper. The next step in the structural studies of CIP2A would be to crystallize CIP2A-ArmRP in complex with a possible interaction partner to determine the exact peptide-binding site and molecular interactions. We believe that this study can greatly aid the design and production of a suitable CIP2A construct for X-ray structure determination. Additionally, the designed construct can be used in mutation studies to establish the exact interaction sites and partners, and the results, together with this 3D model, can be used for docking studies. By unveiling the exact binding details, the development of therapeutics or anti-cancer drugs with CIP2A as the primary target could take a great leap forward.

Acknowledgments

We want to thank Professor Mark S. Johnson for the excellent computing facilities and Dr. Outi Salo-Ahen for technical assistance (both at the Structural Bioinformatics Laboratory, Faculty of Science and Engineering, Åbo Akademi University). We also want to thank Prof. Jukka Westermarck (Turku Center for Biotechnology) for introducing us to this interesting protein. Use of the Biocenter Finland infrastructure at Åbo Akademi (bioinformatics, structural biology and translational activities) is acknowledged, along with the National Doctoral Program in Informational and Structural Biology.

Appendix A. Supplementary material

Supplementary data associated with this article can be found in the online version at http://dx.doi.org/10.1016/j.jtbi.2015.09.010.

References

Andrade, M.A., Petosa, C., O'Donoghue, S.I., Muller, C.W., Bork, P., 2001. Comparison of ARM and HEAT protein repeats. J. Mol. Biol. 309 (1), 1–18.
Ashkenazy, H., Erez, E., Martz, E., Pupko, T., Ben-Tal, N., 2010. ConSurf 2010: Calculating evolutionary conservation in sequence and structure of proteins and nucleic acids. Nucleic Acids Res. 38, W529–W533, Web Server issue.
Benkert, P., Kunzli, M., Schwede, T., 2009. QMEAN server for protein model quality estimation. Nucleic Acids Res. 37, W510–W514, Web Server issue.
Berardi, M.J., Shih, W.M., Harrison, S.C., Chou, J.J., 2011. Mitochondrial uncoupling protein 2 structure determined by NMR molecular fragment searching. Nature 476 (7358), 109–113.
Berendsen, H., Postma, J., van Gunsteren, W., DiNola, A., Haak, J., 1984. Molecular dynamics with coupling to an external bath. J. Chem. Phys. 81, 3684.
Bockelman, C., Lassus, H., Hemmes, A., Leminen, A., Westermarck, J., Haglund, C., et al., 2011. Prognostic role of CIP2A expression in serous ovarian cancer. Br. J. Cancer 105 (7), 989–995.
Buenavista, M.T., Roche, D.B., McGuffin, L.J., 2012. Improvement of 3D protein models using multiple templates guided by single-template model quality assessment. Bioinformatics 28 (14), 1851–1857.
Case, D.A., Darden, T.A., Cheatham, T.E., Simmerling, C.L., Wang, J., Duke, R.E., et al., 2012. AMBER12. University of California, San Francisco.
Celniker, G., 2013. ConSurf: Using evolutionary data to raise testable hypotheses about protein function. Isr. J. Chem. 53 (3), 199–206.
Chen, W., Feng, P.M., Lin, H., Chou, K.C., 2013. iRSpot-PseDNC: Identify recombination spots with pseudo dinucleotide composition. Nucleic Acids Res. 41 (6), e68.
Choi, H.J., Huber, A.H., Weis, W.I., 2006. Thermodynamics of beta-catenin-ligand interactions: The roles of the N- and C-terminal tails in modulating binding affinity. J. Biol. Chem. 281 (2), 1027–1038.
Chou, K.C., 1988. Low-frequency collective motion in biomacromolecules and its biological functions. Biophys. Chem. 30 (1), 3–48.
Chou, K.C., 2004a. Molecular therapeutic target for type-2 diabetes. J. Proteome Res. 3 (6), 1284–1288.
Chou, K.C., 2004b. Structural bioinformatics and its impact to biomedical science. Curr. Med. Chem. 11 (16), 2105–2134.
Chou, K.C., 2005. Coupling interaction between thromboxane A2 receptor and alpha-13 subunit of guanine nucleotide-binding protein. J. Proteome Res. 4 (5), 1681–1686.
Chou, K.C., 2015. Impacts of bioinformatics to medicinal chemistry. Med. Chem. 11 (3), 218–234.
Chou, K.C., Mao, B., 1988. Collective motion in DNA and its role in drug intercalation. Biopolymers 27 (11), 1795–1815.
Chou, K.C., Watenpaugh, K.D., Heinrikson, R.L., 1999. A model of the complex between cyclin-dependent kinase 5 and the activation domain of neuronal Cdk5 activator. Biochem. Biophys. Res. Commun. 259 (2), 420–428.
Chou, K.C., Zhang, C.T., Maggiora, G.M., 1994. Solitary wave dynamics as a mechanism for explaining the internal motion during microtubule growth. Biopolymers 34 (1), 143–153.
Coates, J.C., 2003. Armadillo repeat proteins: Beyond the animal kingdom. Trends Cell Biol. 13 (9), 463–471.
Cole, C., Barber, J.D., Barton, G.J., 2008. The Jpred 3 secondary structure prediction server. Nucleic Acids Res. 36, W197–W201, Web Server issue.
Colovos, C., Yeates, T.O., 1993. Verification of protein structures: Patterns of nonbonded atomic interactions. Protein Sci. 2 (9), 1511–1519.
Come, C., Laine, A., Chanrion, M., Edgren, H., Mattila, E., Liu, X., et al., 2009. CIP2A is associated with human breast cancer aggressivity. Clin. Cancer Res. 15 (16), 5092–5100.
Conti, E., Uy, M., Leighton, L., Blobel, G., Kuriyan, J., 1998. Crystallographic analysis of the recognition of a nuclear localization signal by the nuclear import factor karyopherin alpha. Cell 94 (2), 193–204.
Cutress, M.L., Whitaker, H.C., Mills, I.G., Stewart, M., Neal, D.E., 2008. Structural basis for the nuclear import of the human androgen receptor. J. Cell. Sci. 121 (7), 957–968.
De, P., Carlson, J., Leyland-Jones, B., Dey, N., 2014. Oncogenic nexus of cancerous inhibitor of protein phosphatase 2A (CIP2A): An oncoprotein with many hands. Oncotarget 5 (13), 4581–4602.
Dong, Q.Z., Wang, Y., Dong, X.J., Li, Z.X., Tang, Z.P., Cui, Q.Z., et al., 2011. CIP2A is overexpressed in non-small cell lung cancer and correlates with poor prognosis. Ann. Surg. Oncol. 18 (3), 857–865.
Duan, Y., Wu, C., Chowdhury, S., Lee, M., Xiong, G., Zhang, W., et al., 2003. A point-charge force field for molecular mechanics simulations of proteins based on condensed-phase quantum mechanical calculations. J. Comput. Chem. 24, 1999.
Eisenberg, D., Luthy, R., Bowie, J.U., 1997. VERIFY3D: Assessment of protein models with three-dimensional profiles. Methods Enzymol. 277, 396–404.
Eklof Spink, K., Fridman, S.G., Weis, W.I., 2001. Molecular mechanisms of beta-catenin recognition by adenomatous polyposis coli revealed by the structure of an APC-beta-catenin complex. EMBO J. 20 (22), 6203–6212.
Essmann, U., Perera, L., Berkowitz, M., Darden, T., Lee, H., Pedersen, L., 1995. A smooth particle mesh Ewald method. J. Chem. Phys. 103, 8577.
Glaser, F., Pupko, T., Paz, I., Bell, R.E., Bechor-Shental, D., Martz, E., et al., 2003. ConSurf: Identification of functional regions in proteins by surface-mapping of phylogenetic information. Bioinformatics 19 (1), 163–164.
Gouet, P., Courcelle, E., Stuart, D.I., Metoz, F., 1999. ESPript: Analysis of multiple sequence alignments in PostScript. Bioinformatics 15 (4), 305–308.
Graham, T.A., Ferkey, D.M., Mao, F., Kimelman, D., Xu, W., 2001. Tcf4 can specifically recognize beta-catenin using alternative conformations. Nat. Struct. Biol. 8 (12), 1048–1052.
Graham, T.A., Weaver, C., Mao, F., Kimelman, D., Xu, W., 2000. Crystal structure of a beta-catenin/Tcf complex. Cell 103 (6), 885–896.
He, H., Wu, G., Li, W., Cao, Y., Liu, Y., 2012. CIP2A is highly expressed in hepatocellular carcinoma and predicts poor prognosis. Diagn. Mol. Pathol. 21 (3), 143–149.
Huang, B., 2009. MetaPocket: A meta approach to improve protein ligand binding site prediction. OMICS 13 (4), 325–330.
Huang, R.B., Du, Q.S., Wang, C.H., Chou, K.C., 2008. An in-depth analysis of the biological functional studies based on the NMR M2 channel structure of influenza A virus. Biochem. Biophys. Res. Commun. 377 (4), 1243–1247.
Humphrey, W., Dalke, A., Schulten, K., 1996. VMD: Visual molecular dynamics. J. Mol. Graph. 14 (1), 33–38, 27–28.
Ishiyama, N., Lee, S.H., Liu, S., Li, G.Y., Smith, M.J., Reichardt, L.F., et al., 2010. Dynamic and static interactions between p120 catenin and E-cadherin regulate the stability of cell-cell adhesion. Cell 141 (1), 117–128.
Jia, J., Liu, Z., Xiao, X., Liu, B., Chou, K.C., 2015. iPPI-Esml: An ensemble classifier for identifying the interactions of proteins by incorporating their physicochemical properties and wavelet transforms into PseAAC. J. Theor. Biol. 377, 47–56.
Johnson, M.S., Lehtonen, J.V., 2000. Comparison of protein three-dimensional structures. In: Higgins, D., Taylor, W. (Eds.), Bioinformatics: Sequence, structure and databanks. Oxford University Press, Oxford, UK, p. 15.
Jones, D.T., 1999. Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol. 292 (2), 195–202.
Jorgensen, W., Chandrasekhar, J., Madura, J., Impey, R., Klein, M., 1983. Comparison of simple potential functions for simulating liquid water. J. Chem. Phys. 79, 926.
Junttila, M.R., Puustinen, P., Niemela, M., Ahola, R., Arnold, H., Bottzauw, T., et al., 2007. CIP2A inhibits PP2A in human malignancies. Cell 130 (1), 51–62.
Kelley, L.A., Sternberg, M.J., 2009. Protein structure prediction on the web: A case study using the Phyre server. Nat. Protoc. 4 (3), 363–371.
Khanna, A., Bockelman, C., Hemmes, A., Junttila, M.R., Wiksten, J.P., Lundin, M., et al., 2009. MYC-dependent regulation and prognostic role of CIP2A in gastric cancer. J. Natl. Cancer Inst. 101 (11), 793–805.
McGuffin, L.J., Buenavista, M.T., Roche, D.B., 2013. The ModFOLD4 server for the quality assessment of 3D protein models. Nucleic Acids Res. 41, W368–W372, Web Server issue.
McGuffin, L.J., Roche, D.B., 2010. Rapid model quality assessment for protein structure predictions using the comparison of multiple models without structural alignments. Bioinformatics 26 (2), 182–188.
Morishita, E.C., Murayama, K., Kato-Murayama, M., Ishizuka-Katsura, Y., Tomabechi, Y., Hayashi, T., et al., 2011. Crystal structures of the armadillo repeat domain of adenomatous polyposis coli and its complex with the tyrosine-rich domain of Sam68. Structure 19 (10), 1496–1508.
Murzin, A.G., Brenner, S.E., Hubbard, T., Chothia, C., 1995. SCOP: A structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247 (4), 536–540.
OuYang, B., Xie, S., Berardi, M.J., Zhao, X., Dev, J., Yu, W., et al., 2013. Unusual architecture of the p7 channel from hepatitis C virus. Nature 498 (7455), 521–525.
Parmeggiani, F., Huang, P.S., Vorobiev, S., Xiao, R., Park, K., Caprari, S., et al., 2014. A general computational approach for repeat protein design. J. Mol. Biol.
Pollastri, G., McLysaght, A., 2005. Porter: A new, accurate server for protein secondary structure prediction. Bioinformatics 21, 1719–1720.
Puustinen, P., Jaattela, M., 2014. KIAA1524/CIP2A promotes cancer growth by coordinating the activities of MTORC1 and MYC. Autophagy 10 (7), 1352–1354.
Raghava, G.P.S., 2002. APSSP2: A combination method for protein secondary structure prediction based on neural network and example based learning. CASP5, A-132.
Reichen, C., Hansen, S., Pluckthun, A., 2014. Modular peptide binding: From a comparison of natural binders to designed armadillo repeat proteins. J. Struct. Biol. 185 (2), 147–162.
Remmert, M., Biegert, A., Hauser, A., Soding, J., 2011. HHblits: Lightning-fast iterative protein sequence searching by HMM-HMM alignment. Nat. Methods 9 (2), 173–175.
Roman, N., Christie, M., Swarbrick, C.M., Kobe, B., Forwood, J.K., 2013. Structural characterisation of the nuclear import receptor importin alpha in complex with the bipartite NLS of Prp20. PLoS One 8 (12), e82038.
Roy, A., Kucukural, A., Zhang, Y., 2010. I-TASSER: A unified platform for automated protein structure and function prediction. Nat. Protoc. 5 (4), 725–738.
Roy, A., Yang, J., Zhang, Y., 2012. COFACTOR: An accurate comparative algorithm for structure-based protein function annotation. Nucleic Acids Res. 40, W471–W477, Web Server issue.
Ryckaert, J., Ciccotti, G., Berendsen, H., 1977. Numerical integration of the cartesian equations of motion of a system with constraints: Molecular dynamics of n-alkanes. J. Comput. Phys. 23, 327.
Schultz, J., Milpetz, F., Bork, P., Ponting, C.P., 1998. SMART, a simple modular architecture research tool: Identification of signaling domains. Proc. Natl. Acad. Sci. U.S.A. 95 (11), 5857–5864.
Sippl, M.J., 1993. Recognition of errors in three-dimensional structures of proteins. Proteins 17 (4), 355–362.
Soding, J., 2005. Protein homology detection by HMM-HMM comparison. Bioinformatics 21 (7), 951–960.
Soding, J., Biegert, A., Lupas, A.N., 2005. The HHpred interactive server for protein homology detection and structure prediction. Nucleic Acids Res. 33, W244–W248, Web Server issue.
Soo Hoo, L., Zhang, J.Y., Chan, E.K., 2002. Cloning and characterization of a novel 90 kDa 'companion' auto-antigen of p62 overexpressed in cancer. Oncogene 21 (32), 5006–5015.
Sun, J., Weis, W.I., 2011.
Biochemical and structural characterization of beta-catenin 109 44 Khanna, A., Okkeri, J., Bilgen, T., Tiirikka, T., Vihinen, M., Visakorpi, T., et al., 2011. interactions with nonphosphorylated and CK2-phosphorylated lef-1. J. Mol. 110 ETS1 mediates MEK1/2-dependent overexpression of cancerous inhibitor of Biol. 405 (2), 519–530. 45 protein phosphatase 2A (CIP2A) in human cancer cells. PLoS One 6 (3), e17979. Tarendeau, F., Boudet, J., Guilligay, D., Mas, P.J., Bougault, C.M., Boulo, S., et al., 2007. 111 46 Khanna, A., Pimanda, J.E., Westermarck, J., 2013. Cancerous inhibitor of protein Structure and nuclear import function of the C-terminal domain of influenza 112 47 phosphatase 2A, an emerging human oncoprotein and a potential cancer virus polymerase PB2 subunit. Nat. Struct. Mol. Biol. 14 (3), 229–233. 113 therapy target. Cancer Res. 73 (22), 6548–6553. Varadamsetty, G., Tremmel, D., Hansen, S., Parmeggiani, F., Pluckthun, A., 2012. 48 Landau, M., Mayrose, I., Rosenberg, Y., Glaser, F., Martz, E., Pupko, T., et al., 2005. Designed armadillo repeat proteins: Library generation, characterization and 114 49 ConSurf 2005: The projection of evolutionary conservation scores of residues selection of peptide binders with high specificity. J. Mol. Biol. 424 (1–2), 68–87. 115 50 on protein structures. Nucleic Acids Res. 33, W299–W302, Web Server issue. Wallner, B., Elofsson, A., 2003. Can correct protein models be identified? Protein Sci. 116 Laskowski, R.A., Moss, D.S., Thornton, J.M., 1993. Main-chain bond lengths and bond 12 (5), 1073–1086. 51 angles in protein structures. J. Mol. Biol. 231 (4), 1049–1067. Wang, J.F., Chou, K.C., 2009a. Insight into the molecular switch mechanism of 117 52 Lehtonen, J.V., Still, D.J., Rantanen, V.V., Ekholm, J., Bjorklund, D., Iftikhar, Z., et al., human Rab5a from molecular dynamics simulations. Biochem. Biophys. Res. 118 53 2004. BODIL: A molecular modeling environment for structure-function ana- Commun. 390 (3), 608–612. 
119 lysis and drug design. J. Comput. Aided Mol. Des. 18 (6), 401–419. Wang, J.F., Gong, K., Wei, D.Q., Li, Y.X., Chou, K.C., 2009b. Molecular dynamics stu- 54 Letunic, I., Doerks, T., Bork, P., 2012. SMART 7: Recent updates to the protein domain dies on the interactions of PTP1B with inhibitors: From the first phosphate- 120 55 annotation resource. Nucleic Acids Res. 40, D302–D305, Database issue. binding site to the second one. Protein Eng. Des. Sel. 22 (6), 349–355. 121 56 Li, X.B., Wang, S.Q., Xu, W.R., Wang, R.L., Chou, K.C., 2011. Novel inhibitor design for Wang, J.F., Wei, D.Q., Li, L., Zheng, S.Y., Li, Y.X., Chou, K.C., 2007a. 3D structure 122 fl 57 hemagglutinin against H1N1 in uenza virus by core hopping method. PLoS One modeling of cytochrome P450 2C19 and its implication for personalized drug 123 6 (11), e28111. design. Biochem. Biophys. Res. Commun. 355 (2), 513–519. 58 Li, Y., Wang, K., Dai, L., Wang, P., Song, C., Shi, J., et al., 2012. HapMap-based study of Wang, S.Q., Du, Q.S., Chou, K.C., 2007b. Study of drug resistance of chicken influenza 124 59 CIP2A gene polymorphisms and HCC susceptibility. Oncol. Lett. 4 (2), 358–364. A virus (H5N1) from homology-modeled 3D structures of neuraminidases. 125 – 60 Lin, H., Deng, E.Z., Ding, H., Chen, W., Chou, K.C., 2014. iPro54-PseKNC: A sequence- Biochem. Biophys. Res. Commun. 354 (3), 634 640. 126 based predictor for identifying sigma-54 promoters in prokaryote with pseudo Wang, S.Q., Du, Q.S., Huang, R.B., Zhang, D.W., Chou, K.C., 2009c. Insights from 61 k-tuple nucleotide composition. Nucleic Acids Res. 42 (21), 12961–12972. investigating the interaction of oseltamivir (tamiflu) with neuraminidase of the 127 62 Liu, B., Liu, F., Wang, X., Chen, J., Fang, L., Chou, K.C., 2015. Pse-in-one: A web server 2009 H1N1 swine flu virus. Biochem. Biophys. Res. Commun. 386 (3), 432–436. 128 63 for generating various modes of pseudo components of DNA, RNA, and protein Wiederstein, M., Sippl, M.J., 2007. 
ProSA-web: Interactive web service for the 129 sequences. Nucleic Acids Res. 43 (W1), W65–W71. recognition of errors in three-dimensional structures of proteins. Nucleic Acids 64 Lopez, G., Rojas, A., Tress, M., Valencia, A., 2007. Assessment of predictions sub- Res. 35, W407–W410, Web Server issue. 130 65 mitted for the CASP7 function prediction category. Proteins 69 (8), 165–174. Wu, S., Zhang, Y., 2007. LOMETS: A local meta-threading-server for protein struc- 131 66 ture prediction. Nucleic Acids Res., 35; , pp. 3375–3382. 132

Wu, Y., Gu, T.T., Zheng, P.S., 2015. CIP2A cooperates with H-Ras to promote epithelial-mesenchymal transition in cervical-cancer progression. Cancer Lett. 356 (2 Pt B), 646–655.

Xing, Y., Xu, Y., Chen, Y., Jeffrey, P.D., Chao, Y., Lin, Z., et al., 2006. Structure of protein phosphatase 2A core enzyme bound to tumor-inducing toxins. Cell 127 (2), 341–353.

Xu, W., Kimelman, D., 2007. Mechanistic insights from structural studies of beta-catenin and its binding partners. J. Cell Sci. 120 (Pt 19), 3337–3344.

Xu, Y., Shao, X.J., Wu, L.Y., Deng, N.Y., Chou, K.C., 2013. iSNO-AAPair: Incorporating amino acid pairwise coupling into PseAAC for predicting cysteine S-nitrosylation sites in proteins. PeerJ 1, e171.

Xue, Y., Wu, G., Wang, X., Zou, X., Zhang, G., Xiao, R., et al., 2013. CIP2A is a predictor of survival and a novel therapeutic target in bladder urothelial cell carcinoma. Med. Oncol. 30 (1), 406.

Zhang, Y., 2008. I-TASSER server for protein 3D structure prediction. BMC Bioinformatics 9, 40.

Zhang, Y., 2014. Interplay of I-TASSER and QUARK for template-based and ab initio protein structure prediction in CASP10. Proteins 82 (Suppl 2), 175–187.

Zhang, Z., Li, Y., Lin, B., Schroeder, M., Huang, B., 2011. Identification of cavities on protein surface using multiple computational approaches for drug binding site prediction. Bioinformatics 27 (15), 2083–2088.

Please cite this article as: Dahlström, K.M., Salminen, T.A., 3D model for cancerous inhibitor of protein phosphatase 2A armadillo domain unveils highly conserved protein–protein interaction.... J. Theor. Biol. (2015), http://dx.doi.org/10.1016/j.jtbi.2015.09.010