1ki1 Evolutionary trace report by report maker
June 20, 2010

CONTENTS

1 Introduction
2 Chain 1ki1A
2.1 Q9DDV6 overview
2.2 Multiple sequence alignment for 1ki1A
2.3 Residue ranking in 1ki1A
2.4 Top ranking residues in 1ki1A and their position on the structure
2.4.1 Clustering of residues at 25% coverage.
2.4.2 Overlap with known functional surfaces at 25% coverage.
2.4.3 Possible novel functional surfaces at 25% coverage.
3 Chain 1ki1B
3.1 Q15811 overview
3.2 Multiple sequence alignment for 1ki1B
3.3 Residue ranking in 1ki1B
3.4 Top ranking residues in 1ki1B and their position on the structure
3.4.1 Clustering of residues at 25% coverage.
3.4.2 Overlap with known functional surfaces at 25% coverage.
3.4.3 Possible novel functional surfaces at 25% coverage.
4 Notes on using trace results
4.1 Coverage
4.2 Known substitutions
4.3 Surface
4.4 Number of contacts
4.5 Annotation
4.6 Mutation suggestions
5 Appendix
5.1 File formats
5.2 Color schemes used
5.3 Credits
5.3.1 Alistat
5.3.2 CE
5.3.3 DSSP
5.3.4 HSSP
5.3.5 LaTeX
5.3.6 Muscle
5.3.7 Pymol
5.4 Note about ET Viewer
5.5 Citing this work
5.6 About report maker
5.7 Attachments

1 INTRODUCTION

From the original Data Bank entry (PDB id 1ki1):
Title: Guanine nucleotide exchange region of intersectin in complex with cdc42
Compound: Mol id: 1; molecule: g25k gtp-binding protein, placental isoform; chain: a, c; fragment: residues 1-188; synonym: cdc42; engineered: yes; mutation: yes; mol id: 2; molecule: intersectin long form; chain: b, d; fragment: dbl homology and pleckstrin homology domains (residues 1229-1580); engineered: yes
Organism, scientific name: Homo Sapiens;

1ki1 contains unique chains 1ki1A (178 residues) and 1ki1B (342 residues). 1ki1C is a homologue of chain 1ki1A. 1ki1D is a homologue of chain 1ki1B.

2 CHAIN 1KI1A

2.1 Q9DDV6 overview
From SwissProt, id Q9DDV6, 100% identical to 1ki1A:
Description: Rho GTPase Cdc42 (MGC52619 protein) (Rho family small GTP binding protein cdc42).
Organism, scientific name: Xenopus laevis (African clawed frog).
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Amphibia; Batrachia; Anura; Mesobatrachia; Pipoidea; Pipidae; Xenopodinae; Xenopus; Xenopus.

Lichtarge lab 2006

Fig. 1. Residues 1-178 in 1ki1A colored by their relative importance. (See Appendix, Fig. 15, for the coloring scheme.)

2.2 Multiple sequence alignment for 1ki1A
For the chain 1ki1A, the alignment 1ki1A.msf (attached) with 505 sequences was used. The alignment was downloaded from the HSSP database, and fragments shorter than 75% of the query as well as duplicate sequences were removed. It can be found in the attachment to this report, under the name of 1ki1A.msf. Its statistics, from the alistat program, are the following:

Format: MSF
Number of sequences: 505
Total number of residues: 87877
Smallest: 136
Largest: 178
Average length: 174.0
Alignment length: 178
Average identity: 56%
Most related pair: 99%
Most unrelated pair: 22%
Most distant seq: 47%

Furthermore, <1% of residues show as conserved in this alignment. The alignment consists of 50% eukaryotic (13% vertebrata, 1% arthropoda, 14% fungi, 9% plantae) sequences. (Descriptions of some sequences were not readily available.) The file containing the sequence descriptions can be found in the attachment, under the name 1ki1A.descr.

Fig. 2. Residues in 1ki1A, colored by their relative importance. Clockwise: front, back, top and bottom views.

2.3 Residue ranking in 1ki1A
The 1ki1A sequence is shown in Fig. 1, with each residue colored according to its estimated importance. The full listing of residues in 1ki1A can be found in the file called 1ki1A.ranks_sorted in the attachment.

2.4 Top ranking residues in 1ki1A and their position on the structure
In the following we consider residues ranking among top 25% of residues in the protein. Figure 2 shows residues in 1ki1A colored by their importance: bright red and yellow indicate more conserved/important residues (see Appendix for the coloring scheme). A Pymol script for producing this figure can be found in the attachment.

Fig. 3. Residues in 1ki1A, colored according to the cluster they belong to: red, followed by blue and yellow are the largest clusters (see Appendix for the coloring scheme). Clockwise: front, back, top and bottom views. The corresponding Pymol script is attached.

2.4.1 Clustering of residues at 25% coverage.
Fig. 3 shows the top 25% of all residues, this time colored according to clusters they belong to. The clusters in Fig. 3 are composed of the residues listed in Table 1.
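Section 4.3 describes a cluster as a set of residues that each have a neighbor within 5 Å of any of their heavy atoms. The sketch below groups top-ranked residues that way, assuming clusters are the connected components of that residue contact graph; the input layout (a dict from residue number to heavy-atom coordinates) and the function names are hypothetical, not report maker's own code.

```python
from itertools import combinations
from math import dist


def residues_touch(atoms_a, atoms_b, cutoff=5.0):
    """True if any heavy atom of one residue is closer than `cutoff` (Å) to any heavy atom of the other."""
    return any(dist(a, b) < cutoff for a in atoms_a for b in atoms_b)


def cluster_residues(coords, cutoff=5.0):
    """Group residues into connected components of the residue contact graph.

    coords: {residue_number: [(x, y, z), ...]} heavy atoms of the top-ranked residues only.
    Returns clusters as sorted residue lists, largest cluster first.
    """
    parent = {r: r for r in coords}  # union-find forest, one tree per cluster

    def find(r):
        while parent[r] != r:
            parent[r] = parent[parent[r]]  # path halving keeps the trees shallow
            r = parent[r]
        return r

    for r1, r2 in combinations(coords, 2):
        if residues_touch(coords[r1], coords[r2], cutoff):
            parent[find(r1)] = find(r2)  # merge the two clusters

    groups = {}
    for r in coords:
        groups.setdefault(find(r), []).append(r)
    return sorted((sorted(g) for g in groups.values()), key=len, reverse=True)


if __name__ == "__main__":
    toy = {  # made-up single-atom residues placed on a line
        10: [(0.0, 0.0, 0.0)],
        11: [(3.0, 0.0, 0.0)],
        12: [(6.5, 0.0, 0.0)],
        40: [(30.0, 0.0, 0.0)],
    }
    print(cluster_residues(toy))  # [[10, 11, 12], [40]]
```

Residues 10 and 12 are not in direct contact, but both touch 11, so all three end up in one cluster. Following the color hierarchy in Section 5.2, a single-residue cluster such as [40] would be drawn black and the others red, blue, yellow and so on in order of descending size.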

cluster color   size   member residues
red             44     5,10,11,12,14,15,16,17,28,29,32,34,35,36,37,39,40,55,57,
                       58,59,60,61,62,64,67,68,70,71,72,82,89,92,97,100,114,116,
                       118,120,154,156,157,158,169

Table 1. Clusters of top ranking residues in 1ki1A.

2.4.2 Overlap with known functional surfaces at 25% coverage.
The name of the ligand is composed of the source PDB identifier and the heteroatom name used in that file.

Sulfate ion binding site. Table 2 lists the top 25% of residues at the interface with 1ki1SO44001 (sulfate ion). The following table (Table 3) suggests possible disruptive replacements for these residues (see Section 4.6).

res   type  subst's (%)         cvg    noc/bb   dist (Å)
59    A     A(99)S              0.02   4/1      3.78
15    G     G(98)E.             0.07   15/15    2.76
16    K     K(98).E             0.09   27/12    2.78
11    D     D(97)N.(1)EA        0.16   4/4      3.64
12    G     G(97)A.(1)SETMY     0.16   9/9      3.63
14    V     V(67)C(30)A.(1)I    0.20   14/12    3.30
62    E     E(94)D(3).ASKF      0.23   2/0      4.27
17    T     T(96)V.(1)S(1)R     0.25   11/6     3.02

Table 2. The top 25% of residues in 1ki1A at the interface with sulfate ion. (Field names: res: residue number in the PDB entry; type: amino acid type; subst's: substitutions seen in the alignment, with the percentage of each type in the bracket; noc/bb: number of contacts with the ligand, with the number of contacts realized through backbone atoms given after the slash; dist: distance of closest approach to the ligand.)

res   type  disruptive mutations
59    A     (KR)(YE)(QH)(D)
15    G     (R)(K)(FWH)(M)
16    K     (Y)(FW)(T)(VCAG)
11    D     (R)(H)(FW)(Y)
12    G     (R)(K)(EH)(FQW)
14    V     (R)(KYE)(H)(Q)
62    E     (H)(FW)(Y)(R)
17    T     (K)(R)(FQMWH)(E)

Table 3. List of disruptive mutations for the top 25% of residues in 1ki1A that are at the interface with sulfate ion.

Fig. 4. Residues in 1ki1A, at the interface with sulfate ion, colored by their relative importance. The ligand (sulfate ion) is colored green. Atoms further than 30 Å away from the geometric center of the ligand, as well as on the line of sight to the ligand, were removed. (See Appendix for the coloring scheme for the protein chain 1ki1A.)

Figure 4 shows residues in 1ki1A colored by their importance, at the interface with 1ki1SO44001.

Interface with 1ki1D. By analogy with the 1ki1C – 1ki1D interface. Table 4 lists the top 25% of residues at the interface with 1ki1D. The following table (Table 5) suggests possible disruptive replacements for these residues (see Section 4.6).

res   type  subst's (%)                 cvg    noc/bb   dist (Å)
60    G     G(99)SX                     0.01   23/23    3.46
59    A     A(99)S                      0.02   13/12    3.82
57    D     D(99)CN                     0.03   8/6      3.67
34    P     P(98)GS.QL                  0.05   15/10    2.98
37    F     F(98)L(1)A                  0.06   10/10    3.01
61    Q     Q(98)PANMSKE                0.07   24/6     3.49
35    T     T(97)S(1)I.PDN              0.12   53/14    2.74
70    L     L(96)MF(1)RIVS              0.14   33/9     3.02
67    L     L(96)I(1)MFVS               0.15   48/10    3.24
64    Y     Y(92)F(5).LSV               0.17   62/10    3.35
71    S     S(90)GA(3)C(4)NFET          0.18   14/8     3.58
32    Y     Y(96)H.(1)INS               0.20   104/8    2.13
36    V     V(93)A(3).QPI(1)G           0.21   35/9     3.63
40    Y     F(21)Y(74)H(1)LRTQD         0.21   10/10    3.83
5     K     K(95)Q.(3)RS                0.22   1/0      4.57
39    N     N(91)K(1)D(1)H(1)QT(1)VSR   0.24   37/8     3.07

Table 4. The top 25% of residues in 1ki1A at the interface with 1ki1D. (Field names: res: residue number in the PDB entry; type: amino acid type; subst's: substitutions seen in the alignment, with the percentage of each type in the bracket; noc/bb: number of contacts with the ligand, with the number of contacts realized through backbone atoms given after the slash; dist: distance of closest approach to the ligand.)

res   type  disruptive mutations
60    G     (KR)(E)(FMWH)(Q)
59    A     (KR)(YE)(QH)(D)
57    D     (R)(FWH)(Y)(K)
34    P     (Y)(R)(H)(T)
37    F     (KE)(TQDR)(SNCG)(Y)
61    Q     (Y)(H)(FW)(T)
35    T     (R)(K)(H)(FW)
70    L     (Y)(R)(T)(H)
67    L     (R)(Y)(H)(T)
64    Y     (K)(Q)(R)(EM)
71    S     (R)(K)(H)(FQW)
32    Y     (K)(M)(Q)(ER)
36    V     (Y)(R)(E)(K)
40    Y     (K)(QM)(E)(R)
5     K     (Y)(FW)(T)(CG)
39    N     (Y)(FW)(H)(T)

Table 5. List of disruptive mutations for the top 25% of residues in 1ki1A that are at the interface with 1ki1D.

Fig. 5. Residues in 1ki1A, at the interface with 1ki1D, colored by their relative importance. 1ki1D is shown in backbone representation. (See Appendix for the coloring scheme for the protein chain 1ki1A.)

Figure 5 shows residues in 1ki1A colored by their importance, at the interface with 1ki1D.

2.4.3 Possible novel functional surfaces at 25% coverage.
One group of residues is conserved on the 1ki1A surface, away from (or substantially larger than) other functional sites and interfaces recognizable in PDB entry 1ki1. It is shown in Fig. 6. The right panel shows (in blue) the rest of the larger cluster this surface belongs to. The residues belonging to this surface “patch” are listed in Table 6, while Table 7 suggests possible disruptive replacements for these residues (see Section 4.6).

Fig. 6. A possible active surface on the chain 1ki1A. The larger cluster it belongs to is shown in blue.

res   type  substitutions (%)           cvg
60    G     G(99)SX                     0.01
97    W     W(99).A                     0.01
59    A     A(99)S                      0.02
72    Y     Y(99)FP                     0.02
57    D     D(99)CN                     0.03
58    T     T(99)APV                    0.03
68    R     R(99)PA                     0.04
100   E     E(99)SDKG                   0.04
34    P     P(98)GS.QL                  0.05
37    F     F(98)L(1)A                  0.06
118   D     D(99)VE.AT                  0.06
15    G     G(98)E.                     0.07
61    Q     Q(98)PANMSKE                0.07
16    K     K(98).E                     0.09
29    P     P(98)YQT.SAN                0.11
156   E     E(97)QC.SDKG                0.11
35    T     T(97)S(1)I.PDN              0.12
70    L     L(96)MF(1)RIVS              0.14
154   Y     Y(94)F(1)L(1)R.SHE          0.14
67    L     L(96)I(1)MFVS               0.15
92    N     N(97)STRXD.A                0.15
11    D     D(97)N.(1)EA                0.16
12    G     G(97)A.(1)SETMY             0.16
28    F     F(94)Y(4)C.IS               0.17
64    Y     Y(92)F(5).LSV               0.17
71    S     S(90)GA(3)C(4)NFET          0.18
120   R     R(96)EPKQLYV.G              0.18
14    V     V(67)C(30)A.(1)I            0.20
32    Y     Y(96)H.(1)INS               0.20
36    V     V(93)A(3).QPI(1)G           0.21
40    Y     F(21)Y(74)H(1)LRTQD         0.21
5     K     K(95)Q.(3)RS                0.22
62    E     E(94)D(3).ASKF              0.23
39    N     N(91)K(1)D(1)H(1)QT(1)VSR   0.24
116   Q     K(78)Q(20)AH.N              0.24
17    T     T(96)V.(1)S(1)R             0.25

Table 6. Residues forming surface “patch” in 1ki1A.

res   type  disruptive mutations
60    G     (KR)(E)(FMWH)(Q)
97    W     (E)(K)(D)(Q)
59    A     (KR)(YE)(QH)(D)
72    Y     (K)(Q)(R)(E)
57    D     (R)(FWH)(Y)(K)
58    T     (R)(K)(H)(Q)
68    R     (TYD)(E)(SCG)(FVLAWPI)
100   E     (FW)(H)(YR)(VA)
34    P     (Y)(R)(H)(T)
37    F     (KE)(TQDR)(SNCG)(Y)
118   D     (R)(H)(FW)(K)
15    G     (R)(K)(FWH)(M)
61    Q     (Y)(H)(FW)(T)
16    K     (Y)(FW)(T)(VCAG)
29    P     (R)(Y)(H)(K)
156   E     (FW)(H)(Y)(R)
35    T     (R)(K)(H)(FW)
70    L     (Y)(R)(T)(H)
154   Y     (K)(Q)(M)(E)
67    L     (R)(Y)(H)(T)
92    N     (Y)(FWH)(R)(T)
11    D     (R)(H)(FW)(Y)
12    G     (R)(K)(EH)(FQW)
28    F     (K)(E)(Q)(DR)
64    Y     (K)(Q)(R)(EM)
71    S     (R)(K)(H)(FQW)
120   R     (T)(YD)(CG)(SE)
14    V     (R)(KYE)(H)(Q)
32    Y     (K)(M)(Q)(ER)
36    V     (Y)(R)(E)(K)
40    Y     (K)(QM)(E)(R)
5     K     (Y)(FW)(T)(CG)
62    E     (H)(FW)(Y)(R)
39    N     (Y)(FW)(H)(T)
116   Q     (Y)(T)(FW)(H)
17    T     (K)(R)(FQMWH)(E)

Table 7. Disruptive mutations for the surface patch in 1ki1A.

3 CHAIN 1KI1B

3.1 Q15811 overview
From SwissProt, id Q15811, 96% identical to 1ki1B:
Description: Intersectin 1 (SH3 domain-containing protein 1A) (SH3P17).
Organism, scientific name: Homo sapiens (Human).
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Catarrhini; Hominidae; Homo.
Function: Adapter protein that may provide indirect link between the endocytic membrane traffic and the actin assembly machinery. May regulate the formation of clathrin-coated vesicles. Isoform 1 could be involved in brain-specific synaptic vesicle recycling.
Subunit: Interacts with dynamin, CDC42, SNAP25 and SNAP23. Clusters several dynamin in a manner that is regulated by alternative splicing. Also binds clathrin-associated and other components of the endocytic machinery, such as SPIN90, EPS15, EPN1, EPN2 and STN2 (By similarity).
Subcellular location: Cytoplasmic; membrane-associated protein. Enriched in synaptosomes (By similarity).
Alternative products: Event=Alternative splicing; Named isoforms=4; Comment=Additional isoforms seem to exist. Alternative splicing affects domains involved in protein recognition and thus may play a role in selecting specific interactions; Name=1; Synonyms=Long, ITSN-l; IsoId=Q15811-1; Sequence=Displayed; Name=2; Synonyms=Short, ITSN-s; IsoId=Q15811-2; Sequence=VSP_004295; Name=3; Synonyms=Short 2, SH3P17; IsoId=Q15811-3; Sequence=VSP_004293, VSP_004294, VSP_004295; Name=4; IsoId=Q15811-4; Sequence=VSP_004294;
Tissue specificity: Ubiquitous in adult and fetal tissues, except isoform 1 which is expressed almost exclusively in the brain. Highly expressed in skeletal muscle, heart, spleen, ovary, testis and all fetal tissues tested. Expressed at lower levels in thymus, blood, lung, liver and pancreas. Isoform 1 is expressed in all brain regions; not expressed in the spinal cord.
Domain: SH3-3, SH3-4 and SH3-5, but not SH3-1 and SH3-2 domains, bind to dynamin (By similarity).
Domain: The KLERQ domain binds to SNAP-25 and SNAP-23 (By similarity).
Disease: Overexpressed in brain from Down syndrome foetuses suggesting a dosage-dependent contribution to the abnormalities of Down syndrome.
Similarity: Contains 1 C2 domain.
Similarity: Contains 1 DH (DBL-homology) domain.
Similarity: Contains 2 EF-hand domains.
Similarity: Contains 2 EH domains.
Similarity: Contains 1 PH domain.
Similarity: Contains 5 SH3 domains.
About: This Swiss-Prot entry is copyright. It is produced through a collaboration between the Swiss Institute of Bioinformatics and the EMBL outstation - the European Bioinformatics Institute. There are no restrictions on its use as long as its content is in no way modified and this statement is not removed.

Fig. 7. Residues 1229-1399 in 1ki1B colored by their relative importance. (See Appendix, Fig. 15, for the coloring scheme.)

Fig. 8. Residues 1400-1580 in 1ki1B colored by their relative importance. (See Appendix, Fig. 15, for the coloring scheme.)

3.2 Multiple sequence alignment for 1ki1B
For the chain 1ki1B, the alignment 1ki1B.msf (attached) with 21 sequences was used. The alignment was downloaded from the HSSP database, and fragments shorter than 75% of the query as well as duplicate sequences were removed. It can be found in the attachment to this report, under the name of 1ki1B.msf. Its statistics, from the alistat program, are the following:

Format: MSF
Number of sequences: 21
Total number of residues: 6152
Smallest: 257
Largest: 342
Average length: 293.0
Alignment length: 342
Average identity: 43%
Most related pair: 98%
Most unrelated pair: 20%
Most distant seq: 32%

Furthermore, 3% of residues show as conserved in this alignment. The alignment consists of 66% eukaryotic (61% vertebrata, 4% fungi) sequences. (Descriptions of some sequences were not readily available.) The file containing the sequence descriptions can be found in the attachment, under the name 1ki1B.descr.

3.3 Residue ranking in 1ki1B
The 1ki1B sequence is shown in Figs. 7–8, with each residue colored according to its estimated importance. The full listing of residues in 1ki1B can be found in the file called 1ki1B.ranks_sorted in the attachment.
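Section 5.1 lists the fields of the .ranks_sorted files. The sketch below reads such a file, assuming one whitespace-separated record per line in exactly that field order, the two variability subfields appearing as two separate tokens, and lines starting with "%" being comments. Those layout details are assumptions, not a documented format, so compare against the attached 1ki1B.ranks_sorted before relying on it; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RankedPosition:
    alignment_pos: int   # alignment#: position in the alignment
    residue_num: str     # residue#: kept as text in case of PDB insertion codes
    aa_type: str         # type: amino acid type
    rank: int            # rank according to an older version of ET (assumed integer)
    n_types: int         # variability, subfield 1: number of distinct amino acids
    types_seen: str      # variability, subfield 2: which amino acids
    rho: float           # ET score: smaller means less variable across the tree
    cvg: float           # coverage
    gaps: float          # percentage of gaps in the column


def parse_ranks_sorted(lines):
    """Parse the records of a .ranks_sorted file (assumed layout, see lead-in)."""
    records = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("%"):  # blank or assumed comment line
            continue
        records.append(RankedPosition(
            alignment_pos=int(fields[0]), residue_num=fields[1], aa_type=fields[2],
            rank=int(fields[3]), n_types=int(fields[4]), types_seen=fields[5],
            rho=float(fields[6]), cvg=float(fields[7]), gaps=float(fields[8]),
        ))
    return records
```

A typical use would be `with open("1ki1B.ranks_sorted") as fh: top = [p for p in parse_ranks_sorted(fh) if p.cvg <= 0.25]`, provided the file stores coverage as a fraction, as the cvg columns of this report do.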

3.4 Top ranking residues in 1ki1B and their position on the structure
In the following we consider residues ranking among top 25% of residues in the protein. Figure 9 shows residues in 1ki1B colored by their importance: bright red and yellow indicate more conserved/important residues (see Appendix for the coloring scheme). A Pymol script for producing this figure can be found in the attachment.
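The "top 25%" cut used here is expressed through coverage, which Section 5.1 defines as the percentage of residues on the structure whose rho is equal to or smaller than the residue's own rho (written as a fraction in the tables). A minimal sketch of that definition and of the 25% cut, assuming the rho scores are already available as a dict from residue number to rho; the function names are hypothetical.

```python
from bisect import bisect_right


def coverage(rho_by_residue):
    """Fraction of residues whose rho is <= each residue's rho (ties share one value)."""
    ordered = sorted(rho_by_residue.values())
    n = len(ordered)
    return {res: bisect_right(ordered, rho) / n for res, rho in rho_by_residue.items()}


def top_ranked(rho_by_residue, max_cvg=0.25):
    """Residues with coverage <= max_cvg, most important (smallest coverage) first."""
    cvg = coverage(rho_by_residue)
    return sorted((r for r, c in cvg.items() if c <= max_cvg), key=lambda r: (cvg[r], r))


if __name__ == "__main__":
    toy_rho = {1: 0.5, 2: 1.0, 3: 2.0, 4: 3.0, 5: 4.0, 6: 5.0, 7: 6.0, 8: 7.0}
    print(top_ranked(toy_rho))  # [1, 2]
```

Because residues with tied rho values share one coverage, a cut at 0.25 need not select exactly a quarter of the residues.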

Fig. 9. Residues in 1ki1B, colored by their relative importance. Clockwise: front, back, top and bottom views.

3.4.1 Clustering of residues at 25% coverage.
Fig. 10 shows the top 25% of all residues, this time colored according to clusters they belong to. The clusters in Fig. 10 are composed of the residues listed in Table 8.

Fig. 10. Residues in 1ki1B, colored according to the cluster they belong to: red, followed by blue and yellow are the largest clusters (see Appendix for the coloring scheme). Clockwise: front, back, top and bottom views. The corresponding Pymol script is attached.

cluster color   size   member residues
red             78     1235,1237,1238,1239,1240,1242,1244,1245,1248,1249,
                       1252,1256,1263,1277,1281,1282,1284,1285,1287,1288,
                       1323,1332,1333,1334,1336,1339,1340,1364,1366,1367,
                       1368,1369,1370,1371,1375,1378,1379,1380,1381,1383,
                       1384,1385,1386,1387,1389,1392,1396,1409,1413,1414,
                       1416,1417,1420,1421,1424,1428,1429,1432,1433,1437,
                       1440,1452,1456,1457,1459,1461,1462,1465,1467,1469,
                       1478,1482,1483,1485,1487,1488,1489,1490
blue            3      1315,1316,1317
yellow          2      1353,1356

Table 8. Clusters of top ranking residues in 1ki1B.

3.4.2 Overlap with known functional surfaces at 25% coverage.
The name of the ligand is composed of the source PDB identifier and the heteroatom name used in that file.

Interface with 1ki1A. Table 9 lists the top 25% of residues at the interface with 1ki1A. The following table (Table 10) suggests possible disruptive replacements for these residues (see Section 4.6).

res    type  subst's (%)        cvg    noc/bb   dist (Å)
1244   E     E(100)             0.04   50/2     2.98
1383   T     T(100)             0.04   38/13    3.46
1387   L     L(100)             0.04   28/8     3.56
1381   R     R(95)H(4)          0.05   2/0      4.39
1421   N     N(95)E(4)          0.05   73/10    2.96
1428   E     E(95)A(4)          0.05   19/0     2.65
1333   C     C(95)G(4)          0.06   3/1      4.28
1380   Q     Q(95)T(4)          0.06   16/1     3.08
1248   T     T(90)S(9)          0.07   7/0      3.94
1420   V     T(4)V(90)L(4)      0.07   10/5     3.89
1336   Q     Q(90)L(4)Y(4)      0.08   12/0     4.05
1379   M     M(90)L(4)Y(4)      0.09   27/0     3.49
1424   V     I(4)V(90)A(4)      0.10   13/1     3.59
1368   G     G(52)R(47)         0.11   4/4      4.02
1384   R     R(47)K(52)         0.12   39/0     3.01
1369   M     L(57)M(42)         0.13   33/1     3.48
1417   C     L(47)C(47)I(4)     0.13   25/14    3.35
1240   G     E(52)G(42)I(4)     0.17   19/19    2.13
1432   R     A(4)R(85)M(4)S(4)  0.19   1/0      4.79
1367   K     R(47)K(52)         0.20   5/5      3.63
1237   K     K(80)D(14)A(4)     0.21   21/9     3.59
1414   E     E(52)R(38)K(9)     0.22   7/5      4.02

Table 9. The top 25% of residues in 1ki1B at the interface with 1ki1A. (Field names: res: residue number in the PDB entry; type: amino acid type; subst's: substitutions seen in the alignment, with the percentage of each type in the bracket; noc/bb: number of contacts with the ligand, with the number of contacts realized through backbone atoms given after the slash; dist: distance of closest approach to the ligand.)

res    type  disruptive mutations
1244   E     (FWH)(YVCARG)(T)(SNKLPI)
1383   T     (KR)(FQMWH)(NELPI)(D)
1387   L     (YR)(TH)(SKECG)(FQWD)
1381   R     (TD)(E)(SVCLAPIG)(YM)
1421   N     (Y)(FWH)(T)(VCARG)
1428   E     (H)(FYWR)(CG)(TKVA)
1333   C     (KER)(FQMWHD)(NYLPI)(SVA)
1380   Q     (FYWH)(TVA)(SCDRG)(ELPI)
1248   T     (KR)(FQMWH)(NELPI)(D)
1420   V     (R)(K)(Y)(E)
1336   Q     (Y)(H)(T)(FW)
1379   M     (Y)(T)(HR)(CG)
1424   V     (YR)(KE)(H)(QD)
1368   G     (E)(D)(FKMW)(YQLPHIR)
1384   R     (T)(YD)(SVCAG)(FELWPI)
1369   M     (Y)(TH)(R)(SCG)
1417   C     (R)(KE)(H)(Y)
1240   G     (R)(H)(K)(FW)
1432   R     (YD)(T)(E)(CG)
1367   K     (Y)(T)(FW)(SVCAG)
1237   K     (Y)(FW)(T)(CHG)
1414   E     (FW)(Y)(VCAHG)(T)

Table 10. List of disruptive mutations for the top 25% of residues in 1ki1B that are at the interface with 1ki1A.

Fig. 11. Residues in 1ki1B, at the interface with 1ki1A, colored by their relative importance. 1ki1A is shown in backbone representation. (See Appendix for the coloring scheme for the protein chain 1ki1B.)

Figure 11 shows residues in 1ki1B colored by their importance, at the interface with 1ki1A.

Interface with 1ki1C. Table 11 lists the top 25% of residues at the interface with 1ki1C. The following table (Table 12) suggests possible disruptive replacements for these residues (see Section 4.6).

res    type  subst's (%)          cvg    noc/bb   dist (Å)
1304   K     K(76)G(4)Q(19)       0.22   10/0     2.77
1265   K     N(4)K(57)Q(33)E(4)   0.25   7/0      3.24

Table 11. The top 25% of residues in 1ki1B at the interface with 1ki1C. (Field names: res: residue number in the PDB entry; type: amino acid type; subst's: substitutions seen in the alignment, with the percentage of each type in the bracket; noc/bb: number of contacts with the ligand, with the number of contacts realized through backbone atoms given after the slash; dist: distance of closest approach to the ligand.)

res    type  disruptive mutations
1304   K     (Y)(FW)(T)(S)
1265   K     (Y)(FW)(T)(VCAG)

Table 12. List of disruptive mutations for the top 25% of residues in 1ki1B that are at the interface with 1ki1C.

Fig. 12. Residues in 1ki1B, at the interface with 1ki1C, colored by their relative importance. 1ki1C is shown in backbone representation. (See Appendix for the coloring scheme for the protein chain 1ki1B.)

Figure 12 shows residues in 1ki1B colored by their importance, at the interface with 1ki1C.

Interface with 1ki1D. Table 13 lists the top 25% of residues at the interface with 1ki1D. The following table (Table 14) suggests possible disruptive replacements for these residues (see Section 4.6).

res    type  subst's (%)      cvg    noc/bb   dist (Å)
1356   F     H(4)F(90)V(4)    0.10   43/0     3.52

Table 13. The top 25% of residues in 1ki1B at the interface with 1ki1D. (Field names: res: residue number in the PDB entry; type: amino acid type; subst's: substitutions seen in the alignment, with the percentage of each type in the bracket; noc/bb: number of contacts with the ligand, with the number of contacts realized through backbone atoms given after the slash; dist: distance of closest approach to the ligand.)

res    type  disruptive mutations
1356   F     (E)(K)(Q)(D)

Table 14. List of disruptive mutations for the top 25% of residues in 1ki1B that are at the interface with 1ki1D.

Fig. 13. Residues in 1ki1B, at the interface with 1ki1D, colored by their relative importance. 1ki1D is shown in backbone representation. (See Appendix for the coloring scheme for the protein chain 1ki1B.)

Figure 13 shows residues in 1ki1B colored by their importance, at the interface with 1ki1D.
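The subst's columns in Tables 9, 11 and 13 use the notation explained in Section 4.2: each amino acid type is followed by its percentage in brackets, and types seen in under 1% of the sequences carry no bracket. The sketch below parses that notation and, following the advice in Section 4.2, lists the types never observed at a position as candidates for a disruptive point mutation. Treating "." as a gap symbol and ignoring "X" (unknown residue) are assumptions, and the function names are hypothetical. The report's own suggestions (Section 4.6) additionally order candidates by physico-chemical dissimilarity, which this sketch does not attempt.

```python
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# One symbol (a letter, or '.', assumed to mark a gap) optionally followed by "(percent)".
_TOKEN = re.compile(r"([A-Z.])(?:\((\d+)\))?")


def parse_substitutions(field):
    """'K(76)G(4)Q(19)' -> [('K', 76), ('G', 4), ('Q', 19)]; None means below 1%."""
    return [(sym, int(pct) if pct else None) for sym, pct in _TOKEN.findall(field)]


def unobserved_types(original, field):
    """Standard amino acids never seen at this position and different from the original."""
    seen = {sym for sym, _ in parse_substitutions(field)} | {original}
    return "".join(aa for aa in AMINO_ACIDS if aa not in seen)


if __name__ == "__main__":
    print(parse_substitutions("K(76)G(4)Q(19)"))  # [('K', 76), ('G', 4), ('Q', 19)]
    print(unobserved_types("R", "RVK"))           # ACDEFGHILMNPQSTWY
```

The second call reproduces the "RVK" example of Section 4.2: for an original R, anything except R, V and K is a candidate.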

3.4.3 Possible novel functional surfaces at 25% coverage.
One group of residues is conserved on the 1ki1B surface, away from (or substantially larger than) other functional sites and interfaces recognizable in PDB entry 1ki1. It is shown in Fig. 14. The right panel shows (in blue) the rest of the larger cluster this surface belongs to. The residues belonging to this surface “patch” are listed in Table 15, while Table 16 suggests possible disruptive replacements for these residues (see Section 4.6).

Fig. 14. A possible active surface on the chain 1ki1B. The larger cluster it belongs to is shown in blue.

res    type  substitutions (%)        cvg
1244   E     E(100)                   0.04
1371   L     L(100)                   0.04
1383   T     T(100)                   0.04
1385   Y     Y(100)                   0.04
1386   P     P(100)                   0.04
1387   L     L(100)                   0.04
1485   D     D(100)                   0.04
1381   R     R(95)H(4)                0.05
1421   N     N(95)E(4)                0.05
1428   E     E(95)A(4)                0.05
1333   C     C(95)G(4)                0.06
1380   Q     Q(95)T(4)                0.06
1437   Q     S(4)Q(95)                0.06
1242   I     I(90)L(9)                0.07
1248   T     T(90)S(9)                0.07
1420   V     T(4)V(90)L(4)            0.07
1239   Q     Q(90)D(4)R(4)            0.08
1336   Q     Q(90)L(4)Y(4)            0.08
1396   T     T(95)S(4)                0.08
1366   C     A(9)C(90)                0.09
1379   M     M(90)L(4)Y(4)            0.09
1287   E     S(4)E(90)R(4)            0.10
1356   F     H(4)F(90)V(4)            0.10
1424   V     I(4)V(90)A(4)            0.10
1489   L     V(4)L(90)I(4)            0.10
1252   Y     H(47)Y(52)               0.11
1368   G     G(52)R(47)               0.11
1462   K     R(4)K(85)Q(9)            0.11
1238   R     R(90)E(4)Q(4)            0.12
1384   R     R(47)K(52)               0.12
1369   M     L(57)M(42)               0.13
1417   C     L(47)C(47)I(4)           0.13
1440   V     L(47)V(52)               0.14
1281   I     I(80)L(19)               0.15
1285   W     I(4)W(47)L(42)V(4)       0.16
1323   L     M(4)L(47)F(42)W(4)       0.16
1452   N     G(4)N(47)E(42)T(4)       0.16
1456   N     D(4)N(47)L(42)E(4)       0.16
1240   G     E(52)G(42)I(4)           0.17
1340   A     G(4)A(42)L(52)           0.17
1461   R     R(80)K(19)               0.17
1334   S     V(4)S(80)R(4)L(4)N(4)    0.18
1416   L     I(42)L(52)S(4)           0.18
1429   N     D(19)N(80)               0.18
1432   R     A(4)R(85)M(4)S(4)        0.19
1284   N     N(80)S(19)               0.20
1353   F     I(9)F(85)L(4)            0.20
1367   K     R(47)K(52)               0.20
1487   L     L(90)V(9)                0.20
1235   E     E(80)A(4)Q(14)           0.21
1237   K     K(80)D(14)A(4)           0.21
1414   E     E(52)R(38)K(9)           0.22
1364   P     P(80)S(4)Y(14)           0.23
1370   P     D(4)P(47)Q(42)K(4)       0.23
1457   C     V(4)C(42)D(42)R(4)F(4)   0.23
1375   I     L(47)I(52)               0.24
1459   G     R(9)G(47)T(38)S(4)       0.24
1277   E     A(4)E(80)D(9)Y(4)        0.25

Table 15. Residues forming surface “patch” in 1ki1B.

res    type  disruptive mutations
1244   E     (FWH)(YVCARG)(T)(SNKLPI)
1371   L     (YR)(TH)(SKECG)(FQWD)
1383   T     (KR)(FQMWH)(NELPI)(D)
1385   Y     (K)(QM)(NEVLAPIR)(D)
1386   P     (YR)(TH)(SKECG)(FQWD)
1387   L     (YR)(TH)(SKECG)(FQWD)
1485   D     (R)(FWH)(KYVCAG)(TQM)
1381   R     (TD)(E)(SVCLAPIG)(YM)
1421   N     (Y)(FWH)(T)(VCARG)
1428   E     (H)(FYWR)(CG)(TKVA)
1333   C     (KER)(FQMWHD)(NYLPI)(SVA)
1380   Q     (FYWH)(TVA)(SCDRG)(ELPI)
1437   Q     (Y)(FWH)(T)(VCAG)
1242   I     (YR)(TH)(SKECG)(FQWD)
1248   T     (KR)(FQMWH)(NELPI)(D)
1420   V     (R)(K)(Y)(E)
1239   Q     (Y)(FW)(T)(H)
1336   Q     (Y)(H)(T)(FW)
1396   T     (KR)(FQMWH)(NELPI)(D)
1366   C     (KER)(QHD)(FYMW)(N)
1379   M     (Y)(T)(HR)(CG)
1287   E     (FW)(H)(Y)(VCAG)
1356   F     (E)(K)(Q)(D)
1424   V     (YR)(KE)(H)(QD)
1489   L     (YR)(H)(T)(KE)
1252   Y     (K)(QM)(E)(NVLAPI)
1368   G     (E)(D)(FKMW)(YQLPHIR)
1462   K     (Y)(T)(FW)(SVCAG)
1238   R     (T)(Y)(VCAG)(FWD)
1384   R     (T)(YD)(SVCAG)(FELWPI)
1369   M     (Y)(TH)(R)(SCG)
1417   C     (R)(KE)(H)(Y)
1440   V     (YR)(KE)(H)(QD)
1281   I     (YR)(TH)(SKECG)(FQWD)
1285   W     (KE)(TR)(QD)(SCG)
1323   L     (R)(TY)(KE)(SCHG)
1452   N     (FYWH)(R)(TEVA)(M)
1456   N     (Y)(H)(FW)(TR)
1240   G     (R)(H)(K)(FW)
1340   A     (R)(Y)(KE)(H)
1461   R     (T)(YD)(SVCAG)(FELWPI)
1334   S     (R)(K)(YH)(FW)
1416   L     (R)(Y)(H)(K)
1429   N     (Y)(FWH)(TR)(VCAG)
1432   R     (YD)(T)(E)(CG)
1284   N     (Y)(FWH)(R)(TE)
1353   F     (KE)(T)(R)(QD)
1367   K     (Y)(T)(FW)(SVCAG)
1487   L     (YR)(H)(TKE)(SQCDG)
1235   E     (H)(FW)(Y)(R)
1237   K     (Y)(FW)(T)(CHG)
1414   E     (FW)(Y)(VCAHG)(T)
1364   P     (R)(K)(Y)(H)
1370   P     (Y)(THR)(CG)(SFW)
1457   C     (KE)(R)(QD)(M)
1375   I     (YR)(TH)(SKECG)(FQWD)
1459   G     (KE)(R)(FMW)(QHD)
1277   E     (HR)(FW)(YVCAG)(K)

Table 16. Disruptive mutations for the surface patch in 1ki1B.

4 NOTES ON USING TRACE RESULTS

4.1 Coverage
Trace results are commonly expressed in terms of coverage: the residue is important if its “coverage” is small, that is, if it belongs to some small top percentage of residues [100% is all of the residues in a chain], according to trace. The ET results are presented in the form of a table, usually limited to the top 25% of residues (or to some nearby percentage), sorted by the strength of the presumed evolutionary pressure. (I.e., the smaller the coverage, the stronger the pressure on the residue.) Starting from the top of that list, mutating a couple of residues should affect the protein somehow, with the exact effects to be determined experimentally.

4.2 Known substitutions
One of the table columns is “substitutions”: other amino acid types seen at the same position in the alignment. These amino acid types may be interchangeable at that position in the protein, so if one wants to affect the protein by a point mutation, they should be avoided. For example, if the substitutions are “RVK” and the original protein has an R at that position, it is advisable to try anything but R, V or K. Conversely, when looking for substitutions which will not affect the protein, one may try replacing R with K or (perhaps more surprisingly) with V. The percentage of times the substitution appears in the alignment is given in the immediately following bracket. No percentage is given in the cases when it is smaller than 1%. This is meant to be a rough guide: due to rounding errors these percentages often do not add up to 100%.

4.3 Surface
To detect candidates for novel functional interfaces, first we look for residues that are solvent accessible (according to the DSSP program) by at least 10 Å², which is roughly the area needed for one water molecule to come in contact with the residue. Furthermore, we require that these residues form a “cluster” of residues which each have a neighbor within 5 Å of any of their heavy atoms.

Note, however, that, if our picture of protein evolution is correct, the neighboring residues which are not surface accessible might be equally important in maintaining the interaction specificity; they should not be automatically dropped from consideration when choosing the set for mutagenesis. (Especially if they form a cluster with the surface residues.)

4.4 Number of contacts
Another column worth noting is denoted “noc/bb”; it tells the number of contacts heavy atoms of the residue in question make across the interface, as well as how many of them are realized through the backbone atoms (if all or most contacts are through the backbone, mutation presumably won't have strong impact). Two heavy atoms are considered to be “in contact” if their centers are closer than 5 Å.
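The noc/bb and dist columns can be approximated from coordinates along the lines of Section 4.4. The sketch below assumes that a contact is a pair of heavy atoms (one from the residue, one from the partner) whose centers are closer than 5 Å, that N, CA, C and O are the backbone atoms, and that dist is the smallest heavy-atom distance. These are reading assumptions rather than report maker's documented implementation, and the function name is hypothetical.

```python
from math import dist

BACKBONE = {"N", "CA", "C", "O"}  # assumed backbone atom names (PDB convention)


def interface_contacts(residue_atoms, partner_atoms, cutoff=5.0):
    """Return (noc, bb, closest) for one residue against a ligand or partner chain.

    residue_atoms: [(atom_name, (x, y, z)), ...] heavy atoms of the residue
    partner_atoms: [(x, y, z), ...] heavy atoms on the other side of the interface
    """
    noc = bb = 0
    closest = float("inf")
    for name, xyz in residue_atoms:
        for other in partner_atoms:
            d = dist(xyz, other)
            closest = min(closest, d)
            if d < cutoff:          # "in contact": centers closer than the cutoff
                noc += 1
                if name in BACKBONE:
                    bb += 1         # contact made through a backbone atom
    return noc, bb, closest


if __name__ == "__main__":
    residue = [("CA", (0.0, 0.0, 0.0)), ("CB", (1.5, 0.0, 0.0))]
    partner = [(4.0, 0.0, 0.0), (9.0, 0.0, 0.0)]
    print(interface_contacts(residue, partner))  # (2, 1, 2.5)
```

When bb equals noc, as for 15 G in Table 2 (15/15), every contact goes through the backbone, which by the reasoning of Section 4.4 suggests a side-chain mutation there may have limited effect on the interaction.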

4.5 Annotation
If residue annotation is available (either from the PDB file or from other sources), another column, with the header “annotation”, appears. Annotations carried over from the PDB are the following: site (indicating the existence of a related site record in the PDB), S-S (disulfide bond forming residue), hb (hydrogen bond forming residue), jb (james bond forming residue), and sb (salt bridge forming residue).

4.6 Mutation suggestions
Mutation suggestions are completely heuristic and based on complementarity with the substitutions found in the alignment. Note that they are meant to be disruptive to the interaction of the protein with its ligand. The attempt is made to complement the following properties: small [AVGSTC], medium [LPNQDEMIK], large [WFYHR], hydrophobic [LPVAMWFI], polar [GTCY], positively [KHR] or negatively [DE] charged, aromatic [WFYH], long aliphatic chain [EKRQM], OH-group possession [SDETY], and NH2-group possession [NQRK]. The suggestions are listed according to how different they appear to be from the original amino acid, and they are grouped in round brackets if they appear equally disruptive. From left to right, each bracketed group of amino acid types resembles the original more strongly (i.e., is presumably less disruptive). These suggestions are tentative: they might prove disruptive to the fold rather than to the interaction. Many researchers will, however, choose the straightforward alanine mutations, especially in the beginning stages of their investigation.

5 APPENDIX

5.1 File formats
Files with the extension “ranks_sorted” are the actual trace results. The fields in the table in this file are:
• alignment# number of the position in the alignment
• residue# residue number in the PDB file
• type amino acid type
• rank rank of the position according to an older version of ET
• variability has two subfields:
1. number of different amino acids appearing in this column of the alignment
2. their type
• rho ET score: the smaller this value, the lower the variability of this position across the branches of the tree (and, presumably, the greater its importance for the protein)
• cvg coverage: percentage of the residues on the structure which have this rho or smaller
• gaps percentage of gaps in this column

5.2 Color schemes used
The following color scheme is used in figures with residues colored by cluster size: black is a single-residue cluster; clusters composed of more than one residue are colored according to this hierarchy (ordered by descending size): red, blue, yellow, green, purple, azure, turquoise, brown, coral, magenta, LightSalmon, SkyBlue, violet, gold, bisque, LightSlateBlue, orchid, RosyBrown, MediumAquamarine, DarkOliveGreen, CornflowerBlue, grey55, burlywood, LimeGreen, tan, DarkOrange, DeepPink, maroon, BlanchedAlmond.

The colors used to distinguish the residues by the estimated evolutionary pressure they experience can be seen in Fig. 15.

Fig. 15. Coloring scheme used to color residues by their relative importance. [Color bar with axes COVERAGE (100%, 50%, 30%, 5%) and RELATIVE IMPORTANCE.]

5.3 Credits

5.3.1 Alistat
alistat reads a multiple sequence alignment from a file and shows a number of simple statistics about it. These statistics include the format, the number of sequences, the total number of residues, the average and range of the sequence lengths, and the alignment length (i.e., including gap characters). Also shown are some percent identities. A percent pairwise alignment identity is defined as (idents / MIN(len1, len2)), where idents is the number of exact identities and len1, len2 are the unaligned lengths of the two sequences. The “average percent identity”, “most related pair”, and “most unrelated pair” of the alignment are the average, maximum, and minimum of all N(N-1)/2 pairs, respectively. The “most distant seq” is calculated by finding the maximum pairwise identity (best relative) for each of the N sequences, then finding the minimum of these N numbers (hence, the most outlying sequence). alistat is copyrighted by HHMI/Washington University School of Medicine, 1992-2001, and freely distributed under the GNU General Public License.

5.3.2 CE
To map ligand binding sites from different source structures, report maker uses the CE program: http://cl.sdsc.edu/. Shindyalov IN, Bourne PE (1998). “Protein structure alignment by incremental combinatorial extension (CE) of the optimal path.” Protein Engineering 11(9): 739-747.

5.3.3 DSSP
In this work a residue is considered solvent accessible if the DSSP program finds it exposed to water by at least 10 Å², which is roughly the area needed for one water molecule to come into contact with the residue. DSSP is copyrighted by W. Kabsch, C. Sander and MPI-MF, 1983, 1985, 1988, 1994, 1995, CMBI version by [email protected] November 18, 2002, http://www.cmbi.kun.nl/gv/dssp/descrip.html.

5.3.4 HSSP
Whenever available, report maker uses the HSSP alignment as a starting point for the analysis (sequences shorter than 75% of the query are, however, taken out); R. Schneider, A. de Daruvar, and C. Sander. “The HSSP database of protein structure-sequence alignments.” Nucleic Acids Res., 25:226–230, 1997.
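Section 4.6 describes the mutation-suggestion heuristic only in words. The sketch below is a hypothetical reconstruction, not report maker's code. It assumes that "how different" a candidate is equals the number of property classes (taken from the list in Section 4.6) that the candidate and the original residue do not share, and that residue types already observed in the alignment column are excluded. The names suggest_mutations and format_groups are illustrative only.

```python
from itertools import groupby

# Property classes as listed in Section 4.6.
PROPERTIES = {
    "small": set("AVGSTC"),
    "medium": set("LPNQDEMIK"),
    "large": set("WFYHR"),
    "hydrophobic": set("LPVAMWFI"),
    "polar": set("GTCY"),
    "positive": set("KHR"),
    "negative": set("DE"),
    "aromatic": set("WFYH"),
    "long_aliphatic": set("EKRQM"),
    "oh_group": set("SDETY"),
    "nh2_group": set("NQRK"),
}
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def profile(aa: str) -> frozenset:
    """Return the names of the property classes that contain amino acid `aa`."""
    return frozenset(name for name, members in PROPERTIES.items() if aa in members)


def suggest_mutations(original: str, observed: str) -> list:
    """Hypothetical ranking of substitutions, most disruptive group first.

    Assumption: the distance of a candidate from `original` is the number of
    property classes that one of them belongs to and the other does not.
    Residue types already present in the alignment column are skipped.
    """
    reference = profile(original)
    seen = set(observed) | {original}
    scored = sorted(
        ((len(profile(aa) ^ reference), aa) for aa in AMINO_ACIDS if aa not in seen),
        key=lambda pair: (-pair[0], pair[1]),  # larger distance first, then alphabetical
    )
    # Candidates with equal distance form one bracketed group.
    return [[aa for _, aa in group] for _, group in groupby(scored, key=lambda pair: pair[0])]


def format_groups(groups: list) -> str:
    """Render groups in the bracketed style of the report, e.g. '(EKRY)(DHQ)'."""
    return "".join("(" + "".join(group) + ")" for group in groups)


if __name__ == "__main__":
    # Toy column: original residue A, with G and S also seen in the alignment.
    print(format_groups(suggest_mutations("A", "AGS")))  # (EKRY)(DHQ)(N)(FMTW)(CILP)(V)
```

Under this unweighted scoring, V comes last for an alanine position because it belongs to exactly the same classes (small, hydrophobic); the actual tool may weight the properties differently.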
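The cvg field listed in Section 5.1 can be derived from the rho values alone. A minimal sketch, assuming rho is available as a mapping from residue number to ET score; the scores in the example are made-up toy values, not results for 1ki1. Because the definition counts residues with "this rho or smaller", residues with tied rho receive the same coverage. The 25% default mirrors the coverage used for clustering elsewhere in this report; the function names are hypothetical.

```python
from bisect import bisect_right


def coverage(rho_by_residue: dict) -> dict:
    """Percentage of residues whose rho is less than or equal to each residue's rho."""
    ordered = sorted(rho_by_residue.values())
    n = len(ordered)
    # bisect_right counts every value <= rho, so tied residues share one coverage value.
    return {res: 100.0 * bisect_right(ordered, rho) / n for res, rho in rho_by_residue.items()}


def top_residues(rho_by_residue: dict, max_cvg: float = 25.0) -> list:
    """Residues at or below the coverage cutoff, most important (smallest rho) first."""
    cvg = coverage(rho_by_residue)
    selected = [res for res, c in cvg.items() if c <= max_cvg]
    return sorted(selected, key=lambda res: (rho_by_residue[res], res))


if __name__ == "__main__":
    # Hypothetical toy scores: residue number -> rho.
    rho = {10: 1.0, 11: 2.5, 12: 1.0, 13: 4.0, 14: 7.2, 15: 3.1, 16: 9.0, 17: 1.8}
    print(coverage(rho)[17])  # 37.5
    print(top_residues(rho))  # [10, 12]
```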
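The cluster-coloring rule of Section 5.2 can be written as a small function. This is a sketch under two assumptions the report does not state: clusters of equal size keep their input order, and the palette wraps around if there are more multi-residue clusters than listed colors. The function name and the cluster data in the example are hypothetical.

```python
# Palette for multi-residue clusters, in the order given in Section 5.2.
CLUSTER_COLORS = [
    "red", "blue", "yellow", "green", "purple", "azure", "turquoise", "brown",
    "coral", "magenta", "LightSalmon", "SkyBlue", "violet", "gold", "bisque",
    "LightSlateBlue", "orchid", "RosyBrown", "MediumAquamarine", "DarkOliveGreen",
    "CornflowerBlue", "grey55", "burlywood", "LimeGreen", "tan", "DarkOrange",
    "DeepPink", "maroon", "BlanchedAlmond",
]


def color_clusters(clusters: list) -> dict:
    """Map residue number -> color name for a list of residue clusters."""
    colors = {}
    # Largest clusters get the first colors; sorted() is stable, so ties keep input order.
    multi = sorted((c for c in clusters if len(c) > 1), key=len, reverse=True)
    for rank, cluster in enumerate(multi):
        color = CLUSTER_COLORS[rank % len(CLUSTER_COLORS)]  # wrap-around is an assumption
        for residue in cluster:
            colors[residue] = color
    # Single-residue clusters are always black.
    for cluster in clusters:
        if len(cluster) == 1:
            colors[cluster[0]] = "black"
    return colors


if __name__ == "__main__":
    print(color_clusters([[5], [1, 2], [7, 8, 9], [12], [20, 21]]))
```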
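The alistat identity statistics in Section 5.3.1 follow directly from the definition idents / MIN(len1, len2). A minimal sketch, not alistat's own code: it assumes '-' and '.' are the only gap characters, compares residues case-sensitively, and requires at least two sequences. The toy alignment and function names are hypothetical.

```python
from itertools import combinations

GAP_CHARS = set("-.")  # assumption: gap symbols used in the alignment


def ungapped_length(seq: str) -> int:
    return sum(ch not in GAP_CHARS for ch in seq)


def percent_identity(a: str, b: str) -> float:
    """100 * idents / min(len1, len2), with len1, len2 the unaligned lengths."""
    idents = sum(x == y and x not in GAP_CHARS for x, y in zip(a, b))
    return 100.0 * idents / min(ungapped_length(a), ungapped_length(b))


def alignment_stats(aln: dict) -> dict:
    """Average, maximum and minimum over all N(N-1)/2 pairs, plus the most distant sequence."""
    names = list(aln)
    pairs = {(p, q): percent_identity(aln[p], aln[q]) for p, q in combinations(names, 2)}
    # Best relative of each sequence: its highest identity to any other sequence.
    best_relative = {n: max(v for pair, v in pairs.items() if n in pair) for n in names}
    return {
        "average_identity": sum(pairs.values()) / len(pairs),
        "most_related_pair": max(pairs, key=pairs.get),
        "most_unrelated_pair": min(pairs, key=pairs.get),
        "most_distant_seq": min(best_relative, key=best_relative.get),
    }


if __name__ == "__main__":
    toy = {"s1": "ACDEFGHIK", "s2": "ACDEFGHIR", "s3": "ACD--GHWW"}
    for key, value in alignment_stats(toy).items():
        print(key, value)
```

In the toy alignment, s3 has 5 identities with each of the other sequences over its 7 unaligned residues, so its best relative is lower than that of s1 or s2 and it is reported as the most distant sequence.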
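The length filter applied to HSSP alignments (Section 5.3.4) can be sketched as follows. The report does not say how length is measured; this sketch assumes ungapped length (letters only). The function name and the toy sequences are hypothetical.

```python
def drop_short_sequences(query: str, homologs: dict, min_fraction: float = 0.75) -> dict:
    """Keep homologs whose ungapped length is at least min_fraction of the query's."""

    def ungapped_length(seq: str) -> int:
        # Assumption: gap symbols are non-letters such as '-' or '.'.
        return sum(ch.isalpha() for ch in seq)

    cutoff = min_fraction * ungapped_length(query)
    return {name: seq for name, seq in homologs.items() if ungapped_length(seq) >= cutoff}


if __name__ == "__main__":
    kept = drop_short_sequences("ACDEFGHI", {"h1": "ACDEFG--", "h2": "ACD-----", "h3": "ACDEFGHI"})
    print(sorted(kept))  # ['h1', 'h3']
```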

The HSSP database is available at http://swift.cmbi.kun.nl/swift/hssp/.

5.3.5 LaTex
The text for this report was processed using LaTeX; Leslie Lamport, “LaTeX: A Document Preparation System,” Addison-Wesley, Reading, Mass. (1986).

5.3.6 Muscle
When making alignments “from scratch”, report maker uses the Muscle alignment program: Edgar, Robert C. (2004), “MUSCLE: multiple sequence alignment with high accuracy and high throughput.” Nucleic Acids Research 32(5), 1792-97. http://www.drive5.com/muscle/

5.3.7 Pymol
The figures in this report were produced using Pymol. The scripts can be found in the attachment. Pymol is an open-source application copyrighted by DeLano Scientific LLC (2005). For more information about Pymol see http://pymol.sourceforge.net/. (Note for Windows users: the attached package needs to be unzipped for Pymol to read the scripts and launch the viewer.)

5.4 Note about ET Viewer
Dan Morgan from the Lichtarge lab has developed a visualization tool specifically for viewing trace results. If you are interested, please visit:
http://mammoth.bcm.tmc.edu/traceview/
The viewer is self-unpacking and self-installing. Input files to be used with ETV (extension .etvx) can be found in the attachment to the main report.

5.5 Citing this work
The method used to rank residues and make predictions in this report can be found in Mihalek, I., I. Reš, O. Lichtarge. (2004). “A Family of Evolution-Entropy Hybrid Methods for Ranking of Protein Residues by Importance.” J. Mol. Biol. 336: 1265-82. For the original version of ET see O. Lichtarge, H. Bourne and F. Cohen (1996). “An Evolutionary Trace Method Defines Binding Surfaces Common to Protein Families.” J. Mol. Biol. 257: 342-358.
report maker itself is described in Mihalek I., I. Reš and O. Lichtarge (2006). “Evolutionary Trace Report Maker: a new type of service for comparative analysis of proteins.” Bioinformatics 22:1656-7.

5.6 About report maker
report maker was written in 2006 by Ivana Mihalek. The 1D ranking visualization program was written by Ivica Reš. report maker is copyrighted by Lichtarge Lab, Baylor College of Medicine, Houston.

5.7 Attachments
The following files should accompany this report:
• 1ki1A.complex.pdb - coordinates of 1ki1A with all of its interacting partners
• 1ki1A.etvx - ET viewer input file for 1ki1A
• 1ki1A.cluster_report.summary - Cluster report summary for 1ki1A
• 1ki1A.ranks - Ranks file in sequence order for 1ki1A
• 1ki1A.clusters - Cluster descriptions for 1ki1A
• 1ki1A.msf - the multiple sequence alignment used for the chain 1ki1A
• 1ki1A.descr - description of sequences used in 1ki1A msf
• 1ki1A.ranks_sorted - full listing of residues and their ranking for 1ki1A
• 1ki1A.1ki1SO44001.if.pml - Pymol script for Figure 4
• 1ki1A.cbcvg - used by other 1ki1A-related pymol scripts
• 1ki1A.1ki1D.if.pml - Pymol script for Figure 5
• 1ki1B.complex.pdb - coordinates of 1ki1B with all of its interacting partners
• 1ki1B.etvx - ET viewer input file for 1ki1B
• 1ki1B.cluster_report.summary - Cluster report summary for 1ki1B
• 1ki1B.ranks - Ranks file in sequence order for 1ki1B
• 1ki1B.clusters - Cluster descriptions for 1ki1B
• 1ki1B.msf - the multiple sequence alignment used for the chain 1ki1B
• 1ki1B.descr - description of sequences used in 1ki1B msf
• 1ki1B.ranks_sorted - full listing of residues and their ranking for 1ki1B
• 1ki1B.1ki1A.if.pml - Pymol script for Figure 11
• 1ki1B.cbcvg - used by other 1ki1B-related pymol scripts
• 1ki1B.1ki1C.if.pml - Pymol script for Figure 12
• 1ki1B.1ki1D.if.pml - Pymol script for Figure 13
