2vj0 Evolutionary trace report by report maker, June 1, 2010

CONTENTS
1 Introduction
2 Chain 2vj0A
  2.1 P18484 overview
  2.2 Multiple sequence alignment for 2vj0A
  2.3 Residue ranking in 2vj0A
  2.4 Top ranking residues in 2vj0A and their position on the structure
    2.4.1 Clustering of residues at 25% coverage
    2.4.2 Overlap with known functional surfaces at 25% coverage
    2.4.3 Possible novel functional surfaces at 25% coverage
3 Notes on using trace results
  3.1 Coverage
  3.2 Known substitutions
  3.3 Surface
  3.4 Number of contacts
  3.5 Annotation
  3.6 Mutation suggestions
4 Appendix
  4.1 File formats
  4.2 Color schemes used
  4.3 Credits
    4.3.1 Alistat
    4.3.2 CE
    4.3.3 DSSP
    4.3.4 HSSP
    4.3.5 LaTex
    4.3.6 Muscle
    4.3.7 Pymol
  4.4 Note about ET Viewer
  4.5 Citing this work
  4.6 About report maker
  4.7 Attachments

1 INTRODUCTION
From the original Data Bank entry (PDB id 2vj0):
Title: Crystal structure of the alpha-adaptin appendage domain, from the ap2 adaptor complex, in complex with an fxdnf peptide from amphiphysin1 and a wvxf peptide from synaptojanin p170
Compound:
Mol id: 1; molecule: ap-2 complex subunit alpha-2; synonym: adapter-related protein complex 2 alpha-2 subunit, alpha-adaptin c, adaptor protein complex ap-2 alpha-2 subunit, clathrin assembly protein complex 2 alpha-c large chain, 100 kda coated vesicle protein c, plasma membrane adaptor ha2/ap2 adaptin alpha c subunit; chain: a; fragment: appendage domain, residues 693-938; engineered: yes;
mol id: 2; molecule: synaptojanin-1; synonym: synaptic inositol-1,4,5-trisphosphate 5-phosphatase 1, synaptojanin-1 p170; chain: p; fragment: peptide containing wvxf motif, residues 1479-1490;
mol id: 3; molecule: amphiphysin; synonym: amphiphysin1; chain: q; fragment: peptide containing fxdnf motif, residues 324-330
Organism, scientific name: Rattus Norvegicus;

2vj0 contains a single unique chain 2vj0A (246 residues long). Chains 2vj0P and 2vj0Q are too short to permit statistically significant analysis, and were treated as peptide ligands.

2 CHAIN 2VJ0A
2.1 P18484 overview
From SwissProt, id P18484, 99% identical to 2vj0A:
Description: Adapter-related protein complex 2 alpha 2 subunit (Alpha-adaptin C) (Adaptor protein complex AP-2 alpha-2 subunit) (Clathrin assembly protein complex 2 alpha-C large chain) (100 kDa coated vesicle protein C) (Plasma membrane adaptor HA2/AP2 adaptin alpha C subunit).
Organism, scientific name: Rattus norvegicus (Rat).
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Sciurognathi; Muroidea; Muridae; Murinae; Rattus.

Function: Adaptins are components of the adaptor complexes which link clathrin to receptors in coated vesicles. Clathrin-associated protein complexes are believed to interact with the cytoplasmic tails of membrane proteins, leading to their selection and concentration. Alpha adaptin is a subunit of the plasma membrane adaptor. Binds polyphosphoinositide-containing lipids.
Subunit: Adaptor protein complex 2 (AP-2) is an heterotetramer composed of two large adaptins (alpha1A/AP2A1 or alpha1B/AP2A1 or alpha2/AP2A2 and beta1/AP2B1), a medium adaptin (mu2/AP2M1) and a small adaptin (sigma2long/AP2S1 or sigma2short/AP2S1). Binds EPN1, EPS15, AMPH, SNAP91 and BIN1 (By similarity).
Interaction:
Subcellular location: Component of the coat surrounding the cytoplasmic face of coated vesicles in the plasma membrane.
Tissue specificity: Widely expressed.
Similarity: Belongs to the adaptor complexes large subunit family.
About: This Swiss-Prot entry is copyright. It is produced through a collaboration between the Swiss Institute of Bioinformatics and the EMBL outstation - the European Bioinformatics Institute. There are no restrictions on its use as long as its content is in no way modified and this statement is not removed.

Fig. 1. Residues 693-815 in 2vj0A colored by their relative importance. (See Appendix, Fig. 12, for the coloring scheme.)

Fig. 2. Residues 816-938 in 2vj0A colored by their relative importance. (See Appendix, Fig. 12, for the coloring scheme.)

Lichtarge lab 2006

2.2 Multiple sequence alignment for 2vj0A
For the chain 2vj0A, the alignment 2vj0A.msf (attached) with 47 sequences was used. The alignment was downloaded from the HSSP database, and fragments shorter than 75% of the query as well as duplicate sequences were removed. It can be found in the attachment to this report, under the name of 2vj0A.msf. Its statistics, from the alistat program, are the following:

Format:                    MSF
Number of sequences:       47
Total number of residues:  11295
Smallest:                  219
Largest:                   246
Average length:            240.3
Alignment length:          246
Average identity:          42%
Most related pair:         98%
Most unrelated pair:       22%
Most distant seq:          37%

Furthermore, 5% of residues show as conserved in this alignment. The alignment consists of 27% eukaryotic (17% vertebrata, 8% fungi) sequences. (Descriptions of some sequences were not readily available.) The file containing the sequence descriptions can be found in the attachment, under the name 2vj0A.descr.

2.3 Residue ranking in 2vj0A
The 2vj0A sequence is shown in Figs. 1–2, with each residue colored according to its estimated importance. The full listing of residues in 2vj0A can be found in the file called 2vj0A.ranks sorted in the attachment.

2.4 Top ranking residues in 2vj0A and their position on the structure
In the following we consider residues ranking among top 25% of residues in the protein. Figure 3 shows residues in 2vj0A colored by their importance: bright red and yellow indicate more conserved/important residues (see Appendix for the coloring scheme). A Pymol script for producing this figure can be found in the attachment.

Fig. 3. Residues in 2vj0A, colored by their relative importance. Clockwise: front, back, top and bottom views.
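The 25% cutoff used throughout this section can be illustrated with a short Python sketch. It is not part of report maker. The residue list is a hypothetical in-memory sample: the first five coverage values are copied from tables in this report, the last entry is invented to show the filter rejecting a low-ranking position, and treating the cutoff as inclusive (cvg <= 0.25) is an assumption.

```python
# Hypothetical sample of (residue number, amino acid type, cvg) triples.
# The first five values are copied from tables in this report; the last
# entry is invented (no such residue exists in 2vj0A) to exercise the filter.
RANKED = [
    (723, "Q", 0.06),
    (729, "E", 0.09),
    (735, "G", 0.15),
    (728, "S", 0.21),
    (771, "K", 0.23),
    (999, "A", 0.60),  # invented entry, for illustration only
]


def top_ranked(residues, cutoff=0.25):
    """Return residues with coverage at or below the cutoff, smallest cvg first.

    Assumption: the cutoff is inclusive; the report only says "top 25%".
    """
    selected = [r for r in residues if r[2] <= cutoff]
    return sorted(selected, key=lambda r: r[2])


if __name__ == "__main__":
    for number, aa, cvg in top_ranked(RANKED):
        print(f"{aa}{number}\tcvg={cvg:.2f}")
```

Lowering the cutoff argument (for example to 0.10) narrows the selection to the residues under the strongest presumed evolutionary pressure.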

2.4.1 Clustering of residues at 25% coverage.
Fig. 4 shows the top 25% of all residues, this time colored according to clusters they belong to. The clusters in Fig. 4 are composed of the residues listed in Table 1.

Fig. 4. Residues in 2vj0A, colored according to the cluster they belong to: red, followed by blue and yellow are the largest clusters (see Appendix for the coloring scheme). Clockwise: front, back, top and bottom views. The corresponding Pymol script is attached.

Table 1.
cluster color  size  member residues
red            53    708,714,715,716,717,718,719
                     722,723,724,725,740,743,744
                     782,784,785,795,799,818,819
                     823,824,825,831,836,837,839
                     840,843,849,851,871,872,880
                     881,882,883,886,888,891,892
                     901,902,903,905,906,907,908
                     909,916,918,920
blue           4     727,728,729,735
yellow         3     752,805,812

Table 1. Clusters of top ranking residues in 2vj0A.

2.4.2 Overlap with known functional surfaces at 25% coverage.
The name of the ligand is composed of the source PDB identifier and the heteroatom name used in that file.

Dithiane diol binding site. Table 2 lists the top 25% of residues at the interface with 2vj0ADTD1940 (dithiane diol). The following table (Table 3) suggests possible disruptive replacements for these residues (see Section 3.6).

Table 2.
res  type  subst's (%)                   cvg   noc/bb  dist (Å)  antn
729  E     R(4) E(95)                    0.09  14/10   3.44      site
735  G     G(85) A(14)                   0.15  10/10   3.67      site
731  R     Q(6) R(82) K(2) T(2) H(6)     0.16  7/5     3.96

Table 2. The top 25% of residues in 2vj0A at the interface with dithiane diol. (Field names: res: residue number in the PDB entry; type: amino acid type; substs: substitutions seen in the alignment, with the percentage of each type in the bracket; noc/bb: number of contacts with the ligand, with the number of contacts realized through backbone atoms given after the slash; dist: distance of closest approach to the ligand.)

Table 3.
res  type  disruptive mutations
729  E     (FW)(YVCAHG)(T)(SLPIR)
735  G     (KER)(QHD)(FYMW)(N)
731  R     (TD)(Y)(E)(VA)

Table 3. List of disruptive mutations for the top 25% of residues in 2vj0A, that are at the interface with dithiane diol.

Figure 5 shows residues in 2vj0A colored by their importance, at the interface with 2vj0ADTD1940.

Sulfate ion binding site. Table 4 lists the top 25% of residues at the interface with 2vj0ASO41943 (sulfate ion). The following table (Table 5) suggests possible disruptive replacements for these residues (see Section 3.6).

Table 4.
res  type  subst's (%)                   cvg   noc/bb  dist (Å)
728  S     S(76) T(6) L(6) A(10)         0.21  1/1     4.85

Table 4. The top 25% of residues in 2vj0A at the interface with sulfate ion. (Field names: res: residue number in the PDB entry; type: amino acid type; substs: substitutions seen in the alignment, with the percentage of each type in the bracket; noc/bb: number of contacts with the ligand, with the number of contacts realized through backbone atoms given after the slash; dist: distance of closest approach to the ligand.)

Fig. 5. Residues in 2vj0A, at the interface with dithiane diol, colored by their relative importance. The ligand (dithiane diol) is colored green. Atoms further than 30 Å away from the geometric center of the ligand, as well as on the line of sight to the ligand were removed. (See Appendix for the coloring scheme for the protein chain 2vj0A.)

Table 5.
res  type  disruptive mutations
728  S     (R)(K)(H)(Q)

Table 5. List of disruptive mutations for the top 25% of residues in 2vj0A, that are at the interface with sulfate ion.

Figure 6 shows residues in 2vj0A colored by their importance, at the interface with 2vj0ASO41943.

Fig. 6. Residues in 2vj0A, at the interface with sulfate ion, colored by their relative importance. The ligand (sulfate ion) is colored green. Atoms further than 30 Å away from the geometric center of the ligand, as well as on the line of sight to the ligand were removed. (See Appendix for the coloring scheme for the protein chain 2vj0A.)

Interface with the peptide 2vj0Q. Table 6 lists the top 25% of residues at the interface with 2vj0Q. The following table (Table 7) suggests possible disruptive replacements for these residues (see Section 3.6).

Table 6.
res  type  subst's (%)                   cvg   noc/bb  dist (Å)
836  F     F(100)                        0.06  7/0     3.75
840  W     W(100)                        0.06  24/0    3.54
851  Q     Q(100)                        0.06  1/0     4.98
881  D     D(100)                        0.06  15/4    3.66
888  V     V(100)                        0.06  3/0     4.17
905  R     R(100)                        0.06  58/0    2.96
907  E     E(100)                        0.06  38/6    2.78
908  P     P(91) T(8)                    0.07  4/4     3.81
909  N     N(91) S(2) D(6)               0.08  17/5    3.66
837  F     F(95) V(4)                    0.11  13/0    3.79
916  R     R(95) .(2) N(2)               0.13  69/12   3.00
880  V     I(12) V(80) L(6)              0.19  1/0     4.95

Table 6. The top 25% of residues in 2vj0A at the interface with 2vj0Q. (Field names: res: residue number in the PDB entry; type: amino acid type; substs: substitutions seen in the alignment, with the percentage of each type in the bracket; noc/bb: number of contacts with the ligand, with the number of contacts realized through backbone atoms given after the slash; dist: distance of closest approach to the ligand.)

Table 7.
res  type  disruptive mutations
836  F     (KE)(TQD)(SNCRG)(M)
840  W     (KE)(TQD)(SNCRG)(M)
851  Q     (Y)(FTWH)(SVCAG)(D)
881  D     (R)(FWH)(KYVCAG)(TQM)
888  V     (KYER)(QHD)(N)(FTMW)
905  R     (TD)(SYEVCLAPIG)(FMW)(N)
907  E     (FWH)(YVCARG)(T)(SNKLPI)
908  P     (R)(YH)(K)(E)
909  N     (Y)(FWH)(R)(T)
837  F     (KE)(QD)(TR)(N)
916  R     (T)(D)(YVCAG)(S)
880  V     (YR)(KE)(H)(QD)

Table 7. List of disruptive mutations for the top 25% of residues in 2vj0A, that are at the interface with 2vj0Q.

Figure 7 shows residues in 2vj0A colored by their importance, at the interface with 2vj0Q.

Fig. 7. Residues in 2vj0A, at the interface with 2vj0Q, colored by their relative importance. 2vj0Q is shown in backbone representation. (See Appendix for the coloring scheme for the protein chain 2vj0A.)

Interface with the peptide 2vj0P. Table 8 lists the top 25% of residues at the interface with 2vj0P. The following table (Table 9) suggests possible disruptive replacements for these residues (see Section 3.6).

Table 8.
res  type  subst's (%)                   cvg   noc/bb  dist (Å)
723  Q     Q(100)                        0.06  13/0    3.73
725  G     G(100)                        0.06  19/19   3.50
743  N     N(100)                        0.06  14/14   3.26
782  Q     Q(100)                        0.06  71/11   2.81
714  G     G(95) A(4)                    0.10  23/23   3.54
744  K     K(91) R(6) S(2)               0.10  7/4     3.89
727  K     K(61) R(38)                   0.11  27/2    2.85
715  V     V(89) I(10)                   0.17  11/7    3.91
740  F     Y(53) F(46)                   0.18  28/0    3.49

Table 8. The top 25% of residues in 2vj0A at the interface with 2vj0P. (Field names: res: residue number in the PDB entry; type: amino acid type; substs: substitutions seen in the alignment, with the percentage of each type in the bracket; noc/bb: number of contacts with the ligand, with the number of contacts realized through backbone atoms given after the slash; dist: distance of closest approach to the ligand.)

Table 9.
res  type  disruptive mutations
723  Q     (Y)(FTWH)(SVCAG)(D)
725  G     (KER)(FQMWHD)(NYLPI)(SVA)
743  N     (Y)(FTWH)(SEVCARG)(MD)
782  Q     (Y)(FTWH)(SVCAG)(D)
714  G     (KER)(QHD)(FYMW)(N)
744  K     (Y)(FW)(T)(VCAG)
727  K     (Y)(T)(FW)(SVCAG)
715  V     (YR)(KE)(H)(QD)
740  F     (K)(E)(Q)(D)

Table 9. List of disruptive mutations for the top 25% of residues in 2vj0A, that are at the interface with 2vj0P.

Figure 8 shows residues in 2vj0A colored by their importance, at the interface with 2vj0P.

Fig. 8. Residues in 2vj0A, at the interface with 2vj0P, colored by their relative importance. 2vj0P is shown in backbone representation. (See Appendix for the coloring scheme for the protein chain 2vj0A.)

Chloride ion binding site. Table 10 lists the top 25% of residues at the interface with 2vj0ACL1944 (chloride ion). The following table (Table 11) suggests possible disruptive replacements for these residues (see Section 3.6).

Table 10.
res  type  subst's (%)                   cvg   noc/bb  dist (Å)  antn
795  F     F(80) Y(10) S(6) I(2)         0.14  4/2     3.67      site

Table 10. The top 25% of residues in 2vj0A at the interface with chloride ion. (Field names: res: residue number in the PDB entry; type: amino acid type; substs: substitutions seen in the alignment, with the percentage of each type in the bracket; noc/bb: number of contacts with the ligand, with the number of contacts realized through backbone atoms given after the slash; dist: distance of closest approach to the ligand.)

Table 11.
res  type  disruptive mutations
795  F     (K)(E)(Q)(R)

Table 11. List of disruptive mutations for the top 25% of residues in 2vj0A, that are at the interface with chloride ion.

Fig. 9. Residues in 2vj0A, at the interface with chloride ion, colored by their relative importance. The ligand (chloride ion) is colored green. Atoms further than 30 Å away from the geometric center of the ligand, as well as on the line of sight to the ligand were removed. (See Appendix for the coloring scheme for the protein chain 2vj0A.)
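The noc/bb column of the interface tables (Tables 2, 4, 6, 8 and 10) can be screened for residues that touch the ligand mainly through backbone atoms, which Section 3.4 suggests are less promising mutation targets. The following Python sketch is an illustration only: the function names are hypothetical, the sample values are copied from Tables 2, 6 and 8, and the 0.5 threshold for "most contacts" is an assumption, not a value used by report maker.

```python
def parse_noc_bb(field):
    """Split a 'noc/bb' entry such as '14/10' into (contacts, backbone_contacts)."""
    noc, bb = field.split("/")
    return int(noc), int(bb)


def backbone_fraction(field):
    """Fraction of the residue's ligand contacts made through backbone atoms."""
    noc, bb = parse_noc_bb(field)
    return bb / noc if noc else 0.0


def mostly_backbone(contacts, threshold=0.5):
    """Names of residues whose backbone fraction exceeds the threshold.

    Assumption: 'most contacts' is read as more than half; adjust as needed.
    """
    return sorted(
        name for name, field in contacts.items()
        if backbone_fraction(field) > threshold
    )


# noc/bb values copied from Tables 2, 6 and 8 of this report.
CONTACTS = {
    "E729": "14/10",
    "G725": "19/19",
    "R905": "58/0",
    "Q782": "71/11",
    "N743": "14/14",
}

if __name__ == "__main__":
    print(mostly_backbone(CONTACTS))
```

Residues such as R905 (58/0) or Q782 (71/11), whose contacts come mostly from side-chain atoms, are not flagged by this screen.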

Figure 9 shows residues in 2vj0A colored by their importance, at the interface with 2vj0ACL1944.

2.4.3 Possible novel functional surfaces at 25% coverage.
One group of residues is conserved on the 2vj0A surface, away from (or substantially larger than) other functional sites and interfaces recognizable in PDB entry 2vj0. It is shown in Fig. 10. The right panel shows (in blue) the rest of the larger cluster this surface belongs to. The residues belonging to this surface "patch" are listed in Table 12, while Table 13 suggests possible disruptive replacements for these residues (see Section 3.6).

Fig. 10. A possible active surface on the chain 2vj0A. The larger cluster it belongs to is shown in blue.

Table 12.
res  type  substitutions(%)                          cvg   antn
723  Q     Q(100)                                    0.06
725  G     G(100)                                    0.06
743  N     N(100)                                    0.06
782  Q     Q(100)                                    0.06
784  Q     Q(100)                                    0.06
819  P     P(100)                                    0.06
716  L     L(97)I(2)                                 0.09
718  E     E(95)Q(4)                                 0.09
729  E     R(4)E(95)                                 0.09  site
714  G     G(95)A(4)                                 0.10
744  K     K(91)R(6)S(2)                             0.10
727  K     K(61)R(38)                                0.11
824  K     K(82)R(17)                                0.13
719  N     D(61)S(2)N(36)                            0.14
708  F     L(57)F(42)                                0.15
735  G     G(85)A(14)                                0.15  site
731  R     Q(6)R(82)K(2)T(2)H(6)                     0.16
785  Q     Q(89)T(2)V(4)I(2)E(2)                     0.16
715  V     V(89)I(10)                                0.17
740  F     Y(53)F(46)                                0.18
871  G     G(87)Q(4)R(4)S(4)                         0.20
728  S     S(76)T(6)L(6)A(10)                        0.21
771  K     K(80)L(2)S(4)A(6)T(2)Q(2).(2)             0.23
717  F     Y(38)F(57)W(4)                            0.24
823  N     T(6)H(36)A(2)N(44)S(10)                   0.24

Table 12. Residues forming surface "patch" in 2vj0A.

Fig. 11. Another possible active surface on the chain 2vj0A. The larger cluster it belongs to is shown in blue.

Table 13.
res  type  disruptive mutations
723  Q     (Y)(FTWH)(SVCAG)(D)
725  G     (KER)(FQMWHD)(NYLPI)(SVA)
743  N     (Y)(FTWH)(SEVCARG)(MD)
782  Q     (Y)(FTWH)(SVCAG)(D)
784  Q     (Y)(FTWH)(SVCAG)(D)
819  P     (YR)(TH)(SKECG)(FQWD)
716  L     (YR)(TH)(SKECG)(FQWD)
718  E     (FWH)(Y)(VCAG)(TR)
729  E     (FW)(YVCAHG)(T)(SLPIR)
714  G     (KER)(QHD)(FYMW)(N)
744  K     (Y)(FW)(T)(VCAG)
727  K     (Y)(T)(FW)(SVCAG)
824  K     (Y)(T)(FW)(SVCAG)
719  N     (Y)(FWH)(R)(T)
708  F     (KE)(T)(QDR)(SCG)
735  G     (KER)(QHD)(FYMW)(N)
731  R     (TD)(Y)(E)(VA)
785  Q     (Y)(H)(FW)(T)
715  V     (YR)(KE)(H)(QD)
740  F     (K)(E)(Q)(D)
871  G     (E)(FW)(KHDR)(YM)
728  S     (R)(K)(H)(Q)
771  K     (Y)(FW)(T)(H)
717  F     (K)(E)(Q)(D)
823  N     (Y)(R)(FEWH)(T)

Table 13. Disruptive mutations for the surface patch in 2vj0A.
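The bracketed notation used in the disruptive-mutation tables lends itself to simple parsing. According to Section 3.6, groups are ordered from left to right by decreasing presumed disruptiveness, and amino acids within one bracket are considered equally disruptive. The Python sketch below is a reader's aid, not report maker code; the function names are hypothetical, and the sample strings are copied from the tables in this report.

```python
import re


def parse_suggestions(notation):
    """Turn '(Y)(FTWH)(SVCAG)(D)' into [['Y'], ['F', 'T', 'W', 'H'], ...].

    Groups keep their left-to-right order (most disruptive group first).
    """
    return [list(group) for group in re.findall(r"\(([A-Z]+)\)", notation)]


def most_disruptive(notation):
    """Return the leftmost group, i.e. the presumably most disruptive types."""
    groups = parse_suggestions(notation)
    return groups[0] if groups else []


def parse_substitutions(field):
    """Turn 'R(4)E(95)' into {'R': 4, 'E': 95}; '.' entries are kept as-is."""
    return {aa: int(pct) for aa, pct in re.findall(r"([A-Z.])\((\d+)\)", field)}


def suggested_and_observed(notation, substitutions):
    """Suggested replacements that also occur at this position in the alignment."""
    suggested = {aa for group in parse_suggestions(notation) for aa in group}
    return suggested & set(parse_substitutions(substitutions))


if __name__ == "__main__":
    # Residue 723 (Q) from Table 13.
    print(most_disruptive("(Y)(FTWH)(SVCAG)(D)"))
    # Residue 729 (E): suggestion string from Table 13, substitutions from Table 12.
    print(suggested_and_observed("(FW)(YVCAHG)(T)(SLPIR)", "R(4)E(95)"))
```

The last helper may be useful when following the advice of Section 3.2, which recommends avoiding amino acid types already seen in the alignment when the goal is to perturb the protein.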

Another group of surface residues is shown in Fig. 11. The right panel shows (in blue) the rest of the larger cluster this surface belongs to. The residues belonging to this surface "patch" are listed in Table 14, while Table 15 suggests possible disruptive replacements for these residues (see Section 3.6).

Table 14.
res  type  substitutions(%)                          cvg   antn
836  F     F(100)                                    0.06
840  W     W(100)                                    0.06
851  Q     Q(100)                                    0.06
881  D     D(100)                                    0.06
886  N     N(100)                                    0.06
888  V     V(100)                                    0.06
903  L     L(97)Y(2)                                 0.06  site
905  R     R(100)                                    0.06
907  E     E(100)                                    0.06
901  G     G(91)L(6)N(2)                             0.07
908  P     P(91)T(8)                                 0.07
909  N     N(91)S(2)D(6)                             0.08
837  F     F(95)V(4)                                 0.11
843  L     I(46)L(51)M(2)                            0.11
849  E     E(82)R(10)K(6)                            0.12  site
883  N     N(91)K(4)I(4)                             0.12
916  R     R(95).(2)N(2)                             0.13
882  P     P(91)A(2)S(2)N(2)H(2)                     0.17
892  I     V(48)I(42)T(6)L(2)                        0.17
831  M     L(48)M(44)V(4)F(2)                        0.19
880  V     I(12)V(80)L(6)                            0.19
920  R     R(91).(2)K(2)A(4)                         0.20
918  T     T(91).(2)V(2)S(2)Q(2)                     0.21
839  R     R(91)K(2)W(2)H(2)Q(2)                     0.22  site
891  G     G(55)A(19)S(12)T(10)C(2)                  0.23

Table 14. Residues forming surface "patch" in 2vj0A.

Table 15.
res  type  disruptive mutations
836  F     (KE)(TQD)(SNCRG)(M)
840  W     (KE)(TQD)(SNCRG)(M)
851  Q     (Y)(FTWH)(SVCAG)(D)
881  D     (R)(FWH)(KYVCAG)(TQM)
886  N     (Y)(FTWH)(SEVCARG)(MD)
888  V     (KYER)(QHD)(N)(FTMW)
903  L     (R)(K)(TYEH)(SQCG)
905  R     (TD)(SYEVCLAPIG)(FMW)(N)
907  E     (FWH)(YVCARG)(T)(SNKLPI)
901  G     (R)(E)(H)(K)
908  P     (R)(YH)(K)(E)
909  N     (Y)(FWH)(R)(T)
837  F     (KE)(QD)(TR)(N)
843  L     (Y)(R)(TH)(CG)
849  E     (FW)(Y)(VCAHG)(T)
883  N     (Y)(T)(FWH)(SCG)
916  R     (T)(D)(YVCAG)(S)
882  P     (YR)(TE)(H)(K)
892  I     (R)(Y)(H)(K)
831  M     (Y)(T)(HR)(SCG)
880  V     (YR)(KE)(H)(QD)
920  R     (TD)(Y)(SECG)(VLAPI)
918  T     (R)(K)(H)(FW)
839  R     (T)(D)(SE)(YCG)
891  G     (KR)(E)(H)(Q)

Table 15. Disruptive mutations for the surface patch in 2vj0A.

3 NOTES ON USING TRACE RESULTS
3.1 Coverage
Trace results are commonly expressed in terms of coverage: the residue is important if its “coverage” is small - that is if it belongs to some small top percentage of residues [100% is all of the residues in a chain], according to trace. The ET results are presented in the form of a table, usually limited to top 25% of residues (or to some nearby percentage), sorted by the strength of the presumed evolutionary pressure. (I.e., the smaller the coverage, the stronger the pressure on the residue.) Starting from the top of that list, mutating a couple of residues should affect the protein somehow, with the exact effects to be determined experimentally.

3.2 Known substitutions
One of the table columns is “substitutions” - other amino acid types seen at the same position in the alignment. These amino acid types may be interchangeable at that position in the protein, so if one wants to affect the protein by a point mutation, they should be avoided. For example, if the substitutions are “RVK” and the original protein has an R at that position, it is advisable to try anything but RVK. Conversely, when looking for substitutions which will not affect the protein, one may try replacing R with K, or (perhaps more surprisingly) with V. The percentage of times the substitution appears in the alignment is given in the immediately following bracket. No percentage is given in the cases when it is smaller than 1%. This is meant to be a rough guide - due to rounding errors these percentages often do not add up to 100%.

3.3 Surface
To detect candidates for novel functional interfaces, first we look for residues that are solvent accessible (according to DSSP program) by at least 10 Å², which is roughly the area needed for one water molecule to come in the contact with the residue. Furthermore, we require that these residues form a “cluster” of residues which have neighbor within 5 Å from any of their heavy atoms.
Note, however, that, if our picture of protein evolution is correct, the neighboring residues which are not surface accessible might be equally important in maintaining the interaction specificity - they should not be automatically dropped from consideration when choosing the set for mutagenesis. (Especially if they form a cluster with the surface residues.)

3.4 Number of contacts
Another column worth noting is denoted “noc/bb”; it tells the number of contacts heavy atoms of the residue in question make across the interface, as well as how many of them are realized through the backbone atoms (if all or most contacts are through the backbone, mutation presumably won’t have strong impact). Two heavy atoms are considered to be “in contact” if their centers are closer than 5 Å.

3.5 Annotation

If the residue annotation is available (either from the pdb file or from other sources), another column, with the header “annotation” appears. Annotations carried over from PDB are the following: site (indicating existence of related site record in PDB), S-S (disulfide bond forming residue), hb (hydrogen bond forming residue), jb (james bond forming residue), and sb (for salt bridge forming residue).

Fig. 12. Coloring scheme used to color residues by their relative importance.

3.6 Mutation suggestions
Mutation suggestions are completely heuristic and based on complementarity with the substitutions found in the alignment. Note that they are meant to be disruptive to the interaction of the protein with its ligand. The attempt is made to complement the following properties: small [AVGSTC], medium [LPNQDEMIK], large [WFYHR], hydrophobic [LPVAMWFI], polar [GTCY]; positively [KHR], or negatively [DE] charged, aromatic [WFYH], long aliphatic chain [EKRQM], OH-group possession [SDETY], and NH2 group possession [NQRK]. The suggestions are listed according to how different they appear to be from the original amino acid, and they are grouped in round brackets if they appear equally disruptive. From left to right, each bracketed group of amino acid types resembles more strongly the original (i.e. is, presumably, less disruptive). These suggestions are tentative - they might prove disruptive to the fold rather than to the interaction. Many researchers will choose, however, the straightforward alanine mutations, especially in the beginning stages of their investigation.

4 APPENDIX
4.1 File formats
Files with extension “ranks sorted” are the actual trace results. The fields in the table in this file:
• alignment# number of the position in the alignment
• residue# residue number in the PDB file
• type amino acid type
• rank rank of the position according to older version of ET
• variability has two subfields:
  1. number of different amino acids appearing in this column of the alignment
  2. their type
• rho ET score - the smaller this value, the lesser variability of this position across the branches of the tree (and, presumably, the greater the importance for the protein)
• cvg coverage - percentage of the residues on the structure which have this rho or smaller
• gaps percentage of gaps in this column

4.2 Color schemes used
The following color scheme is used in figures with residues colored by cluster size: black is a single-residue cluster; clusters composed of more than one residue are colored according to this hierarchy (ordered by descending size): red, blue, yellow, green, purple, azure, turquoise, brown, coral, magenta, LightSalmon, SkyBlue, violet, gold, bisque, LightSlateBlue, orchid, RosyBrown, MediumAquamarine, DarkOliveGreen, CornflowerBlue, grey55, burlywood, LimeGreen, tan, DarkOrange, DeepPink, maroon, BlanchedAlmond.
The colors used to distinguish the residues by the estimated evolutionary pressure they experience can be seen in Fig. 12.

4.3 Credits
4.3.1 Alistat
alistat reads a multiple sequence alignment from the file and shows a number of simple statistics about it. These statistics include the format, the number of sequences, the total number of residues, the average and range of the sequence lengths, and the alignment length (e.g. including gap characters). Also shown are some percent identities. A percent pairwise alignment identity is defined as (idents / MIN(len1, len2)) where idents is the number of exact identities and len1, len2 are the unaligned lengths of the two sequences. The "average percent identity", "most related pair", and "most unrelated pair" of the alignment are the average, maximum, and minimum of all (N)(N-1)/2 pairs, respectively. The "most distant seq" is calculated by finding the maximum pairwise identity (best relative) for all N sequences, then finding the minimum of these N numbers (hence, the most outlying sequence). alistat is copyrighted by HHMI/Washington University School of Medicine, 1992-2001, and freely distributed under the GNU General Public License.

4.3.2 CE
To map ligand binding sites from different source structures, report maker uses the CE program: http://cl.sdsc.edu/. Shindyalov IN, Bourne PE (1998) "Protein structure alignment by incremental combinatorial extension (CE) of the optimal path". Protein Engineering 11(9) 739-747.

4.3.3 DSSP
In this work a residue is considered solvent accessible if the DSSP program finds it exposed to water by at least 10 Å², which is roughly the area needed for one water molecule to come in the contact with the residue. DSSP is copyrighted by W. Kabsch, C. Sander and MPI-MF, 1983, 1985, 1988, 1994 1995, CMBI version by [email protected] November 18,2002, http://www.cmbi.kun.nl/gv/dssp/descrip.html.

4.3.4 HSSP
Whenever available, report maker uses HSSP alignment as a starting point for the analysis (sequences shorter than 75% of the query are taken out, however); R. Schneider, A. de Daruvar, and C. Sander. "The HSSP database of protein structure-sequence alignments." Nucleic Acids Res., 25:226–230, 1997. http://swift.cmbi.kun.nl/swift/hssp/

4.3.5 LaTex
The text for this report was processed using LaTeX; Leslie Lamport, “LaTeX: A Document Preparation System Addison-Wesley,” Reading, Mass. (1986).

4.3.6 Muscle
When making alignments “from scratch”, report maker uses Muscle alignment program: Edgar, Robert C. (2004), "MUSCLE: multiple sequence alignment with high accuracy and high throughput." Nucleic Acids Research 32(5), 1792-97. http://www.drive5.com/muscle/

4.3.7 Pymol
The figures in this report were produced using Pymol. The scripts can be found in the attachment. Pymol is an open-source application copyrighted by DeLano Scientific LLC (2005). For more information about Pymol see http://pymol.sourceforge.net/. (Note for Windows users: the attached package needs to be unzipped for Pymol to read the scripts and launch the viewer.)

4.4 Note about ET Viewer
Dan Morgan from the Lichtarge lab has developed a visualization tool specifically for viewing trace results. If you are interested, please visit:
http://mammoth.bcm.tmc.edu/traceview/
The viewer is self-unpacking and self-installing. Input files to be used with ETV (extension .etvx) can be found in the attachment to the main report.

4.5 Citing this work
The method used to rank residues and make predictions in this report can be found in Mihalek, I., I. Reš, O. Lichtarge. (2004). "A Family of Evolution-Entropy Hybrid Methods for Ranking of Protein Residues by Importance" J. Mol. Bio. 336: 1265-82. For the original version of ET see O. Lichtarge, H.Bourne and F. Cohen (1996). "An Evolutionary Trace Method Defines Binding Surfaces Common to Protein Families" J. Mol. Bio. 257: 342-358.
report maker itself is described in Mihalek I., I. Res and O. Lichtarge (2006). "Evolutionary Trace Report Maker: a new type of service for comparative analysis of proteins." Bioinformatics 22:1656-7.

4.6 About report maker
report maker was written in 2006 by Ivana Mihalek. The 1D ranking visualization program was written by Ivica Reš. report maker is copyrighted by Lichtarge Lab, Baylor College of Medicine, Houston.

4.7 Attachments
The following files should accompany this report:
• 2vj0A.complex.pdb - coordinates of 2vj0A with all of its interacting partners
• 2vj0A.etvx - ET viewer input file for 2vj0A
• 2vj0A.cluster report.summary - Cluster report summary for 2vj0A
• 2vj0A.ranks - Ranks file in sequence order for 2vj0A
• 2vj0A.clusters - Cluster descriptions for 2vj0A
• 2vj0A.msf - the multiple sequence alignment used for the chain 2vj0A
• 2vj0A.descr - description of sequences used in 2vj0A msf
• 2vj0A.ranks sorted - full listing of residues and their ranking for 2vj0A
• 2vj0A.2vj0ADTD1940.if.pml - Pymol script for Figure 5
• 2vj0A.cbcvg - used by other 2vj0A-related pymol scripts
• 2vj0A.2vj0ASO41943.if.pml - Pymol script for Figure 6
• 2vj0A.2vj0Q.if.pml - Pymol script for Figure 7
• 2vj0A.2vj0P.if.pml - Pymol script for Figure 8
• 2vj0A.2vj0ACL1944.if.pml - Pymol script for Figure 9
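For readers who want to reproduce alignment statistics of the kind quoted in Section 2.2 from the attached alignment, the identity definitions given in Section 4.3.1 can be sketched in Python as follows. This is an illustration of the formulas as worded in this report, not alistat's source code. The toy alignment is invented, and the gap characters ('-' and '.') and the rule that gap-to-gap matches do not count as identities are assumptions.

```python
from itertools import combinations

GAP_CHARS = "-."  # assumption: both characters denote alignment gaps


def unaligned_length(seq):
    """Length of the sequence with gap characters removed."""
    return sum(1 for c in seq if c not in GAP_CHARS)


def pairwise_identity(a, b):
    """idents / MIN(len1, len2) for two rows of the same alignment."""
    idents = sum(1 for x, y in zip(a, b) if x == y and x not in GAP_CHARS)
    shorter = min(unaligned_length(a), unaligned_length(b))
    return idents / shorter if shorter else 0.0


def identity_summary(alignment):
    """Average, maximum and minimum over all N(N-1)/2 pairs, plus the
    'most distant seq' value: the minimum over sequences of each
    sequence's best pairwise identity. Requires at least two sequences."""
    n = len(alignment)
    ident = {
        (i, j): pairwise_identity(alignment[i], alignment[j])
        for i, j in combinations(range(n), 2)
    }
    values = list(ident.values())
    best_relative = [
        max(ident[tuple(sorted((i, j)))] for j in range(n) if j != i)
        for i in range(n)
    ]
    return {
        "average": sum(values) / len(values),
        "most_related_pair": max(values),
        "most_unrelated_pair": min(values),
        "most_distant_seq": min(best_relative),
    }


if __name__ == "__main__":
    toy = ["ACDEFG", "ACDEFA", "AC-KFG"]  # invented three-sequence alignment
    for key, value in identity_summary(toy).items():
        print(f"{key}: {100 * value:.0f}%")
```

As a consistency check on the figures reported in Section 2.2, 11295 residues over 47 sequences correspond to an average length of about 240.3, as listed there.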
