Pages 1–11 2z64 Evolutionary trace report by report maker September 9, 2008

CONTENTS
1 Introduction
2 Chain 2z64A
2.1 Q5RGT4 overview
2.2 Multiple sequence alignment for 2z64A
2.3 Residue ranking in 2z64A
2.4 Top ranking residues in 2z64A and their position on the structure
2.4.1 Clustering of residues at 25% coverage.
2.4.2 Overlap with known functional surfaces at 25% coverage.
2.4.3 Possible novel functional surfaces at 25% coverage.
3 Notes on using trace results
3.1 Coverage
3.2 Known substitutions
3.3 Surface
3.4 Number of contacts
3.5 Annotation
3.6 Mutation suggestions
4 Appendix
4.1 File formats
4.2 Color schemes used
4.3 Credits
4.3.1 Alistat
4.3.2 CE
4.3.3 DSSP
4.3.4 HSSP
4.3.5 LaTex
4.3.6 Muscle
4.3.7 Pymol
4.4 Note about ET Viewer
4.5 Citing this work
4.6 About report maker
4.7 Attachments

1 INTRODUCTION
From the original Data Bank entry (PDB id 2z64):
Title: Crystal structure of mouse and mouse md-2 complex
Compound: Mol id: 1; molecule: toll-like 4; chain: a; fragment: tlr4, unp residues 27-625; synonym: cd284 antigen; engineered: yes; mol id: 2; molecule: lymphocyte antigen 96; chain: c; fragment: md-2, unp residues 21-160; synonym: md-2 protein, esop-1; engineered: yes
Organism, scientific name: Mus Musculus;
2z64 contains a single unique chain 2z64A (599 residues long). Not enough homologous sequences could be found to permit analysis for chain 2z64C.

2 CHAIN 2Z64A
2.1 Q5RGT4 overview
From SwissProt, id Q5RGT4, 100% identical to 2z64A:
Description: Toll-like receptor 4.
Organism, scientific name: Mus musculus (Mouse).
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Sciurognathi; Muridae; Murinae; Mus.

2.2 Multiple sequence alignment for 2z64A
For the chain 2z64A, the alignment 2z64A.msf (attached) with 35 sequences was used. The alignment was assembled through combination of BLAST searching on the UniProt database and alignment using Muscle program. It can be found in the attachment to this report, under the name of 2z64A.msf. Its statistics, from the alistat program are the following:

Lichtarge lab 2006

Fig. 1. Residues 27-325 in 2z64A colored by their relative importance. (See Appendix, Fig.12, for the coloring scheme.)

Fig. 2. Residues 326-625 in 2z64A colored by their relative importance. (See Appendix, Fig.12, for the coloring scheme.)

Format: MSF
Number of sequences: 35
Total number of residues: 20208
Smallest: 524
Largest: 599
Average length: 577.4
Alignment length: 599
Average identity: 26%
Most related pair: 99%
Most unrelated pair: 15%
Most distant seq: 22%
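The identity figures above follow the definitions that alistat documents (see Section 4.3.1): pairwise identity is idents / MIN(len1, len2), and the summary values are the average, maximum and minimum over all pairs, plus the smallest "best relative" identity. The average length is simply 20208 / 35, which rounds to 577.4. The sketch below is an illustration of those definitions, not alistat's own code; the function names and the set of gap characters are assumptions made for this example.

```python
from itertools import combinations

GAP_CHARS = "-.~"  # assumption: characters treated as gaps in the aligned rows


def unaligned_length(aligned_seq):
    """Length of the sequence once gap characters are removed."""
    return sum(1 for c in aligned_seq if c not in GAP_CHARS)


def pairwise_identity(a, b):
    """idents / MIN(len1, len2), with idents counted over aligned columns."""
    idents = sum(1 for x, y in zip(a, b) if x == y and x not in GAP_CHARS)
    return idents / min(unaligned_length(a), unaligned_length(b))


def identity_stats(alignment):
    """alignment: dict mapping sequence name -> aligned sequence string."""
    best = {name: 0.0 for name in alignment}  # best relative of each sequence
    values = []
    for p, q in combinations(alignment, 2):
        v = pairwise_identity(alignment[p], alignment[q])
        values.append(v)
        best[p] = max(best[p], v)
        best[q] = max(best[q], v)
    return {
        "average": sum(values) / len(values),
        "most_related": max(values),
        "most_unrelated": min(values),
        "most_distant": min(best.values()),
    }
```

Called on a dictionary of aligned rows read from an MSF file, `identity_stats` returns fractions that correspond to the percentages listed above once multiplied by 100.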

Furthermore, <1% of residues show as conserved in this alignment. The alignment consists of 97% eukaryotic (62% vertebrata, 31% arthropoda) sequences. (Descriptions of some sequences were not readily available.) The file containing the sequence descriptions can be found in the attachment, under the name 2z64A.descr.

2.3 Residue ranking in 2z64A
The 2z64A sequence is shown in Figs. 1–2, with each residue colored according to its estimated importance. The full listing of residues in 2z64A can be found in the file called 2z64A.ranks sorted in the attachment.

Fig. 3. Residues in 2z64A, colored by their relative importance. Clockwise: front, back, top and bottom views.

2.4 Top ranking residues in 2z64A and their position on the structure
In the following we consider residues ranking among top 25% of residues in the protein. Figure 3 shows residues in 2z64A colored by their importance: bright red and yellow indicate more conserved/important residues (see Appendix for the coloring scheme). A Pymol script for producing this figure can be found in the attachment.

2.4.1 Clustering of residues at 25% coverage. Fig. 4 shows the top 25% of all residues, this time colored according to clusters they belong to. The clusters in Fig.4 are composed of the residues listed in Table 1.

Table 1.
cluster color   size   member residues
red             141    39,48,55,58,60,61,63,65,68
                       73,76,79,82,84,87,92,97,100
                       103,106,107,108,109,111,113
                       116,121,124,127,130,131,132
                       133,135,137,151,154,155,156
                       157,159,161,163,170,173,176
                       179,181,184,186,189,194,197
                       205,210,212,214,217,222,227
(continued)

Table 2.
res   type   subst's (%)                    cvg    noc/bb   dist (Å)
572   N      D(45)H(2)N(25)R(2)S(17)K(5)    0.24   32/2     1.46

Table 2. The top 25% of residues in 2z64A at the interface with NAG. (Field names: res: residue number in the PDB entry; type: amino acid type; substs: substitutions seen in the alignment, with the percentage of each type in the bracket; noc/bb: number of contacts with the ligand, with the number of contacts realized through backbone atoms given in the bracket; dist: distance of closest approach to the ligand.)
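The substitution field described in this caption (and discussed in Section 3.2) is a compact string such as D(45)H(2)N(25)R(2)S(17)K(5). A minimal sketch for reading it programmatically is given below; the function names are hypothetical, and treating "." as a gap symbol is an assumption based on how it appears in the tables of this report.

```python
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# One residue letter (or "." for what is assumed to be a gap), optionally
# followed by a percentage in brackets. Section 3.2 notes that the percentage
# is omitted when it is below 1%, so it is treated as optional here.
_TOKEN = re.compile(r"([A-Z.])(?:\((\d+)\))?")


def parse_substitutions(field):
    """Turn 'D(45)H(2)N(25)' into {'D': 45, 'H': 2, 'N': 25}."""
    return {aa: int(pct) if pct else 0 for aa, pct in _TOKEN.findall(field)}


def unseen_types(field):
    """Amino acid types that never occur at this alignment position."""
    seen = parse_substitutions(field)
    return [aa for aa in AMINO_ACIDS if aa not in seen]
```

For residue 572, `unseen_types("D(45)H(2)N(25)R(2)S(17)K(5)")` lists the types not observed in the alignment. This is only a starting point when looking for disruptive replacements, not the heuristic described in Section 3.6.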

Fig. 4. Residues in 2z64A, colored according to the cluster they belong to: red, followed by blue and yellow are the largest clusters (see Appendix for the coloring scheme). Clockwise: front, back, top and bottom views. The corresponding Pymol script is attached.

Table 3.
res   type   disruptive mutations
572   N      (Y)(FW)(T)(H)

Table 3. List of disruptive mutations for the top 25% of residues in 2z64A, that are at the interface with NAG.

Table 1. continued
cluster color   size   member residues
                       232,259,276,279,282,285,287
                       302,305,308,311,313,328,333
                       338,348,351,354,356,359,370
                       373,376,378,379,381,396,402
                       403,404,405,407,416,419,422
                       425,426,427,429,430,432,441
                       444,447,450,451,452,453,455
                       457,465,471,474,476,477,479
                       490,496,499,500,501,502,504
                       517,520,523,525,528,530,533
                       538,541,544,547,549,552,554
                       568,572,573,576
blue            3      561,587,590
yellow          2      580,606
green           2      582,625

Table 1. Clusters of top ranking residues in 2z64A.
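Section 3.3 describes a cluster as residues that have a neighbor within 5Å of any of their heavy atoms, and Section 4.2 assigns colors by descending cluster size. One way to picture this is as connected components of a neighbor graph. The sketch below is a hedged illustration under that reading; it is not the procedure report maker actually runs, and the function names, input layout and palette wrap-around are assumptions.

```python
import math
from collections import deque

# Color hierarchy as listed in Section 4.2 (largest cluster first).
PALETTE = [
    "red", "blue", "yellow", "green", "purple", "azure", "turquoise",
    "brown", "coral", "magenta", "LightSalmon", "SkyBlue", "violet", "gold",
    "bisque", "LightSlateBlue", "orchid", "RosyBrown", "MediumAquamarine",
    "DarkOliveGreen", "CornflowerBlue", "grey55", "burlywood", "LimeGreen",
    "tan", "DarkOrange", "DeepPink", "maroon", "BlanchedAlmond",
]


def are_neighbors(atoms_a, atoms_b, cutoff=5.0):
    """True if any pair of heavy atoms is closer than the cutoff (in Å)."""
    return any(math.dist(a, b) < cutoff for a in atoms_a for b in atoms_b)


def cluster_residues(residues, cutoff=5.0):
    """residues: dict residue number -> list of (x, y, z) heavy-atom coordinates.
    Returns clusters (sorted lists of residue numbers), largest first."""
    unvisited = set(residues)
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        queue, members = deque([seed]), [seed]
        while queue:  # breadth-first walk over the neighbor graph
            current = queue.popleft()
            near = [r for r in unvisited
                    if are_neighbors(residues[current], residues[r], cutoff)]
            for r in near:
                unvisited.remove(r)
                members.append(r)
                queue.append(r)
        clusters.append(sorted(members))
    clusters.sort(key=lambda c: (-len(c), c[0]))
    return clusters


def color_clusters(clusters):
    """Map each residue to a color: black for singletons, palette otherwise."""
    colors, used = {}, 0
    for cluster in clusters:
        if len(cluster) == 1:
            color = "black"
        else:
            color = PALETTE[used % len(PALETTE)]  # wrap-around is an assumption
            used += 1
        for residue in cluster:
            colors[residue] = color
    return colors
```

Under this reading, the sizes in Table 1 (141, 3, 2, 2) would correspond to the red, blue, yellow and green components respectively.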

2.4.2 Overlap with known functional surfaces at 25% coverage. The name of the ligand is composed of the source PDB identifier and the heteroatom name used in that file.

NAG binding site. Table 2 lists the top 25% of residues at the interface with 2z64NNAG1411 (nag). The following table (Table 3) suggests possible disruptive replacements for these residues (see Section 3.6). Figure 5 shows residues in 2z64A colored by their importance, at the interface with 2z64NNAG1411.

Fig. 5. Residues in 2z64A, at the interface with NAG, colored by their relative importance. The ligand (NAG) is colored green. Atoms further than 30Å away from the geometric center of the ligand, as well as on the line of sight to the ligand were removed. (See Appendix for the coloring scheme for the protein chain 2z64A.)

NAG binding site. Table 4 lists the top 25% of residues at the interface with 2z64KNAG1533 (nag). The following table (Table 5) suggests possible disruptive replacements for these residues (see Section 3.6).

Table 4.
res   type   subst's (%)                             cvg    noc/bb   dist (Å)
590   W      W(62)L(5)Y(8)F(11)N(2)Q(2)V(2)H(2)      0.20   2/2      4.50

Table 4. The top 25% of residues in 2z64A at the interface with NAG. (Field names: res: residue number in the PDB entry; type: amino acid type; substs: substitutions seen in the alignment, with the percentage of each type in the bracket; noc/bb: number of contacts with the ligand, with the number of contacts realized through backbone atoms given in the bracket; dist: distance of closest approach to the ligand.)

Table 5.
res   type   disruptive mutations
590   W      (E)(K)(T)(D)

Table 5. List of disruptive mutations for the top 25% of residues in 2z64A, that are at the interface with NAG.

Figure 6 shows residues in 2z64A colored by their importance, at the interface with 2z64KNAG1533.

Fig. 6. Residues in 2z64A, at the interface with NAG, colored by their relative importance. The ligand (NAG) is colored green. Atoms further than 30Å away from the geometric center of the ligand, as well as on the line of sight to the ligand were removed. (See Appendix for the coloring scheme for the protein chain 2z64A.)

Interface with 2z64C. Table 6 lists the top 25% of residues at the interface with 2z64C. The following table (Table 7) suggests possible disruptive replacements for these residues (see Section 3.6).

Table 6.
res   type   subst's (%)                                    cvg    noc/bb   dist (Å)
210   S      Q(11)A(11)S(68)R(5)N(2)                        0.16   5/0      4.07
155   N      D(45)N(31)K(5)R(5)Y(5)S(5)                     0.17   4/0      3.67
61    S      S(74)Q(8)E(5)P(2)V(2)T(5)                      0.21   8/0      3.60
261   E      L(48)E(25)R(5)I(11)H(2)V(2)P(2)                0.21   2/2      4.21
109   T      G(11)T(31)S(31)A(11)R(2)N(5)Q(5)               0.22   16/0     3.86
131   V      R(17)V(31)K(5)D(11)N(20)Y(5)H(5)F(2)           0.23   1/0      3.75
133   V      F(8)V(31)S(22)T(2)Q(5)K(2)D(8)Y(5)M(11)        0.23   8/0      3.52

Table 6. The top 25% of residues in 2z64A at the interface with 2z64C. (Field names: res: residue number in the PDB entry; type: amino acid type; substs: substitutions seen in the alignment, with the percentage of each type in the bracket; noc/bb: number of contacts with the ligand, with the number of contacts realized through backbone atoms given in the bracket; dist: distance of closest approach to the ligand.)

Table 7.
res   type   disruptive mutations
210   S      (Y)(FWHR)(K)(EM)
155   N      (Y)(FW)(TH)(VA)
61    S      (R)(H)(K)(FW)
261   E      (Y)(H)(FWR)(CG)
109   T      (R)(K)(FWH)(M)
131   V      (E)(Y)(K)(R)
133   V      (R)(K)(Y)(E)

Table 7. List of disruptive mutations for the top 25% of residues in 2z64A, that are at the interface with 2z64C.

Figure 7 shows residues in 2z64A colored by their importance, at the interface with 2z64C.

Fig. 7. Residues in 2z64A, at the interface with 2z64C, colored by their relative importance. 2z64C is shown in backbone representation. (See Appendix for the coloring scheme for the protein chain 2z64A.)

NAG binding site. Table 8 lists the top 25% of residues at the interface with 2z64MNAG1421 (nag). The following table (Table 9) suggests possible disruptive replacements for these residues (see Section 3.6).

Table 8.
res   type   subst's (%)                            cvg    noc/bb   dist (Å)
502   S      N(17)S(68)A(5)I(5)D(2)                 0.08   9/0      3.55
500   D      I(14)D(65)E(5)L(2).(5)N(2)K(2)         0.17   2/0      4.54

Table 8. The top 25% of residues in 2z64A at the interface with NAG. (Field names: res: residue number in the PDB entry; type: amino acid type; substs: substitutions seen in the alignment, with the percentage of each type in the bracket; noc/bb: number of contacts with the ligand, with the number of contacts realized through backbone atoms given in the bracket; dist: distance of closest approach to the ligand.)

Table 9.
res   type   disruptive mutations
502   S      (R)(K)(H)(FYW)
500   D      (R)(H)(FW)(Y)

Table 9. List of disruptive mutations for the top 25% of residues in 2z64A, that are at the interface with NAG.

Figure 8 shows residues in 2z64A colored by their importance, at the interface with 2z64MNAG1421.

2.4.3 Possible novel functional surfaces at 25% coverage. One group of residues is conserved on the 2z64A surface, away from (or substantially larger than) other functional sites and interfaces recognizable in PDB entry 2z64. It is shown in Fig. 9. The right panel shows (in blue) the rest of the larger cluster this surface belongs to. The residues belonging to this surface “patch” are listed in Table 10,

while Table 11 suggests possible disruptive replacements for these residues (see Section 3.6).

Fig. 8. Residues in 2z64A, at the interface with NAG, colored by their relative importance. The ligand (NAG) is colored green. Atoms further than 30Å away from the geometric center of the ligand, as well as on the line of sight to the ligand were removed. (See Appendix for the coloring scheme for the protein chain 2z64A.)

Fig. 9. A possible active surface on the chain 2z64A. The larger cluster it belongs to is shown in blue.

Table 10.
res   type   substitutions(%)                      cvg
65    L      I(28)L(68)V(2)                        0.06
113   I      I(65)L(31)V(2)                        0.07
116   F      L(68)I(22)V(2)F(5)                    0.14
121   F      F(65)L(22)V(2).(8)                    0.14
92    I      I(65)L(14)V(5)M(2)T(2)F(5)P(2)        0.19
68    L      V(25)L(51)I(22)                       0.20

Table 10. Residues forming surface “patch” in 2z64A.

Table 11.
res   type   disruptive mutations
65    L      (YR)(H)(T)(KE)
113   I      (YR)(H)(T)(KE)
116   F      (KE)(TR)(QD)(SCG)
121   F      (KE)(TQD)(R)(SNCG)
92    I      (R)(Y)(H)(T)
68    L      (YR)(H)(T)(KE)

Table 11. Disruptive mutations for the surface patch in 2z64A.

Another group of surface residues is shown in Fig.10. The right panel shows (in blue) the rest of the larger cluster this surface belongs to. The residues belonging to this surface “patch” are listed in Table 12, while Table 13 suggests possible disruptive replacements for these residues (see Section 3.6).

Fig. 10. Another possible active surface on the chain 2z64A. The larger cluster it belongs to is shown in blue.

Table 12.
res   type   substitutions(%)                               cvg
159   N      N(100)                                         0.01
184   N      N(97)C(2)                                      0.02
176   L      L(91)F(5)I(2)                                  0.04
179   V      V(11)L(80)I(8)                                 0.11
173   L      L(80)F(8)S(2)M(5)Q(2)                          0.13
230   L      L(74)V(14)I(8)Y(2)                             0.13
181   L      F(11)L(77)V(8)M(2)                             0.14
222   F      F(74)L(8).(5)M(2)V(5)D(2)                      0.14
227   L      .(8)L(71)V(5)I(11)M(2)                         0.14
232   L      L(71)M(8)V(8)Y(2)I(5).(2)                      0.14
140   L      V(40)L(40)I(17)N(2)                            0.15
282   V      I(11)L(71)M(2)V(11)F(2)                        0.15
107   I      D(40)I(31)V(5)N(17)Y(2)S(2)                    0.16
189   I      I(54)L(25)V(11)F(8)                            0.16
210   S      Q(11)A(11)S(68)R(5)N(2)                        0.16
155   N      D(45)N(31)K(5)R(5)Y(5)S(5)                     0.17
197   L      L(71)T(5)F(2).(2)M(5)S(5)V(2)P(2)              0.17
205   L      .(5)L(74)M(8)I(2)F(5)V(2)                      0.18
279   L      I(2)L(71)M(11)F(2)V(5)Q(2)S(2)                 0.19
157   A      S(45)A(25)G(8)D(5)T(2)R(5)K(2)N(2)             0.20
170   F      F(74)M(5)L(8)C(2)I(2)W(2)G(2)                  0.21
261   E      L(48)E(25)R(5)I(11)H(2)V(2)P(2)                0.21
109   T      G(11)T(31)S(31)A(11)R(2)N(5)Q(5)               0.22
194   L      L(51)T(5)F(31).(2)S(2)P(2)V(2)                 0.22
131   V      R(17)V(31)K(5)D(11)N(20)Y(5)H(5)F(2)           0.23
133   V      F(8)V(31)S(22)T(2)Q(5)K(2)D(8)Y(5)M(11)        0.23
217   I      .(8)I(42)V(17)Q(2)L(14)F(11)W(2)               0.23
214   I      L(45)I(45)F(5)R(2)                             0.24
259   L      L(62)V(8)E(11)R(2)Q(8)M(5)                     0.24
163   S      S(68)V(11)D(2)T(5)H(5)Y(2)N(2)                 0.25

Table 12. Residues forming surface “patch” in 2z64A.

Table 13.
res   type   disruptive mutations
159   N      (Y)(FTWH)(SEVCARG)(MD)
184   N      (Y)(FWH)(ER)(T)
176   L      (R)(Y)(T)(KE)
179   V      (YR)(KE)(H)(QD)
173   L      (Y)(R)(T)(H)
230   L      (R)(Y)(KH)(TE)
181   L      (YR)(T)(H)(KE)
222   F      (K)(E)(T)(R)
227   L      (Y)(R)(H)(T)
232   L      (R)(Y)(H)(T)
140   L      (Y)(R)(H)(T)
282   V      (R)(Y)(KE)(H)
107   I      (R)(Y)(H)(K)
189   I      (R)(Y)(T)(KEH)
210   S      (Y)(FWHR)(K)(EM)
155   N      (Y)(FW)(TH)(VA)
197   L      (R)(Y)(H)(K)
205   L      (YR)(T)(H)(KE)
279   L      (Y)(R)(H)(T)
157   A      (Y)(R)(E)(K)
170   F      (KE)(DR)(T)(Q)
261   E      (Y)(H)(FWR)(CG)
109   T      (R)(K)(FWH)(M)
194   L      (R)(Y)(H)(K)
131   V      (E)(Y)(K)(R)
133   V      (R)(K)(Y)(E)
217   I      (Y)(R)(T)(H)
214   I      (Y)(T)(R)(E)
259   L      (Y)(THR)(CG)(S)
163   S      (KR)(QM)(FWH)(E)

Table 13. Disruptive mutations for the surface patch in 2z64A.

Another group of surface residues is shown in Fig.11. The right panel shows (in blue) the rest of the larger cluster this surface belongs to. The residues belonging to this surface “patch” are listed in Table 14, while Table 15 suggests possible disruptive replacements for these residues (see Section 3.6).

Table 14.
res   type   substitutions(%)                               cvg
552   N      N(94)I(2)T(2)                                  0.01
381   N      N(94)T(5)                                      0.02
(continued)

Fig. 11. Another possible active surface on the chain 2z64A. The larger cluster it belongs to is shown in blue.

Table 14. continued
res   type   substitutions(%)                               cvg
407   N      N(94)C(5)                                      0.02
576   N      N(91)K(2)I(2)A(2)                              0.02
404   L      L(91)I(2)M(2)V(2)                              0.03
479   N      N(82)S(8).(2)V(5)                              0.04
528   N      N(80)S(11)I(5)G(2)                             0.04
359   N      N(88)P(5).(2)S(2)                              0.05
452   I      L(54)I(40)M(2)A(2)                             0.06
554   I      L(54)I(40)M(2)F(2)                             0.06
376   L      L(77)V(14)I(8)                                 0.07
402   L      V(5)L(77)F(5)I(11)                             0.08
502   S      N(17)S(68)A(5)I(5)D(2)                         0.08
354   L      L(85).(2)F(5)I(2)W(2)                          0.10
432   L      L(77)I(11)M(5)H(2)Q(2)                         0.10
530   L      L(74)I(11)V(8)F(2)M(2)                         0.10
587   F      F(77)L(14)M(2)V(2)K(2)                         0.12
379   S      S(74)H(2)R(5)N(11)Q(2)K(2)                     0.13
311   M      .(8)M(8)I(25)L(51)F(2)V(2)                     0.15
453   S      G(8)S(65)Q(5)A(5)H(5)D(2)K(5)                  0.15
465   F      H(2)F(74)Y(11)K(2).(2)A(2)S(2)                 0.15
285   D      L(62)E(17)N(8)Q(2)D(5)I(2)                     0.16
561   L      F(54)Q(2)L(28)V(2)P(2).(2)I(2)M(2)             0.16
333   L      L(80)S(2)F(5).(2)I(5)V(2)                      0.17
500   D      I(14)D(65)E(5)L(2).(5)N(2)K(2)                 0.17
302   F      .(11)F(68)L(8)S(5)Y(2)W(2)                     0.18
308   V      .(8)V(14)A(11)L(57)I(5)R(2)                    0.18
370   L      L(65)T(2)I(2)C(11)V(5)H(2).(2)F(2)A(2)         0.18
426   D      D(57)Y(5)H(8)N(8)S(2)R(11)L(5)                 0.18
457   T      L(48)I(20)V(5)T(17)M(2)Y(5)                    0.19
429   H      D(14)H(34)E(8)G(25)F(5)N(2)L(5)T(2)            0.20
590   W      W(62)L(5)Y(8)F(11)N(2)Q(2)V(2)H(2)             0.20
305   L      .(8)L(62)M(5)V(2)I(14)T(2)C(2)                 0.21
356   L      L(57)F(17)M(11)I(8).(2)V(2)                    0.22
313   L      F(11)L(71)I(5)M(2)A(2)V(2)T(2)                 0.23
403   D      D(48)N(28)S(8)L(2)H(2)K(5)F(2)                 0.23
533   L      A(11)L(54)V(5)I(14)F(11)Y(2)                   0.24
572   N      D(45)H(2)N(25)R(2)S(17)K(5)                    0.24
573   L      I(14)L(59)V(5)F(8)M(8)G(2)                     0.24
328   F      L(54)F(22).(2)V(2)M(5)I(5)C(5)                 0.25
396   T      L(51)T(25)M(5)F(5)C(2)S(5)P(2)                 0.25
477   A      S(25)A(34)C(2)N(8)R(8)H(11)Q(5)K(2)            0.25

Table 14. Residues forming surface “patch” in 2z64A.

Table 15.
res   type   disruptive mutations
552   N      (Y)(H)(FW)(R)
381   N      (FYWH)(R)(E)(TVMA)
407   N      (Y)(FWH)(ER)(T)
576   N      (Y)(TH)(FW)(ER)
404   L      (Y)(R)(H)(T)
479   N      (Y)(H)(FW)(R)
528   N      (Y)(H)(FWR)(E)
359   N      (Y)(H)(FW)(TR)
452   I      (Y)(R)(H)(T)
554   I      (YR)(T)(H)(SKECG)
376   L      (YR)(H)(T)(KE)
402   L      (R)(Y)(T)(KEH)
502   S      (R)(K)(H)(FYW)
354   L      (R)(TY)(KE)(S)
432   L      (Y)(TR)(H)(SCG)
530   L      (R)(Y)(T)(H)
587   F      (E)(T)(K)(D)
379   S      (FW)(Y)(R)(MH)
311   M      (Y)(T)(H)(R)
453   S      (R)(K)(FW)(H)
465   F      (E)(K)(D)(Q)
285   D      (R)(H)(FW)(Y)
561   L      (Y)(R)(T)(H)
333   L      (R)(Y)(H)(TK)
500   D      (R)(H)(FW)(Y)
302   F      (K)(E)(Q)(D)
308   V      (Y)(E)(R)(K)
370   L      (R)(Y)(KE)(H)
426   D      (R)(FW)(H)(K)
457   T      (R)(K)(H)(Q)
429   H      (E)(Q)(TKM)(D)
590   W      (E)(K)(T)(D)
305   L      (R)(Y)(H)(K)
356   L      (YR)(T)(H)(KE)
313   L      (R)(Y)(H)(K)
403   D      (R)(FWH)(Y)(CG)
533   L      (R)(Y)(K)(E)
572   N      (Y)(FW)(T)(H)
573   L      (R)(Y)(H)(T)
328   F      (KE)(T)(D)(R)
396   T      (R)(K)(H)(Q)
477   A      (Y)(E)(DR)(K)

Table 15. Disruptive mutations for the surface patch in 2z64A.

3 NOTES ON USING TRACE RESULTS
3.1 Coverage
Trace results are commonly expressed in terms of coverage: the residue is important if its “coverage” is small - that is if it belongs to some small top percentage of residues [100% is all of the residues in a chain], according to trace. The ET results are presented in the form of a table, usually limited to top 25% of residues (or to some nearby percentage), sorted by the strength of the presumed evolutionary pressure. (I.e., the smaller the coverage, the stronger the pressure on the residue.) Starting from the top of that list, mutating a couple of residues should affect the protein somehow, with the exact effects to be determined experimentally.

3.2 Known substitutions
One of the table columns is “substitutions” - other amino acid types seen at the same position in the alignment. These amino acid types may be interchangeable at that position in the protein, so if one wants to affect the protein by a point mutation, they should be avoided. For example if the substitutions are “RVK” and the original protein has an R at that position, it is advisable to try anything but RVK. Conversely, when looking for substitutions which will not affect the protein, one may try replacing R with K, or (perhaps more surprisingly) with V. The percentage of times the substitution appears in the alignment is given in the immediately following bracket. No percentage is given in the cases when it is smaller than 1%. This is meant to be a rough guide - due to rounding errors these percentages often do not add up to 100%.

3.3 Surface
To detect candidates for novel functional interfaces, first we look for residues that are solvent accessible (according to DSSP program) by at least 10Å², which is roughly the area needed for one water molecule to come in the contact with the residue. Furthermore, we require that these residues form a “cluster” of residues which have neighbor within 5Å from any of their heavy atoms.
Note, however, that, if our picture of protein evolution is correct, the neighboring residues which are not surface accessible might be equally important in maintaining the interaction specificity - they should not be automatically dropped from consideration when choosing the set for mutagenesis. (Especially if they form a cluster with the surface residues.)

3.4 Number of contacts
Another column worth noting is denoted “noc/bb”; it tells the number of contacts heavy atoms of the residue in question make across the interface, as well as how many of them are realized through the backbone atoms (if all or most contacts are through the backbone, mutation presumably won’t have strong impact). Two heavy atoms are considered to be “in contact” if their centers are closer than 5Å.
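The “noc/bb” bookkeeping described in Section 3.4 can be illustrated with a short sketch. It assumes that every heavy-atom pair closer than 5Å counts as one contact and that backbone atoms are those named N, CA, C and O in the PDB file. Whether report maker counts atom pairs or contacting atoms is not stated in this report, so this is an interpretation rather than its implementation. The function name and input layout are hypothetical.

```python
import math

BACKBONE_ATOMS = {"N", "CA", "C", "O"}  # standard PDB backbone atom names
CONTACT_CUTOFF = 5.0                    # Å, as stated in Section 3.4


def count_contacts(residue_atoms, partner_atoms, cutoff=CONTACT_CUTOFF):
    """Count heavy-atom contacts between one residue and its partner.

    residue_atoms: list of (atom_name, (x, y, z)) for the residue's heavy atoms
    partner_atoms: list of (x, y, z) for the ligand or partner chain heavy atoms
    Returns (noc, bb): all contacts, and those made through backbone atoms.
    Assumption: each atom pair within the cutoff is counted once.
    """
    noc = bb = 0
    for name, xyz in residue_atoms:
        for other in partner_atoms:
            if math.dist(xyz, other) < cutoff:
                noc += 1
                if name in BACKBONE_ATOMS:
                    bb += 1
    return noc, bb
```

A result such as (32, 2) would be reported as 32/2 in the tables above. A residue whose contacts are mostly side-chain contacts is the more promising mutagenesis target under the reasoning given in Section 3.4.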

3.5 Annotation
If the residue annotation is available (either from the pdb file or from other sources), another column, with the header “annotation” appears. Annotations carried over from PDB are the following: site (indicating existence of related site record in PDB), S-S (disulfide bond forming residue), hb (hydrogen bond forming residue), jb (james bond forming residue), and sb (for salt bridge forming residue).

3.6 Mutation suggestions
Mutation suggestions are completely heuristic and based on complementarity with the substitutions found in the alignment. Note that they are meant to be disruptive to the interaction of the protein with its ligand. The attempt is made to complement the following properties: small [AVGSTC], medium [LPNQDEMIK], large [WFYHR], hydrophobic [LPVAMWFI], polar [GTCY]; positively [KHR], or negatively [DE] charged, aromatic [WFYH], long aliphatic chain [EKRQM], OH-group possession [SDETY], and NH2 group possession [NQRK]. The suggestions are listed according to how different they appear to be from the original amino acid, and they are grouped in round brackets if they appear equally disruptive. From left to right, each bracketed group of amino acid types resembles more strongly the original (i.e. is, presumably, less disruptive). These suggestions are tentative - they might prove disruptive to the fold rather than to the interaction. Many researchers will choose, however, the straightforward alanine mutations, especially in the beginning stages of their investigation.

4 APPENDIX
4.1 File formats
Files with extension “ranks sorted” are the actual trace results. The fields in the table in this file:
• alignment# number of the position in the alignment
• residue# residue number in the PDB file
• type amino acid type
• rank rank of the position according to older version of ET
• variability has two subfields:
  1. number of different amino acids appearing in this column of the alignment
  2. their type
• rho ET score - the smaller this value, the lesser variability of this position across the branches of the tree (and, presumably, the greater the importance for the protein)
• cvg coverage - percentage of the residues on the structure which have this rho or smaller
• gaps percentage of gaps in this column

4.2 Color schemes used
The following color scheme is used in figures with residues colored by cluster size: black is a single-residue cluster; clusters composed of more than one residue colored according to this hierarchy (ordered by descending size): red, blue, yellow, green, purple, azure, turquoise, brown, coral, magenta, LightSalmon, SkyBlue, violet, gold, bisque, LightSlateBlue, orchid, RosyBrown, MediumAquamarine, DarkOliveGreen, CornflowerBlue, grey55, burlywood, LimeGreen, tan, DarkOrange, DeepPink, maroon, BlanchedAlmond.
The colors used to distinguish the residues by the estimated evolutionary pressure they experience can be seen in Fig. 12.

Fig. 12. Coloring scheme used to color residues by their relative importance. (Scale labels in the figure: COVERAGE 100%, 50%, 30%, 5%; RELATIVE IMPORTANCE.)

4.3 Credits
4.3.1 Alistat alistat reads a multiple sequence alignment from the file and shows a number of simple statistics about it. These statistics include the format, the number of sequences, the total number of residues, the average and range of the sequence lengths, and the alignment length (e.g. including gap characters). Also shown are some percent identities. A percent pairwise alignment identity is defined as (idents / MIN(len1, len2)) where idents is the number of exact identities and len1, len2 are the unaligned lengths of the two sequences. The “average percent identity”, “most related pair”, and “most unrelated pair” of the alignment are the average, maximum, and minimum of all (N)(N-1)/2 pairs, respectively. The “most distant seq” is calculated by finding the maximum pairwise identity (best relative) for all N sequences, then finding the minimum of these N numbers (hence, the most outlying sequence). alistat is copyrighted by HHMI/Washington University School of Medicine, 1992-2001, and freely distributed under the GNU General Public License.

4.3.2 CE To map ligand binding sites from different source structures, report maker uses the CE program: http://cl.sdsc.edu/. Shindyalov IN, Bourne PE (1998) “Protein structure alignment by incremental combinatorial extension (CE) of the optimal path.” Protein Engineering 11(9) 739-747.

4.3.3 DSSP In this work a residue is considered solvent accessible if the DSSP program finds it exposed to water by at least 10Å², which is roughly the area needed for one water molecule to come in the contact with the residue. DSSP is copyrighted by W. Kabsch, C. Sander and MPI-MF, 1983, 1985, 1988, 1994 1995, CMBI version by [email protected] November 18,2002, http://www.cmbi.kun.nl/gv/dssp/descrip.html.

4.3.4 HSSP Whenever available, report maker uses HSSP alignment as a starting point for the analysis (sequences shorter than 75% of the query are taken out, however); R. Schneider, A. de Daruvar, and C. Sander. “The HSSP database of protein structure-sequence alignments.” Nucleic Acids Res., 25:226–230, 1997. http://swift.cmbi.kun.nl/swift/hssp/

4.3.5 LaTex The text for this report was processed using LaTeX; Leslie Lamport, “LaTeX: A Document Preparation System,” Addison-Wesley, Reading, Mass. (1986).

4.3.6 Muscle When making alignments “from scratch”, report maker uses Muscle alignment program: Edgar, Robert C. (2004), “MUSCLE: multiple sequence alignment with high accuracy and high throughput.” Nucleic Acids Research 32(5), 1792-97. http://www.drive5.com/muscle/

4.3.7 Pymol The figures in this report were produced using Pymol. The scripts can be found in the attachment. Pymol is an open-source application copyrighted by DeLano Scientific LLC (2005). For more information about Pymol see http://pymol.sourceforge.net/. (Note for Windows users: the attached package needs to be unzipped for Pymol to read the scripts and launch the viewer.)

4.4 Note about ET Viewer
Dan Morgan from the Lichtarge lab has developed a visualization tool specifically for viewing trace results. If you are interested, please visit:
http://mammoth.bcm.tmc.edu/traceview/
The viewer is self-unpacking and self-installing. Input files to be used with ETV (extension .etvx) can be found in the attachment to the main report.

4.5 Citing this work
The method used to rank residues and make predictions in this report can be found in Mihalek, I., I. Reš, O. Lichtarge. (2004). “A Family of Evolution-Entropy Hybrid Methods for Ranking of Protein Residues by Importance” J. Mol. Bio. 336: 1265-82. For the original version of ET see O. Lichtarge, H.Bourne and F. Cohen (1996). “An Evolutionary Trace Method Defines Binding Surfaces Common to Protein Families” J. Mol. Bio. 257: 342-358.
report maker itself is described in Mihalek I., I. Reš and O. Lichtarge (2006). “Evolutionary Trace Report Maker: a new type of service for comparative analysis of proteins.” Bioinformatics 22:1656-7.

4.6 About report maker
report maker was written in 2006 by Ivana Mihalek. The 1D ranking visualization program was written by Ivica Reš. report maker is copyrighted by Lichtarge Lab, Baylor College of Medicine, Houston.

4.7 Attachments
The following files should accompany this report:
• 2z64A.complex.pdb - coordinates of 2z64A with all of its interacting partners
• 2z64A.etvx - ET viewer input file for 2z64A
• 2z64A.cluster report.summary - Cluster report summary for 2z64A
• 2z64A.ranks - Ranks file in sequence order for 2z64A
• 2z64A.clusters - Cluster descriptions for 2z64A
• 2z64A.msf - the multiple sequence alignment used for the chain 2z64A
• 2z64A.descr - description of sequences used in 2z64A msf
• 2z64A.ranks sorted - full listing of residues and their ranking for 2z64A
• 2z64A.2z64NNAG1411.if.pml - Pymol script for Figure 5
• 2z64A.cbcvg - used by other 2z64A-related pymol scripts
• 2z64A.2z64KNAG1533.if.pml - Pymol script for Figure 6
• 2z64A.2z64C.if.pml - Pymol script for Figure 7
• 2z64A.2z64MNAG1421.if.pml - Pymol script for Figure 8
• 2z64C.complex.pdb - coordinates of 2z64C with all of its interacting partners
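To work with the attached ranking file outside the report, the field list in Section 4.1 suggests a simple whitespace-separated table. The sketch below assumes exactly that layout (one residue per line, the fields in the order listed, comment lines starting with “%”) and expresses coverage as a fraction, as in the tables of this report. None of these layout details are confirmed here, so the parser should be checked against the real file before use. All names are hypothetical.

```python
from typing import NamedTuple


class RankRow(NamedTuple):
    alignment_pos: int   # alignment#
    residue: str         # residue# in the PDB file
    aa_type: str         # amino acid type
    rank: float          # rank according to the older version of ET
    n_types: int         # variability, subfield 1
    types: str           # variability, subfield 2
    rho: float           # ET score
    cvg: float           # coverage
    gaps: float          # share of gaps in the column


def parse_ranks(text):
    """Parse ranks text, assuming whitespace-separated fields per Section 4.1."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("%"):  # assumed comment marker
            continue
        f = line.split()
        rows.append(RankRow(int(f[0]), f[1], f[2], float(f[3]), int(f[4]),
                            f[5], float(f[6]), float(f[7]), float(f[8])))
    return rows


def top_coverage(rows, threshold=0.25):
    """Residues within the given coverage, most important (smallest cvg) first."""
    return sorted((r for r in rows if r.cvg <= threshold), key=lambda r: r.cvg)


# Made-up rows for illustration only: the rank, rho and gaps values are
# placeholders and are not taken from the attached file.
SAMPLE = """\
% alignment# residue# type rank variability rho cvg gaps
133 159 N 1 1 N 1.00 0.01 0.00
546 572 N 140 6 DHNRSK 2.10 0.24 0.00
10 36 K 500 12 KRQESTANGDHP 4.50 0.83 0.03
"""
```

With the sample above, `top_coverage(parse_ranks(SAMPLE))` keeps the two rows at or below 25% coverage, which mirrors the cutoff used for the tables in Section 2.4.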
