Pages 1–18 1e6v Evolutionary trace report by report maker July 11, 2009

Lichtarge lab 2006

CONTENTS
1 Introduction
2 Chain 1e6vC
  2.1 Q49604 overview
  2.2 Multiple sequence alignment for 1e6vC
  2.3 Residue ranking in 1e6vC
  2.4 Top ranking residues in 1e6vC and their position on the structure
    2.4.1 Clustering of residues at 25% coverage.
    2.4.2 Overlap with known functional surfaces at 25% coverage.
3 Chain 1e6vA
  3.1 Q49605 overview
  3.2 Multiple sequence alignment for 1e6vA
  3.3 Residue ranking in 1e6vA
  3.4 Top ranking residues in 1e6vA and their position on the structure
    3.4.1 Clustering of residues at 32% coverage.
4 Chain 1e6vB
  4.1 Q49601 overview
  4.2 Multiple sequence alignment for 1e6vB
  4.3 Residue ranking in 1e6vB
  4.4 Top ranking residues in 1e6vB and their position on the structure
    4.4.1 Clustering of residues at 25% coverage.
    4.4.2 Overlap with known functional surfaces at 25% coverage.
    4.4.3 Possible novel functional surfaces at 25% coverage.
5 Notes on using trace results
  5.1 Coverage
  5.2 Known substitutions
  5.3 Surface
  5.4 Number of contacts
  5.5 Annotation
  5.6 Mutation suggestions
6 Appendix
  6.1 File formats
  6.2 Color schemes used
  6.3 Credits
    6.3.1 Alistat
    6.3.2 CE
    6.3.3 DSSP
    6.3.4 HSSP
    6.3.5 LaTex
    6.3.6 Muscle
    6.3.7 Pymol
  6.4 Note about ET Viewer
  6.5 Citing this work
  6.6 About report maker
  6.7 Attachments

1 INTRODUCTION
From the original Protein Data Bank entry (PDB id 1e6v):
Title: Methyl-coenzyme m reductase from kandleri
Compound: Mol id: 1; molecule: methyl-coenzyme m reductase i alpha subunit; chain: a, d; mol id: 2; molecule: methyl-coenzyme m reductase i beta subunit; chain: b, e; mol id: 3; molecule: methyl-coenzyme m reductase i gamma subunit; chain: c, f
Organism, scientific name: Methanopyrus Kandleri;
1e6v contains unique chains 1e6vC (248 residues), 1e6vA (545 residues), and 1e6vB (436 residues). 1e6vF is a homologue of chain 1e6vC. 1e6vD is a homologue of chain 1e6vA. 1e6vE is a homologue of chain 1e6vB.

2 CHAIN 1E6VC
2.1 Q49604 overview
From SwissProt, id Q49604, 93% identical to 1e6vC:
Description: Methyl coenzyme M reductase, gamma subunit.
Organism, scientific name: Methanopyrus kandleri.
Taxonomy: Archaea; Euryarchaeota; Methanopyri; Methanopyrales; Methanopyraceae; Methanopyrus.

2.2 Multiple sequence alignment for 1e6vC
For the chain 1e6vC, the alignment 1e6vC.msf (attached) with 44 sequences was used. The alignment was downloaded from the HSSP database, and fragments shorter than 75% of the query as well as duplicate sequences were removed. It can be found in the attachment to this report, under the name of 1e6vC.msf. Its statistics, from the alistat program are the following:

Format: MSF
Number of sequences: 44
Total number of residues: 10782
Smallest: 238
Largest: 248
Average length: 245.0
Alignment length: 248
Average identity: 59%
Most related pair: 99%
Most unrelated pair: 46%
Most distant seq: 65%

Furthermore, 17% of residues show as conserved in this alignment. The alignment consists of 13% prokaryotic, and 56% archaean sequences. (Descriptions of some sequences were not readily available.) The file containing the sequence descriptions can be found in the attachment, under the name 1e6vC.descr.

2.3 Residue ranking in 1e6vC
The 1e6vC sequence is shown in Figs. 1–2, with each residue colored according to its estimated importance. The full listing of residues in 1e6vC can be found in the file called 1e6vC.ranks sorted in the attachment.

Fig. 1. Residues 7-130 in 1e6vC colored by their relative importance. (See Appendix, Fig.26, for the coloring scheme.)

Fig. 2. Residues 131-254 in 1e6vC colored by their relative importance. (See Appendix, Fig.26, for the coloring scheme.)

2.4 Top ranking residues in 1e6vC and their position on the structure
In the following we consider residues ranking among top 25% of residues in the protein. Figure 3 shows residues in 1e6vC colored by their importance: bright red and yellow indicate more conserved/important residues (see Appendix for the coloring scheme). A Pymol script for producing this figure can be found in the attachment.

Fig. 3. Residues in 1e6vC, colored by their relative importance. Clockwise: front, back, top and bottom views.

2.4.1 Clustering of residues at 25% coverage. Fig. 4 shows the top 25% of all residues, this time colored according to clusters they belong to. The clusters in Fig.4 are composed of the residues listed in Table 1.

Table 1.
cluster color  size  member residues
red            51    31,33,41,46,47,49,50,53,57
                     58,59,60,62,79,80,83,85,87
                     88,90,93,94,102,103,105,115
                     117,119,121,122,123,124,128
                     130,131,135,150,155,156,159
                     160,161,162,165,166,170,172
                     178,192,206,208
blue           3     11,19,20
yellow         3     99,214,218

Table 1. Clusters of top ranking residues in 1e6vC.

Fig. 4. Residues in 1e6vC, colored according to the cluster they belong to: red, followed by blue and yellow are the largest clusters (see Appendix for the coloring scheme). Clockwise: front, back, top and bottom views. The corresponding Pymol script is attached.
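The attached Pymol script is not reproduced in this report. As a rough idea of how cluster coloring like that in Fig. 4 could be scripted, the sketch below writes Pymol `select` and `color` commands for the two smaller clusters of Table 1. The helper name, the selection names, and the assumption that the structure is loaded with chain identifier C are illustrative only.

```python
def cluster_coloring_script(chain, clusters):
    """Build Pymol commands that color each cluster of residues.

    clusters: dict mapping a Pymol color name -> list of residue numbers.
    Returns the script as a single string, one command per line.
    The naming scheme (cluster_<color>) is an assumption for this sketch.
    """
    lines = []
    for color, residues in clusters.items():
        # Pymol accepts residue lists joined with '+', e.g. 11+19+20
        resi = "+".join(str(r) for r in residues)
        name = f"cluster_{color}"
        lines.append(f"select {name}, chain {chain} and resi {resi}")
        lines.append(f"color {color}, {name}")
    return "\n".join(lines)


# Blue and yellow clusters as listed in Table 1.
table1 = {"blue": [11, 19, 20], "yellow": [99, 214, 218]}
print(cluster_coloring_script("C", table1))
```

The printed text can be saved to a `.pml` file and run inside Pymol after loading the structure; the large red cluster can be added to the dictionary in the same way.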

2.4.2 Overlap with known functional surfaces at 25% coverage. The name of the ligand is composed of the source PDB identifier and the heteroatom name used in that file. Factor 430 binding site. Table 2 lists the top 25% of residues at the interface with 1e6vDF43553 (factor 430). The following table (Table 3) suggests possible disruptive replacements for these residues (see Section 5.6).

Table 2.
res  type  subst's (%)       cvg   noc/bb  dist (Å)  antn
122  S     S(100)            0.18  16/10   2.92      site
123  G     G(100)            0.18  12/12   2.41      site
156  G     G(100)            0.18  2/2     4.64
159  V     V(100)            0.18  22/10   2.77
161  G     G(100)            0.18  4/4     3.82
162  H     H(100)            0.18  8/1     3.26
121  L     L(97) S(2)        0.23  13/6    4.01      site
124  R     R(97) G(2)        0.23  23/1    3.60      site
160  H     H(97) Y(2)        0.23  25/11   2.68      site

Table 2. The top 25% of residues in 1e6vC at the interface with factor 430. (Field names: res: residue number in the PDB entry; type: amino acid type; substs: substitutions seen in the alignment, with the percentage of each type in the bracket; noc/bb: number of contacts with the ligand, with the number of contacts realized through backbone atoms given after the slash; dist: distance of closest approach to the ligand.)
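To make the field layout concrete, here is a small sketch that splits one flattened row of Table 2 into named fields. The `ContactRow` type and the `parse_row` function are hypothetical names introduced for this example; only the row text is copied from Table 2.

```python
import re
from dataclasses import dataclass


@dataclass
class ContactRow:
    res: int        # residue number in the PDB entry
    aa: str         # amino acid type
    substs: dict    # substitution letter -> percentage in the alignment
    cvg: float      # value of the cvg column
    noc: int        # number of contacts with the ligand
    bb: int         # contacts realized through backbone atoms
    dist: float     # distance of closest approach, in Angstrom
    antn: str       # annotation such as "site", or "" if absent


def parse_row(text):
    """Parse a row such as '121 L L(97) S(2) 0.23 13/6 4.01 site'."""
    tokens = text.split()
    res, aa = int(tokens[0]), tokens[1]
    substs = {}
    i = 2
    # Consume every token shaped like X(NN); the first non-match is cvg.
    while (m := re.fullmatch(r"(.)\((\d+)\)", tokens[i])):
        substs[m.group(1)] = int(m.group(2))
        i += 1
    cvg = float(tokens[i])
    noc, bb = (int(x) for x in tokens[i + 1].split("/"))
    dist = float(tokens[i + 2])
    antn = tokens[i + 3] if len(tokens) > i + 3 else ""
    return ContactRow(res, aa, substs, cvg, noc, bb, dist, antn)


row = parse_row("121 L L(97) S(2) 0.23 13/6 4.01 site")
print(row.res, row.substs, row.noc, row.bb, row.antn)
```

Rows without an annotation, such as residue 156, simply yield an empty `antn` field.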

Table 3.
res  type  disruptive mutations
122  S     (KR)(FQMWH)(NYELPI)(D)
123  G     (KER)(FQMWHD)(NYLPI)(SVA)
156  G     (KER)(FQMWHD)(NYLPI)(SVA)
159  V     (KYER)(QHD)(N)(FTMW)
161  G     (KER)(FQMWHD)(NYLPI)(SVA)
162  H     (E)(TQMD)(SNKVCLAPIG)(YR)
121  L     (R)(Y)(H)(K)
124  R     (D)(E)(TYLPI)(SFVMAW)
160  H     (E)(QM)(KD)(TNVLAPI)

Table 3. List of disruptive mutations for the top 25% of residues in 1e6vC, that are at the interface with factor 430.

Fig. 5. Residues in 1e6vC, at the interface with factor 430, colored by their relative importance. The ligand (factor 430) is colored green. Atoms further than 30 Å away from the geometric center of the ligand, as well as on the line of sight to the ligand were removed. (See Appendix for the coloring scheme for the protein chain 1e6vC.)

Figure 5 shows residues in 1e6vC colored by their importance, at the interface with 1e6vDF43553.

Interface with 1e6vD. Table 4 lists the top 25% of residues at the interface with 1e6vD. The following table (Table 5) suggests possible disruptive replacements for these residues (see Section 5.6).

Table 4.
res  type  subst's (%)       cvg   noc/bb  dist (Å)  antn
85   R     R(100)            0.18  8/0     3.53
87   R     R(100)            0.18  31/2    2.67
90   Q     Q(100)            0.18  1/0     4.35
117  D     D(100)            0.18  1/0     4.70
128  E     E(100)            0.18  21/0    3.00
156  G     G(100)            0.18  18/18   3.00
88   Y     Y(97) S(2)        0.23  37/2    3.32
124  R     R(97) G(2)        0.23  20/0    2.77      site
160  H     H(97) Y(2)        0.23  27/0    3.77      site
45   G     A(20) G(77) S(2)  0.25  1/1     4.35

Table 4. The top 25% of residues in 1e6vC at the interface with 1e6vD. (Field names as in Table 2.)

Table 5.
res  type  disruptive mutations
85   R     (TD)(SYEVCLAPIG)(FMW)(N)
87   R     (TD)(SYEVCLAPIG)(FMW)(N)
90   Q     (Y)(FTWH)(SVCAG)(D)
117  D     (R)(FWH)(KYVCAG)(TQM)
128  E     (FWH)(YVCARG)(T)(SNKLPI)
156  G     (KER)(FQMWHD)(NYLPI)(SVA)
88   Y     (K)(QM)(R)(NELPI)
124  R     (D)(E)(TYLPI)(SFVMAW)
160  H     (E)(QM)(KD)(TNVLAPI)
45   G     (KR)(E)(QH)(FMW)

Table 5. List of disruptive mutations for the top 25% of residues in 1e6vC, that are at the interface with 1e6vD.

Fig. 6. Residues in 1e6vC, at the interface with 1e6vD, colored by their relative importance. 1e6vD is shown in backbone representation (See Appendix for the coloring scheme for the protein chain 1e6vC.)

Figure 6 shows residues in 1e6vC colored by their importance, at the interface with 1e6vD.

Interface with 1e6vB. Table 6 lists the top 25% of residues at the interface with 1e6vB. The following table (Table 7) suggests possible disruptive replacements for these residues (see Section 5.6).

Table 6.
res  type  subst's (%)       cvg   noc/bb  dist (Å)
57   H     H(100)            0.18  44/2    3.59
60   L     L(100)            0.18  13/0    3.43
74   V     V(100)            0.18  49/13   3.32
80   A     A(100)            0.18  14/5    3.49
112  R     R(81) N(18)       0.18  5/5     4.26
115  G     G(100)            0.18  27/27   3.02
117  D     D(100)            0.18  59/7    2.85
119  G     G(100)            0.18  2/2     4.69
128  E     E(100)            0.18  21/4    3.19
130  R     R(100)            0.18  32/0    2.69
249  G     G(97) A(2)        0.20  14/14   3.48
103  Y     Y(97) R(2)        0.23  14/1    3.21

Table 6. The top 25% of residues in 1e6vC at the interface with 1e6vB. (Field names as in Table 2.)

Table 7.
res  type  disruptive mutations
57   H     (E)(TQMD)(SNKVCLAPIG)(YR)
60   L     (YR)(TH)(SKECG)(FQWD)
74   V     (KYER)(QHD)(N)(FTMW)
80   A     (KYER)(QHD)(N)(FTMW)
112  R     (T)(YD)(SEVCAG)(FLWPI)
115  G     (KER)(FQMWHD)(NYLPI)(SVA)
117  D     (R)(FWH)(KYVCAG)(TQM)
119  G     (KER)(FQMWHD)(NYLPI)(SVA)
128  E     (FWH)(YVCARG)(T)(SNKLPI)
130  R     (TD)(SYEVCLAPIG)(FMW)(N)
249  G     (KER)(QHD)(FYMW)(N)
103  Y     (KM)(EVQLAPI)(ND)(SCRG)

Table 7. List of disruptive mutations for the top 25% of residues in 1e6vC, that are at the interface with 1e6vB.

Fig. 7. Residues in 1e6vC, at the interface with 1e6vB, colored by their relative importance. 1e6vB is shown in backbone representation (See Appendix for the coloring scheme for the protein chain 1e6vC.)

Figure 7 shows residues in 1e6vC colored by their importance, at the interface with 1e6vB.

COM binding site. Table 8 lists the top 25% of residues at the interface with 1e6vDCOM555 (com). The following table (Table 9) suggests possible disruptive replacements for these residues (see Section 5.6).

Table 8.
res  type  subst's (%)       cvg   noc/bb  dist (Å)  antn
121  L     L(97) S(2)        0.23  7/0     3.13      site
124  R     R(97) G(2)        0.23  7/0     2.86      site

Table 8. The top 25% of residues in 1e6vC at the interface with COM. (Field names as in Table 2.)

Table 9.
res  type  disruptive mutations
121  L     (R)(Y)(H)(K)
124  R     (D)(E)(TYLPI)(SFVMAW)

Table 9. List of disruptive mutations for the top 25% of residues in 1e6vC, that are at the interface with COM.

Fig. 8. Residues in 1e6vC, at the interface with COM, colored by their relative importance. The ligand (COM) is colored green. Atoms further than 30 Å away from the geometric center of the ligand, as well as on the line of sight to the ligand were removed. (See Appendix for the coloring scheme for the protein chain 1e6vC.)

Figure 8 shows residues in 1e6vC colored by their importance, at the interface with 1e6vDCOM555.

Interface with 1e6vA. Table 10 lists the top 25% of residues at the interface with 1e6vA. The following table (Table 11) suggests possible disruptive replacements for these residues (see Section 5.6).

Table 10.
res  type  subst's (%)       cvg   noc/bb  dist (Å)  antn
19   R     R(100)            0.18  10/0    3.04
93   D     D(100)            0.18  4/4     3.75
94   S     S(100)            0.18  2/2     4.88
102  P     P(100)            0.18  25/6    3.28
117  D     D(100)            0.18  1/0     4.58
119  G     G(100)            0.18  12/12   3.26
122  S     S(100)            0.18  56/25   2.36      site
123  G     G(100)            0.18  2/2     4.59      site
128  E     E(100)            0.18  1/0     4.68
162  H     H(100)            0.18  35/0    3.00
165  R     R(100)            0.18  126/23  2.89
170  G     G(100)            0.18  4/4     4.08
172  M     M(100)            0.18  2/2     4.13
249  G     G(97) A(2)        0.20  21/21   3.62
222  R     R(86) H(13)       0.21  33/11   3.52
103  Y     Y(97) R(2)        0.23  55/7    3.09
121  L     L(97) S(2)        0.23  40/13   3.60      site
160  H     H(97) Y(2)        0.23  31/0    2.94      site
166  L     L(97) F(2)        0.23  50/37   3.05

Table 10. The top 25% of residues in 1e6vC at the interface with 1e6vA. (Field names as in Table 2.)

Table 11.
res  type  disruptive mutations
19   R     (TD)(SYEVCLAPIG)(FMW)(N)
93   D     (R)(FWH)(KYVCAG)(TQM)
94   S     (KR)(FQMWH)(NYELPI)(D)
102  P     (YR)(TH)(SKECG)(FQWD)
117  D     (R)(FWH)(KYVCAG)(TQM)
119  G     (KER)(FQMWHD)(NYLPI)(SVA)
122  S     (KR)(FQMWH)(NYELPI)(D)
123  G     (KER)(FQMWHD)(NYLPI)(SVA)
128  E     (FWH)(YVCARG)(T)(SNKLPI)
162  H     (E)(TQMD)(SNKVCLAPIG)(YR)
165  R     (TD)(SYEVCLAPIG)(FMW)(N)
170  G     (KER)(FQMWHD)(NYLPI)(SVA)
172  M     (Y)(TH)(SCRG)(FWD)
249  G     (KER)(QHD)(FYMW)(N)
222  R     (TD)(E)(SVCLAPIG)(YM)
103  Y     (KM)(EVQLAPI)(ND)(SCRG)
121  L     (R)(Y)(H)(K)
160  H     (E)(QM)(KD)(TNVLAPI)
166  L     (R)(TY)(KE)(SCHG)

Table 11. List of disruptive mutations for the top 25% of residues in 1e6vC, that are at the interface with 1e6vA.
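The disruptive-mutation entries in tables such as Table 11 are written as groups of one-letter amino acid codes in parentheses. A minimal sketch for splitting such an entry into its groups follows. The function name is hypothetical, and the sketch makes no claim about how the groups are ordered or scored; that is the subject of Section 5.6.

```python
import re


def parse_mutation_groups(text):
    """Split an entry like '(TD)(SYEVCLAPIG)(FMW)(N)' into lists of residues."""
    groups = re.findall(r"\(([A-Z]+)\)", text)
    # Reject input that contains anything besides parenthesized letter groups.
    if "".join(f"({g})" for g in groups) != text.strip():
        raise ValueError(f"unexpected format: {text!r}")
    return [list(g) for g in groups]


# Entry for residue 19 (R) in Table 11.
groups = parse_mutation_groups("(TD)(SYEVCLAPIG)(FMW)(N)")
print(groups[0])                      # residues in the first group
print(sum(len(g) for g in groups))    # total number of suggested replacements
```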

Fig. 9. Residues in 1e6vC, at the interface with 1e6vA, colored by their relative importance. 1e6vA is shown in backbone representation (See Appendix for the coloring scheme for the protein chain 1e6vC.)

Figure 9 shows residues in 1e6vC colored by their importance, at the interface with 1e6vA.

3 CHAIN 1E6VA
3.1 Q49605 overview
From SwissProt, id Q49605, 96% identical to 1e6vA:
Description: Methyl-coenzyme M reductase I alpha subunit (EC 2.8.4.1) (Coenzyme-B sulfoethylthiotransferase alpha) (MCR I alpha).
Organism, scientific name: Methanopyrus kandleri.
Taxonomy: Archaea; Euryarchaeota; Methanopyri; Methanopyrales; Methanopyraceae; Methanopyrus.
Function: Reduction of methyl-coenzyme M (2-(methylthio)ethanesulfonic acid) with 7-mercaptoheptanoylthreonine phosphate to methane and an heterodisulfide.
Catalytic activity: 2-(methylthio)ethanesulfonate (methyl-CoM) + N-(7-mercaptoheptanoyl)threonine 3-O-phosphate (coenzyme B) = CoM-S-S-CoB + methane.
Cofactor: Binds 2 coenzyme F430 noncovalently per hexamer. Coenzyme F430 is a yellow nickel porphinoid (By similarity).
Pathway: Methanogenesis; last step.
Subunit: Hexamer of two alpha, two beta, and two gamma chains.
About: This Swiss-Prot entry is copyright. It is produced through a collaboration between the Swiss Institute of Bioinformatics and the EMBL outstation - the European Bioinformatics Institute. There are no restrictions on its use as long as its content is in no way modified and this statement is not removed.

3.2 Multiple sequence alignment for 1e6vA
For the chain 1e6vA, the alignment 1e6vA.msf (attached) with 14 sequences was used. The alignment was assembled through combination of BLAST searching on the UniProt database and alignment using Muscle program. It can be found in the attachment to this report, under the name of 1e6vA.msf. Its statistics, from the alistat program are the following:

Format: MSF
Number of sequences: 14
Total number of residues: 7621
Smallest: 543
Largest: 545
Average length: 544.4
Alignment length: 545
Average identity: 64%
Most related pair: 99%
Most unrelated pair: 49%
Most distant seq: 77%

Furthermore, 32% of residues show as conserved in this alignment. The alignment consists of 14% prokaryotic, and 92% archaean sequences. (Descriptions of some sequences were not readily available.) The file containing the sequence descriptions can be found in the attachment, under the name 1e6vA.descr.

3.3 Residue ranking in 1e6vA
The 1e6vA sequence is shown in Figs. 10–11, with each residue colored according to its estimated importance. The full listing of residues in 1e6vA can be found in the file called 1e6vA.ranks sorted in the attachment.

Fig. 10. Residues 8-279 in 1e6vA colored by their relative importance. (See Appendix, Fig.26, for the coloring scheme.)

Fig. 11. Residues 280-552 in 1e6vA colored by their relative importance. (See Appendix, Fig.26, for the coloring scheme.)

3.4 Top ranking residues in 1e6vA and their position on the structure
In the following we consider residues ranking among top 32% of residues in the protein (the closest this analysis allows us to get to 25%). Figure 12 shows residues in 1e6vA colored by their importance: bright red and yellow indicate more conserved/important residues (see Appendix for the coloring scheme). A Pymol script for producing this figure can be found in the attachment.

Fig. 12. Residues in 1e6vA, colored by their relative importance. Clockwise: front, back, top and bottom views.

3.4.1 Clustering of residues at 32% coverage. Fig. 13 shows the top 32% of all residues, this time colored according to clusters they belong to. The clusters in Fig.13 are composed of the residues listed in Table 12.

Table 12.
cluster color  size  member residues
red            164   16,17,36,39,40,42,68,69,70
                     75,87,88,89,93,94,96,97,98
                     99,101,102,103,106,115,116
                     120,123,124,126,127,128,129
                     132,135,137,140,141,144,145
                     147,150,151,153,154,155,158
                     160,163,165,167,191,195,209
                     214,215,217,222,223,228,229
                     232,233,236,239,241,242,246
                     247,248,271,274,275,276,279
                     280,283,287,291,302,303,308
                     316,318,319,320,322,324,326
                     327,328,329,330,331,332,333
                     334,335,336,337,339,341,346
                     351,388,392,396,399,400,401
                     402,404,405,409,410,430,433
                     437,439,443,444,445,446,448
                     451,452,453,454,455,457,462
                     466,467,468,472,474,475,477
                     478,479,480,481,482,483,484
                     485,486,487,491,493,499,504
                     505,510,513,516,518,523,531
                     535,536,538,544,545,546
blue           5     170,171,176,177,204
yellow         2     259,260

Table 12. Clusters of top ranking residues in 1e6vA.

Fig. 13. Residues in 1e6vA, colored according to the cluster they belong to: red, followed by blue and yellow are the largest clusters (see Appendix for the coloring scheme). Clockwise: front, back, top and bottom views. The corresponding Pymol script is attached.
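Section 3.4 notes that 32% is the closest this analysis gets to 25% coverage for 1e6vA. One plausible reason, assumed here rather than stated in the report, is that residues sharing the same rank must be admitted together. The sketch below illustrates that behavior on toy data; the function name, the convention that a lower rank means a more important residue, and the input format are assumptions and do not describe the contents of 1e6vA.ranks sorted.

```python
from collections import defaultdict


def residues_at_coverage(ranks, target=0.25):
    """Select top-ranked residues until the target coverage is reached.

    ranks: dict mapping residue number -> rank (assumed: lower = more important).
    Residues with equal rank are admitted as a block, so the result is the
    smallest reachable coverage at or above the target.
    Returns (selected residue numbers, achieved coverage).
    """
    if not ranks:
        return [], 0.0
    by_rank = defaultdict(list)
    for res, rank in ranks.items():
        by_rank[rank].append(res)

    selected = []
    for rank in sorted(by_rank):
        selected.extend(sorted(by_rank[rank]))
        if len(selected) / len(ranks) >= target:
            break
    return selected, len(selected) / len(ranks)


# Toy data only: ten residues, four of them tied at rank 2.
toy = {1: 1, 2: 1, 3: 2, 4: 2, 5: 2, 6: 2, 7: 3, 8: 4, 9: 5, 10: 6}
chosen, coverage = residues_at_coverage(toy, target=0.25)
print(chosen, coverage)
```

With these toy ranks the first block covers 20% and the next admissible step jumps to 60%, so a 25% target cannot be met exactly.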

4 CHAIN 1E6VB
4.1 Q49601 overview
From SwissProt, id Q49601, 89% identical to 1e6vB:
Description: Methyl coenzyme M reductase, beta subunit.
Organism, scientific name: Methanopyrus kandleri.
Taxonomy: Archaea; Euryarchaeota; Methanopyri; Methanopyrales; Methanopyraceae; Methanopyrus.

Fig. 14. Residues 7-224 in 1e6vB colored by their relative importance. (See Appendix, Fig.26, for the coloring scheme.)

4.2 Multiple sequence alignment for 1e6vB
For the chain 1e6vB, the alignment 1e6vB.msf (attached) with 40 sequences was used. The alignment was downloaded from the HSSP database, and fragments shorter than 75% of the query as well as duplicate sequences were removed. It can be found in the attachment to this report, under the name of 1e6vB.msf. Its statistics, from the alistat program, are the following:

Format: MSF Number of sequences: 40 Total number of residues: 17186 Fig. 15. Residues 225-442 in 1e6vB colored by their relative importance. (See Appendix, Fig.26, for the coloring scheme.) Smallest: 389 Largest: 436 Average length: 429.6 Pymol script for producing this figure can be found in the attachment. Alignment length: 436 Average identity: 58% Most related pair: 98% 4.4.1 Clustering of residues at 25% coverage. Fig. 17 shows the Most unrelated pair: 46% top 25% of all residues, this time colored according to clusters they Most distant seq: 63% belong to. The clusters in Fig.17 are composed of the residues listed in Table 13.

Furthermore, 15% of residues show as conserved in this alignment. Table 13. The alignment consists of 12% prokaryotic, and 52% archaean cluster size member sequences. (Descriptions of some sequences were not readily availa- color residues ble.) The file containing the sequence descriptions can be found in red 84 16,30,33,187,188,190,193,194 the attachment, under the name 1e6vB.descr. 195,199,220,226,227,228,232 235,236,238,239,240,243,245 4.3 Residue ranking in 1e6vB 246,247,248,249,251,277,283 299,303,304,305,323,325,326 The 1e6vB sequence is shown in Figs. 14–15, with each residue colo- 333,334,336,337,340,346,349 red according to its estimated importance. The full listing of residues 351,355,358,359,360,361,362 in 1e6vB can be found in the file called 1e6vB.ranks sorted in the 363,364,365,366,367,368,369 attachment. 370,371,372,373,374,376,378 4.4 Top ranking residues in 1e6vB and their position on 379,380,382,383,384,385,388 the structure 392,396,401,402,403,404,405 406,407,412,413,429,433 In the following we consider residues ranking among top 25% of continued in next column residues in the protein . Figure 16 shows residues in 1e6vB colored by their importance: bright red and yellow indicate more conser- ved/important residues (see Appendix for the coloring scheme). A

9 Table 13. continued cluster size member color residues purple 2 110,111

Table 13. Clusters of top ranking residues in 1e6vB.

4.4.2 Overlap with known functional surfaces at 25% coverage. The name of the ligand is composed of the source PDB identifier and the heteroatom name used in that file. Interface with 1e6vE.Table 14 lists the top 25% of residues at the interface with 1e6vE. The following table (Table 15) suggests possible disruptive replacements for these residues (see Section 5.6). Table 14. res type subst’s cvg noc/ dist (%) bb (A˚ ) 121 R R(100) 0.15 26/7 3.52 187 E E(100) 0.15 3/3 4.20 190 G G(100) 0.15 18/18 3.07 226 E E(100) 0.15 78/22 2.72 228 G G(100) 0.15 17/17 3.86 Fig. 16. Residues in 1e6vB, colored by their relative importance. Clockwise: front, back, top and bottom views. 232 G G(100) 0.15 9/9 3.76 183 P P(97) 0.18 35/15 3.58 T(2) 193 L L(97) 0.19 5/1 4.10 Y(2) 138 A A(97) 0.20 3/2 4.15 S(2) 165 Y Y(97) 0.20 20/0 3.09 C(2) 227 M M(90) 0.22 22/20 3.62 T(10) 188 G G(94) 0.23 17/17 3.42 S(5) 30 P P(97) 0.24 13/8 3.65 .(2) 133 L T(52) 0.25 36/6 3.56 L(45) S(2)

Table 14. The top 25% of residues in 1e6vB at the interface with 1e6vE. (Field names: res: residue number in the PDB entry; type: amino acid type; substs: substitutions seen in the alignment; with the percentage of each type in the bracket; noc/bb: number of contacts with the ligand, with the number of contacts realized through backbone atoms given in the bracket; dist: distance of closest apporach to the ligand. ) Fig. 17. Residues in 1e6vB, colored according to the cluster they belong to: red, followed by blue and yellow are the largest clusters (see Appendix for the coloring scheme). Clockwise: front, back, top and bottom views. The Table 15. corresponding Pymol script is attached. res type disruptive mutations Table 13. continued 121 R (TD)(SYEVCLAPIG)(FMW)(N) cluster size member 187 E (FWH)(YVCARG)(T)(SNKLPI) color residues 190 G (KER)(FQMWHD)(NYLPI)(SVA) blue 6 258,260,264,265,269,317 226 E (FWH)(YVCARG)(T)(SNKLPI) yellow 4 100,102,118,121 continued in next column green 4 163,165,166,173 continued in next column

10 Table 15. continued Table 16. continued res type disruptive res type subst’s cvg noc/ dist mutations (%) bb (A˚ ) 228 G (KER)(FQMWHD)(NYLPI)(SVA) 317 V V(100) 0.15 12/10 3.54 232 G (KER)(FQMWHD)(NYLPI)(SVA) 323 R R(100) 0.15 71/10 2.66 183 P (R)(YH)(K)(E) 326 Q Q(100) 0.15 45/8 2.85 193 L (R)(K)(TYEH)(SQCG) 334 Y Y(100) 0.15 61/0 3.01 138 A (KR)(YE)(QH)(D) 340 E E(100) 0.15 30/4 2.74 165 Y (K)(QM)(ER)(NLPI) 346 P P(100) 0.15 27/5 3.57 227 M (YH)(R)(FTW)(SKCDG) 351 G G(100) 0.15 11/11 3.89 188 G (KR)(E)(FQMWH)(D) 365 H H(100) 0.15 21/0 2.86 30 P (YR)(TH)(SCG)(KE) 401 D D(100) 0.15 5/5 3.85 133 L (R)(Y)(H)(K) 402 A A(100) 0.15 34/23 3.37 404 T T(100) 0.15 17/6 3.54 Table 15. List of disruptive mutations for the top 25% of residues in 337 D D(80) 0.17 15/0 2.58 1e6vB, that are at the interface with 1e6vE. V(20) 403 G G(80) 0.17 22/22 3.59 D(20) 264 G G(97) 0.18 17/17 3.13 A(2) 260 H N(97) 0.19 1/1 4.18 H(2) 265 T T(97) 0.22 35/3 3.09 V(2) 333 L V(2) 0.24 2/0 4.54 L(77) M(20) 344 G G(90) 0.25 28/28 2.85 S(5) A(5)

Table 16. The top 25% of residues in 1e6vB at the interface with 1e6vC. (Field names: res: residue number in the PDB entry; type: amino acid type; substs: substitutions seen in the alignment; with the percentage of each type in the bracket; noc/bb: number of contacts with the ligand, with the number of contacts realized through backbone atoms given in the bracket; dist: distance of closest apporach to the ligand. )

Table 17. res type disruptive Fig. 18. Residues in 1e6vB, at the interface with 1e6vE, colored by their rela- mutations tive importance. 1e6vE is shown in backbone representation (See Appendix 258 K (Y)(FTW)(SVCAG)(HD) for the coloring scheme for the protein chain 1e6vB.) 295 Y (K)(QM)(NEVLAPIR)(D) 317 V (KYER)(QHD)(N)(FTMW) 323 R (TD)(SYEVCLAPIG)(FMW)(N) Figure 18 shows residues in 1e6vB colored by their importance, at 326 Q (Y)(FTWH)(SVCAG)(D) the interface with 1e6vE. 334 Y (K)(QM)(NEVLAPIR)(D) Interface with 1e6vC.Table 16 lists the top 25% of residues at 340 E (FWH)(YVCARG)(T)(SNKLPI) the interface with 1e6vC. The following table (Table 17) suggests 346 P (YR)(TH)(SKECG)(FQWD) possible disruptive replacements for these residues (see Section 5.6). 351 G (KER)(FQMWHD)(NYLPI)(SVA) Table 16. 365 H (E)(TQMD)(SNKVCLAPIG)(YR) res type subst’s cvg noc/ dist 401 D (R)(FWH)(KYVCAG)(TQM) 402 A (KYER)(QHD)(N)(FTMW) (%) bb (A˚ ) 404 T (KR)(FQMWH)(NELPI)(D) 258 K K(100) 0.15 16/11 3.92 337 D (R)(H)(FKYW)(QCG) 295 Y Y(100) 0.15 24/6 3.13 continued in next column continued in next column

11 Table 17. continued Table 18. continued res type disruptive res type subst’s cvg noc/ dist antn mutations (%) bb (A˚ ) 403 G (R)(K)(FWH)(EQM) 380 H H(100) 0.15 10/0 3.23 site 264 G (KER)(QHD)(FYMW)(N) 382 V V(100) 0.15 1/0 3.69 260 H (E)(T)(MD)(SVQCAG) 265 T (KR)(QH)(FEMW)(N) Table 18. The top 25% of residues in 1e6vB at the interface with 333 L (Y)(R)(H)(T) TP7.(Field names: res: residue number in the PDB entry; type: amino acid 344 G (KR)(E)(QH)(FMW) type; substs: substitutions seen in the alignment; with the percentage of each type in the bracket; noc/bb: number of contacts with the ligand, with the num- Table 17. List of disruptive mutations for the top 25% of residues in ber of contacts realized through backbone atoms given in the bracket; dist: 1e6vB, that are at the interface with 1e6vC. distance of closest apporach to the ligand. )

Table 19. res type disruptive mutations 362 F (KE)(TQD)(SNCRG)(M) 363 F (KE)(TQD)(SNCRG)(M) 368 Y (K)(QM)(NEVLAPIR)(D) 369 G (KER)(FQMWHD)(NYLPI)(SVA) 370 G (KER)(FQMWHD)(NYLPI)(SVA) 380 H (E)(TQMD)(SNKVCLAPIG)(YR) 382 V (KYER)(QHD)(N)(FTMW)

Table 19. List of disruptive mutations for the top 25% of residues in 1e6vB, that are at the interface with TP7.

Fig. 19. Residues in 1e6vB, at the interface with 1e6vC, colored by their rela- tive importance. 1e6vC is shown in backbone representation (See Appendix for the coloring scheme for the protein chain 1e6vB.)

Figure 19 shows residues in 1e6vB colored by their importance, at the interface with 1e6vC. TP7 binding site. Table 18 lists the top 25% of residues at the interface with 1e6vATP7554 (tp7). The following table (Table 19) suggests possible disruptive replacements for these residues (see Section 5.6). Table 18. res type subst’s cvg noc/ dist antn (%) bb (A˚ ) 362 F F(100) 0.15 6/0 3.99 363 F F(100) 0.15 19/0 3.73 368 Y Y(100) 0.15 21/6 3.58 site 369 G G(100) 0.15 9/9 3.90 Fig. 20. Residues in 1e6vB, at the interface with TP7, colored by their relative 370 G G(100) 0.15 13/13 2.91 site importance. The ligand (TP7) is colored green. Atoms further than 30A˚ away continued in next column from the geometric center of the ligand, as well as on the line of sight to the ligand were removed. (See Appendix for the coloring scheme for the protein chain 1e6vB.)

12 Figure 20 shows residues in 1e6vB colored by their importance, at the interface with 1e6vATP7554. Factor 430 binding site. Table 20 lists the top 25% of resi- dues at the interface with 1e6vDF43553 (factor 430). The following table (Table 21) suggests possible disruptive replacements for these residues (see Section 5.6). Table 20. res type subst’s cvg noc/ dist antn (%) bb (A˚ ) 362 F F(100) 0.15 1/0 4.94 367 I I(100) 0.15 18/3 3.53 368 Y Y(100) 0.15 32/0 3.09 site 366 S S(97) 0.18 5/2 3.46 site G(2)

Table 20. The top 25% of residues in 1e6vB at the interface with factor 430.(Field names: res: residue number in the PDB entry; type: amino acid type; substs: substitutions seen in the alignment; with the percentage of each type in the bracket; noc/bb: number of contacts with the ligand, with the num- ber of contacts realized through backbone atoms given in the bracket; dist: distance of closest apporach to the ligand. )

Fig. 21. Residues in 1e6vB, at the interface with factor 430, colored by their Table 21. relative importance. The ligand (factor 430) is colored green. Atoms further than 30A˚ away from the geometric center of the ligand, as well as on the line res type disruptive of sight to the ligand were removed. (See Appendix for the coloring scheme mutations for the protein chain 1e6vB.) 362 F (KE)(TQD)(SNCRG)(M) 367 I (YR)(TH)(SKECG)(FQWD) 368 Y (K)(QM)(NEVLAPIR)(D) Table 22. continued 366 S (KR)(FQMWH)(E)(NYLPI) res type subst’s cvg noc/ dist antn (%) bb (A˚ ) Table 21. List of disruptive mutations for the top 25% of residues in 372 G G(100) 0.15 6/6 3.64 1e6vB, that are at the interface with factor 430. 404 T T(100) 0.15 10/10 3.13 405 Q Q(80) 0.17 7/3 3.92 Figure 21 shows residues in 1e6vB colored by their importance, at M(20) the interface with 1e6vDF43553. 366 S S(97) 0.18 24/24 3.54 site Interface with 1e6vD.Table 22 lists the top 25% of residues at G(2) the interface with 1e6vD. The following table (Table 23) suggests 165 Y Y(97) 0.20 26/12 3.33 possible disruptive replacements for these residues (see Section 5.6). C(2) 407 F I(2) 0.21 44/1 3.25 Table 22. F(97) res type subst’s cvg noc/ dist antn 406 M M(77) 0.23 43/9 3.18 (%) bb (A˚ ) Y(20) 72 G G(100) 0.15 19/19 3.05 L(2) 163 G G(100) 0.15 6/6 3.34 133 L T(52) 0.25 3/0 4.49 166 P P(100) 0.15 43/11 3.12 L(45) 326 Q Q(100) 0.15 8/0 3.61 S(2) 363 F F(100) 0.15 2/2 4.26 364 S S(100) 0.15 23/21 3.24 Table 22. The top 25% of residues in 1e6vB at the interface with 1e6vD. 
365 H H(100) 0.15 55/31 3.40 (Field names: res: residue number in the PDB entry; type: amino acid type; 367 I I(100) 0.15 49/20 2.95 substs: substitutions seen in the alignment; with the percentage of each type 368 Y Y(100) 0.15 26/17 2.91 site in the bracket; noc/bb: number of contacts with the ligand, with the number of 369 G G(100) 0.15 19/19 3.16 contacts realized through backbone atoms given in the bracket; dist: distance 370 G G(100) 0.15 11/11 3.56 site of closest apporach to the ligand. ) 371 G G(100) 0.15 24/24 3.32 continued in next column

Table 23.
res  type  disruptive mutations
 72  G     (KER)(FQMWHD)(NYLPI)(SVA)
163  G     (KER)(FQMWHD)(NYLPI)(SVA)
166  P     (YR)(TH)(SKECG)(FQWD)
326  Q     (Y)(FTWH)(SVCAG)(D)
363  F     (KE)(TQD)(SNCRG)(M)
364  S     (KR)(FQMWH)(NYELPI)(D)
365  H     (E)(TQMD)(SNKVCLAPIG)(YR)
367  I     (YR)(TH)(SKECG)(FQWD)
368  Y     (K)(QM)(NEVLAPIR)(D)
369  G     (KER)(FQMWHD)(NYLPI)(SVA)
370  G     (KER)(FQMWHD)(NYLPI)(SVA)
371  G     (KER)(FQMWHD)(NYLPI)(SVA)
372  G     (KER)(FQMWHD)(NYLPI)(SVA)
404  T     (KR)(FQMWH)(NELPI)(D)
405  Q     (Y)(TH)(FW)(SCG)
366  S     (KR)(FQMWH)(E)(NYLPI)
165  Y     (K)(QM)(ER)(NLPI)
407  F     (KE)(T)(QDR)(SCG)
406  M     (Y)(T)(HR)(CG)
133  L     (R)(Y)(H)(K)

Table 23. List of disruptive mutations for the top 25% of residues in 1e6vB, that are at the interface with 1e6vD.

Fig. 22. Residues in 1e6vB, at the interface with 1e6vD, colored by their relative importance. 1e6vD is shown in backbone representation. (See Appendix for the coloring scheme for the protein chain 1e6vB.)

Figure 22 shows residues in 1e6vB colored by their importance, at the interface with 1e6vD.

Interface with 1e6vA. Table 24 lists the top 25% of residues at the interface with 1e6vA. The following table (Table 25) suggests possible disruptive replacements for these residues (see Section 5.6).

Table 24.
res  type  subst's(%)      cvg   noc/bb  dist(Å)  antn
187  E     E(100)          0.15  22/2    2.78
228  G     G(100)          0.15  1/1     4.93
351  G     G(100)          0.15  19/19   3.19
358  V     V(100)          0.15  50/17   3.27
361  S     S(100)          0.15  27/18   3.41
362  F     F(100)          0.15  71/21   3.24
363  F     F(100)          0.15  12/0    3.70
365  H     H(100)          0.15  18/3    3.53
368  Y     Y(100)          0.15  16/1    3.56     site
380  H     H(100)          0.15  26/3    3.28     site
382  V     V(100)          0.15  11/7    3.80
384  R     R(100)          0.15  29/1    2.93
385  H     H(100)          0.15  59/2    2.74
337  D     D(80)V(20)      0.17  6/0     2.83
359  G     G(80)S(20)      0.17  15/15   3.37
349  D     D(97)Y(2)       0.18  1/1     4.90
238  H     Q(34)H(65)      0.20  7/0     3.18
360  F     F(77)S(20)M(2)  0.21  1/1     4.85
227  M     M(90)T(10)      0.22  6/4     3.78
355  G     G(97)A(2)       0.22  33/33   3.26

Table 24. The top 25% of residues in 1e6vB at the interface with 1e6vA. (Field names: res: residue number in the PDB entry; type: amino acid type; substs: substitutions seen in the alignment, with the percentage of each type in the bracket; noc/bb: number of contacts with the ligand, with the number of contacts realized through backbone atoms given in the bracket; dist: distance of closest approach to the ligand.)

Table 25.
res  type  disruptive mutations
187  E     (FWH)(YVCARG)(T)(SNKLPI)
228  G     (KER)(FQMWHD)(NYLPI)(SVA)
351  G     (KER)(FQMWHD)(NYLPI)(SVA)
358  V     (KYER)(QHD)(N)(FTMW)
361  S     (KR)(FQMWH)(NYELPI)(D)
362  F     (KE)(TQD)(SNCRG)(M)
363  F     (KE)(TQD)(SNCRG)(M)
365  H     (E)(TQMD)(SNKVCLAPIG)(YR)
368  Y     (K)(QM)(NEVLAPIR)(D)
380  H     (E)(TQMD)(SNKVCLAPIG)(YR)
382  V     (KYER)(QHD)(N)(FTMW)
384  R     (TD)(SYEVCLAPIG)(FMW)(N)
385  H     (E)(TQMD)(SNKVCLAPIG)(YR)
337  D     (R)(H)(FKYW)(QCG)
359  G     (KR)(E)(FQMWH)(D)
349  D     (R)(K)(FVAWH)(QMCG)
238  H     (TE)(D)(SVMCAG)(QLPI)
360  F     (K)(E)(TQDR)(CG)
227  M     (YH)(R)(FTW)(SKCDG)
355  G     (KER)(QHD)(FYMW)(N)

Table 25. List of disruptive mutations for the top 25% of residues in 1e6vB, that are at the interface with 1e6vA.

Fig. 23. Residues in 1e6vB, at the interface with 1e6vA, colored by their relative importance. 1e6vA is shown in backbone representation. (See Appendix for the coloring scheme for the protein chain 1e6vB.)

Figure 23 shows residues in 1e6vB colored by their importance, at the interface with 1e6vA.

COM binding site. Table 26 lists the top 25% of residues at the interface with 1e6vDCOM555 (com). The following table (Table 27) suggests possible disruptive replacements for these residues (see Section 5.6).

Table 26.
res  type  subst's(%)      cvg   noc/bb  dist(Å)  antn
362  F     F(100)          0.15  9/0     3.56
365  H     H(100)          0.15  1/1     4.77
368  Y     Y(100)          0.15  9/0     3.30     site
366  S     S(97)G(2)       0.18  3/1     3.76     site

Table 26. The top 25% of residues in 1e6vB at the interface with COM. (Field names: res: residue number in the PDB entry; type: amino acid type; substs: substitutions seen in the alignment, with the percentage of each type in the bracket; noc/bb: number of contacts with the ligand, with the number of contacts realized through backbone atoms given in the bracket; dist: distance of closest approach to the ligand.)

Table 27.
res  type  disruptive mutations
362  F     (KE)(TQD)(SNCRG)(M)
365  H     (E)(TQMD)(SNKVCLAPIG)(YR)
368  Y     (K)(QM)(NEVLAPIR)(D)
366  S     (KR)(FQMWH)(E)(NYLPI)

Table 27. List of disruptive mutations for the top 25% of residues in 1e6vB, that are at the interface with COM.

Fig. 24. Residues in 1e6vB, at the interface with COM, colored by their relative importance. The ligand (COM) is colored green. Atoms further than 30Å away from the geometric center of the ligand, as well as on the line of sight to the ligand were removed. (See Appendix for the coloring scheme for the protein chain 1e6vB.)

Figure 24 shows residues in 1e6vB colored by their importance, at the interface with 1e6vDCOM555.
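The "disruptive mutations" strings in the tables above group amino acids in round brackets. According to Section 5.6, groups further to the left are presumed more disruptive. The sketch below splits such a string into its ordered groups and assigns each amino acid the index of its group. The function names `parse_groups` and `disruption_rank` are hypothetical helpers for illustration, not part of report maker.

```python
import re
from typing import Dict, List


def parse_groups(suggestion: str) -> List[str]:
    """Split e.g. '(KE)(TQD)(SNCRG)(M)' into ['KE', 'TQD', 'SNCRG', 'M']."""
    text = suggestion.strip()
    groups = re.findall(r"\(([A-Z]+)\)", text)
    # Reject strings that contain anything other than bracketed groups.
    if "".join("(%s)" % g for g in groups) != text:
        raise ValueError("unexpected suggestion format: %r" % suggestion)
    return groups


def disruption_rank(suggestion: str) -> Dict[str, int]:
    """Map each suggested amino acid to its group index.

    Index 0 is the leftmost group, i.e. the one presumed most disruptive.
    """
    return {
        aa: index
        for index, group in enumerate(parse_groups(suggestion))
        for aa in group
    }
```

With the Table 27 entry for residue 362 F, `disruption_rank("(KE)(TQD)(SNCRG)(M)")` places K and E in group 0 and M in group 3, so a lysine or glutamate substitution would be tried first under this reading.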

4.4.3 Possible novel functional surfaces at 25% coverage. One group of residues is conserved on the 1e6vB surface, away from (or substantially larger than) other functional sites and interfaces recognizable in PDB entry 1e6v. It is shown in Fig. 25. The residues belonging to this surface "patch" are listed in Table 28, while Table 29 suggests possible disruptive replacements for these residues (see Section 5.6).

Fig. 25. A possible active surface on the chain 1e6vB.

Table 28.
res  type  substitutions(%)  cvg
163  G     G(100)            0.15
166  P     P(100)            0.15
173  G     G(80)D(20)        0.17
138  A     A(97)S(2)         0.20
165  Y     Y(97)C(2)         0.20
 65  G     G(87)K(12)        0.25
133  L     T(52)L(45)S(2)    0.25

Table 28. Residues forming surface "patch" in 1e6vB.

Table 29.
res  type  disruptive mutations
163  G     (KER)(FQMWHD)(NYLPI)(SVA)
166  P     (YR)(TH)(SKECG)(FQWD)
173  G     (R)(K)(FWH)(EQM)
138  A     (KR)(YE)(QH)(D)
165  Y     (K)(QM)(ER)(NLPI)
 65  G     (FEW)(YHDR)(KM)(QLPI)
133  L     (R)(Y)(H)(K)

Table 29. Disruptive mutations for the surface patch in 1e6vB.

5 NOTES ON USING TRACE RESULTS

5.1 Coverage
Trace results are commonly expressed in terms of coverage: the residue is important if its "coverage" is small, that is, if it belongs to some small top percentage of residues [100% is all of the residues in a chain], according to trace. The ET results are presented in the form of a table, usually limited to the top 25% of residues (or to some nearby percentage), sorted by the strength of the presumed evolutionary pressure. (I.e., the smaller the coverage, the stronger the pressure on the residue.) Starting from the top of that list, mutating a couple of residues should affect the protein somehow, with the exact effects to be determined experimentally.

5.2 Known substitutions
One of the table columns is "substitutions": other amino acid types seen at the same position in the alignment. These amino acid types may be interchangeable at that position in the protein, so if one wants to affect the protein by a point mutation, they should be avoided. For example, if the substitutions are "RVK" and the original protein has an R at that position, it is advisable to try anything but RVK. Conversely, when looking for substitutions which will not affect the protein, one may try replacing R with K, or (perhaps more surprisingly) with V. The percentage of times the substitution appears in the alignment is given in the immediately following bracket. No percentage is given in the cases when it is smaller than 1%. This is meant to be a rough guide: due to rounding errors these percentages often do not add up to 100%.

5.3 Surface
To detect candidates for novel functional interfaces, first we look for residues that are solvent accessible (according to the DSSP program) by at least 10Å², which is roughly the area needed for one water molecule to come in contact with the residue. Furthermore, we require that these residues form a "cluster" of residues which have a neighbor within 5Å from any of their heavy atoms.
Note, however, that, if our picture of protein evolution is correct, the neighboring residues which are not surface accessible might be equally important in maintaining the interaction specificity - they should not be automatically dropped from consideration when choosing the set for mutagenesis. (Especially if they form a cluster with the surface residues.)

5.4 Number of contacts
Another column worth noting is denoted "noc/bb"; it tells the number of contacts heavy atoms of the residue in question make across the interface, as well as how many of them are realized through the backbone atoms (if all or most contacts are through the backbone, mutation presumably won't have strong impact). Two heavy atoms are considered to be "in contact" if their centers are closer than 5Å.
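The contact criterion of Section 5.4 can be illustrated with a short sketch. It assumes, as an interpretation not confirmed by the report, that "noc" counts pairs of heavy atoms (one from the residue, one from the partner) whose centers are closer than 5Å, and that "bb" counts the subset of those pairs in which the residue atom is a backbone atom (N, CA, C, O). The function `count_contacts` and its input layout are hypothetical.

```python
import math
from typing import Iterable, Sequence, Tuple

# Backbone heavy-atom names as used in PDB files.
BACKBONE = {"N", "CA", "C", "O"}

Atom = Tuple[str, Sequence[float]]  # (atom name, (x, y, z) in Angstrom)


def count_contacts(
    residue_atoms: Iterable[Atom],
    partner_atoms: Iterable[Sequence[float]],
    cutoff: float = 5.0,
) -> Tuple[int, int]:
    """Return (noc, bb) for one residue against a partner chain or ligand.

    A pair of heavy atoms is "in contact" when the distance between
    their centers is strictly smaller than the cutoff.
    """
    partner = list(partner_atoms)
    noc = bb = 0
    for name, xyz in residue_atoms:
        for other in partner:
            if math.dist(xyz, other) < cutoff:
                noc += 1
                if name in BACKBONE:
                    bb += 1
    return noc, bb
```

A residue for which `bb` equals `noc`, such as the glycines listed with 19/19 or 33/33 in the interface tables, contacts the partner only through its backbone, which is the case the text flags as less likely to respond to mutation.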

5.5 Annotation
If the residue annotation is available (either from the pdb file or from other sources), another column, with the header "annotation" appears. Annotations carried over from PDB are the following: site (indicating existence of related site record in PDB), S-S (disulfide bond forming residue), hb (hydrogen bond forming residue), jb (james bond forming residue), and sb (for salt bridge forming residue).

5.6 Mutation suggestions
Mutation suggestions are completely heuristic and based on complementarity with the substitutions found in the alignment. Note that they are meant to be disruptive to the interaction of the protein with its ligand. The attempt is made to complement the following properties: small [AVGSTC], medium [LPNQDEMIK], large [WFYHR], hydrophobic [LPVAMWFI], polar [GTCY]; positively [KHR], or negatively [DE] charged, aromatic [WFYH], long aliphatic chain [EKRQM], OH-group possession [SDETY], and NH2 group possession [NQRK]. The suggestions are listed according to how different they appear to be from the original amino acid, and they are grouped in round brackets if they appear equally disruptive. From left to right, each bracketed group of amino acid types resembles more strongly the original (i.e. is, presumably, less disruptive). These suggestions are tentative - they might prove disruptive to the fold rather than to the interaction. Many researchers will choose, however, the straightforward alanine mutations, especially in the beginning stages of their investigation.

6 APPENDIX

6.1 File formats
Files with extension "ranks sorted" are the actual trace results. The fields in the table in this file:
• alignment# number of the position in the alignment
• residue# residue number in the PDB file
• type amino acid type
• rank rank of the position according to older version of ET
• variability has two subfields:
  1. number of different amino acids appearing in this column of the alignment
  2. their type
• rho ET score - the smaller this value, the lesser variability of this position across the branches of the tree (and, presumably, the greater the importance for the protein)
• cvg coverage - percentage of the residues on the structure which have this rho or smaller
• gaps percentage of gaps in this column

6.2 Color schemes used
The following color scheme is used in figures with residues colored by cluster size: black is a single-residue cluster; clusters composed of more than one residue are colored according to this hierarchy (ordered by descending size): red, blue, yellow, green, purple, azure, turquoise, brown, coral, magenta, LightSalmon, SkyBlue, violet, gold, bisque, LightSlateBlue, orchid, RosyBrown, MediumAquamarine, DarkOliveGreen, CornflowerBlue, grey55, burlywood, LimeGreen, tan, DarkOrange, DeepPink, maroon, BlanchedAlmond.
The colors used to distinguish the residues by the estimated evolutionary pressure they experience can be seen in Fig. 26.

[Figure: color bar with a COVERAGE axis marked 100%, 50%, 30%, 5% and a RELATIVE IMPORTANCE axis.]
Fig. 26. Coloring scheme used to color residues by their relative importance.

6.3 Credits

6.3.1 Alistat alistat reads a multiple sequence alignment from the file and shows a number of simple statistics about it. These statistics include the format, the number of sequences, the total number of residues, the average and range of the sequence lengths, and the alignment length (e.g. including gap characters). Also shown are some percent identities. A percent pairwise alignment identity is defined as (idents / MIN(len1, len2)) where idents is the number of exact identities and len1, len2 are the unaligned lengths of the two sequences. The "average percent identity", "most related pair", and "most unrelated pair" of the alignment are the average, maximum, and minimum of all (N)(N-1)/2 pairs, respectively. The "most distant seq" is calculated by finding the maximum pairwise identity (best relative) for all N sequences, then finding the minimum of these N numbers (hence, the most outlying sequence). alistat is copyrighted by HHMI/Washington University School of Medicine, 1992-2001, and freely distributed under the GNU General Public License.

6.3.2 CE To map ligand binding sites from different source structures, report maker uses the CE program: http://cl.sdsc.edu/. Shindyalov IN, Bourne PE (1998) "Protein structure alignment by incremental combinatorial extension (CE) of the optimal path". Protein Engineering 11(9) 739-747.

6.3.3 DSSP In this work a residue is considered solvent accessible if the DSSP program finds it exposed to water by at least 10Å², which is roughly the area needed for one water molecule to come in contact with the residue. DSSP is copyrighted by W. Kabsch, C. Sander and MPI-MF, 1983, 1985, 1988, 1994, 1995, CMBI version by [email protected] November 18, 2002, http://www.cmbi.kun.nl/gv/dssp/descrip.html.

6.3.4 HSSP Whenever available, report maker uses HSSP alignment as a starting point for the analysis (sequences shorter than 75% of the query are taken out, however); R. Schneider, A. de Daruvar, and C. Sander. "The HSSP database of protein structure-sequence alignments." Nucleic Acids Res., 25:226–230, 1997. http://swift.cmbi.kun.nl/swift/hssp/

6.3.5 LaTex The text for this report was processed using LaTeX; Leslie Lamport, "LaTeX: A Document Preparation System," Addison-Wesley, Reading, Mass. (1986).

6.3.6 Muscle When making alignments "from scratch", report maker uses Muscle alignment program: Edgar, Robert C. (2004), "MUSCLE: multiple sequence alignment with high accuracy and high throughput." Nucleic Acids Research 32(5), 1792-97. http://www.drive5.com/muscle/

6.3.7 Pymol The figures in this report were produced using Pymol. The scripts can be found in the attachment. Pymol is an open-source application copyrighted by DeLano Scientific LLC (2005). For more information about Pymol see http://pymol.sourceforge.net/. (Note for Windows users: the attached package needs to be unzipped for Pymol to read the scripts and launch the viewer.)

6.4 Note about ET Viewer
Dan Morgan from the Lichtarge lab has developed a visualization tool specifically for viewing trace results. If you are interested, please visit:
http://mammoth.bcm.tmc.edu/traceview/
The viewer is self-unpacking and self-installing. Input files to be used with ETV (extension .etvx) can be found in the attachment to the main report.

6.5 Citing this work
The method used to rank residues and make predictions in this report can be found in Mihalek, I., I. Reš, O. Lichtarge. (2004). "A Family of Evolution-Entropy Hybrid Methods for Ranking of Protein Residues by Importance" J. Mol. Bio. 336: 1265-82. For the original version of ET see O. Lichtarge, H. Bourne and F. Cohen (1996). "An Evolutionary Trace Method Defines Binding Surfaces Common to Protein Families" J. Mol. Bio. 257: 342-358.
report maker itself is described in Mihalek I., I. Reš and O. Lichtarge (2006). "Evolutionary Trace Report Maker: a new type of service for comparative analysis of proteins." Bioinformatics 22:1656-7.

6.6 About report maker
report maker was written in 2006 by Ivana Mihalek. The 1D ranking visualization program was written by Ivica Reš. report maker is copyrighted by Lichtarge Lab, Baylor College of Medicine, Houston.

6.7 Attachments
The following files should accompany this report:
• 1e6vC.complex.pdb - coordinates of 1e6vC with all of its interacting partners
• 1e6vC.etvx - ET viewer input file for 1e6vC
• 1e6vC.cluster report.summary - Cluster report summary for 1e6vC
• 1e6vC.ranks - Ranks file in sequence order for 1e6vC
• 1e6vC.clusters - Cluster descriptions for 1e6vC
• 1e6vC.msf - the multiple sequence alignment used for the chain 1e6vC
• 1e6vC.descr - description of sequences used in 1e6vC msf
• 1e6vC.ranks sorted - full listing of residues and their ranking for 1e6vC
• 1e6vC.1e6vDF43553.if.pml - Pymol script for Figure 5
• 1e6vC.cbcvg - used by other 1e6vC – related pymol scripts
• 1e6vC.1e6vD.if.pml - Pymol script for Figure 6
• 1e6vC.1e6vB.if.pml - Pymol script for Figure 7
• 1e6vC.1e6vDCOM555.if.pml - Pymol script for Figure 8
• 1e6vC.1e6vA.if.pml - Pymol script for Figure 9
• 1e6vA.complex.pdb - coordinates of 1e6vA with all of its interacting partners
• 1e6vA.etvx - ET viewer input file for 1e6vA
• 1e6vA.cluster report.summary - Cluster report summary for 1e6vA
• 1e6vA.ranks - Ranks file in sequence order for 1e6vA
• 1e6vA.clusters - Cluster descriptions for 1e6vA
• 1e6vA.msf - the multiple sequence alignment used for the chain 1e6vA
• 1e6vA.descr - description of sequences used in 1e6vA msf
• 1e6vA.ranks sorted - full listing of residues and their ranking for 1e6vA
• 1e6vB.complex.pdb - coordinates of 1e6vB with all of its interacting partners
• 1e6vB.etvx - ET viewer input file for 1e6vB
• 1e6vB.cluster report.summary - Cluster report summary for 1e6vB
• 1e6vB.ranks - Ranks file in sequence order for 1e6vB
• 1e6vB.clusters - Cluster descriptions for 1e6vB
• 1e6vB.msf - the multiple sequence alignment used for the chain 1e6vB
• 1e6vB.descr - description of sequences used in 1e6vB msf
• 1e6vB.ranks sorted - full listing of residues and their ranking for 1e6vB
• 1e6vB.1e6vE.if.pml - Pymol script for Figure 18
• 1e6vB.cbcvg - used by other 1e6vB – related pymol scripts
• 1e6vB.1e6vC.if.pml - Pymol script for Figure 19
• 1e6vB.1e6vATP7554.if.pml - Pymol script for Figure 20
• 1e6vB.1e6vDF43553.if.pml - Pymol script for Figure 21
• 1e6vB.1e6vD.if.pml - Pymol script for Figure 22
• 1e6vB.1e6vA.if.pml - Pymol script for Figure 23
• 1e6vB.1e6vDCOM555.if.pml - Pymol script for Figure 24
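The "ranks sorted" attachments listed above carry the rho and cvg fields described in Section 6.1, where cvg is the share of residues whose rho is equal to or smaller than that of the residue in question. The sketch below recomputes that quantity from rho values held in memory. It does not read the attachment files, whose exact layout is not specified here. Coverage is returned as a fraction (e.g. 0.25) to match the cvg column of the tables in this report; the names `coverage` and `top_residues` are hypothetical.

```python
import bisect
from typing import Dict, List


def coverage(rho_by_residue: Dict[int, float]) -> Dict[int, float]:
    """Fraction of residues whose rho is <= the rho of each residue."""
    total = len(rho_by_residue)
    ordered = sorted(rho_by_residue.values())
    return {
        res: bisect.bisect_right(ordered, rho) / total
        for res, rho in rho_by_residue.items()
    }


def top_residues(rho_by_residue: Dict[int, float], cutoff: float = 0.25) -> List[int]:
    """Residues with coverage <= cutoff, most important (smallest cvg) first."""
    cvg = coverage(rho_by_residue)
    selected = [res for res, value in cvg.items() if value <= cutoff]
    return sorted(selected, key=lambda res: cvg[res])
```

Residues that share a rho value receive the same coverage under this definition, which is consistent with the runs of identical cvg values (for example 0.15) seen for fully conserved positions in the tables.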
