1luj Evolutionary trace report by report_maker, October 1, 2010

Lichtarge lab 2006

CONTENTS

1 Introduction
2 Chain 1lujA
  2.1 Q5R2I4 overview
  2.2 Multiple sequence alignment for 1lujA
  2.3 Residue ranking in 1lujA
  2.4 Top ranking residues in 1lujA and their position on the structure
    2.4.1 Clustering of residues at 25% coverage.
    2.4.2 Overlap with known functional surfaces at 25% coverage.
    2.4.3 Possible novel functional surfaces at 25% coverage.
3 Chain 1lujB
  3.1 Q9JJN6 overview
  3.2 Multiple sequence alignment for 1lujB
  3.3 Residue ranking in 1lujB
  3.4 Top ranking residues in 1lujB and their position on the structure
    3.4.1 Clustering of residues at 25% coverage.
    3.4.2 Overlap with known functional surfaces at 25% coverage.
    3.4.3 Possible novel functional surfaces at 25% coverage.
4 Notes on using trace results
  4.1 Coverage
  4.2 Known substitutions
  4.3 Surface
  4.4 Number of contacts
  4.5 Annotation
  4.6 Mutation suggestions
5 Appendix
  5.1 File formats
  5.2 Color schemes used
  5.3 Credits
    5.3.1 Alistat
    5.3.2 CE
    5.3.3 DSSP
    5.3.4 HSSP
    5.3.5 LaTex
    5.3.6 Muscle
    5.3.7 Pymol
  5.4 Note about ET Viewer
  5.5 Citing this work
  5.6 About report_maker
  5.7 Attachments

1 INTRODUCTION
From the original Protein Data Bank entry (PDB id 1luj):
Title: Crystal structure of the beta-catenin/icat complex
Compound: Mol id: 1; molecule: beta-catenin; chain: a; fragment: residues 150-666; engineered: yes; mol id: 2; molecule: beta-catenin-interacting protein icat; chain: b; engineered: yes
Organism, scientific name: Homo Sapiens;
1luj contains unique chains 1lujA (501 residues) and 1lujB (71 residues)

2 CHAIN 1LUJA
2.1 Q5R2I4 overview
From SwissProt, id Q5R2I4, 94% identical to 1lujA:
Description: Beta-catenin homologue.
Organism, scientific name: Trionyx sinensis (Chinese softshell turtle) (Pelodiscus sinensis).
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Testudines; Cryptodira; Trionychoidea; Trionychidae; Pelodiscus.

Fig. 1. Residues 150-399 in 1lujA colored by their relative importance. (See Appendix, Fig.13, for the coloring scheme.)

Fig. 2. Residues 400-663 in 1lujA colored by their relative importance. (See Appendix, Fig.13, for the coloring scheme.)

2.2 Multiple sequence alignment for 1lujA
For the chain 1lujA, the alignment 1lujA.msf (attached) with 36 sequences was used. The alignment was downloaded from the HSSP database, and fragments shorter than 75% of the query as well as duplicate sequences were removed. It can be found in the attachment to this report, under the name of 1lujA.msf. Its statistics, from the alistat program, are the following:

Format: MSF
Number of sequences: 36
Total number of residues: 17648
Smallest: 392
Largest: 501
Average length: 490.2
Alignment length: 501
Average identity: 58%
Most related pair: 99%
Most unrelated pair: 20%
Most distant seq: 40%
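The identity figures in this listing follow the definitions quoted in the Credits section for alistat (Section 5.3.1). The following is a minimal sketch of those definitions, not the alistat source code: the function names are hypothetical, the set of gap characters is an assumption, and the input is taken to be a list of already aligned rows of equal length.

```python
from itertools import combinations

# Assumption: these characters mark gaps; the alignment file may use others.
GAP_CHARS = set("-.~")


def pairwise_identity(a, b):
    """idents / MIN(len1, len2) for two aligned rows of equal length.

    idents counts columns where both rows carry the same residue;
    len1 and len2 are the unaligned (gap-free) lengths.
    """
    if len(a) != len(b):
        raise ValueError("aligned rows must have the same length")
    idents = sum(1 for x, y in zip(a, b) if x == y and x not in GAP_CHARS)
    len_a = sum(1 for x in a if x not in GAP_CHARS)
    len_b = sum(1 for y in b if y not in GAP_CHARS)
    shorter = min(len_a, len_b)
    return idents / shorter if shorter else 0.0


def identity_summary(rows):
    """Average, maximum and minimum identity over all N(N-1)/2 pairs,
    plus the 'most distant seq' value: the smallest best-relative identity."""
    n = len(rows)
    if n < 2:
        raise ValueError("need at least two sequences")
    best_relative = [0.0] * n
    values = []
    for i, j in combinations(range(n), 2):
        pid = pairwise_identity(rows[i], rows[j])
        values.append(pid)
        best_relative[i] = max(best_relative[i], pid)
        best_relative[j] = max(best_relative[j], pid)
    return {
        "average_identity": sum(values) / len(values),
        "most_related_pair": max(values),
        "most_unrelated_pair": min(values),
        "most_distant_seq": min(best_relative),
    }
```

Read this way, "Most distant seq: 40%" says that even the most outlying sequence in the alignment has some relative that is 40% identical to it, while "Most unrelated pair: 20%" refers to the single least similar pair.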

Furthermore, 4% of residues show as conserved in this alignment.
The alignment consists of 55% eukaryotic (19% vertebrata, 11% arthropoda) sequences. (Descriptions of some sequences were not readily available.) The file containing the sequence descriptions can be found in the attachment, under the name 1lujA.descr.

2.3 Residue ranking in 1lujA

The 1lujA sequence is shown in Figs. 1–2, with each residue colored according to its estimated importance. The full listing of residues in 1lujA can be found in the file called 1lujA.ranks_sorted in the attachment.

2.4 Top ranking residues in 1lujA and their position on the structure
In the following we consider residues ranking among top 25% of residues in the protein. Figure 3 shows residues in 1lujA colored by their importance: bright red and yellow indicate more conserved/important residues (see Appendix for the coloring scheme). A Pymol script for producing this figure can be found in the attachment.

Fig. 3. Residues in 1lujA, colored by their relative importance. Clockwise: front, back, top and bottom views.

2.4.1 Clustering of residues at 25% coverage. Fig. 4 shows the top 25% of all residues, this time colored according to clusters they belong to. The clusters in Fig.4 are composed of the residues listed in Table 1.

Fig. 4. Residues in 1lujA, colored according to the cluster they belong to: red, followed by blue and yellow are the largest clusters (see Appendix for the coloring scheme). Clockwise: front, back, top and bottom views. The corresponding Pymol script is attached.

Table 1.
cluster color  size  member residues
red            89    216,219,222,240,244,254,255
                     257,258,259,260,261,262,270
                     286,290,292,294,296,299,301
                     304,312,315,316,320,321,324
                     325,328,333,334,335,336,338
                     339,342,343,344,345,347,348
                     349,350,353,354,358,361,366
                     374,376,377,380,382,383,386
                     387,388,389,390,418,419,422
                     425,426,427,428,429,430,431
                     434,435,443,447,462,463,466
                     467,468,469,470,472,474,475
                     513,515,516,519,520
blue           24    508,563,564,566,569,571,572
                     576,577,578,612,613,616,617
                     618,620,621,640,653,654,655
                     657,658,659
yellow         3     227,231,232
green          2     401,405
purple         2     539,540
azure          2     625,628

Table 1. Clusters of top ranking residues in 1lujA.

2.4.2 Overlap with known functional surfaces at 25% coverage. The name of the ligand is composed of the source PDB identifier and the heteroatom name used in that file.
Interface with 1lujB. Table 2 lists the top 25% of residues at the interface with 1lujB. The following table (Table 3) suggests possible disruptive replacements for these residues (see Section 4.6).

Table 2.
res  type  substs (%)             cvg   noc/bb  dist (Å)
386  R     R(100)                 0.04  42/12   3.31
387  N     N(100)                 0.04  23/7    3.13
389  S     S(100)                 0.04  3/1     4.57
390  D     D(100)                 0.04  31/9    3.34
426  N     N(100)                 0.04  41/3    2.68
435  K     K(100)                 0.04  6/0     2.67
469  R     R(100)                 0.04  30/2    2.76
470  H     H(100)                 0.04  51/11   2.91
516  N     N(91)K(8)              0.06  1/0     4.78
345  K     K(91)R(8)              0.08  29/4    3.21
425  S     S(94)C(5)              0.08  1/1     4.97
430  N     N(94)G(5)              0.08  23/5    2.95
349  V     V(88)I(2).(8)          0.10  26/0    3.24
612  R     R(88)C(2).(8)          0.10  22/0    2.99
654  Y     Y(88)F(2).(8)          0.10  29/0    2.79
429  C     C(86)A(8)S(5)          0.11  30/6    2.92
578  H     H(86).(8)Q(5)          0.11  12/0    3.96
312  K     K(97)R(2)              0.13  7/0     2.74
620  E     E(83)Q(8).(8)          0.13  14/0    2.69
422  G     G(94)Q(5)              0.14  7/7     3.94
474  R     R(83)K(2)G(8)N(5)      0.14  37/0    2.92
653  T     T(86)A(5).(8)          0.15  30/6    3.10
428  T     T(91)L(2)V(5)          0.17  3/3     4.08
515  R     R(91)N(2)L(5)          0.17  40/0    2.64
613  V     V(77)E(8).(8)A(5)      0.19  1/0     4.38
657  A     A(75)G(8)S(2).(8)I(5)  0.22  7/3     3.85

Table 2. The top 25% of residues in 1lujA at the interface with 1lujB. (Field names: res: residue number in the PDB entry; type: amino acid type; substs: substitutions seen in the alignment, with the percentage of each type in the bracket; noc/bb: number of contacts with the ligand, with the number of contacts realized through backbone atoms given after the slash; dist: distance of closest approach to the ligand.)
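The substitution field described in this caption (and discussed in Section 4.2) is a compact string such as R(83)K(2)G(8)N(5). A minimal sketch of how such a field could be read is given below. The function names are hypothetical, and treating "." as a gap symbol is an assumption, since the report does not define it. Bare letters without a bracket are accepted because Section 4.2 notes that no percentage is given when it is smaller than 1%.

```python
import re

# One token: an amino acid letter or "." (assumed to mark a gap),
# optionally followed by an integer percentage in round brackets.
_TOKEN = re.compile(r"([A-Z.])(?:\((\d+)\))?")
_FIELD = re.compile(r"(?:[A-Z.](?:\(\d+\))?)+")


def parse_substitutions(field):
    """Return [(symbol, percent or None), ...] in the order listed."""
    compact = "".join(field.split())  # tolerate spaces left by line wrapping
    if not _FIELD.fullmatch(compact):
        raise ValueError(f"unrecognized substitution field: {field!r}")
    return [(sym, int(pct) if pct else None)
            for sym, pct in _TOKEN.findall(compact)]


def types_to_avoid(field):
    """Amino acid types already seen at the position (gaps excluded).

    Following Section 4.2, these are the types one would avoid when the
    aim is to affect the protein by a point mutation.
    """
    return {sym for sym, _ in parse_substitutions(field) if sym != "."}
```

For residue 474 of 1lujA, for example, the field R(83)K(2)G(8)N(5) yields the types R, K, G and N, with percentages that sum to 98 rather than 100 because of rounding.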

Fig. 5. Residues in 1lujA, at the interface with 1lujB, colored by their relative importance. 1lujB is shown in backbone representation (See Appendix for the coloring scheme for the protein chain 1lujA.)

Table 3.
res  type  disruptive mutations
386  R     (TD)(SYEVCLAPIG)(FMW)(N)
387  N     (Y)(FTWH)(SEVCARG)(MD)
389  S     (KR)(FQMWH)(NYELPI)(D)
390  D     (R)(FWH)(KYVCAG)(TQM)
426  N     (Y)(FTWH)(SEVCARG)(MD)
435  K     (Y)(FTW)(SVCAG)(HD)
469  R     (TD)(SYEVCLAPIG)(FMW)(N)
470  H     (E)(TQMD)(SNKVCLAPIG)(YR)
516  N     (Y)(FTW)(H)(SVCAG)
345  K     (Y)(T)(FW)(SVCAG)
425  S     (KR)(FQMWH)(E)(NYLPI)
430  N     (Y)(FWH)(ER)(T)
349  V     (Y)(R)(KE)(H)
612  R     (D)(ELPI)(T)(VA)
654  Y     (K)(Q)(M)(E)
429  C     (KR)(E)(QH)(FMW)
578  H     (E)(T)(D)(VMCAG)
312  K     (Y)(T)(FW)(SVCAG)
620  E     (FWH)(YVCAG)(TR)(S)
422  G     (FEWHR)(KYD)(M)(QLPI)
474  R     (TYD)(E)(SFVAW)(CLPIG)
653  T     (KR)(H)(Q)(FMW)
428  T     (R)(K)(H)(Q)
515  R     (T)(Y)(D)(SECG)
613  V     (YR)(K)(H)(E)
657  A     (R)(K)(E)(Y)

Table 3. List of disruptive mutations for the top 25% of residues in 1lujA, that are at the interface with 1lujB.

Figure 5 shows residues in 1lujA colored by their importance, at the interface with 1lujB.

2.4.3 Possible novel functional surfaces at 25% coverage. One group of residues is conserved on the 1lujA surface, away from (or substantially larger than) other functional sites and interfaces recognizable in PDB entry 1luj. It is shown in Fig. 6. The right panel shows (in blue) the rest of the larger cluster this surface belongs to.

Fig. 6. A possible active surface on the chain 1lujA. The larger cluster it belongs to is shown in blue.

The residues belonging to this surface "patch" are listed in Table 4, while Table 5 suggests possible disruptive replacements for these residues (see Section 4.6).

Table 4.
res  type  substitutions(%)         cvg
376  R     R(100)                   0.04
386  R     R(100)                   0.04
387  N     N(100)                   0.04
390  D     D(100)                   0.04
426  N     N(100)                   0.04
431  N     N(100)                   0.04
434  N     N(100)                   0.04
462  E     E(100)                   0.04
469  R     R(100)                   0.04
470  H     H(100)                   0.04
419  C     C(91)Y(8)                0.06
516  N     N(91)K(8)                0.06
254  Y     Y(94)C(5)                0.08
345  K     K(91)R(8)                0.08
374  S     S(91)D(8)                0.08
425  S     S(94)C(5)                0.08
430  N     N(94)G(5)                0.08
475  H     H(94)N(5)                0.08
339  T     T(86)R(8)A(5)            0.10
342  R     R(86)Q(8)K(5)            0.10
349  V     V(88)I(2).(8)            0.10
350  C     C(88)T(2).(8)            0.10
257  T     T(86)L(8)S(5)            0.11
292  K     K(97)R(2)                0.11
321  P     P(91)H(2)S(5)            0.11
429  C     C(86)A(8)S(5)            0.11
312  K     K(97)R(2)                0.13
333  Y     Y(94)H(5)                0.13
361  G     G(97)A(2)                0.13
222  S     S(94)C(5)                0.14
422  G     G(94)Q(5)                0.14
474  R     R(83)K(2)G(8)N(5)        0.14
358  V     V(86)I(13)               0.15
261  N     N(91)T(2)S(5)            0.17
334  E     E(91)R(5)P(2)            0.17
335  K     K(91)G(2)N(5)            0.17
354  K     K(91)P(2)A(5)            0.17
463  P     P(91)R(2)S(5)            0.17
468  L     L(97)M(2)                0.17
515  R     R(91)N(2)L(5)            0.17
260  H     H(88)R(5)D(5)            0.18
466  C     C(91)A(8)                0.18
299  D     D(91)E(2)N(5)            0.19
316  L     L(91)Y(2)V(5)            0.19
290  N     N(88)S(5)D(5)            0.20
383  W     W(83)V(5)E(8)C(2)        0.20
338  W     W(80)Y(13)F(5)           0.21
380  N     N(75)A(8)H(2)S(13)       0.21
418  T     T(75)A(8)M(2)L(8)I(5)    0.22
219  H     H(75)S(8)F(8)Q(5)Y(2)    0.23
520  C     C(75)L(8)Q(8)I(5)S(2)    0.23
296  I     I(77)Q(5)V(13)L(2)       0.24
519  L     L(86)S(5)M(8)            0.24

Table 4. Residues forming surface "patch" in 1lujA.

Table 5.
res  type  disruptive mutations
376  R     (TD)(SYEVCLAPIG)(FMW)(N)
386  R     (TD)(SYEVCLAPIG)(FMW)(N)
387  N     (Y)(FTWH)(SEVCARG)(MD)
390  D     (R)(FWH)(KYVCAG)(TQM)
426  N     (Y)(FTWH)(SEVCARG)(MD)
431  N     (Y)(FTWH)(SEVCARG)(MD)
434  N     (Y)(FTWH)(SEVCARG)(MD)
462  E     (FWH)(YVCARG)(T)(SNKLPI)
469  R     (TD)(SYEVCLAPIG)(FMW)(N)
470  H     (E)(TQMD)(SNKVCLAPIG)(YR)
419  C     (K)(ER)(QM)(D)
516  N     (Y)(FTW)(H)(SVCAG)
254  Y     (K)(QM)(ER)(NLPI)
345  K     (Y)(T)(FW)(SVCAG)
374  S     (R)(K)(FWH)(QM)
425  S     (KR)(FQMWH)(E)(NYLPI)
430  N     (Y)(FWH)(ER)(T)
475  H     (E)(T)(MD)(SVQCAG)
339  T     (K)(R)(FWH)(EQM)
342  R     (T)(Y)(D)(SVCAG)
349  V     (Y)(R)(KE)(H)
350  C     (KR)(E)(FMW)(H)
257  T     (R)(K)(H)(FQW)
292  K     (Y)(T)(FW)(SVCAG)
321  P     (R)(Y)(K)(TE)
429  C     (KR)(E)(QH)(FMW)
312  K     (Y)(T)(FW)(SVCAG)
333  Y     (K)(QM)(E)(NVLAPI)
361  G     (KER)(QHD)(FYMW)(N)
222  S     (KR)(FQMWH)(E)(NYLPI)
422  G     (FEWHR)(KYD)(M)(QLPI)
474  R     (TYD)(E)(SFVAW)(CLPIG)
358  V     (YR)(KE)(H)(QD)
261  N     (Y)(FWH)(R)(E)
334  E     (FWH)(Y)(CG)(TVA)
335  K     (Y)(FW)(T)(VAH)
354  K     (Y)(T)(FW)(H)
463  P     (Y)(R)(H)(T)
468  L     (Y)(R)(TH)(SCG)
515  R     (T)(Y)(D)(SECG)
260  H     (TE)(M)(VCADG)(Q)
466  C     (KER)(QHD)(FYMW)(N)
299  D     (R)(FWH)(Y)(VCAG)
316  L     (R)(KY)(EH)(T)
290  N     (Y)(FWH)(R)(T)
383  W     (K)(E)(QR)(D)
338  W     (K)(E)(Q)(D)
380  N     (Y)(E)(FTWHR)(MCDG)
418  T     (R)(K)(H)(Q)
219  H     (E)(QM)(D)(K)
520  C     (R)(KEH)(Y)(FW)
296  I     (Y)(R)(H)(T)
519  L     (YR)(H)(T)(K)

Table 5. Disruptive mutations for the surface patch in 1lujA.

Another group of surface residues is shown in Fig.7. The right panel shows (in blue) the rest of the larger cluster this surface belongs to.

Fig. 7. Another possible active surface on the chain 1lujA. The larger cluster it belongs to is shown in blue.

The residues belonging to this surface "patch" are listed in Table 6, while Table 7 suggests possible disruptive replacements for these residues (see Section 4.6).

Table 6.
res  type  substitutions(%)         cvg
508  K     K(100)                   0.04
571  E     E(91).(8)                0.06
563  G     G(88).(11)               0.08
612  R     R(88)C(2).(8)            0.10
654  Y     Y(88)F(2).(8)            0.10
578  H     H(86).(8)Q(5)            0.11
659  L     L(86).(8)M(5)            0.11
628  A     A(88).(8)L(2)            0.12
620  E     E(83)Q(8).(8)            0.13
564  V     V(86)I(2).(11)           0.14
621  L     L(83).(8)M(5)V(2)        0.15
653  T     T(86)A(5).(8)            0.15
566  M     M(80)L(11).(8)           0.18
613  V     V(77)E(8).(8)A(5)        0.19
572  G     G(80)S(2).(8)L(5)V(2)    0.20
657  A     A(75)G(8)S(2).(8)I(5)    0.22
625  K     K(75)P(8).(8)A(5)R(2)    0.23
539  L     L(75)I(22)V(2)           0.24
569  I     I(77)V(5).(8)L(8)        0.25

Table 6. Residues forming surface "patch" in 1lujA.

Table 7.
res  type  disruptive mutations
508  K     (Y)(FTW)(SVCAG)(HD)
571  E     (FWH)(VCAG)(YR)(T)
563  G     (KER)(FQMWHD)(NLPI)(Y)
612  R     (D)(ELPI)(T)(VA)
654  Y     (K)(Q)(M)(E)
578  H     (E)(T)(D)(VMCAG)
659  L     (Y)(R)(T)(H)
628  A     (Y)(R)(KE)(H)
620  E     (FWH)(YVCAG)(TR)(S)
564  V     (Y)(R)(KE)(H)
621  L     (Y)(R)(H)(T)
653  T     (KR)(H)(Q)(FMW)
566  M     (Y)(T)(H)(CG)
613  V     (YR)(K)(H)(E)
572  G     (R)(K)(E)(H)
657  A     (R)(K)(E)(Y)
625  K     (Y)(T)(FW)(S)
539  L     (YR)(H)(T)(KE)
569  I     (YR)(H)(T)(KE)

Table 7. Disruptive mutations for the surface patch in 1lujA.

3 CHAIN 1LUJB
3.1 Q9JJN6 overview
From SwissProt, id Q9JJN6, 97% identical to 1lujB:
Description: Beta-catenin-interacting protein 1 (Inhibitor of beta-catenin and Tcf-4).
Organism, scientific name: Mus musculus (Mouse).
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Sciurognathi; Muroidea; Muridae; Murinae; Mus.

Function: Prevents the interaction between CTNNB1 and TCF family members, and acts as negative regulator of the Wnt signaling pathway.
Subunit: Binds CTNNB1.
Subcellular location: Nuclear and cytoplasmic.
Tissue specificity: Highly expressed in heart, brain, liver and skeletal muscle. Detected at low levels in kidney, testis and lung.
Similarity: Belongs to the CTNNBIP1 family.
About: This Swiss-Prot entry is copyright. It is produced through a collaboration between the Swiss Institute of Bioinformatics and the EMBL outstation - the European Bioinformatics Institute. There are no restrictions on its use as long as its content is in no way modified and this statement is not removed.

Fig. 8. Residues 5-75 in 1lujB colored by their relative importance. (See Appendix, Fig.13, for the coloring scheme.)

3.2 Multiple sequence alignment for 1lujB
For the chain 1lujB, the alignment 1lujB.msf (attached) with 21 sequences was used. The alignment was downloaded from the HSSP database, and fragments shorter than 75% of the query as well as duplicate sequences were removed. It can be found in the attachment to this report, under the name of 1lujB.msf. Its statistics, from the alistat program, are the following:

Format: MSF
Number of sequences: 21
Total number of residues: 1411
Smallest: 63
Largest: 71
Average length: 67.2
Alignment length: 71
Average identity: 53%
Most related pair: 99%
Most unrelated pair: 13%
Most distant seq: 34%

Furthermore, 4% of residues show as conserved in this alignment.
The alignment consists of 33% eukaryotic (33% vertebrata) sequences. (Descriptions of some sequences were not readily available.) The file containing the sequence descriptions can be found in the attachment, under the name 1lujB.descr.

3.3 Residue ranking in 1lujB
The 1lujB sequence is shown in Fig. 8, with each residue colored according to its estimated importance. The full listing of residues in 1lujB can be found in the file called 1lujB.ranks_sorted in the attachment.

3.4 Top ranking residues in 1lujB and their position on the structure
In the following we consider residues ranking among top 25% of residues in the protein. Figure 9 shows residues in 1lujB colored by their importance: bright red and yellow indicate more conserved/important residues (see Appendix for the coloring scheme). A Pymol script for producing this figure can be found in the attachment.

Fig. 9. Residues in 1lujB, colored by their relative importance. Clockwise: front, back, top and bottom views.

3.4.1 Clustering of residues at 25% coverage. Fig. 10 shows the top 25% of all residues, this time colored according to clusters they belong to. The clusters in Fig.10 are composed of the residues listed in Table 8.

Table 8.
cluster color  size  member residues
red            13    17,18,19,23,26,28,30,33,37
                     38,40,41,45
blue           3     53,54,55

Table 8. Clusters of top ranking residues in 1lujB.

3.4.2 Overlap with known functional surfaces at 25% coverage. The name of the ligand is composed of the source PDB identifier and the heteroatom name used in that file.
Interface with 1lujA. Table 9 lists the top 25% of residues at the interface with 1lujA. The following table (Table 10) suggests possible disruptive replacements for these residues (see Section 4.6).

Table 10.
res  type  disruptive mutations
26   L     (YR)(TH)(SKECG)(FQWD)
33   L     (Y)(T)(HR)(SCG)
30   G     (R)(KE)(H)(FYQWD)
19   K     (Y)(T)(FW)(SCG)
70   A     (Y)(R)(KE)(H)
18   Q     (Y)(T)(FWH)(SCDG)
37   E     (FWH)(R)(YVCAG)(T)

Table 10. List of disruptive mutations for the top 25% of residues in 1lujB, that are at the interface with 1lujA.
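Section 4.6 explains that such suggestions try to complement a set of amino acid property classes. The sketch below only encodes those classes as listed in Section 4.6 and orders candidate replacements by how few classes they share with the original residue. It is an illustration of the idea of complementarity, not the procedure report_maker uses, and it should not be expected to reproduce the bracketed groups in the tables. All names in the code are hypothetical.

```python
# Property classes as listed in Section 4.6 of this report.
PROPERTY_CLASSES = {
    "small": "AVGSTC",
    "medium": "LPNQDEMIK",
    "large": "WFYHR",
    "hydrophobic": "LPVAMWFI",
    "polar": "GTCY",
    "positive": "KHR",
    "negative": "DE",
    "aromatic": "WFYH",
    "long_aliphatic": "EKRQM",
    "oh_group": "SDETY",
    "nh2_group": "NQRK",
}

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def property_profile(aa):
    """Names of the property classes the amino acid belongs to."""
    return frozenset(name for name, members in PROPERTY_CLASSES.items()
                     if aa in members)


def rank_by_dissimilarity(original, exclude=""):
    """Candidates ordered from fewest to most property classes shared
    with the original residue (ties broken alphabetically).

    `exclude` holds types already seen in the alignment column, which
    Section 4.2 advises against when a disruptive mutation is wanted.
    """
    base = property_profile(original)
    candidates = [aa for aa in AMINO_ACIDS
                  if aa != original and aa not in exclude]
    return sorted(candidates,
                  key=lambda aa: (len(base & property_profile(aa)), aa))
```

For residue 19 of 1lujB (K, with I also seen in the alignment) one would call rank_by_dissimilarity("K", exclude="I"). Candidates near the front of the returned list share no property class with lysine, while Q and R, which share three classes with it, come last.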

Fig. 10. Residues in 1lujB, colored according to the cluster they belong to: red, followed by blue and yellow are the largest clusters (see Appendix for the coloring scheme). Clockwise: front, back, top and bottom views. The corresponding Pymol script is attached.
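Section 4.3 describes a cluster as a set of residues that have a neighbor within 5Å of any of their heavy atoms. One way to read that rule is single-linkage grouping, sketched below with a small union-find structure. The function names and the input layout (a mapping from residue number to a list of heavy-atom coordinates) are assumptions made for illustration; this is not the clustering code behind the figure.

```python
import math
from collections import defaultdict

CUTOFF = 5.0  # angstroms, the neighbor distance quoted in Section 4.3


def residues_in_contact(atoms_a, atoms_b, cutoff=CUTOFF):
    """True if any heavy atom of one residue lies closer than `cutoff`
    to any heavy atom of the other."""
    return any(math.dist(p, q) < cutoff for p in atoms_a for q in atoms_b)


def cluster_residues(heavy_atoms, cutoff=CUTOFF):
    """Group residues by single linkage; returns sets, largest first.

    heavy_atoms: {residue_number: [(x, y, z), ...]}
    """
    parent = {res: res for res in heavy_atoms}

    def find(res):
        # Walk up to the representative, compressing the path on the way.
        while parent[res] != res:
            parent[res] = parent[parent[res]]
            res = parent[res]
        return res

    ids = list(heavy_atoms)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if residues_in_contact(heavy_atoms[a], heavy_atoms[b], cutoff):
                parent[find(a)] = find(b)

    groups = defaultdict(set)
    for res in ids:
        groups[find(res)].add(res)
    return sorted(groups.values(), key=len, reverse=True)
```

With clusters sorted by descending size, the first would be drawn in red, the second in blue and the third in yellow, following the hierarchy given in Section 5.2. The pairwise loop is quadratic in the number of residues, which is adequate for the few dozen top-ranking residues considered here.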

Table 9.
res  type  substs (%)         cvg   noc/bb  dist (Å)
26   L     L(100)             0.04  18/2    3.39
33   L     L(95)K(4)          0.09  5/3     3.91
30   G     G(95)L(4)          0.11  9/9     4.09
19   K     K(90)I(9)          0.13  33/11   3.63
70   A     A(90).(4)L(4)      0.18  12/9    3.34
18   Q     Q(90)R(4)V(4)      0.20  5/4     4.26
37   E     D(38)E(61)         0.25  19/0    3.02

Table 9. The top 25% of residues in 1lujB at the interface with 1lujA. (Field names: res: residue number in the PDB entry; type: amino acid type; substs: substitutions seen in the alignment, with the percentage of each type in the bracket; noc/bb: number of contacts with the ligand, with the number of contacts realized through backbone atoms given after the slash; dist: distance of closest approach to the ligand.)

Fig. 11. Residues in 1lujB, at the interface with 1lujA, colored by their relative importance. 1lujA is shown in backbone representation (See Appendix for the coloring scheme for the protein chain 1lujB.)

Figure 11 shows residues in 1lujB colored by their importance, at the interface with 1lujA.

3.4.3 Possible novel functional surfaces at 25% coverage. One group of residues is conserved on the 1lujB surface, away from (or substantially larger than) other functional sites and interfaces recognizable in PDB entry 1luj. It is shown in Fig. 12. The residues belonging to this surface "patch" are listed in Table 11, while Table 12 suggests possible disruptive replacements for these residues (see Section 4.6).

Fig. 12. A possible active surface on the chain 1lujB.

Table 11.
res  type  substitutions(%)     cvg
23   L     L(100)               0.04
26   L     L(100)               0.04
33   L     L(95)K(4)            0.09
40   F     F(95)L(4)            0.09
45   A     A(95)E(4)            0.09
17   Q     Q(95)E(4)            0.11
30   G     G(95)L(4)            0.11
19   K     K(90)I(9)            0.13
28   K     K(90)P(4)E(4)        0.15
41   L     L(90)I(4)E(4)        0.15
38   E     E(85)S(4)A(4)C(4)    0.17
18   Q     Q(90)R(4)V(4)        0.20
37   E     D(38)E(61)           0.25

Table 11. Residues forming surface "patch" in 1lujB.

Table 12.
res  type  disruptive mutations
23   L     (YR)(TH)(SKECG)(FQWD)
26   L     (YR)(TH)(SKECG)(FQWD)
33   L     (Y)(T)(HR)(SCG)
40   F     (KE)(T)(QDR)(SCG)
45   A     (YR)(KH)(EQ)(FNWD)
17   Q     (Y)(FWH)(T)(VCAG)
30   G     (R)(KE)(H)(FYQWD)
19   K     (Y)(T)(FW)(SCG)
28   K     (Y)(FTW)(CG)(SVAH)
41   L     (YR)(H)(T)(CG)
38   E     (H)(FWR)(Y)(K)
18   Q     (Y)(T)(FWH)(SCDG)
37   E     (FWH)(R)(YVCAG)(T)

Table 12. Disruptive mutations for the surface patch in 1lujB.

4 NOTES ON USING TRACE RESULTS
4.1 Coverage
Trace results are commonly expressed in terms of coverage: the residue is important if its "coverage" is small - that is if it belongs to some small top percentage of residues [100% is all of the residues in a chain], according to trace. The ET results are presented in the form of a table, usually limited to top 25% of residues (or to some nearby percentage), sorted by the strength of the presumed evolutionary pressure. (I.e., the smaller the coverage, the stronger the pressure on the residue.) Starting from the top of that list, mutating a couple of residues should affect the protein somehow, with the exact effects to be determined experimentally.

4.2 Known substitutions
One of the table columns is "substitutions" - other amino acid types seen at the same position in the alignment. These amino acid types may be interchangeable at that position in the protein, so if one wants to affect the protein by a point mutation, they should be avoided. For example if the substitutions are "RVK" and the original protein has an R at that position, it is advisable to try anything but RVK. Conversely, when looking for substitutions which will not affect the protein, one may try replacing R with K, or (perhaps more surprisingly) with V. The percentage of times the substitution appears in the alignment is given in the immediately following bracket. No percentage is given in the cases when it is smaller than 1%. This is meant to be a rough guide - due to rounding errors these percentages often do not add up to 100%.

4.3 Surface
To detect candidates for novel functional interfaces, first we look for residues that are solvent accessible (according to DSSP program) by at least 10Å², which is roughly the area needed for one water molecule to come in the contact with the residue. Furthermore, we require that these residues form a "cluster" of residues which have neighbor within 5Å from any of their heavy atoms.
Note, however, that, if our picture of protein evolution is correct, the neighboring residues which are not surface accessible might be equally important in maintaining the interaction specificity - they should not be automatically dropped from consideration when choosing the set for mutagenesis. (Especially if they form a cluster with the surface residues.)

4.4 Number of contacts
Another column worth noting is denoted "noc/bb"; it tells the number of contacts heavy atoms of the residue in question make across the interface, as well as how many of them are realized through the backbone atoms (if all or most contacts are through the backbone,
mutation presumably won't have strong impact). Two heavy atoms are considered to be "in contact" if their centers are closer than 5Å.

4.5 Annotation
If the residue annotation is available (either from the pdb file or from other sources), another column, with the header "annotation" appears. Annotations carried over from PDB are the following: site (indicating existence of related site record in PDB), S-S (disulfide bond forming residue), hb (hydrogen bond forming residue), jb (james bond forming residue), and sb (for salt bridge forming residue).

4.6 Mutation suggestions
Mutation suggestions are completely heuristic and based on complementarity with the substitutions found in the alignment. Note that they are meant to be disruptive to the interaction of the protein with its ligand. The attempt is made to complement the following properties: small [AVGSTC], medium [LPNQDEMIK], large [WFYHR], hydrophobic [LPVAMWFI], polar [GTCY]; positively [KHR], or negatively [DE] charged, aromatic [WFYH], long aliphatic chain [EKRQM], OH-group possession [SDETY], and NH2 group possession [NQRK]. The suggestions are listed according to how different they appear to be from the original amino acid, and they are grouped in round brackets if they appear equally disruptive. From left to right, each bracketed group of amino acid types resembles more strongly the original (i.e. is, presumably, less disruptive). These suggestions are tentative - they might prove disruptive to the fold rather than to the interaction. Many researchers will choose, however, the straightforward alanine mutations, especially in the beginning stages of their investigation.

5 APPENDIX
5.1 File formats
Files with extension "ranks_sorted" are the actual trace results. The fields in the table in this file:
• alignment# number of the position in the alignment
• residue# residue number in the PDB file
• type amino acid type
• rank rank of the position according to older version of ET
• variability has two subfields:
  1. number of different amino acids appearing in this column of the alignment
  2. their type
• rho ET score - the smaller this value, the lesser variability of this position across the branches of the tree (and, presumably, the greater the importance for the protein)
• cvg coverage - percentage of the residues on the structure which have this rho or smaller
• gaps percentage of gaps in this column

5.2 Color schemes used
The following color scheme is used in figures with residues colored by cluster size: black is a single-residue cluster; clusters composed of more than one residue colored according to this hierarchy (ordered by descending size): red, blue, yellow, green, purple, azure, turquoise, brown, coral, magenta, LightSalmon, SkyBlue, violet, gold, bisque, LightSlateBlue, orchid, RosyBrown, MediumAquamarine, DarkOliveGreen, CornflowerBlue, grey55, burlywood, LimeGreen, tan, DarkOrange, DeepPink, maroon, BlanchedAlmond.
The colors used to distinguish the residues by the estimated evolutionary pressure they experience can be seen in Fig. 13.

Fig. 13. Coloring scheme used to color residues by their relative importance. [Color bar labeled COVERAGE, marked at 100%, 50%, 30% and 5%, and RELATIVE IMPORTANCE.]

5.3 Credits
5.3.1 Alistat alistat reads a multiple sequence alignment from the file and shows a number of simple statistics about it. These statistics include the format, the number of sequences, the total number of residues, the average and range of the sequence lengths, and the alignment length (e.g. including gap characters). Also shown are some percent identities. A percent pairwise alignment identity is defined as (idents / MIN(len1, len2)) where idents is the number of exact identities and len1, len2 are the unaligned lengths of the two sequences. The "average percent identity", "most related pair", and "most unrelated pair" of the alignment are the average, maximum, and minimum of all (N)(N-1)/2 pairs, respectively. The "most distant seq" is calculated by finding the maximum pairwise identity (best relative) for all N sequences, then finding the minimum of these N numbers (hence, the most outlying sequence). alistat is copyrighted by HHMI/Washington University School of Medicine, 1992-2001, and freely distributed under the GNU General Public License.

5.3.2 CE To map ligand binding sites from different source structures, report_maker uses the CE program: http://cl.sdsc.edu/. Shindyalov IN, Bourne PE (1998) "Protein structure alignment by incremental combinatorial extension (CE) of the optimal path". Protein Engineering 11(9) 739-747.

5.3.3 DSSP In this work a residue is considered solvent accessible if the DSSP program finds it exposed to water by at least 10Å², which is roughly the area needed for one water molecule to come in the contact with the residue. DSSP is copyrighted by W. Kabsch, C. Sander and MPI-MF, 1983, 1985, 1988, 1994 1995, CMBI version by [email protected] November 18,2002, http://www.cmbi.kun.nl/gv/dssp/descrip.html.

5.3.4 HSSP Whenever available, report_maker uses HSSP alignment as a starting point for the analysis (sequences shorter than 75% of the query are taken out, however); R. Schneider, A. de Daruvar, and C. Sander. "The HSSP database of protein structure-sequence alignments." Nucleic Acids Res., 25:226–230, 1997. http://swift.cmbi.kun.nl/swift/hssp/

5.3.5 LaTex The text for this report was processed using LaTeX; Leslie Lamport, "LaTeX: A Document Preparation System", Addison-Wesley, Reading, Mass. (1986).

5.3.6 Muscle When making alignments "from scratch", report_maker uses Muscle alignment program: Edgar, Robert C. (2004), "MUSCLE: multiple sequence alignment with high accuracy and high throughput." Nucleic Acids Research 32(5), 1792-97. http://www.drive5.com/muscle/

5.3.7 Pymol The figures in this report were produced using Pymol. The scripts can be found in the attachment. Pymol is an open-source application copyrighted by DeLano Scientific LLC (2005). For more information about Pymol see http://pymol.sourceforge.net/. (Note for Windows users: the attached package needs to be unzipped for Pymol to read the scripts and launch the viewer.)

5.4 Note about ET Viewer
Dan Morgan from the Lichtarge lab has developed a visualization tool specifically for viewing trace results. If you are interested, please visit:
http://mammoth.bcm.tmc.edu/traceview/
The viewer is self-unpacking and self-installing. Input files to be used with ETV (extension .etvx) can be found in the attachment to the main report.

5.5 Citing this work
The method used to rank residues and make predictions in this report can be found in Mihalek, I., I. Reš, O. Lichtarge. (2004). "A Family of Evolution-Entropy Hybrid Methods for Ranking of Protein Residues by Importance" J. Mol. Bio. 336: 1265-82. For the original version of ET see O. Lichtarge, H.Bourne and F. Cohen (1996). "An Evolutionary Trace Method Defines Binding Surfaces Common to Protein Families" J. Mol. Bio. 257: 342-358.
report_maker itself is described in Mihalek I., I. Reš and O. Lichtarge (2006). "Evolutionary Trace Report_Maker: a new type of service for comparative analysis of proteins." Bioinformatics 22:1656-7.

5.6 About report_maker
report_maker was written in 2006 by Ivana Mihalek. The 1D ranking visualization program was written by Ivica Reš. report_maker is copyrighted by Lichtarge Lab, Baylor College of Medicine, Houston.

5.7 Attachments
The following files should accompany this report:
• 1lujA.complex.pdb - coordinates of 1lujA with all of its interacting partners
• 1lujA.etvx - ET viewer input file for 1lujA
• 1lujA.cluster_report.summary - Cluster report summary for 1lujA
• 1lujA.ranks - Ranks file in sequence order for 1lujA
• 1lujA.clusters - Cluster descriptions for 1lujA
• 1lujA.msf - the multiple sequence alignment used for the chain 1lujA
• 1lujA.descr - description of sequences used in 1lujA msf
• 1lujA.ranks_sorted - full listing of residues and their ranking for 1lujA
• 1lujA.1lujB.if.pml - Pymol script for Figure 5
• 1lujA.cbcvg - used by other 1lujA-related pymol scripts
• 1lujB.complex.pdb - coordinates of 1lujB with all of its interacting partners
• 1lujB.etvx - ET viewer input file for 1lujB
• 1lujB.cluster_report.summary - Cluster report summary for 1lujB
• 1lujB.ranks - Ranks file in sequence order for 1lujB
• 1lujB.clusters - Cluster descriptions for 1lujB
• 1lujB.msf - the multiple sequence alignment used for the chain 1lujB
• 1lujB.descr - description of sequences used in 1lujB msf
• 1lujB.ranks_sorted - full listing of residues and their ranking for 1lujB
• 1lujB.1lujA.if.pml - Pymol script for Figure 11
• 1lujB.cbcvg - used by other 1lujB-related pymol scripts
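For readers who want to work with the attached ranking files, the coverage definition from Sections 4.1 and 5.1 (the share of residues with this rho or smaller) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the rho values in any example are made up, and reading the attached files is left out because their column layout is not spelled out in this report. Coverage is returned as a fraction, as in the cvg columns of the tables.

```python
from bisect import bisect_right


def coverage(rho_by_residue):
    """Map each residue to the fraction of residues whose rho is the
    same or smaller (tied residues therefore share one coverage value)."""
    n = len(rho_by_residue)
    if n == 0:
        return {}
    ordered = sorted(rho_by_residue.values())
    return {res: bisect_right(ordered, rho) / n
            for res, rho in rho_by_residue.items()}


def top_residues(rho_by_residue, max_coverage=0.25):
    """Residues within the given coverage, most important first."""
    cvg = coverage(rho_by_residue)
    selected = [res for res, value in cvg.items() if value <= max_coverage]
    return sorted(selected, key=lambda res: (cvg[res], res))
```

Because ties share a coverage value, the smallest coverage in a table need not be 1/N. This is consistent with the tables above, where all fully conserved positions appear together at cvg 0.04.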
