3Ews Lichtarge Lab 2006

Pages 1–9 3ews Evolutionary trace report by report maker December 13, 2008

CONTENTS

1 Introduction
2 Chain 3ewsA
  2.1 Q9UMR2 overview
  2.2 Multiple sequence alignment for 3ewsA
  2.3 Residue ranking in 3ewsA
  2.4 Top ranking residues in 3ewsA and their position on the structure
    2.4.1 Clustering of residues at 25% coverage.
    2.4.2 Overlap with known functional surfaces at 25% coverage.
    2.4.3 Possible novel functional surfaces at 25% coverage.
3 Notes on using trace results
  3.1 Coverage
  3.2 Known substitutions
  3.3 Surface
  3.4 Number of contacts
  3.5 Annotation
  3.6 Mutation suggestions
4 Appendix
  4.1 File formats
  4.2 Color schemes used
  4.3 Credits
    4.3.1 Alistat
    4.3.2 CE
    4.3.3 DSSP
    4.3.4 HSSP
    4.3.5 LaTex
    4.3.6 Muscle
    4.3.7 Pymol
  4.4 Note about ET Viewer
  4.5 Citing this work
  4.6 About report maker
  4.7 Attachments

1 INTRODUCTION
From the original Protein Data Bank entry (PDB id 3ews):
Title: Human dead-box rna-helicase ddx19 in complex with adp
Compound: Mol id: 1; molecule: atp-dependent rna helicase ddx19b; chain: a, b; fragment: helicase atp-binding domain, helicase c-terminal domain, unp residues 54-475; synonym: dead box protein 19b, dead box rna helicase dead5; ec: 3.6.1.-; engineered: yes
Organism, scientific name: Homo Sapiens;
3ews contains a single unique chain 3ewsA (416 residues long) and its homologue 3ewsB.

2 CHAIN 3EWSA

2.1 Q9UMR2 overview
From SwissProt, id Q9UMR2, 99% identical to 3ewsA:
Description: ATP-dependent RNA helicase DDX19 (DEAD-box protein 19) (DEAD-box RNA helicase DEAD5).
Organism, scientific name: Homo sapiens (Human).
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Catarrhini; Hominidae; Homo.
Function: ATP-dependent RNA helicase involved in mRNA export from the nucleus.
Subunit: Interacts with Nup214.
Subcellular location: Cytoplasmic and nuclear pore complex cytoplasmic fibrils.
Similarity: Belongs to the DEAD box helicase family. DBP5/DDX19 subfamily.
About: This Swiss-Prot entry is copyright. It is produced through a collaboration between the Swiss Institute of Bioinformatics and the EMBL outstation - the European Bioinformatics Institute. There are no restrictions on its use as long as its content is in no way modified and this statement is not removed.

Fig. 1. Residues 53-260 in 3ewsA colored by their relative importance. (See Appendix, Fig.8, for the coloring scheme.)

Fig. 2. Residues 261-472 in 3ewsA colored by their relative importance. (See Appendix, Fig.8, for the coloring scheme.)

2.2 Multiple sequence alignment for 3ewsA
For the chain 3ewsA, the alignment 3ewsA.msf (attached) with 1007 sequences was used. The alignment was downloaded from the HSSP database, and fragments shorter than 75% of the query as well as duplicate sequences were removed. It can be found in the attachment to this report, under the name of 3ewsA.msf. Its statistics, from the alistat program are the following:

Format:                   MSF
Number of sequences:      1007
Total number of residues: 343677
Smallest:                 179
Largest:                  416
Average length:           341.3
Alignment length:         416
Average identity:         39%
Most related pair:        99%
Most unrelated pair:      12%
Most distant seq:         32%
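The identity figures in this listing follow the definitions given in Section 4.3.1: a pairwise identity is idents / MIN(len1, len2), and "most distant seq" is the smallest of the per-sequence best identities. The following is a minimal Python sketch of those definitions, not the alistat source. It assumes the alignment is held as a list of equal-length gapped strings, that both "." and "-" mark gaps, and it uses a toy alignment rather than 3ewsA.msf. The function names are hypothetical.

```python
from itertools import combinations

GAPS = set(".-")  # assumption: both characters are treated as gap symbols


def pairwise_identity(a, b):
    """idents / MIN(len1, len2), where len1 and len2 are the unaligned lengths."""
    idents = sum(1 for x, y in zip(a, b) if x == y and x not in GAPS)
    len1 = sum(1 for x in a if x not in GAPS)
    len2 = sum(1 for y in b if y not in GAPS)
    return idents / min(len1, len2)


def identity_stats(alignment):
    """Average, maximum and minimum over all N(N-1)/2 pairs, plus 'most distant seq'."""
    n = len(alignment)
    ident = {}
    for i, j in combinations(range(n), 2):
        ident[(i, j)] = ident[(j, i)] = pairwise_identity(alignment[i], alignment[j])
    pairs = [ident[(i, j)] for i, j in combinations(range(n), 2)]
    # Best relative of each sequence; the minimum of these marks the outlier.
    best_relative = [max(ident[(i, j)] for j in range(n) if j != i) for i in range(n)]
    return {
        "average_identity": sum(pairs) / len(pairs),
        "most_related_pair": max(pairs),
        "most_unrelated_pair": min(pairs),
        "most_distant_seq": min(best_relative),
    }


# Toy alignment (hypothetical data, four sequences of alignment length 6).
toy_alignment = ["ACDEFG", "ACDEFA", "AC-KLM", "TC-KLM"]
stats = identity_stats(toy_alignment)
```

Note that "most unrelated pair" and "most distant seq" answer different questions: the first is the worst single pair, the second is the sequence whose closest relative is still the furthest away.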

Furthermore, <1% of residues show as conserved in this alignment.
The alignment consists of 10% eukaryotic (4% vertebrata, <1% arthropoda, <1% fungi, 1% plantae), 10% prokaryotic, <1% archaean, and <1% viral sequences. (Descriptions of some sequences were not readily available.) The file containing the sequence descriptions can be found in the attachment, under the name 3ewsA.descr.

2.3 Residue ranking in 3ewsA
The 3ewsA sequence is shown in Figs. 1–2, with each residue colored according to its estimated importance. The full listing of residues in 3ewsA can be found in the file called 3ewsA.ranks_sorted in the attachment.

2.4 Top ranking residues in 3ewsA and their position on the structure
In the following we consider residues ranking among top 25% of residues in the protein. Figure 3 shows residues in 3ewsA colored by their importance: bright red and yellow indicate more conserved/important residues (see Appendix for the coloring scheme). A Pymol script for producing this figure can be found in the attachment.

Fig. 3. Residues in 3ewsA, colored by their relative importance. Clockwise: front, back, top and bottom views.

2.4.1 Clustering of residues at 25% coverage. Fig. 4 shows the top 25% of all residues, this time colored according to clusters they belong to. The clusters in Fig.4 are composed of the residues listed in Table 1.

Table 1.
cluster color  size  member residues
red            58    94,111,112,115,116,118,119
                     123,124,136,137,138,139,140
                     141,142,143,144,145,146,147
                     148,151,170,171,172,173,174
                     175,177,178,181,185,215,216
                     217,218,219,220,223,224,241
continued below

Fig. 4. Residues in 3ewsA, colored according to the cluster they belong to: red, followed by blue and yellow are the largest clusters (see Appendix for the coloring scheme). Clockwise: front, back, top and bottom views. The corresponding Pymol script is attached.

Table 1. continued
cluster color  size  member residues
                     242,243,244,245,246,247,248
                     252,253,272,273,274,275,276
                     277,281
blue           40    309,320,324,339,342,343,346
                     353,364,365,369,372,376,379
                     380,387,388,389,390,391,392
                     393,394,395,396,397,398,399
                     402,405,407,409,411,421,423
                     425,426,427,428,429
yellow         3     133,269,292

Table 1. Clusters of top ranking residues in 3ewsA.

2.4.2 Overlap with known functional surfaces at 25% coverage. The name of the ligand is composed of the source PDB identifier and the heteroatom name used in that file.

ADP binding site. Table 2 lists the top 25% of residues at the interface with 3ewsAADP602 (adp). The following table (Table 3) suggests possible disruptive replacements for these residues (see Section 3.6).

Table 2.
res  type  subst's(%)                       cvg   noc/bb  dist(Å)  antn
144  K     K(99).                           0.02  31/17   2.80     site
143  G     G(99).N                          0.03  30/30   3.15
141  G     G(98).RS                         0.04  28/28   2.91     site
119  Q     Q(98).KA                         0.05  15/0    2.71
145  T     T(98).SG                         0.05  29/15   2.73     site
140  S     T(57)S(31)N(9).AR                0.06  11/9    3.61     site
94   F     F(95)W(1).(2)AYVH                0.10  4/0     3.65
142  T     S(12)T(83)M(2).Y                 0.10  17/15   3.35     site
138  S     A(81)S(16)G(1).N                 0.14  1/0     4.95
146  A     A(84)S(2)VL(2)IG(5).C(2)T(1)FQH  0.14  9/4     3.02     site
139  Q     Q(76)P(3)K(14).SH(1)ERANYGF      0.15  2/2     4.17
115  P     P(86)A(6)M(2)T(1).(1)CLSE        0.16  7/6     3.68     site
116  S     S(60)T(37)M(1).(1)FL             0.16  9/3     3.71
111  G     G(88)RK(3)N(4).(1)ASHVQ          0.19  1/1     4.68
112  F     Y(37)F(56)I(1)HW(1).(1)ML        0.20  78/9    3.43     site

Table 2. The top 25% of residues in 3ewsA at the interface with ADP. (Field names: res: residue number in the PDB entry; type: amino acid type; substs: substitutions seen in the alignment, with the percentage of each type in the bracket; noc/bb: number of contacts with the ligand, with the number of contacts realized through backbone atoms given after the slash; dist: distance of closest approach to the ligand.)

Table 3.
res  type  disruptive mutations
144  K     (Y)(FTW)(SVCAG)(HD)
143  G     (R)(E)(K)(FWH)
141  G     (E)(K)(FMWDR)(QH)
119  Q     (Y)(FTWH)(SCG)(VA)
145  T     (KR)(FQMWH)(NELPI)(D)
140  S     (KR)(FWH)(M)(Q)
94   F     (K)(E)(Q)(D)
142  T     (KR)(Q)(H)(FMW)
138  S     (R)(K)(H)(FW)
146  A     (R)(KE)(Y)(QD)
139  Q     (Y)(T)(FWH)(CG)
115  P     (R)(Y)(H)(K)
116  S     (R)(K)(H)(Q)
111  G     (E)(R)(D)(K)
112  F     (KE)(T)(QD)(R)

Table 3. List of disruptive mutations for the top 25% of residues in 3ewsA that are at the interface with ADP.

Figure 5 shows residues in 3ewsA colored by their importance, at the interface with 3ewsAADP602.

Fig. 5. Residues in 3ewsA, at the interface with ADP, colored by their relative importance. The ligand (ADP) is colored green. Atoms further than 30 Å away from the geometric center of the ligand, as well as on the line of sight to the ligand were removed. (See Appendix for the coloring scheme for the protein chain 3ewsA.)

Interface with 3ewsB. Table 4 lists the top 25% of residues at the interface with 3ewsB. The following table (Table 5) suggests possible disruptive replacements for these residues (see Section 3.6).

Table 4.
res  type  subst's(%)                      cvg   noc/bb  dist(Å)
172  Y     R(94).Y(3)HQLKDFC               0.08  17/0    3.18
365  G     G(83)A(9)R(1).S(4)DE            0.14  5/5     3.48
342  T     T(79)HS(13)R(3).KMLAQYE         0.17  24/7    3.16
343  R     K(54)R(31)Q(1)T(4)V(5).NSIHAFC  0.22  53/12   3.13

Table 4. The top 25% of residues in 3ewsA at the interface with 3ewsB. (Field names: res: residue number in the PDB entry; type: amino acid type; substs: substitutions seen in the alignment, with the percentage of each type in the bracket; noc/bb: number of contacts with the ligand, with the number of contacts realized through backbone atoms given after the slash; dist: distance of closest approach to the ligand.)

Table 5.
res  type  disruptive mutations
172  Y     (K)(M)(Q)(E)
365  G     (R)(K)(FWH)(E)
342  T     (R)(K)(FWH)(QM)
343  R     (D)(T)(YE)(S)

Table 5. List of disruptive mutations for the top 25% of residues in 3ewsA that are at the interface with 3ewsB.

Figure 6 shows residues in 3ewsA colored by their importance, at the interface with 3ewsB.

2.4.3 Possible novel functional surfaces at 25% coverage. One group of residues is conserved on the 3ewsA surface, away from (or substantially larger than) other functional sites and interfaces recognizable in PDB entry 3ews. It is shown in Fig. 7. The residues belonging to this surface “patch” are listed in Table 6, while Table 7 suggests possible disruptive replacements for these residues (see Section 3.6).

Table 6.
res  type  substitutions(%)                                    cvg   antn
276  T     T(100)                                              0.00
continued below

Fig. 6. Residues in 3ewsA, at the interface with 3ewsB, colored by their relative importance. 3ewsB is shown in backbone representation. (See Appendix for the coloring scheme for the protein chain 3ewsA.)

Fig. 7. A possible active surface on the chain 3ewsA.

Table 6. continued
res  type  substitutions(%)                                    cvg   antn
219  G     G(99).D                                             0.01
242  D     D(99).                                              0.01
243  E     E(99).                                              0.01
245  D     D(99).                                              0.01
275  A     A(99)R                                              0.01
144  K     K(99).                                              0.02  site
372  R     R(99).QC                                            0.02
395  R     R(99).S                                             0.02
396  G     G(99).C                                             0.02
143  G     G(99).N                                             0.03
177  Q     Q(99).RAKGE                                         0.03
339  F     F(98).Y                                             0.03
141  G     G(98).RS                                            0.04  site
398  D     D(98)N.H(1)                                         0.04
119  Q     Q(98).KA                                            0.05
145  T     T(98).SG                                            0.05  site
170  P     P(95)H(3).KLNAS                                     0.05
173  E     E(98).PRKYLH                                        0.05
140  S     T(57)S(31)N(9).AR                                   0.06  site
425  H     H(99).L                                             0.06
426  R     R(99).S                                             0.06
429  R     R(98).H                                             0.06
217  T     T(98)NSA.P                                          0.07
391  N     D(88)N(9).GES                                       0.07
172  Y     R(94).Y(3)HQLKDFC                                   0.08
428  G     G(97).A(1)                                          0.08
171  T     T(95)S(2).AVIG                                      0.09
220  T     R(90)T(6)K(2)LISH.                                  0.09
405  V     V(96).AI(2)R                                        0.09
142  T     S(12)T(83)M(2).Y                                    0.10  site
379  F     L(18)F(78)M.YI(1)VH                                 0.10
248  I     L(91)I(5)F(1)MVY.                                   0.11
320  K     K(92)R(6)L.YDVHP                                    0.11
253  H     F(84)L(3)NM(4)QTH(3)KYVSIA                          0.12
309  Q     Q(91)H(4)ADVE(1)RSNTL                               0.12
427  I     I(88)T(1).S(2)V(6)RQL                               0.12
252  G     G(84)N(1)D(4)F(2)E(4)IS(1)TVM                       0.13
369  V     Q(89)AIG(1).HV(3)KTNEPSRML                          0.13
146  A     A(84)S(2)VL(2)IG(5).C(2)T(1)FQH                     0.14  site
246  V     E(57)N(3)DQ(3)R(16)V(3)K(12)ITLHFMG.                0.14
273  F     F(80)V(4)L(8)SY(2)CI(2)M                            0.14
365  G     G(83)A(9)R(1).S(4)DE                                0.14
139  Q     Q(76)P(3)K(14).SH(1)ERANYGF                         0.15
223  D     D(87)E(3)A(2)Q(2)H(1)RKNGYS.F                       0.15
394  A     A(81)S(5)T(7)G(5).PR                                0.15
115  P     P(86)A(6)M(2)T(1).(1)CLSE                           0.16  site
116  S     S(60)T(37)M(1).(1)FL                                0.16
421  E     E(86).(8)D(2)SKAQ(1)VP                              0.16
342  T     T(79)HS(13)R(3).KMLAQYE                             0.17
392  V     V(66)L(25)I(7).M                                    0.17
393  C     A(54)W(8)L(21)I(1)F(5).V(4)SC(2)M(1)T               0.17
409  D     D(86)H(1).G(1)NE(6)Q(1)TSKA                         0.18
111  G     G(88)RK(3)N(4).(1)ASHVQ                             0.19
112  F     Y(37)F(56)I(1)HW(1).(1)ML                           0.20  site
124  P     P(75)V(6)I(1)SM(3)DK(3)T(1).(1)RL(3)AGQFYE          0.20
397  I     L(39)I(57)M(1).FV                                   0.20
224  W     H(40)M(21)L(26)F(1)Y(3)IQW(4)NTVS.C                 0.21
137  Q     M(3)Q(60)EL(5)R(9)G(1)CV(1).S(4)K(4)I(1)T(1)A(2)FH  0.22
181  V     A(16)E(23)TV(34)D(4)Y(1)S(8)I(1)N(3)Q(2)L.CFRGMK    0.22
343  R     K(54)R(31)Q(1)T(4)V(5).NSIHAFC                      0.22
118  I     I(83)V(14).(1)MGA                                   0.23
277  F     M(48)L(20)F(22)Y(4)I(2)VWH                          0.23
399  V     V(60)I(37).FQSAL                                    0.23
380  R     K(38)R(57)SA(1)HYT.VQF                              0.24
402  V     I(26)V(58).L(14)ACTMN                               0.24
251  Q     M(53)K(5)R(13)D(2)L(4)Q(10)P(2)E(1)ISANTG(1)HV.YF   0.25

Table 6. Residues forming surface “patch” in 3ewsA.

Table 7.
res  type  disruptive mutations
276  T     (KR)(FQMWH)(NELPI)(D)
219  G     (R)(K)(FWH)(M)
242  D     (R)(FWH)(VCAG)(KY)
243  E     (FWH)(VCAG)(YR)(T)
245  D     (R)(FWH)(VCAG)(KY)
275  A     (YE)(D)(K)(QHR)
144  K     (Y)(FTW)(SVCAG)(HD)
372  R     (D)(T)(Y)(SEVLAPI)
395  R     (D)(T)(LPI)(YE)
396  G     (E)(KR)(FMWD)(H)
143  G     (R)(E)(K)(FWH)
177  Q     (Y)(FW)(H)(T)
339  F     (K)(E)(Q)(D)
141  G     (E)(K)(FMWDR)(QH)
398  D     (R)(FW)(VCAHG)(Y)
119  Q     (Y)(FTWH)(SCG)(VA)
145  T     (KR)(FQMWH)(NELPI)(D)
170  P     (Y)(R)(T)(H)
173  E     (FW)(CHG)(YVA)(T)
140  S     (KR)(FWH)(M)(Q)
425  H     (E)(T)(D)(Q)
426  R     (D)(T)(LPI)(YE)
429  R     (TD)(SEVCLAPIG)(YM)(FNW)
217  T     (R)(K)(H)(FW)
391  N     (Y)(FWH)(R)(T)
172  Y     (K)(M)(Q)(E)
428  G     (KER)(HD)(Q)(FMW)
171  T     (R)(K)(H)(Q)
220  T     (R)(K)(FW)(H)
405  V     (Y)(E)(KR)(D)
142  T     (KR)(Q)(H)(FMW)
379  F     (KE)(T)(QD)(R)
248  I     (R)(Y)(T)(H)
continued below

Table 7. continued
res  type  disruptive mutations
320  K     (Y)(T)(FW)(CG)
253  H     (E)(TD)(Q)(K)
309  Q     (Y)(H)(FW)(T)
427  I     (Y)(R)(H)(T)
252  G     (R)(K)(H)(E)
369  V     (Y)(R)(E)(K)
146  A     (R)(KE)(Y)(QD)
246  V     (Y)(R)(E)(K)
273  F     (K)(E)(R)(Q)
365  G     (R)(K)(FWH)(E)
139  Q     (Y)(T)(FWH)(CG)
223  D     (R)(FW)(H)(Y)
394  A     (KER)(Y)(H)(D)
115  P     (R)(Y)(H)(K)
116  S     (R)(K)(H)(Q)
421  E     (H)(FW)(Y)(R)
342  T     (R)(K)(FWH)(QM)
392  V     (Y)(R)(H)(KE)
393  C     (R)(K)(E)(H)
409  D     (R)(FW)(H)(Y)
111  G     (E)(R)(D)(K)
112  F     (KE)(T)(QD)(R)
124  P     (YR)(H)(T)(K)
397  I     (YR)(T)(H)(KE)
224  W     (K)(E)(D)(Q)
137  Q     (Y)(H)(FW)(T)
181  V     (Y)(R)(K)(E)
343  R     (D)(T)(YE)(S)
118  I     (Y)(R)(H)(TKE)
277  F     (KE)(T)(QD)(R)
399  V     (YR)(KE)(H)(D)
380  R     (D)(TE)(Y)(LPI)
402  V     (R)(Y)(KE)(H)
251  Q     (Y)(H)(T)(FW)

Table 7. Disruptive mutations for the surface patch in 3ewsA.

3 NOTES ON USING TRACE RESULTS

3.1 Coverage
Trace results are commonly expressed in terms of coverage: the residue is important if its “coverage” is small - that is, if it belongs to some small top percentage of residues [100% is all of the residues in a chain], according to trace. The ET results are presented in the form of a table, usually limited to the top 25% of residues (or to some nearby percentage), sorted by the strength of the presumed evolutionary pressure. (I.e., the smaller the coverage, the stronger the pressure on the residue.) Starting from the top of that list, mutating a couple of residues should affect the protein somehow, with the exact effects to be determined experimentally.

3.2 Known substitutions
One of the table columns is “substitutions” - other amino acid types seen at the same position in the alignment. These amino acid types may be interchangeable at that position in the protein, so if one wants to affect the protein by a point mutation, they should be avoided. For example, if the substitutions are “RVK” and the original protein has an R at that position, it is advisable to try anything but RVK. Conversely, when looking for substitutions which will not affect the protein, one may try replacing R with K, or (perhaps more surprisingly) with V. The percentage of times the substitution appears in the alignment is given in the immediately following bracket. No percentage is given in the cases when it is smaller than 1%. This is meant to be a rough guide - due to rounding errors these percentages often do not add up to 100%.

3.3 Surface
To detect candidates for novel functional interfaces, first we look for residues that are solvent accessible (according to the DSSP program) by at least 10 Å², which is roughly the area needed for one water molecule to come in contact with the residue. Furthermore, we require that these residues form a “cluster” of residues which have a neighbor within 5 Å from any of their heavy atoms.
Note, however, that, if our picture of protein evolution is correct, the neighboring residues which are not surface accessible might be equally important in maintaining the interaction specificity - they should not be automatically dropped from consideration when choosing the set for mutagenesis. (Especially if they form a cluster with the surface residues.)

3.4 Number of contacts
Another column worth noting is denoted “noc/bb”; it tells the number of contacts that heavy atoms of the residue in question make across the interface, as well as how many of them are realized through the backbone atoms (if all or most contacts are through the backbone, mutation presumably won’t have strong impact). Two heavy atoms are considered to be “in contact” if their centers are closer than 5 Å.

3.5 Annotation
If the residue annotation is available (either from the pdb file or from other sources), another column, with the header “annotation”, appears. Annotations carried over from PDB are the following: site (indicating existence of related site record in PDB), S-S (disulfide bond forming residue), hb (hydrogen bond forming residue), jb (james bond forming residue), and sb (for salt bridge forming residue).

3.6 Mutation suggestions
Mutation suggestions are completely heuristic and based on complementarity with the substitutions found in the alignment. Note that they are meant to be disruptive to the interaction of the protein with its ligand. The attempt is made to complement the following properties: small [AVGSTC], medium [LPNQDEMIK], large [WFYHR], hydrophobic [LPVAMWFI], polar [GTCY]; positively [KHR], or negatively [DE] charged, aromatic [WFYH], long aliphatic chain [EKRQM], OH-group possession [SDETY], and NH2 group possession [NQRK]. The suggestions are listed according to how different they appear to be from the original amino acid, and they are grouped in round brackets if they appear equally disruptive. From left to right, each bracketed group of amino acid types resembles more strongly the original (i.e. is, presumably, less disruptive). These suggestions are tentative - they might prove disruptive to the fold rather than to the interaction. Many researchers will choose, however, the straightforward alanine mutations, especially in the beginning stages of their investigation.
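The notation used in the “substitutions” column (Section 3.2) and in the “disruptive mutations” column (Section 3.6) is regular enough to be read by a short script. The sketch below is illustrative Python, not part of report maker: the function names are hypothetical, and the final filtering step is only one possible reading of the advice in Section 3.2 to avoid amino acid types already seen in the alignment. The two input strings are the entries for residue 140 in Tables 6 and 7.

```python
import re


def parse_substitutions(field):
    """Parse e.g. 'T(57)S(31)N(9).AR' into {residue: percent or None}.

    '.' stands for a gap; None means no percentage was printed (below 1%).
    """
    return {aa: (int(pct) if pct else None)
            for aa, pct in re.findall(r"([A-Z.])(?:\((\d+)\))?", field)}


def parse_mutation_groups(field):
    """Parse e.g. '(KR)(FWH)(M)(Q)' into [['K', 'R'], ['F', 'W', 'H'], ['M'], ['Q']]."""
    return [list(group) for group in re.findall(r"\(([A-Z]+)\)", field)]


def unseen_candidates(groups, substitutions):
    """Drop suggested mutations that already occur at this alignment position.

    Assumption: this filter is an illustration, not the report's own procedure.
    """
    seen = set(substitutions) - {"."}
    kept = [[aa for aa in group if aa not in seen] for group in groups]
    return [group for group in kept if group]


subs = parse_substitutions("T(57)S(31)N(9).AR")     # residue 140, Table 6
groups = parse_mutation_groups("(KR)(FWH)(M)(Q)")   # residue 140, Table 7
candidates = unseen_candidates(groups, subs)
```

For residue 140 the filter removes R from the first group, because R is seen (at under 1%) in the alignment, which shows how the two columns can be read against each other before choosing a mutation.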

[Figure: color scale with axis labels COVERAGE (100%, 50%, 30%, 5%) and RELATIVE IMPORTANCE]

Fig. 8. Coloring scheme used to color residues by their relative importance.

4 APPENDIX

4.1 File formats
Files with extension “ranks_sorted” are the actual trace results. The fields in the table in this file:

• alignment# number of the position in the alignment
• residue# residue number in the PDB file
• type amino acid type
• rank rank of the position according to older version of ET
• variability has two subfields:
  1. number of different amino acids appearing in this column of the alignment
  2. their type
• rho ET score - the smaller this value, the lesser variability of this position across the branches of the tree (and, presumably, the greater the importance for the protein)
• cvg coverage - percentage of the residues on the structure which have this rho or smaller
• gaps percentage of gaps in this column

4.2 Color schemes used
The following color scheme is used in figures with residues colored by cluster size: black is a single-residue cluster; clusters composed of more than one residue are colored according to this hierarchy (ordered by descending size): red, blue, yellow, green, purple, azure, turquoise, brown, coral, magenta, LightSalmon, SkyBlue, violet, gold, bisque, LightSlateBlue, orchid, RosyBrown, MediumAquamarine, DarkOliveGreen, CornflowerBlue, grey55, burlywood, LimeGreen, tan, DarkOrange, DeepPink, maroon, BlanchedAlmond.
The colors used to distinguish the residues by the estimated evolutionary pressure they experience can be seen in Fig. 8.

4.3 Credits

4.3.1 Alistat alistat reads a multiple sequence alignment from the file and shows a number of simple statistics about it. These statistics include the format, the number of sequences, the total number of residues, the average and range of the sequence lengths, and the alignment length (e.g. including gap characters). Also shown are some percent identities. A percent pairwise alignment identity is defined as (idents / MIN(len1, len2)) where idents is the number of exact identities and len1, len2 are the unaligned lengths of the two sequences. The “average percent identity”, “most related pair”, and “most unrelated pair” of the alignment are the average, maximum, and minimum of all (N)(N-1)/2 pairs, respectively. The “most distant seq” is calculated by finding the maximum pairwise identity (best relative) for all N sequences, then finding the minimum of these N numbers (hence, the most outlying sequence). alistat is copyrighted by HHMI/Washington University School of Medicine, 1992-2001, and freely distributed under the GNU General Public License.

4.3.2 CE To map ligand binding sites from different source structures, report maker uses the CE program: http://cl.sdsc.edu/. Shindyalov IN, Bourne PE (1998) “Protein structure alignment by incremental combinatorial extension (CE) of the optimal path.” Protein Engineering 11(9) 739-747.

4.3.3 DSSP In this work a residue is considered solvent accessible if the DSSP program finds it exposed to water by at least 10 Å², which is roughly the area needed for one water molecule to come in contact with the residue. DSSP is copyrighted by W. Kabsch, C. Sander and MPI-MF, 1983, 1985, 1988, 1994, 1995, CMBI version by [email protected] November 18, 2002, http://www.cmbi.kun.nl/gv/dssp/descrip.html.

4.3.4 HSSP Whenever available, report maker uses HSSP alignment as a starting point for the analysis (sequences shorter than 75% of the query are taken out, however); R. Schneider, A. de Daruvar, and C. Sander. “The HSSP database of protein structure-sequence alignments.” Nucleic Acids Res., 25:226–230, 1997. http://swift.cmbi.kun.nl/swift/hssp/

4.3.5 LaTex The text for this report was processed using LaTeX; Leslie Lamport, “LaTeX: A Document Preparation System”, Addison-Wesley, Reading, Mass. (1986).

4.3.6 Muscle When making alignments “from scratch”, report maker uses Muscle alignment program: Edgar, Robert C. (2004), “MUSCLE: multiple sequence alignment with high accuracy and high throughput.” Nucleic Acids Research 32(5), 1792-97. http://www.drive5.com/muscle/

4.3.7 Pymol The figures in this report were produced using Pymol. The scripts can be found in the attachment. Pymol is an open-source application copyrighted by DeLano Scientific LLC (2005). For more information about Pymol see http://pymol.sourceforge.net/. (Note for Windows users: the attached package needs to be unzipped for Pymol to read the scripts and launch the viewer.)

4.4 Note about ET Viewer
Dan Morgan from the Lichtarge lab has developed a visualization tool specifically for viewing trace results. If you are interested, please visit:

http://mammoth.bcm.tmc.edu/traceview/

The viewer is self-unpacking and self-installing. Input files to be used with ETV (extension .etvx) can be found in the attachment to the main report.

4.5 Citing this work
The method used to rank residues and make predictions in this report can be found in Mihalek, I., I. Reš, O. Lichtarge. (2004). “A Family of Evolution-Entropy Hybrid Methods for Ranking of Protein Residues by Importance” J. Mol. Bio. 336: 1265-82. For the original version of ET see O. Lichtarge, H.Bourne and F. Cohen (1996). “An Evolutionary Trace Method Defines Binding Surfaces Common to Protein Families” J. Mol. Bio. 257: 342-358.
report maker itself is described in Mihalek I., I. Res and O. Lichtarge (2006). “Evolutionary Trace Report Maker: a new type of service for comparative analysis of proteins.” Bioinformatics 22:1656-7.

4.6 About report maker
report maker was written in 2006 by Ivana Mihalek. The 1D ranking visualization program was written by Ivica Reš. report maker is copyrighted by Lichtarge Lab, Baylor College of Medicine, Houston.

4.7 Attachments
The following files should accompany this report:

• 3ewsA.complex.pdb - coordinates of 3ewsA with all of its interacting partners
• 3ewsA.etvx - ET viewer input file for 3ewsA
• 3ewsA.cluster_report.summary - Cluster report summary for 3ewsA
• 3ewsA.ranks - Ranks file in sequence order for 3ewsA
• 3ewsA.clusters - Cluster descriptions for 3ewsA
• 3ewsA.msf - the multiple sequence alignment used for the chain 3ewsA
• 3ewsA.descr - description of sequences used in 3ewsA msf
• 3ewsA.ranks_sorted - full listing of residues and their ranking for 3ewsA
• 3ewsA.3ewsAADP602.if.pml - Pymol script for Figure 5
• 3ewsA.cbcvg - used by other 3ewsA-related pymol scripts
• 3ewsA.3ewsB.if.pml - Pymol script for Figure 6
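For readers who want to work with the ranks_sorted file directly, the cvg field defined in Section 4.1 (the share of residues whose rho is equal to or smaller than a given residue's rho) can be illustrated with a short Python sketch. This is not the report maker code: the function names are hypothetical, the rho values are toy numbers rather than values from 3ewsA.ranks_sorted, ties are assumed to share the same coverage, and coverage is returned as a fraction (0.25 for 25%), as in the cvg columns of the tables.

```python
def coverage(rho):
    """Map residue number -> fraction of residues with rho <= that residue's rho."""
    n = len(rho)
    return {res: sum(1 for other in rho.values() if other <= score) / n
            for res, score in rho.items()}


def top_fraction(rho, cutoff=0.25):
    """Residues with coverage <= cutoff, most conserved (smallest rho) first."""
    cvg = coverage(rho)
    return sorted((res for res in cvg if cvg[res] <= cutoff), key=lambda res: rho[res])


# Toy rho scores keyed by residue number (hypothetical values).
toy_rho = {10: 1.00, 11: 3.20, 12: 1.50, 13: 2.10,
           14: 1.00, 15: 4.00, 16: 2.75, 17: 3.90}

top_quarter = top_fraction(toy_rho)   # residues at or below 25% coverage
```

With eight toy residues, a 25% cutoff keeps the two residues that share the smallest rho; the same call with a different cutoff gives the set for any other coverage level.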