1j8m: Evolutionary trace report by report maker, June 20, 2010

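CONTENTS

1 Introduction
2 Chain 1j8mF
  2.1 P70722 overview
  2.2 Multiple sequence alignment for 1j8mF
  2.3 Residue ranking in 1j8mF
  2.4 Top ranking residues in 1j8mF and their position on the structure
    2.4.1 Clustering of residues at 25% coverage.
    2.4.2 Possible novel functional surfaces at 25% coverage.
3 Notes on using trace results
  3.1 Coverage
  3.2 Known substitutions
  3.3 Surface
  3.4 Number of contacts
  3.5 Annotation
  3.6 Mutation suggestions
4 Appendix
  4.1 File formats
  4.2 Color schemes used
  4.3 Credits
    4.3.1 Alistat
    4.3.2 CE
    4.3.3 DSSP
    4.3.4 HSSP
    4.3.5 LaTex
    4.3.6 Muscle
    4.3.7 Pymol
  4.4 Note about ET Viewer
  4.5 Citing this work
  4.6 About report maker
  4.7 Attachments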

1 INTRODUCTION

From the original Protein Data Bank entry (PDB id 1j8m):
Title: Signal recognition particle conserved gtpase from a. ambivalens
Compound: Mol id: 1; molecule: signal recognition 54 kda protein; chain: f; fragment: g-domain, gtpase domain; synonym: srp54; engineered: yes
Organism, scientific name: Ambivalens;

1j8m contains a single unique chain 1j8mF (295 residues long).

Lichtarge lab 2006

2 CHAIN 1J8MF

2.1 P70722 overview

From SwissProt, id P70722, 89% identical to 1j8mF:
Description: Signal recognition 54 kDa protein (SRP54) (Fragment).
Organism, scientific name: Acidianus ambivalens (Desulfurolobus ambivalens).
: ; ; ; ; ; Acidianus.
Function: Binds to the signal sequence of presecretory protein when they emerge from the ribosomes (By similarity).
Subunit: Archaeal signal recognition particle consists of a 7S RNA molecule of 300 nucleotides and two protein subunits: SRP54 and SRP19 (By similarity).
Subcellular location: Cytoplasmic (By similarity).
Domain: Has a two domain structure: the G-domain binds GTP; the M-domain binds the 7S RNA in presence of SRP19 and also binds the signal sequence (By similarity).
Similarity: Belongs to the GTP-binding SRP family.
About: This Swiss-Prot entry is copyright. It is produced through a collaboration between the Swiss Institute of Bioinformatics and the EMBL outstation - the European Bioinformatics Institute. There are no restrictions on its use as long as its content is in no way modified and this statement is not removed.

2.2 Multiple sequence alignment for 1j8mF

For the chain 1j8mF, the alignment 1j8mF.msf (attached) with 1025 sequences was used. The alignment was downloaded from the HSSP database, and fragments shorter than 75% of the query as well as duplicate sequences were removed. It can be found in the attachment to this report, under the name of 1j8mF.msf. Its statistics, from the alistat program, are the following:

Format:                   MSF
Number of sequences:      1025
Total number of residues: 294439
Smallest:                 229
Largest:                  295
Average length:           287.3
Alignment length:         295
Average identity:         41%
Most related pair:        99%
Most unrelated pair:      19%
Most distant seq:         46%

Furthermore, <1% of residues show as conserved in this alignment.

The alignment consists of 4% eukaryotic (<1% vertebrata, <1% arthropoda, 1% fungi, 1% plantae), 16% prokaryotic, and 2% archaean sequences. (Descriptions of some sequences were not readily available.) The file containing the sequence descriptions can be found in the attachment, under the name 1j8mF.descr.

2.3 Residue ranking in 1j8mF

The 1j8mF sequence is shown in Figs. 1–2, with each residue colored according to its estimated importance. The full listing of residues in 1j8mF can be found in the file called 1j8mF.ranks sorted in the attachment.

Fig. 1. Residues 3-149 in 1j8mF colored by their relative importance. (See Appendix, Fig.6, for the coloring scheme.)

Fig. 2. Residues 150-297 in 1j8mF colored by their relative importance. (See Appendix, Fig.6, for the coloring scheme.)

2.4 Top ranking residues in 1j8mF and their position on the structure

In the following we consider residues ranking among top 25% of residues in the protein. Figure 3 shows residues in 1j8mF colored by their importance: bright red and yellow indicate more conserved/important residues (see Appendix for the coloring scheme). A Pymol script for producing this figure can be found in the attachment.

Fig. 3. Residues in 1j8mF, colored by their relative importance. Clockwise: front, back, top and bottom views.

2.4.1 Clustering of residues at 25% coverage. Fig. 4 shows the top 25% of all residues, this time colored according to clusters they belong to. The clusters in Fig.4 are composed of the residues listed in Table 1.

Table 1.
cluster color  size  member residues
red            69    32,33,36,40,41,105,106,107,108,110,111,112,113,117,119,
                     133,135,137,138,139,140,141,144,145,148,187,188,189,190,
                     191,192,200,201,203,204,213,215,218,219,221,223,225,226,
                     227,228,231,232,235,243,247,248,250,251,252,254,255,256,
                     257,258,259,260,261,268,269,270,273,275,276,284
blue           2     292,293
yellow         2     172,176

Table 1. Clusters of top ranking residues in 1j8mF.

2.4.2 Possible novel functional surfaces at 25% coverage. One group of residues is conserved on the 1j8mF surface, away from (or substantially larger than) other functional sites and interfaces recognizable in PDB entry 1j8m. It is shown in Fig. 5. The right panel

shows (in blue) the rest of the larger cluster this surface belongs to.

Fig. 4. Residues in 1j8mF, colored according to the cluster they belong to: red, followed by blue and yellow are the largest clusters (see Appendix for the coloring scheme). Clockwise: front, back, top and bottom views. The corresponding Pymol script is attached.

Fig. 5. A possible active surface on the chain 1j8mF. The larger cluster it belongs to is shown in blue.

The residues belonging to this surface "patch" are listed in Table 2, while Table 3 suggests possible disruptive replacements for these residues (see Section 3.6).

Table 2.
res  type  substitutions(%)                          cvg
135  D     D(99)G                                    0.01
190  G     G(100)                                    0.01
141  A     A(99)GSE                                  0.02
250  D     D(99)E.                                   0.02
256  G     G(99).                                    0.02
110  G     G(99)L.                                   0.03
138  R     R(99)Q.                                   0.03
108  G     G(99)DC.                                  0.04
117  K     K(99)TQ.                                  0.04
221  D     D(99)SN.                                  0.04
144  Q     Q(99)XW                                   0.05
191  R     R(99)CFSKG                                0.05
276  E     E(98).QK                                  0.05
111  K     K(99)IR.                                  0.06
248  K     R(10)K(89)M.                              0.06
255  G     G(98)SAVC.                                0.06
40   D     D(99).NI                                  0.07
235  F     F(99)ILY.                                 0.07
284  F     F(99).LY                                  0.07
113  T     T(99)SLC.                                 0.08
189  A     A(90)S(9)RVQ                              0.09
226  Q     Q(97)N(1)K.SHGL                           0.09
107  Q     Q(75)N(22)YHERKS                          0.10
139  P     P(66)A(31)L(2)YTN.                        0.10
203  E     E(93)Q(6)K                                0.10
275  G     S(3)G(94)A(2).                            0.10
200  L     M(32)L(67)KE                              0.12
201  L     M(82)L(3)I(4)F(8)RVC                      0.12
251  G     G(91)S(5)A(2)CT.R                         0.12
106  V     L(74)V(21)AQI(1)FTP                       0.13
140  A     A(77)G(19)N(3)SQ                          0.13
213  P     P(95)IAV(1)RTKF.SLQE                      0.14
247  T     T(92)S(6)A.                               0.14
252  T     D(64)T(23)H(7)S(3)QNAGY.                  0.14
192  H     T(4)L(77)QMH(14)SDNIVAY                   0.15
254  K     R(62)K(35)A(1)MS.N                        0.15
243  T     G(83)FS(9)TA(5)HEDYC.                     0.16
133  G     S(28)A(55)C(9)G(3)P(1)EVQL.               0.17
268  T     P(95)MRKQTAGS(1).ND                       0.17
137  Y     R(3)Y(50)F(31)IQ(8)VHN(2)WADTMKSL         0.18
41   V     V(93)I(3)LFTM(1)C.                        0.19
204  M     A(9)L(45)I(22)V(4)M(18)                   0.19
269  I     I(75)V(19)L(4).                           0.20
148  L     M(1)L(49)W(20)I(3)HV(14)GN(9)STXQYA       0.21
231  L     T(31)S(1)Q(34)V(19)I(1)L(9)AEHMN          0.21
215  E     E(82)SYN(3)D(4)H(1)Q(3)L(2)RCMV.KA        0.22
223  S     L(11)M(52)S(11)VG(2)T(16)HQA(2)N.CI       0.23
227  K     D(49)N(21)S(1)TV(1)E(10)KA(8)Q(4)LGMRIH   0.23
261  A     M(10)I(46)V(25)L(2)A(14)F.RS              0.24
165  V     A(10)P(81)S(4)V(3)TIKLQYD                 0.25
253  A     G(6)A(61)T(18)S(12)VP.                    0.25
270  K     K(68)FR(10)PL(5)M(2)TI(5)HV(2)A(1)YS.QED  0.25

Table 2. Residues forming surface "patch" in 1j8mF.

Table 3.
res  type  disruptive mutations
135  D     (R)(FWH)(K)(Y)
190  G     (KER)(FQMWHD)(NYLPI)(SVA)
141  A     (R)(K)(Y)(H)
250  D     (R)(FW)(H)(VCAG)
256  G     (KER)(FQMWHD)(NLPI)(Y)
110  G     (R)(KE)(H)(FWD)
138  R     (T)(D)(YVCAG)(S)
108  G     (R)(K)(FEWH)(QM)
117  K     (Y)(FW)(T)(VA)
221  D     (R)(FWH)(Y)(KVCAG)
144  Q     (Y)(T)(H)(FW)
191  R     (D)(E)(T)(Y)
276  E     (FW)(H)(Y)(VCAG)
111  K     (Y)(T)(FW)(SCG)
248  K     (Y)(T)(FW)(SCG)
255  G     (KR)(E)(Q)(H)
40   D     (R)(H)(FW)(Y)
235  F     (K)(E)(Q)(T)
284  F     (K)(E)(Q)(TD)
113  T     (R)(K)(H)(Q)
189  A     (Y)(E)(KR)(D)
226  Q     (Y)(FW)(TH)(SVCAG)
107  Q     (Y)(FTW)(H)(VA)
139  P     (R)(Y)(H)(K)
203  E     (FW)(H)(Y)(VCAG)
275  G     (KR)(E)(QH)(FMWD)
200  L     (Y)(THR)(CG)(S)
201  L     (Y)(R)(T)(H)
251  G     (KE)(R)(FWH)(MD)
106  V     (R)(Y)(KE)(H)
140  A     (Y)(R)(E)(KH)
213  P     (Y)(R)(H)(T)
247  T     (KR)(QH)(FMW)(NE)
252  T     (R)(K)(M)(FWH)
192  H     (E)(Q)(T)(K)
254  K     (Y)(FW)(T)(CG)
243  T     (K)(R)(Q)(M)
133  G     (R)(K)(H)(E)
268  T     (R)(H)(FW)(K)
137  Y     (K)(Q)(ER)(M)
41   V     (R)(K)(YE)(H)
204  M     (Y)(H)(R)(T)
269  I     (YR)(H)(T)(KE)
148  L     (R)(Y)(H)(K)
231  L     (Y)(R)(H)(T)
215  E     (H)(FW)(Y)(R)
223  S     (R)(K)(H)(FW)
227  K     (Y)(FW)(T)(H)
261  A     (Y)(E)(R)(K)
165  V     (R)(Y)(K)(E)
253  A     (R)(K)(E)(Y)
270  K     (Y)(T)(FW)(CG)

Table 3. Disruptive mutations for the surface patch in 1j8mF.

3 NOTES ON USING TRACE RESULTS

3.1 Coverage

Trace results are commonly expressed in terms of coverage: the residue is important if its "coverage" is small - that is, if it belongs to some small top percentage of residues [100% is all of the residues in a chain], according to trace. The ET results are presented in the form of a table, usually limited to the top 25% of residues (or to some nearby percentage), sorted by the strength of the presumed evolutionary pressure. (I.e., the smaller the coverage, the stronger the pressure on the residue.) Starting from the top of that list, mutating a couple of residues should affect the protein somehow, with the exact effects to be determined experimentally.

3.2 Known substitutions

One of the table columns is "substitutions" - other types seen at the same position in the alignment. These amino acid types may be interchangeable at that position in the protein, so if one wants to affect the protein by a point mutation, they should be avoided. For example, if the substitutions are "RVK" and the original protein has an R at that position, it is advisable to try anything but R, V, or K. Conversely, when looking for substitutions which will not affect the protein, one may try replacing R with K, or (perhaps more surprisingly) with V. The percentage of times the substitution appears in the alignment is given in the immediately following bracket. No percentage is given in the cases when it is smaller than 1%. This is meant to be a rough guide - due to rounding errors these percentages often do not add up to 100%.
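The reading rule for the "substitutions" column in Section 3.2 can be sketched in a few lines of Python. This is only an illustration, not part of report maker: the function names `parse_substitutions` and `unobserved_types` are hypothetical, and treating "." as a gap symbol and "X" as an unknown residue is an assumption that the report itself does not state.

```python
import re

# The 20 standard amino acid one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# One symbol, optionally followed by an integer percentage in round brackets,
# e.g. "Q(97)" or a bare "K" (bare means the type occurs in under 1% of rows).
_TOKEN = re.compile(r"([A-Za-z.])(?:\((\d+)\))?")


def parse_substitutions(field):
    """Map each symbol in a substitutions(%) field to its percentage.

    Symbols printed without a bracket get None, meaning "seen, but < 1%".
    """
    observed = {}
    for symbol, percent in _TOKEN.findall(field.replace(" ", "")):
        observed[symbol.upper()] = int(percent) if percent else None
    return observed


def unobserved_types(field):
    """Standard amino acid types never seen at this alignment position.

    Following Section 3.2, these are the candidates to try when the aim is
    to perturb the protein; the observed types are the ones to avoid.
    """
    seen = parse_substitutions(field)
    return [aa for aa in AMINO_ACIDS if aa not in seen]


# Row 226 of Table 2 and the "RVK" example from the text.
row_226 = parse_substitutions("Q(97)N(1)K.SHGL")
try_instead_of_rvk = unobserved_types("RVK")
```

Here `row_226["Q"]` is 97 and `row_226["K"]` is `None`, while `try_instead_of_rvk` holds the 17 standard types other than R, V, and K. Table 3 gives the report's own ranked suggestions, which this sketch does not attempt to reproduce.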

3.3 Surface

To detect candidates for novel functional interfaces, first we look for residues that are solvent accessible (according to DSSP program) by at least 10 Å², which is roughly the area needed for one water molecule to come in the contact with the residue. Furthermore, we require that these residues form a "cluster" of residues which have a neighbor within 5 Å from any of their heavy atoms.

Note, however, that, if our picture of protein evolution is correct, the neighboring residues which are not surface accessible might be equally important in maintaining the interaction specificity - they should not be automatically dropped from consideration when choosing the set for mutagenesis. (Especially if they form a cluster with the surface residues.)

3.4 Number of contacts

Another column worth noting is denoted "noc/bb"; it tells the number of contacts heavy atoms of the residue in question make across the interface, as well as how many of them are realized through the backbone atoms (if all or most contacts are through the backbone, mutation presumably won't have strong impact). Two heavy atoms are considered to be "in contact" if their centers are closer than 5 Å.

3.5 Annotation

If the residue annotation is available (either from the pdb file or from other sources), another column, with the header "annotation" appears. Annotations carried over from PDB are the following: site (indicating existence of related site record in PDB), S-S (disulfide bond forming residue), hb (hydrogen bond forming residue), jb (james bond forming residue), and sb (for salt bridge forming residue).

3.6 Mutation suggestions

Mutation suggestions are completely heuristic and based on complementarity with the substitutions found in the alignment. Note that they are meant to be disruptive to the interaction of the protein with its ligand. The attempt is made to complement the following properties: small [AVGSTC], medium [LPNQDEMIK], large [WFYHR], hydrophobic [LPVAMWFI], polar [GTCY]; positively [KHR], or negatively [DE] charged, aromatic [WFYH], long aliphatic chain [EKRQM], OH-group possession [SDETY], and NH2 group possession [NQRK]. The suggestions are listed according to how different they appear to be from the original amino acid, and they are grouped in round brackets if they appear equally disruptive. From left to right, each bracketed group of amino acid types resembles more strongly the original (i.e. is, presumably, less disruptive). These suggestions are tentative - they might prove disruptive to the fold rather than to the interaction. Many researchers will choose, however, the straightforward alanine mutations, especially in the beginning stages of their investigation.

4 APPENDIX

4.1 File formats

Files with extension "ranks sorted" are the actual trace results. The fields in the table in this file:

• alignment# number of the position in the alignment
• residue# residue number in the PDB file
• type amino acid type
• rank rank of the position according to older version of ET
• variability has two subfields:
  1. number of different amino acids appearing in this column of the alignment
  2. their type
• rho ET score - the smaller this value, the lesser variability of this position across the branches of the tree (and, presumably, the greater the importance for the protein)
• cvg coverage - percentage of the residues on the structure which have this rho or smaller
• gaps percentage of gaps in this column

4.2 Color schemes used

The following color scheme is used in figures with residues colored by cluster size: black is a single-residue cluster; clusters composed of more than one residue colored according to this hierarchy (ordered by descending size): red, blue, yellow, green, purple, azure, turquoise, brown, coral, magenta, LightSalmon, SkyBlue, violet, gold, bisque, LightSlateBlue, orchid, RosyBrown, MediumAquamarine, DarkOliveGreen, CornflowerBlue, grey55, burlywood, LimeGreen, tan, DarkOrange, DeepPink, maroon, BlanchedAlmond.

The colors used to distinguish the residues by the estimated evolutionary pressure they experience can be seen in Fig. 6.

Fig. 6. Coloring scheme used to color residues by their relative importance.

4.3 Credits

4.3.1 Alistat alistat reads a multiple sequence alignment from the file and shows a number of simple statistics about it. These statistics include the format, the number of sequences, the total number of residues, the average and range of the sequence lengths, and the

alignment length (e.g. including gap characters). Also shown are some percent identities. A percent pairwise alignment identity is defined as (idents / MIN(len1, len2)) where idents is the number of exact identities and len1, len2 are the unaligned lengths of the two sequences. The "average percent identity", "most related pair", and "most unrelated pair" of the alignment are the average, maximum, and minimum of all (N)(N-1)/2 pairs, respectively. The "most distant seq" is calculated by finding the maximum pairwise identity (best relative) for all N sequences, then finding the minimum of these N numbers (hence, the most outlying sequence). alistat is copyrighted by HHMI/Washington University School of Medicine, 1992-2001, and freely distributed under the GNU General Public License.

4.3.2 CE To map ligand binding sites from different source structures, report maker uses the CE program: http://cl.sdsc.edu/. Shindyalov IN, Bourne PE (1998) "Protein structure alignment by incremental combinatorial extension (CE) of the optimal path." Protein Engineering 11(9) 739-747.

4.3.3 DSSP In this work a residue is considered solvent accessible if the DSSP program finds it exposed to water by at least 10 Å², which is roughly the area needed for one water molecule to come in the contact with the residue. DSSP is copyrighted by W. Kabsch, C. Sander and MPI-MF, 1983, 1985, 1988, 1994 1995, CMBI version by [email protected] November 18,2002, http://www.cmbi.kun.nl/gv/dssp/descrip.html.

4.3.4 HSSP Whenever available, report maker uses HSSP alignment as a starting point for the analysis (sequences shorter than 75% of the query are taken out, however); R. Schneider, A. de Daruvar, and C. Sander. "The HSSP database of protein structure-sequence alignments." Nucleic Acids Res., 25:226–230, 1997. http://swift.cmbi.kun.nl/swift/hssp/

4.3.5 LaTex The text for this report was processed using LaTeX; Leslie Lamport, "LaTeX: A Document Preparation System," Addison-Wesley, Reading, Mass. (1986).

4.3.6 Muscle When making alignments "from scratch", report maker uses Muscle alignment program: Edgar, Robert C. (2004), "MUSCLE: multiple sequence alignment with high accuracy and high throughput." Nucleic Acids Research 32(5), 1792-97. http://www.drive5.com/muscle/

4.3.7 Pymol The figures in this report were produced using Pymol. The scripts can be found in the attachment. Pymol is an open-source application copyrighted by DeLano Scientific LLC (2005). For more information about Pymol see http://pymol.sourceforge.net/. (Note for Windows users: the attached package needs to be unzipped for Pymol to read the scripts and launch the viewer.)

4.4 Note about ET Viewer

Dan Morgan from the Lichtarge lab has developed a visualization tool specifically for viewing trace results. If you are interested, please visit:
http://mammoth.bcm.tmc.edu/traceview/
The viewer is self-unpacking and self-installing. Input files to be used with ETV (extension .etvx) can be found in the attachment to the main report.

4.5 Citing this work

The method used to rank residues and make predictions in this report can be found in Mihalek, I., I. Reš, O. Lichtarge. (2004). "A Family of Evolution-Entropy Hybrid Methods for Ranking of Protein Residues by Importance" J. Mol. Bio. 336: 1265-82. For the original version of ET see O. Lichtarge, H.Bourne and F. Cohen (1996). "An Evolutionary Trace Method Defines Binding Surfaces Common to Protein Families" J. Mol. Bio. 257: 342-358.

report maker itself is described in Mihalek I., I. Reš and O. Lichtarge (2006). "Evolutionary Trace Report Maker: a new type of service for comparative analysis of ." Bioinformatics 22:1656-7.

4.6 About report maker

report maker was written in 2006 by Ivana Mihalek. The 1D ranking visualization program was written by Ivica Reš. report maker is copyrighted by Lichtarge Lab, Baylor College of Medicine, Houston.

4.7 Attachments

The following files should accompany this report:

• 1j8mF.complex.pdb - coordinates of 1j8mF with all of its interacting partners
• 1j8mF.etvx - ET viewer input file for 1j8mF
• 1j8mF.cluster report.summary - Cluster report summary for 1j8mF
• 1j8mF.ranks - Ranks file in sequence order for 1j8mF
• 1j8mF.clusters - Cluster descriptions for 1j8mF
• 1j8mF.msf - the multiple sequence alignment used for the chain 1j8mF
• 1j8mF.descr - description of sequences used in 1j8mF msf
• 1j8mF.ranks sorted - full listing of residues and their ranking for 1j8mF
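The identity statistics quoted from alistat in Section 4.3.1 follow directly from the definition idents / MIN(len1, len2). The Python sketch below restates that definition for illustration; it is not alistat's source code. The names `pairwise_identity` and `identity_summary` are hypothetical, as is the assumption that "-", "." and "~" are the gap symbols and that identical gap characters do not count as identities.

```python
from itertools import combinations

# Assumed gap symbols; the report does not list the ones alistat recognizes.
GAP_CHARS = "-.~"


def pairwise_identity(row_a, row_b):
    """idents / MIN(len1, len2) for two rows of the same alignment."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    # Exact identities in aligned columns, ignoring gap-versus-gap columns.
    idents = sum(1 for a, b in zip(row_a, row_b)
                 if a == b and a not in GAP_CHARS)
    # Unaligned lengths: residues only, gap characters excluded.
    len1 = sum(1 for a in row_a if a not in GAP_CHARS)
    len2 = sum(1 for b in row_b if b not in GAP_CHARS)
    return idents / min(len1, len2)


def identity_summary(rows):
    """Average, most related, most unrelated pair, and most distant seq."""
    if len(rows) < 2:
        raise ValueError("need at least two sequences")
    best_relative = [0.0] * len(rows)  # highest identity seen per sequence
    values = []
    for i, j in combinations(range(len(rows)), 2):  # all N(N-1)/2 pairs
        identity = pairwise_identity(rows[i], rows[j])
        values.append(identity)
        best_relative[i] = max(best_relative[i], identity)
        best_relative[j] = max(best_relative[j], identity)
    return {
        "average": sum(values) / len(values),
        "most_related": max(values),
        "most_unrelated": min(values),
        # The sequence whose best relative is the worst of all best relatives.
        "most_distant": min(best_relative),
    }


# A made-up three-row alignment, not taken from 1j8mF.msf.
toy = identity_summary(["ACDE", "ACDF", "A-GH"])
```

On the toy alignment the first two rows share 3 of 4 residues (0.75), and the third row shares one residue with each of the others over an unaligned length of 3 (about 0.33), so it is also the "most distant" sequence.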

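The cvg field of Section 4.1 (the share of residues whose rho is the same or smaller) and the top-25% selection of Section 3.1 can be sketched as follows. This is a minimal illustration with hypothetical function names and made-up rho values; it returns coverage as a fraction, as in the cvg column of Table 2, and is not the ranking code used by report maker.

```python
from bisect import bisect_right


def coverage(rho_by_residue):
    """Fraction of residues whose rho is less than or equal to each residue's rho."""
    total = len(rho_by_residue)
    ordered = sorted(rho_by_residue.values())
    # bisect_right counts how many scores are <= rho, so ties share a coverage.
    return {res: bisect_right(ordered, rho) / total
            for res, rho in rho_by_residue.items()}


def top_ranked(rho_by_residue, cutoff=0.25):
    """Residue numbers with coverage at or below the cutoff, best first."""
    cvg = coverage(rho_by_residue)
    selected = [res for res in cvg if cvg[res] <= cutoff]
    return sorted(selected, key=lambda res: (cvg[res], res))


# Hypothetical rho scores keyed by residue number (illustrative values only).
example_rho = {135: 1.0, 190: 1.0, 141: 1.5, 250: 2.0}
example_cvg = coverage(example_rho)
example_top = top_ranked(example_rho, cutoff=0.5)
```

With these four made-up scores, residues 135 and 190 tie at a coverage of 0.5 and are the only ones returned for a cutoff of 0.5. A real ranks file would supply one rho per residue of the chain.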