1Ydl Lichtarge Lab 2006
1ydl Evolutionary trace report by report maker
December 13, 2009

CONTENTS
1 Introduction
2 Chain 1ydlA
2.1 Q6ZYL4 overview
2.2 Multiple sequence alignment for 1ydlA
2.3 Residue ranking in 1ydlA
2.4 Top ranking residues in 1ydlA and their position on the structure
2.4.1 Clustering of residues at 25% coverage.
2.4.2 Possible novel functional surfaces at 25% coverage.
3 Notes on using trace results
3.1 Coverage
3.2 Known substitutions
3.3 Surface
3.4 Number of contacts
3.5 Annotation
3.6 Mutation suggestions
4 Appendix
4.1 File formats
4.2 Color schemes used
4.3 Credits
4.3.1 Alistat
4.3.2 CE
4.3.3 DSSP
4.3.4 HSSP
4.3.5 LaTex
4.3.6 Muscle
4.3.7 Pymol
4.4 Note about ET Viewer
4.5 Citing this work
4.6 About report maker
4.7 Attachments

1 INTRODUCTION
From the original Protein Data Bank entry (PDB id 1ydl):
Title: Crystal structure of the human tfiih, northeast structural genomics target hr2045.
Compound: Mol id: 1; molecule: general transcription factor iih, polypeptide 5; chain: a; synonym: tfiih; engineered: yes
Organism, scientific name: Homo Sapiens;
1ydl contains a single unique chain 1ydlA (71 residues long).

2 CHAIN 1YDLA
2.1 Q6ZYL4 overview
From SwissProt, id Q6ZYL4, 95% identical to 1ydlA:
Description: TFIIH basal transcription factor complex TTD-A subunit (General transcription factor IIH polypeptide 5) (TFB5 ortholog).
Organism, scientific name: Homo sapiens (Human).
Taxonomy: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Catarrhini; Hominidae; Homo.
Function: Component of the TFIIH basal transcription factor involved in nucleotide excision repair (NER) of DNA and, when complexed to CAK, in RNA transcription by RNA polymerase II. Necessary for the stability of the TFIIH complex and for the presence of normal levels of TFIIH in the cell.
Subunit: Subunit of the TFIIH basal transcription factor complex that contains ERCC2, ERCC3, GTF2H1, GTF2H2, GTF2H3, GTF2H4, GTF2H5, MNAT1, CDK7 and CCNH.
Subcellular location: Nuclear.
Disease: Defects in GTF2H5 are a cause of trichothiodystrophy (TTD) [MIM:601675]. TTD is an autosomal recessive disease characterized by sulfur-deficient brittle hair and nails, mental retardation, impaired sexual development, ichthyotic skin, abnormal facies and in some but not all instances photosensitivity. There are no reports of skin cancer associated with TTD. Photosensitive patients have a deficiency in excision-repair which in most cases is indistinguishable from that in XP patients.
Similarity: Belongs to the TFB5 family.
About: This Swiss-Prot entry is copyright. It is produced through a collaboration between the Swiss Institute of Bioinformatics and the EMBL outstation - the European Bioinformatics Institute. There are no restrictions on its use as long as its content is in no way modified and this statement is not removed.

Fig. 1. Residues 1-71 in 1ydlA colored by their relative importance. (See Appendix, Fig.5, for the coloring scheme.)

Fig. 2. Residues in 1ydlA, colored by their relative importance. Clockwise: front, back, top and bottom views.

2.2 Multiple sequence alignment for 1ydlA
For the chain 1ydlA, the alignment 1ydlA.msf with 28 sequences was used. The alignment was downloaded from the HSSP database, and fragments shorter than 75% of the query, as well as duplicate sequences, were removed. The alignment is attached to this report under the name 1ydlA.msf. Its statistics, from the alistat program, are the following:
Format: MSF
Number of sequences: 28
Total number of residues: 1801
Smallest: 57
Largest: 71
Average length: 64.3
Alignment length: 71
Average identity: 43%
Most related pair: 98%
Most unrelated pair: 14%
Most distant seq: 38%
Furthermore, <1% of residues show as conserved in this alignment.
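The alistat summary above can be approximated with a short script. The following is a minimal Python sketch, not the alistat implementation: the function names are hypothetical, the toy alignment is invented, both '.' and '-' are treated as gaps, pairwise identity is normalised by the shorter ungapped sequence, and "most distant seq" is read as the smallest best-match identity of any sequence. The last two points are assumptions about how alistat defines these quantities.

```python
from itertools import combinations

# MSF files commonly use '.' for gaps; '-' is accepted here as well (assumption).
GAP_CHARS = frozenset("-.")


def ungapped_length(seq: str) -> int:
    """Number of residues in an aligned sequence, ignoring gap characters."""
    return sum(1 for c in seq if c not in GAP_CHARS)


def pairwise_identity(a: str, b: str) -> float:
    """Identical aligned residues divided by the shorter ungapped length
    (the choice of denominator is an assumption)."""
    idents = sum(1 for x, y in zip(a, b) if x == y and x not in GAP_CHARS)
    shorter = min(ungapped_length(a), ungapped_length(b))
    return idents / shorter if shorter else 0.0


def alignment_stats(aligned: dict[str, str]) -> dict[str, float]:
    """Alistat-style summary of an alignment given as {name: aligned sequence}."""
    seqs = list(aligned.values())
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    lengths = [ungapped_length(s) for s in seqs]
    ident = {(i, j): pairwise_identity(seqs[i], seqs[j])
             for i, j in combinations(range(len(seqs)), 2)}
    # Best identity of each sequence to any other; the minimum of these is
    # reported as "most distant seq" (assumed interpretation).
    best = [max(v for pair, v in ident.items() if k in pair)
            for k in range(len(seqs))]
    return {
        "n_sequences": len(seqs),
        "total_residues": sum(lengths),
        "smallest": min(lengths),
        "largest": max(lengths),
        "average_length": sum(lengths) / len(seqs),
        "alignment_length": len(seqs[0]),
        "average_identity": sum(ident.values()) / len(ident),
        "most_related_pair": max(ident.values()),
        "most_unrelated_pair": min(ident.values()),
        "most_distant_seq": min(best),
    }


if __name__ == "__main__":
    # Invented three-sequence alignment, for illustration only.
    toy = {"s1": "MKD-LIC", "s2": "MKDALIC", "s3": "MRE.LVC"}
    for key, value in alignment_stats(toy).items():
        print(f"{key}: {value:.3g}")
```

As a consistency check on the reported figures, the average length is the total number of residues divided by the number of sequences: 1801 / 28 ≈ 64.3.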
The alignment consists of 35% eukaryotic (21% vertebrata, 3% fungi, 3% plantae) sequences. (Descriptions of some sequences were not readily available.) The file containing the sequence descriptions can be found in the attachment, under the name 1ydlA.descr.

2.3 Residue ranking in 1ydlA
The 1ydlA sequence is shown in Fig. 1, with each residue colored according to its estimated importance. The full listing of residues in 1ydlA can be found in the file called 1ydlA.ranks_sorted in the attachment.

2.4 Top ranking residues in 1ydlA and their position on the structure
In the following we consider residues ranking among the top 25% of residues in the protein. Figure 2 shows residues in 1ydlA colored by their importance: bright red and yellow indicate more conserved/important residues (see Appendix for the coloring scheme). A Pymol script for producing this figure can be found in the attachment.

Fig. 3. Residues in 1ydlA, colored according to the cluster they belong to: red, followed by blue and yellow are the largest clusters (see Appendix for the coloring scheme). Clockwise: front, back, top and bottom views. The corresponding Pymol script is attached.

2.4.1 Clustering of residues at 25% coverage.
Fig. 3 shows the top 25% of all residues, this time colored according to the clusters they belong to. The clusters in Fig. 3 are composed of the residues listed in Table 1.

color  size  member residues
red    11    9,12,13,14,15,17,18,37,38,39,42
blue   6     7,24,32,33,34,45
Table 1. Clusters of top ranking residues in 1ydlA.

2.4.2 Possible novel functional surfaces at 25% coverage.
One group of residues is conserved on the 1ydlA surface, away from (or substantially larger than) other functional sites and interfaces recognizable in PDB entry 1ydl. It is shown in Fig. 4. The residues belonging to this surface “patch” are listed in Table 2, while Table 3 suggests possible disruptive replacements for these residues (see Section 3.6).

Fig. 4. A possible active surface on the chain 1ydlA.

res  type  substitutions(%)         cvg
13   D     D(96)E(3)                0.01
39   D     D(96)G(3)                0.03
42   H     R(14)H(82)Y(3)           0.04
33   F     Y(10)F(85)A(3)           0.06
7    G     G(92).(3)S(3)            0.07
17   K     K(75)R(14)A(10)          0.09
15   A     S(10)A(71)T(3)P(14)      0.10
14   P     P(85)S(3)V(7)I(3)        0.13
24   D     D(71)A(3)N(17)M(3)S(3)   0.14
37   D     D(71)E(10)M(10)R(3)V(3)  0.15
32   K     D(14)K(75)N(7)H(3)       0.17
18   Q     A(10)Q(78)E(7)S(3)       0.18
12   C     C(85)F(3)N(3)S(7)        0.20
34   I     I(78)V(17)E(3)           0.21
9    L     L(82).(3)F(14)           0.23
38   I     L(78)V(3)I(17)           0.24
Table 2. Residues forming surface “patch” in 1ydlA.

res  type  disruptive mutations
13   D     (R)(FWH)(YVCAG)(K)
39   D     (R)(FWH)(K)(Y)
42   H     (E)(MD)(TQ)(VLAPI)
33   F     (K)(E)(Q)(D)
7    G     (KR)(E)(FMWH)(Q)
17   K     (Y)(T)(FW)(SCG)
15   A     (R)(K)(YE)(H)
14   P     (R)(Y)(H)(K)
24   D     (R)(H)(FYW)(K)
37   D     (R)(H)(FYW)(CG)
32   K     (Y)(T)(FW)(VCAG)
18   Q     (Y)(H)(FW)(T)
12   C     (KER)(QMHD)(FW)(Y)
34   I     (YR)(H)(T)(K)
9    L     (R)(Y)(T)(H)
38   I     (YR)(H)(T)(KE)
Table 3. Disruptive mutations for the surface patch in 1ydlA.

3 NOTES ON USING TRACE RESULTS
3.1 Coverage
Trace results are commonly expressed in terms of coverage: a residue is important if its “coverage” is small, that is, if it belongs to some small top percentage of residues [100% is all of the residues in a chain], according to the trace. The ET results are presented in the form of a table, usually limited to the top 25% of residues (or to some nearby percentage), sorted by the strength of the presumed evolutionary pressure. (I.e., the smaller the coverage, the stronger the pressure on the residue.) Starting from the top of that list, mutating a couple of residues should affect the protein somehow, with the exact effects to be determined experimentally.

3.2 Known substitutions
One of the table columns is “substitutions”: other amino acid types seen at the same position in the alignment. These amino acid types may be interchangeable at that position in the protein, so if one wants to affect the protein by a point mutation, they should be avoided. For example, if the substitutions are “RVK” and the original protein has an R at that position, it is advisable to try anything but R, V or K. Conversely, when looking for substitutions that will not affect the protein, one may try replacing R with K or (perhaps more surprisingly) with V. The percentage of times the substitution appears in the alignment is given in the immediately following bracket. No percentage is given in cases where it is smaller than 1%. This is meant to be a rough guide: due to rounding errors these percentages often do not add up to 100%.

[Figure: coverage color scale, with marks at 100%, 50%, 30% and 5%]