The Geometry and Evolution of Catalytic Sites and Metal Binding Sites
Total Page:16
File Type:pdf, Size:1020Kb
The geometry and evolution of catalytic sites and metal binding sites. James William Torrance Robinson College, Cambridge European Bioinformatics Institute This dissertation is submitted for the degree of Doctor of Philosophy. March 2008 Preface This dissertation is the result of my own work and includes nothing which is the outcome of work done in collaboration except where specifically indicated in the text. The length of this dissertation does not exceed the limit specified by the Graduate School of Biological, Medical and Veterinary Sciences. 2 Abstract Analysing the geometry of functional sites in proteins can shed light on the evolution of these functional sites, explore the relationship between active site geometry and chemistry, and work towards methods for predicting protein function from structure. This thesis describes the analysis of manually annotated datasets of catalytic residues, biologically relevant metal binding sites, and catalytic mechanisms. The principal source of data for this thesis was the Catalytic Site Atlas, a database of catalytic sites in proteins of known structure. The author has supervised an expansion of the coverage and level of detail of this database. The expanded database has been analysed to discover trends in the catalytic roles played by residues and cofactors. A comparison of the structures of catalytic sites in homologous enzymes showed that these mostly differ by less than 1 A˚ root mean square deviation, even when the sequence similarity between the proteins is low. As a consequence of this structural conservation, structural templates representing catalytic sites have the potential to succeed at function prediction in cases where methods based on sequence or overall structure fail. Templates were found to discriminate between matches to related proteins and random matches with over 85% sensitivity and predictive accuracy. Templates based on protein backbone positions were found to be more discriminating than those based on sidechain atoms. This approach to analysing structural variation can also be applied to other functional sites in proteins, such as metal binding sites. An analysis of a set of of well-documented structural calcium and zinc binding sites found that, like catalytic sites, these are highly conserved between distant relatives. Structural templates representing these conserved calcium and zinc binding sites were used to search the Protein Data Bank for cases where unrelated proteins have converged upon the same residue selection and geometry for metal 3 binding. This allowed the identification of “archetypal” metal binding sites, which had independently evolved on a number of occasions. Relatives of these metal binding proteins sometimes do not bind metal. For most of the calcium binding sites studied, the lack of metal binding in relatives was due to point mutation of the metal-binding residues, whilst for zinc binding sites, lack of metal binding in relatives always involved more extensive changes. As a complement to the analysis of overall structural variation in catalytic sites de- scribed above, statistics were gathered describing the typical distances and angles of indi- vidual catalytic residues with regard to the substrate and one another. The geometry of residues whose function involves the transfer or sharing of hydrogens was found to closely resemble the geometry of non-catalytic hydrogen bonds. 4 Acknowledgements First of all, thanks are due to my supervisor Janet Thornton. She has kept me focused on the big picture, the positive side, and the schedule. Without her advice and encourage- ment, this thesis would have been a big pile of blank paper sitting inside a printer. I also thank my co-supervisor in the Chemistry Department, John Mitchell, who was always happy to have as much or as little involvement as was necessary at different stages, and who supplied an important chemical perspective. Many people have passed through the Thornton group over the last four years, and all of them have provided some combination of technical advice and/or moral support; those who are named here are just the first among those many. Craig Porter, Gail Bartlett and Alex Gutteridge introduced me to the Catalytic Site Atlas. Jonathan Barker provided assistance with his template matching program Jess, as well as amusement through his mechanical ingenuity and artistic talents. Malcolm MacArthur furnished me with his dataset of metal binding sites, and acted as a patient guide to the world of metallopro- teins. Gemma Holliday explained the workings of the MACiE database, and endured my questions on chemical topics. All of the above, along with Gabby Reeves, James Watson and Roman Laskowski, kindly gave up their time to proofread portions of this thesis. I’m also grateful to the various summer students who suffered under my tutelage. I am indebted to other members of the group for non-academic reasons. My various office-mates down the years tolerated my nervous tics, muttering, and attempts to fill the room with houseplants. Matthew Bashton recklessly offered me a room in his flat despite having previously been my office-mate. Gabby and James dragged me to Steve “drill sergeant” Russen’s circuit training sessions, resulting in the Arnold-Schwarzenegger-like physique that I rejoice in today. Tim Massingham bravely protected me from the goths 5 down at the Kambar. Light relief was provided by an assortment of friends, acquaintances, cronies, hangers- on and ne’er-do-wells, including the EBI PhD student mob, the Robinson PhD student mob, the old York University gang from antediluvian times, and the even older Robinson gang from the times before that. Depeche Mode, Iron Maiden and the Sisters of Mercy permitted me to trade away some of my hearing in order to retain some of my sanity. These bands are fairly unlikely to read this thesis, but I suppose it could help them pass the time on a tour bus. Finally, I’d like to thank my parents. Not only have they been a consistent source of entertainment, education, and spirited yet amicable political debate, they have also borne the brunt of my whingeing with extraordinary patience. 6 Contents 1 Introduction 17 1.1 The role and importance of enzymes . 17 1.1.1 Classifying enzymes . 19 1.1.2 Fundamentals of the thermodynamics of enzymatic catalysis . 21 1.2 Functions of catalytic residues . 22 1.2.1 The definition of a catalytic residue . 23 1.2.2 Roles played by catalytic residues . 24 1.3 Experimentally determining catalytic residues and enzyme mechanisms . 30 1.3.1 Non-structural methods . 31 1.3.2 Protein structure as a source of information on enzymes . 33 1.4 Enzyme evolution . 39 1.4.1 How enzyme function changes as protein sequence diverges . 40 1.4.2 Mechanisms of enzyme evolution . 41 1.4.3 Structural evolution of catalytic sites in enzymes of similar function 44 1.5 Using bioinformatics to predict enzyme function and catalytic residues . 47 1.5.1 Predicting function using sequence homology . 47 1.5.2 Predicting function using protein structure to identify homologues . 50 1.5.3 Recognising distant homologues and cases of convergent evolution using template matching methods . 51 1.5.4 Function prediction using protein structure without identifying ho- mologues . 65 1.5.5 Meta-servers for function prediction . 67 1.6 The structure of this thesis . 68 7 CONTENTS 2 The Catalytic Site Atlas 71 2.1 Introduction . 71 2.2 The Catalytic Site Atlas . 72 2.2.1 Types of entry in the CSA . 72 2.2.2 Outline history of the CSA . 73 2.2.3 CSA annotation . 73 2.2.4 Homologous entries . 77 2.3 Analysis of the contents of the CSA . 79 2.3.1 Coverage growth . 79 2.3.2 Independent evolution of function . 82 2.3.3 Versatile catalytic domains . 83 2.3.4 Nonredundant dataset . 85 2.3.5 Total number of residues . 85 2.3.6 Catalytic residue frequency . 88 2.3.7 Catalytic residue propensity . 93 2.3.8 Nonredundant subset of high-annotation entries . 93 2.3.9 Residue functions . 95 2.3.10 Residue targets . 101 2.3.11 Evidence that residues are catalytic . 102 2.4 Discussion . 105 2.4.1 Growth of the CSA . 105 2.4.2 Independent evolution of function, and versatile domains . 107 2.4.3 Roles of residues and cofactors . 108 3 Using structural templates to recognise catalytic sites and explore their evolution 110 3.1 Introduction . 110 3.2 Results . 112 3.2.1 Dataset . 112 3.2.2 Structural variation of catalytic sites . 114 3.2.3 Family analysis . 127 8 CONTENTS 3.2.4 Library analysis . 136 3.3 Discussion . 140 3.3.1 Structural conservation of active sites and the performance of struc- tural templates . 140 3.3.2 Statistical significance measures . 142 3.4 Methods . 143 3.4.1 Non-redundant set of CSA families . 143 3.4.2 Template generation (Figure 3.14 box 1) . 144 3.4.3 Similarity within template families . 145 3.4.4 Non-redundant PDB subset (Figure 3.14 box 4) . 145 3.4.5 Template matching . 147 3.4.6 Statistical significance of template matches . 147 3.4.7 Setting a threshold (Figure 3.14 box 10) . 148 3.4.8 Definition of statistical terms . 149 3.4.9 Analysing the results of the family and library analyses . 149 4 Using structural templates to analyse zinc and calcium binding sites 150 4.1 Introduction . 150 4.2 Results . 153 4.2.1 Dataset . 153 4.2.2 Structural variation of metal binding sites . 156 4.2.3 Water molecule structural variation compared to that of protein sidechains . 159 4.2.4 Structural template matches . 161 4.2.5 Convergent evolution . 167 4.2.6 Metal loss over evolution . 176 4.2.7 Structural basis and functional consequences of metal loss .