Neetika Nath Phd Thesis
Total Page:16
File Type:pdf, Size:1020Kb
QUANTITATIVE AND EVOLUTIONARY GLOBAL ANALYSIS OF ENZYME REACTION MECHANISMS Neetika Nath A Thesis Submitted for the Degree of PhD at the University of St Andrews 2015 Full metadata for this item is available in Research@StAndrews:FullText at: http://research-repository.st-andrews.ac.uk/ Please use this identifier to cite or link to this item: http://hdl.handle.net/10023/6899 This item is protected by original copyright PhD Thesis Quantitative and Evolutionary Global Analysis of Enzyme Reaction Mechanisms Author: Supervisor: Neetika Nath Dr. John BO Mitchell This thesis presented for the degree of Doctor of Philosophy School of Chemistry University of St Andrews United Kingdom Friday 17th March, 2015 Declaration I, Neetika Nath, declare that this thesis titled, ‘Quantitative and Evolution- ary Global Analysis of Enzyme Reaction Mechanisms’ and the work pre- sented in it are my own. I confirm that: This work was done wholly or mainly while in candidature for a re- search degree at this University. Where any part of this thesis has previously been submitted for a degree or any other qualification at this University or any other insti- tution, this has been clearly stated. Where I have consulted the published work of others, this is always clearly attributed. Where I have quoted from the work of others, the source is always given. Except such quotations, this thesis is entirely my own work. I have acknowledged all main sources of help. Where the thesis is based on work done by myself jointly with others, I have made clear exactly what was done by others and what I have contributed myself. Signed: Date: i Supervisor’s Declaration I hereby certify that the candidate has fulfilled the conditions of the Reso- lution and Regulations appropriate for the degree of PhD in the University of St Andrews and that the candidate is qualified to submit this thesis in application for that degree. Signed: Date: ii Supporting Statement In submitting this thesis to the University of St Andrews I understand that I am giving permission for it to be made available for use in accordance with the regulations of the University Library for the time being in force, subject to any copyright vested in the work not being affected thereby. I also understand that the title and the abstract will be published, and that a copy of the work may be made and supplied to any bona fide library or research worker, that my thesis will be electronically accessible for personal or research use unless exempt by award of an embargo as requested below, and that the library has the right to migrate my thesis into new electronic forms as required to ensure continued access to the thesis. I have obtained any third-party copyright permissions that may be required in order to allow such access and migration, or have requested the appropriate embargo below. The following is an agreed request by candidate and supervisor regarding the publication of this thesis: PRINTED COPY a No embargo on print copy b Embargo on all or part of print copy for a period of . years (maxi- mum five) on the following ground(s): – Publication would be commercially damaging to the researcher, or to the supervisor, or the University – Publication would preclude future publication – Publication would be in breach of laws or ethics c Permanent or longer term embargo on all or part of print copy for a period of 1 years (the request will be referred to the Pro-Provost and permission will be granted only in exceptional circumstances). ELECTRONIC COPY a No embargo on electronic copy b Embargo on all or part of electronic copy for a period of . years (maximum five) on the following ground(s): iii – Publication would be commercially damaging to the researcher, or to the supervisor, or the University – Publication would preclude future publication – Publication would be in breach of law or ethics c Permanent or longer term embargo on all or part of electronic copy for a period of 1 years (the request will be referred to the Pro-Provost and permission will be granted only in exceptional circumstances). Supporting statement for electronic embargo request: Signature of Candidate: Signature of Supervisor : Date: “Statistics is the grammar of science.” Karl Pearson Contents Acknowledgements x Publications xii Abstract xiii List of Figures xv List of Tables xvi 1 Introduction 1 1.1 Motivation . .1 1.2 Contributions . .3 1.3 Thesis Structure . .5 2 Enzymes, Function, and Bioinformatics 7 2.1 Enzyme Function or Catalytic Tool Kit . 10 2.1.1 Enzyme Catalytic Residues . 11 2.1.2 Enzyme Structure-Cofactor Relationships . 12 2.1.3 Catalytic Mechanisms . 13 2.2 Evolution of Enzyme Function . 14 2.2.1 More Definitions Suggesting Evolution Strategy . 15 2.2.2 Biostatistics to Study Evolution . 16 2.3 Bioinformatics . 17 2.3.1 In lieu of Wet Experiments . 17 2.3.2 Functional Prediction by Database Search . 18 2.3.3 Functional Prediction by Biostatistical Methods . 20 2.3.4 Enzyme Catalytic Tool Kit . 23 2.4 Applications . 25 vi 2.4.1 Enzyme Engineering . 25 2.4.2 Drug Discovery . 26 2.4.3 Clinical Implications . 26 2.5 Summary . 27 3 Data and Databases 29 3.1 Enzyme Mechanism Reactions Database . 30 3.1.1 MACiE Database . 31 3.1.2 Components of MACiE . 32 3.1.3 Catalytic Information Available in MACiE . 32 3.1.4 Definitions of Enzyme Function . 35 3.2 Metal-MACiE . 37 3.3 Molecular Ancestry Network (MANET) . 38 3.4 Challenges and Limitations in Bioinformatics . 39 3.4.1 Challenges Faced in Bioinformatics . 39 3.4.2 Caveats . 40 3.5 Summary . 41 4 Quantitative Global Analysis of Enzyme Reaction Mecha- nisms 42 4.1 Previous Studies of Cluster Analysis . 43 4.2 Data . 43 4.3 Motivation . 44 4.4 Brief Introduction of Various Clustering Algorithms . 44 4.4.1 Hierarchical Clustering Method . 45 4.4.2 Partitional Clustering Method . 45 4.4.3 Density-Based Clustering . 47 4.4.4 PFClust: Parameter Free Clustering . 49 4.5 Evaluating Clustering Solutions . 52 4.6 Result . 55 4.6.1 Part A . 57 4.6.2 Part B . 58 4.6.3 Propensities of EC Classes to Cluster Together . 61 4.7 Two Case Studies . 64 4.8 Summary . 67 5 Prediction of Enzymatic Function 68 5.1 Introduction . 68 5.2 Introduction: Enzyme Function Prediction . 69 5.2.1 Using Sequence . 69 5.2.2 Using Structure . 69 5.2.3 Using Overall Chemical Transformation . 70 5.3 Method . 70 5.3.1 Data Culling . 71 5.3.2 Step 1: Data Preparation . 72 5.3.3 Step 2: Internal and External N-Fold Cross-Validation 72 5.3.4 Machine Learning Methods . 75 5.3.5 Step 3: Validation . 82 5.4 Results . 84 5.4.1 Classification: Enzyme Function Prediction . 84 5.4.2 Regression Analysis . 87 5.5 Summary . 88 6 Enzyme Function Evolution: Chemolution Study 90 6.1 Background . 90 6.2 Method . 91 6.2.1 Data Culling . 91 6.2.2 Phylogenetic Analysis: . 92 6.3 Findings . 93 6.3.1 A General Approach Grounded in Protein Domain Structure . 93 6.3.2 Historical Trends Unfold a Natural History of Biocat- alytic Mechanisms . 94 6.3.3 Ancient H-level Structures are Popular, Central and Versatile . 97 6.3.4 Some Structures Hold Exceptionally Diverse Mecha- nistic Step Types . 101 6.3.5 The Combinatorics of Mechanistic Steps Reveals Win- ners . 102 6.4 Conclusion . 107 6.5 Summary . 108 7 Conclusion and Discussion 110 7.1 Global Analysis of Enzyme Reaction Mechanisms . 110 7.1.1 PFClust: Results and Discussion . 111 7.1.2 Results From Mechanistic Annotation . 111 7.2 Enzyme Function Prediction . 112 7.3 Application of Machine Learning Method . 116 7.4 History of Biocatalytic Mechanisms . 116 7.5 Future Work . 118 7.6 Summary . 119 A Data and Tables 121 A.1 Results from Quantitative Global Analysis of Enzyme Reac- tion Mechanisms . 121 A.2 Table for Chapter 6: Enzyme Function Evolution: Chemolu- tion Study ............................. 136 B Data and R Code 139 Acknowledgements First and foremost, I am deeply indebted to my promoter and supervisor Dr. John BO Mitchell, for his guidance and contribution to this thesis. His guidance, advice and flexibility, stimulating suggestions and encouragement helped me during all ups and downs I faced in my research. I wish to express my deep sense of gratitude to Prof. Gustavo Caetano- Anollés, Department of Crop Science, University of Illinois at Urbana - Champaign, USA, for having fruitful discussions, sharing knowledge on en- zyme function evolution, granting access to his lab for a scientific visit and in research collaboration with some parts of this thesis. It is my pleasure to thank many collaborators and supporters from many parts of the world: Prof. Gustavo Caetano-Anollés (University of Illinois, United States), Minglei Wang (University of Illinois, United States), Syed Abbas Bukhari (University of Illinois, United States), Dr. Tanja van Mourik (University of St Andrews, United Kingdom), James L. McDonagh (Univer- sity of St Andrews, United Kingdom), Dr. Lazaros Mavridis (Queen Mary, University of London, United Kingdom), Dr. Luna De Ferrari (Compu- tational Systems Biology group at CISA, United Kingdom) and PD Dr. Reinhard Guthke (Hans Knöll Institute (HKI), Germany). I would like to thank my colleagues Lazaros Mavridis, Luna De Ferrari, Rosanna Alderson, and James McDonagh for providing valuable informa- tion to build a strong background in enzyme function evolution and machine learning model. Additionally, I would like to thank other colleagues includ- ing Dr. Ludovic Castro, Luke Crawford, Leo Holroyd, Rachael Skyner, Ava Sih-Yu Chen, and Jose Garrido Torres for assisting in many ways during my research work.