Characterisation, Classification and Conformational Variability Of
Total Page:16
File Type:pdf, Size:1020Kb
Characterisation, Classification and Conformational Variability of Organic Enzyme Cofactors Julia D. Fischer European Bioinformatics Institute Clare Hall College University of Cambridge A thesis submitted for the degree of Doctor of Philosophy 11 April 2011 This dissertation is the result of my own work and includes nothing which is the outcome of work done in collaboration except where specifically indicated in the text. This dissertation does not exceed the word limit of 60,000 words. Acknowledgements I would like to thank all the members of the Thornton research group for their constant interest in my work, their continuous willingness to answer my academic questions, and for their company during my time at the EBI. This includes Saumya Kumar, Sergio Martinez Cuesta, Matthias Ziehm, Dr. Daniela Wieser, Dr. Xun Li, Dr. Irene Pa- patheodorou, Dr. Pedro Ballester, Dr. Abdullah Kahraman, Dr. Rafael Najmanovich, Dr. Tjaart de Beer, Dr. Syed Asad Rahman, Dr. Nicholas Furnham, Dr. Roman Laskowski and Dr. Gemma Holli- day. Special thanks to Asad for allowing me to use early development versions of his SMSD software and for help and advice with the KEGG API installation, to Roman for knowing where to find all kinds of data, to Dani for help with R scripts, to Nick for letting me use his E.C. tree program, to Tjaart for python advice and especially to Gemma for her constant advice and feedback on my work in all aspects, in particular the chemistry side. Most importantly, I would like to thank Prof. Janet Thornton for giving me the chance to work on this project, for all the time she spent in meetings with me and reading my work, for sharing her seemingly limitless knowledge and enthusiasm about the fascinating world of enzymes, and for being such an experienced and motivational advisor. I would also like to thank my thesis advisory committee: Prof. Christoph Steinbeck, Prof. John Mitchell and Prof. Victor Lamzin for accompanying my scientific journey, for their con- tinuous critical evaluation of my work, and for their scientific as well as organisational advice. Further, I would like to thank Helen Barker and Stacy Schab who were very helpful in managing my meetings with Janet and my travelling needs. For proofreading, I would like to thank Dr. Steve Wilder, Dr. Gilleain Torrance, Sergio Martinez Cuesta, Dr. Nick Furnham, Dr. Roman Laskowski, Dr. Cajetan Neubauer and of course Janet and Gemma. Most of the proofreading was however done by Dr. Tjaart de Beer who deserves special thanks. My time at the EBI would not have been the same without the com- pany of my fellow predocs. I would like to thank all of them for the welcome distractions and - when necessary - their commiseration at the weekly predoc lunches and at the many social activities. I would like to especially thank Judith Zaugg and Michele Mattioni, with whom I shared the predoc experience from the very beginning. In hindsight I could not imagine my life here without rowing. I would like to thank Dr. Adrienne Leonard for convincing me to learn to row, for quickly realising my potential in the sport, for convincing me to run for women’s captain in Clare Hall Boat Club, and for all her advice after I was elected. Thanks also to my First Ladies Crew in Clare Hall Boat Club, who won blades with me in May Bumps 2009 and 2010. Thanks also go to Dr. Elisa Dassie, without whom Clare Hall rowing would not have been the same. Thank you, Elisa and Adrienne, not only for all the good times on the water, but also at the May Balls. I would further like to thank Nick Acock, who answered all my questions about trialling for the University, which leads me to my time at Cambridge University Women’s Boat Club. I would like to thank Sarah Allen for running the squad in 2009/10 and for convincing me that I have what it takes. Most importantly, I would like to thank my lovely 2010 Blondie crew, especially Dr. Stacey Hickson and Claudia Cattachio, who made this time of extreme training, working and sleeping schedules one of the best times of my life. Finally, I would like to thank Stacey, Rebecca, Sophie and Mat for the hard training in the IV+ and for winning the gold medal at the British University and College Championships Regatta with me! Last, but by no means least, I would like to acknowledge the people who supported me in my personal life. First I would like to thank my parents who have made it possible for me to study here and especially to my mum who has always supported me in my professional choices, even if it meant that I’d be living in another country. Finally, I would like to thank Tjaart for his continuous support, especially in these last months before my submission. Abstract Enzymes are the cell’s executive molecules that catalyse the chemical reactions required for an organism to function. Some enzymes require cofactors (metal ions or organic small molecules) for catalysis. Ap- proximately half of all enzyme reactions crucially depend on organic enzyme cofactors. In order to better understand the chemistry of life, it is important to understand the properties, evolutionary context and functional roles of organic enzyme cofactors. The aim of this work is to investigate this important group of com- pounds. Which molecules are organic enzyme cofactors? What is their molecular function and enzymatic mechanism? What are their physi- cochemical properties? How are they different from other metabo- lites in the cell? Are there intrinsic groupings among them? Which enzyme reactions use them? How variable is their conformation in crystallographic protein structures and in solution? How does their conformation vary among homologous proteins? These questions have been investigated using computational methods. The data were extracted manually from the scientific literature and automatically from web resources and has been stored in a database. Statistical methods including uni- and multivariate data analysis have been applied to analyse one-dimensional descriptors, two-dimensional structures and functional roles of the cofactor molecules, while the conformational analysis further uses three-dimensional superposition. The results show that organic enzyme cofactors are slightly larger and more polar than the other metabolites although their physicochemi- cal properties sample all of metabolite space. Intrinsic groupings can be identified based on descriptors, two-dimensional structures and molecular functions. Some organic cofactors share common building blocks, which complete and partially complement the functional pro- file of catalytic amino acids and metal ions. The conformational vari- ability of a cofactor depends on its group membership and can vary substantially from its conformational variability in solution. Contents Nomenclature xxii 1 Introduction 1 1.1 Motivation............................... 1 1.2 Proteinsandenzymes. 2 1.2.1 Introductiontoproteins . 2 1.2.2 Introduction to enzymes . 4 1.3 Cofactors ............................... 5 1.3.1 Cofactor definition for this thesis . 5 1.3.2 Ambiguouscases .. .. 7 1.4 Existingwork ............................. 19 1.4.1 Biocatalysis, enzyme design and evolution of cofactors .. 19 1.4.1.1 Biocatalysis . 19 1.4.1.2 Recent progress in biocatalysis and enzyme design 20 1.4.1.3 Evolutionofcofactors . 22 1.4.2 Existingdataresources . 23 1.4.2.1 Compoundresources . 23 1.4.2.2 Enzyme resources . 25 1.5 Structural data from X-ray crystallography . 26 1.5.1 The protein folding problem . 26 1.5.2 X-ray crystallography: method summary . 28 1.5.2.1 Quality and resolution . 29 1.5.2.2 Implications for derived data . 31 1.5.3 Other protein structure determination methods . 32 1.6 Researchobjectives . 32 vii CONTENTS 2 Methods 33 2.1 Compilation of the data in the CoFactor database . 33 2.1.1 E.C. numbers of enzyme reactions using cofactors . 36 2.1.2 E.C. wheels and E.C. trees . 37 2.1.3 Three-dimensional conformational variability of each cofac- torinthePDB ........................ 38 2.1.4 Measuring the variation in solvent accessibility of each co- factorinthePDB....................... 40 2.1.5 Stepwise mechanisms . 40 2.2 Data analysis methods for descriptor variables and two-dimensional chemicalstructures . 41 2.2.1 Compilation of the cofactor and metabolite data sets for the physicochemical properties analysis . 41 2.2.1.1 Thecofactordataset. 41 2.2.1.2 Themetabolitedataset . 41 2.2.2 Computing physicochemical descriptors of organic compounds 42 2.2.2.1 Normalisation of organic molecules . 42 2.2.2.2 Choice of molecular descriptor variables . 42 2.2.3 The t-test ........................... 44 2.2.4 Structure-based chemical similarity . 44 2.2.5 Principal component analysis . 45 2.2.5.1 Variable requirements and limitations of PCA . 45 2.2.5.2 Boxplots for outlier detection . 47 2.2.5.3 PCA: method summary . 48 2.2.5.4 PCA: biplot and individuals plot . 49 2.2.5.5 PCA: correlation circle . 50 2.2.6 Clustering ........................... 50 2.3 Data analysis methods for three-dimensional chemical structures . 52 2.3.1 Rootmeansquaredeviation . 52 2.3.2 Conformation generation with OMEGA. 53 2.3.3 Superpositionmethod . 55 viii CONTENTS 3 The CoFactor database 56 3.1 Datacontent ............................. 58 3.1.1 Overviewpage ........................ 59 3.1.2 Mechanismpage ....................... 61 3.1.3 Enzymes and domains page . 63 3.1.4 Compound page: conformation and solvent accessibility . 70 3.2 TheCoFactordataset . .. .. 70 3.2.1 Thiamine diphosphate . 71 3.2.2 Flavin adenine dinucleotide . 72 3.2.3 Flavin mononucleotide . 74 3.2.4 Nicotinamide-adenine dinucleotide . 75 3.2.5 Phosphopantetheine . 77 3.2.6 CoenzymeA.......................... 78 3.2.7 Pyridoxal5’-phosphate . 78 3.2.8 Glutathione .......................... 80 3.2.9 Biotin ............................