Predicting and Characterising Protein-Protein Complexes
Iain Hervé Moal

June 2011

Biomolecular Modelling Laboratory, Cancer Research UK London Research Institute and Department of Biochemistry and Molecular Biology, University College London

A thesis submitted in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Biochemistry at University College London.

I, Iain Hervé Moal, confirm that the work presented in this thesis is my own. Where information has been derived from other sources, I confirm that this has been indicated in the thesis.

Abstract

Macromolecular interactions play a key role in all life processes. The construction and annotation of protein interaction networks is pivotal for the understanding of these processes, and how their perturbation leads to disease. However, the extent of the human interactome and the limitations of the experimental techniques which can be brought to bear upon it necessitate theoretical approaches. Presented here are computational investigations into the interactions between biological macromolecules, focusing on the structural prediction of interactions, docking, and their kinetic and thermodynamic characterisation via empirical functions. Firstly, the use of normal modes in docking is investigated. Vibrational analyses of proteins are shown to indicate the motions which proteins are intrinsically disposed to undertake, and the use of this information to model flexible deformations upon protein-protein binding is evaluated. Subsequently SwarmDock, a docking algorithm which models flexibility as a linear combination of normal modes, is presented and benchmarked on a wide variety of test cases. This algorithm utilises state-of-the-art energy functions and metaheuristics to navigate the free energy landscape.
Information derived from Langevin dynamics simulations of encounter complex formation in the crowded cytosolic environment can be incorporated into SwarmDock and enhances its performance. Finally, a benchmark of binding free energies derived from the literature is presented. For this benchmark, a large number of molecular descriptors are derived. Machine learning methods are then applied to these in order to derive empirical binding free energy, association rate and dissociation rate functions which take account of the conformational changes which occur upon complexation.

Acknowledgements

It would be impossible for me to thank all the friends, family, colleagues and teachers who have inspired and educated me, supported and enriched me. To those I have omitted, I apologise. I would like to thank my examiners Prof. Michael Sternberg and Dr. Andrew Martin for agreeing to review this thesis and allowing me to defend it viva voce. Funding for this work was provided by Cancer Research UK, and I would like to extend my gratitude to the charity and its supporters. In particular, I would like to thank my thesis committee, Prof. Neil McDonald and Dr. Helen Walden. I would also like to thank the various members of the Biomolecular Modelling Laboratory, without any of whom my time at the London Research Institute would have been duller. The indomitable scientific officer, Raphaël Chaleil, for all the help and conversations, and the wizardly ability to compile even the most multifarious code. Xiaofan Li, a solid wall for bouncing ideas off, for the good company and friendship, and the productive collaboration. Marcin Król, for donating code and helpful discussions during the preliminary design of SwarmDock. Alexander Tournier and Özge Kürkçüoğlu for reading manuscripts and providing constructive feedback.
Katie Bentley, Marc Offman, Yanlan Mao, Tammy Cheng, Mieczysław Torchała, Melda Tozluoğlu, Rudi Agius and Torbörn Klatt, for all the spirited discussions, entertainment and food for thought, both at work and outside. I would also like to thank the support staff at the London Research Institute, including but not limited to Sally Leevers, Erin Fortin, Emma Rainbow and Sabina Ebbols.

My greatest thanks, however, must go to Paul Bates, for placing his faith in me, taking me under his wing, allowing me the liberty to follow my thoughts and, above all, for sharing his knowledge. For all of this, I am indebted.

I would like to extend my gratitude to the various collaborators I have had the pleasure of working with: Prof. Xiaodong Zhang at Imperial College London, Prof. Joël Janin at Université Paris-Sud, Prof. Alexandre Bonvin and Panagiotis Kastritis at Universiteit Utrecht, and Prof. Zhiping Weng and Dr. Howook Hwang at the University of Massachusetts.

I would also like to thank those without whose imparted wisdom my last four years would never have happened. The Chemistry faculty of the University of Nottingham, who elevated interest to fascination. Particularly, Prof. Jonathan Hirst, under whose tutelage the field of molecular modelling was revealed in diorama, and Dr. Richard Wheatley, for supervising my Master's project. I would also like to thank the staff at The Henley College and Burnham Grammar School. Particularly Dr. Hywel Thomas, Dr. Branfield, Dr. Judge and Ms. Lilly, all of whose clarity illuminated their respective disciplines.

Last, but by no means least, I would like to thank my parents, François and Geraldine, for their love, support and stoicism. Without the curiosity they nurtured and indulged, and their encouragement, I would only be a fraction of myself, and I aspire to some day become a fraction of the parent they have been to me.
This work is dedicated to Ciara - More than six billion people on the surface of a sphere, how lucky I am to have found you.

Contents

Abstract
Acknowledgements
Contents
List of Figures
List of Tables
List of Abbreviations
Peer-reviewed publications

1 Introduction
  1.1 Lucretius Vindicated: Of Atoms, Interactions, Life and Disease
  1.2 Outline of the thesis
  1.3 A Thesis Justified
    1.3.1 Systems From the Ground Up
      1.3.1.1 Cellular Logic
      1.3.1.2 The Circuitry of Life
      1.3.1.3 Aspirations, Tribulations and Computations
    1.3.2 More Immediate Applications
      1.3.2.1 Protein-Protein Interactions as Drug Targets
      1.3.2.2 Other Applications
  1.4 The Physical Basis of Reality
    1.4.1 Quantum and Molecular Mechanics
      1.4.1.1 The Schrödinger Equation
      1.4.1.2 Energy Landscapes
      1.4.1.3 Force Field Construction
    1.4.2 Dynamics
      1.4.2.1 Newton’s Laws
      1.4.2.2 The Simple Harmonic Oscillator
      1.4.2.3 Normal Mode Analysis
    1.4.3 Interaction Kinetics
    1.4.4 Interaction Thermodynamics
      1.4.4.1 The Second Law: A Classical Perspective
      1.4.4.2 The Second Law: A Statistical Perspective
      1.4.4.3 The Gibbs Free Energy
      1.4.4.4 Statistical Potentials
      1.4.4.5 The Free Energy Landscape
      1.4.4.6 Thermodynamic Cycles
      1.4.4.7 From Free Energy to Binding Affinity
      1.4.4.8 Binding Affinity Prediction
  1.5 Protein-Protein Docking
    1.5.1 A General Overview
    1.5.2 Rigid-body Docking
    1.5.3 The Correlation Method
    1.5.4 Surface Matching
    1.5.5 Guided Search
    1.5.6 Accounting for Flexibility
    1.5.7 Re-Ranking and Clustering
    1.5.8 Incorporating Experimental Data
    1.5.9 High-Throughput Docking
    1.5.10 The CAPRI Experiment

2 Normal Mode Analysis and Conformational Transitions
  2.1 Introduction
  2.2 Methods
    2.2.1 Normal Mode Analysis
      2.2.1.1 The Elastic Network Model
      2.2.1.2 Rotation-Translation-of-Blocks
    2.2.2 Overlap
    2.2.3 Modes in Linear Combination
    2.2.4 Data Set
  2.3 Results
    2.3.1 Atomistic and Coarse ENM
    2.3.2 Single Modes
      2.3.2.1 Rotations-Translation-in-Blocks Method
    2.3.3 Modes in Combination
  2.4 Discussion

3 SwarmDock
  3.1 Introduction
  3.2 Methods
    3.2.1 Search Space
    3.2.2 SwarmDock: An Overview
    3.2.3 Initialisation
    3.2.4 Particle Swarm Optimisation
      3.2.4.1 Neighbourhoods
      3.2.4.2 Velocity Clamping
      3.2.4.3 Variations of PSO
    3.2.5 Local Search
    3.2.6 Energy Function
      3.2.6.1 Van der Waals and Electrostatics
      3.2.6.2 EEF1 Desolvation
      3.2.6.3 DComplex
    3.2.7 Clustering
  3.3 Results
    3.3.1 Parameterisation
      3.3.1.1 PSO Variant Selection
      3.3.1.2 Inertial Weight and Velocity Limits
    3.3.2 Bound-Bound Benchmark v2.0
    3.3.3 The EEF1 Desolvation Term