New Approaches to Protein Docking
Total Page:16
File Type:pdf, Size:1020Kb
New approaches to protein docking Dissertation zur Erlangung des Grades Doktor der Ingenieurwissenschaften (Dr.-Ing.) der Naturwissenschaftlich-Technischen Fakult¨at I der Universit¨at des Saarlandes von Oliver Kohlbacher Saarbr¨ucken 12. Januar 2001 Datum des Kolloquiums: 12. Januar 2000 Dekan der technischen Fakult¨at: Professor Dr. Rainer Schulze-Pillot-Ziemen Gutachter: Professor Dr. Hans-Peter Lenhof, Universit¨at des Saarlandes, Saarbr¨ucken Professor Dr. Kurt Mehlhorn, MPI f¨ur Informatik, Saarbr¨ucken 2 Á ØÒ Ø ÑÓ×Ø Ü ÔÙØÖ Ö× ÒÓÛ × ÔÖØÐÝ Ò ÖÓ ¸ Ò ÔÖØÐÝ Ò ØÓ Ó Ñ×ØÖݺ º º º ÓÐÓÝ × ×Ó Øи Ò ÙØ Ù×Ùк Ì ØÖÓÙÐ ÛØ ÓÐÓÝ × Øظ ÝÓÙ Ú ØÓ ÛÓÖ × ÓÐÓ×ظ Ø³× ÓÖÒº ÓÙÖ ÜÔ ÖÑÒØ× Ø ÝÓÙ ØÖ ÝÖ× Ò ØÒ¸ ÓÒ Òظ Ø Ý Ó × Ó« Ò ÐÐ Ø ØÒ× ÓÙ ×ØÖØ ÓÚÖº ÁÒ Û ÓÙÖ ÓÛÒ ÛÓÖÐ׺ ÓÐÓ×Ø× ×ÖÚ ÐÓØ Ó ÓÖ Ò Ð ØÓ ×ÐÙ Ø ØÖÓÙº – Donald Knuth Acknowledgements The work on this thesis was carried out during the years 1996–2000 at the Max-Planck- Institut f¨ur Informatik in the group of Prof. Dr. Kurt Mehlhorn under the supervision of Prof. Dr. Hans-Peter Lenhof. Prof. Dr. Hans-Peter Lenhof kindled my interest in Bioinformatics and gave me the freedom to do research in those areas that fascinated me most. Our discussions, although sometimes heated, were always fruitful and forced me to get to the very bottom of many problems. The implementation of BALL is unthinkable without the help of all the people who con- tributed code and ideas. Nicolas Boghossian had significant impact on the design of the library core and contributed lots of code for the kernel. Heiko Klein implemented the largest part of the visualization. Dr. Peter M¨uller brought in his experience in the field of molecular dynamics simulations. Andreas Burchardt implemented parts of the NMR code. Upon Andreas Moll fell the ungrateful work of testing and debugging. Stefan Strobel implemented the difficult part of molecular surface calculations. Andreas Hildebrandt implemented the NMR visualization. The more sophisticated parts of the solvation code were implemented by Andreas Kerzmann, who also set up the BALL web server. Last, but not least, Prof. Dr. Hans-Peter Lenhof not only contributed code for structure mapping and force field calculations, but also initiated the whole project and kept everyone motivated. Ernst Althaus implemented the branch-&-cut-algorithm and was coauthor of the paper on flexible docking. He had the patience to explain to me some of the finer details of polyhedral theory. The Rechnerbetriebsgruppe had to suffer from this work as well. J¨org Herrmann, Wolfram Wagner, Bernd F¨arber, Uwe Brahm, Thomas Hirtz, and Roland Berberich solved numerous of my hardware- and software-related problems even at night and during weekends. Christoph Clodo managed to track down and resolve several errors in the LATEX code of this thesis. I am also grateful to my “WG” (Holger, Michael, Christian) for nearly six years of enjoy- able coexistence and for the flat, the evenings, and quite some bottles of wine we shared. Last, but certainly not least, I wish to thank Andreas for his patience, his understanding, and his support while I wrote this thesis. Abstract In the first part of this work, we propose new methods for protein docking. First, we present two ap- proaches to protein docking with flexible side chains. The first approach is a fast greedy heuristic, while the second is a branch-&-cut algorithm that yields optimal solutions. For a test set of protease-inhibitor complexes, both approaches correctly predict the true complex structure. Another problem in protein docking is the prediction of the binding free energy, which is the the final step of many protein docking algorithms. Therefore, we propose a new approach that avoids the expensive and difficult calculation of the binding free energy and, instead, employs a scoring function that is based on the similarity of the proton nuclear magnetic resonance spectra of the tentative complexes with the experimental spectrum. Using this method, we could even predict the structure of a very difficult protein-peptide complex that could not be solved using any energy-based scoring functions. The second part of this work presents BALL (Biochemical ALgorithms Library), a framework for Rapid Application Development in the field of Molecular Modeling. BALL provides an extensive set of data structures as well as classes for Molecular Mechanics, advanced solvation methods, comparison and analysis of protein structures, file import/export, NMR shift prediction, and visualization. BALL has been carefully designed to be robust, easy to use, and open to extensions. Especially its extensibility, which results from an object-oriented and generic programming approach, distinguishes it from other software packages. Kurzzusammenfassung Der erste Teil dieser Arbeit besch¨aftigt sich mit neuen Ans¨atzen zum Proteindocking. Zun¨achst stellen wir zwei Ans¨atze zum Proteindocking mit flexiblen Seitenketten vor. Der erste Ansatz beruht auf einer schnellen, gierigen Heuristik, w¨ahrend der zweite Ansatz auf branch-&-cut-Techniken beruht und das Problem optimal l¨osen kann. Beide Ans¨atze sind in der Lage die korrekte Komplexstruktur f¨ur einen Satz von Testbeispielen (bestehend aus Protease-Inhibitor-Komplexen) vorherzusagen. Ein weiteres, gr¨osstenteils ungel¨ostes, Problem ist der letzte Schritt vieler Protein-Docking-Algorithmen, die Vorher- sage der freien Bindungsenthalpie. Daher schlagen wir eine neue Methode vor, die die schwierige und aufw¨andige Berechnung der freien Bindungsenthalpie vermeidet. Statt dessen wird eine Bewertungs- funktion eingesetzt, die auf der Ahnlichkeit¨ der Protonen-Kernresonanzspektren der potentiellen Kom- plexstrukturen mit dem experimentellen Spektrum beruht. Mit dieser Methode konnten wir sogar die korrekte Struktur eines Protein-Peptid-Komplexes vorhersagen, an dessen Vorhersage energiebasierte Bewertungsfunktionen scheitern. Der zweite Teil der Arbeit stellt BALL (Biochemical ALgorithms Library) vor, ein Rahmenwerk zur schnellen Anwendungsentwicklung im Bereich Molecular Modeling. BALL stellt eine Vielzahl von Datenstrukturen und Algorithmen f¨ur die Felder Molek¨ulmechanik, Vergleich und Analyse von Protein- strukturen, Datei-Import und -Export, NMR-Shiftvorhersage und Visualisierung zur Verf¨ugung. Beim Entwurf von BALL wurde auf Robustheit, einfache Benutzbarkeit und Erweiterbarkeit Wert gelegt. Von existierenden Software-Paketen hebt es sich vor allem durch seine Erweiterbarkeit ab, die auf der konsequenten Anwendung von objektorientierter und generischer Programmierung beruht. Contents Part I – Introduction 1 Part II – Protein Docking 7 1 Biochemistry – the Basics 9 1.1 AtomsandMolecules............................... 9 1.2 AminoAcids ................................... 10 1.3 Proteins ...................................... 10 1.4 NucleicAcids................................... 13 1.5 InteratomicForces ............................... 14 1.5.1 Nonbonded interactions . 14 1.5.2 MolecularMechanics. 16 2 Introduction 19 2.1 RigidBodyDocking ............................... 19 2.2 Docking and Protein Flexibility . ...... 21 2.3 Combining NMR Data and Docking Algorithms . ..... 23 3 Semi-Flexible Docking 25 3.1 Introduction.................................... 25 3.2 TheDockingAlgorithm. 28 3.2.1 RigidDocking .............................. 28 3.2.2 SideChainDemangling . 28 3.2.3 TheMulti-GreedyMethod . 29 3.2.4 The Branch-&-Cut Algorithm . 30 3.2.5 Energetic Evaluation . 38 3.3 ExperimentalResults . 40 4 Protein Docking and NMR 45 4.1 Nuclear Magnetic Resonance Spectroscopy . ....... 45 4.1.1 The Nuclear Angular Momentum . 45 4.1.2 Electronic Shielding and the Chemical Shift . ....... 47 4.1.3 TheBasicNMRExperiment . 48 4.2 Application to the Protein Docking Problem . ........ 50 4.2.1 PreviousWork .............................. 50 4.2.2 NMRShiftPrediction .... ..... ...... ..... ...... 50 4.2.3 Spectrum Synthesis and Comparison . 53 4.3 ExperimentalResults . 55 4.3.1 Methods ................................. 55 4.3.1.1 Preparation of Structures and Rigid Body Docking . 55 4.3.1.2 NMR Chemical Shift Calculation . 55 4.3.1.3 Spectrum Synthesis and Comparison . 56 4.3.2 Results .................................. 57 5 Discussion 61 Part III – BALL 65 6 Design and Implementation 67 6.1 Introduction.................................... 67 6.2 DesignGoals ................................... 68 6.2.1 EaseofUse................................ 68 6.2.2 Functionality ............................... 69 6.2.3 Openness ................................. 69 6.2.4 Robustness ................................ 69 6.3 Choice of Programming Language . 70 6.4 Architecture.................................... 70 6.5 TheFoundationClasses. 71 6.5.1 GlobalDefinitions ............................ 71 6.5.2 CompositeClass ............................. 72 6.5.3 ObjectPersistence ............................ 73 6.5.4 Run-Time Type Identification . 78 6.5.5 Iterators.................................. 78 6.5.6 Processors ................................ 79 6.5.7 Options .................................. 80 6.5.8 LoggingFacility ............................. 81 6.5.9 Strings and Related Classes . 82 6.5.10 Mathematics ............................... 82 6.5.11 Miscellaneous .............................. 82 6.6 TheKernel .................................... 83 6.6.1 Molecular Data Structures . 83 6.6.2 Iterators.................................. 85 6.6.3 Selection ................................. 85 6.7 TheBasicComponents .............................. 87 6.7.1 FileImport/Export