Free Energy Methods Involving Quantum Physics, Path Integrals, and Virtual Screenings
Development, Implementation and Application in Drug Discovery

Dissertation for the attainment of the degree of Doctor of Natural Sciences (Dr. rer. nat.), submitted to the Department of Mathematics and Computer Science of the Freie Universität Berlin

presented by Christoph Gorgulla

Berlin 2018

This thesis was created within the graduate programs of the Berlin Mathematical School (BMS) and the International Max Planck Research School for Computational Biology and Scientific Computing (IMPRS-CBSC) of the Max Planck Institute for Molecular Genetics (MPIMG), as well as within the Department of Mathematics and Computer Science and the Department of Physics of the Freie Universität Berlin.

First reviewer (supervisor): Prof. Dr. Christof Schütte, Freie Universität Berlin, Fachbereich Mathematik und Informatik, Arnimallee 6, 14195 Berlin, Germany

Second reviewer: Priv. Doz. Dr. Konstantin Fackeldey, Technische Universität Berlin, Institut für Mathematik, Straße des 17. Juni 136, 10623 Berlin, Germany

Date of defense: 3 May 2018

Copyright © 2018 by Christoph Gorgulla

To my grandmother, and all the kind, wise, and caring people in the world.

Abstract

Computational science has the potential to solve most of the problems which pharmaceutical research is facing these days. In this field the most pivotal property is arguably the free energy of binding. Yet methods to predict this quantity with sufficient accuracy, reliability and efficiency remain elusive, and are thus not yet able to replace experimental determinations; achieving this remains one of the unattained holy grails of computer-aided drug design (CADD). The situation is similar for methods used to identify promising new drug candidates with high binding affinity, which is a closely related endeavor in this field. This thesis focuses on the development of a new free energy method, QSTAR.
It is able to explicitly take into account the quantum nature of atomic nuclei, which has so far not been done in binding free energy simulations of biomolecular systems. This quantum nature can, however, be expected to play a substantial role in such systems, in particular due to the abundance of hydrogen atoms, which possess one of the strongest nuclear delocalizations of all atoms. To take these nuclear quantum effects into account, Feynman's path integral formulation is used and combined in a synergistic way with a novel alchemical transformation scheme. QSTAR also provides the first readily available single topology approach for electronic structure methods (ESMs). Moreover, an extended alchemical scheme for relative binding free energies was developed to address van der Waals endpoint problems. QSTAR and the alchemical schemes were implemented in HyperQ, a new free energy simulation suite which is highly automated and scalable. Most ESMs soon become prohibitively expensive as the size of the system grows, a restriction which can be circumvented by quantum mechanics/molecular mechanics (QM/MM) methods. In order to apply QSTAR together with ESMs to biomolecular systems, an enhanced QM/MM scheme was developed. It is a method for diffusive systems based on restraining potentials, and allows QM regions of customizable shape to be defined while being computationally fast. It was implemented in a novel client for i-PI, and together with HyperQ makes it possible to carry out free energy simulations of biomolecular systems with potentials of very high accuracy. One of the most promising ways to identify new hit compounds in CADD is provided by structure-based virtual screenings (SBVSs), which make use of free energy methods. In this thesis it is argued that the larger the scale of virtual screenings, the higher their success.
In addition, a novel workflow system called Virtual Flow was developed, which allows SBVS-related tasks to be carried out on computer clusters with virtually perfect scaling behavior and no practically relevant bounds on the number of nodes/CPUs. Two versions were implemented, VFLP and VFVS, dedicated to the preparation of large ligand databases and to carrying out the SBVS procedure itself, respectively. As a primary application of the new methods and software, a dedicated drug design project was started involving three regions on the novel target EBP1. These regions are expected to be located on protein-protein interfaces, which are extremely challenging to inhibit. Three multistage SBVSs were carried out, each involving more than 100 million compounds. Subsequent experimental binding assays indicated a remarkably high true hit rate of above 30 %. In subsequent fluorescence microscopy experiments, one selected compound exhibited favorable biological activities in cancer cells. Other applied projects included computational hit and lead discovery for several other types of anti-cancer drugs, anti-Herpes medications, as well as antibacterials.

Acknowledgements

"Each one of us can make a difference. Together we make change."
Barbara Mikulski

There are many people who have supported my doctoral research in one way or another, and I am more than grateful to all of them. First, I would like to sincerely thank the person who made this PhD project possible, my supervisor Christof Schütte. On the one hand he provided what was most important to me: Freedom. Freedom to choose and design the contents of my PhD, and freedom to find and follow my own paths. This freedom transformed my PhD into a wonderful adventure within the realm of science. On the other hand, during this journey Christof Schütte provided me with all the support required whenever there was a need.
I am also deeply grateful to my two mentors/thesis advisors, Petra and Max, for their constant support, their exceptional kindness, and the trust they seemed to have in me from the early days on. I feel more than fortunate to have had them as my mentors, and it was a true pleasure to work with them. To Noemi, my mentor of the Berlin Mathematical School (BMS), I am also grateful for her advice on several matters.

A substantial part of this work would not have been possible without all the collaborating groups and people involved, and I would like to sincerely thank all of them for their efforts. Among them are Haribabu Arthanari and Gerhard Wagner of the Dana Farber Cancer Institute and the Harvard Medical School, whom I thank for their support in bringing my applied project to the experimental level, and for having invited me to visit their research groups. I am thankful to Andras Boeszoermenyi, my primary coworker in Boston regarding the experimental work, for all his time and the interesting discussions during my visit. I also thank our collaborators, Nancy Kedersha and Pavel Ivanov of the Brigham and Women's Hospital in Boston, who have done excellent work on the fluorescence microscopy experiments. I would like to thank Magdalena Czuban from the Charité in Berlin for having worked with me on our new collaborative project, Michelle Ceriotti as well as Riccardo Petraglia from the EPFL for their support in relation to i-PI/i-QI, and Jonathan LaRochelle, as well as a few members of the Naar Lab and the Coen Lab, with whom I had the joy of working in several collaborative projects. I would also like to acknowledge Enamine, in particular Olga Tarhanova, for having supported my applied drug discovery project by freely synthesizing compounds for us from their vast virtual compound libraries.
It was also a true pleasure to have had such wonderful colleagues and fellow students in the Computational Biophysics Group, the Biocomputing Group, the Arthanari Lab, the Wagner Lab, the Berlin Mathematical School (BMS) and the International Max Planck Research School for Computational Biology and Scientific Computing (IMPRS-CBSC). Thank you all for having provided such a pleasant and supportive working environment. To my friends Moritz and Brigitte I would like to express my gratitude for reading my thesis and for their valuable comments, to Moritz also for his support regarding Python, and to Brigitte for being so meticulous in her reading and so encouraging. I would also like to thank Gerhard König and Han Cheng Lie for various discussions.

Among the administrative staff of the research groups and graduate schools I would like to thank in particular Fabian Feutlinske, Kirsten Kelleher, Chris Seyffer, Benita Ross, Dominique Schneider, Tanja Fagel, Annette Schumann-Welde, and Dorothé Auth for their constant and kind support throughout my PhD.

Considerable computational resources were required during this PhD, and I would like to thank all the sources which have granted me computing time and additional technical support. Among the high performance computers used are the Sheldon, Yoshi, and Soroban Linux clusters of the Freie Universität Berlin, the HLRN III supercomputer in Berlin/Hanover, the ERISOne cluster of Partners Healthcare/Dana Farber Cancer Institute, the Orchestra and O2 clusters of the Harvard Medical School, the Coral cluster of the Wagner Lab in the Harvard Medical School, and the Odyssey cluster of the Faculty of Harvard University. Among the technical and scientific staff related to these resources I would like to thank in particular Jens Dreger, Christian Tuma, Guido Laubender, Francesco Pontiggia, James Cuff and Scott Yockel for their support in various ways.
I would also like to acknowledge the free and open source software communities, which altruistically provide the world with such magnificent and versatile software, but which we often take for granted. Their contributions played a crucial role in various aspects of this work, and using their packages has often been a delight.

This work was funded by several institutions/groups (mainly via scholarships or travel allowances), and I would like to thank all of them for their support. These are the Einstein Center of Mathematics Berlin/BMS, the Computational Biophysics Group/Freie Universität Berlin, the IMPRS-CBSC/Max Planck Institute for Molecular Genetics, the Arthanari Lab/Dana Farber Cancer Institute, and the Schneider Group/Berlin Institute of Technology.

Content Overview

Abstract
Acknowledgements
Acronyms
Nomenclature