Stratagems for Effective Function Evaluation in Computational Chemistry

Stratagems for Effective Function Evaluation in Computational Chemistry Gwyn S. Skone Keble College Doctor of Philosophy University of Oxford Mathematical, Physical, and Life Sciences Division Computing Laboratory May 2010 Version 1.0 (beta) submitted November 2009 and examined February 2010. Version 1.1 (revised) submitted April 2010 and approved May 2010. Final printing July 2010. Abstract In recent years, the potential benefits of high-throughput virtual screening to the drug discovery community have been recognized, bringing an increase in the number of tools developed for this purpose. These programs have to process large quantities of data, searching for an optimal solution in a vast combinatorial range. This is particularly the case for protein-ligand docking, since proteins are sophisticated structures with complicated interactions for which either molecule might reshape itself. Even the very limited flexibility model to be considered here, using ligand conformation ensembles, requires six dimensions of exploration — three translations and three rotations — per rigid conformation. The functions for evaluating pose suitability can also be complex to calculate. Consequently, the programs being written for these biochemical simulations are extremely resource-intensive. This work introduces a pure computer science approach to the field, developing techniques to improve the effectiveness of such tools. Their architecture is generalized to an abstract pattern of nested layers for discussion, covering scoring functions, search methods, and screening overall. Based on this, new stratagems for molecular docking software design are described, including lazy or partial evaluation, geometric analysis, and parallel processing implementation. In addition, a range of novel algorithms are presented for applications such as active site detection with linear complexity (PIES) and small molecule shape description (PASTRY) for pre-alignment of ligands. The various stratagems are assessed individually and in combination, using several modified versions of an existing docking program, to demonstrate their benefit to virtual screening in practical contexts. In particular, the importance of appropriate precision in calculations is highlighted. 3 Contents Abstract 3 List of Symbols and Abbreviations 8 List of Figures 10 List of Tables 13 List of Listings 14 Acknowledgements 15 1 Introduction 17 1.1 Biochemistry . 17 1.1.1 Drug Discovery . 18 1.2 Computer Science . 20 1.2.1 This Thesis . 21 2 Literature Review 23 2.1 Molecular Representation . 23 2.2 Molecular Comparison . 28 2.3 Protein Structure Prediction . 29 2.3.1 Native Structure Prediction — The Folding Problem . 30 2.3.2 Complexed Structure Prediction — The Docking Problem . 32 2.4 Current Areas of Research . 47 2.4.1 Lead Compound Identification . 49 2.4.2 Reviews . 51 3 Early Work 53 3.1 FFT Alignment . 53 3.2 Sphere Trees . 54 3.2.1 Bonded Sphere Trees for Molecular Representation . 55 3.2.2 Test Program (cSpheres) . 57 3.3 Scoring Functions and Search Methods . 59 3.3.1 Piecewise Linear Potential . 61 3.3.2 XScore . 61 3.3.3 Search Methods . 63 3.4 The DOX Family . 64 3.4.1 DOX ..................................... 64 3.4.2 DOXGA .................................. 65 3.4.3 OrthoDOX ................................. 65 3.5 XScore Implementation . 66 3.5.1 Surface Calculations . 66 3.5.2 Recalibration . 67 4 Contents 4 Stratagems From Computer Science 69 4.1 Context and Existing Technology . 69 4.2 Stratagems for Consideration . 70 4.2.1 Scoring . 70 4.2.2 Searching . 71 4.2.3 Screening . 73 4.3 Applications and Examples . 75 4.3.1 Geometric Guidance . 77 4.3.2 Efficient Exploration . 78 4.3.3 Properties, Priority, and Parallelization . 79 4.4 Assessment Criteria . 80 5 Geometric Guidance 83 5.1 Quaternion Rotations . 83 5.1.1 Comparison with Euler Angles . 85 5.2 Predicted Pocket Positioning . 87 5.2.1 PIES . 87 5.2.2 PASS . 96 5.2.3 Comparison of Pocket Detection Methods . 97 5.2.4 Pre-Positioned Docking Results . 99 5.2.5 Automatic Search Extents . 101 5.3 Shape Descriptors . 106 5.3.1 USR . 106 5.3.2 PASTRY . 107 5.3.3 Comparison of Shape Descriptors . 108 5.3.4 Ligand Alignment Using Known Poses . 111 5.3.5 Pre-Aligned Docking Results . 113 6 Efficient Exploration 117 6.1 Local Optimization . 117 6.1.1 Local Searches Versus GA Generations . 120 6.2 Look-Up Table Interpolation . 121 6.3 Lazy Evaluation: Caching Look-Up Tables . 125 6.4 Early Rejection . 127 6.4.1 Scoring-Based Early Rejection . 128 6.4.2 Pose-Based Early Rejection . 134 6.4.3 Quota-Based Early Rejection . 137 7 Properties, Priority, and Parallelization 143 7.1 Knowledge Bases . 143 7.1.1 Normalized Scores . 146 7.2 Learnable Properties . 148 7.2.1 Conformation Prioritization . 150 7.3 Job Control . 151 7.4 Multi-Processor Distribution . 153 5 Contents 8 Comparisons and Conclusions 157 8.1 Results: Assessment of Stratagems . 157 8.2 Future Work: More Stratagems to Consider . 160 8.2.1 Pharmacophores and Alignment . 160 8.2.2 Directed Search Heuristics . 161 8.2.3 Search Methods . 162 8.2.4 Sphere Tree Representations . 163 8.3 Conclusions . 163 8.3.1 Scoring, Searching, and Screening . 165 8.3.2 A Strategy . 166 A Fundamentals of Protein Structure 171 A.1 Introduction . 171 A.1.1 Primary Structure . 172 A.1.2 Secondary Structure . 173 A.1.3 Tertiary Structure . 174 A.1.4 Quaternary Structure . 174 A.2 Protein Behaviour and Interactions . 174 A.2.1 Folding . 174 A.2.2 Binding . 174 A.2.3 Structure Identification . 175 A.2.4 Biochemistry . 176 B FFT Tesselation Test 177 B.1 Implementation . 177 B.1.1 Background . 177 B.1.2 Interface . 178 B.1.3 Testing . 180 B.1.4 Extension . 182 B.2 Gradual Refinement Algorithm . 183 B.2.1 Pipelined Architecture . 183 B.2.2 Development . 184 C Dotty Surfaces 187 C.1 Motivation . 187 C.2 Method . 188 D XScore Calibration 191 D.1 Training Cases and Function Data . 191 E Test Configurations 195 E.1 Molecule Test Cases . 197 E.1.1 1AF2 . 197 E.1.2 1K3U . 197 E.1.3 The Astex Diverse Set . 199 E.1.4 The Astex Mini Set . 200 E.2 Hardware Platforms . 200 E.3 DOX Editions: Strategies and Codes . ..

Load more