BUDE: GPU-Accelerated Molecular Docking for Drug Discovery
BUDE
A General Purpose Molecular Docking Program Using OpenCL
Richard B Sessions

The molecular docking problem
Proteins: typically O(1000) atoms
Ligands: typically O(100) atoms
[Figure labels: ligand, receptor, predicted complex]
1. Sampling (6 degrees of freedom): EMC
2. Binding affinity prediction: EFE-FF

An atom-atom based forcefield parameterised according to atom type, analogous to standard molecular mechanics

Empirical Free Energy Forcefield
[Figure label: soft core]
McIntosh-Smith, S., et al., Benchmarking Energy Efficiency, Power Costs and Carbon Emissions on Heterogeneous Systems. Computer Journal, 2012. 55(2): p. 192-205.

Re-docking a ligand into the X-ray Structure (good prediction == low RMSD)
1CIL (Human carbonic anhydrase II): RMSD ~ 0.2 Å

Another example
1EZQ (Human Factor XA): RMSD ~ 1.2 Å

Accuracy of Pose Prediction
(re-docking the BindingDB validation set, 84 complexes)
www.bindingdb.org

Binding Energy Prediction: is BUDE any better?
Mike Hann's 2006 test of docking software
Yes: better, but not perfect!

BUDE Simplified Flow Diagram (C++/OpenCL)
[Flow diagram; the layout did not survive extraction. Recoverable box labels: Start BUDE; Enter Initial Data; Data Error(s)?; Print Help; Reading; Error(s)?; Write Control File; Prepare Data for Docking; Docking Info; Act on Option; Docking Type (Small: Site Docking, Large: Surface Docking); Generate Surface Pairs; Calculate Energies; Do Docking; Do Generation; Parallel EMC Code? (Host Job or Accelerated Job); Score Results; Rank Energies; Last Generation?; Print Results; End BUDE]

BUDE's heterogeneous approach
1. Discover all OpenCL platforms/devices, including both CPUs and GPUs
2. Run a micro benchmark on each device, ideally a short piece of real work
3. Load balance using micro benchmark results
4. Re-run micro benchmark at regular intervals in case load changes

BUDE's Three Docking Modes
• Virtual Screening by Docking
• Binding Site Prediction
• Protein-Protein Docking in real space

Virtual Screening by Docking

Virtual Screening by Docking of NDM-1
New Delhi metallo-β-lactamase-1
• 8 million ZINC8 candidate drug molecules, 20 conformers each: 160M dockings
• EMERALD (STFC funded machine in Oxford)
• 372 GPUs
• 2.4 x 10^17 atom-atom energies calculated
• ~60 hours actual wall-time

BUDE's EMC in Action

Virtual Screening for Ligands to Stabilise a Protein
Screened 160 million conformations of the 8 million compounds in the ZINC database against 5 different conformations of the protein on EMERALD.
Selected and tested 58 compounds with two types of experimental assays and found 18 compounds binding between 10 and 100 µM: a 31% hit rate.

A New Virtual Screen against a key protein from the Malaria Parasite
BlueCrystal P3: 76 Nvidia K20s
EMERALD: 372 Nvidia M2090s

Binding Site Identification
Full rotation and limited translation of the ligand at each receptor surface vector

Location of the Binding Site of PI3P to a Protein (homology model) Involved in Insulin Signalling
Thomas & Tavare

Protein-Protein Docking (in real space)
Each point on the ligand is offered to each point on the receptor with a local mini-dock: complete rotation in Z, rock in X & Y, small translations in X, Y & Z

Protein-Protein Docking Example
The leucine zipper coiled coil
Best energy -> RMSD = 0.2 Å
In a "real" case with Pete Cullen's group we have mapped a protein-protein interface using BUDE and confirmed it experimentally. This took only 20 site-directed mutations, instead of the hundreds required by full alanine-scanning mutagenesis.

Performance across devices
[Figure label: 16 cores @ 3.1 GHz]
High performance in silico virtual drug screening on many-core processors. Simon McIntosh-Smith, James Price, Richard B. Sessions & Amaurys A. Ibarra. International Journal of High Performance Computing Applications (accepted for publication)

Main Optimisations
Conditional accumulation
Predicated accumulation
Instruction mix in the innermost loop of the energy calculation

Optimisations

Summary
• GPUs and machines like EMERALD are enabling new science
• BUDE is promising a step-change in Molecular Docking
• But plenty more developments and improvements are possible!

Acknowledgements
Amaurys Avila Ibarra
Simon N McIntosh-Smith
James Price
Debbie K Shoemark
On the shoulders of giants ...
Emil Fischer (1852-1919): 'Lock and Key'
Willard Gibbs (1839-1903): Gibbs Free Energy, G = H - TS
EMERALD and the eInfraStructure South Consortium UK
BlueCrystal and the Advanced Computing Research Centre (Bristol)

Supplementary Slides

Structure and Binding Energy Prediction: speed vs accuracy tradeoff
[Axis labels: Accuracy, Speed]

                           Typical docking     Empirical Free Energy   Free energy calculations
                           scoring functions   Forcefield (BUDE)       (MM [1,2], QM/MM [3])
Entropy: solvation         No                  Yes                     Yes
Entropy: configurational   Approx              Approx                  Yes
Electrostatics             ?                   Approx                  Yes
All atom                   No                  Yes                     Yes
Explicit solvent           No                  No                      Yes

1. MD Tyka, AR Clarke, RB Sessions, J. Phys. Chem. B 110 17212-20 (2006)
2. MD Tyka, RB Sessions, AR Clarke, J. Phys. Chem. B 111 9571-80 (2007)
3. CJ Woods, FR Manby, AJ Mulholland, J. Chem. Phys. 128 014109 (2008)

EMC Genetic Algorithm minimiser

Receptor and Ligand Flexibility
Full flexibility: would be Molecular Dynamics
Limited flexibility: is appropriate for Molecular Docking
Protein:
  Backbone: dock to selected X-ray or MD structures
  Sidechains: sample side chain rotamers during docking
Small molecule: generate and dock many different conformations, e.g. ZINC database of 8 M drug-like compounds, 160 M conformers

EMC Genetic Algorithm
[Table; the layout did not survive extraction. Recoverable column headings (order uncertain): Seed, Generation, Size, Parents, Selected By, Flag, Output Descriptors, Output Coordinates, Mutation Method, Parameter (three columns). Cell values as extracted: N M R* True X Y Z U K% R* R*]

BUDE Algorithm
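
The four steps listed under "BUDE's heterogeneous approach" can be illustrated with a small sketch. This is not BUDE's code (BUDE is C++/OpenCL); it only shows one plausible way to turn micro-benchmark results into a proportional work split. The function names `measure_throughput` and `split_work`, the device names, and the throughput figures are all hypothetical, and device discovery (step 1) is left out.

```python
import time


def measure_throughput(run_batch, batch_size):
    """Step 2 (sketch): time a short piece of real work, return items per second.

    run_batch is a hypothetical callable that docks batch_size poses on one device.
    """
    start = time.perf_counter()
    run_batch(batch_size)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero reading
    return batch_size / elapsed


def split_work(total_items, throughputs):
    """Step 3 (sketch): share total_items in proportion to measured throughput.

    Cumulative rounding is used so that the shares always sum to total_items.
    """
    total_rate = sum(throughputs.values())
    if total_rate <= 0:
        raise ValueError("at least one device must report a positive throughput")
    names = list(throughputs)
    shares = {}
    cumulative_rate = 0.0
    previous_boundary = 0
    for index, name in enumerate(names):
        cumulative_rate += throughputs[name]
        if index == len(names) - 1:
            boundary = total_items  # the last device takes whatever remains
        else:
            boundary = round(total_items * cumulative_rate / total_rate)
        shares[name] = boundary - previous_boundary
        previous_boundary = boundary
    return shares


# Hypothetical micro-benchmark results, in poses per second.
example_rates = {"cpu": 1.0, "gpu0": 4.0, "gpu1": 5.0}
example_shares = split_work(1000, example_rates)
print(example_shares)
```

Step 4 would correspond to calling `measure_throughput` again at intervals and recomputing the split for the work that is still outstanding.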
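
The re-docking slides judge a pose by its RMSD from the crystal structure ("good prediction == low RMSD"). As a minimal sketch of that measure, assuming both poses are already in the receptor's coordinate frame (so no superposition is applied) and that atoms are listed in the same order in both, the calculation looks like this. The function name `rmsd` and the coordinates are illustrative only; symmetry-equivalent atoms are not handled.

```python
import math


def rmsd(pose_a, pose_b):
    """Root-mean-square deviation between two poses given as lists of (x, y, z) in Å."""
    if len(pose_a) != len(pose_b) or not pose_a:
        raise ValueError("poses must be non-empty and contain the same number of atoms")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(pose_a, pose_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(pose_a))


# Made-up three-atom ligand: the docked pose is the crystal pose shifted 0.2 Å along x.
crystal_pose = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
docked_pose = [(x + 0.2, y, z) for (x, y, z) in crystal_pose]
pose_rmsd = rmsd(crystal_pose, docked_pose)
print(f"RMSD = {pose_rmsd:.2f} Å")
```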
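
The figures quoted for the NDM-1 screen and for the protein-stabilisation screen can be cross-checked with a few lines of arithmetic. The inputs below are taken from the slides; the derived per-docking and per-GPU rates are not stated in the slides and rest on two assumptions: that "~60 hours" is taken as exactly 60, and that all 372 GPUs were busy for the whole run.

```python
# Inputs quoted on the slides.
molecules = 8_000_000
conformers_per_molecule = 20
atom_atom_energies = 2.4e17
wall_time_hours = 60      # quoted as "~60 hours"; treated as exact here
gpu_count = 372
compounds_tested = 58
compounds_binding = 18

# Derived quantities.
dockings = molecules * conformers_per_molecule
energies_per_docking = atom_atom_energies / dockings
energies_per_second = atom_atom_energies / (wall_time_hours * 3600)
energies_per_gpu_second = energies_per_second / gpu_count  # assumes every GPU is busy throughout
hit_rate = compounds_binding / compounds_tested

print(f"dockings:                  {dockings:,}")
print(f"energies per docking:      {energies_per_docking:.2e}")
print(f"energies per second:       {energies_per_second:.2e}")
print(f"energies per GPU-second:   {energies_per_gpu_second:.2e}")
print(f"hit rate:                  {hit_rate:.0%}")
```

The docking count and hit rate reproduce the slide values (160M dockings, 31%); the throughput figures are only order-of-magnitude estimates under the stated assumptions.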