BUDE
A General Purpose Molecular Docking Program Using OpenCL
Richard B Sessions
The molecular docking problem
• Proteins: typically O(1000) atoms
• Ligands: typically O(100) atoms
[Figure labels: ligand, predicted complex]
1. Sampling (6 degrees of freedom): EMC
2. Binding affinity prediction: EFE-FF (Empirical Free Energy Forcefield)
An atom-atom based forcefield
Parameterised according to atom type, analogous to standard molecular mechanics
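As an illustration of what "parameterised according to atom type" means for an atom-atom energy sum, here is a minimal sketch. The atom types, the parameter table and the linear pair function are hypothetical placeholders chosen for this example; they are not BUDE's parameters or its functional form.

```python
import math

# Hypothetical parameters per unordered pair of atom types:
# (well depth, cutoff distance in Å). Illustrative values only.
PAIR_PARAMS = {
    ("C", "C"): (0.10, 4.0),
    ("C", "O"): (0.15, 3.6),
    ("O", "O"): (0.20, 3.2),
}

def pair_params(type_a, type_b):
    """Look up the parameters for an unordered pair of atom types."""
    return PAIR_PARAMS[tuple(sorted((type_a, type_b)))]

def pair_energy(type_a, type_b, distance):
    """Toy linear ramp: -depth at zero distance, rising to 0 at the cutoff."""
    depth, cutoff = pair_params(type_a, type_b)
    if distance >= cutoff:
        return 0.0
    return -depth * (1.0 - distance / cutoff)

def total_energy(receptor, ligand):
    """Sum the pair energy over every receptor-ligand atom pair.

    Each atom is a (type, (x, y, z)) tuple.
    """
    total = 0.0
    for r_type, r_xyz in receptor:
        for l_type, l_xyz in ligand:
            total += pair_energy(r_type, l_type, math.dist(r_xyz, l_xyz))
    return total
```

The double loop is the part that matters: with O(1000) receptor atoms and O(100) ligand atoms, each pose costs on the order of 10^5 pair evaluations, which is why this inner loop is the natural target for acceleration.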
Empirical Free Energy Forcefield
[Figure label: soft core]
McIntosh-Smith, S., et al., Benchmarking Energy Efficiency, Power Costs and Carbon Emissions on Heterogeneous Systems. Computer Journal, 2012. 55(2): p. 192-205.
Re-docking a ligand into the X-ray structure (good prediction == low RMSD)
1CIL (Human carbonic anhydrase II): RMSD ~ 0.2 Å
Another example
1EZQ (Human Factor Xa): RMSD ~ 1.2 Å
Accuracy of Pose Prediction (re-docking the BindingDB validation set, 84 complexes)
www.bindingdb.org
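The RMSD used to judge a re-docked pose against the crystal pose can be computed as in the sketch below. It assumes both poses list the same atoms in the same order and are already in the receptor's coordinate frame, so no superposition is applied. Symmetry-equivalent atoms are not handled; whether BUDE's evaluation does so is not stated in the slides.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered atom lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))
```

For example, a pose in which every atom is displaced by exactly 1 Å from its crystal position has an RMSD of 1 Å.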
Binding Energy Prediction: is BUDE any better?
Mike Hann’s 2006 test of docking software
Yes – better but not perfect!
BUDE Simplified Flow Diagram (C++/OpenCL)
[Flow diagram. Box labels: Start BUDE; Enter Initial Data; Reading Data; Print Help; Write Control File; Act on Option; Prepare Data for Docking; Site Docking; Surface Docking; Generate Surface Pairs; Do Docking; Do Generation; Calculate Energies; Rank Energies; Host Job; Accelerated Job; Score Results; Print Results; End BUDE. Decision labels: Error(s)?; Docking Info; Docking Type (Small / Large); Parallel EMC Code?; Last Generation]
BUDE’s heterogeneous approach
1. Discover all OpenCL platforms/devices, inc. both CPUs and GPUs
2. Run a micro benchmark on each device, ideally a short piece of real work
3. Load balance using micro benchmark results
4. Re-run micro benchmark at regular intervals in case load changes
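The load-balancing step (step 3) can be sketched as a proportional split of the work list. This is a minimal illustration, not BUDE's implementation. The device names and throughput figures are hypothetical and would in practice come from the micro benchmark, and OpenCL device discovery is not shown.

```python
def split_work(total_items, throughputs):
    """Share total_items across devices in proportion to measured throughput.

    throughputs maps a device name to its benchmark result (items per second).
    """
    total_rate = sum(throughputs.values())
    if total_rate <= 0:
        raise ValueError("at least one device must report positive throughput")
    shares = {
        name: int(total_items * rate / total_rate)
        for name, rate in throughputs.items()
    }
    # Hand out the items lost to rounding down, fastest devices first.
    leftover = total_items - sum(shares.values())
    fastest_first = sorted(throughputs, key=throughputs.get, reverse=True)
    for name in fastest_first[:leftover]:
        shares[name] += 1
    return shares
```

Re-running the benchmark at intervals (step 4) then amounts to calling `split_work` again on the remaining items with the fresh throughput figures.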
BUDE’s Three Docking Modes
• Virtual Screening by Docking
• Binding Site Prediction
• Protein-Protein Docking in real space
Virtual Screening by Docking
Virtual Screening by Docking of NDM-1 (New Delhi metallo-β-lactamase-1)
• 8 million ZINC8 candidate drug molecules
• 20 conformers each
• 160M dockings
• EMERALD (STFC funded machine in Oxford)
• 372 GPUs
• 2.4 × 10^17 atom-atom energies calculated
• ~60 hours actual wall-time
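The figures above imply some useful derived quantities, worked through in the sketch below. The per-GPU rate assumes all 372 GPUs were busy for the full wall-time and treats "~60 hours" as exactly 60, so it is an order-of-magnitude estimate rather than a measured value.

```python
molecules = 8_000_000
conformers_per_molecule = 20
dockings = molecules * conformers_per_molecule        # 160 million dockings

pair_energies = 2.4e17                                # atom-atom energies, from the slide
wall_seconds = 60 * 3600                              # ~60 hours, taken as exactly 60
gpus = 372

rate_total = pair_energies / wall_seconds             # roughly 1.1e12 energies per second
rate_per_gpu = rate_total / gpus                      # roughly 3e9 energies per second per GPU
energies_per_docking = pair_energies / dockings       # 1.5e9 energies per docking

print(f"{dockings:,} dockings")
print(f"{rate_total:.2e} energies/s in total, {rate_per_gpu:.2e} per GPU")
print(f"{energies_per_docking:.2e} energies per docking")
```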
BUDE’s EMC in Action
Virtual Screening for Ligands to Stabilise a Protein
Screened 160 million conformations of the 8 million compounds in the ZINC database against 5 different conformations of the protein on EMERALD.
Selected and tested 58 compounds with two types of experimental assay and found 18 compounds binding between 10 and 100 µM: a 31% hit rate.
A New Virtual Screen against a key protein from the Malaria Parasite
• BlueCrystal P3: 76 Nvidia K20s
• EMERALD: 372 Nvidia M2090s
Binding Site Identification
Full rotation and limited translation of the ligand at each receptor surface vector
Location of the Binding Site of PI3P to a Protein (homology model) Involved in Insulin Signalling
Thomas & Tavare
Protein-Protein Docking (in real space)
Each point on the ligand is offered to each point on the receptor with a local mini-dock: complete rotation in Z, rock in X & Y, small translations in X, Y & Z
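A local mini-dock of this kind can be pictured as a small grid of 6-degree-of-freedom poses. The sketch below enumerates such a grid and applies one pose to a set of coordinates. The step sizes, the rock angles, the translation offsets and the order in which the rotations are applied are all assumptions made for illustration; the slides do not give BUDE's actual values.

```python
import itertools
import math

def rotate(point, axis, angle_deg):
    """Rotate a point about the x, y or z axis through the origin."""
    x, y, z = point
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    if axis == "x":
        return (x, c * y - s * z, s * y + c * z)
    if axis == "y":
        return (c * x + s * z, y, -s * x + c * z)
    if axis == "z":
        return (c * x - s * y, s * x + c * y, z)
    raise ValueError(f"unknown axis: {axis}")

def apply_pose(coords, pose):
    """Apply (rx, ry, rz, tx, ty, tz): rotate about Z, then X, then Y, then translate."""
    rx, ry, rz, tx, ty, tz = pose
    moved = []
    for p in coords:
        p = rotate(p, "z", rz)
        p = rotate(p, "x", rx)
        p = rotate(p, "y", ry)
        moved.append((p[0] + tx, p[1] + ty, p[2] + tz))
    return moved

def mini_dock_poses(z_step=30, rock=(-10, 0, 10), shift=(-1.0, 0.0, 1.0)):
    """Complete rotation in Z, a small rock in X and Y, small shifts in X, Y and Z."""
    z_angles = range(0, 360, z_step)
    return [
        (rx, ry, rz, tx, ty, tz)
        for rz, rx, ry, tx, ty, tz in itertools.product(
            z_angles, rock, rock, shift, shift, shift
        )
    ]
```

With these hypothetical defaults each ligand-point/receptor-point pair is tried in 12 × 3 × 3 × 27 = 2916 poses, which shows how quickly the pose count grows once it is multiplied by the number of surface point pairs.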
Protein-Protein Docking Example: the leucine zipper coiled coil
Best energy -> RMSD = 0.2 Å
In a “real” case with Pete Cullen’s group we have mapped a protein-protein interface using BUDE and confirmed it experimentally. This took only 20 site-directed mutations, instead of the hundreds required by full alanine-scanning mutagenesis.
Performance across devices
16 cores @ 3.1 GHz
High performance in silico virtual drug screening on many-core processors. Simon McIntosh-Smith, James Price, Richard B. Sessions & Amaurys A. Ibarra. International Journal of High Performance Computing Applications (accepted for publication).
Main Optimisations
[Figure: instruction mix in the innermost loop of the energy calculation, comparing conditional accumulation with predicated accumulation]
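The difference between the two accumulation styles can be shown in a few lines. The sketch below is plain Python, so it only demonstrates that the two forms give the same result; it says nothing about speed. BUDE's real kernel is OpenCL, and the distance-cutoff condition used here is a hypothetical stand-in for whatever test the inner loop actually performs.

```python
def conditional_sum(energies, distances, cutoff):
    """Accumulate only when the condition holds (a branch in the inner loop)."""
    total = 0.0
    for e, d in zip(energies, distances):
        if d < cutoff:  # on a GPU, neighbouring work-items may diverge here
            total += e
    return total

def predicated_sum(energies, distances, cutoff):
    """Always accumulate, scaling each term by a 0.0 or 1.0 mask instead of branching."""
    total = 0.0
    for e, d in zip(energies, distances):
        mask = float(d < cutoff)  # 1.0 when the term counts, 0.0 otherwise
        total += e * mask
    return total
```

The predicated form does slightly more arithmetic but keeps every work-item on the same instruction path, which is usually the better trade on wide SIMD hardware. The two agree only for finite energies, since an infinite or NaN term multiplied by 0.0 still contaminates the sum.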
Optimisations
Summary
• GPUs and machines like EMERALD are enabling new science
• BUDE is promising a step-change in Molecular Docking
• But plenty more developments and improvements are possible!
Acknowledgements
On the shoulders of giants ...
• Emil Fischer (1852-1919): ‘Lock and Key’
• Willard Gibbs (1839-1903): Gibbs Free Energy, G = H – TS
Amaurys Avila Ibarra, Simon N McIntosh-Smith, James Price, Debbie K Shoemark
EMERALD and the eInfraStructure South Consortium UK
BlueCrystal and the Advanced Computing Research Centre (Bristol)
Supplementary Slides
Structure and Binding Energy Prediction: speed vs accuracy tradeoff
[Figure axes: Accuracy, Speed]

                           Typical docking      Empirical Free Energy    Free Energy calculations
                           scoring functions    Forcefield (BUDE)        (MM [1,2], QM/MM [3])
Entropy: solvation         No                   Yes                      Yes
Entropy: configurational   Approx               Approx                   Yes
Electrostatics             ?                    Approx                   Yes
All atom                   No                   Yes                      Yes
Explicit solvent           No                   No                       Yes

1. MD Tyka, AR Clarke, RB Sessions, J. Phys. Chem. B 110 17212-20 (2006)
2. MD Tyka, RB Sessions, AR Clarke, J. Phys. Chem. B 111 9571-80 (2007)
3. CJ Woods, FR Manby, AJ Mulholland, J. Chem. Phys. 128 014109 (2008)
EMC Genetic Algorithm minimiser
Receptor and Ligand Flexibility
Full flexibility: would be Molecular Dynamics
Limited flexibility: is appropriate for Molecular Docking:
• Protein backbone: dock to selected X-ray or MD structures
• Protein sidechains: sample side chain rotamers during docking
• Small molecule: generate and dock many different conformations, e.g. the ZINC database of 8 M drug-like compounds gives 160 M conformers
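EMC Genetic Algorithm
[Table of generation settings. Column labels as extracted: Seed; Parents; Selected By; Flag; Generation Size; Output; Output; Mutation; Parameter; Parameter; Parameter. Sub-labels: Descriptors; Coordinates; Method. Row values as extracted: N, M, R*, True, X, Y, Z, U, K%, R*, R*]
BUDE Algorithm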
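To make the generate, rank, select and mutate cycle concrete, here is a generic elitist evolutionary minimiser. It is a sketch of the general technique only and should not be read as BUDE's EMC algorithm. The parameter names loosely echo the slide labels (seed, parents, generation size), but their meanings, the Gaussian mutation and the 0.8 narrowing factor are assumptions made for this example.

```python
import random

def evolve(energy, dims, generations=20, population=50, parents=10,
           spread=1.0, seed=0):
    """Minimise energy(pose) over poses that are lists of dims floats."""
    rng = random.Random(seed)
    # First generation: random poses in a box of half-width `spread`.
    pool = [
        [rng.uniform(-spread, spread) for _ in range(dims)]
        for _ in range(population)
    ]
    for _ in range(generations):
        pool.sort(key=energy)              # rank energies, lowest first
        best = pool[:parents]              # select the parents
        children = [
            [x + rng.gauss(0.0, spread) for x in rng.choice(best)]
            for _ in range(population - parents)
        ]
        pool = best + children             # parents survive unchanged
        spread *= 0.8                      # narrow the mutations each generation
    return min(pool, key=energy)

# Usage with a toy 6-dimensional "pose" and a quadratic stand-in for the energy.
toy_energy = lambda pose: sum(x * x for x in pose)
best_pose = evolve(toy_energy, dims=6)
```

Because the parents are carried over unchanged, the best energy in the pool can never get worse from one generation to the next.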