
BUDE

A General Purpose Molecular Docking Program Using OpenCL

Richard B Sessions

The molecular docking problem

Receptor: typically O(1000) atoms

Ligands: typically O(100) atoms

Predicted complex

1. Sampling (6 degrees of freedom): EMC

2. Binding affinity prediction: EFE-FF

An atom-atom based forcefield

parameterised according to atom type, analogous to standard
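The idea of an atom-atom forcefield parameterised by atom type can be sketched as a double loop over receptor and ligand atoms with a per-type parameter lookup. This is only an illustration: the `Atom` and `TypeParams` structs, the piecewise-linear shape of `pair_energy`, the 2 Å ramp width and the finite clash penalty are all assumptions made for this example, not BUDE's actual functional form or parameters.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical atom record: position plus an index into a type table.
struct Atom {
    float x, y, z;
    int type;
};

// Hypothetical per-type parameters (illustrative values only).
struct TypeParams {
    float radius;      // contact radius in Angstrom
    float well_depth;  // energy at contact distance (negative = favourable)
};

// Illustrative pair term: zero beyond a cutoff, a linear attractive ramp
// between cutoff and contact, and a finite linear penalty inside contact.
float pair_energy(const TypeParams& a, const TypeParams& b, float r) {
    const float contact = a.radius + b.radius;
    const float cutoff = contact + 2.0f;  // assumed ramp width
    const float depth = 0.5f * (a.well_depth + b.well_depth);
    const float clash = 10.0f;            // assumed finite penalty at r = 0
    if (r >= cutoff) return 0.0f;
    if (r >= contact) return depth * (cutoff - r) / (cutoff - contact);
    return depth + (clash - depth) * (contact - r) / contact;
}

// Total interaction energy: sum of pair terms over all receptor-ligand pairs.
float interaction_energy(const std::vector<Atom>& receptor,
                         const std::vector<Atom>& ligand,
                         const std::vector<TypeParams>& table) {
    float total = 0.0f;
    for (const Atom& l : ligand) {
        assert(l.type >= 0 && static_cast<std::size_t>(l.type) < table.size());
        for (const Atom& p : receptor) {
            assert(p.type >= 0 && static_cast<std::size_t>(p.type) < table.size());
            const float dx = p.x - l.x;
            const float dy = p.y - l.y;
            const float dz = p.z - l.z;
            const float r = std::sqrt(dx * dx + dy * dy + dz * dz);
            total += pair_energy(table[p.type], table[l.type], r);
        }
    }
    return total;
}
```

The cost is one pair term per receptor-ligand atom pair, so a single pose of an O(100)-atom ligand against an O(1000)-atom receptor needs on the order of 10^5 evaluations.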

Empirical Free Energy Forcefield

Soft core

McIntosh-Smith, S., et al., Benchmarking Energy Efficiency, Power Costs and Carbon Emissions on Heterogeneous Systems. Computer Journal, 2012. 55(2): p. 192-205.

Re-docking a ligand into the X-ray structure (good prediction == low RMSD)

1CIL (Human carbonic anhydrase II) RMSD ~ 0.2 Å

Another example

1EZQ (Human Factor Xa) RMSD ~ 1.2 Å

Accuracy of Pose Prediction (re-docking the BindingDB validation set, 84 complexes) www.bindingdb.org
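The RMSD figures quoted for re-docking compare the predicted ligand pose with the crystallographic one. A minimal sketch of that measure follows, assuming both poses list the same atoms in the same order and are already expressed in the receptor's coordinate frame, so no superposition is applied. The `Point3` type and function name are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Point3 {
    double x, y, z;
};

// Root-mean-square deviation between matching atoms of two poses.
// Assumes identical atom ordering and a shared coordinate frame.
double rmsd(const std::vector<Point3>& predicted,
            const std::vector<Point3>& reference) {
    assert(!predicted.empty() && predicted.size() == reference.size());
    double sum_sq = 0.0;
    for (std::size_t i = 0; i < predicted.size(); ++i) {
        const double dx = predicted[i].x - reference[i].x;
        const double dy = predicted[i].y - reference[i].y;
        const double dz = predicted[i].z - reference[i].z;
        sum_sq += dx * dx + dy * dy + dz * dz;
    }
    return std::sqrt(sum_sq / static_cast<double>(predicted.size()));
}
```

Coordinates in Å give an RMSD in Å, so a pose whose atoms each sit about 0.2 Å from their crystal positions scores about 0.2 Å.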

Binding Energy Prediction: is BUDE any better?

Mike Hann’s 2006 test of docking software

Yes – better but not perfect!

BUDE Simplified Flow Diagram (C++/OpenCL)

[Figure: flow diagram. Recoverable steps: Start BUDE; enter initial data; read data (on errors, print help and end BUDE); write a control file or prepare data for docking; docking type is small (site docking) or large (surface docking, which generates surface pairs); do docking as a host job or an accelerated job running the parallel EMC code; for each generation calculate energies and rank energies until the last generation; score results; print results; End BUDE.]
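The generation loop in the flow diagram (calculate energies, rank energies, repeat until the last generation) can be sketched as a generic evolutionary search over a 6-degree-of-freedom pose. This is not BUDE's EMC code: the `Pose` layout, Gaussian mutation, the rule that the best `parents` poses survive unchanged, and all parameter names are assumptions chosen for illustration.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <random>
#include <vector>

// Hypothetical pose: three translations followed by three rotations.
using Pose = std::array<double, 6>;

struct Scored {
    Pose pose;
    double energy;
};

// Generic evolutionary loop: score, rank, keep the best `parents` poses,
// and refill the rest of the generation with mutated copies of them.
Scored evolve(const std::function<double(const Pose&)>& energy,
              std::size_t generation_size, std::size_t parents,
              std::size_t generations, double step, unsigned seed) {
    assert(parents > 0 && parents <= generation_size);
    assert(generations > 0 && step > 0.0);
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> init(-1.0, 1.0);
    std::normal_distribution<double> mutate(0.0, step);

    std::vector<Scored> pop(generation_size);
    for (Scored& s : pop) {
        for (double& v : s.pose) v = init(rng);
    }
    for (std::size_t g = 0; g < generations; ++g) {
        for (Scored& s : pop) s.energy = energy(s.pose);  // calculate energies
        std::sort(pop.begin(), pop.end(),                 // rank energies
                  [](const Scored& a, const Scored& b) {
                      return a.energy < b.energy;
                  });
        if (g + 1 == generations) break;                  // last generation
        for (std::size_t i = parents; i < generation_size; ++i) {
            pop[i].pose = pop[i % parents].pose;          // copy a parent
            for (double& v : pop[i].pose) v += mutate(rng);
        }
    }
    return pop.front();  // lowest-energy pose found
}
```

Every pose in a generation is scored independently of the others, which makes the energy step a natural candidate for running in parallel on an accelerator while ranking and mutation stay on the host.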

BUDE’s heterogeneous approach

1. Discover all OpenCL platforms/devices, including both CPUs and GPUs

2. Run a micro benchmark on each device, ideally a short piece of real work

3. Load balance using micro benchmark results

4. Re-run micro benchmark at regular intervals in case load changes
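Step 3 above can be sketched as a proportional split: each device receives a share of the work equal to its share of the measured throughput. The function name, the use of items per second as the benchmark metric, and the choice to hand the rounding remainder to the fastest device are assumptions for this example. The slides do not describe BUDE's actual balancing rule.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Split `total_items` across devices in proportion to the throughput each
// one achieved in a micro benchmark (for example, poses scored per second).
std::vector<std::size_t> split_work(const std::vector<double>& throughput,
                                    std::size_t total_items) {
    assert(!throughput.empty());
    const double sum =
        std::accumulate(throughput.begin(), throughput.end(), 0.0);
    assert(sum > 0.0);

    std::vector<std::size_t> share(throughput.size());
    std::size_t assigned = 0;
    std::size_t fastest = 0;
    for (std::size_t i = 0; i < throughput.size(); ++i) {
        const double exact =
            static_cast<double>(total_items) * (throughput[i] / sum);
        // Round down, and never hand out more than what is left.
        share[i] = std::min(static_cast<std::size_t>(exact),
                            total_items - assigned);
        assigned += share[i];
        if (throughput[i] > throughput[fastest]) fastest = i;
    }
    share[fastest] += total_items - assigned;  // rounding remainder
    return share;
}
```

Re-running the benchmark at intervals, as in step 4, would simply mean calling `split_work` again with fresh throughput numbers for the work that remains.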

BUDE’s Three Docking Modes

• Virtual Screening by Docking

• Binding Site Prediction

• Protein-Protein Docking in real space

Virtual Screening by Docking

Virtual Screening by Docking of NDM-1 (New Delhi metallo-β-lactamase-1)

• 8 million ZINC8 candidate drugs, 20 conformers each → 160M dockings

• EMERALD (STFC funded machine in Oxford)

• 372 GPUs

• 2.4 × 10^17 atom-atom energies calculated

• ~60 hours actual wall-time
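The figures above imply a rough sustained throughput. The following back-of-envelope calculation assumes all 372 GPUs were busy for the full 60 hours and that the quoted energy count covers exactly the 160M dockings. Neither assumption is stated on the slide, so the results are order-of-magnitude estimates only.

```latex
\begin{align*}
N_\text{dockings} &= 8\times10^{6} \times 20 = 1.6\times10^{8} \\
\frac{2.4\times10^{17}\ \text{energies}}{1.6\times10^{8}\ \text{dockings}}
  &= 1.5\times10^{9}\ \text{energies per docking} \\
\frac{2.4\times10^{17}\ \text{energies}}{60 \times 3600\ \text{s}}
  &\approx 1.1\times10^{12}\ \text{energies/s (whole machine)} \\
\frac{1.1\times10^{12}\ \text{energies/s}}{372\ \text{GPUs}}
  &\approx 3.0\times10^{9}\ \text{energies/s per GPU}
\end{align*}
```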

BUDE’s EMC in Action

Virtual Screening for Ligands to Stabilise a Protein

Screened 160 million conformations of the 8 million compounds in the ZINC database against 5 different conformations of the protein on EMERALD

Selected and tested 58 compounds with two types of experimental assays and found 18 compounds binding between 10 and 100 µM: a 31% hit rate

A New Virtual Screen against a key protein from the Malaria Parasite

BlueCrystal P3: 76 Nvidia K20s

EMERALD: 372 Nvidia M2090s

Binding Site Identification

Full rotation and limited translation of the ligand at each receptor surface vector

Location of the Binding Site of PI3P to a Protein (homology model) Involved in Insulin Signalling

Thomas & Tavare

Protein-Protein Docking (in real space)

Each point on the ligand is offered to each point on the receptor with a local mini-dock: complete rotation in Z, rock in X & Y, small translations in X, Y & Z
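The search space of one local mini-dock can be illustrated by enumerating a coarse grid over the six degrees of freedom named above. The slides do not say whether BUDE walks a grid or samples these ranges with EMC, so the grid itself, the `MiniDockPose` struct and every step count and limit below are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical pose for one mini-dock: rotations in degrees, shifts in Angstrom.
struct MiniDockPose {
    double rot_x, rot_y, rot_z;
    double tx, ty, tz;
};

// Evenly spaced values from -limit to +limit, with `steps_per_side` points
// on each side of zero (zero itself is always included).
std::vector<double> symmetric_grid(double limit, int steps_per_side) {
    assert(limit >= 0.0 && steps_per_side >= 0);
    std::vector<double> values;
    for (int i = -steps_per_side; i <= steps_per_side; ++i) {
        values.push_back(steps_per_side == 0 ? 0.0
                                             : limit * i / steps_per_side);
    }
    return values;
}

// Full turn about Z, a limited rock about X and Y, and small shifts in X, Y, Z.
std::vector<MiniDockPose> mini_dock_poses(int z_steps, double rock_limit_deg,
                                          int rock_steps, double shift_limit,
                                          int shift_steps) {
    assert(z_steps > 0);
    const std::vector<double> rocks = symmetric_grid(rock_limit_deg, rock_steps);
    const std::vector<double> shifts = symmetric_grid(shift_limit, shift_steps);
    std::vector<MiniDockPose> poses;
    for (int k = 0; k < z_steps; ++k) {
        const double rz = 360.0 * k / z_steps;
        for (double rx : rocks)
            for (double ry : rocks)
                for (double tx : shifts)
                    for (double ty : shifts)
                        for (double tz : shifts)
                            poses.push_back({rx, ry, rz, tx, ty, tz});
    }
    return poses;
}
```

Even this coarse grid grows quickly: 12 Z steps with one rock step and one shift step per side already gives 12 × 3² × 3³ = 2916 poses for a single pair of surface points.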

Protein-Protein Docking Example: the leucine zipper coiled coil

Best energy -> RMSD = 0.2 Å

In a “real” case with Pete Cullen’s group we have mapped a protein-protein interface using BUDE and confirmed it experimentally. This took only 20 site-directed mutations, instead of the hundreds required by full alanine-scanning mutagenesis.

Performance across devices

16 cores @ 3.1 GHz

High performance in silico virtual drug screening on many-core processors. Simon McIntosh-Smith, James Price, Richard B. Sessions & Amaurys A. Ibarra. International Journal of High Performance Computing Applications (accepted for publication)

Main Optimisations

Conditional accumulation

Predicated accumulation

Instruction mix in the innermost loop of the energy calculation
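The difference between conditional and predicated accumulation can be shown on a simplified inner loop that sums pair energies inside a cutoff. This is a host-side C++ sketch of the general technique, not BUDE's OpenCL kernel. The function names and the cutoff test are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Conditional accumulation: a branch decides whether each term is added.
// On wide SIMD hardware, lanes that disagree on the branch diverge.
float sum_conditional(const std::vector<float>& dist,
                      const std::vector<float>& energy, float cutoff) {
    assert(dist.size() == energy.size());
    float total = 0.0f;
    for (std::size_t i = 0; i < dist.size(); ++i) {
        if (dist[i] < cutoff) {
            total += energy[i];
        }
    }
    return total;
}

// Predicated accumulation: every term is always added, scaled by a 0/1 mask,
// so each iteration executes the same instructions regardless of the data.
float sum_predicated(const std::vector<float>& dist,
                     const std::vector<float>& energy, float cutoff) {
    assert(dist.size() == energy.size());
    float total = 0.0f;
    for (std::size_t i = 0; i < dist.size(); ++i) {
        const float mask = dist[i] < cutoff ? 1.0f : 0.0f;
        total += mask * energy[i];
    }
    return total;
}
```

The two forms give the same result when all energies are finite. The predicated form does extra arithmetic on masked-out terms in exchange for uniform control flow, which is usually the better trade on GPUs.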

Optimisations

Summary

• GPUs and machines like Emerald are enabling new science

• BUDE is promising a step-change in Molecular Docking

• But plenty more developments and improvements are possible!

Acknowledgements

On the shoulders of giants ...

Amaurys Avila Ibarra
Simon N McIntosh-Smith
James Price
Debbie K Shoemark

Emil Fischer (1852-1919): ‘Lock and Key’
Willard Gibbs (1839-1903): Gibbs Free Energy, G = H – TS

EMERALD and the eInfraStructure South Consortium UK

BlueCrystal and the Advanced Computing Research Centre (Bristol)

Supplementary Slides

Structure and Binding Energy Prediction: speed vs accuracy tradeoff

[Figure axes: Accuracy, Speed]

                            Typical docking      Empirical Free Energy   Free Energy calculations
                            scoring functions    Forcefield (BUDE)       (MM [1,2], QM/MM [3])
Entropy: solvation          No                   Yes                     Yes
Entropy: configurational    Approx               Approx                  Yes
Electrostatics              ?                    Approx                  Yes
All atom                    No                   Yes                     Yes
Explicit solvent            No                   No                      Yes

1. MD Tyka, AR Clarke, RB Sessions, J. Phys. Chem. B 110 17212-20 (2006)
2. MD Tyka, RB Sessions, AR Clarke, J. Phys. Chem. B 111 9571-80 (2007)
3. CJ Woods, FR Manby, AJ Mulholland, J. Chem. Phys. 128 014109 (2008)

EMC minimiser

Receptor and Ligand Flexibility

Full flexibility: would be

Limited flexibility: is appropriate for Molecular Docking:

Protein:
Backbone – dock to selected X-ray or MD structures
Sidechains – sample side chain rotamers during docking

Small molecules: generate and dock many different conformations

e.g. ZINC database of 8 M drug-like compounds → 160 M conformers

EMC Genetic Algorithm

[Table: EMC parameters. Column labels: Seed; Parents; Selected By; Flag; Generation Size; Output Descriptors; Output Coordinates; Mutation Method; Parameter; Parameter; Parameter. Row of values: N, M, R*, True, X, Y, Z, U, K%, R*, R*.]

BUDE Algorithm
