Integrated CADD Methods: a Cocktail of KNIME, Bash and Modeling Software
Loris Moretti
KNIME Spring Summit 2018, Berlin, March 5-9, 2018

TOPIC OF THE DAY
Outline
• Nuevolution status and technology
• The ligand-binding quest in Drug Discovery
• Modeling infrastructure
• HIV-1 protease as modeling example
• Summary
Slide 2 NUEVOLUTION COPYRIGHT & DISTRIBUTION RIGHT Drug Discovery at Nuevolution
NUEVOLUTION A/S
A Powerful Technology for Hit-Finding
Nuevolution A/S
• Founded 2001
• Located in central Copenhagen
• 37 employees in Science Department
• Small molecule drug discovery
• Chemetics® drug discovery platform
• Internal and partnered programs
• Inflammation, Cancer & Immuno-oncology
• Listed on Nasdaq First North, Sweden, 2015 (uplisting soon)

...with global partnerships
...and global CRO support
THE CHEMETICS® LEAD DISCOVERY PLATFORM
Fast and Efficient Generation, Selection and Identification of Hits
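[Figure: Chemetics® workflow. Stage labels: DNA Encoded Library (DEL), Selection, Identification, Re-synthesis, Confirmation, Optimization, etc. Durations shown: ~1 month, ~2 days, ~2 weeks. Figures shown: ~60.000, ~5, ~500, ~20 B, with labels Fragments, Libraries/Year, Screenings/Year, Templates/Year.]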
PURSUED TARGETS
Internal Pipeline and Collaborations
We are active in the fields of INFLAMMATION and ONCOLOGY (~25 targets)
EVERYDAY SCENARIO AT NUEVOLUTION
The Ligand-Binding Players
TARGET: Many different proteins, receptors, enzymes, recognition domains, etc.
LIGAND: Small molecules, 10s-1000s of hits from CHEMETICS
NEED: Ligand-binding hypothesis for ligand optimization

The Ligand Binding Quest
LIGAND-BINDING STUDY
Self Docking
Ligand-binding prediction
knowledge: Biological information; LB known
inputs: Protein structure; Ligand structure
modeling: Docking software
outputs: 1 or more poses; Energy estimation
• RMSD
• score
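Self docking is usually judged by the RMSD between a docked pose and the known ligand-binding (LB) geometry. A minimal sketch of that metric follows, assuming both poses list the same atoms in the same order and already share the receptor frame, so no superposition is applied. The function name and the sample coordinates are illustrative and do not come from the talk.

```python
import math


def pose_rmsd(reference, pose):
    """Root-mean-square deviation between two poses given as (x, y, z) tuples.

    Assumes identical atom ordering and a common coordinate frame
    (no alignment step), which is the usual set-up when a docked pose
    is compared with the crystallographic ligand.
    """
    if len(reference) != len(pose) or not reference:
        raise ValueError("poses must be non-empty and have the same atom count")
    squared = sum(
        (a - b) ** 2
        for ref_atom, pose_atom in zip(reference, pose)
        for a, b in zip(ref_atom, pose_atom)
    )
    return math.sqrt(squared / len(reference))


# Hypothetical three-atom ligand, docked pose shifted by 2 units along z.
crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
docked = [(x, y, z + 2.0) for x, y, z in crystal]
rmsd = pose_rmsd(crystal, docked)
```

A cut-off of about 2 Å is a common convention for calling a pose correct; the talk does not state which threshold was used.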
LIGAND-BINDING STUDY
Cross Docking (Non-Native)
Ligand-binding prediction
knowledge: Biological information; LB known
inputs: Protein structure; Ligand structure
modeling: Docking software
outputs: 1 or more poses; Energy estimation
• van der Waals
• Induced fit
LIGAND-BINDING STUDY
A More Complex Picture: Drug Discovery Environment
Ligand-binding prediction: knowledge, inputs, modeling, outputs

knowledge
• literature information
• in house data
• prior knowledge
• activity data: biophysical, biochemical, in vitro, etc.
• Kd, Ki, IC50, EC50, etc.
• ADME/Tox data
• LB unknown

inputs: Protein
• X-ray (one or more), homology model
• binding sites, conformations, induced fit, plasticity
• role of waters, ionization, cofactor, ions, phys-chem properties

inputs: Ligand
• chemotypes, flexibility, planarity
• stereoisomers, tautomers, ionization

modeling
• protein preparation
• ligand preparation
• ligand exploration (QM)
• software selection
• filters (pharmacophore)
• scoring
• optimization (MM, MD)

outputs
• visualization
• 1 or more poses
• metrics for energy estimation
• ranking
• more binding modes
MODELING HYPOTHESIS: deal with these characteristics, issues, aspects…

LIGAND-BINDING STUDY
When the Answer is Unknown
• Expand and explore all the possibilities: protein states and ligand states
• Consider more solutions: different hypotheses
• Look for confirmation: prior knowledge, SAR, consistency
• Unbiased view: different software and technology
• Break into steps: more control over the process
• Evaluate and explore each step: process tuning
…to be robust, reliable, automated, modifiable, traceable
COMPUTATIONAL INFRASTRUCTURE: an environment to control and explore the modeling process

The Computational Infrastructure
COMPUTATIONAL INFRASTRUCTURE
Software Setup Concept
• KNIME Analytics Platform: protocols for data flow and system calls
• Bash scripts: wrapping modeling software
• Modeling software: through command line interface
Moretti L., & Sartori L. (2016) Molecular Informatics, 35(8-9), 382-390.
Moretti L., & Sartori L. (2016) Molecular Informatics, 35(10), 489-494.
MODELING WORKFLOW
Infrastructure in Layers: KNIME
Layer 1: Command line for KNIME batch mode
Layer 0: Reads TXT input for files location and modeling parameters; modeling steps interconnected through flow variables; email with experiment specifications and results
Layer -1: Condition to run the step; SDF reader; actual modeling step
Layer -2: Handling of the molecule files; settings handling for the job; processing of the outputs
Layer -3: System call to the modeling software (Bash)

LEGO: from Danish “LEg GOdt” (play well)
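The top layer, a command line that starts KNIME in batch mode and hands parameters to the workflow as flow variables, can be sketched as below. The batch application name and the `-workflowDir` and `-workflow.variable=name,value,type` options are standard KNIME batch options; the workflow path, the variable names and the helper function are hypothetical, and the command is only assembled here, not executed.

```python
def knime_batch_command(workflow_dir, variables, executable="knime"):
    """Build the argument list for a KNIME batch-mode run.

    `variables` maps flow-variable names to values; each one becomes a
    -workflow.variable=name,value,type option (type: int, double or String).
    """
    command = [
        executable,
        "-nosplash",
        "-consoleLog",
        "-reset",
        "-application",
        "org.knime.product.KNIME_BATCH_APPLICATION",
        f"-workflowDir={workflow_dir}",
    ]
    for name, value in variables.items():
        if isinstance(value, bool):
            raise TypeError("booleans are not mapped to a flow-variable type here")
        if isinstance(value, int):
            var_type = "int"
        elif isinstance(value, float):
            var_type = "double"
        else:
            var_type = "String"
        command.append(f"-workflow.variable={name},{value},{var_type}")
    return command


# Hypothetical workflow location and flow variables.
command = knime_batch_command(
    "/data/workflows/docking",
    {"input_txt": "/data/experiments/exp001/input.txt", "n_poses": 10},
)
```

The resulting list can be passed to `subprocess.run(command, check=True)` on a machine where KNIME is installed.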
MODELING WORKFLOW
Infrastructure in Layers: BASH
[Figure: annotated Bash scripts (modeling task script and main script). Labels: Intro and variables; Files and conditions; Variables; Modeling software execution and files transformation; Conclusion; Slurm queuing management]

COMPUTATIONAL INFRASTRUCTURE
Features
Hardware and Software related
• Servers with multiple CPUs and GPUs
• Run in parallel
• GNU/Linux Debian OS
• Installation of third-party software (for modeling, analysis, etc.)
• Python and Bash to glue together software and procedures (make a “flow”)
Process related
• Nomenclature (identifiers) for targets, small-molecules and experiments
• Environment variables customizable for target, small-molecules and experiments
• File system structure for storing inputs and outputs, and for temporary files
• Targets prepared in the same way (consistency)
Moretti L., & Sartori L. (2016) Molecular Informatics, 35(8-9), 382-390.
Moretti L., & Sartori L. (2016) Molecular Informatics, 35(10), 489-494.
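The process-related features (identifiers, environment variables, a fixed file system structure) can be combined along the following lines. This is a sketch under stated assumptions: the variable names `MODELING_ROOT` and `MODELING_TMP`, the directory layout and the identifier rule are all hypothetical and are not described in the talk.

```python
import os
import re
from pathlib import Path

# Hypothetical identifier rule: letters, digits, underscore and hyphen only.
_ID_PATTERN = re.compile(r"^[A-Za-z0-9_-]+$")


def experiment_paths(target_id, ligand_set_id, experiment_id, env=None):
    """Derive input, output and temporary directories for one experiment.

    The roots come from environment variables (names are assumptions),
    so the same identifiers resolve to the same layout on every server.
    """
    env = os.environ if env is None else env
    for identifier in (target_id, ligand_set_id, experiment_id):
        if not _ID_PATTERN.match(identifier):
            raise ValueError(f"invalid identifier: {identifier!r}")
    root = Path(env.get("MODELING_ROOT", "modeling_data"))
    tmp_root = Path(env.get("MODELING_TMP", "modeling_tmp"))
    experiment = root / target_id / ligand_set_id / experiment_id
    return {
        "inputs": experiment / "inputs",
        "outputs": experiment / "outputs",
        "tmp": tmp_root / experiment_id,
    }


# Hypothetical identifiers and roots.
example_env = {"MODELING_ROOT": "/srv/modeling", "MODELING_TMP": "/scratch"}
paths = experiment_paths("HIV1PR", "set01", "exp001", env=example_env)
```

Keeping the layout in one function means every modeling step finds its files from the identifiers alone, which is one way to get the traceability the slide asks for.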
“MODELING BRICKS” AND PROTOCOLS
Everyday Tasks

Tasks and Software
• Protein Preparation: Bash script with modeling software
• Ligand Preparation: Ligprep, RDKit
• Docking: Autodock, Vina, Plants, Glide, rDock
• Poses Clustering: ACIAP [1] and cut-off based
• Molecular Mechanics: Ambertools and Gromacs
• Scoring: Plants, XScore, Drugscore, BEAR [2], consensus score [3,4]
• Interaction Fingerprint: Plants
• Quantum Mechanics: Gamess-US
• Reference Comparison: Python script
• Binding Site Analysis: Voidoo, Fpocket, Caver
• Favorable Interaction Regions: Autogrid, Autodock/Vina Pymol plugin [5]
• Visualization: Pymol, Maestro, Jmol, Vmd, Bodil, Coot
• Web interface: Django and Python

KNIME Protocols
• Docking
• Scoring
• Molecular Dynamics
• Virtual Screening
• Quantum Mechanics

[1] Bottegoni G. et al. (2006) Bioinformatics, 22(14), e58-e65.
[2] Degliesposti G. et al. (2011) Journal of Biomolecular Screening, 16(1), 129-133.
[3] Charifson P. S. et al. (1999) Journal of Medicinal Chemistry, 42(25), 5100-5109.
[4] Oda A. et al. (2006) Journal of Chemical Information and Modeling, 46(1), 380-391.
[5] Seeliger D., & de Groot B. L. (2010) Journal of Computer-Aided Molecular Design, 24(5), 417-422.

A Modeling Example
LIGAND-BINDING STUDY
Docking Example
• Protein: HIV-1 Protease
• 11 ligands (15-23 rotatable bonds)
• Complexes: PDBs available
• Cross docking on 1HXW
DOCKING EXAMPLE
Software and Sampling
[Figure: docking results by number of poses per ligand per software (1, 3, 10, 100) for Autodock, Glide SP, Plants, rDock and Vina]
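The sampling set-up multiplies out quickly: every ligand is docked with every program, and each run keeps a fixed number of poses. A small sketch of that bookkeeping follows. Reading the values 1, 3, 10 and 100 as poses kept per ligand per program is an assumption based on the figure labels, and the ligand identifiers are placeholders.

```python
from itertools import product

PROGRAMS = ["Autodock", "Glide SP", "Plants", "rDock", "Vina"]


def docking_runs(ligands, programs, poses_per_run):
    """One run per (ligand, program) pair, each keeping `poses_per_run` poses."""
    return [
        {"ligand": ligand, "program": program, "n_poses": poses_per_run}
        for ligand, program in product(ligands, programs)
    ]


# Placeholder identifiers for the 11 HIV-1 protease ligands.
ligands = [f"lig{i:02d}" for i in range(1, 12)]

# Total number of poses to score and cluster at each sampling level.
totals = {
    n: sum(run["n_poses"] for run in docking_runs(ligands, PROGRAMS, n))
    for n in (1, 3, 10, 100)
}
```

With 11 ligands and 5 programs there are 55 runs, so the highest sampling level yields 5500 poses for the later scoring and clustering steps.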
more poses and more programs

DOCKING EXAMPLE
Scoring
Post-optimization
PBSA:
• optimization
Cscore:
• optimization
• Xscore + DrugscoreX + Plants
• Customizable
• Wider scope
[Figure: panels for Autodock, Glide SP, Plants, rDock, Vina]
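A consensus score combines several scoring functions so that no single one dominates the ranking. The talk does not say how Cscore merges Xscore, DrugscoreX and Plants, so the sketch below uses plain rank averaging as a stand-in. The score values, pose names and the per-scorer direction flags are illustrative assumptions.

```python
def consensus_ranks(scores, higher_is_better):
    """Average rank of each pose over several scoring functions.

    scores: {scorer_name: {pose_id: value}}
    higher_is_better: {scorer_name: bool}, the direction of each scorer
    Returns {pose_id: mean_rank}; rank 1 is best. Ties are broken by
    pose_id order, which is a simplification.
    """
    pose_ids = sorted(next(iter(scores.values())))
    rank_sums = {pose_id: 0 for pose_id in pose_ids}
    for scorer, values in scores.items():
        ordered = sorted(
            pose_ids, key=lambda p: values[p], reverse=higher_is_better[scorer]
        )
        for rank, pose_id in enumerate(ordered, start=1):
            rank_sums[pose_id] += rank
    return {pose_id: rank_sums[pose_id] / len(scores) for pose_id in pose_ids}


# Made-up values for three poses and two scorers.
example_scores = {
    "scorer_a": {"pose1": 6.0, "pose2": 5.0, "pose3": 4.0},
    "scorer_b": {"pose1": -90.0, "pose2": -80.0, "pose3": -95.0},
}
directions = {"scorer_a": True, "scorer_b": False}
mean_ranks = consensus_ranks(example_scores, directions)
best_pose = min(mean_ranks, key=mean_ranks.get)
```

Working on ranks rather than raw values sidesteps the different units and signs of the individual scoring functions, and adding or removing a scorer only changes the input dictionary.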
Combination of scoring metrics

DOCKING EXAMPLE
Clustering
[Figure panels: Docking poses; Clusters best RMSD; Clusters medoids; Clusters best Cscore]

• ACIAP implementation
• Simplify conformational space
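Clustering reduces many docking poses to a few representatives, for example one medoid per cluster. The sketch below is not the ACIAP algorithm; it is a simple cut-off based leader clustering on a precomputed pairwise RMSD matrix, followed by medoid selection. The matrix values and the 2.0 cut-off are made up for illustration.

```python
def cutoff_clusters(distances, cutoff):
    """Greedy leader clustering on a symmetric distance matrix.

    Poses are visited in index order (e.g. already sorted by score);
    a pose joins the first cluster whose leader is within `cutoff`,
    otherwise it starts a new cluster.
    """
    clusters = []
    for pose in range(len(distances)):
        for members in clusters:
            if distances[pose][members[0]] <= cutoff:
                members.append(pose)
                break
        else:
            clusters.append([pose])
    return clusters


def medoid(distances, members):
    """Member with the smallest summed distance to the rest of its cluster."""
    return min(members, key=lambda i: sum(distances[i][j] for j in members))


# Made-up pairwise RMSD matrix for four poses.
rmsd_matrix = [
    [0.0, 1.0, 1.5, 8.0],
    [1.0, 0.0, 0.5, 8.0],
    [1.5, 0.5, 0.0, 8.0],
    [8.0, 8.0, 8.0, 0.0],
]
clusters = cutoff_clusters(rmsd_matrix, cutoff=2.0)
medoids = [medoid(rmsd_matrix, members) for members in clusters]
```

Instead of the medoid, the representative could equally be the cluster member with the best consensus score or the lowest RMSD to a reference, matching the alternatives named in the figure panels.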
Map and simplify the conformational space

DOCKING EXAMPLE
Post-Docking Optimization
“fast”
• MMFF94
• Ligand minimization
• Ambertools
• Cscore

“slow”
• BEAR
• AM1-BCC
• Complex min, ligand MD, complex min
• Ambertools
• PBSA
Improve results with optimization

The Summary
Slide 25 NUEVOLUTION COPYRIGHT & DISTRIBUTION RIGHT SUMMARY Integration for Robustness and Flexibility
A Drug Discovery environment: Nuevolution
“The Need” in a Drug Discovery environment: Ligand-Binding Assessment
Complexity: self, cross, real docking
Modeling infrastructure: …to be robust, reliable, automated, modifiable, traceable
Modeling infrastructure: Integration and customizable environment (LEGO)
KNIME + Bash + Third-party software
HIV-1 protease case: example of integration
Slide 26 NUEVOLUTION COPYRIGHT & DISTRIBUTION RIGHT THANKS
• Alex Haahr Gouliaev
• Thomas Franch
• Mads Nørregaard-Madsen
• Johannes Dolberg
• Aleksejs Kontijevskis
• All others at Nuevolution
• To the open-source and free software community
• To the KNIME team and community
…and you for the attention
Slide 27 NUEVOLUTION COPYRIGHT & DISTRIBUTION RIGHT Visit us at: https://nuevolution.com
Contact me at: [email protected]

NUEVOLUTION
TRANSFORMING CHALLENGES INTO MEDICINE