Integrated CADD Methods: a Cocktail of KNIME, Bash and Modeling

Loris Moretti
KNIME Spring Summit 2018, Berlin, March 5-9, 2018

TOPIC OF THE DAY

Outline

• Nuevolution status and technology
• The ligand-binding quest in Drug Discovery
• Modeling infrastructure
• HIV-1 protease as modeling example
• Summary

Slide 2 NUEVOLUTION COPYRIGHT & DISTRIBUTION RIGHT Drug Discovery at Nuevolution

NUEVOLUTION A/S A Powerful Technology for Hit-Finding

Nuevolution A/S
• Founded 2001
• Located in central Copenhagen
• 37 employees in Science Department
• Small molecule drug discovery
• Chemetics® drug discovery platform
• Internal and partnered programs
• Inflammation, Cancer & Immuno-oncology
• Listed on Nasdaq First North, Sweden, 2015 (uplisting soon)

...with global partnerships
...and global CRO support

THE CHEMETICS® LEAD DISCOVERY PLATFORM Fast and Efficient Generation, Selection and Identification of hits

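[Figure: Chemetics® workflow. Steps: DNA Encoded Library (DEL), Selection, Identification, Re-synthesis, Confirmation, Optimization, etc. Durations shown: ~1 month, ~2 days, ~2 weeks. Throughput values shown: ~60.000, ~5, ~500, ~20 B. Throughput labels: Fragments, Libraries/Year, Screenings/Year, Templates/Year.]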

PURSUED TARGETS Internal Pipeline and Collaborations

We are active in the fields of INFLAMMATION and ONCOLOGY (~25 targets)

EVERYDAY SCENARIO AT NUEVOLUTION The Ligand-Binding Players

TARGET: Many different proteins, receptors, enzymes, recognition domains, etc…
LIGAND: Small molecules, 10s-1000s hits from CHEMETICS

NEED: Ligand-binding hypothesis for ligand optimization

The Ligand Binding Quest

LIGAND-BINDING STUDY Self Docking

Ligand-binding prediction
knowledge: • Biological information • LB known
inputs: • Protein structure • Ligand structure
modeling: • Docking software
outputs: • 1 or more poses • Energy estimation
• RMSD • score
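When the ligand binding is known, the usual check is the RMSD between a docked pose and the experimental ligand coordinates. The following is a minimal Python sketch of that metric, assuming both coordinate lists hold the same atoms in the same order and share the receptor frame (no superposition and no symmetry correction are applied); the coordinates are made up for illustration.

```python
import math


def rmsd(pose, reference):
    """Root-mean-square deviation between two equally ordered coordinate lists."""
    if len(pose) != len(reference) or not pose:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (a - b) ** 2
        for atom_pose, atom_ref in zip(pose, reference)
        for a, b in zip(atom_pose, atom_ref)
    )
    return math.sqrt(squared / len(pose))


# Made-up coordinates: the pose is the reference shifted by 1 unit along z.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
pose = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (1.5, 1.5, 1.0)]
print(rmsd(pose, reference))
```

A real comparison would normally restrict the calculation to heavy atoms and account for symmetry-equivalent atoms, which this sketch leaves out.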

LIGAND-BINDING STUDY Cross Docking (Non-Native)

Ligand-binding prediction
knowledge: • Biological information • LB known
inputs: • Protein structure • Ligand structure
modeling: • Docking software
outputs: • 1 or more poses • Energy estimation
• van der Waals • Induced fit

LIGAND-BINDING STUDY A More Complex Picture: Drug Discovery Environment

Ligand-binding prediction: knowledge, inputs, modeling, outputs

knowledge
• literature information
• in house data
• prior knowledge
• activity data: biophysical, biochemical, in vitro, etc…
• Kd, Ki, IC50, EC50, etc…
• ADME/Tox data
• LB unknown

inputs
Protein
• Xray (one or more), homology model
• binding sites, conformations, induced fit, plasticity
• role of waters, ionization, cofactor, ions, phys-chem properties
Ligand
• chemotypes, flexibility, planarity
• stereoisomers, tautomers, ionization

modeling
• protein preparation
• ligand preparation
• ligand exploration (QM)
• Software selection
• filters (pharmacophore)
• scoring
• Optimization (MM, MD)

outputs
• 1 or more poses
• metrics for energy estimation
• ranking
• more binding modes

MODELING HYPOTHESIS: deal with these characteristics, issues, aspects…

LIGAND-BINDING STUDY When the Answer is Unknown

• Expand and explore all the possibilities: protein states and ligand states
• Consider more solutions: different hypotheses
• Look for confirmation: prior knowledge, SAR, consistency
• Unbiased view: different software and technology
• Fraction into steps: more control over the process
• Evaluate and explore each step: process tuning

…to be robust, reliable, automated, modifiable, traceable
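Expanding over protein states, ligand states and software amounts to enumerating a Cartesian product of modeling jobs. A minimal Python sketch of that enumeration follows; the state identifiers are hypothetical and only the program names are taken from the talk.

```python
from itertools import product


def enumerate_jobs(protein_states, ligand_states, programs):
    """Return one job description per (protein state, ligand state, program)."""
    return [
        {"protein": protein, "ligand": ligand, "program": program}
        for protein, ligand, program in product(protein_states, ligand_states, programs)
    ]


# Hypothetical identifiers, for illustration only.
protein_states = ["conformer_A", "conformer_B"]
ligand_states = ["lig1_neutral", "lig1_protonated", "lig1_tautomer2"]
programs = ["Autodock", "Vina", "Plants"]

jobs = enumerate_jobs(protein_states, ligand_states, programs)
print(len(jobs))  # 2 * 3 * 3 = 18 jobs
```

Listing the jobs explicitly, rather than nesting loops inside each tool wrapper, keeps every combination traceable by its three identifiers.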

COMPUTATIONAL INFRASTRUCTURE: an environment to control and explore the modeling process

The Computational Infrastructure

COMPUTATIONAL INFRASTRUCTURE Software Setup Concept

KNIME Analytics platform protocols for data flow and system calls

Bash scripts wrapping modeling software

Modeling software through command line interface

Moretti L., & Sartori L. (2016) Molecular Informatics, 35(8-9), 382-390.
Moretti L., & Sartori L. (2016) Molecular Informatics, 35(10), 489-494.

MODELING WORKFLOW Infrastructure in Layers: KNIME

Layer 1
• Command line for KNIME batch mode

Layer 0
• Reads TXT input for files location and modeling parameters
• Modeling steps interconnected through flow variables
• Email with experiment specifications and results

Layer -1
• Condition to run the step
• SDF reader
• Actual modeling step

Layer -2
• Handling of the molecules files
• Settings handling for the job
• Processing of the outputs

Layer -3
• System call to the modeling software (Bash)

From Danish “LEg GOdt” (play well)
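The top layer, a command line for KNIME batch mode, can be sketched as a small Python helper that assembles the arguments for a headless run. This is an illustration, not the speaker's actual command: the options shown are those of the KNIME batch application as commonly documented and should be checked against the installed KNIME version, and the workflow path and the variable name `input_txt` are hypothetical.

```python
def knime_batch_command(knime_executable, workflow_dir, variables):
    """Build the argument list for a headless KNIME run.

    `variables` maps a workflow-variable name to a (value, type) pair,
    where type is "String", "int" or "double".
    """
    command = [
        knime_executable,
        "-nosplash",
        "-application", "org.knime.product.KNIME_BATCH_APPLICATION",
        f"-workflowDir={workflow_dir}",
        "-reset",  # reset the workflow before executing it
    ]
    for name, (value, var_type) in variables.items():
        command.append(f"-workflow.variable={name},{value},{var_type}")
    return command


# Hypothetical paths and variable name, for illustration only.
command = knime_batch_command(
    "knime",
    "/workflows/docking_protocol",
    {"input_txt": ("/experiments/exp001/input.txt", "String")},
)
print(" ".join(command))
```

The resulting list can be handed to `subprocess.run(command, check=True)`; passing the TXT input location as a workflow variable matches the layer 0 description, where the workflow reads file locations and parameters from a text file.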

MODELING WORKFLOW Infrastructure in Layers: BASH

[Figure: structure of the Bash layer. Labels: Modeling task Script; Main Script. Script blocks: Intro and variables; Files and conditions; Modeling software execution and files transformation; Variables; Conclusion.]
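The script blocks named on this slide (intro and variables, files and conditions, software execution, conclusion) can be mirrored in a short sketch. It is written in Python rather than Bash only to stay consistent with the other examples here, the content of each stage is an assumption, and a trivial stand-in command replaces the real modeling software.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_modeling_task(command, input_files, workdir):
    """Run one wrapped modeling step and report its outcome."""
    # Intro and variables: normalise the paths used by the task.
    workdir = Path(workdir)
    inputs = [Path(f) for f in input_files]

    # Files and conditions: run the step only if every input exists.
    missing = [str(f) for f in inputs if not f.is_file()]
    if missing:
        return {"status": "skipped", "missing": missing}
    workdir.mkdir(parents=True, exist_ok=True)

    # Modeling software execution (a stand-in command in this sketch).
    result = subprocess.run(command, cwd=workdir, capture_output=True, text=True)

    # Conclusion: report the outcome to the caller.
    status = "ok" if result.returncode == 0 else "failed"
    return {"status": status, "stdout": result.stdout.strip()}


with tempfile.TemporaryDirectory() as tmp:
    ligand = Path(tmp) / "ligand.sdf"
    ligand.write_text("placeholder\n")  # dummy input file
    stand_in = [sys.executable, "-c", "print('docking done')"]
    outcome = run_modeling_task(stand_in, [ligand], Path(tmp) / "job")
    print(outcome["status"])
```

Returning a status instead of raising lets the calling workflow decide whether to continue, which fits the "condition to run the step" idea of the KNIME layers.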

Slurm queuing management

COMPUTATIONAL INFRASTRUCTURE Features

Hardware and Software related
• Servers with multiple CPUs and GPUs
• Run in parallel
• Debian GNU/Linux OS
• Installation of third-party software (for modeling, analysis, etc.)
• Python and Bash to glue together software and procedures (make a “flow”)

Process related
• Nomenclature (identifiers) for targets, small-molecules and experiments
• Environment variables customizable for target, small-molecules and experiments
• File system structure for storing inputs and outputs, and for temporary files
• Targets prepared in the same way (consistency)
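The process-related features (identifiers, environment variables and a fixed file system structure) can be illustrated with a small Python sketch. The directory layout, the identifier format and the variable names below are all hypothetical; the talk does not specify them.

```python
from pathlib import Path


def experiment_paths(root, target_id, ligand_set_id, experiment_id):
    """Derive input, output and temporary directories from the three identifiers."""
    base = Path(root) / target_id / ligand_set_id / experiment_id
    return {"inputs": base / "inputs", "outputs": base / "outputs", "tmp": base / "tmp"}


def experiment_environment(target_id, ligand_set_id, experiment_id):
    """Environment variables a wrapper script could read (names are hypothetical)."""
    return {
        "MODEL_TARGET": target_id,
        "MODEL_LIGANDS": ligand_set_id,
        "MODEL_EXPERIMENT": experiment_id,
    }


# Hypothetical identifiers and root directory, for illustration only.
paths = experiment_paths("/data/modeling", "T001", "L001", "E001")
env = experiment_environment("T001", "L001", "E001")
print(paths["outputs"])
```

Deriving every path from the same three identifiers means a result file can always be traced back to its target, ligand set and experiment.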

Moretti L., & Sartori L. (2016) Molecular Informatics, 35(8-9), 382-390.
Moretti L., & Sartori L. (2016) Molecular Informatics, 35(10), 489-494.

“MODELING BRICKS” AND PROTOCOLS Everyday Tasks

Tasks and Software
• Protein Preparation: Bash script with modeling software
• Ligand Preparation: Ligprep, RDKit
• Docking: Autodock, Vina, Plants, Glide, rDock
• Poses Clustering: ACIAP [1] and cut-off based
• Molecular Mechanics: Ambertools and Gromacs
• Scoring: Plants, XScore, Drugscore, BEAR [2], consensus score [3,4]
• Interaction Fingerprint: Plants
• Quantum Mechanics: Gamess-US
• Reference Comparison: Python script
• Binding Site Analysis: Voidoo, Fpocket, Caver
• Favorable Interaction Regions: Autogrid, Autodock/Vina Pymol plugin [5]
• Visualization: Pymol, Maestro, Vmd, Bodil, Coot
• Web interface: Django and Python

Knime Protocols: Docking, Scoring, Quantum Mechanics

[1] Bottegoni G. et al. (2006) Bioinformatics, 22(14), e58-e65.
[2] Degliesposti G. et al. (2011) Journal of Biomolecular Screening, 16(1), 129-133.
[3] Charifson P. S. et al. (1999) Journal of Medicinal Chemistry, 42(25), 5100-5109.
[4] Oda A. et al. (2006) Journal of Chemical Information and Modeling, 46(1), 380-391.
[5] Seeliger D., & de Groot B. L. (2010) Journal of Computer-Aided Molecular Design, 24(5), 417-422.

A Modeling Example

LIGAND-BINDING STUDY Docking Example

• Protein: HIV-1 protease
• 11 ligands (15-23 rotatable bonds)
• PDB structures of the complexes available
• Cross docking on 1HXW

DOCKING EXAMPLE Software and Sampling

[Figure: docking results with 1, 3, 10 and 100 poses per ligand per software, for Autodock, Glide SP, Plants, rDock and Vina.]
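To get a feel for the sampling sizes involved, the number of poses to handle can be estimated as ligands × programs × poses per run. The short Python sketch below uses the 11 ligands and five programs of this example and assumes that 1, 3, 10 and 100 are the numbers of poses requested per ligand per program; the totals are upper bounds, since a program may return fewer poses than requested.

```python
N_LIGANDS = 11
PROGRAMS = ["Autodock", "Glide SP", "Plants", "rDock", "Vina"]
POSES_PER_RUN = [1, 3, 10, 100]


def total_poses(n_ligands, n_programs, poses_per_run):
    """Upper bound on the number of poses produced by a docking campaign."""
    return n_ligands * n_programs * poses_per_run


totals = {n: total_poses(N_LIGANDS, len(PROGRAMS), n) for n in POSES_PER_RUN}
print(totals)  # {1: 55, 3: 165, 10: 550, 100: 5500}
```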

more poses and more programs

DOCKING EXAMPLE Scoring

Post-optimization

PBSA: • optimization
Cscore: • optimization • Xscore + DrugscoreX + Plants • Customizable • Wider scope
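A consensus score combines several scoring functions into one ranking. The talk does not state how Xscore, DrugscoreX and Plants are combined into Cscore, so the Python sketch below shows one common scheme as an assumption: each score is converted to a z-score oriented so that larger means better, and the z-scores are averaged per pose. The numbers and the orientation flags are made up for illustration.

```python
from statistics import mean, pstdev


def z_scores(values, higher_is_better=False):
    """Standardise scores so that a larger z always means a better pose."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    sign = 1.0 if higher_is_better else -1.0
    return [sign * (v - mu) / sigma for v in values]


def consensus(score_table, orientation):
    """Average the oriented z-scores of each pose over all scoring functions."""
    columns = [z_scores(score_table[name], orientation[name]) for name in score_table]
    return [mean(per_pose) for per_pose in zip(*columns)]


# Made-up scores for three poses; orientation flags are assumptions.
scores = {
    "xscore": [6.0, 5.0, 7.0],
    "drugscore": [-100.0, -80.0, -120.0],
    "plants": [-90.0, -70.0, -110.0],
}
orientation = {"xscore": True, "drugscore": False, "plants": False}

cscore = consensus(scores, orientation)
best_pose = max(range(len(cscore)), key=cscore.__getitem__)
print(best_pose)
```

Working on z-scores rather than raw values keeps a function with a large numeric range from dominating the average, and adding another scoring function only requires one more column.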

Autodock Glide SP Plants rDock Vina

Combination of scoring metrics

DOCKING EXAMPLE Clustering

[Figure panels: Docking poses; Clusters best RMSD; Clusters medoids; Clusters best Cscore]

• ACIAP implementation • Simplify conformational space
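The idea of simplifying the conformational space can be illustrated with plain cut-off clustering on a pairwise RMSD matrix. This Python sketch is not the ACIAP algorithm: it is a greedy leader scheme in which a pose joins the first cluster whose leader lies within the cut-off, followed by medoid selection. The RMSD matrix and the 2.0 cut-off are made up.

```python
def cutoff_clusters(dist, cutoff):
    """Greedy leader clustering on a symmetric distance matrix."""
    clusters = []
    for i in range(len(dist)):
        for cluster in clusters:
            if dist[i][cluster[0]] <= cutoff:  # cluster[0] is the leader
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def medoid(cluster, dist):
    """Member with the smallest summed distance to the rest of its cluster."""
    return min(cluster, key=lambda i: sum(dist[i][j] for j in cluster))


# Made-up pairwise RMSD values for five poses.
rmsd_matrix = [
    [0.0, 0.5, 4.0, 4.2, 1.8],
    [0.5, 0.0, 3.8, 4.1, 1.0],
    [4.0, 3.8, 0.0, 0.7, 4.5],
    [4.2, 4.1, 0.7, 0.0, 4.4],
    [1.8, 1.0, 4.5, 4.4, 0.0],
]

clusters = cutoff_clusters(rmsd_matrix, cutoff=2.0)
medoids = [medoid(cluster, rmsd_matrix) for cluster in clusters]
print(clusters, medoids)
```

The result of leader clustering depends on the order of the poses, which is one reason dedicated methods such as ACIAP are used instead of a fixed cut-off.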

Map and simplify the conformational space

DOCKING EXAMPLE Post-Docking Optimization

• MMFF94 • Ligand minimization • Ambertools • Cscore • “fast”

• BEAR • AM1-BCC • Complex min - Ligand MD - Complex min • Ambertools • PBSA • “slow”

Improve results with optimization

The Summary

SUMMARY Integration for Robustness and Flexibility

A Drug Discovery environment: Nuevolution

“The Need” in a Drug Discovery environment: Ligand-Binding Assessment

Complexity: self, cross, real docking

Modeling infrastructure: …to be robust, reliable, automated, modifiable, traceable

Modeling infrastructure: Integration and customizable environment (LEGO)

KNIME + Bash + Third-party software

HIV-1 protease case: example of integration

THANKS

• Alex Haahr Gouliaev • Thomas Franch • Mads Nørregaard-Madsen • Johannes Dolberg • Aleksejs Kontijevskis • All others at Nuevolution

• To the open-source and free software community

• To the KNIME team and community

…and you for the attention

Visit us at: https://nuevolution.com

Contact me at: [email protected] NUEVOLUTION

TRANSFORMING CHALLENGES INTO MEDICINE