SkeleDock: A Web Application for Scaffold Docking in PlayMolecule

Alejandro Varela-Rial,†,‡ Maciej Majewski,‡ Alberto Cuzzolin,† Gerard Martínez-Rosell,† and Gianni De Fabritiis∗,‡,†,¶

†Acellera Labs, Doctor Trueta 183, Barcelona, Spain
‡Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), Barcelona, Spain
¶Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig Lluis Companys 23, Barcelona, Spain

E-mail: [email protected]

Abstract

SkeleDock is a scaffold docking algorithm which uses the structure of a protein-ligand complex as a template to model the binding mode of a chemically similar system. This algorithm was evaluated in the D3R Grand Challenge 4 pose prediction challenge, where it achieved competitive performance. Furthermore, we show that, if crystallized fragments of the target ligand are available, SkeleDock can outperform rDock docking at predicting the binding mode. This article also addresses the capacity of this algorithm to model macrocycles and deal with scaffold hopping. SkeleDock can be accessed at https://playmolecule.org/SkeleDock/.

arXiv:2005.05606v1 [q-bio.QM] 12 May 2020

Introduction

Predicting the binding mode of small molecules in a protein pocket is one of the main challenges in the field. Accurate predictions can substantially reduce the costs of drug development and speed up the process.1 Several software solutions exist that address this problem, including AutoDock Vina,2 Glide,3 Gold4 or rDock.5 Typical docking protocols use the protein cavity and the query ligand to generate poses that are later evaluated with a generalized scoring function. However, structural knowledge about the target system is usually available, such as protein homologs with similar co-crystallized ligands. Hence, given that the binding mode of similar molecules is usually conserved,6,7 it is reasonable to exploit this information to increase the accuracy of the prediction. Considering the growing amount of structural data available in the Protein Data Bank (PDB),8 and the popularity of fragment-based drug discovery,9 we expect this knowledge-rich scenario to become increasingly prevalent.

Docking algorithms which make use of such knowledge are usually referred to as similarity docking or scaffold docking.10 Scaffold docking methods usually rely on maximum common substructure (MCS) approaches, such as fkcombu.11 MCS methods try to find the largest common substructure (subgraph) between two molecules. When found, the conformation of that substructure in the query ligand can be modelled by simply mimicking the conformation of that same substructure in the template, while the position of the remaining atoms is decided by a general scoring function. However, due to the characteristics of MCS methods, two almost identical molecules that differ only in minor modifications can return disappointingly short subgraphs. A shorter MCS means that more atoms in the query ligand have to be modelled by the docking software without any reference, which is not desirable. Additionally, such minimal mismatches can be of critical interest in medicinal chemistry, as they can constitute scaffold hops that can, potentially, improve the pharmacological properties of a compound or circumvent intellectual property.12 Therefore, there is a need to maximize the use of structural information. We present here SkeleDock, a new scaffold docking algorithm that can overcome local mismatches.

Features

Algorithm. The SkeleDock web application provides a user-friendly interface to perform scaffold docking, starting from files with the structure of the receptor (PDB), a template molecule (PDB) and a set of SMILES strings representing the query ligands (CSV). After submission, these files follow SkeleDock's algorithm, whose main steps are summarized in Figure 1. The algorithm begins by building a graph for the query and the template molecules. These two graphs are then compared to identify a common subgraph, that is, a continuous set of atoms whose elements (nodes) and bonds (edges) are equivalent in both the query and the template molecules. Hence, if this step is successful, a mapping linking several atoms in the query molecule to their template counterparts is returned. In the following step, tethering, this mapping is used to change the position of the atoms of the query molecule. This is done by creating a force in each query atom that points towards the location of its template counterpart, effectively biasing the conformation of the query ligand towards that of the template. Finally, in order to find an appropriate location for those atoms in the query molecule for which no template equivalent was found, the tethered template docking protocol of rDock5 is used. This protocol allows the user to constrain the degrees of freedom of the docking run (orientation, position and dihedral angles of the ligand), based on the initial conformation of the provided molecule and a set of atom indexes. These indexes correspond to the atoms that the user wants to be fixed, in our case, those atoms for which we have found a template counterpart. If a given dihedral is composed of atoms whose indexes belong to this set, its dihedral angle will not be sampled at all or only within a user-defined range.

Figure 1: Main steps of SkeleDock algorithm. The dihedral autocompletion step is optional. [Flowchart: template molecule and query molecule; make template graph and make query graph; identify common subgraph; dihedral autocompletion; template-query mapping; tethering; pose refinement.]

Autocompletion step. As previously discussed, one limitation of methods based on MCS is their sensitivity to small changes: two molecules which are almost identical, except for some minor modifications, will return a smaller mapping, as the common subgraph shared by both is now smaller. Figure 2a depicts such a scenario. To avoid this problem, we added an optional step called dihedral autocompletion. As shown in Figure 2b, the mapping found in the graph comparison step has stopped just before the atom whose element differs between the query and template molecules, depicted as X in Figure 2a. However, this mismatching atom belongs to a dihedral (highlighted in a ball-and-stick representation) in which three consecutive atoms are already mapped to the template. We can then assume that the mismatching atom (the fourth atom of this dihedral) matches the fourth atom of the equivalent dihedral in the template. This is what we refer to as dihedral autocompletion. After each dihedral autocompletion cycle, a new non-mapped fourth atom appears, and this step is repeated recursively until no more atoms are available. If the template dihedral offers several possibilities for the fourth atom, all of them are explored and evaluated. This functionality is key to overcoming local mismatches, which makes SkeleDock able to handle some minor scaffold hops. Some MCS methods can overcome simple mismatches such as the one shown in Figure 2, as they could identify the two disconnected common subgraphs. However, if these disconnected subgraphs are not highly similar, typical MCS methods could fail.
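The dihedral autocompletion idea can be illustrated with a minimal Python sketch. It is not SkeleDock's implementation: the two linear molecules, the adjacency dictionaries, the seed mapping (standing in for the output of the graph comparison step) and the function name `autocomplete` are all hypothetical. The sketch also takes only the first candidate for the fourth atom, whereas the algorithm described above explores and evaluates every candidate, and it ignores the ring breaking mentioned in Figure 2.

```python
# Toy linear molecules (hypothetical data): atom i is bonded to atom i + 1.
t_elem = ["C", "C", "C", "N", "C", "C"]  # template
q_elem = ["C", "C", "C", "O", "C", "C"]  # query: atom 3 differs in element
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
q_adj, t_adj = chain, chain


def autocomplete(q_adj, t_adj, mapping):
    """Extend a query -> template atom mapping through partially mapped dihedrals."""
    mapping = dict(mapping)
    used = set(mapping.values())
    changed = True
    while changed:  # repeat until no further atom can be added
        changed = False
        for c in list(mapping):
            for b in q_adj[c]:
                if b not in mapping or mapping[b] not in t_adj[mapping[c]]:
                    continue
                for a in q_adj[b]:
                    if a == c or a not in mapping or mapping[a] not in t_adj[mapping[b]]:
                        continue
                    # a-b-c are three consecutive atoms mapped in both molecules;
                    # any unmapped neighbour d of c is the fourth dihedral atom.
                    for d in q_adj[c]:
                        if d in mapping:
                            continue
                        candidates = [x for x in t_adj[mapping[c]] if x not in used]
                        if candidates:
                            # Simplification: keep the first candidate only and
                            # accept it regardless of its element.
                            mapping[d] = candidates[0]
                            used.add(candidates[0])
                            changed = True
    return mapping


# Assumed result of an element-strict graph comparison: it stops before atom 3.
seed = {0: 0, 1: 1, 2: 2}
full = autocomplete(q_adj, t_adj, seed)
mismatches = sorted((q, t) for q, t in full.items() if q_elem[q] != t_elem[t])
print(full)
print(mismatches)
```

In this toy case the seed mapping covers three atoms, and the autocompletion propagates it across the O/N mismatch to the rest of the chain, which is the behaviour described for Figure 2b.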

Figure 2: Dihedral autocompletion step. a) Chemical structure of template and query molecules. The mismatching atom is depicted as X. b) Overlap between query molecule (opaque licorice) and template molecule (transparent texture) before (top image) and after (bottom image) the autocompletion step. The semi-completed dihedral (atoms depicted with a ball) propagates to the right side, improving the overlap with the template. Template molecule is PDB code: 1UVT, resname: I48. Rings have to be broken to allow this step, but they are restored before the tethering. These conformations are not the final docked poses.

Application options. Different options are available to change the behaviour of the application. The rDock refinement step is enabled by default, but it might not be necessary if every query atom has a template equivalent. The scaffold-hopping tolerant mode enables or disables the dihedral autocompletion step. Users can also modify the magnitude of the force applied to each atom during tethering. Higher values result in a better alignment but might introduce some artefacts, like a change of chirality. The last option is the probe radius, which defines the radius of the spheres used to define the size of the docking cavity for rDock.5 After execution, the best pose of each ligand can be displayed together with the protein and the template ligand (Figure 3). Results can be downloaded in a tar.gz file.
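The trade-off behind the tethering force option can be seen in a toy model. The sketch below is an assumption-laden illustration, not SkeleDock's tethering code: the text does not specify the functional form of the force, so a harmonic pull towards the template position is assumed, and the ligand's internal geometry is reduced to a single harmonic bond. The function names, force constants and coordinates are all hypothetical.

```python
import math


def relax(query, template, k_tether, k_bond=100.0, d0=1.5, steps=20000):
    """Gradient descent on a toy energy: one harmonic bond plus harmonic tethers."""
    pos = [list(p) for p in query]
    step = 1.0 / (2.0 * k_bond + k_tether)  # small enough to be stable here
    for _ in range(steps):
        # Tether force: pulls each query atom towards its template counterpart.
        forces = [[k_tether * (t[c] - p[c]) for c in range(3)]
                  for p, t in zip(pos, template)]
        # Bond force: keeps the two atoms near their equilibrium distance d0.
        delta = [pos[1][c] - pos[0][c] for c in range(3)]
        d = math.sqrt(sum(x * x for x in delta))
        for c in range(3):
            f = k_bond * (d - d0) * delta[c] / d
            forces[0][c] += f
            forces[1][c] -= f
        for i in range(2):
            for c in range(3):
                pos[i][c] += step * forces[i][c]
    return pos


def tether_rmsd(pos, template):
    """RMSD between the relaxed atoms and their template positions."""
    sq = [sum((p[c] - t[c]) ** 2 for c in range(3)) for p, t in zip(pos, template)]
    return math.sqrt(sum(sq) / len(sq))


def bond_length(pos):
    return math.sqrt(sum((pos[1][c] - pos[0][c]) ** 2 for c in range(3)))


# Hypothetical data: the template atoms sit closer together (1.2) than the
# query bond would like to be (1.5), so the two terms compete.
template = [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0)]
query = [(0.5, 0.0, 0.0), (2.0, 0.0, 0.0)]
weak = relax(query, template, k_tether=1.0)
strong = relax(query, template, k_tether=1000.0)
print(tether_rmsd(weak, template), bond_length(weak))
print(tether_rmsd(strong, template), bond_length(strong))
```

With the stronger tether the atoms end up closer to the template, but the bond is compressed further from its preferred length. This is a simple analogue of the artefacts that a large force value might introduce.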

Figure 3: SkeleDock's graphical user interface. Docked molecules (line representation) are shown overlapped with the template used (ball and stick representation).

Time performance. We assessed the efficiency of SkeleDock by docking congeneric series for two different targets: Cathepsin S (459 ligands, average of 46.6 heavy atoms) and BACE-1 (154 ligands, average of 38.4 heavy atoms). We used rDock as a baseline, and each test was run using 4 and 30 cores. SkeleDock is roughly two to four times slower than rDock (2.3 to 3.8 times in Table 1), but we believe that the increase in accuracy compensates for it. Table 1 sums up the results of the time performance evaluation.

Table 1: Time performance of SkeleDock and rDock for BACE-1 and CathepsinS congeneric series. The number of simulated ligands is listed in brackets.

Method      #cpu   BACE-1 (154)        CatS (459)
                   Yield [Lig/min]     Yield [Lig/min]
SkeleDock   4      15.2                13.9
SkeleDock   30     43.8                50.7
rDock       4      50.5                53.4
rDock       30     102.0               127.5
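The relative cost of SkeleDock can be read directly off the yields in Table 1. The short Python sketch below uses only the numbers reported in the table; the variable names are illustrative.

```python
# Yields in ligands per minute, copied from Table 1.
yields = {
    ("SkeleDock", 4): {"BACE-1": 15.2, "CatS": 13.9},
    ("SkeleDock", 30): {"BACE-1": 43.8, "CatS": 50.7},
    ("rDock", 4): {"BACE-1": 50.5, "CatS": 53.4},
    ("rDock", 30): {"BACE-1": 102.0, "CatS": 127.5},
}
n_ligands = {"BACE-1": 154, "CatS": 459}

# Slowdown of SkeleDock relative to rDock at the same core count.
slowdown = {
    (cpu, target): yields[("rDock", cpu)][target] / yields[("SkeleDock", cpu)][target]
    for cpu in (4, 30)
    for target in n_ligands
}

# Wall time in minutes for SkeleDock to process each full series.
minutes = {
    (cpu, target): n_ligands[target] / yields[("SkeleDock", cpu)][target]
    for cpu in (4, 30)
    for target in n_ligands
}

for key in sorted(slowdown):
    print(key, round(slowdown[key], 1), round(minutes[key], 1))
```

The gap to rDock is narrower at 30 cores than at 4 cores for both targets.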

Validation

Fragment-based docking. We evaluated SkeleDock's ability to recover the native pose of a ligand using a fragment as a template. Due to the lack of structures of proteins with ligands and their corresponding fragments, we decided to artificially generate fragments for complexes from the refined set of PDBbind (version 2018)13 and use them as templates for SkeleDock. The ligands were fragmented by breaking a selected rotatable bond. We prepared three sets of fragments of increasing difficulty by excluding from the fragment 1, 3 or 5 rotatable bonds of the complete ligand. Deleting more atoms from the template increases the difficulty of predicting the right pose, as there is no reference for them.

We compared SkeleDock's performance with two MCS-based methods and with an unconstrained docking protocol. The MCS-based methods are two different settings of RDKit's14 FindMCS function: the first, where the element of the atoms must match (strict MCS), and the second, where element and bond-order mismatches are allowed (agnostic MCS). The function returns a mapping (just as the graph comparison step of SkeleDock), which is then directly passed as an input to the tethering and pose refinement steps. Finally, for the unconstrained docking protocol, we used rDock with free rotation, translation and dihedral angle exploration (free rDock). The performance of docking algorithms is evaluated by the number of correct predictions. By convention, poses are considered correct if their RMSD from their crystal pose is under 2.0 Å.15 We report two levels of success: Top 1, where only the top pose was selected, and Top 5, where the best among the top five poses was selected (Figure 4). A full report of the docking results can be found in Table S1.

Figure 4: Self-docking performance of SkeleDock, strict MCS, agnostic MCS and free rDock. Different shades correspond to the different success levels: Top 1 - opaque, Top 5 - transparent. [Bar chart: x-axis, number of non-tethered rotatable bonds (1, 3, 5); y-axis, % correct poses; annotations N = 1871, N = 982, N = 462.]

In terms of success rate, SkeleDock outperforms other approaches in all fragmentation scenarios (Figure 4). Strict MCS is comparable to SkeleDock, and they both outperform agnostic MCS and free rDock. This result is expected because the less strict nature of the agnostic MCS setting might find mappings which are feasible in terms of equivalence in other features (such as ring-ring), but lead to wrong orientations of the ligand. These results suggest that, when the binding modes of the query ligand and its fragments are conserved, biasing the prediction using SkeleDock or MCS approaches can substantially increase the success rate of binding mode prediction.

D3R Grand Challenge 4. In order to evaluate SkeleDock prospectively, we engaged in the D3R Grand Challenge 4 (GC4) pose prediction subchallenge. The D3R Grand Challenge is an international contest where participants complete different computational tasks of pharmaceutical interest.16 In its fourth edition, the objective was to predict the binding modes of 20 ligands of the BACE-1 protein. As templates for SkeleDock, we used crystal structures of close homologs and their co-crystallized ligands from the PDB (Table S2). At the time the challenge took place, the final rDock pose refinement step was not implemented in the protocol. Instead, we ran a short molecular dynamics (MD) simulation to relax the poses (see SI: MD simulation for further details). To assess the performance of the final protocol, a retrospective analysis was run using SkeleDock's web application at PlayMolecule.

This subchallenge was particularly complicated for two reasons: (1) all ligands except one had a macrocycle, and (2) most of the ligands had a shortened MCS with their template due to certain atoms differing in element, or the presence of rings. Conformational changes in macrocycles involve the concerted rotation of several dihedrals, making them difficult to model.17 The gold standard among docking practitioners is to first sample different conformations of a macrocycle and then dock each one independently. This was not needed in our case, as SkeleDock can simply use the macrocycle of the template to model the one in the query ligand. Regarding the shortened MCS, the autocompletion step of SkeleDock can handle these mismatches, leading to a greater mapping and overlap with the templates both in the macrocyclic and non-macrocyclic fractions of the molecules, as can be seen in Figure 5. We compared the RMSD of the poses generated by SkeleDock and by the two MCS methods described in the fragment-based docking section. Both the global RMSD and the macrocycle RMSD are lower in SkeleDock poses, thanks to the bigger mapping with the template (Table S3 and Table S4).

Figure 5: Overlap between predicted poses (gold) and template (violet) using 3 different methods: a) SkeleDock, b) Element Agnostic, c) Strict MCS. RMSD is the average RMSD value of the poses, and mRMSD is the mean value of the RMSD of the macrocycle atoms.

SkeleDock's submission (code: qqou3) finished among the top-performing participants, ranking 9th out of 74 according to median RMSD (1.02 Å) and 15th according to mean RMSD (1.33 Å). In the retrospective analysis, the SkeleDock web application performed slightly worse, with a mean RMSD of 1.47 Å. Given that this test was run in a fully automated fashion and with no human supervision, the gap between the two results is understandable.
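The pose metrics used throughout the validation can be sketched in a few lines of Python: in-place RMSD against the crystal pose, RMSD restricted to a subset of atoms such as the macrocycle, and Top 1 / Top 5 success at the 2.0 Å cutoff. This is a minimal sketch under simplifying assumptions. Atoms are taken to be listed in the same order in both poses, molecular symmetry is ignored, and all coordinates, atom indexes and RMSD values below are made-up illustration data rather than results from this work.

```python
import math
from statistics import mean, median


def rmsd(pred, ref, atoms=None):
    """In-place RMSD (no superposition); optionally restricted to given atom indexes."""
    idx = range(len(ref)) if atoms is None else atoms
    sq = [sum((p - r) ** 2 for p, r in zip(pred[i], ref[i])) for i in idx]
    return math.sqrt(sum(sq) / len(sq))


def is_success(ranked_rmsds, top_n, cutoff=2.0):
    """True if the best of the first top_n ranked poses is under the cutoff."""
    return min(ranked_rmsds[:top_n]) < cutoff


# Hypothetical 4-atom ligand; atoms 0 and 1 stand in for a macrocycle subset.
ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 1.0, 0.0), (3.0, 1.0, 0.0)]
global_rmsd = rmsd(pred, ref)
subset_rmsd = rmsd(pred, ref, atoms=[0, 1])

# Hypothetical RMSDs (in Å) of score-ranked poses for three ligands.
ranked = {
    "lig_a": [2.6, 1.1, 3.0],
    "lig_b": [0.8, 0.9, 1.5],
    "lig_c": [4.0, 3.5, 2.2],
}
top1_rate = mean(is_success(r, 1) for r in ranked.values())
top5_rate = mean(is_success(r, 5) for r in ranked.values())
top1_rmsds = [r[0] for r in ranked.values()]
print(global_rmsd, subset_rmsd)
print(top1_rate, top5_rate, median(top1_rmsds), mean(top1_rmsds))
```

Reporting both the median and the mean, as in the GC4 ranking, is useful because a single badly placed ligand shifts the mean much more than the median.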

Conclusions

The SkeleDock algorithm offers four main features: 1) docking of molecules based on their analogues or fragments, 2) an autocompletion step that can handle local mismatches and, hence, model minor scaffold hops, 3) the ability to model macrocycles without having to pregenerate ring conformations and 4) a user-friendly GUI that enables efficient scaffold docking and results exploration. The protocol can be accessed at https://playmolecule.org/SkeleDock/.

Acknowledgement

The authors thank Acellera Ltd. for funding and the D3R organizers for their efforts. G.D.F. acknowledges support from MINECO (BIO2017-82628-P) and FEDER. This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 823712 (CompBioMed2) and from the Industrial Doctorates Plan of the Secretariat of Universities and Research of the Department of Economy and Knowledge of the Generalitat of Catalonia.

Supporting Information Available

The following files are available free of charge. Table S1: Detailed results of the retrospective validation analysis. (csv) Table S2: PDB codes used as template for SkeleDock in the D3R GC4 pose prediction challenge. MD simulation: Description of the MD protocol used to refine the poses in D3R GC4 pose prediction challenge. Table S3: Mean RMSD obtained by the different methods used to model macrocycles. Table S4: Mean RMSD (computed for macrocycle atoms only) obtained by the different methods used to model macrocycles. (pdf)

References

(1) Pinzi, L.; Rastelli, G. Molecular Docking: Shifting Paradigms in Drug Discovery. Int. J. Mol. Sci. 2019, 20, 4331.

(2) Trott, O.; Olson, A. J. AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J. Comput. Chem. 2010, 31, 455–61.

(3) Friesner, R. A.; Banks, J. L.; Murphy, R. B.; Halgren, T. A.; Klicic, J. J.; Mainz, D. T.; Repasky, M. P.; Knoll, E. H.; Shelley, M.; Perry, J. K.; Shaw, D. E.; Francis, P.; Shenkin, P. S. Glide: A New Approach for Rapid, Accurate Docking and Scoring. 1. Method and Assessment of Docking Accuracy. J. Med. Chem. 2004, 47, 1739–1749, PMID: 15027865.

(4) Jones, G.; Willett, P.; Glen, R. C.; Leach, A. R.; Taylor, R. Development and validation of a genetic algorithm for flexible docking. J. Mol. Biol. 1997, 267, 727–748.

(5) Ruiz-Carmona, S.; Alvarez-Garcia, D.; Foloppe, N.; Garmendia-Doval, A. B.; Juhos, S.; Schmidtke, P.; Barril, X.; Hubbard, R. E.; Morley, S. D. rDock: A Fast, Versatile and Open Source Program for Docking Ligands to Proteins and Nucleic Acids. PLoS Comput. Biol. 2014, 10, e1003571.

(6) Drwal, M. N.; Jacquemard, C.; Perez, C.; Desaphy, J.; Kellenberger, E. Do Fragments and Crystallization Additives Bind Similarly to Drug-like Ligands? J. Chem. Inf. Model. 2017, 57, 1197–1209.

(7) Malhotra, S.; Karanicolas, J. When Does Chemical Elaboration Induce a Ligand To Change Its Binding Mode? J. Med. Chem. 2017, 60, 128–145.

(8) Berman, H. M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T. N.; Weissig, H.; Shindyalov, I. N.; Bourne, P. E. The Protein Data Bank. Nucleic Acids Res. 2000, 28, 235–242.

(9) Lamoree, B.; Hubbard, R. E. Current perspectives in fragment-based lead discovery (FBLD). Essays Biochem. 2017, 61, 453–464.

(10) Fradera, X.; Mestres, J. Guided Docking Approaches to Structure-Based Design and Screening. Curr. Top. Med. Chem. (Trivandrum, India) 2004, 4, 687–700.

(11) Kawabata, T.; Nakamura, H. 3D Flexible Alignment Using 2D Maximum Common Substructure: Dependence of Prediction Accuracy on Target-Reference Chemical Similarity. J. Chem. Inf. Model. 2014, 54, 1850–1863.

(12) Hu, Y.; Stumpfe, D.; Bajorath, J. Recent Advances in Scaffold Hopping. 2017.

(13) Wang, R.; Fang, X.; Lu, Y.; Yang, C.-Y.; Wang, S. The PDBbind Database: Methodologies and Updates. J. Med. Chem. 2005, 48, 4111–4119.

(14) Landrum, G. RDKit: A software suite for cheminformatics, computational chemistry, and predictive modeling. http://www.rdkit.org, accessed 1 April 2020.

(15) Cross, J. B.; Thompson, D. C.; Rai, B. K.; Baber, J. C.; Fan, K. Y.; Hu, Y.; Humblet, C. Comparison of Several Molecular Docking Programs: Pose Prediction and Accuracy. J. Chem. Inf. Model. 2009, 49, 1455–1474.

(16) Gaieb, Z.; Parks, C. D.; Chiu, M.; Yang, H.; Shao, C.; Walters, W. P.; Lambert, M. H.; Nevins, N.; Bembenek, S. D.; Ameriks, M. K.; Mirzadegan, T.; Burley, S. K.; Amaro, R. E.; Gilson, M. K. D3R Grand Challenge 3: blind prediction of protein–ligand poses and affinity rankings. J. Comput.-Aided Mol. Des. 2019, 33, 1–18.

(17) Allen, S. E.; Dokholyan, N. V.; Bowers, A. A. Dynamic Docking of Conformationally Constrained Macrocycles: Methods and Applications. ACS Chem. Biol. 2016, 11, 10–24.
