Rdock Reference Guide Rdock Development Team August 27, 2015 Contents

rDock
Reference Guide

rDock Development Team
August 27, 2015

Contents

12345

Preface

4

444
Acknowledgements Introduction Conﬁguration Cavity mapping

5.1 Two sphere method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.2 Reference ligand method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

6

68

6

Scoring function reference

11

6.1 Component Scoring Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
6.1.1 van der Waals potential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 6.1.2 Empirical attractive and repulsive polar potentials . . . . . . . . . . . . . . . . . . 11 6.1.3 Solvation potential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 6.1.4 Dihedral potential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
6.2 Intermolecular scoring functions under evaluation . . . . . . . . . . . . . . . . . . . . . . . 13
6.2.1 Training sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 6.2.2 Scoring Functions Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 6.2.3 Scoring Functions Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
6.3 Code Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

7

Docking protocol

17

7.1 Protocol Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
7.1.1 Pose Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 7.1.2 Genetic Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 7.1.3 Monte Carlo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 7.1.4 Simplex . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
7.2 Code Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 7.3 Standard rDock docking protocol (dock.prm) . . . . . . . . . . . . . . . . . . . . . . . . . 18

8

System deﬁnition ﬁle reference

22

8.1 Receptor definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 8.2 Ligand definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 8.3 Solvent definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 8.4 Cavity mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 8.5 Cavity restraint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 8.6 Pharmacophore restraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 8.7 NMR restraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 8.8 Example system definition files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

9

Molecular ﬁles and atoms typing

30

9.1 Atomic properties. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 9.2 Difference between formal charge and distributed formal charge . . . . . . . . . . . . . . . 30 9.3 Parsing a MOL2 file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 9.4 Parsing an SD file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 9.5 Assigning distributed formal charges to the receptor . . . . . . . . . . . . . . . . . . . . . 31

10 rDock ﬁle formats

32

10.1 .prm file format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 10.2 Water PDB file format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 10.3 Pharmacophore restraints file format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

2

11 rDock programs

35

11.1 Programs reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
11.1.1 rbdock . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 11.1.2 rbcavity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 11.1.3 rbcalcgrid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 11.1.4 make grid.csh . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 11.1.5 rbmoegrid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 11.1.6 sdrmsd . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38 11.1.7 sdtether . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38 11.1.8 sdﬁlter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 11.1.9 sdreport . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 11.1.10sdsplit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 11.1.11sdsort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 11.1.12sdmodify . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 11.1.13rbhtﬁnder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 11.1.14rblist . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

12 Common Use cases

42

12.1 Standard docking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
12.1.1 Standard docking workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
12.2 Tethered scaffold docking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
12.2.1 Example ligand definition for tethered scaffold . . . . . . . . . . . . . . . . . . . . 43
12.3 Docking with pharmacophore restraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 12.4 Docking with explicit waters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

13 Appendix

45

3

1 Preface

It is intended to develop this document into a full reference guide for the rDock platform, describing the software tools, parameter files, scoring functions, and search engines. The reader is encouraged to crossreference the descriptions with the corresponding source code files to discover the finer implementation details.

2 Acknowledgements

Third-party source code. Two third-party C++ libraries are included within the rDock source code, to provide support for speciﬁc numerical calculations. The source code for each library can be distributed freely without licensing restrictions and we are grateful to the respective authors for their contributions.

• Nelder-Meads Simplex search, from Prof. Virginia Torczon’s group, College of William and Mary,
Department of Computer Science, VA. (http://www.cs.wm.edu/va/software/)

• Template Numerical Toolkit, from Roldan Pozo, Mathematical and Computational Sciences Division, National Institute of Standards and Technology (http://math.nist.gov/tnt/)

3 Introduction

The rDock platform is a suite of command-line tools for high-throughput docking and virtual screening. The programs and methods were developed and validated from 1998 to 2002 at RiboTargets (more recently, Vernalis) for proprietary use. The original program (RiboDock) was designed for high-throughput virtual screening of large ligand libraries against RNA targets, in particular the bacterial ribosome. Since 2002 the programs have been substantially rewritten and validated for docking of drug-like ligands to protein and RNA targets. A variety of experimental restraints can be incorporated into the docking calculation, in support of an integrated Structure-Based Drug Design process. In 2006, the software was licensed to the University of York for maintenance and distribution and, in 2012, Vernalis and the University of York agreed to release the program as Open Source software. rDock is licensed under GNU-LGPL version 3.0 with support from the University of Barcelona - rdock.sourceforge.net.

4 Conﬁguration

Before launching rDock, make sure the following environment variables are deﬁned. Precise details are likely to be site-speciﬁc.

• RBT ROOT environment variable: should be deﬁned to point to the rDock installation directory.

• RBT HOME environment variable: is optional, but can be defined to to point to a user project directory containing rDock input files and customised data files.

• PATH environment variable: $RBT ROOT/bin should be added to the $PATH environment variable.

• LD LIBRARY PATH. $RBT ROOT/lib should be added to the $LD LIBRARY PATH environment variable.

Input file locations. The search path for the majority of input files for rDock is:
• Current working directory • $RBT HOME, if defined • The appropriate subdirectory of $RBT ROOT/data/. For example, the default location for scoring function files is $RBT ROOT/data/sf/.

4
The exception is that input ligand SD files are always specified as an absolute path. If you wish to customise a scoring function or docking protocol, it is sufficient to copy the relevant file to the current working directory or to $RBT HOME, and to modify the copied file.

Launching rDock. For small scale experimentation, the rDock executables can be launched directly from the command line. However, serious virtual screening campaigns will likely need access to a compute farm. In common with other docking tools, rDock uses the embarrassingly parallel approach to distributed computing. Large ligand libraries are split into smaller chunks, each of which is docked independently on a single machine. Docking jobs are controlled by a distributed resource manager (DRM) such as Condor or SGE.

5

5 Cavity mapping

Virtual screening is very rarely conducted against entire macromolecules. The usual practice is to dock small molecules in a much more confined region of interest. rDock makes a clear distinction between the region the ligand is allowed to explore (known here as the docking site), and the receptor atoms that need to be included in order to calculate the score correctly. The former is controlled by the cavity mapping algorithm, whilst the latter is scoring function dependent as it depends on the distance range of each component term (for example, vdW range >> polar range). For this reason, it is usual practice with rDock to prepare intact receptor files (rather than truncated spheres around the region of interest), and to allow each scoring function term to isolate the relevant receptor atoms within range of the docking site. rDock provides two methods for defining the docking site:

• Two sphere method • Reference ligand method

Note All the keywords found in capital letters in following cavity mapping methods explanation (e.g. RADIUS), make reference to the parameters defined in prm rDock configuration file. For more information, go to section 8.4 - Cavity mapping on page 25.

5.1 Two sphere method

The two sphere method aims to find cavities that are accessible to a small sphere (of typical atomic or solvent radius) but are inaccessible to a larger sphere. The larger sphere probe will eliminate flat and convex regions of the receptor surface, and also shallow cavities. The regions that remain and are accessible to the small sphere are likely to be the nice well defined cavities of interest for drug design purposes.
1. A grid is placed over the cavity mapping region, encompassing a sphere of radius=RADIUS, center=CENTER. Cavity mapping is restricted to this sphere. All cavities located will be wholly within this sphere. Any cavity that would otherwise protrude beyond the cavity mapping sphere will be truncated at the periphery of the sphere.

6
2. Grid points within the volume occupied by the receptor are excluded (coloured red). The radii of the receptor atoms are increased temporarily by VOL INCR in this step.

3. Probes of radii LARGE SPHERE are placed on each remaining unallocated grid point and checked for clashes with receptor excluded volume. To eliminate edge eﬀects, the grid is extended beyond the cavity mapping region by the diameter of the large sphere (for this step only). This allows the large probe to be placed on grid points outside of the cavity mapping region, yet partially protrude into the cavity mapping region.

4. All grid points within the cavity mapping region that are accessible to the large probe are excluded
(coloured green).

7
5. Probes of radii SMALL SPHERE are placed on each remaining grid point and checked for clashes with receptor excluded volume (red) or large probe excluded volume (green)

6. All grid points that are accessible to the small probe are selected (yellow). 7. The final selection of cavity grid points is divided into distinct cavities (contiguous regions).
In this example only one distinct cavity is found. User-defined filters of MIN VOLUME and MAX CAVITIES are applied at this stage to select a subset of cavities if required. Note that the filters will accept or reject intact cavities only.

5.2 Reference ligand method

The reference ligand method provides a much easier option to deﬁne a docking volume of a given size around the binding mode of a known ligand, and is particularly appropriate for large scale automated validation experiments.

8
1. Reference coordinates are read from REF MOL. A grid is placed over the cavity mapping region, encompassing overlapping spheres of radius=RADIUS, centered on each atom in REF MOL. Grid points outside of the overlapping spheres are excluded (coloured green). Grid points within the volume occupied by the receptor are excluded (coloured red). The vdW radii of the receptor atoms are increased by VOL INCR in this step.

2. Probes of radii SMALL SPHERE are placed on each remaining grid point and checked for clashes with red or green regions.

3. All grid points that are accessible to the small probe are selected (yellow).

9
4. The final selection of cavity grid points is divided into distinct cavities (contiguous regions).
In this example only one distinct cavity is found. User-defined filters of MIN VOLUME and MAX CAVITIES are applied at this stage to select a subset of cavities if required. Note that the filters will accept or reject intact cavities only.

10

6 Scoring function reference

6.1 Component Scoring Functions

The rDock master scoring function (S_total) is a weighted sum of intermolecular (S_inter), ligand intramolecular (S_intra), site intramolecular (S_site), and external restraint terms (S_restraint) (Equation 1). S_interis the main term of interest as it represents the protein-ligand (or RNA-ligand) interaction score (Equation 2). S_intrarepresents the relative energy of the ligand conformation (Equation 3). Similarly, S_siterepresents the relative energy of the ﬂexible regions of the active site (Equation 4). In the current implementation, the only ﬂexible bonds in the active site are terminal OH and NH3+ bonds. S_restraintis a collection of non-physical restraint functions that can be used to bias the docking calculation in several useful ways (Equation 5).

S^total= S^inter+ S^intra+ S^site+ S^restraint

(1)

inter vdw inter vdw

inter

polar inter repul inter repul inter arom inter arom

S^inter

=+===

W

· S

+ W

polar

· S

+ W

· S

+ W

· S

+ W_solv· S_solv

+

W_rot· N_rot+ W_const

(2) (3) (4) (5)

intra vdw site

intra

repul

intra

dihedral intra dihedral

S^intraS^site
S^restraint

WW

· S

+ W

· S

+ W

site

+ W

· S

+ W

site

+ W

· S

dihedral vdw site vdw polar site

polar

repul

site

· S

+ W

· S

vdw

polar

repul

dihedral

W_cavity· S_cavity+ W_tether· S_tether+ W_nmr· S_nmr+ W_ph4· S_ph4

S_inter, S_intra, and S_siteare built from a common set of constituent potentials, which are described below. The main changes to the original RiboDock scoring function [ref] are:

i the replacement of the crude steric potentials (S_lipoand S_rep) with a true van der Waals potential,

S_vdW

ii the introduction of two generalised terms for all short range attractive (S_polar) and repulsive (S_repulpolar interactions
)iii the implementation of a fast weighed solvent accessible surface area (WSAS) solvation term iv the addition of explicit dihedral potentials

6.1.1 van der Waals potential

We have replaced the S_lipoand S_repempirical potentials used in RiboDock with a true vdW potential similar to that used by GOLD [ref]. Atom types and vdW radii were taken from the Tripos 5.2 force ﬁeld and are listed in the Appendix Section (page 45, table 13.1). Energy well depths are switchable between the original Tripos 5.2 values and those used by GOLD, which are calculated from the atomic polarisability and ionisation potentials of the atom types involved. Additional atom types were created for carbons with implicit hydrogens, as the Tripos force ﬁeld uses an all-atom representation. vdW radii
˚for implicit hydrogen types are increased by 0.1 A for each implicit hydrogen, with well depths unchanged.

The functional form is switchable between a softer 4-8 and a harder 6-12 potential. A quadratic potential is used at close range to prevent excessive energy penalties for atomic clashes. The potential is truncated at longer range (1.5 · r_min, the sum of the vdW radii).
The quadratic potential is used at repulsive energies between e_cutoffand e₀, where e_cutoffis defined as a multiple of the energy well depth (e_cutoff= ECUT * e_min), and e₀is the energy at zero separation, defined as a multiple of e_cutoff(e₀= E0 * e_cutoff). ECUT can vary between 1 and 120 during the docking search (see below)[ref to ga section], whereas E0 is fixed at 1.5.

6.1.2 Empirical attractive and repulsive polar potentials

We continue to use an empirical Bohm-like potential to score hydrogen-bonding and other short-range polar interactions. The original RiboDock polar terms (S_H−bond, S_posC−acc, S_acc−acc, S_don−don) are generalised and condensed into two scoring functions, S_polarand S_repul(Equations 6 and 7, also taking into account equations 8 to 13), which deal with attractive and repulsive interactions respectively. Six types of polar interaction centres are considered: hydrogen bond donors (DON), metal ions (M+), positively charged carbons (C+, as found at the centre of guanidinium, amidinium and imidazole groups), hydrogen

11 bond acceptors with pronounced lone pair directionality (ACC LP), acceptors with in-plane preference but limited lone-pair directionality (ACC PLANE), and all remaining acceptors (ACC). The ACC LP type is used for carboxylate oxygens and O_sp2atoms in RNA bases, with ACC PLANE used for other Osp2 acceptors. This distinction between acceptor types was not made in RiboDock, in which all acceptors were implicitly of type ACC.