rDock
Reference Guide
rDock Development Team
August 27, 2015
Contents
1 Preface
2 Acknowledgements
3 Introduction
4 Configuration
5 Cavity mapping
    5.1 Two sphere method
    5.2 Reference ligand method
6 Scoring function reference
    6.1 Component Scoring Functions
        6.1.1 van der Waals potential
        6.1.2 Empirical attractive and repulsive polar potentials
        6.1.3 Solvation potential
        6.1.4 Dihedral potential
    6.2 Intermolecular scoring functions under evaluation
        6.2.1 Training sets
        6.2.2 Scoring Functions Design
        6.2.3 Scoring Functions Validation
    6.3 Code Implementation
7 Docking protocol
    7.1 Protocol Summary
        7.1.1 Pose Generation
        7.1.2 Genetic Algorithm
        7.1.3 Monte Carlo
        7.1.4 Simplex
    7.2 Code Implementation
    7.3 Standard rDock docking protocol (dock.prm)
8 System definition file reference
    8.1 Receptor definition
    8.2 Ligand definition
    8.3 Solvent definition
    8.4 Cavity mapping
    8.5 Cavity restraint
    8.6 Pharmacophore restraints
    8.7 NMR restraints
    8.8 Example system definition files
9 Molecular files and atoms typing
    9.1 Atomic properties
    9.2 Difference between formal charge and distributed formal charge
    9.3 Parsing a MOL2 file
    9.4 Parsing an SD file
    9.5 Assigning distributed formal charges to the receptor
10 rDock file formats
    10.1 .prm file format
    10.2 Water PDB file format
    10.3 Pharmacophore restraints file format
11 rDock programs
    11.1 Programs reference
        11.1.1 rbdock
        11.1.2 rbcavity
        11.1.3 rbcalcgrid
        11.1.4 make_grid.csh
        11.1.5 rbmoegrid
        11.1.6 sdrmsd
        11.1.7 sdtether
        11.1.8 sdfilter
        11.1.9 sdreport
        11.1.10 sdsplit
        11.1.11 sdsort
        11.1.12 sdmodify
        11.1.13 rbhtfinder
        11.1.14 rblist
12 Common Use cases
    12.1 Standard docking
        12.1.1 Standard docking workflow
    12.2 Tethered scaffold docking
        12.2.1 Example ligand definition for tethered scaffold
    12.3 Docking with pharmacophore restraints
    12.4 Docking with explicit waters
13 Appendix
1 Preface
It is intended to develop this document into a full reference guide for the rDock platform, describing the software tools, parameter files, scoring functions, and search engines. The reader is encouraged to cross-reference the descriptions with the corresponding source code files to discover the finer implementation details.
2 Acknowledgements
Third-party source code. Two third-party C++ libraries are included within the rDock source code, to provide support for specific numerical calculations. The source code for each library can be distributed freely without licensing restrictions and we are grateful to the respective authors for their contributions.
• Nelder-Mead Simplex search, from Prof. Virginia Torczon’s group, College of William and Mary, Department of Computer Science, VA. (http://www.cs.wm.edu/va/software/)
• Template Numerical Toolkit, from Roldan Pozo, Mathematical and Computational Sciences Division, National Institute of Standards and Technology (http://math.nist.gov/tnt/)
3 Introduction
The rDock platform is a suite of command-line tools for high-throughput docking and virtual screening. The programs and methods were developed and validated from 1998 to 2002 at RiboTargets (more recently, Vernalis) for proprietary use. The original program (RiboDock) was designed for high-throughput virtual screening of large ligand libraries against RNA targets, in particular the bacterial ribosome. Since 2002 the programs have been substantially rewritten and validated for docking of drug-like ligands to protein and RNA targets. A variety of experimental restraints can be incorporated into the docking calculation, in support of an integrated Structure-Based Drug Design process. In 2006, the software was licensed to the University of York for maintenance and distribution and, in 2012, Vernalis and the University of York agreed to release the program as Open Source software. rDock is licensed under GNU-LGPL version 3.0 with support from the University of Barcelona - rdock.sourceforge.net.
4 Configuration
Before launching rDock, make sure the following environment variables are defined. Precise details are likely to be site-specific.
• RBT_ROOT environment variable: should be defined to point to the rDock installation directory.
• RBT_HOME environment variable: optional, but can be defined to point to a user project directory containing rDock input files and customised data files.
• PATH environment variable: $RBT_ROOT/bin should be added to the $PATH environment variable.
• LD_LIBRARY_PATH environment variable: $RBT_ROOT/lib should be added to the $LD_LIBRARY_PATH environment variable.
Input file locations. The search path for the majority of input files for rDock is:
• Current working directory
• $RBT_HOME, if defined
• The appropriate subdirectory of $RBT_ROOT/data/. For example, the default location for scoring function files is $RBT_ROOT/data/sf/.
The exception is that input ligand SD files are always specified as an absolute path. If you wish to customise a scoring function or docking protocol, it is sufficient to copy the relevant file to the current working directory or to $RBT_HOME, and to modify the copied file.
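The search order described above can be sketched in a few lines of Python. This is only an illustration of the lookup order, not rDock's own code: the function name find_input_file, its arguments, and the file name used in the usage note are hypothetical.

```python
import os
from pathlib import Path


def find_input_file(name, data_subdir, env=None, cwd=None):
    """Return the first existing match for `name` along the search path.

    Order follows the text: current working directory, then $RBT_HOME
    (if defined), then the given subdirectory of $RBT_ROOT/data/.
    Returns None when the file is found in none of these locations.
    """
    env = os.environ if env is None else env
    candidates = [Path(cwd) if cwd is not None else Path.cwd()]
    if env.get("RBT_HOME"):
        candidates.append(Path(env["RBT_HOME"]))
    if env.get("RBT_ROOT"):
        candidates.append(Path(env["RBT_ROOT"]) / "data" / data_subdir)
    for directory in candidates:
        path = directory / name
        if path.is_file():
            return path
    return None
```

For example, find_input_file("custom.prm", "sf") would return a copy in the current working directory in preference to one in $RBT_HOME, which in turn would shadow one in $RBT_ROOT/data/sf/. This is why copying a file and editing the copy is enough to customise it.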
Launching rDock. For small scale experimentation, the rDock executables can be launched directly from the command line. However, serious virtual screening campaigns will likely need access to a compute farm. In common with other docking tools, rDock uses the embarrassingly parallel approach to distributed computing. Large ligand libraries are split into smaller chunks, each of which is docked independently on a single machine. Docking jobs are controlled by a distributed resource manager (DRM) such as Condor or SGE.
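As a rough illustration of the chunking step, the sketch below splits the text of an SD file into groups of records, relying on the fact that each SD record ends with a line containing only "$$$$". The function name split_sd_records and the chunk_size argument are hypothetical; rDock provides its own sdsplit program for this purpose (section 11.1.10), and this sketch makes no claim about how that program works.

```python
def split_sd_records(text, chunk_size):
    """Split SD-file text into chunks of at most `chunk_size` records.

    A record is everything up to and including a line that contains
    only "$$$$". Returns a list of strings, one per chunk.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    records, current = [], []
    for line in text.splitlines(keepends=True):
        current.append(line)
        if line.strip() == "$$$$":
            records.append("".join(current))
            current = []
    # Keep a trailing record even if its terminator is missing.
    if any(line.strip() for line in current):
        records.append("".join(current))
    return ["".join(records[i:i + chunk_size])
            for i in range(0, len(records), chunk_size)]
```

Each returned chunk can be written to its own file and submitted as an independent docking job, since no record depends on any other.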
5 Cavity mapping
Virtual screening is very rarely conducted against entire macromolecules. The usual practice is to dock small molecules in a much more confined region of interest. rDock makes a clear distinction between the region the ligand is allowed to explore (known here as the docking site), and the receptor atoms that need to be included in order to calculate the score correctly. The former is controlled by the cavity mapping algorithm, whilst the latter is scoring function dependent as it depends on the distance range of each component term (for example, vdW range >> polar range). For this reason, it is usual practice with rDock to prepare intact receptor files (rather than truncated spheres around the region of interest), and to allow each scoring function term to isolate the relevant receptor atoms within range of the docking site. rDock provides two methods for defining the docking site:
• Two sphere method
• Reference ligand method
Note: All keywords written in capital letters in the following descriptions of the cavity mapping methods (e.g. RADIUS) refer to parameters defined in the rDock .prm configuration file. For more information, see section 8.4 - Cavity mapping on page 25.
5.1 Two sphere method
The two sphere method aims to find cavities that are accessible to a small sphere (of typical atomic or solvent radius) but are inaccessible to a larger sphere. The larger sphere probe will eliminate flat and convex regions of the receptor surface, and also shallow cavities. The regions that remain and are accessible to the small sphere are likely to be the well-defined cavities of interest for drug design purposes.
1. A grid is placed over the cavity mapping region, encompassing a sphere of radius=RADIUS, center=CENTER. Cavity mapping is restricted to this sphere. All cavities located will be wholly within this sphere. Any cavity that would otherwise protrude beyond the cavity mapping sphere will be truncated at the periphery of the sphere.
2. Grid points within the volume occupied by the receptor are excluded (coloured red). The radii of the receptor atoms are increased temporarily by VOL_INCR in this step.
3. Probes of radius LARGE_SPHERE are placed on each remaining unallocated grid point and checked for clashes with receptor excluded volume. To eliminate edge effects, the grid is extended beyond the cavity mapping region by the diameter of the large sphere (for this step only). This allows the large probe to be placed on grid points outside of the cavity mapping region, yet partially protrude into the cavity mapping region.
4. All grid points within the cavity mapping region that are accessible to the large probe are excluded (coloured green).
5. Probes of radius SMALL_SPHERE are placed on each remaining grid point and checked for clashes with receptor excluded volume (red) or large probe excluded volume (green).
6. All grid points that are accessible to the small probe are selected (yellow).
7. The final selection of cavity grid points is divided into distinct cavities (contiguous regions).
In this example only one distinct cavity is found. User-defined filters of MIN_VOLUME and MAX_CAVITIES are applied at this stage to select a subset of cavities if required. Note that the filters will accept or reject intact cavities only.
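The seven steps of the two sphere method can be illustrated with a brute-force toy implementation in pure Python. It is a sketch under several assumptions that are not taken from the rDock source: a probe is taken to clash when any excluded grid point lies within its radius, contiguity is taken as face-sharing (6-connected) neighbours, cavity volume is taken as the number of grid points times the cell volume, and MAX_CAVITIES is assumed to keep the largest cavities. The function name and lower-case argument names are hypothetical stand-ins for the capitalised parameters.

```python
import itertools
import math


def two_sphere_map(atoms, center, radius, small, large,
                   vol_incr=0.0, step=1.0, min_volume=0.0, max_cavities=None):
    """Toy two sphere mapping. `atoms` is a list of ((x, y, z), vdw_radius).

    Returns a list of cavities, largest first, each a set of integer grid
    indices (i, j, k) relative to `center`.
    """
    # Step 1: grid over the mapping sphere, extended by the large-sphere
    # diameter so that large probes can sit outside the mapping region.
    n = math.ceil((radius + 2 * large) / step)
    idx = range(-n, n + 1)
    pos = {g: tuple(c + i * step for c, i in zip(center, g))
           for g in itertools.product(idx, idx, idx)}
    inside = {g for g, p in pos.items() if math.dist(p, center) <= radius}

    # Step 2: points inside the inflated receptor atoms are excluded ("red").
    red = {g for g, p in pos.items()
           if any(math.dist(p, a) <= r + vol_incr for a, r in atoms)}

    def swept(centres, blocked, probe):
        """Points covered by a probe placed on any non-clashing centre."""
        k = math.ceil(probe / step)
        offs = [o for o in itertools.product(range(-k, k + 1), repeat=3)
                if math.hypot(*o) * step <= probe]
        covered = set()
        for g in centres:
            ball = {(g[0] + o[0], g[1] + o[1], g[2] + o[2]) for o in offs}
            if not ball & blocked:          # assumed clash test
                covered |= ball
        return covered

    # Steps 3-4: points reachable by the large probe are excluded ("green").
    green = swept(set(pos) - red, red, large) & inside
    # Steps 5-6: points reachable by the small probe are selected ("yellow").
    free = inside - red - green
    yellow = swept(free, red | green, small) & free

    # Step 7: split the selection into contiguous (6-connected) cavities.
    steps6 = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))
    cavities, todo = [], set(yellow)
    while todo:
        stack, cavity = [todo.pop()], set()
        while stack:
            g = stack.pop()
            cavity.add(g)
            for d in steps6:
                nb = (g[0] + d[0], g[1] + d[1], g[2] + d[2])
                if nb in todo:
                    todo.remove(nb)
                    stack.append(nb)
        cavities.append(cavity)

    # Filters act on intact cavities only: each is kept or dropped whole.
    cavities = [c for c in cavities if len(c) * step ** 3 >= min_volume]
    cavities.sort(key=len, reverse=True)
    return cavities[:max_cavities]
```

The sketch scales poorly with grid size and is meant only to make the role of each coloured region concrete. With no receptor atoms at all, every point is reachable by the large probe and no cavity is returned, which matches the intent that open, unenclosed space is not a docking site.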
5.2 Reference ligand method
The reference ligand method provides a much easier option to define a docking volume of a given size around the binding mode of a known ligand, and is particularly appropriate for large scale automated validation experiments.
1. Reference coordinates are read from REF_MOL. A grid is placed over the cavity mapping region, encompassing overlapping spheres of radius=RADIUS, centered on each atom in REF_MOL. Grid points outside of the overlapping spheres are excluded (coloured green). Grid points within the volume occupied by the receptor are excluded (coloured red). The vdW radii of the receptor atoms are increased by VOL_INCR in this step.
2. Probes of radius SMALL_SPHERE are placed on each remaining grid point and checked for clashes with red or green regions.
3. All grid points that are accessible to the small probe are selected (yellow).
4. The final selection of cavity grid points is divided into distinct cavities (contiguous regions).
In this example only one distinct cavity is found. User-defined filters of MIN_VOLUME and MAX_CAVITIES are applied at this stage to select a subset of cavities if required. Note that the filters will accept or reject intact cavities only.
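Steps 1 to 3 of the reference ligand method can be sketched in the same toy style. The assumptions are the same as for any grid-based illustration and are not taken from the rDock source: the grid is aligned to the coordinate origin, a probe clashes when any red or green grid point lies within its radius, and the final division into contiguous cavities (step 4) and the volume filters are left out. The function and argument names are hypothetical.

```python
import itertools
import math


def reference_ligand_map(ref_atoms, receptor, radius, small,
                         vol_incr=0.0, step=1.0):
    """Toy reference ligand mapping.

    `ref_atoms` is a list of (x, y, z); `receptor` is a list of
    ((x, y, z), vdw_radius). Returns the selected ("yellow") grid points
    as integer indices (i, j, k), where a point sits at (i, j, k) * step.
    """
    # Pad the bounding box by radius + small so that every probe placed
    # inside the spheres lies wholly within the grid.
    pad = radius + small
    lo = [math.floor((min(a[d] for a in ref_atoms) - pad) / step) for d in range(3)]
    hi = [math.ceil((max(a[d] for a in ref_atoms) + pad) / step) for d in range(3)]
    grid = set(itertools.product(*(range(l, h + 1) for l, h in zip(lo, hi))))

    def xyz(g):
        return tuple(i * step for i in g)

    # Step 1: outside the overlapping spheres is "green"; inside the
    # inflated receptor atoms is "red".
    green = {g for g in grid
             if all(math.dist(xyz(g), a) > radius for a in ref_atoms)}
    red = {g for g in grid
           if any(math.dist(xyz(g), a) <= r + vol_incr for a, r in receptor)}
    free = grid - green - red

    # Steps 2-3: keep points covered by a small probe that touches
    # neither red nor green points (assumed clash test).
    k = math.ceil(small / step)
    offs = [o for o in itertools.product(range(-k, k + 1), repeat=3)
            if math.hypot(*o) * step <= small]
    blocked = red | green
    selected = set()
    for g in free:
        ball = {(g[0] + o[0], g[1] + o[1], g[2] + o[2]) for o in offs}
        if not ball & blocked:
            selected |= ball
    return selected & free
```

Compared with the two sphere method there is no large probe: the overlapping spheres around the reference ligand atoms play the role of the outer boundary, which is why only RADIUS, SMALL_SPHERE, and VOL_INCR appear in the steps.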
6 Scoring function reference
6.1 Component Scoring Functions
The rDock master scoring function (Stotal) is a weighted sum of intermolecular (Sinter), ligand intramolecular (Sintra), site intramolecular (Ssite), and external restraint terms (Srestraint) (Equation 1). Sinter is the main term of interest as it represents the protein-ligand (or RNA-ligand) interaction score (Equation 2). Sintra represents the relative energy of the ligand conformation (Equation 3). Similarly, Ssite represents the relative energy of the flexible regions of the active site (Equation 4). In the current implementation, the only flexible bonds in the active site are terminal OH and NH3+ bonds. Srestraint is a collection of non-physical restraint functions that can be used to bias the docking calculation in several useful ways (Equation 5).
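\begin{align}
S^{\mathrm{total}} &= S^{\mathrm{inter}} + S^{\mathrm{intra}} + S^{\mathrm{site}} + S^{\mathrm{restraint}} \tag{1} \\
S^{\mathrm{inter}} &= W^{\mathrm{inter}}_{\mathrm{vdW}} \cdot S^{\mathrm{inter}}_{\mathrm{vdW}} + W^{\mathrm{inter}}_{\mathrm{polar}} \cdot S^{\mathrm{inter}}_{\mathrm{polar}} + W^{\mathrm{inter}}_{\mathrm{repul}} \cdot S^{\mathrm{inter}}_{\mathrm{repul}} + W^{\mathrm{inter}}_{\mathrm{arom}} \cdot S^{\mathrm{inter}}_{\mathrm{arom}} + W_{\mathrm{solv}} \cdot S_{\mathrm{solv}} + W_{\mathrm{rot}} \cdot N_{\mathrm{rot}} + W_{\mathrm{const}} \tag{2} \\
S^{\mathrm{intra}} &= W^{\mathrm{intra}}_{\mathrm{vdW}} \cdot S^{\mathrm{intra}}_{\mathrm{vdW}} + W^{\mathrm{intra}}_{\mathrm{polar}} \cdot S^{\mathrm{intra}}_{\mathrm{polar}} + W^{\mathrm{intra}}_{\mathrm{repul}} \cdot S^{\mathrm{intra}}_{\mathrm{repul}} + W^{\mathrm{intra}}_{\mathrm{dihedral}} \cdot S^{\mathrm{intra}}_{\mathrm{dihedral}} \tag{3} \\
S^{\mathrm{site}} &= W^{\mathrm{site}}_{\mathrm{vdW}} \cdot S^{\mathrm{site}}_{\mathrm{vdW}} + W^{\mathrm{site}}_{\mathrm{polar}} \cdot S^{\mathrm{site}}_{\mathrm{polar}} + W^{\mathrm{site}}_{\mathrm{repul}} \cdot S^{\mathrm{site}}_{\mathrm{repul}} + W^{\mathrm{site}}_{\mathrm{dihedral}} \cdot S^{\mathrm{site}}_{\mathrm{dihedral}} \tag{4} \\
S^{\mathrm{restraint}} &= W_{\mathrm{cavity}} \cdot S_{\mathrm{cavity}} + W_{\mathrm{tether}} \cdot S_{\mathrm{tether}} + W_{\mathrm{nmr}} \cdot S_{\mathrm{nmr}} + W_{\mathrm{ph4}} \cdot S_{\mathrm{ph4}} \tag{5}
\end{align}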
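Equations 1 to 5 all share one pattern: each score is a sum of weight times component value, with Sinter adding a constant. A minimal sketch of that pattern follows. The helper name weighted_sum, the dictionary keys, and every numeric weight and component value are hypothetical and chosen purely for illustration; they are not rDock defaults.

```python
def weighted_sum(weights, terms, constant=0.0):
    """Return constant + sum of W_k * S_k over all named components."""
    if set(weights) != set(terms):
        raise KeyError("weights and terms must name the same components")
    return constant + sum(weights[name] * terms[name] for name in weights)


# Hypothetical weights and raw component values, for illustration only.
# "rot" carries the rotatable bond count N_rot rather than a score.
s_inter = weighted_sum(
    {"vdw": 1.0, "polar": 2.0, "repul": 2.0, "arom": 1.0, "solv": 0.5, "rot": 1.0},
    {"vdw": -20.0, "polar": -4.0, "repul": 1.0, "arom": -2.0, "solv": 3.0, "rot": 5.0},
    constant=0.0,  # W_const
)
s_intra = weighted_sum({"vdw": 1.0, "dihedral": 0.5}, {"vdw": -1.0, "dihedral": 4.0})
s_site = 0.0        # no flexible site groups in this toy case
s_restraint = 0.0   # no restraints applied in this toy case

# Equation 1: the master score is the plain sum of the four totals.
s_total = s_inter + s_intra + s_site + s_restraint
```

Keeping weights and raw terms in separate mappings mirrors the way a scoring function can be reweighted without recomputing the underlying component values.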
Sinter, Sintra, and Ssite are built from a common set of constituent potentials, which are described below. The main changes to the original RiboDock scoring function [ref] are:
i the replacement of the crude steric potentials (Slipo and Srep) with a true van der Waals potential, SvdW
ii the introduction of two generalised terms for all short range attractive (Spolar) and repulsive (Srepul) polar interactions
iii the implementation of a fast weighted solvent accessible surface area (WSAS) solvation term
iv the addition of explicit dihedral potentials
6.1.1 van der Waals potential
We have replaced the Slipo and Srep empirical potentials used in RiboDock with a true vdW potential similar to that used by GOLD [ref]. Atom types and vdW radii were taken from the Tripos 5.2 force field and are listed in the Appendix Section (page 45, table 13.1). Energy well depths are switchable between the original Tripos 5.2 values and those used by GOLD, which are calculated from the atomic polarisability and ionisation potentials of the atom types involved. Additional atom types were created for carbons with implicit hydrogens, as the Tripos force field uses an all-atom representation. vdW radii for implicit hydrogen types are increased by 0.1 Å for each implicit hydrogen, with well depths unchanged.
The functional form is switchable between a softer 4-8 and a harder 6-12 potential. A quadratic potential is used at close range to prevent excessive energy penalties for atomic clashes. The potential is truncated at longer range, at a separation of 1.5 · rmin, where rmin is the sum of the vdW radii.
The quadratic potential is used at repulsive energies between ecutoff and e0, where ecutoff is defined as a multiple of the energy well depth (ecutoff = ECUT * emin), and e0 is the energy at zero separation, defined as a multiple of ecutoff (e0 = E0 * ecutoff ). ECUT can vary between 1 and 120 during the docking search (see below)[ref to ga section], whereas E0 is fixed at 1.5.
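The piecewise shape described above can be sketched as follows. The text does not give the exact expressions, so this sketch assumes a standard n-2n form, emin · (x² − 2x) with x = (rmin/r)^n and n = 6 or 4, which has its minimum of −emin at r = rmin, and a quadratic of the form e0 − (e0 − ecutoff) · (r/rcut)² below the separation rcut at which the repulsive wall reaches ecutoff. The function name, argument names, and default values are hypothetical, and the rDock source may differ in detail.

```python
import math


def vdw_energy(r, rmin, emin, ecut=120.0, e0_mult=1.5, hard=True, range_mult=1.5):
    """Pairwise vdW score at separation r (illustrative form only).

    rmin: sum of the vdW radii; emin: well depth (positive);
    ecut, e0_mult: the ECUT and E0 multipliers; hard selects 6-12 over 4-8.
    """
    if r >= range_mult * rmin:
        return 0.0                          # long-range truncation
    n = 6 if hard else 4                    # 6-12 or 4-8 functional form
    e_cutoff = ecut * emin                  # ecutoff = ECUT * emin
    e_zero = e0_mult * e_cutoff             # e0 = E0 * ecutoff
    # Solve emin * (x**2 - 2*x) = e_cutoff for x = (rmin / r)**n
    # on the repulsive branch (x > 1), then convert back to a distance.
    x_cut = 1.0 + math.sqrt(1.0 + ecut)
    r_cut = rmin * x_cut ** (-1.0 / n)
    if r < r_cut:
        # Quadratic region: e_zero at r = 0, e_cutoff at r = r_cut.
        return e_zero - (e_zero - e_cutoff) * (r / r_cut) ** 2
    x = (rmin / r) ** n
    return emin * (x * x - 2.0 * x)
```

With these assumptions the score is continuous where the quadratic meets the n-2n curve, rises no higher than e0 even for fully overlapping atoms, and is exactly zero beyond 1.5 · rmin. Lowering ECUT lowers both ecutoff and e0, giving a softer wall.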
6.1.2 Empirical attractive and repulsive polar potentials
We continue to use an empirical Bohm-like potential to score hydrogen-bonding and other short-range polar interactions. The original RiboDock polar terms (SH−bond, SposC−acc, Sacc−acc, Sdon−don) are generalised and condensed into two scoring functions, Spolar and Srepul (Equations 6 and 7, also taking into account equations 8 to 13), which deal with attractive and repulsive interactions respectively. Six types of polar interaction centres are considered: hydrogen bond donors (DON), metal ions (M+), positively charged carbons (C+, as found at the centre of guanidinium, amidinium and imidazole groups), hydrogen bond acceptors with pronounced lone pair directionality (ACC_LP), acceptors with in-plane preference but limited lone-pair directionality (ACC_PLANE), and all remaining acceptors (ACC). The ACC_LP type is used for carboxylate oxygens and Osp2 atoms in RNA bases, with ACC_PLANE used for other Osp2 acceptors. This distinction between acceptor types was not made in RiboDock, in which all acceptors were implicitly of type ACC.