Ligand Binding Site Superposition and Comparison Based on Atomic
Total Page:16
File Type:pdf, Size:1020Kb
Totrov BMC Bioinformatics 2011, 12(Suppl 1):S35 http://www.biomedcentral.com/1471-2105/12/S1/S35 RESEARCH Open Access Ligand binding site superposition and comparison based on Atomic Property Fields: identification of distant homologues, convergent evolution and PDB-wide clustering of binding sites Maxim Totrov From The Ninth Asia Pacific Bioinformatics Conference (APBC 2011) Inchon, Korea. 11-14 January 2011 Abstract A new binding site comparison algorithm using optimal superposition of the continuous pharmacophoric property distributions is reported. The method demonstrates high sensitivity in discovering both, distantly homologous and convergent binding sites. Good quality of superposition is also observed on multiple examples. Using the new approach, a measure of site similarity is derived and applied to clustering of ligand binding pockets in PDB. Background are constructed from these points and used for target site Experimental structural biology efforts are uncovering matching. PocketMatch [3] is an algorithm for compari- protein structures at unprecedented rate. There is a son of binding sites in a frame-invariant manner, based need to understand relationships and discover similari- on representation of the sites by sorted lists of distances ties between the solved structures. While fold compari- capturing shape and chemical nature of the site. Lists are sons are routinely performed to identify homologies that compared using a special alignment algorithm and are at or beyond the limit of the sequence comparison PMScore function. IsoCleft [4] detects 3D atomic simila- methods, some functional relationships can only be rities between binding sties using a graph-matching detected at the level of binding sites. Ultimately, it is the method. Protein functional surfaces [5] methodology configuration of these sites rather than overall sequence attempts to optimize global shape and local physico- or fold, that determine enzymatic or signal transduction chemical ‘texture’ match between a pair of surfaces using activity of a protein. object recognition techniques. Often, search algorithm is Most existing methods for binding site comparison are combined with a specially compiled database of binding based on some form of coarse-grain representation of the sites, for example CPASS database comprises ligand- geometry and properties of the pocket as a set of points defined binding sites found in the protein data bank or centers. Using a variety of algorithms, correspondence (PDB) and CPASS algorithm compares these ligand between the two sets is established. FLAP [1] algorithm defined sites to determine similarity without maintaining first generates GRID [2] molecular interaction fields, sequence connectivity [6]. Similarly, SURFACE is a data- which are used to detect locations where interactions of base of protein surface regions, with finctional surface chemical groups with particular pharmacophoric features patches defined by sets of residues, and searches per- would be most favorable. Four-point pharmacophores formed by matching the residue sets [7]. CavBase is a dataset of cavities extracted from PDB and searcheable Correspondence: [email protected] using an algorithm that matches pseudocenters Molsoft LLC,3366 N Torrey Pines Ct, La Jolla, CA 92037, USA © 2011 Totrov; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Totrov BMC Bioinformatics 2011, 12(Suppl 1):S35 Page 2 of 9 http://www.biomedcentral.com/1471-2105/12/S1/S35 analogous to pharmacophoric points [8]. The Superim- pocket similarities, ‘fuzzy’ matching may be needed posé webserver [9] implements several superposition and because some of the discrete features may disappear, comparison methods in an on-line format and allows appear or change. These issues can be partially over- detection of similarities between binding sites or entire come by increasing the number of representative proteins. A searchable database for comparing protein- points and allowing partial matches. ligand binding sites for the analysis of structure-function Ultimately, it might be a better solution to use continu- relationships has been reported [10], including compari- ous representation instead of discreet points. In the son method based on geometric hashing, which identifies related field of ligand or small molecule superposition, a maximum common sub-graph of atomic features [11]. method using continuous pharmacophoric Atomic Prop- Med-SuMo rapidly compares protein surfaces repre- erty Fields (APF) has been recently proposed [18]. The sented by triplets of chemical groups [12]. Standard 3-, 4- method represents 7 atomic properties (hydrogen bond and 5-point pharmacophores extracted from binding donor, hydrogen bond acceptor, lipophilicity, size, elec- pockets identified by icmPocketFinder [13] across human tronegativity, charge, aromaticity/hybridization) as con- PDB protein structures were used create a virtual library tinuous potentials projected in 3D space from atom of sites in human pocketome, and querying the library centers using Gaussian functions (Fig. 1). Each atom type with a pharmacophore of methyl-lysine binding site, is characterized by a distinct 7-component vector of interesting non-trivial hits were retrieved [14]. Of note, properties, and a pseudo-energy reflecting similarity of another perspective on the pocket comparison problem, 3D property distributions can be calculated for two or which is to detect principal differences between related more molecules. By optimizing the APF pseudo-energy, sites, was taken by several groups [15-17]. optimal superpositions of multiple ligands that bring Discretized representation of the continuous pocket together identical or similar atoms and separating dissim- surface by amino-acid residues, chemical groups, ilar ones can be identified. The approach was successfully pharmacophoric points or similar descriptors, allows tested in pairwise flexible ligand superposition and multi- very rapid comparison but may not be always ade- ple chemical alignment. In a recent independent study, quate to capture distant similarities. Pharmacophoric APF performance was compared to other small molecule points are well-suited to represent highly localized superposition techniques and the method demonstrated interaction centers, such as hydrogen bond donors best superposition accuracy across a large benchmark and acceptors. Hydrophobic interactions and shape [19]. Promising results were also obtained in the applica- complementarity on the other hand are continuously tion of APF potential to ligand-based virtual screening distributed properties that lend themselves poorly to and 3D QSAR [18]. An optimized measure of chemical point representation. Moreover, to detect distant similarity based on APF was derived [20]. Figure 1 Seven components of the APF potential. Reproduced from ref. [18]. Totrov BMC Bioinformatics 2011, 12(Suppl 1):S35 Page 3 of 9 http://www.biomedcentral.com/1471-2105/12/S1/S35 In the present work, the APF approach is adapted to after 10,000 energy evaluations. The entire atomic prop- the problem of binding site/pocket superposition. The erty field binding site superposition (APF BSS) protocol resulting pocket superposition method is tested on mul- is implemented as a script in ICM [24,25]. Schematic tiple distantly similar pocket examples. The method also outline of the protocol is represented on Figure 2. produces a score characterizing the degree of similarity of the pockets. The utility of the APF site superposition Distance matrix calculation and clustering as a site comparison method is evaluated by calculating APF pseudo-energy or score EAPF for the optimal super- a complete distance matrix for the set of over 5000 position reflects the similarity of the atomic property binding sites in scPDB binding site database [21]. distributions of the two binding pockets. It can be used Finally, clustering of this available slice of the pocke- directly for ranking of the database binding sites by tome is performed. their similarity to a query. However, for some other applications such as clustering, it is necessary to derive Methods a similarity measure that behaves distance-like, rather Adaptation of the APF ligand superposition method to then ranking score-like. In particular, for a pair of non- binding site superposition identical sites it has to be a positive value that increases The original APF ligand superposition protocol consists as they become more dissimilar and becomes zero for of (I) generation of grids with 7 APF potential compo- identical pairs. On the other hand, EAPF is always nega- nents from the template (static) ligand and (II) optimi- tive, and the value for identical sites varies depending zation of the target ligand in the grid APF potentials on the size and composition of the site. To convert EAPF combined with internal force-field energy of the ligand. to a normalized dot product-like measure with a correct Monte-Carlo with gradient minimization after each ran- asymptotic behavior, we used the following formula: dom step is used as a global energy optimizer. Six vari- tanh Δ ables controlling overall position of the ligand as well as SAPF = ((EAPF-E0)/ 0), torsions around rotatable bonds are optimized. where E0 and Δ0 are empiric parameters. Next, dis- Here, the binding