A Graph-Based Approach to Construct Target-Focused Libraries for Virtual

Naderi et al. J Cheminform (2016) 8:14 DOI 10.1186/s13321-016-0126-6 SOFTWARE Open Access A graph‑based approach to construct target‑focused libraries for virtual screening Misagh Naderi1†, Chris Alvin2†, Yun Ding3, Supratik Mukhopadhyay4 and Michal Brylinski1,5* Abstract Background: Due to exorbitant costs of high-throughput screening, many drug discovery projects commonly employ inexpensive virtual screening to support experimental efforts. However, the vast majority of compounds in widely used screening libraries, such as the ZINC database, will have a very low probability to exhibit the desired bio- activity for a given protein. Although combinatorial chemistry methods can be used to augment existing compound libraries with novel drug-like compounds, the broad chemical space is often too large to be explored. Consequently, the trend in library design has shifted to produce screening collections specifically tailored to modulate the function of a particular target or a protein family. Methods: Assuming that organic compounds are composed of sets of rigid fragments connected by flexible link- ers, a molecule can be decomposed into its building blocks tracking their atomic connectivity. On this account, we developed eSynth, an exhaustive graph-based search algorithm to computationally synthesize new compounds by reconnecting these building blocks following their connectivity patterns. Results: We conducted a series of benchmarking calculations against the Directory of Useful Decoys, Enhanced database. First, in a self-benchmarking test, the correctness of the algorithm is validated with the objective to recover a molecule from its building blocks. Encouragingly, eSynth can efficiently rebuild more than 80 % of active molecules from their fragment components. Next, the capability to discover novel scaffolds is assessed in a cross-benchmarking test, where eSynth successfully reconstructed 40 % of the target molecules using fragments extracted from chemically distinct compounds. Despite an enormous chemical space to be explored, eSynth is computationally efficient; half of the molecules are rebuilt in less than a second, whereas 90 % take only about a minute to be generated. Conclusions: eSynth can successfully reconstruct chemically feasible molecules from molecular fragments. Further- more, in a procedure mimicking the real application, where one expects to discover novel compounds based on a small set of already developed bioactives, eSynth is capable of generating diverse collections of molecules with the desired activity profiles. Thus, we are very optimistic that our effort will contribute to targeted drug discovery. eSynth is freely available to the academic community at www.brylinski.org/content/molecular-synthesis. Keywords: Molecular synthesis, Virtual screening, Target-focused libraries, eSynth, Chemical space, Targeted drug discovery Background particular, virtual screening, a technique that shows great Due to extreme costs of high-throughput screening, promise for lead discovery, has become an integral part many drug discovery projects commonly employ inex- of modern drug design pipelines [1, 2]. Here, the idea is pensive computations to support experimental efforts. In to considerably reduce the number of candidate compounds that need to be tested experimentally against a protein target of interest. Due to advances in computer *Correspondence: [email protected] †Misagh Naderi and Chris Alvin contributed equally to this work technology resulting in constantly increasing compu- 1 Department of Biological Sciences, Louisiana State University, tational power, virtual libraries comprising many thou- Baton Rouge, LA, USA sands of compounds can be rapidly evaluated in silico Full list of author information is available at the end of the article © 2016 Naderi et al. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/ publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Naderi et al. J Cheminform (2016) 8:14 Page 2 of 16 prior to experimental screens and at a fraction of the compounds and 41 kinases from five different fami- cost. Virtual screening approaches, historically divided lies, demonstrating a 6.7-fold higher overall hit enrich- into ligand- and structure-based algorithms, prioritize ment compared to a generic compound collection [14]. drug candidates by estimating the probability of binding Furthermore, a structure-based modeling was used to to the target receptor [3]. Among many methods devel- create a small, focused library against Chlamydophila oped to date, docking-based techniques are valuable pneumoniae, a common pathogen recently linked to ath- tools for lead identification [4]. These algorithms rank erosclerosis and the risk of myocardial infarction [15]. compounds by modeling the binding pose of a query Encouragingly, the experimentally determined hit rate molecule in the binding pocket of the target protein, fol- for the targeted library was 24.2 %, which is considerably lowed by the prediction of binding affinity from molecu- higher than that expected for a generic library. Similar lar interactions. There are many examples of successful to structure-based approaches, ligand-based techniques applications of virtual screening to develop compounds can also be used in the focused library design, as shown with desired bioactivities [5, 6]. for the GPCR family [10]. Compared to large and diverse Computer-aided drug discovery traditionally utilizes screening libraries, using relatively small and targeted large compound libraries. For example, the ZINC database collections significantly improves the odds of finding is one of the most comprehensive repositories of com- potential drug candidates, thus further reduces the costs mercially available compounds for virtual screening [7]. of drug discovery. It currently features over 35 million compounds in ready- Target-focused libraries are either designed or assem- to-dock formats. These large generic collections of low bled upon some understanding of a specific protein molecular weight organic compounds provide a sufficient target or a protein family. These collections are often diversity to perform virtual screening against any molecu- compiled from larger, more diverse libraries using either lar target, however, the vast majority of compounds will molecular docking (structure-based approach) or ligand have a very low probability to exhibit the desired bioactiv- fingerprint similarity (ligand-based approach). The for- ity for a specific target protein. Furthermore, considering mer employs structural, sequence and mutagenesis data, the imperfections of compound ranking by virtual screen- whereas the latter is based on the biomolecular proper- ing algorithms [8], the top-ranked subset of compound ties derived from known ligands, offering a useful way of library may contain a very few active molecules. Conse- “scaffold hopping” from one ligand class to another [16]. quently, the chances to identify novel, high-quality leads Target-focused libraries are often constructed around a from large compound repositories are low. For instance, an single scaffold with one or more positions used to attach internal analysis of the Abbott compound collection sug- various chemical moieties or side chains. Although this gested that <4 % of the compounds in their library have approach can result in millions of different compounds the potential to yield novel kinase hinge-binders [9]. To [17], the chemical space remains largely unexplored, address these issues, there has been a significant effort to therefore, truly novel compounds will not be discovered. augment existing collections with those compounds hav- On the other hand, combinatorial chemistry methods ing a higher potential to bind to a specific target of inter- can produce a vast collection of divers compounds, so est. On that account, the trend in library design has shifted vast that only a tiny fraction of them could be explored, to include a strong focus on the target class in addition to even using supercomputers. One can hardly imagine diversity and drug-likeness criteria [10]. screening the chemical universe containing from 1012 to A target-focused library is a screening collection of 10180 drug-like compounds [18]. Therefore, techniques compounds specifically tailored to modulate the func- to design chemical libraries covering pharmacologically tion of a particular target or a protein family [11, 12]. relevant regions are needed [19]. These methods hold There are a variety of approaches developed to date to a promise to advance our knowledge of biological pro- design target-specific focused libraries against, e.g. pro- cesses leading to new strategies to treat diseases. tein kinases, ion channels, G-protein coupled receptors Compiling focused libraries by molecular synthesis is (GPCRs), nuclear receptors, and protein–protein inter- essentially a combinatorial problem that can be addressed faces. These libraries not only reduce waste by elimi- using graph theory. These techniques have been already nating compounds that are unlikely to bind to

A Graph-Based Approach to Construct Target-Focused Libraries for Virtual

Structure-Based Virtual Screening of Hypothetical Inhibitors of the Enzyme Longiborneol Synthase—A Potential Target to Reduce Fusarium Head Blight Disease

Virtual Screening of the Inhibitors Targeting at the Viral Protein 40 of Ebola Virus V

Report on an NIH Workshop on Ultralarge Chemistry Databases Wendy A

Qsar Methods Development, Virtual and Experimental Screening for Cannabinoid Ligand Discovery

Fast Three Dimensional Pharmacophore Virtual Screening of New Potent Non-Steroid Aromatase Inhibitors

Bigger Is Better in Virtual Drug Screens

Autodock Vina Manual

Structure Based Pharmacophore Modeling, Virtual Screening

Virtual Screening in Drug Design – Overview of Most Frequent Techniques

Virtual Screening, Docking, ADMET and System Pharmacology Studies

Virtual Screening of One Billion Compound Libraries Using Novel

Cheminformatics in Drug Discovery, an Industrial Perspective