
Expert Opinion on Drug Discovery

ISSN: (Print) (Online) Journal homepage: https://www.tandfonline.com/loi/iedc20

Open access in silico tools to predict the ADMET profiling of drug candidates

Supratik Kar & Jerzy Leszczynski

To cite this article: Supratik Kar & Jerzy Leszczynski (2020): Open access in silico tools to predict the ADMET profiling of drug candidates, Expert Opinion on Drug Discovery, DOI: 10.1080/17460441.2020.1798926

To link to this article: https://doi.org/10.1080/17460441.2020.1798926

Published online: 31 Jul 2020.


Full Terms & Conditions of access and use can be found at https://www.tandfonline.com/action/journalInformation?journalCode=iedc20 EXPERT OPINION ON DRUG DISCOVERY https://doi.org/10.1080/17460441.2020.1798926

REVIEW

Open access in silico tools to predict the ADMET profiling of drug candidates

Supratik Kar and Jerzy Leszczynski

Interdisciplinary Center for Nanotoxicity, Department of Chemistry, Physics and Atmospheric Sciences, Jackson State University, Jackson, MS, USA

ABSTRACT

Introduction: We are in an era of bioinformatics and cheminformatics where we can predict data in the fields of medicine, the environment, engineering and public health. Approaches with open access in silico tools have revolutionized disease management due to early prediction of the absorption, distribution, metabolism, excretion, and toxicity (ADMET) profiles of chemically designed and eco-friendly next-generation drugs.

Areas covered: This review covers the fundamental functions of open access in silico prediction tools (webservers and standalone software) and advocates their use in drug discovery research for the safety and reliability of any candidate drug. This review also aims to support researchers who are new to the field.

Expert opinion: The choice of in silico tools is critically important for drug discovery and the accuracy of ADMET prediction. The accuracy largely depends on the type of dataset, the algorithm used, the quality of the model, the available endpoints for prediction, and user requirements. The key is to use multiple in silico tools for prediction, compare the results, and then identify the most probable prediction.

ARTICLE HISTORY
Received 30 March 2020
Accepted 17 July 2020

KEYWORDS
ADMET; drug; in silico; open access; prediction

1. Introduction

A shocking 90% attrition rate of drug candidates is reported during the transition from preclinical trials to marketing surveillance trials or phase 4 clinical trials, after an estimated US$ 2.6 billion has been spent on each new chemical entity (NCE) [1–3]. The US Food and Drug Administration (FDA) has reported that only 12 novel small molecule drugs [2,4] and 59 NCEs (comprising 64% small molecule drugs) were approved in the years 2016 and 2018, respectively. Improper pharmacokinetic (PK) and pharmacodynamic (PD) properties of drugs, followed by toxicity, are the key reasons for the high failure rate in drug discovery. A fine balance between drug candidates (the drugs to be) and their ADMET (absorption, distribution, metabolism, elimination, and toxicity) profiling during the synthesis of drug molecules can help avoid late-stage drug failure in the drug discovery process. Thus, earlier detection of PK/PD properties along with drug-likeness and ADMET profiling can save both money and time, while ensuring the safety and stability of the designed drugs or candidate drugs [5,6].

The terms 'drug-likeness,' 'PK/PD study,' and 'ADMET profiling' are often used interchangeably by researchers, but they are different from each other and each has a crucial role to play in drug discovery [7]. In a broader sense, the ideas of PK/PD, drug-likeness, lipophilicity, water solubility, and toxicity are all considered under the single term of ADMET profiling. In the profiling of drug metabolism (under ADMET), prediction of human cytochrome P450 (CYP450) enzyme inhibition is crucial for drug toxicity and drug-drug interaction predictions in drug discovery. Combining molecular dynamics (MD) simulations with in silico prediction of CYP inhibition has improved the safety of drugs many fold [8].

Knowledge of the ADMET parameters of each drug has its most significant impact before the drug enters preclinical trials, because it reduces withdrawals at later stages of preclinical and clinical trials. Therefore, most pharmaceutical industries rely heavily on early evaluation through in silico prediction tools, which comprise regression and classification-based approaches, machine learning (ML) methods, and artificial intelligence (AI) [9]. In many cases, the prediction tools combine all of these with integrated databases of both existing approved drugs and experimental drugs under screening.

A wide range of tools is available: some are commercial, and some are open access (in the majority of cases, online in silico tools). Commercial tools such as CASE ULTRA [10], DEREK [11], META-PC [12], METEOR [13], PASS [14], and GUSAR [15] are available on the market for quick predictions through ADMET profiling. Among open access tools, online servers such as ADMETlab [16], admetSAR [17], pkCSM [18], and SwissADME [19] are quite popular among researchers for fast and money-saving prediction of ADMET, followed by rational synthesis of a probable drug molecule instead of blind synthesis of any drug candidate. The open access in silico tools and computer coding are breaking the old-age commercialization and monopoly of pharmaceutical

CONTACT Supratik Kar; Jerzy Leszczynski. Interdisciplinary Center for Nanotoxicity, Department of Chemistry, Physics and Atmospheric Sciences, Jackson State University, Jackson, MS 39217, USA

© 2020 Informa UK Limited, trading as Taylor & Francis Group

industries. These open access in silico tools are fully integrated with ADMET prediction platforms, including multiple quantitative structure-activity relationship (QSAR) and ML models that are capable of excluding undesirable drug candidates based on ADMET, reducing the number of syntheses and the degree of failure in preclinical and late-stage clinical trials [20,21]. The in silico tools are capable of predicting the ADMET of drug candidates from their chemical structure alone, long before their practical synthesis. Thus, in silico tools, and especially the open access ones, are leading the way in data reproducibility and research transparency.

Article highlights

● Fine-tuning the balance between drug-likeness and ADMET (absorption, distribution, metabolism, elimination, and toxicity) during the synthesis of drug molecules can help avoid late-stage failure of the candidate drug in the drug discovery process.
● In silico tools are capable of predicting the drug-likeness and ADMET profile of drug candidates even before their practical synthesis, which helps minimize the costs of synthesis, preclinical and clinical studies.
● Open access databases with integrated drug-likeness and ADMET profiling are the key sources for in silico modeling and one of the primary resources for making in silico tools.
● The accuracy of ADMET prediction of an in silico tool depends on the quality of the experimental data in the database, the in silico tools implemented to model them, stringent validation criteria, and the applicability domain of the developed model.
● User friendliness, the output type (qualitative or quantitative), easy interpretation of the obtained results, and the user's study requirements are the major criteria for choosing in silico tools.
● For reliable prediction, users should use multiple in silico tools and compare the results, and then identify the most probable or similar prediction among the in silico tools.

This box summarizes key points contained in the article.

2. ADMET profiling

A fine balance of drug target potency, selectivity and suitable ADMET will ultimately lead to the selection and clinical development of new drug candidates with increased potential [1–3]. The typical compound entering a Phase I clinical trial has undergone years of rigorous preclinical testing, and yet has less than a 10% chance of reaching the market. It is therefore quite clear that the discharge of a large number of dropout pharmaceuticals during these later stages collectively contributes to an increase in environmental toxicity [22,23]. Drug-induced toxicity to humans and the toxicity of pharmaceuticals released into the environment have thus emerged as the major toxicity problems of pharmaceuticals. The physicochemical properties of a drug candidate lead to multiple drug-likeness scores, which can be considered the first step in distinguishing a successful from a failed drug candidate. The PK/PD properties as well as bioavailability also largely depend on ADMET-associated properties [24–39]. A series of ADMET-associated properties are checked before a drug candidate is introduced into clinical trials. For better understanding, we have listed the major physicochemical properties, drug-likeness scores, ADMET properties, and ecotoxicity endpoints in Table 1. Only the most common properties are shown; the list can continue to grow, depending on the nature of the drug and its target. The major in silico tools implemented to model these properties (endpoints) are also reported in Table 1.

3. Application of in silico tools in drug discovery and ADMET

In silico tools such as regression, classification-based QSAR, ML techniques, and combinatorial chemistry, along with docking and molecular dynamics approaches, have made a strong impression in drug design [20] and new catalyst design [40], with substantial capacity in the field of reaction pathway prediction [41]. Over the last two decades, the literature and predictive in silico tools have shown that well-developed prediction models are able to predict ADMET profiles and drug-likeness long before a drug's actual synthesis [3,5,42]. ADMETlab [16], admetSAR [17], SwissADME [19], FAF-Drugs [43], and TOPKAT [44] are some important in silico tools that are most frequently used by academia and industry. Apart from that, the transformation products of drugs in the human body as well as the biodegradation products of these drugs in different environmental systems can be predicted through the models included in META-PC and Meteor-Nexus. The site of metabolism and the type of enzymes involved in the biotransformation of drugs can be identified with computerized models (CypReact [45], XenoSite [46]). Straker et al. [47] successfully employed computerized models to develop a catalyst that offers asymmetric control of a cycloisomerization reaction. Quantum chemistry methods such as density functional theory (DFT) and time-dependent DFT (TD-DFT) can be used to design an optimized ligand with noticeably improved selectivity and rate over the existing ligand [42]. DiRocco et al. [48] designed another superior catalyst for the synthesis of a pronucleotide, utilizing computational analysis. Machine learning has been found useful for forward reaction prediction [49], that is, predicting the major product of a reaction by assigning a probability and rank to each potential product. Ahneman et al. successfully predicted the performance of the Buchwald-Hartwig amination reaction against variables such as catalysts, bases, reactants, and additives [50]. Another significant area of research where computational models can be implemented is synthetic route planning to a target molecule [51]. Therefore, in silico tools have widespread applications, from drug design to ADMET profiling followed by modification of the synthesis routes.

Another imperative aspect of a drug candidate is its ecotoxicity. According to the U.S. Environmental Protection Agency (USEPA), around 20,000 to 100,000 animal study requests are submitted every year, involving live animals such as mice, rats, rabbits, guinea pigs, dogs and other species. With the recent announcement that studies on live mammals will stop by 2035, the USEPA strongly recommends in silico modeling to study the toxicities of the investigated compounds [51]. Thus, for risk assessment of any chemical, one must rely on in silico models.
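The recommendation made in this review, to run several in silico tools on the same compound and then identify the most probable prediction, can be illustrated with a simple majority vote. The following is only a minimal sketch under stated assumptions: the tool names and their outputs are hypothetical placeholders, and majority voting is one possible way to combine qualitative predictions, not a procedure prescribed by the article.

```python
from collections import Counter
from typing import Dict, Optional, Tuple


def consensus(predictions: Dict[str, str]) -> Tuple[Optional[str], float]:
    """Return the majority label and the fraction of tools that agree with it.

    `predictions` maps a tool name to its qualitative call for one endpoint.
    If two labels tie for first place, no consensus label is returned (None).
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    ranked = Counter(predictions.values()).most_common()
    top_label, top_votes = ranked[0]
    agreement = top_votes / len(predictions)
    if len(ranked) > 1 and ranked[1][1] == top_votes:
        return None, agreement  # tie: the tools disagree too much to decide
    return top_label, agreement


# Hypothetical hERG calls from three unnamed tools (illustrative values only).
herg_calls = {"tool_a": "non-blocker", "tool_b": "non-blocker", "tool_c": "blocker"}
label, agreement = consensus(herg_calls)
print(label, f"{agreement:.2f}")  # non-blocker 0.67
```

A low agreement fraction or a tie is a signal to inspect the applicability domain of each model rather than to trust any single output.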

Table 1. Commonly employed physicochemical properties, drug-likeness scores, ADMET and ecotoxicity endpoints, along with commonly used modeling algorithms under in silico tools.

Physicochemical properties: Molecular weight (MW), number of atoms (nA), number of heavy atoms (nHA), molar refractivity (MR), molecular polar surface area (MPSA), total charge (TC), number of H-bond acceptors (HBA), number of H-bond donors (HBD), number of rotatable bonds (nRB), number of aromatic rings (nAROM), lipophilicity (logP)*, distribution coefficient (logD)*, solubility (logS)*

Drug-likeness rules: Lipinski rule of 5, Ghose, Veber, Egan, Muegge, Oprea, Varma.

ADMET-associated properties: Caco-2 cell permeability (CacoP), blood-brain barrier (BBB) permeation, human intestinal absorption (HIA), oral bioavailability (OB), CYP inhibitors and substrates (CYPI/S), P-gp inhibitors and substrates (P-gp+/P-gp-), volume of distribution (VD), clearance (Cl) [total, renal, hepatic, biliary], t1/2, blood-placenta barrier (BPB), acute oral toxicity (AOT), mutagenicity (MT), plasma protein binding (PPB), skin sensitization (SkinSen), cytotoxicity (CytTox), cardiotoxicity (CarTox), hepatotoxicity (HepTox), immunotoxicity (ImmTox), carcinogenicity (CAR), drug-drug interactions (DDI), drug-induced liver injury (DILI), eye irritation (EI), human ether-à-go-go inhibition (hERG), micronuclear toxicity (MT), endocrine disruption (ED), uridine diphosphate glucuronosyltransferase catalyzed (UGTC), Ames toxicity (AT), Ames bacterial mutation assays (ABMA), adverse outcome (Tox21) pathways and toxicity targets, skin permeation (SP), Brenk structural alert (BRENK), sites of metabolism (SOM), quinone formation (QF), maximum recommended daily dose (MRDD), breast cancer resistance protein (BCRP), organic cation transporter 2 (OCT2), organic anion transporter polypeptide 1B1 (OATP1B1)

Ecotoxicity endpoints: Biodegradation (BD), crustacean toxicity (CT), fish aquatic toxicity (FAT), Tetrahymena pyriformis toxicity (TPT), honeybee toxicity (HBT), fathead minnow toxicity (FMT)

Modeling algorithm: Multiple linear regression (MLR), random forests (RF), support vector machine (SVM), decision tree (DT), recursive partitioning regression (RP), naïve Bayes (NB), partial least squares (PLS), convolutional neural networks (CNN), k-nearest neighbor (kNN), radial basis function (RBF), principal component analysis (PCA), quantitative estimate of drug-likeness (QED), read-across (RA), quantitative structure-activity relationship (QSAR), variable nearest neighbor (vNN), SMOreg (SVM implemented into Sequential Minimal Optimization, SMO), learning base model (LBM), ada boost (AB), linear discriminant analysis (LDA), gradient boosting (GB), extra trees classifier (ETC), flexible discriminant analysis (FlDA), nearest centroid (NC), maximum likelihood (MLHD), rule-based C5.0 algorithm (R C5.0)

*These properties are predicted and can vary from software to software for the same chemical. The others are computed intrinsic properties of a chemical, which are fixed irrespective of software.
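As an illustration of how a drug-likeness rule from Table 1 turns physicochemical properties into a pass/fail score, the sketch below checks the Lipinski rule of 5 (MW ≤ 500, logP ≤ 5, HBD ≤ 5, HBA ≤ 10, with at most one violation tolerated). The class and function names are hypothetical, and the aspirin descriptor values are approximate literature values entered by hand; in practice they would come from one of the tools discussed in this review.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Descriptors:
    mw: float    # molecular weight (g/mol)
    logp: float  # lipophilicity (predicted, so tool-dependent)
    hbd: int     # number of H-bond donors
    hba: int     # number of H-bond acceptors


def lipinski_violations(d: Descriptors) -> List[str]:
    """List every Lipinski rule-of-5 criterion that the molecule violates."""
    violations = []
    if d.mw > 500:
        violations.append("MW > 500")
    if d.logp > 5:
        violations.append("logP > 5")
    if d.hbd > 5:
        violations.append("HBD > 5")
    if d.hba > 10:
        violations.append("HBA > 10")
    return violations


def passes_lipinski(d: Descriptors, max_violations: int = 1) -> bool:
    """A compound is conventionally accepted with at most one violation."""
    return len(lipinski_violations(d)) <= max_violations


# Approximate values for aspirin (assumed for illustration).
aspirin = Descriptors(mw=180.16, logp=1.2, hbd=1, hba=4)
print(lipinski_violations(aspirin), passes_lipinski(aspirin))
```

Because logP is a predicted quantity (marked * in Table 1), the same molecule may pass in one tool and fail in another when its value sits near the threshold.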

4. Open access in silico tools

In silico models have no alternative for early prediction of drugs' PK/PD behavior and toxicity profiles, while the best tools for accurate and reliable prediction need continuous development. However, free access to these in silico tools or standalone software for non-expert new researchers remains a major concern because of commercialization by a few specific modeling companies. Thus, the development of scientifically effective open access prediction tools for ADMET profiling is the future of drug design and discovery. Continuous development of new open access tools with inherent multiple-parameter characterization will strengthen the quality of predictions and the reliability of the NCE (new chemical entity). The most popular and commonly used open access in silico tools are discussed here with suitable examples of drugs as input.

4.1. ADMETlab

Dong et al. [16] developed a web-based platform named ADMETlab for the evaluation of the ADMET profile of a query chemical, based on a comprehensive database of around 288,967 chemicals and 31 optimized QSAR models. It is an open access module highly suitable for rapid screening of ADMET profiles and subsequent screening and prioritization of any NCEs [52]. There are two major components with a shared running environment. ADMETlab consists of five in silico tools named a) Drug-likeness Evaluation, b) ADMET Prediction, c) Systematic Evaluation, d) Application Domain and e) Aggregator prediction. Under the functional modules, the user can carry out drug-likeness investigations and predictions of 31 ADMET endpoints (3 physicochemical properties, 6 absorption, 3 distribution, 10 metabolism, 2 elimination, 7 toxicity), use one prediction-based and five rule-based models, and also carry out systematic evaluation and database screening.

Predictions are executed through a series of predictive models prepared using diverse modeling tools (RF, SVM, RP, PLS, NB, DT) and representation patterns (2D, Electrotopological state atom (E-state), and Molecular ACCess System (MACCS)). The database integrates all the ADMET entries from the DrugBank database, ChEMBL database, EPA database and from literature sources listed in the tool documentation. The complete documentation of the in silico tools has been thoroughly explained [52].

Input file and output details: SMILES or SDF format of chemicals is accepted as input. Users can also use the molecular editor to analyze multiple molecules at a time. The obtained outputs can be exported as a .csv file.

4.2. admetSAR

admetSAR is a user-friendly, freely accessible web tool for predicting ADMET properties from a name, SMILES, CAS Registry Number (CASRN), or similarity search [17]. admetSAR can predict around 50 important ADMET endpoints along with multiple ecotoxicity endpoints (biodegradation and chemotoxicity in crustaceans, fishes, Tetrahymena pyriformis and honeybees), employing QSAR models [53]. The present updated tool, admetSAR 2.0, is based on 47 models which are optimized for drug discovery and ecotoxicity predictions.

Users can use SMILES as direct input, with a maximum of 20 molecules at a time. The output results can be downloaded in CSV (file.csv) format for further analysis. A total of 40 binary endpoints, 4 regression models and 3 multi-class models with five-fold cross-validation were employed for classification-based models [2]. Machine learning tools such as SVM, RF and kNN were employed through the scikit-learn package with Python scripts. One of the important features of this tool is the prediction of CYP450 enzyme inhibition, which is key for drug toxicity and drug-drug interaction predictions. For illustration, we used the SMILES of aspirin as input, followed by the predict tabs 'ADMET properties for drug discovery' and 'Predict for eco-toxicity' for the prediction of endpoints (Table 2).

Input file and output details: For input, the user can provide SMILES (for multiple inputs) or draw a query molecule (one at a time). The tool can predict ADMET properties as well as ecotoxicity properties, with customization of an endpoint. Qualitative and quantitative outputs are available along with the applicability domain details.

Table 2. Prediction output of ADMET profiling and ecotoxicity properties of aspirin, employing admetSAR.

ADMET predicted profile (Classification) | Value | Probability
Human Intestinal Absorption | + | 0.9884
Caco-2 | - | 0.7353
Blood Brain Barrier | + | 0.9752
Human oral bioavailability | + | 0.6714
Subcellular localization | Mitochondria | 0.9369
OATP2B1 inhibitor | - | 1
OATP1B1 inhibitor | + | 0.9774
OATP1B3 inhibitor | + | 0.9823
MATE1 inhibitor | - | 0.98
OCT2 inhibitor | - | 0.975
BSEP inhibitor | - | 0.9074
P-glycoprotein inhibitor | - | 0.9802
P-glycoprotein substrate | - | 0.9748
CYP3A4 substrate | - | 0.7931
CYP2C9 substrate | + | 1
CYP2D6 substrate | - | 0.9005
CYP3A4 inhibition | - | 0.9611
CYP2C9 inhibition | - | 0.9071
CYP2C19 inhibition | - | 0.9445
CYP2D6 inhibition | - | 0.9576
CYP1A2 inhibition | - | 0.9046
CYP inhibitory promiscuity | - | 0.9557
UGT catalyzed | - | 0
Carcinogenicity (binary) | - | 0.6204
Carcinogenicity (trinary) | Non-required | 0.7139
Eye corrosion | - | 0.6035
Eye irritation | + | 1
Ames mutagenesis | - | 0.98
Human ether-à-go-go inhibition | - | 0.877
Micronuclear | + | 0.6966
Hepatotoxicity | - | 0.85
Acute Oral Toxicity (c) | II | 0.726
Estrogen receptor binding | - | 0.7113
Androgen receptor binding | - | 0.749
Thyroid receptor binding | - | 0.8815
Glucocorticoid receptor binding | - | 0.8524
Aromatase binding | - | 0.8352
PPAR gamma | - | 0.8031
Honey bee toxicity | + | 0.692
Biodegradation | + | 0.85
Crustacea aquatic toxicity | - | 0.89
Fish aquatic toxicity | + | 0.9702

ADMET predicted profile (Regressions) | Value | Unit
Water solubility | −1.783 | logS
Plasma protein binding | 0.96 | 100%
Acute Oral Toxicity | 2.143 | kg/mol
Tetrahymena pyriformis | −0.375 | pIGC50 (ug/L)
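Because admetSAR reports a probability with every classification call (Table 2), a downloaded result file can be post-processed to keep only the calls the model is confident about. The following sketch shows one way to do that with the Python standard library. The column names (`endpoint`, `value`, `probability`) and the 0.9 cutoff are assumptions made for illustration and do not describe the actual layout of the admetSAR export; the six rows are copied from Table 2.

```python
import csv
import io
from typing import Dict

# A few aspirin rows from Table 2, arranged in an assumed three-column layout.
SAMPLE_CSV = """endpoint,value,probability
Human Intestinal Absorption,+,0.9884
Caco-2,-,0.7353
Blood Brain Barrier,+,0.9752
Human oral bioavailability,+,0.6714
CYP2C9 substrate,+,1
Hepatotoxicity,-,0.85
"""


def confident_calls(csv_text: str, min_probability: float = 0.9) -> Dict[str, str]:
    """Keep only the classification calls whose probability reaches the cutoff."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row["endpoint"]: row["value"]
        for row in reader
        if float(row["probability"]) >= min_probability
    }


for endpoint, value in confident_calls(SAMPLE_CSV).items():
    print(f"{endpoint}: {value}")
```

Calls that fall below the cutoff, such as Caco-2 permeability here, are the natural candidates for cross-checking with a second tool.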

4.3. CypReact

CypReact is another open access in silico tool, capable of predicting enzymatic reactions with CYP450 enzymes [45,54]. It provides tools for metabolism prediction, which requires predicting whether a given drug will interact with one or multiple metabolizing enzymes and predicting the individual enzymatic reactions. CypReact addresses the first step, reactant prediction: whether the query molecule will react with any of the nine most significant CYP450 enzymes, namely CYP1A2, CYP2A6, CYP2B6, CYP2C8, CYP2C9, CYP2C19, CYP2D6, CYP2E1 and CYP3A4. Thus, CypReact has nine sub-tools, one for each CYP450 enzyme. To validate CypReact and to prepare a dataset, 1632 molecules were used, with commendable outcome. Each CypReact classifier is able to reduce the average weighted cost score for its associated CYP450 isoform, based on a weighted cost that penalizes each false negative five times more than each false positive. The developers [45] claimed that this prediction tool can correctly predict the metabolites of Phase I, Phase II, and microbial metabolism in humans. Furthermore, CypReact achieves markedly better outcomes than SmartCyp and ADMET Predictor. The schematic workflow of the CypReact in silico tool is illustrated in Figure 1.

Input file and output details: The user can provide a standard SDF file or a SMILES string of an arbitrary molecule as input, and the tool will predict the reaction of the input compound with any of the nine CYPs mentioned above.

Figure 1. Schematic illustration of CypReact in silico tool.

4.4. CypRules

The CypRules server [55] is capable of predicting inhibition of the metabolizing CYPs, which include CYP1A2, CYP2C9, CYP2C19, CYP3A4 and CYP2D6. The prediction is based on the C5.0 algorithm and the tool is openly accessible [56]. The rules are calculated based on Mold2 2D descriptors and are also listed in the output. CYP predictions are an important task under ADMET for two vital parameters, drug metabolism and drug interactions.

Input file and output details: CypRules accepts only the SDF file format and works well with thousands of chemical structures. Thus, a huge number of compounds can be predicted in a single run. The output of CypRules contains three forms of data: a) the visualization of the chemical structure; b) the prediction as an inhibitor or non-inhibitor in the form of a qualitative output; and c) the ruleset information for the prediction.

4.5. DrugMint

DrugMint is another server for predicting drug-likeness for any NCE [57]. It was developed employing approved and experimental NCEs from DrugBank 2.5 [58]. The models were prepared using open-source software packages such as WEKA, PaDEL and SVM_Light. The in silico tool can be employed for drug-likeness prediction, virtual screening, and drug design. DrugMint has four modules: i) Drawing Structures of the query molecule employing the Marvin applet; ii) Virtual Screening of diverse chemical libraries; iii) Analog Designing for lead optimization and analog-based drug design for the preparation of a virtual chemical library, employing user-specified scaffolds, blocks and linkers; and iv) Search Database, which searches the ZINC and ChEMBL databases for possible drug candidates.

Dhanda et al. [58] used PaDEL software for the computation of descriptors and molecular fingerprints for the development of models. Weka software was employed for the selection of modeled features. The developed model is capable of predicting the drug-likeness of molecules with 89.96% accuracy.

Input file and output details: The user must draw each structure individually for the analysis, which we found somewhat laborious. Users get drug-likeness properties as output.

4.6. DruLiTo

Drug-Likeness Tool (DruLiTo) [59] is an open access standalone screening tool for drug-likeness that runs on two operating systems, Windows and Linux. The software was developed employing the Chemistry Development Kit (CDK) by the Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research (NIPER), Punjab, India. DruLiTo calculates diverse drug-likeness rules such as Lipinski's rule, Veber rule, MDDR-like rule, Ghose filter, CMC-50-like rule, BBB rule and Quantitative Estimate of Drug-likeness (QED). The tutorial available on the website offers complete guidelines on the use of DruLiTo.

Input file and output details: The SDF file format is accepted and multiple molecules can be used as an input file. The output can be exported as a .csv file.

4.7. FAF-Drugs

FAF-Drugs is an open access online ADME-Tox filtering tool based on the FAF-Drugs program version 4 [60]. Along with ADMET filtering, FAF-Drugs is capable of removing salts, counter-ions, and duplicates, or of computing the ADME-Tox descriptors without filtering the database. The present version, FAF-Drugs4, assists in silico screening and experimental screening as it helps select chemicals for in cellulo/in silico/in vitro assays [61]. The package is a combination of Python modules, the OpenBabel v2.2.3 toolkit, the pybel module, the OpenBabel C++ library, and multiple ChemAxon JChem plugins. FAF-Drugs4 has the following services:

● Bank-Cleaner: Can curate a small chemical library without computing other properties.
● Bank-Formatter: Can convert SDF or SMILES to a suitable SDF input file for FAF-Drugs4.
● Filter-Editor: Customizes user-defined filtering parameters for FAF-Drugs4 [61].

In the latest version, FAF-Drugs4, the developers have included a quantitative estimate of drug-likeness (QED) method, termed FAF-QED. This option does not apply a sharp threshold on physicochemical properties, unlike the Lipinski rule of 5 (Lipinski's RO5) or Gleeson's RO4. The FAF-QED approach involves descriptors such as MW, logP, HBA, HBD, MPSA, nRB, and nAROM, and a search of 113 structural alerts (ALERTs). Moreover, FAF-Drugs4 is capable of computing toxicophores and identifying pan-assay interference compounds (PAINS). The newer version offers optimization of the standardization–neutralization process, generation of filtered subsets protonated at physiological pH, improved SMARTS definitions for 137 structural alerts and better detection of the 515 PAINS, and finally, the possibility for users to obtain a PDF report for the selected compounds. The tool can also search for small molecules that could form a covalent bond with a macromolecular target such as a protein. FAF-Drugs4 helps to prioritize chemicals and to analyze small molecules such as protein-protein interaction inhibitors.

Input file and output details: Users can upload compounds in the SDF/SMILES format or draw a compound as input. As output, the tool generates CSV tables with computed physicochemical descriptors, structural alerts, and PAINS, and writes filtered SDF output files.

4.8. Hit Dexter 2.0

Hit Dexter is a machine learning approach prepared from a dataset of 250,000 chemicals with experimentally determined activity for at least 50 different protein groups. Hit Dexter evaluates how likely a small organic molecule is to trigger a positive response or false-positive readouts in biochemical assays. The present version, Hit Dexter 2.0 [62], is an open access web service that offers users access to the models as well as to similarity-based methods for the prediction of dark chemical matter, aggregators for flagging frequent hitters and undesired substructures, drug-like compounds, approved drugs, potential PAINS, and natural products [63]. Hit Dexter 2.0 is one of the open access tools under the New E-Resource for Drug Discovery (NERDD) module. The other important tools are FAME 3, GLORY, NP-Scout, and Skin-doctor.

The ML models under Hit Dexter cover both primary screening assays and confirmatory dose-response assays. To minimize the over-representation of functionally and structurally related proteins, a new method, named protein sequence clustering, was introduced in this in silico tool. Another significant outcome of this in silico tool is the prediction of the presence of large fractions of promiscuous candidates among approved drugs. The tool is valuable for decision support and chemical de-prioritization and is not intended as a hard filter for discarding chemicals. Hit Dexter 2.0 offers a series of different models and approaches for NCE inspection, which are summarized in Table 3.

Input file and output details: A single SMILES string or a file with multiple SMILES in the .txt format can be used as input. We tried 20 SMILES at a time (more can be submitted, but it may take a little more time), and it worked completely fine. The output data can be exported as a .csv file.

Table 3. Type of approaches considered under Hit Dexter 2.0.

Types of approach | Details
Four machine learning classifier models to identify | Molecules expected to show moderate or high hit rates in primary screening assays (PSA) and considered as possibly promiscuous compounds
 | Molecules likely to show high hit rates in PSA and hence regarded as potentially highly promiscuous compounds
 | Molecules likely to show moderate or high hit rates in confirmatory dose-response assays (CDRA) and hence regarded as potentially promiscuous compounds
 | Molecules likely to show high hit rates in CDRA and hence regarded as potentially highly promiscuous compounds
Similarity-based approaches to measure the distance to the | Closest known aggregate-forming compound or aggregator, based on Tanimoto coefficient similarity
 | Closest known dark chemical matter compound; dark chemical matter defines molecules that have been tested in at least one hundred different biochemical assays and have never shown activity
Rule-based approaches to flag | Molecules based on scaffolds that have been linked to pan-assay interference (PAINS)
 | Undesirable molecules, identified based on substructures

4.9. MetStabOn

MetStabOn is an online platform for the qualitative prediction of metabolic stability in three different classes (low, medium, high) based on two parameters: half-lifetime (T1/2) and clearance [64]. The platform applies six machine-learning tools covering classification as well as regression-based analysis, along with ligand-based methodology. The regression-based approach is SMOreg, which is a modification of the Support Vector Machine (SVM) implemented into Sequential Minimal Optimization (SMO) regression analysis. The five classification approaches are Random Forest (RF), k-nearest neighbor (kNN, IBk), SMO, decision tree J48, and Naïve Bayes. The predictive models were constructed for human, mouse, and rat, employing in vitro assay data on liver microsomes and plasma for T1/2 and liver microsome data for clearance. To classify metabolic stability into three classes, the authors set the following cutoffs (T1/2 in hours): ≤0.6, low; 0.6–2.32, medium; and >2.32, high. Common ADMET prediction tools do not predict the complex phenomenon of metabolic stability, and there lies the importance of this tool

Table 4. Results of metabolic stability tests for three CHEMBL dataset compounds in human, mouse, and rat models using MetStabOn.

Half-lifetime (T1/2, in hours)
Molecular structure from CHEMBL | Human | Mouse | Rat
Compound 1 (structure not shown) | 5 | 0.533 | 1.833
Compound 2 (structure not shown) | 7.25 | 0.548 | 0.933
Compound 3 (structure not shown) | 5.5 | 3.85 | 0.5
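The cutoffs quoted for MetStabOn can be applied directly to the half-life values in Table 4. The following is a minimal sketch, not the platform's own code: the function name classify_half_life is hypothetical, and it assumes that a value of exactly 2.32 h falls in the medium class, which the stated ranges imply but do not spell out.

```python
# Half-life cutoffs (in hours) quoted in the text for the MetStabOn classes.
LOW_MAX_H = 0.6
MEDIUM_MAX_H = 2.32  # assumption: exactly 2.32 h is still "medium"


def classify_half_life(t_half_hours):
    """Map a half-life in hours to 'low', 'medium' or 'high' stability."""
    if t_half_hours <= LOW_MAX_H:
        return "low"
    if t_half_hours <= MEDIUM_MAX_H:
        return "medium"
    return "high"


# Half-life values (hours) as listed in Table 4, one dict per compound.
table4 = [
    {"human": 5.0, "mouse": 0.533, "rat": 1.833},
    {"human": 7.25, "mouse": 0.548, "rat": 0.933},
    {"human": 5.5, "mouse": 3.85, "rat": 0.5},
]

classes = [
    {species: classify_half_life(value) for species, value in row.items()}
    for row in table4
]

for index, row in enumerate(classes, start=1):
    print(f"compound {index}: {row}")
```

Read this way, the same compound can fall into different stability classes in different species, which is why the species-specific models matter.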

[65]. The results for three CHEMBL dataset compounds in the human, mouse, and rat models, obtained with MetStabOn, are illustrated as an example in Table 4.

Input file and output details: SMILES can be used as an input format, and the predicted T1/2 values for human, rat, and mouse are available as an output.

4.10. pkCSM

pkCSM is a freely accessible in silico tool for the prediction of ADMET properties that uses graph-based signatures to develop predictive models [18,66]. The integrated platform can rapidly evaluate the major pharmacokinetic and toxicity properties required for drug-likeness and bioavailability from a query molecule provided as SMILES. The major component of the pkCSM signature refers to molecular properties such as toxicophore fingerprint, atomic pharmacophore frequency count, lipophilicity, MW, surface area, number of rotatable bonds, etc., along with the distance-based signature. The pkCSM platform considers fourteen regression-based models for quantitative prediction of ADMET properties and sixteen classification-based models to categorize the outcomes in the form of binary classes. The schematic flow of the in silico tool along with the computed ADMET profiling of acetaminophen is illustrated in Figure 2.

Input file and output details: Users can provide a SMILES file with a header (limit of 100 SMILES at a time) or an individual SMILES string as an input. The output information can be copied and pasted into an Excel or a text file.

4.11. Pred-hERG

Braga et al. [67] developed an open access Web App, using tools such as Flask, uWSGI, nginx, JSME, Python, and JavaScript, to predict cardiac toxicity through the blockage of the hERG K+ channels, which is one of the significant anti-targets considered in the drug discovery process. The present open access Web App, Pred-hERG 4.2 [68], is capable of predicting a compound as a hERG channel blocker (cardiotoxic) or non-blocker (non-cardiotoxic). Pred-hERG has been developed based on statistically significant and predictive QSAR models employing the hERG K+ channel blocking data of 5,984 compounds and achieved accuracy, sensitivity, and specificity of up to 89–90%. Once the user inserts the SMILES of the studied chemical or draws the structure in the available editor, they can predict the hERG toxicity. The results are available as three outputs: (i) binary prediction (blocker or non-blocker), (ii) multiclass prediction (non-blocker, weak/moderate blocker, or strong blocker), and (iii) probability maps that are extracted from the generated binary model utilizing Morgan fingerprints. The map helps to visualize the atomic contributions of a structure to the hERG blockage, where pink atoms contribute to a decrease of the hERG blockage while green atoms or fragments help block the hERG K+ channel. As a trial, paracetamol was used as an input molecule in Pred-hERG and was predicted as non-cardiotoxic with 90% confidence, with the probability map and 8 similar off-target compounds in the output.

Input file and output details: The SMILES, MOL, or SDF format or, drawing of the query molecule in the molecular

editor, is accepted as an input. The drawback is that only a single molecule is predicted at a time. As an output, the qualitative prediction is available in the form of potential, moderate, and non-cardiotoxic with the information on the applicability domain, the confidence of prediction in %, and the probability map for visualization.

Figure 2. pkCSM workflow followed by computed output ADMET parameters for acetaminophen.

4.12. Pred-Skin

Pred-Skin is a web application capable of predicting chemically induced skin sensitization, an important parameter to check under ADMET profiling. The App was developed by Braga et al. [69], employing QSAR models for the skin sensitization potential of 109 compounds for humans (correct classification of 0.70–0.81) and murine local lymph node assay (LLNA) data for 515 compounds (correct classification of 0.72–0.84). Later, Alves et al. [70] proposed a further modification integrating multiple computational models using animal (LLNA), non-animal (Direct Peptide Reactivity Assay [DPRA], KeratinoSens, the human Cell Line Activation Test [h-CLAT]), and human (human repeat insult patch test and human maximization test) data. The Pred-Skin web application version 2.0 is freely available for the web, iOS (Apple Store), and Android (Google Play) [71]. Once the SMILES or a drawing of the query drug molecule in the editor is given as an input, the output offers a binary prediction as a sensitizer or non-sensitizer for all five mentioned models along with the percentage of the prediction confidence. The most important output along with the prediction is the probability maps demonstrating the contribution of the chemical fragments to skin sensitization, which offer an easy interpretation of the studied structure, followed by structural modifications if required during the drug design. A newer version, Pred-Skin 3.0, is available for prediction purposes.

For example, we predicted three molecules employing the QSAR LLNA model and obtained the prediction result, the prediction confidence, and the probability map for each. The first molecule, 2-hydroxyphenyl-2-hydroxybenzoate, is identified as a non-sensitizer with higher confidence, where the cloud of pink atoms is quite dominating and there is almost no sign of a green cloud. However, in the case of phenyl salicylate, although it is predicted as a sensitizer, the prediction confidence is low because green and pink clouds are present in similar proportions. On the contrary, the whole of phenyl aminobenzoate is covered with a green cloud, so it is predicted as a sensitizer with high confidence. Therefore, to design a non-sensitizer drug molecule, we need to avoid fragments with a green cloud and opt for pink cloud fragments.

Input file and output details: The SMILES, MOL, or SDF format of the chemical is accepted as an input. Users can also draw the query molecule in the molecular editor. The only drawback is that the user can predict a single molecule at a time. As an output, a qualitative prediction is available in the form of a sensitizer or a non-sensitizer with the information on the applicability domain and probability map for visualization.

4.13. SwissADME

SwissADME [19] is one of the most commonly used web tools for fast prediction of physicochemical properties (molecular weight, molar refractivity, H-bond donors and acceptors, number of heavy atoms, the fraction of Csp3), pharmacokinetics (GI absorption, blood brain barrier permeability, CYP enzyme inhibition, P-gp substrate, skin permeation), lipophilicity, water solubility, drug-likeness (Lipinski, Ghose, Veber, Egan factors and Bioavailability score), and

medicinal chemistry friendliness (Pan-assay interference structure [PAINS] alert, synthetic accessibility, lead likeness) for any NCE [72].

The strengths of SwissADME lie in its different input methods, computation of multiple molecules with a single click, an easy interactive display, and the possibility to save the results for future analysis and interpretation. The most significant feature is the integration of SwissADME with the Swiss Drug Design workspace, which is a collection of computer-aided drug design (CADD) tools developed by the Molecular Modeling Group of the Swiss Institute of Bioinformatics (SIB). The Swiss Drug Design workspace consists of the following tools:

A) SwissSimilarity: Performs ligand-based virtual screening employing multiple libraries of small molecules, which include existing drugs, bioactive molecules, and 205 million commercially available, readily synthesizable virtual compounds [73]. The predictions are carried out employing 2D molecular fingerprints as well as superpositional and fast non-superpositional 3D similarity approaches.
B) SwissTargetPrediction: Helps estimate the most probable macromolecular targets of a bioactive small molecule. The target prediction is established using a combination of 2D and 3D similarity with a library of 370,000 known actives on more than 3000 proteins from Homo sapiens L. ssp. sapiens, Mus musculus L., and Rattus norvegicus Berk [74].
C) SwissDock: Predicts the molecular interactions between a small organic molecule and a target protein. The algorithm is based on the docking software EADock DSS [75].
D) SwissBioisostere: Useful in bioisosteric design through a knowledge base of molecular replacements of small organic molecules [76].
E) SwissParam: Provides molecular mechanics topology and parameters for small organic molecules, compatible with the CHARMM all-atom force field, for use in CHARMM and GROMACS [77].

When SwissADME predicts that a query molecule lacks drug-friendliness or has a poor ADMET profile, the above-mentioned web servers can help in redesigning the molecule rather than starting from scratch.

Among the interactive displays of the analyses obtained by SwissADME, the Bioavailability Radar plot (Figure 3) offers an instant idea about the drug-likeness of a query molecule. The plot considers the optimal range of six physicochemical properties: size (MW: 150–500 g/mol), lipophilicity (XLOGP3 between −0.7 and +5.0), solubility (log S no higher than 6), polarity (TPSA between 20 and 130 Å2), saturation (fraction of carbons in the sp3 hybridization no less than 0.25), and flexibility (no more than 9 rotatable bonds). These ranges define a physicochemical range on each axis, which is colored as a pink area, and the query molecule must stay entirely in this pink zone to be considered drug-like.

The BOILED-Egg plot (Figure 3) can be prepared from the red button that appears just above the graphical output of SwissADME [78]. The BOILED-Egg concurrently predicts two key ADME parameters, i.e. the blood–brain barrier (BBB) accessibility and the passive gastrointestinal absorption (HIA), relying on two physicochemical descriptors: TPSA (polarity) and WLOGP (lipophilicity). In Figure 3, the egg-shaped plot comprises the yolk, which represents the physicochemical space for highly probable BBB permeation, and the white, which represents the physicochemical zone for high HIA absorption.

Figure 3. Schematic workflow of SwissADME along with different features and graphs from the output screen for the input molecule Aspirin.
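The six Bioavailability Radar ranges listed for SwissADME amount to a simple range check per descriptor. The sketch below is only an illustration of that logic and is not SwissADME code: the dictionary keys and the function radar_check are hypothetical, the solubility bound is assumed to mean a log S between −6 and 0 (the text only says "no higher than 6"), and the aspirin descriptor values are illustrative inputs assumed here rather than taken from the article.

```python
# Optimal ranges of the six radar axes as quoted in the text.
# Keys are hypothetical names chosen for this sketch.
RADAR_RANGES = {
    "size_mw": (150.0, 500.0),             # g/mol
    "lipophilicity_xlogp3": (-0.7, 5.0),
    "solubility_logs": (-6.0, 0.0),        # assumption: |log S| <= 6
    "polarity_tpsa": (20.0, 130.0),        # square angstroms
    "saturation_fsp3": (0.25, 1.0),        # fraction of sp3 carbons
    "flexibility_rotb": (0, 9),            # rotatable bonds
}


def radar_check(descriptors):
    """Return per-axis pass/fail flags and whether all six axes pass."""
    flags = {
        name: low <= descriptors[name] <= high
        for name, (low, high) in RADAR_RANGES.items()
    }
    return flags, all(flags.values())


# Illustrative descriptor values for aspirin (assumed for this example).
aspirin = {
    "size_mw": 180.16,
    "lipophilicity_xlogp3": 1.19,
    "solubility_logs": -1.85,
    "polarity_tpsa": 63.6,
    "saturation_fsp3": 0.11,
    "flexibility_rotb": 3,
}

flags, inside_pink_zone = radar_check(aspirin)
print(flags)
print("entirely inside the optimal zone:", inside_pink_zone)
```

With these inputs only the saturation axis falls outside its range, so the molecule would not sit entirely inside the optimal zone even though five of six axes pass.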

Here, the two spaces are not mutually exclusive, and the outer gray space stands for compounds with properties suggesting low absorption and inadequate brain penetration.

Input file and output details: Users can draw one molecule at a time, followed by a SMILES conversion and prediction of the ADMET profile. If the user has SMILES information, then the SMILES of the molecules can be pasted into the SMILES editor, followed by prediction. We have checked around 50 molecules, and the program works fine. A series of qualitative and quantitative ADMET data is returned as output and can be exported as a .csv file. As a visualization output, the Bioavailability Radar plot and BOILED-Egg plot are accessible.

4.14. vNN-ADMET

The vNN-ADMET is a publicly accessible online platform capable of predicting 15 ADMET properties such as mutagenicity, cytotoxicity, cardiotoxicity, drug-induced liver injury, microsomal stability, and drug-drug interactions [79,80]. Here, the user needs to register with a valid e-mail ID to access the in silico tool for future applications at no cost. Along with prediction, users can prepare their own prediction models, employing classification (Build Classification Model tab) as well as regression-based models (Build Regression Model tab). The new models are generated based on the variable nearest neighbor (vNN) methodology. The vNN method is a form of the widely used k-nearest neighbor (k-NN) approach, which has several advantages over existing in silico methods. The vNN calculates the similarity distance between compounds based on their chemical structure and utilizes a distance threshold to express an applicability domain (AD). The vNN models implemented in this in silico tool achieved accuracies of >71%, where on average, the models predicted 75% of the chemicals in their datasets employing 10-fold CV.

Input file and output details: The input file format of the vNN-ADMET is SMILES (either a .csv or a .txt file that contains column headers labeled NAME or SMILES can be used), or one can draw the structure in the drawing editor. Once the job is submitted, the user receives e-mails when the job starts and when it is done. The output offers two kinds of prediction: the first one includes models with restricted AD and the second one is without restricted AD analysis. All output can be exported in the .csv format, and the implemented prediction models can be checked from the output window (Figure 4).

4.15. XenoSite

XenoSite [46] is a freely accessible in silico tool for human in vivo metabolism and reactivity prediction for small organic molecules [81]. The XenoSite server consists of multiple predictor models such as XenoSite P450 Metabolism [46], XenoSite Epoxidation [82], XenoSite Quinone Formation [83], XenoSite Reactivity [84], and XenoSite UGT [85]. The details about individual systems are discussed in Table 5. XenoSite employs topological and quantum chemical descriptors of molecules, including the reactivity of atomic sites generated by the SmartCyp software [86]. For example, we used Diphenhydramine (Benadryl) as an input molecule in the XenoSite in silico tool and, as an output, we got multiple metabolites in the form of epoxidation, quinone formation, and reactive ones (Figure 5).

Input file and output details: The SMILES of the query molecules can be pasted, or the SDF file can be uploaded. Multiple molecules can be predicted in one run. As an output, users get multiple visualization interpretations of epoxidation, stable and unstable oxygenation, hydrolysis, reduction, quinone formation, and reactive metabolites.

The type of endpoint prediction by the discussed in silico tools is summarized in Table 6 so that the user can easily decide which tool will be required for his/her analysis. The table will also be useful for comparison of similar endpoints by multiple tools for analysis purposes.

Figure 4. Schematic representation of vNN-ADMET in silico tool.
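The vNN idea described in Section 4.14, a similarity distance plus a distance threshold that doubles as an applicability domain, can be sketched in a few lines. This is an illustrative sketch under stated assumptions and not the vNN-ADMET implementation: fingerprints are represented as plain Python sets of hypothetical feature identifiers, the distance is taken as one minus the Tanimoto similarity, and the exponential distance weighting with the parameter names d0 (threshold) and h (smoothing) is an assumption of this example.

```python
import math


def tanimoto_distance(fp_a, fp_b):
    """Distance = 1 - Tanimoto similarity between two feature sets."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0
    return 1.0 - len(fp_a & fp_b) / union


def vnn_predict(query_fp, training_set, d0=0.6, h=0.3):
    """Distance-weighted average over all neighbors within d0.

    Returns None when no training compound lies within d0, i.e. the
    query is treated as outside the applicability domain.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for fingerprint, activity in training_set:
        distance = tanimoto_distance(query_fp, fingerprint)
        if distance <= d0:
            weight = math.exp(-((distance / h) ** 2))
            weighted_sum += weight * activity
            weight_total += weight
    if weight_total == 0.0:
        return None
    return weighted_sum / weight_total


# Hypothetical training data: (feature set, binary label).
training = [
    ({1, 2, 3, 4}, 1.0),
    ({1, 2, 3, 5}, 0.0),
    ({10, 11, 12}, 1.0),
]

inside_ad = vnn_predict({1, 2, 3, 4}, training)
outside_ad = vnn_predict({20, 21}, training)
print(inside_ad, outside_ad)
```

Unlike k-NN with a fixed k, the number of neighbors here varies with the query, and a query with no sufficiently similar training compound simply receives no prediction instead of an unreliable one.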

Table 5. Associated prediction systems under XenoSite.

Prediction system | Details
XenoSite P450 Metabolism | It can predict the way a drug is oxidized by cytochrome (CYP) enzymes. The model can offer predictions of regio-selectivity (which atoms on a molecule are likely to be oxidized by a given CYP enzyme), but it does not explicitly model selectivity (which molecules are substrates of a given CYP enzyme).
XenoSite Epoxidation | Epoxides are electrophilic and highly reactive due to ring tension and polarized carbon-oxygen bonds, and are commonly formed by CYP P450 acting on aromatic or double bonds. The server can identify the site of epoxidation (SOE), which can help in interpreting adverse effects related to reactive metabolites and can direct modifications that prevent epoxidation for safer drugs.
XenoSite Quinone Formation | The server can predict quinone formation through metabolic oxidation. Common quinone metabolites (quinone-methides, quinone-imines, and imine-methides) are electrophilic Michael acceptors, which are highly reactive. Modeling of quinone formation offers a quick screening tool for a key drug toxicity risk.
XenoSite Reactivity | The server can predict the reactivity of molecules to DNA and protein along with common nucleophiles such as cyanide and glutathione. The developed model can predict both molecular reactivity and the sites of reactivity (SOR). SOR modeling can be utilized not only to predict molecular reactivity but also to propose structural modifications that minimize toxicity while maintaining drug efficacy.
XenoSite UGT | The XenoSite UGT model predicts the sites of uridine diphosphate glucuronosyltransferase (UGT) mediated metabolism on drug-like molecules, which is important for lead optimization of drug candidates.

5. Open access databases with integrated drug-likeness and ADMET profiling

Open access databases are the first source for building an effective in silico tool. Over the years, drug and drug-candidate databases have made incredible progress, which has helped drug-likeness and ADMET research by integrating important physicochemical/ADMET-associated properties within the database [17,87–93]. DrugBank and ChEMBL offer drug-likeness properties for individual chemicals. On the contrary, admetSAR, DrugBank, and PKKB consist of ADMET data, and BindingDB consists of interactions of molecules with potential target proteins. These databases are important resources for in silico modeling as well as artificial intelligence (AI) modeling due to their open access nature and up-to-date data. Based on the user's requirement, these databases can be strategically used for the ADMET profiling of new drug candidates in drug discovery. The real challenge is to keep these databases up to date while they add many more ADMET endpoints in the coming years to increase their reliability and precision. A list of the most commonly used databases integrated with drug-likeness and ADMET profiling is given in Table 7.

6. Expert opinion

The accuracy of ADMET profile prediction depends on the types of datasets and the modeling tools used to develop the models. Thus, experimental errors in the dataset, poor-quality models, and the applicability domain (AD) are major concerns related to the reliability of prediction. All three factors can lead to false positive or false negative classification of a query chemical.

To build a reliable in silico tool one must strictly adhere to the following sets of rules:

(1) The dataset of choice should be highly diverse, large enough to be considered a global dataset, and should carry the least possible experimental error in the endpoint analysis. If errors exist in the experimental data from the beginning, the in silico model can never replicate the true outcome of an in vivo or in vitro study. Combining multiple datasets into a single dataset is

Figure 5. Prediction outcome of Diphenhydramine (Benadryl) in different prediction systems of XenoSite.

Table 6. Prediction endpoints for the discussed 15 in silico tools along with their implemented modeling methods for prediction models [Details about the parameters can be found in Table 1].

In silico tool | Physicochemical properties | Drug-likeness rules | ADMET parameters | Modeling methods
ADMETlab | 3 [logP, logD, logS etc.] | Lipinski, Ghose, Oprea, Veber, Varma | 31 (CacoP, BBB, HIA, OB, 10 CYPI/S, P-gp±, PPB, VD, Cl, t1/2, SkinSen, AOT, AT, Herg, HepTox, etc.) | SVM, RF, RP, PLS, NB, DT
admetSAR | 5 [MW, logP, HBA, HBD, RB] | Lipinski | 50 [CacoP, BBB, HIA, OB, P-gp±, PPB, BPB, CYPI/S, Cl, t1/2, AOT, AT, CAR, Herg, DILI, EI, MT, ED, CAR etc.] | CNN, RF, SVM, kNN
CypReact | - | - | 9 CYPI | LBM
CypRules | - | - | 5 CYPI | R C5.0
DrugMint | 7 [MW, logP, HBA, HBD, RB etc.] | DrugMint models | - | SVM, PCA, MACCS keys
DruLiTo | 15 [MW, HBA, HBD, RB, nA, nHA, MR, logP etc.] | Lipinski, Ghose, Veber | - | SVM, QSAR
FAF-Drugs4 | 17 [MW, HBA, HBD, RB, TC, MPSA, logP etc.] | Lipinski, Egan, Veber, PhysChem | Yes, with PAINS and structural alerts | QED
Hit Dexter 2.0 | MW, logP | - | Dark chemical matter, aggregators for flagging frequent hitters and undesired substructures, and potential PAINS | Similarity and rule-based ML
MetStabOn | - | - | Metabolic stability on basis of Cl and t1/2 | SMOreg, SMO, RF, k-NN, DT, NB
pkCSM | 6 [MW, logP, RB, HBA, HBD, TPSA] | Lipinski | 28 [CacoP, HIA, P-gp±, VD, BBB, 7 CYPI/S, Cl, AT, Herg, AOT, AT, SkinSen, HepTox etc.] | LBM
Pred-hERG | - | - | Herg | Binary and multi-class QSAR
Pred-Skin | - | - | SkinSen | QSAR, Bayesian, RA
SwissADME | 19 [MW, HBA, HBD, RB, nA, nHA, MR, logP, logS etc.] | Lipinski, Ghose, Veber, Egan, Muegge | 10 (5 CYPI/S, SP), HIA, BBB, PAINS, Synthetic accessibility, Structural alert | MLR, SVM
vNN-ADMET | - | - | 15 [MT, CytTox, CarTox, HepTox, Herg, 5 CYPI/S, P-gp±, DILI, DDI etc.] | vNN
XenoSite | - | - | 5 [SOM, Epoxidation, QF, Reactivity and UGT mediated SOM] | NN

Table 7. Open access databases integrated with Drug-Likeness and ADMET profiling.

Database | Entries | No. of physicochemical/ADMET-associated properties provided | Structure formats | Reference
admetSAR | 96,000 | 5 physicochemical and 49 ADMET properties | SMILES | [87]
BindingDB | 1,854,767 binding data (7,493 protein targets and 820,433 small molecules) | ADMET-related results found within the 1.2 million pieces of binding data | SMILES, SDF | [88]
ChEMBL | >15 million | 16 physicochemical properties and 218,412 ADMET-related assay data with Lipinski parameters | SMILES, SDF | [17]
CompTox Chemicals Dashboard | 875,000 | 16 physicochemical and 6 ADMET properties | SMILES | [89]
DrugBank | 13,339 | 18 physicochemical properties and ADMET assay results followed by 22 predicted ADMET features. Drug-likeness properties included along with bioavailability score | SMILES, SDF, MOL, PDB | [90]
IMPPAT | 9596 | 20 physicochemical and 22 predicted ADMET features along with drug-likeness properties | SMILES, MOL, MOL2, SDF, PDB, PDBQT | [91]
PKKB | 1685 | 12 physicochemical and 15 ADMET properties | SMILES, SDF | [92]
Super Natural II | 325,508 | 12 physicochemical properties and Toxicity class | SMILES, MOL | [93]

possible only when the experimental protocols, test species, and endpoints are completely identical.
(2) The in silico models can be classification- and regression-based chemometric models and machine learning models. Based on the type of available experimental data and the number of data points, one must decide which form of model needs to be prepared. Another imperative point is the validation of the model by employing an external data set to confirm the predictive ability of the model. Most importantly, the developed models need to be validated stringently, followed by checking the robustness of the model through a Y-scrambling or randomization study. Many times, consensus modeling of a single dataset by multiple approaches offers better prediction.
(3) The final and foremost point is the idea of AD. As no model can predict all chemicals of the universe, before predicting any query chemical, one needs to check whether it resides inside the AD of the developed model or not. If the query chemical is outside of the AD, then the predicted value for that endpoint is unreliable. Therefore, the integration of AD analysis along with the prediction data is an important criterion for choosing the best in silico tool for ADMET prediction.
(4) Along with the above-mentioned technical rules, user friendliness and ease of interpretation of the obtained results are other major criteria for selecting good open access in silico tools.
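The Y-scrambling (randomization) check named in rule (2) can be illustrated with a very small example. This is a minimal sketch under stated assumptions, not a procedure taken from any of the reviewed tools: it uses a single hypothetical descriptor with a one-variable linear fit, synthetic data invented for the illustration, and the helper names r_squared and y_scrambling are hypothetical. The idea is that a model fitted to the real responses should score far better than models fitted to randomly permuted responses.

```python
import random


def r_squared(x, y):
    """Squared Pearson correlation, i.e. R2 of a one-descriptor linear fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return (sxy * sxy) / (sxx * syy)


def y_scrambling(x, y, n_rounds=100, seed=0):
    """Return R2 on the real data and the mean R2 over permuted responses."""
    rng = random.Random(seed)
    true_r2 = r_squared(x, y)
    scrambled = []
    for _ in range(n_rounds):
        y_permuted = list(y)
        rng.shuffle(y_permuted)
        scrambled.append(r_squared(x, y_permuted))
    return true_r2, sum(scrambled) / n_rounds


# Synthetic data (invented for this sketch): a linear trend with small noise.
descriptor = [float(i) for i in range(1, 21)]
response = [2.0 * xi + 1.0 + (0.5 if i % 2 == 0 else -0.5)
            for i, xi in enumerate(descriptor)]

true_r2, mean_scrambled_r2 = y_scrambling(descriptor, response)
print(f"R2 (real responses):      {true_r2:.3f}")
print(f"mean R2 (scrambled):      {mean_scrambled_r2:.3f}")
```

A large gap between the two numbers suggests the fitted relationship is not a chance correlation; a small gap would be a warning sign, regardless of how good the original fit looks.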

It is now up to the researcher to find and select the ideal and/or best possible in silico tools for his/her study from the mentioned ones.

The best possible approaches for selecting tools are the following:
a) The user needs to know what he/she is looking to predict. For example, one researcher may need a complete ADMET profile, while another needs only the toxicity profile of a query molecule. Thus, users need to check each endpoint integrated into those in silico tools. If these endpoints satisfy their requirements, then they can go ahead and check how the models were developed, including the datasets and the modeling tools employed.
b) If the user is satisfied with the type of datasets and models that will be used to predict the query compound, then he/she needs to check whether the model offers any AD decision or not. If the AD decision is available for each endpoint prediction, then the user can confidently decide and interpret the data.
c) The obtained output is another important factor. A qualitative model predicts in the form of a binary classification (Yes/No, Toxic/nontoxic). On the contrary, a quantitative model offers predictions in the form of numerical values. Therefore, the first and foremost criterion is to decide upon the kind of output required by the user.
d) Finally, as each server considers different datasets and different tools to model specific endpoints, no model is ultimate or self-sufficient for all endpoints. Therefore, we suggest that the user apply multiple in silico tools for prediction purposes and compare the results to identify the most probable and similar predictions before the conclusion is drawn.

Funding

The authors are supported by the National Science Foundation (via grants NSF/CREST HRD-1547754 and NSF/RISE HRD-1547836).

Reviewer disclosures

Peer reviewers on this manuscript have no relevant financial or other relationships to disclose.

Declaration of interest

The authors have no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed.

ORCID

Supratik Kar http://orcid.org/0000-0002-9411-2091

References

Papers of special note have been highlighted as either of interest (•) or of considerable interest (••) to readers.

1. Waring MJ, Arrowsmith J, Leach AR, et al. An analysis of the attrition of drug candidates from four major pharmaceutical companies. Nat Rev Drug Discov. 2015;14(7):475–486.
2. Fleming N. How artificial intelligence is changing drug discovery. Nature. 2018;557:S55–S57.
• Important in terms of use of artificial intelligence in drug discovery.
3. Ferreira LLG, Andricopulo AD. ADMET modeling approaches in drug discovery. Drug Discov Today. 2019;24:1157–1165.
4. Kinch MS, Griesenauer RH. 2017 in review: FDA approvals of new molecular entities. Drug Discov Today. 2018;23:1469–1473.
5. Jia C-Y, Li J-Y, Hao G-F, et al. A drug-likeness toolbox facilitates ADMET study in drug discovery. Drug Discov Today. 2019;25:248–258.
•• The review highlights the role of ADMET and drug-likeness in the drug discovery.
6. Kar S, Leszczynski J. Recent advances of computational modeling for predicting drug metabolism: a perspective. Curr Drug Metab. 2017;18:1106–1122.
7. Lucas AJ, Sproston JL, Barton P, et al. Estimating human ADME properties, pharmacokinetic parameters and likely clinical dose in drug discovery. Expert Opin Drug Discov. 2019;14:1313–1327.
8. Martiny VY, Carbonell P, Chevillard F, et al. Integrated structure- and ligand-based in silico approach to predict inhibition of cytochrome P450 2D6. Bioinformatics. 2015;31:3930–3937.
9. Bhhatarai B, Walters WP, Hop CECA, et al. Opportunities and challenges using artificial intelligence in ADME/Tox. Nat Mater. 2019;18:418–422.
10. CASE ULTRA software. Available from: http://www.multicase.com/case-ultra-models
11. DEREK software. Available from: http://www.lhasalimited.org/
12. META-PC software. Available from: http://www.multicase.com/meta-pc
13. METEOR software. Available from: http://www.lhasalimited.org/
14. PASS. Available from: http://www.pharmaexpert.ru/
15. GUSAR. Available from: http://www.way2drug.com/gusar/index.html
16. Dong J, Wang N-N, Yao Z-J, et al. ADMETlab: a platform for systematic ADMET evaluation based on a comprehensively collected ADMET database. J Cheminform. 2018;10:29.
17. Yang H, Lou C, Sun L, et al. admetSAR 2.0: web-service for prediction and optimization of chemical ADMET properties. Bioinformatics. 2019;35:1067–1069.
18. Pires DEV, Blundell TL, Ascher DB. pkCSM: predicting small-molecule pharmacokinetic and toxicity properties using graph-based signatures. J Med Chem. 2015;58:4066–4072.
19. Daina A, Michielin O, Zoete V. SwissADME: a free web tool to evaluate pharmacokinetics, drug-likeness and medicinal chemistry friendliness of small molecules. Sci Rep. 2017;7:42717.
20. Roy K, Kar S, Das RN. Understanding the basics of QSAR for applications in pharmaceutical sciences and risk assessment. New York (NY): Academic Press, Elsevier; 2015.
•• One of the fundamental books related to the QSAR study.
21. Roy K, Kar S, Das RN. A primer on QSAR/QSPR Modeling. Berlin: Springer; 2015.
22. Kar S, Sanderson H, Roy K, et al. Ecotoxicological assessment of pharmaceuticals and personal care products using predictive toxicology approaches. Green Chem. 2020;22:1458–1516.
•• Important review on the ecotoxicological assessment of pharmaceuticals and personal care products.
23. Kar S, Roy K. Risk assessment for ecotoxicity of pharmaceuticals - an emerging issue. Expert Opin Drug Saf. 2012;11:235–274.
24. Agoram B, Woltosz WS, Bolger MB. Predicting the impact of physiological and biochemical processes on oral drug bioavailability. Adv Drug Deliv Rev. 2001;50:S41–67.
25. Ghose AK, Viswanadhan VN, Wendoloski JJ. A knowledge-based approach in designing combinatorial or medicinal chemistry libraries for drug discovery. 1. A qualitative and quantitative characterization of known drug databases. J Comb Chem. 1999;1:55–68.
26. Veber DF, Johnson SR, Cheng HY, et al. Molecular properties that influence the oral bioavailability of drug candidates. J Med Chem. 2002;45:2615–2623.

27. Egan WJ, Merz KM, Baldwin JJ. Prediction of drug absorption using multivariate statistics. J Med Chem. 2000;43:3867–3877.
28. Muegge I, Heald SL, Brittelli D. Simple selection criteria for drug-like chemical matter. J Med Chem. 2001;44:1841–1846.
29. Martin YC. A bioavailability score. J Med Chem. 2005;48:3164–3170.
30. Hann MM, Keserű GM. Finding the sweet spot: the role of nature and nurture in medicinal chemistry. Nat Rev Drug Discov. 2012;11:355–365.
31. Teague S, Davis A, Leeson P, et al. The design of lead like combinatorial libraries. Angew Chem Int Ed Engl. 1999;38:3743–3748.
32. Fukunishi Y, Kurosawa T, Mikami Y, et al. Prediction of synthetic accessibility based on commercially available compound databases. J Chem Inf Model. 2014;54:3259–3267.
● Important study on synthetic accessibility of drug candidates.
33. Mishra NK, Agarwal S, Raghava GP. Prediction of cytochrome P450 isoform responsible for metabolizing a drug molecule. BMC Pharmacol. 2010;10:8.
34. Sedykh A, Fourches D, Duan J, et al. Human intestinal transporter database: QSAR modeling and virtual profiling of drug uptake, efflux and interactions. Pharm Res. 2015;30:996–1007.
35. Ali J, Camilleri P, Brown MB, et al. Revisiting the general solubility equation: in silico prediction of aqueous solubility incorporating the effect of topographical polar surface area. J Chem Inf Model. 2012;52:420–428.
36. van Waterschoot RAB, Schinkel AH. A critical analysis of the interplay between cytochrome P450 3A and P-glycoprotein: recent insights from knockout and transgenic mice. Pharmacol Rev. 2011;63:390–410.
37. Wolf CR, Smith G, Smith RL. Science, medicine, and the future: pharmacogenetics. Br Med J. 2000;320:987–990.
38. Di L. The role of drug metabolizing enzymes in clearance. Expert Opin Drug Metab Toxicol. 2014;10:379–393.
39. Hollenberg PF. Characteristics and common properties of inhibitors, inducers, and activators of CYP enzymes. Drug Metab Rev. 2002;34:17–35.
40. Orlandi M, Toste FD, Sigman MS. Multidimensional correlations in asymmetric catalysis through parameterization of uncatalyzed transition states. Angew Chem Int Ed. 2017;56:14080–14084.
41. Yu M, Lee H, Park A, et al. In silico prediction of potential chemical reactions mediated by human enzymes. BMC Bioinformatics. 2018;19:207.
42. Campos KR, Coleman PJ, Alvarez JC, et al. The importance of synthetic chemistry in the pharmaceutical industry. Science. 2019;363:eaat0805.
43. Lagorce D, Sperandio O, Baell JB, et al. FAF-Drugs3: a web server for compound property calculation and chemical library design. Nucleic Acids Res. 2015;43(W1):W200–7.
44. Enslein K, Gombar VK, Blake BW. International commission for protection against environmental mutagens and carcinogens. Use of SAR in computer-assisted prediction of carcinogenicity and mutagenicity of chemicals by the TOPKAT program. Mutat Res. 1994;305:47–61.
45. Tian S, Djoumbou-Feunang Y, Greiner R, et al. CypReact: a software tool for in silico reactant prediction for human cytochrome P450 enzymes. J Chem Inf Model. 2018;58:1282–1291.
46. Zaretzki J, Matlock M, Swamidass SJ. XenoSite: accurately predicting CYP-mediated sites of metabolism with neural networks. J Chem Inf Model. 2013;53:3373–3383.
47. Straker RN, Peng Q, Mekareeya A, et al. Computational ligand design in enantio- and diastereoselective ynamide [5+2] cycloisomerization. Nat Commun. 2016;7:10109.
48. DiRocco DA, Ji Y, Sherer EC, et al. A multifunctional catalyst that stereoselectively assembles prodrugs. Science. 2017;356:426–430.
49. Coley CW, Barzilay R, Jaakkola TS, et al. Prediction of organic reaction outcomes using machine learning. ACS Cent Sci. 2017;3:434–443.
50. Ahneman DT, Estrada JG, Lin S, et al. Predicting reaction performance in C-N cross-coupling using machine learning. Science. 2018;360:186–190.
51. Grimm D. EPA plan to end animal testing splits scientists. Science. 2019;365:1231.
● Important document regarding the ban on mammal testing by USEPA.
52. ADMETlab webserver; [cited 2020 Jun 11]. Available from: http://admet.scbdd.com/
53. admetSAR webserver; [cited 2020 Jun 11]. Available from: http://lmmd.ecust.edu.cn/admetsar2
54. CypReact; [cited 2020 Jun 11]. Available from: https://bitbucket.org/Leon_Ti/cypreact/src/master/
55. Shao CY, Su BH, Tu YS, et al. CypRules: a rule-based P450 inhibition prediction server. Bioinformatics. 2015;31:1869–1871.
56. CypRules Server; [cited 2020 Jun 11]. Available from: https://cyprules.cmdm.tw/
57. DrugMint is a web server; [cited 2020 Jun 11]. Available from: http://crdd.osdd.net/oscadd/drugmint/
58. Dhanda SK, Singla D, Mondal AK, et al. DrugMint: A in silico tool for predicting and designing of drug-like molecules. Biol Direct. 2013;8:28.
59. Drug-Likeness Tool (DruLiTo); [cited 2020 Jun 11]. Available from: http://www.niper.gov.in/pi_dev_tools/DruLiToWeb/DruLiTo_index.html
60. Lagorce D, Bouslama L, Becot J, et al. FAF-Drugs4: free ADME-tox filtering computations for chemical biology and early stages drug discovery. Bioinformatics. 2017;33:3658–3660.
61. FAF-Drugs tool; [cited 2020 Jun 11]. Available from: http://fafdrugs4.mti.univ-paris-diderot.fr/
62. Stork C, Chen Y, Šícho M, et al. Hit Dexter 2.0: machine-learning models for the prediction of frequent hitters. J Chem Inf Model. 2019;59:1030–1043.
63. Hit Dexter 2.0; [cited 2020 Jun 11]. Available from: http://hitdexter2.zbh.uni-hamburg.de
64. Podlewska S, Kafel R. MetStabOn-online platform for metabolic stability predictions. Int J Mol Sci. 2018;19:1040.
65. MetStabOn platform; [cited 2020 11]. http://skandal.if-pan.krakow.pl/met_stab_pred/
66. pkCSM tool; [cited 2020 Jun 11]. Available from: http://biosig.unimelb.edu.au/pkcsm/
67. Braga RC, Alves VM, Silva MF, et al. Pred-hERG: a novel web-accessible computational tool for predicting cardiac toxicity. Mol Inform. 2015;34:698–701.
68. Pred-Herg tool; [cited 2020 Jun 11]. Available from: http://predherg.labmol.com.br/
69. Braga RC, Alves VM, Muratov EN, et al. Pred-skin: a fast and reliable web application to assess skin sensitization effect of chemicals. J Chem Inf Model. 2017;57:1013–1017.
70. Alves VM, Capuzzi SJ, Braga RC, et al. A perspective and a new integrated computational strategy for skin sensitization assessment. ACS Sustain Chem Eng. 2018;6:2845–2859.
71. Pred-Skin; [cited 2020 Jun 11]. Available from: http://labmol.com.br/predskin/
72. SwissADME webserver; [cited 2020 Jun 11]. Available from: http://swissadme.ch/
73. Zoete V, Daina A, Bovigny C, et al. SwissSimilarity: a web tool for low to ultra high throughput ligand-based virtual screening. J Chem Inf Model. 2016;56:1399–1404.
74. Gfeller D, Grosdidier A, Wirth M, et al. SwissTargetPrediction: a web server for target prediction of bioactive small molecules. Nucleic Acids Res. 2014;42:W32–W38.
75. Grosdidier A, Zoete V, Michielin O. SwissDock, a protein-small molecule docking web service based on EADock DSS. Nucleic Acids Res. 2011;39:W270–7.
76. Wirth M, Zoete V, Michielin O, et al. SwissBioisostere: a database of molecular replacements for ligand design. Nucleic Acids Res. 2013;41:D1137–43.
77. Zoete V, Cuendet MA, Grosdidier A, et al. SwissParam: a fast force field generation tool for small organic molecules. J Comput Chem. 2011;32:2359–2368.

78. Daina A, Zoete V. A BOILED-Egg to predict gastrointestinal absorption and brain penetration of small molecules. Chem Med Chem. 2016;11:1117–1121.
79. Schyman P, Liu R, Desai V, et al. vNN web server for ADMET predictions. Front Pharmacol. 2017;8:889.
80. vNN-ADMET tool; [cited 2020 Jun 11]. Available from: https://vnnadmet.bhsai.org/
81. XenoSite tool; [cited 2020 Jun 11]. Available from: https://swami.wustl.edu/xenosite/submit
82. Hughes TB, Miller GP, Swamidass SJ. Modeling epoxidation of drug-like molecules with a deep machine learning network. ACS Cent Sci. 2015;1:168–180.
83. Hughes TB, Swamidass SJ. Deep learning to predict the formation of quinone species in drug metabolism. Chem Res Toxicol. 2017;30:642–656.
84. Hughes TB, Miller GP, Swamidass SJ. Site of reactivity models predict molecular reactivity of diverse chemicals with glutathione. Chem Res Toxicol. 2015;28:797–809.
85. Dang NL, Hughes TB, Krishnamurthy V, et al. A simple model predicts UGT-mediated metabolism. Bioinformatics. 2016;32:3183–3189.
86. Rydberg P, Gloriam DE, Zaretzki J, et al. SMARTCyp: A 2D method for prediction of cytochrome P450-mediated drug metabolism. ACS Med Chem Lett. 2010;1:96–100.
87. Wishart DS, Feunang YD, Guo AC, et al. DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res. 2018;46:D1074–82.
88. Mendez D, Gaulton A, Bento AP, et al. (2019) ChEMBL: towards direct deposition of bioassay data. Nucleic Acids Res. 2019;47:D930–40.
89. Cao D, Wang J, Zhou R, et al. ADMET evaluation in drug discovery. 11. PharmacoKinetics Knowledge Base (PKKB): a comprehensive database of pharmacokinetic and toxic properties for drugs. J Chem Inf Model. 2012;52:1132–1137.
90. Williams AJ, Grulke CM, Edwards J, et al. The comptox chemistry dashboard: a community data resource for environmental chemistry. J Cheminf. 2017;9:61.
91. Gilson MK, Liu T, Baitaluk M, et al. (2016) BindingDB in 2015: A public database for medicinal chemistry, computational chemistry and systems . Nucleic Acids Res. 2016;44:D1045–53.
92. Banerjee P, Erehman J, Gohlke BO, et al. Super Natural II-a database of natural products. Nucleic Acids Res. 2015;43:D935–39.
93. Mohanraj K, Karthikeyan BS, Vivek-Ananth RP, et al. IMPPAT: a curated database of Indian medicinal plants, phytochemistry and therapeutics. Sci Rep. 2018;8:4329.