Journal of Molecular Graphics and Modelling 67 (2016) 102–110
Journal homepage: www.elsevier.com/locate/JMGM

Towards cheminformatics-based estimation of drug therapeutic index: Predicting the protective index of anticonvulsants using a new quantitative structure-index relationship approach

Shangying Chen (a), Peng Zhang (a), Xin Liu (b), Chu Qin (a), Lin Tao (a), Cheng Zhang (a), Sheng Yong Yang (c), Yu Zong Chen (a,∗), Wai Keung Chui (a,∗)

(a) Department of Pharmacy, National University of Singapore, 18 Science Drive 4, Singapore 117543, Singapore
(b) Shanghai Applied Protein Technology Co. Ltd, Research Center for Proteome Analysis, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Shanghai, 200233, China
(c) State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, West China Medical School, Sichuan University, Sichuan, China

Article history: Received 8 March 2016; Received in revised form 17 May 2016; Accepted 18 May 2016; Available online 19 May 2016

Keywords: Quantitative structure–activity relationship; Quantitative structure-index relationship; Support vector regression; Protective index; Anticonvulsant

Abstract

The overall efficacy and safety profile of a new drug is partially evaluated by the therapeutic index in clinical studies and by the protective index (PI) in preclinical studies. In-silico predictive methods may facilitate the assessment of these indicators. Although QSAR and QSTR models can be used for predicting PI, their predictive capability has not been evaluated. To test this capability, we developed QSAR and QSTR models for predicting the activity and toxicity of anticonvulsants at accuracy levels above the literature-reported threshold (LT) of good QSAR models as tested by both the internal 5-fold cross validation and external validation method. These models showed significantly compromised PI predictive capability due to the cumulative errors of the QSAR and QSTR models. Therefore, in this investigation a new quantitative structure-index relationship (QSIR) model was devised and it showed improved PI predictive capability that superseded the LT of good QSAR models. The QSAR, QSTR and QSIR models were developed using support vector regression (SVR) method with the parameters optimized by using the greedy search method. The molecular descriptors relevant to the prediction of anticonvulsant activities, toxicities and PIs were analyzed by a recursive feature elimination method. The selected molecular descriptors are primarily associated with the drug-like, pharmacological and toxicological features and those used in the published anticonvulsant QSAR and QSTR models. This study suggested that QSIR is useful for estimating the therapeutic index of drug candidates.

© 2016 Published by Elsevier Inc.

1. Introduction

Therapeutic efficacy and safety are the major determinants of the quality of drug candidates [1–3]. Drug discovery productivity can be improved by early identification of the candidates with favorable therapeutic efficacy and toxicity profiles [4,5]. An important and routinely evaluated indicator in clinical studies is the therapeutic index (TI), which is the highest non-toxic dose divided by the desired therapeutic dose for a given indication [6]. In various drug discovery stages, TI can be estimated from the ratio of off-target to target potencies [6–8], in-vitro toxicity to activity levels [6,9], or animal safety to efficacy endpoints [6,10,11].
Abbreviations: TI, therapeutic index; PI, protective index; TD50, toxic dose for 50% of the population; ED50, the minimum effective dose for 50% of the population; QSAR, quantitative structure activity relationship; QSTR, quantitative structure toxicity relationship; QSIR, quantitative structure index relationship; LT, literature-reported threshold; SVR, support vector regression; SVM, support vector machine; 5FCV, 5-fold cross validation; EV, external validation; RFE, recursive feature elimination; RBF, radial basis function; RMSE, root mean square error; k-MCA, k-Means cluster analysis.
∗ Corresponding authors. E-mail addresses: [email protected] (Y.Z. Chen), [email protected] (W.K. Chui).
http://dx.doi.org/10.1016/j.jmgm.2016.05.006
1093-3263/© 2016 Published by Elsevier Inc.

In preclinical studies, TI has been frequently assessed by measuring the protective index (PI), which is the toxic dose for 50% of the population (TD50) divided by the minimum effective dose for 50% of the population (ED50) [12].

Early assessment of TI or PI may be facilitated by in-silico prediction and modeling methods [13,14]. In particular, the well-developed quantitative structure activity relationship (QSAR) [15–19] and quantitative structure toxicity relationship (QSTR) [20–23] models for predicting efficacy and toxicity levels may be used together for predicting TI or PI (herein called the C-QSAR-QSTR method). In this approach, QSAR predicts the activity logA, QSTR predicts the toxicity logT, and PI is given by logPI = logT − logA. However, to date the C-QSAR-QSTR method has not been tested or used for directly predicting TI or PI, even though QSAR and QSTR models have been used for guiding the improvement of TI [24,25].
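As an illustration of the C-QSAR-QSTR combination (a sketch, not the code used in this study), the following Python snippet computes logPI = logT − logA from two sets of predictions and reports the RMSE of each quantity. All values are simulated and hypothetical: the sample size, the value ranges, the Gaussian error widths (0.3 and 0.4) and the function names `rmse` and `combined_log_pi` are assumptions introduced here. Because the error in logPI is the difference of the two model errors, the triangle inequality guarantees that its RMSE cannot exceed the sum of the two individual RMSEs; for independent zero-mean errors it is close to the square root of the sum of their squares.

```python
import math
import random


def rmse(errors):
    """Root mean square of a list of prediction errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def combined_log_pi(log_t, log_a):
    """C-QSAR-QSTR combination: logPI = logT - logA for each compound."""
    return [t - a for t, a in zip(log_t, log_a)]


# Hypothetical "true" values and simulated model errors (not data from the paper).
rng = random.Random(0)
n = 1000
log_a_true = [rng.uniform(1.0, 2.5) for _ in range(n)]
log_t_true = [rng.uniform(2.0, 3.5) for _ in range(n)]
err_a = [rng.gauss(0.0, 0.3) for _ in range(n)]  # assumed QSAR error width
err_t = [rng.gauss(0.0, 0.4) for _ in range(n)]  # assumed QSTR error width

# Simulated model predictions = true value + error.
log_a_pred = [v + e for v, e in zip(log_a_true, err_a)]
log_t_pred = [v + e for v, e in zip(log_t_true, err_t)]

# Error of the combined logPI prediction.
log_pi_true = combined_log_pi(log_t_true, log_a_true)
log_pi_pred = combined_log_pi(log_t_pred, log_a_pred)
err_pi = [p - t for p, t in zip(log_pi_pred, log_pi_true)]

e_a, e_t, e_pi = rmse(err_a), rmse(err_t), rmse(err_pi)
print(f"E_A={e_a:.3f}  E_T={e_t:.3f}  E_PI={e_pi:.3f}  E_A+E_T={e_a + e_t:.3f}")
```

In this simulated setting the combined error is larger than either individual error while staying below their sum, which is the worst case reached only when the two model errors are perfectly anti-correlated.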
The prediction of PI is complicated by the difficulty of modeling the complex pharmacodynamic and toxicological profiles of drug candidates [26], and by the noise that arises from varying experimental conditions and measurement errors. The currently developed QSAR and QSTR models (Supplementary Table S1) produce substantial predictive errors, as reflected in squared correlation coefficients in the ranges of 0.45–0.97 and 0.38–0.95 respectively (most models in the range of 0.50–0.80). While the predictive errors of good QSAR and QSTR models ($R^2_{\mathrm{pred}} > 0.60$) may be sufficiently small to enable fair prediction of logA and logT respectively [27,28], their cumulative errors may become large enough to compromise the prediction of TI/PI. If the root mean square errors (RMSE) of a QSAR and a QSTR model are $E_A$ and $E_T$ respectively, the RMSE for computing logPI ($= (\log T \pm E_T) - (\log A \pm E_A)$) may be as large as $E_A + E_T$.

Hence, before C-QSAR-QSTR is used for predicting TI/PI, its predictive capability must be evaluated. If this capability is severely compromised, a new method will be needed. These issues were investigated in this work. First, we aimed to develop good QSAR and QSTR models for predicting the in-vivo anticonvulsant activity and neurotoxicity at accuracy levels of $R^2_{\mathrm{pred}} > 0.6$, the threshold for good QSAR models [27,28], based on both the 5-fold cross validation (5FCV) and the external validation (EV) tests. This was followed by determining the PI predictive capability using the corresponding C-QSAR-QSTR method. In order to cover the diverse structures and activity/toxicity mechanisms of the anticonvulsants, a machine learning regression method, namely support vector regression (SVR), was used to develop the QSAR and QSTR models [29–33]. Secondly, a new SVR quantitative structure index relationship (QSIR) model for predicting PI was tested on the basis that the molecular descriptors of the anticonvulsants could be non-linearly correlated to their PI values.

These studies were conducted on anticonvulsants because of the availability of the in-vivo anticonvulsant activity and toxicity data for a higher number of active compounds (Supplementary Table S2). These anticonvulsants have been developed for the treatment of epilepsy and seizures [34], and are routinely tested for neural toxicity [35]. SVR was used for developing the QSAR, QSTR and QSIR models because of its consistently good performance in classifying a variety of pharmacological properties for diverse structures [29–33], particularly in the development of QSAR and QSTR models [36–39]. Moreover, SVR models may, to some extent, tolerate sample redundancy [40–42] and have lower over-fitting risks [43,44]. To further probe which molecular descriptors are relevant for the prediction of the PI values of anticonvulsants, the recursive fea-

Fig. 1. Flowchart of SVR model development.

neural toxicity data and protective index of the anticonvulsants from the literature. Each collected anticonvulsant was represented by molecular descriptors computed from MoDEL [48], PROFEAT [49] and PaDEL [50]. The collected 481 anticonvulsants were then rationally divided into training and independent test sets by using the k-Means cluster analysis (k-MCA) method [51–53]. Specifically, the k-Means clustering algorithm was implemented in our own FORTRAN code. In this work, the 481 anticonvulsants were clustered into 20 clusters by the k-Means clustering method such that each cluster contains 1–65 (mostly 10–65) anticonvulsants of similar structures or molecular scaffolds. Approximately 20% of the anticonvulsants in each cluster were randomly selected for constructing a test set of 78 anticonvulsants. The remaining 403 compounds were used as the training set for SVR model development. Next, the SVR QSAR, QSTR and QSIR models were constructed by using the -SVR module of the LibSVM package (version 3.18) [54], with the SVR parameters determined by the greedy search method [55] (by means of our own Perl script in conjunction with LibSVM). The subsets of the molecular descriptors most relevant for the QSAR, QSTR and QSIR prediction of the activity, toxicity and PI were selected by using the RFE method [45,46].

To ensure the predictive ability of the SVR models, both the internal 5-fold cross validation (5FCV) method and the external validation method were performed. In the internal 5FCV, the compounds of the training set were randomly divided into five sets of approximately equal size; each set was used in turn as a test set with the other four as the training set, and the average performance over the five test sets was used for evaluating the SVR models.
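The cluster-wise test-set selection and the internal 5FCV partition described above can be sketched as follows. This is a minimal standard-library Python illustration, not the FORTRAN or Perl code used in this study: the function names `split_by_cluster` and `k_fold_indices`, the round-half-up rule used for "approximately 20%", the fixed random seeds and the toy cluster labels are all assumptions, and the clustering itself and the SVR fitting are omitted.

```python
import random
from collections import defaultdict


def split_by_cluster(cluster_labels, test_fraction=0.2, seed=0):
    """Randomly pick about `test_fraction` of each cluster for the test set.

    Returns (train_indices, test_indices), both sorted.
    The round-half-up rule below is an assumption, not taken from the paper.
    """
    rng = random.Random(seed)
    members = defaultdict(list)
    for idx, label in enumerate(cluster_labels):
        members[label].append(idx)
    train, test = [], []
    for label in sorted(members):
        idxs = members[label][:]
        rng.shuffle(idxs)
        n_test = int(len(idxs) * test_fraction + 0.5)  # round half up
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return sorted(train), sorted(test)


def k_fold_indices(indices, k=5, seed=0):
    """Shuffle `indices` and yield (fit, held_out) index lists for k-fold CV."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # k folds of near-equal size
    for i in range(k):
        held_out = folds[i]
        fit = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield fit, held_out


# Toy example: 50 compounds with made-up labels for 5 clusters of 10 each.
labels = [i % 5 for i in range(50)]
train_idx, test_idx = split_by_cluster(labels)
folds = list(k_fold_indices(train_idx))
print(len(train_idx), len(test_idx), [len(held) for _, held in folds])
```

In an actual workflow, a model would be fitted on each `fit` list and scored on the matching `held_out` list, with the five scores averaged; `test_idx` would be reserved for external validation only.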