Journal of Molecular Graphics and Modelling 67 (2016) 102–110
Towards cheminformatics-based estimation of drug therapeutic
index: Predicting the protective index of anticonvulsants using a new
quantitative structure-index relationship approach
Shangying Chen ᵃ, Peng Zhang ᵃ, Xin Liu ᵇ, Chu Qin ᵃ, Lin Tao ᵃ, Cheng Zhang ᵃ, Sheng Yong Yang ᶜ, Yu Zong Chen ᵃ,∗, Wai Keung Chui ᵃ,∗
ᵃ Department of Pharmacy, National University of Singapore, 18 Science Drive 4, Singapore 117543, Singapore
ᵇ Shanghai Applied Protein Technology Co. Ltd, Research Center for Proteome Analysis, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Shanghai, 200233, China
ᶜ State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, West China Medical School, Sichuan University, Sichuan, China
Article info

Article history:
Received 8 March 2016
Received in revised form 17 May 2016
Accepted 18 May 2016
Available online 19 May 2016

Keywords:
Quantitative structure–activity relationship
Quantitative structure-index relationship
Support vector regression
Protective index
Anticonvulsant

Abstract

The overall efficacy and safety profile of a new drug is partially evaluated by the therapeutic index in clinical studies and by the protective index (PI) in preclinical studies. In-silico predictive methods may facilitate the assessment of these indicators. Although QSAR and QSTR models can be used for predicting PI, their predictive capability has not been evaluated. To test this capability, we developed QSAR and QSTR models for predicting the activity and toxicity of anticonvulsants at accuracy levels above the literature-reported threshold (LT) of good QSAR models, as tested by both the internal 5-fold cross validation and external validation methods. These models showed significantly compromised PI predictive capability due to the cumulative errors of the QSAR and QSTR models. Therefore, in this investigation a new quantitative structure-index relationship (QSIR) model was devised, and it showed improved PI predictive capability that exceeded the LT of good QSAR models. The QSAR, QSTR and QSIR models were developed using the support vector regression (SVR) method with the parameters optimized by the greedy search method. The molecular descriptors relevant to the prediction of anticonvulsant activities, toxicities and PIs were analyzed by a recursive feature elimination method. The selected molecular descriptors are primarily associated with the drug-like, pharmacological and toxicological features and those used in the published anticonvulsant QSAR and QSTR models. This study suggested that QSIR is useful for estimating the therapeutic index of drug candidates.
© 2016 Published by Elsevier Inc.
Abbreviations: TI, therapeutic index; PI, protective index; TD50, toxic dose for 50% of the population; ED50, the minimum effective dose for 50% of the population; QSAR, quantitative structure activity relationship; QSTR, quantitative structure toxicity relationship; QSIR, quantitative structure index relationship; LT, literature-reported threshold; SVR, support vector regression; SVM, support vector machine; 5FCV, 5-fold cross validation; EV, external validation; RFE, recursive feature elimination; RBF, radial basis function; RMSE, root mean square error; k-MCA, k-Means cluster analysis.
∗ Corresponding authors.
E-mail addresses: [email protected] (Y.Z. Chen), [email protected] (W.K. Chui).
http://dx.doi.org/10.1016/j.jmgm.2016.05.006
1093-3263/© 2016 Published by Elsevier Inc.

1. Introduction

Therapeutic efficacy and safety are the major determinants of the quality of drug candidates [1–3]. Drug discovery productivity can be improved by early identification of the candidates with favorable therapeutic efficacy and toxicity profiles [4,5]. An important and routinely evaluated indicator in clinical studies is the therapeutic index (TI), which is the highest non-toxic dose divided by the desired therapeutic dose for a given indication [6]. In various drug discovery stages, TI can be estimated from the ratio of off-target to target potencies [6–8], in-vitro toxicity to activity levels [6,9], or animal safety to efficacy endpoints [6,10,11]. In preclinical studies, TI has been frequently assessed by measuring the protective index (PI), which is the toxic dose for 50% of the population (TD50) divided by the minimum effective dose for 50% of the population (ED50) [12].

Early assessment of TI or PI may be facilitated by means of in-silico prediction and modeling methods [13,14]. In particular, the well-developed quantitative structure activity relationship (QSAR) [15–19] and quantitative structure toxicity relationship (QSTR) [20–23] models for predicting efficacy and toxicity levels may be
collectively used for predicting TI or PI (herein called the C-QSAR-QSTR method). In this approach, a QSAR model predicts the activity logA, a QSTR model predicts the toxicity logT, and PI is given by logPI = logT − logA. However, to date the C-QSAR-QSTR method has not been tested or used for directly predicting TI or PI, even though QSAR and QSTR models have been used for guiding the improvement of TI [24,25].
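
The C-QSAR-QSTR combination can be illustrated with a minimal Python sketch. It assumes that logT and logA are base-10 logarithms of TD50 and ED50 expressed in the same units, which the text does not state explicitly. The function names and dose values are hypothetical and stand in for the outputs of trained QSTR and QSAR models.

```python
import math


def protective_index(td50: float, ed50: float) -> float:
    """PI = TD50 / ED50, with both doses in the same units."""
    if td50 <= 0 or ed50 <= 0:
        raise ValueError("doses must be positive")
    return td50 / ed50


def log_pi_from_predictions(log_t: float, log_a: float) -> float:
    """C-QSAR-QSTR combination: logPI = logT - logA."""
    return log_t - log_a


# Hypothetical doses (mg/kg), chosen only for illustration.
td50, ed50 = 300.0, 30.0
log_t = math.log10(td50)  # stand-in for a QSTR prediction (assumed log10 TD50)
log_a = math.log10(ed50)  # stand-in for a QSAR prediction (assumed log10 ED50)

log_pi = log_pi_from_predictions(log_t, log_a)
pi = protective_index(td50, ed50)
```

Under these assumptions, subtracting the two predicted logarithms is equivalent to taking the logarithm of the dose ratio, so any error in either prediction passes directly into logPI.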
The prediction of PI is complicated by the difficulty in modeling the complex pharmacodynamic and toxicological profiles of drug candidates [26], and by the noise that arises from varying experimental conditions and measurement errors. The currently developed QSAR and QSTR models (Supplementary Table S1) produced substantial predictive errors, as reflected in squared correlation coefficient values in the ranges of 0.45–0.97 and 0.38–0.95 respectively (most models in the range of 0.50–0.80). While the predictive errors of the good QSAR and QSTR models (R²pred > 0.60) may be sufficiently small to enable fair prediction of logA and logT respectively [27,28], their cumulative errors may become large enough to compromise the prediction of TI/PI. If the root mean square errors (RMSE) of a QSAR and a QSTR model are EA and ET respectively, the RMSE for computing logPI (= (logT ± ET) − (logA ± EA)) may be as large as EA + ET.
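
The EA + ET bound can be checked with a small numerical sketch in Python. The residual values below are hypothetical and are not taken from the models in this work. The sketch compares two cases: residuals of opposite sign, where the combined RMSE reaches EA + ET, and uncorrelated residuals, where the standard quadrature result sqrt(EA² + ET²) applies. The quadrature case is a general statistical fact and is not a claim made in the text.

```python
import math


def rmse(errors):
    """Root mean square of a list of residuals."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def log_pi_errors(t_errors, a_errors):
    """Residual of logPI = logT - logA is (logT residual) - (logA residual)."""
    return [et - ea for et, ea in zip(t_errors, a_errors)]


# Hypothetical residuals with E_T = 0.3 and E_A = 0.4 (log units).
t_err = [0.3, -0.3, 0.3, -0.3]
a_anti = [-0.4, 0.4, -0.4, 0.4]    # always opposite in sign to t_err
a_uncorr = [0.4, 0.4, -0.4, -0.4]  # zero correlation with t_err

e_t = rmse(t_err)
e_a = rmse(a_anti)

# Opposite-sign residuals: combined RMSE reaches E_A + E_T (about 0.7).
worst = rmse(log_pi_errors(t_err, a_anti))

# Uncorrelated residuals: combined RMSE is sqrt(E_A^2 + E_T^2) (about 0.5).
uncorr = rmse(log_pi_errors(t_err, a_uncorr))
```

In both cases the combined error exceeds the error of either model alone, which is the concern raised above for the C-QSAR-QSTR approach.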
Hence, before C-QSAR-QSTR is used for predicting TI/PI, its predictive capability must be evaluated. If this capability is severely compromised, a new method will be needed. These issues were investigated in this work, beginning with the aim of developing good QSAR and QSTR models for predicting the in-vivo anticonvulsant activity and neurotoxicity at accuracy levels of R²pred > 0.6 as the threshold for good QSAR models [27,28], based on both the 5-fold cross validation (5FCV) and the external validation (EV) tests. This was followed by determining the PI predictive capability using the corresponding C-QSAR-QSTR method. In order to cover the diverse structures and activity/toxicity mechanisms of the anticonvulsants, a machine learning regression method, namely support vector regression (SVR), was used to develop the QSAR and QSTR models [29–33]. Secondly, a new SVR quantitative structure index relationship (QSIR) model for predicting PI was tested on the basis that the molecular descriptors of the anticonvulsants could be non-linearly correlated to their PI values.

These studies were conducted on anticonvulsants because of the availability of the in-vivo anticonvulsant activity and toxicity data for a higher number of active compounds (Supplementary Table S2). These anticonvulsants have been developed for the treatment of epilepsy and seizures [34], and are routinely tested for neural toxicity [35]. SVR was used for developing the QSAR, QSTR and QSIR models because of its consistently good performance in classifying a variety of pharmacological properties for diverse structures [29–33], particularly in the development of QSAR and QSTR models [36–39]. Moreover, SVR models may, to some extent, tolerate sample redundancy [40–42] and have lower over-fitting risks [43,44]. To further probe which molecular descriptors are relevant for the prediction of the PI values of anticonvulsants, the recursive feature elimination (RFE) method [45,46] was employed for filtering the molecular descriptors relevant to the prediction of anticonvulsant activities, toxicities and PI values. The RFE method was used because of its demonstrated ability in selecting the molecular descriptors relevant for the prediction of specific pharmacokinetic and toxicological properties [47].

2. Materials and methods

2.1. Overview of computational procedure

The overview of the SVR model development procedure is outlined in Fig. 1. In this study, we first searched the literature for the in-vivo activity, neural toxicity data and protective index of the anticonvulsants. Each collected anticonvulsant was represented by molecular descriptors computed from MoDEL [48], PROFEAT [49] and PaDEL [50]. The collected 481 anticonvulsants were then rationally divided into the training and independent test sets by using the k-Means cluster analysis (k-MCA) method [51–53]. Specifically, the k-Means clustering algorithm was implemented in our own FORTRAN code. In this work, the 481 anticonvulsants were clustered into 20 clusters by the k-Means clustering method such that each cluster contains 1–65 (mostly 10–65) anticonvulsants of similar structures or molecular scaffolds. Approximately 20% of the anticonvulsants in each cluster were randomly selected for constructing a test set of 78 anticonvulsants. The remaining 403 compounds were used as the training set for SVR model development. Next, the SVR QSAR, QSTR and QSIR models were constructed by using the -SVR module of the LibSVM package (version 3.18) [54], with the SVR parameters determined by the greedy search method [55] (by means of our own Perl script in conjunction with LibSVM). The subsets of the molecular descriptors most relevant for the QSAR, QSTR and QSIR prediction of the activity, toxicity and PI were selected by using the RFE method [45,46].

Fig. 1. Flowchart of SVR model development.

To ensure the predictive ability of the SVR models, both the internal 5-fold cross validation (5FCV) method and the external validation method were performed. In the internal 5FCV, the compounds of the training set were randomly divided into five sets of approximately equal size; each set was used in turn as a test set with the other four as a training set, and the average performance over the five test sets was used for evaluating the SVR models. Both RFE feature selection and greedy search parameter optimization were performed in the 5FCV. The final SVR models were constructed based on the optimally selected descriptors and parameters, and their prediction performance was objectively measured on the external validation set of 78 compounds that had not been used during the model development.

2.2. Data collection

A total of 481 anticonvulsants and their in-vivo anticonvulsant activity and neural toxicity data were collected from the literature (Supplementary Table S2). Their 2D structures, 3D structures and molecular scaffolds were generated by using ChemDraw [56],