Journal of Molecular Graphics and Modelling 67 (2016) 102–110


Towards cheminformatics-based estimation of therapeutic index: Predicting the protective index of anticonvulsants using a new quantitative structure-index relationship approach

Shangying Chenᵃ, Peng Zhangᵃ, Xin Liuᵇ, Chu Qinᵃ, Lin Taoᵃ, Cheng Zhangᵃ, Sheng Yong Yangᶜ, Yu Zong Chenᵃ,∗, Wai Keung Chuiᵃ,∗

ᵃ Department of , National University of Singapore, 18 Science Drive 4, Singapore 117543, Singapore

ᵇ Shanghai Applied Protein Technology Co. Ltd, Research Center for Proteome Analysis, Institute of and , Shanghai Institutes for Biological Sciences, Shanghai, 200233, China

ᶜ State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, West China Medical School, Sichuan University, Sichuan, China

Article info

Article history:
Received 8 March 2016
Received in revised form 17 May 2016
Accepted 18 May 2016
Available online 19 May 2016

Keywords:
Quantitative structure–activity relationship
Quantitative structure-index relationship
Support vector regression
Protective index
Anticonvulsant

Abstract

The overall efficacy and safety profile of a new drug is partially evaluated by the therapeutic index in clinical studies and by the protective index (PI) in preclinical studies. In-silico predictive methods may facilitate the assessment of these indicators. Although QSAR and QSTR models can be used for predicting PI, their predictive capability has not been evaluated. To test this capability, we developed QSAR and QSTR models for predicting the activity and toxicity of anticonvulsants at accuracy levels above the literature-reported threshold (LT) of good QSAR models, as tested by both internal 5-fold cross validation and external validation. These models showed significantly compromised PI predictive capability due to the cumulative errors of the QSAR and QSTR models. Therefore, in this investigation a new quantitative structure-index relationship (QSIR) model was devised, and it showed improved PI predictive capability that exceeded the LT of good QSAR models. The QSAR, QSTR and QSIR models were developed using the support vector regression (SVR) method, with parameters optimized by greedy search. The molecular descriptors relevant to the prediction of anticonvulsant activities, toxicities and PIs were analyzed by a recursive feature elimination method. The selected molecular descriptors are primarily associated with drug-like, pharmacological and toxicological features and with those used in published anticonvulsant QSAR and QSTR models. This study suggests that QSIR is useful for estimating the therapeutic index of drug candidates.

© 2016 Published by Elsevier Inc.

Abbreviations: TI, therapeutic index; PI, protective index; TD50, toxic dose for 50% of the population; ED50, the minimum effective dose for 50% of the population; QSAR, quantitative structure activity relationship; QSTR, quantitative structure toxicity relationship; QSIR, quantitative structure index relationship; LT, literature-reported threshold; SVR, support vector regression; SVM, support vector machine; 5FCV, 5-fold cross validation; EV, external validation; RFE, recursive feature elimination; RBF, radial basis function; RMSE, root mean square error; k-MCA, k-Means cluster analysis.

∗ Corresponding authors: Y.Z. Chen and W.K. Chui.

http://dx.doi.org/10.1016/j.jmgm.2016.05.006
1093-3263/© 2016 Published by Elsevier Inc.

1. Introduction

Therapeutic efficacy and safety are the major determinants of the quality of drug candidates [1–3]. Productivity can be improved by early identification of candidates with favorable therapeutic efficacy and toxicity profiles [4,5]. An important and routinely evaluated indicator in clinical studies is the therapeutic index (TI), which is the highest non-toxic dose divided by the desired therapeutic dose for a given indication [6]. In various drug discovery stages, TI can be estimated from the ratio of off-target to target potencies [6–8], of in-vitro toxicity to activity levels [6,9], or of animal safety to efficacy endpoints [6,10,11]. In preclinical studies, TI has frequently been assessed by measuring the protective index (PI), which is the toxic dose for 50% of the population (TD50) divided by the minimum effective dose for 50% of the population (ED50) [12].

Early assessment of TI or PI may be facilitated by in-silico prediction and modeling methods [13,14]. In particular, the well-developed quantitative structure activity relationship (QSAR) [15–19] and quantitative structure toxicity relationship (QSTR) [20–23] models for predicting efficacy and toxicity levels may be used together to predict TI or PI (herein called the C-QSAR-QSTR method). In this approach, QSAR predicts the activity $\log A$, QSTR predicts the toxicity $\log T$, and PI is given by $\log \mathrm{PI} = \log T - \log A$. However, to date the C-QSAR-QSTR method has not been tested or used for directly predicting TI or PI, even though QSAR and QSTR models have been used to guide the improvement of TI [24,25].
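The PI definition and the C-QSAR-QSTR relation can be illustrated with a short Python sketch. It assumes base-10 logarithms of TD50 and ED50 expressed in the same dose units (the text does not state the log base or units). The function names and dose values are hypothetical, and the "predicted" log values are computed directly from the doses rather than from real QSAR/QSTR models.

```python
import math


def protective_index(td50: float, ed50: float) -> float:
    """PI = TD50 / ED50; both doses must be positive and in the same units."""
    if td50 <= 0 or ed50 <= 0:
        raise ValueError("TD50 and ED50 must be positive")
    return td50 / ed50


def log_pi_c_qsar_qstr(log_t: float, log_a: float) -> float:
    """C-QSAR-QSTR combination: log PI = log T - log A.

    Hypothetical inputs: log_t stands for a QSTR-predicted log10 TD50 and
    log_a for a QSAR-predicted log10 ED50.
    """
    return log_t - log_a


# Hypothetical doses (e.g. mg/kg), not taken from the paper's data set.
td50, ed50 = 120.0, 15.0
print(protective_index(td50, ed50))  # 8.0

# Working in log10 units gives the same PI after back-transformation.
log_pi = log_pi_c_qsar_qstr(math.log10(td50), math.log10(ed50))
print(round(10 ** log_pi, 6))  # 8.0
```

Working in log units turns the ratio TD50/ED50 into a difference, which is why the errors of the two models add rather than multiply.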

The prediction of PI is complicated by the difficulty of modeling the complex pharmacodynamic and toxicological profiles of drug candidates [26], and by noise arising from varying experimental conditions and measurement errors. The currently developed QSAR and QSTR models (Supplementary Table S1) show substantial predictive errors, as reflected in their squared correlation coefficients, which range from 0.45 to 0.97 and from 0.38 to 0.95, respectively (most models fall in the range of 0.50–0.80).
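Model quality here is reported both as squared correlation coefficients and against an $R^2_{\text{pred}}$ threshold, and the two are not the same quantity. The plain-Python sketch below (function names are hypothetical) computes the squared Pearson correlation and one common form of $R^2_{\text{pred}}$, $1 - SS_{\text{res}}/SS_{\text{tot}}$. The exact $R^2_{\text{pred}}$ formula used in the cited studies is not given in this text and several variants exist, so this is an illustration, not the authors' definition.

```python
import math
from typing import Optional, Sequence


def squared_pearson_r(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Squared Pearson correlation between observed and predicted values."""
    n = len(obs)
    if n != len(pred) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_o = sum(obs) / n
    mean_p = sum(pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    return cov * cov / (var_o * var_p)


def r2_pred(obs: Sequence[float], pred: Sequence[float],
            reference_mean: Optional[float] = None) -> float:
    """1 - SS_res / SS_tot on a test set.

    By default SS_tot uses the test-set mean; passing the training-set mean
    as reference_mean gives a variant that is also used in QSAR work.
    """
    if len(obs) != len(pred) or not obs:
        raise ValueError("need two non-empty sequences of equal length")
    mean_ref = sum(obs) / len(obs) if reference_mean is None else reference_mean
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_ref) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


obs = [1.0, 2.0, 3.0, 4.0]
pred = [2.0, 4.0, 6.0, 8.0]  # perfectly correlated but systematically biased
print(squared_pearson_r(obs, pred))  # 1.0
print(r2_pred(obs, pred))            # -5.0
```

The example shows why the two measures are not interchangeable: predictions that track the observations perfectly but are systematically biased have a squared correlation of 1.0 yet a strongly negative $1 - SS_{\text{res}}/SS_{\text{tot}}$.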

While the predictive errors of good QSAR and QSTR models ($R^2_{\text{pred}} > 0.60$) may be small enough to enable fair prediction of $\log A$ and $\log T$, respectively [27,28], their cumulative errors may become large enough to compromise the prediction of TI/PI. If the root mean square errors (RMSE) of a QSAR and a QSTR model are $E_A$ and $E_T$, respectively, the RMSE of the computed $\log \mathrm{PI} = (\log T \pm E_T) - (\log A \pm E_A)$ may be as large as $E_A + E_T$.
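The bound $E_A + E_T$ is a worst case. Because the error of the combined $\log \mathrm{PI}$ prediction is the QSTR error minus the QSAR error, the triangle inequality gives $\mathrm{RMSE}(\log \mathrm{PI}) \le E_A + E_T$, with equality only when the two errors are perfectly anti-correlated; if they are uncorrelated, the RMSE is approximately $\sqrt{E_A^2 + E_T^2}$. The Python sketch below checks this with synthetic Gaussian errors. The error model, the correlation values and the RMSEs 0.3 and 0.4 are illustrative assumptions, not results from this study.

```python
import math
import random
from typing import Sequence


def rmse(errors: Sequence[float]) -> float:
    """Root mean square of a sequence of errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def simulated_log_pi_rmse(e_a: float, e_t: float, rho: float,
                          n: int = 100_000, seed: int = 0) -> float:
    """RMSE of predicted log PI = log T - log A under a toy error model.

    Assumption (not from the paper): the QSAR and QSTR errors are zero-mean
    Gaussians with RMSEs e_a and e_t and correlation coefficient rho.
    """
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    rng = random.Random(seed)
    pi_errors = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        err_a = e_a * z1  # QSAR error in log A
        # QSTR error in log T, correlated with err_a through rho
        err_t = e_t * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
        pi_errors.append(err_t - err_a)  # resulting error in log T - log A
    return rmse(pi_errors)


E_A, E_T = 0.3, 0.4  # hypothetical QSAR and QSTR RMSEs in log units
print("independent-error estimate:", math.hypot(E_A, E_T))
print("worst-case bound:", E_A + E_T)
for rho in (0.0, 1.0, -1.0):
    print(rho, round(simulated_log_pi_rmse(E_A, E_T, rho), 3))
```

The simulated RMSE comes out close to 0.5 for uncorrelated errors, close to 0.1 when the errors are perfectly positively correlated (they partly cancel), and close to the worst case of 0.7 when one model overestimates while the other underestimates.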

Hence, before C-QSAR-QSTR is used for predicting TI/PI, its predictive capability must be evaluated. If this capability is severely compromised, a new method will be needed. These issues were investigated in this work. We began by developing good QSAR and QSTR models for predicting in-vivo anticonvulsant activity and neurotoxicity at accuracy levels of $R^2_{\text{pred}} > 0.6$, the threshold for good QSAR models [27,28], under both 5-fold cross validation (5FCV) and external validation (EV) tests. This was followed by determining the PI predictive capability of the corresponding C-QSAR-QSTR method. To cover the diverse structures and activity/toxicity mechanisms of the anticonvulsants, a machine learning regression method, support vector regression (SVR), was used to develop the QSAR and QSTR models [29–33]. Secondly, a new SVR quantitative structure index relationship (QSIR) model for predicting PI was tested, on the basis that the molecular descriptors of the anticonvulsants could be nonlinearly correlated with their PI values.

These studies were conducted on anticonvulsants because in-vivo anticonvulsant activity and toxicity data are available for a larger number of active compounds (Supplementary Table S2). These anticonvulsants have been developed for the treatment of epilepsy and seizures [34] and are routinely tested for neural toxicity [35]. SVR was used for developing the QSAR, QSTR and QSIR models because of its consistently good performance in modeling a variety of pharmacological properties for diverse structures [29–33], particularly in the development of QSAR and QSTR models [36–39]. Moreover, SVR models may, to some extent, tolerate sample redundancy [40–42] and have lower over-fitting risks [43,44]. To further probe which molecular descriptors are relevant for predicting the PI values of anticonvulsants, the recursive feature elimination (RFE) method [45,46] was employed to filter the molecular descriptors relevant to the prediction of anticonvulsant activities, toxicities and PI values. RFE was chosen because of its demonstrated ability to select molecular descriptors relevant for the prediction of specific pharmacokinetic and toxicological properties [47].

2. Materials and methods

2.1. Overview of computational procedure

The SVR model development procedure is outlined in Fig. 1. In this study, we first collected in-vivo activity, neural toxicity and protective index data for anticonvulsants from the literature. Each collected anticonvulsant was represented by molecular descriptors computed with MoDEL [48], PROFEAT [49] and PaDEL [50]. The 481 collected anticonvulsants were then rationally divided into a training set and an independent test set using the k-Means cluster analysis (k-MCA) method [51–53]; the k-Means clustering algorithm was implemented in our own FORTRAN code. The 481 anticonvulsants were clustered into 20 clusters such that each cluster contains 1–65 (mostly 10–65) anticonvulsants of similar structures or molecular scaffolds. Approximately 20% of the anticonvulsants in each cluster were randomly selected to construct a test set of 78 anticonvulsants, and the remaining 403 compounds were used as the training set for SVR model development. Next, the SVR QSAR, QSTR and QSIR models were constructed using the ε-SVR module of the LibSVM package (version 3.18) [54], with the SVR parameters determined by the greedy search method [55] (implemented in our own Perl scripts in conjunction with LibSVM). The subsets of molecular descriptors most relevant for the QSAR, QSTR and QSIR prediction of activity, toxicity and PI were selected using the RFE method [45,46].

Fig. 1. Flowchart of SVR model development.

To ensure the predictive ability of the SVR models, both internal 5-fold cross validation (5FCV) and external validation were performed. In the internal 5FCV, the compounds of the training set were randomly divided into five sets of approximately equal size; each set was used in turn as a test set with the other four as the training set, and the average performance over the five test sets was used to evaluate the SVR models. Both RFE feature selection and greedy-search parameter optimization were performed within the 5FCV. The final SVR models were constructed from the optimally selected descriptors and parameters, and their prediction performance was independently evaluated on the external validation set of 78 compounds that were not used during model development.

2.2. Data collection

A total of 481 anticonvulsants and their in-vivo anticonvulsant activity and neural toxicity data were collected from the literature (Supplementary Table S2). Their 2D structures, 3D structures and molecular scaffolds were generated by using ChemDraw [56],
