<<

DMD Fast Forward. Published on September 7, 2012 as DOI: 10.1124/dmd.112.047068 DMD FastThis Forward. article has not Published been copyedited on andSeptember formatted. The 7, final 2012 version as doi:10.1124/dmd.112.047068may differ from this version. DMD #47968

Mitigating the inhibition of human Bile Salt Export Pump by : opportunities provided by physicochemical property modulation, in-silico modeling and structural modification

Daniel J. Warner, Hongming Chen, Louis-David Cantin, J. Gerry Kenna, Simone Stahl, Clare

L. Walker, Tobias Noeske.

Department of Medicinal Chemistry, AstraZeneca R&D Montreal, Montreal, Quebec, H4S Downloaded from

1Z9, Canada (DJW, LDC)

Computational Sciences, Discovery Sciences, AstraZeneca R&D Mölndal, Pepparedsleden dmd.aspetjournals.org 1, Mölndal 43183, Sweden (HC)

Molecular Toxicology, Global Safety Assessment, AstraZeneca, Alderley Park, Macclesfield,

Cheshire, SK10 4TG, UK (JGK, SS, CLW)

Global Safety Assessment, AstraZeneca R&D Mölndal, Pepparedsleden 1, Mölndal 43183, at ASPET Journals on October 10, 2021

Sweden (TN)

1

Copyright 2012 by the American Society for and Experimental Therapeutics. DMD Fast Forward. Published on September 7, 2012 as DOI: 10.1124/dmd.112.047068 This article has not been copyedited and formatted. The final version may differ from this version. DMD #47968

Inhibition of the human Bile Salt Export Pump by drugs.

Corresponding author: Tobias Noeske Global Safety Assessment AstraZeneca R&D Mölndal S-431 83 Mölndal, Sweden Phone: +46-31-7064002 Mobile: +46-727-158344

Fax: +46-31-776-3759 Downloaded from Email: [email protected]

Statistics: dmd.aspetjournals.org Number of text pages: 25 Number of tables: 1 Number of figures: 7 Number of references: 46 at ASPET Journals on October 10, 2021 Number of words in the Abstract: 231 Number of words in the Introduction: 672 Number of words in the Discussion: 1,495

List of non-standard abbreviations: BSEP, human bile salt export pump; Bsep, non-human bile salt export pump; SVM, support vector machine; PLS, partial least square; ABC transporter, ATP-binding cassette transporter; DILI, induced injury; QSAR, quantitative structure-activity relationship; RF, random forest; PFIC, progressive familial intra- hepatic cholestasis; BRIC, benign recurrent intra-hepatic cholestasis.

2

DMD Fast Forward. Published on September 7, 2012 as DOI: 10.1124/dmd.112.047068 This article has not been copyedited and formatted. The final version may differ from this version. DMD #47968

Abstract

The human Bile Salt Export Pump (BSEP) is a membrane protein expressed on the canalicular plasma membrane domain of hepatocytes which mediates active transport of unconjugated and conjugated bile salts from liver cells into bile. BSEP activity therefore plays an important role in bile flow. In humans, genetically inherited defects in BSEP expression or activity cause cholestastic liver injury and many drugs which cause cholestatic drug induced liver injury (DILI) in humans have been shown to inhibit BSEP activity in vitro and in vivo.

These findings suggest that inhibition of BSEP activity by drugs could be one of the Downloaded from mechanisms which initiate human DILI. To gain insight into the chemical features responsible for BSEP inhibition, we have used a recently described in vitro membrane vesicle BSEP

inhibition assay to quantify transporter inhibition for a set of 624 compounds. The relationship dmd.aspetjournals.org between BSEP inhibition and molecular physicochemical properties was investigated and our results show that lipophilicity and molecular size are significantly correlated with BSEP inhibition. This data set was further used to build predictive BSEP classification models at ASPET Journals on October 10, 2021 through multiple QSAR modeling approaches. The highest level of predictive accuracy was provided by a support vector machine model (Accuracy=0.87, Kappa=0.74). These analyses highlight the potential value that can be gained by combining computational methods with experimental efforts in early stages of drug discovery projects to minimize the propensity of drug candidates to inhibit BSEP.

3

DMD Fast Forward. Published on September 7, 2012 as DOI: 10.1124/dmd.112.047068 This article has not been copyedited and formatted. The final version may differ from this version. DMD #47968

Introduction

The pharmaceutical industry continues to face high attrition rates throughout the entire drug discovery and development process, with only 8% of all drug candidates that enter Phase I studies progressing to the market (FDA white paper 2004; Schuster et al., 2005). A major cause of compound attrition in drug development is toxicity (Kramer et al., 2007; Greaves et al., 2007). The financial burden of safety related dropouts can reach many hundreds of million dollars in late stages of drug development (DiMasi et al., 2003). Consequently, it is

important to identify and mitigate potential compound related safety issues as early as Downloaded from possible during the drug discovery process.

One important cause of drug toxicity in humans is drug induced liver injury (DILI). This is a dmd.aspetjournals.org major cause for attrition in clinical trials, failed drug registration and withdrawal of marketed drugs (Abboud and Kaplowitz, 2007). For a few drugs, for example acetaminophen (Wallace

2004), DILI is a dose dependent and reproducible process which occurs in humans only following accidental or deliberate over-dosage and is reproducible in animals (Pirmohamed at ASPET Journals on October 10, 2021 et al., 1998). However many drugs cause DILI only infrequently or very rarely in humans, which is not overtly dose dependent and is not reproducible (Zimmerman 1978). The most common patterns of clinical presentation of DILI in humans are defined as either hepatocellular (i.e. primarily affecting hepatocyte function), cholestatic (primarily affecting the biliary system) or mixed hepatocellular/cholestatic (Mumoli et al., 2006). The underlying mechanisms are complex and include both compound related properties and factors which are specific to individual susceptible patients (Thompson et al., 2011). A variety of compound related DILI risk factors have been identified or proposed. These include formation of chemically reactive metabolites, mitochondrial impairment, potent cell cytotoxicity and inhibition of the Bile Salt Export Pump (BSEP) (Greer et al., 2010; Dawson et al., 2012).

BSEP (encoded by the ABCB11 gene) is a liver specific ABC transporter which is expressed on the canalicular domain of the hepatocyte plasma membrane and which mediates the

4

DMD Fast Forward. Published on September 7, 2012 as DOI: 10.1124/dmd.112.047068 This article has not been copyedited and formatted. The final version may differ from this version. DMD #47968 active secretion of monovalent bile acids or salts into the bile canaliculi (Gerloff et al., 1998;

Noé et al., 2002; Byrne et al., 2002). Mutations in the ABCB11 gene occurr via codon deletions, insertions and various point mutations. These have been shown to cause either progressive familial intrahepatic cholestasis type 2 (PFIC2), which is a very rare but severe form of cholestatic liver disease that results in fatal liver failure unless treated by liver transplantation, or a milder form of liver injury termed benign recurrent intrahepatic cholestasis type 2 (BRIC-2) (Davit-Spraul et al., 2009). Many drugs which cause cholestatic

DILI in humans have been found to inhibit the activity of human BSEP, and its rat analogue Downloaded from Bsep, in vitro. In several cases inhibition of rat Bsep activity in vivo has also been demonstrated (Kostrubsky et al., 2006; Fattinger et al., 2001). Furthermore, it has been

shown recently that drugs which cause cholestatic DILI in humans exhibit markedly greater dmd.aspetjournals.org potencies and frequencies of BSEP or Bsep inhibition in vitro than drugs which cause hepatocellular DILI or drugs that do not cause DILI (Dawson et al., 2012; Morgan et al.,

2011). These findings suggest that mitigating BSEP inhibition may help reduce the likelihood at ASPET Journals on October 10, 2021 of DILI for humans.

However, applying high volume screening for BSEP inhibition in a drug discovery project would be cost intensive and time consuming and potentially could delay the rate of progression of the project. A more desirable strategy would be to combine in vitro BSEP screening with in silico BSEP predicting models (Saito et al., 2009). In the present study we have investigated the use of various computational algorithms to build a BSEP inhibition model for a set of 624 chemically diverse compounds. The effect of these compounds on

BSEP activity was quantified in vitro using a membrane vesicle assay. An accurate classification model for BSEP inhibition has been developed, which can be used for library profiling. The relationship between some molecular physicochemical properties, such as lipophilicity and size, and propensity of BSEP inhibition were also highlighted in this analysis.

5

DMD Fast Forward. Published on September 7, 2012 as DOI: 10.1124/dmd.112.047068 This article has not been copyedited and formatted. The final version may differ from this version. DMD #47968

Materials & Methods

Experimental Measurement

All standard reagents and chemicals were purchased from Sigma-Aldrich Company Ltd (St.

Louis, MO, USA), Merck (Hohenbrunn, Germany) or WAKO (Osaka, Japan) and were of the highest purity available. [3H]-taurocholate was obtained from American Radiolabeled

Chemicals, Inc. (St. Louis, MO, USA). Baculoviruses expressing human BSEP (ABCB11) were provided by Bruno Stieger (Stieger et al., 2000; Noé et al., 2002). Test compounds

(>95% purity) were provided by the Compound Management Group, Discovery Sciences, Downloaded from

AstraZeneca, Alderley Park. hBSEP (ABCB11) was expressed in Spodoptera frugiperda Sf21 insect cells from which dmd.aspetjournals.org inside-out membrane vesicles were prepared as described previously (Dawson et al., 2012) with the modification that at the end of the isolation procedure vesicles were resuspended in buffer containing 50 mM sucrose, 10 mM HEPES/Tris, pH 7.4 and protease inhibitor tablets at ASPET Journals on October 10, 2021 (Roche, Basel, Switzerland). Inhibition of hBSEP mediated ATP-dependent transport of

[3H]-taurocholate into vesicles was measured by a rapid filtration method as described previously (Dawson et al., 2012). Briefly, 60 μg hBSEP vesicles were incubated with test compound or (DMSO) vehicle and 0.5 μM [3H]-taurocholate for 5 minutes at 37°C in buffer containing 5 mM ATP, 10 mM MgCl2, 15.0 mM HEPES/Tris pH 7.4, 142 mM

KNO3, 158 mM sucrose, 12.5 mM Mg(NO3)2. For each compound, three independent experiments were performed with triplicates in each experiment for each compound test concentration (10, 30, 100, 250, 500, 1000 μM). The DMSO concentration in all reactions was 2% (v/v). The transport reaction was stopped by addition of vesicles to stop buffer containing 50 mM sucrose, 100 mM KCl, 10 mM HEPES/Tris pH 7.4, 5 mM EDTA, and 0.1 mM taurocholate. [3H]-taurocholate uptake into vesicles was measured by determining the counts per minute (CPM) using a TopCount NXT (PerkinElmer Life and Analytical Sciences,

Waltham, MA, USA). CPM values determined for reactions containing AMP were subtracted from reactions containing ATP to determine the ATP-dependent uptake activity. The 6

DMD Fast Forward. Published on September 7, 2012 as DOI: 10.1124/dmd.112.047068 This article has not been copyedited and formatted. The final version may differ from this version. DMD #47968 percentage of uptake activity relative to DMSO vehicle control (100%) was determined for each compound test concentration. IC50 values were calculated with non-linear regression for a sigmoidal dose-response using the following formula:

ʜʚğĢĚĂüytđʛdzġāʝ (1) where X is the common logarithm of the concentration, Y is the response, nH is the variable

Hill slope, Bottom is 0 and Top is 100. Downloaded from Data Set

The BSEP data set for QSAR modeling comprises 624 internal compounds and external

reference compounds (Supplemental Table S4). A BSEP IC50 of 300 µM was used as a dmd.aspetjournals.org threshold value for classifying compounds, i.e. compounds with a geometric mean IC50 value less than 300 µM were regarded as BSEP active compounds (labeled as POS class), otherwise they were regarded as BSEP inactive compounds (labeled as NEG class). An IC50 at ASPET Journals on October 10, 2021 value of 300 µM determined in the membrane vesicle model has previously been found to be a useful operational threshold to identify drugs which are associated with cholestatic or mixed hepatocellular/cholestatic liver injury in human (Dawson et al., 2012). An in-house perl script was run to randomly split the BSEP data set into a training set and a test set with the ratio of 7:3 resulting in a training set of 437 compounds (231 POS, 206 NEG) and the test set of 187 compounds (94 POS, 93 NEG).

Physical property dependencies

Logistic regression analysis of the relationships between molecular properties and BSEP activity were conducted using JMP8.0 software (SAS Institute Inc., Cary, NC, USA) by estimating the probability of a compound being free from BSEP inhibition as a smooth function of a single parameter (e.g., log10 molecular weight, calculated ClogP (BioByte Corp.,

Claremont, CA, USA), calculated logD and acid dissociation constants (Advanced Chemistry

Development, Inc., Toronto, ON, Canada)). The statistical significance of probabilities based 7

DMD Fast Forward. Published on September 7, 2012 as DOI: 10.1124/dmd.112.047068 This article has not been copyedited and formatted. The final version may differ from this version. DMD #47968 on the relationship as compared with the background probabilities were evaluated using a

Chi-square test statistic (G2) evaluated as in equation 2.

2∑ ln ∑ ln (2)

The observed significance probability (or p-value) for the Chi-square test is the probability of obtaining, by chance alone, a Chi-square test statistic greater than the one computed.

Probabilities of <0.05 were considered significant. Downloaded from R2 is the metric used to quantify how well each molecular property predicts BSEP activity, and is defined according to equation 3.

∑ ∑ dmd.aspetjournals.org ∑ (3)

R2 ranges from 0 to 1: a value of 0 would indicate no benefit in the use of a given molecular property to classify BSEP activity, whereas an R2 of 1 would indicate perfect classification. at ASPET Journals on October 10, 2021

Modeling methods

Multiple approaches for building a classification model of a compound´s BSEP inhibition activity were investigated. The first and simplest approach used a basic recursive partitioning

(RP) algorithm, based solely on the molecular weight and calculated lipophilicity of the structures. The partition scheme was built using JMP software. In contrast to the 1D physical properties modeling above, the scheme was derived from 437 training set compounds and evaluated using the randomly selected 187 test set compounds. Automatic parameter selection was applied at each point in the scheme to maximize the significance of the split.

Partitioning was performed until R2 (equation 3) for the test set ceased to improve, resulting in a total of four partitions, two for each parameter.

To further refine our description of the structure-activity relationships surrounding this transporter, two more elaborate sets of descriptors were calculated for all 624 molecular

8

DMD Fast Forward. Published on September 7, 2012 as DOI: 10.1124/dmd.112.047068 This article has not been copyedited and formatted. The final version may differ from this version. DMD #47968 structures. A set of 196 2D/3D descriptors (referred to as the AZdesc set), including descriptors for molecular size, lipophilicity, bonding, electrostatics and topology were calculated with an in-house program. The descriptors themselves are described elsewhere (Paine et al., 2010; Bruneau 2001; Katritzky et al., 1998). A second descriptor set

(referred to as the AZFP_AZtop14 set) comprises a combination of fingerprint based descriptors including the in-house developed Ghose-Crippen atom type fingerprints (Ghose and Crippen, 1987) and a set of functional groups fingerprints (Arnold et al., 2004). This collection of structural descriptors was further supplemented with a subset of the AZdesc set, Downloaded from comprising an additional 14 descriptors to encode molecular bulk properties like lipophilicity, size, ionic states, etc. dmd.aspetjournals.org Partial least squares (PLS), and two non-linear machine learning methods – support vector machine (SVM) and random forest (RF) – were used to build the classification models based on the AZdesc set. For the AZFP_AZtop14 set, which was mostly focused on structural

fingerprints, only the two non-linear techniques (RF and SVM) were used. All three methods at ASPET Journals on October 10, 2021 are implemented in the in-house machine learning package AZOrange (Stålring et al., 2011), which is an extension of the open source package Orange (Orange official web site).

Classification performance measurements based on the confusion matrix were generated to assess the quality of the classification across all six approaches (Supplemental Table S2).

WizePairZ methodology

Matched molecular pairs analysis (MMPA) (Leach et al., 2011) was performed on the full set of 624 compounds using the WizePairZ algorithm (Warner et al., 2010). In order to deconvolute the intrinsic effects of structural transformations from their associated effect on molecular properties, two variables were considered in the analysis: BSEP activity, expressed as -log10(IC50) (or pIC50) and ClogP. As in our original study, the mean change in activity for each molecular pairing was plotted as a function of the change in lipophilicity,

9

DMD Fast Forward. Published on September 7, 2012 as DOI: 10.1124/dmd.112.047068 This article has not been copyedited and formatted. The final version may differ from this version. DMD #47968 allowing those transformations which link the anticipated trend with physicochemical properties to be easily identified.

Results

Measurement of BSEP inhibition by the tested compounds

Typical concentration response curves of test compound effects on BSEP activity are shown in Fig 1, which includes data obtained with a compound that did not inhibit BSEP

(acetaminophen), a compound which exhibited potent BSEP inhibition (bosentan) and two Downloaded from compounds which exhibited less potent inhibition (bezafibrate and ). Experiment results are the mean ± standard error of the mean (S.E.M.) from three separate test occasions. For each compound an IC50 value was calculated and these values are dmd.aspetjournals.org summarized in the supplemental data. As can be seen from Fig. 2, 240 compounds did not exhibit BSEP inhibition. For 269 compounds, BSEP inhibition IC50 values were observed ranging between 1000 and 10 μM, i.e. were within the range of tested compound at ASPET Journals on October 10, 2021 concentrations. An additional 115 compounds exhibited very potent BSEP inhibition, which was defined as IC50 < 10 μM.

Molecular property dependencies

The influence of several molecular properties on the likelihood of BSEP inhibition with an apparent IC50 value of <300 µM is displayed in Figure 3. The line of fit (blue) in each plot of

Figure 3 is interpreted as the probability (y) of a data point appearing below the line (i.e. being free of BSEP inhibition for a given property value (x)). The scattered dots in the plots are the tested compounds and their color refers to the class (POS/NEG class) that they belong to. It seems that the likelihood of a compound inhibiting BSEP activity is highly dependent on its molecular weight and calculated lipophilicity. Any compound with a molecular weight of greater than 309 is more likely to exhibit a BSEP inhibition than not and compounds whose molecular weight below this threshold become more likely to be free from

10

DMD Fast Forward. Published on September 7, 2012 as DOI: 10.1124/dmd.112.047068 This article has not been copyedited and formatted. The final version may differ from this version. DMD #47968

BSEP inhibition (Figure 3d). The strong statistical significance of the relationship with molecular weight (p<0.0001) and reasonably high R2 (0.22) dictate that changes in the likelihood of activity occur fairly sharply. Calculated octanol/water partition coefficients provide an even more powerful mean of classifying the compounds into BSEP inhibition actives vs. BSEP inactives (R2 = 0.29). A ClogP of 2.27 marks the point at which the balance shifts from a compound being more likely to be free of BSEP inhibition to becoming a more likely BSEP inhibitor (Figure 3a).

In an attempt to identify an even more discriminatory lipophilicity metric, we investigated the Downloaded from impact of calculated logD on BSEP inhibition potency below the selected threshold value.

While this approach demonstrated less utility than the analysis based on ClogP described dmd.aspetjournals.org above (data not shown), we noticed an interesting improvement in classification with the use of ACDlogD at pH 6.5 over the same calculation based on pH 7.4 (R2 = 0.19 and 0.17 respectively). This indicates that while Biobyte’s logP algorithm provides the more effective

classifier in this case, the ionization state of the active species may also play a role. This led at ASPET Journals on October 10, 2021 us to investigate further the influence of pKa on BSEP inhibition (Figure 3b and 3e). The two hypotheses to be tested were: the reduced pH was causing weak acids to appear more lipophilic to the binding site thus improving their BSEP inhibition potency classification; or reduced pH caused weak bases to appear more hydrophilic, thus reducing their potency of

BSEP inhibition. The data would indicate the latter to be true, as while there was no significant relationship with ACD pKa (Acid 1) (p=0.2751), a significant decrease in the likelihood of BSEP inhibition with increasing basicity was observed (p=0.0002).

To seek additional support for the argument that the formation of cations disfavors BSEP inhibition potency, we classified all 624 compounds into five ion class categories: neutrals, bases, acids, cations and zwitterions (Figure 4). Across the full dataset the distribution between BSEP inhibition potency above and below the threshold value was almost completely even, and this was mirrored across the acids, bases and neutral compounds.

11

DMD Fast Forward. Published on September 7, 2012 as DOI: 10.1124/dmd.112.047068 This article has not been copyedited and formatted. The final version may differ from this version. DMD #47968

However, it is notable that cations formed the least likely class of BSEP inhibitors, with only two “active” compounds and ten “inactive” compounds (17%). The final paired logistic regression analysis compared the hydrogen bond donor/acceptor characteristics of the dataset (Figure 3c and 3f). Here, the only significant relationship was with the hydrogen bond donor count, where molecules with increasing number of donors were at reduced risk of

BSEP inhibition (p<0.0001). There was no significant relationship between BSEP inhibition category and the acceptor count.

BSEP classification model Downloaded from

In an effort to devise a predictive in silico model of BSEP inhibition, we embarked on the exploration of a number of approaches. Since molecular weight (MW) and lipophilicity are dmd.aspetjournals.org useful predictors of BSEP inhibition, as presented in the previous section, we decided to build a classification model based on a recursive partition (RP) algorithm regarding these two properties to serve as a baseline model (RP_ClogP_MW, Figure 5). At the top of the scheme, the first partition occurs at a ClogP threshold of 1.697, which provides reasonable at ASPET Journals on October 10, 2021 separation between BSEP actives and inactives on the second tier. The second partition takes place to the far left of the scheme (on the more lipophilic compounds), such that those with molecular weight above and below 296.2 are separated. This third tier is where the recursion ends for 204 of these compounds, as those with molecular weight > 296.2 are deemed almost certainly active with a probability of 91% (red section in the contour plots of

Figure 5b). Classification of the 78 lower weight compounds within this more lipophilic group is more ambiguous, but generally favors inactive compounds with a probability of 66% (light blue section in the contour plots of Figure 5b).

To the right of the scheme, the situation for the less lipophilic compounds is somewhat more complex. In this category where ClogP is < 1.697, compounds are most likely to be inactive, and overall, this does not change either way with the third tier partitioning at a ClogP of 0.59.

Below a ClogP of 0.59, the training set compounds are almost unanimously inactive (95

12

DMD Fast Forward. Published on September 7, 2012 as DOI: 10.1124/dmd.112.047068 This article has not been copyedited and formatted. The final version may differ from this version. DMD #47968 compounds, 98% inactive). However, the fourth and final tier shows an interesting effect, where for the compounds of intermediate lipophilicity (0.59 ≤ ClogP < 1.697) the classification returns to being dependant on molecular weight (mid-blue/light-red area of figure 5b). In summary, with high lipophilicity and molecular weight ≥ 296.2, or alternatively, intermediate lipophilicity and molecular weight ≥ 360.4, compounds are more likely to be

BSEP inhibitors than not.

Although a highly predictive baseline model (RP model) could be obtained using only ClogP and MW descriptors, we sought to expand the number of descriptors included in the RP Downloaded from model to further refine the classifications through the identification of more subtle effects.

Several BSEP classification models were built, which allowed comparison of two nonlinear dmd.aspetjournals.org machine learning algorithms with a linear algorithm and comparison of bulk property based descriptors (AZdesc) with structural descriptors (AZFP_AZtop14). The models were built using a training set of 437 compounds (231 POS, 206 NEG) and then evaluated using a test

set of 187 compounds (94 POS, 93 NEG). The analysis of the test set using the different at ASPET Journals on October 10, 2021 models is summarized in Table 1 and the details of individual metrics for measuring model performance are presented as supplemental data (Table S1). Of the six models evaluated, the combination of the SVM algorithm and the AZdesc set (SVM_AZdesc) showed the best performance across most of the parameters (Table 1). The probabilities of this model correctly classifying compounds predicted to be positive and negative for BSEP inhibition were 0.85 and 0.90, respectively. Thus, if a compound was predicted positive by the model it stood an 85% chance of causing BSEP inhibition with an IC50 of <300 μM. The Kappa value for this SVM_AZdesc model was the highest (0.74), which provides an indication of the true accuracy of the model since it takes the probability of obtaining such agreement by chance into account. It is also interesting to note that the linear PLS model showed good negative precision and its sensitivity was almost as good as that of the SVM_AZdesc model.

13

DMD Fast Forward. Published on September 7, 2012 as DOI: 10.1124/dmd.112.047068 This article has not been copyedited and formatted. The final version may differ from this version. DMD #47968

Comparing the results from the two descriptor sets, it was clear that the non-linear models which combined fingerprint and a subset of the bulk property descriptors (AZFP_AZtop14) generally performed no better than the same methods using the AZdesc set. For this reason, these two models will not be discussed further. In comparison to the RP model, the

SVM_AZdesc model has a better accuracy in terms of sensitivity and negative precision

(0.90 vs. 0.84 in both cases). The common denominator in both these metrics, and hence the driving force behind this improvement, was the false negative count (compounds predicted negative but measured positive) which was reduced from fifteen to nine. Downloaded from

To check the robustness of these model generation strategies, a five-fold cross validation analysis was carried out on all 624 compounds for the non-linear and PLS modeling dmd.aspetjournals.org methods. The performance metrics for cross validation analysis is listed in the supplemental data (Table S2). Although SVM_AZdesc method still performs best in general, the difference between various non-linear modeling methods is actually very small and all non-linear

methods perform better than the PLS_AZdesc method. at ASPET Journals on October 10, 2021

In situations such as this, where lipophilicity of compounds has such a strong bearing on their effect on BSEP activity, it is desirable to identify subtle structural changes which reduce activity without compromising other molecular properties. This is especially useful in cases where a chemical series is in late-stage optimization and working with limited opportunities to modify the physicochemical properties. WizePairZ was designed for precisely this purpose and its application to our BSEP data set identified a total of 103 unique molecular transformations when including four bonds of the local environment (counting each transformation and its reverse just once).

Of the four analogues depicted in Figure 6a, it was only (5-amino-2- hydroxybenzoic acid) that exhibited in vitro BSEP inhibition. This compound was neither the most lipophilic in the group (ClogP = 1.06 vs. 2.44) nor that with the highest molecular weight

(153 vs. 154), although WizePairZ was able to identify a structural motif which correlated with 14

DMD Fast Forward. Published on September 7, 2012 as DOI: 10.1124/dmd.112.047068 This article has not been copyedited and formatted. The final version may differ from this version. DMD #47968 reduced BSEP inhibition activity. It is the 5-amino group, which is absent from the remaining three structures. The presence of the amine marks this compound as the only one theoretically capable of forming zwitterions at physiological pH, which raises the intriguing possibility that this compound could be binding to BSEP in a higher-order state, analogous to that observed in its crystalline form (Banić-Tomišić Z, 1997). To explore this observation further it would be important to test 5-amino analogues of salicylic amide and thiosalicylic acid. Our analysis of BSEP inhibition activity across different ion classes revealed a low frequency of BSEP inhibitors amongst the zwitterionic compounds tested, which suggests Downloaded from that this may not be a common feature of compounds which inhibit the transporter.

We determined that the compound to the right of Figure 6b (Proxicromil) was a relatively dmd.aspetjournals.org potent BSEP inhibitor with an IC50 of 30 µM. Its matched molecular pair in the centre (from which only the hydroxyl group has been removed) is remarkable in that its BSEP inhibition activity was reduced by around 20-fold. This suggests that the hydroxyl group plays a role in

the interaction of this compound with the transporter. Further support for this interpretation is at ASPET Journals on October 10, 2021 provided by the BSEP inhibition potency of the structure on the left, in which three aliphatic carbon atoms had been removed and the hydroxyl group had been re-introduced, resulting in an overall modest decrease in ClogP when compared with the non-inhibitory compound in the centre. Previous attempts to model BSEP structure-activity relationship (SAR) identified hydroxyl counts as promoting inhibition (Saito et al., 2009). In their case the group was specifically bonded to aliphatic carbon atoms, whereas for our data the group was linked to an aromatic carbon atom.

Perhaps the striking SAR lies amongst the compounds paired with , which are depicted in Figure 6c. Conversion of Mianserin to Mepyramine on the left reduces lipophilicity slightly, while to the right features an increased ClogP although it is likely that this is offset by the increased basicity of the dihydroimidazole over the corresponding .

The fairly subtle differences between the physicochemical properties of the three compounds

15

DMD Fast Forward. Published on September 7, 2012 as DOI: 10.1124/dmd.112.047068 This article has not been copyedited and formatted. The final version may differ from this version. DMD #47968 can be contrasted with the very large observed differences in BSEP inhibition activities and suggest that inhibition of BSEP activity by Mianserin is strongly influenced by the conformation of the molecule.

The final pair of structures provides what is potentially the most surprising result from the

WizePairZ analysis (Figure 6d). The conversion of the 2-methylindole () to its naphthyl equivalent () induced a near 30-fold drop in BSEP inhibition activity, despite an increase in ClogP of approximately 0.6. While this suggests that the indole in

Mepindolol makes a significant contribution to BSEP inhibition, there were no further pairs of Downloaded from the active species in the dataset to corroborate or invalidate this finding.

It should be noted that the absence of multiple observations makes statistical analysis on the dmd.aspetjournals.org potential to generalise these transformations impossible. Thus the examples illustrated are provided on an anecdotal basis, intended to fuel hypothesis formulation.

Discussion at ASPET Journals on October 10, 2021

Inhibition of BSEP is favored by high molecular weight hydrophobes

How lipophilicity and molecular weight might relate to toxicity or adverse events has been reported earlier (Leeson and Springthorpe, 2007), although the precise property ranges over which this relationship takes hold are surprising. A recent study on 85 marketed drugs demonstrated the correlation between DILI in humans and in vitro BSEP inhibition suggesting 300 µM as a useful operational threshold (Dawson et al., 2012). Using a 300 µM cutoff, we see a very sharp onset of BSEP inhibition within a lipophilicity and molecular weight range that is typically considered desirable for many drug discovery projects (Lipinski et al., 1997; Wenlock et al., 2003). With over 90% chance of BSEP inhibition activity for compounds with ClogP >1.7 and MW >300, we would consider routine screening for BSEP inhibition of promising chemical series to be a prudent strategy. Thus, we would suggest a general guidance when working with compounds on the borderline of BSEP activity to reduce

16

DMD Fast Forward. Published on September 7, 2012 as DOI: 10.1124/dmd.112.047068 This article has not been copyedited and formatted. The final version may differ from this version. DMD #47968 molecular weight and/or lipophilicity. The contour plot (Figure 5b) indicates how a relatively modest reduction in either of these properties may be enough to tip the balance in the investigator’s favor. Where tolerated by the primary pharmacological target, incorporation of increasingly basic functional groups is likely to be beneficial in reducing the level of BSEP inhibition, as is the systematic replacement of hydrogen bond acceptor features with donors.

In particular circumstances, where compounds exhibit more significant inhibition and the proposed route of administration for the compound allows, introducing cationic groups such as quaternary amines appears to be an almost certain way of eliminating activity. Downloaded from

We have previously constructed a homology model of BSEP (unpublished) taking the X-ray structure of P-glycoprotein (Pgp) as a template (Aller et al., 2009) (Supplemental Figure S3). dmd.aspetjournals.org It was used to align the findings reported here with structural information available on this transporter. Aller et al. described two portals, allowing access to an inner cavity from either side of the transporter from within the inner leaflet. The strong dependence on ClogP

observed for our compounds is fully consistent with this hypothesis, as the more lipophillic at ASPET Journals on October 10, 2021 members of the dataset are more likely to become embedded within the vesicle, where they can then gain access to the BSEP cavity. Conversely, the more polar compounds in the set appear to be unable to access this site, regardless of their molecular weight. The correlations with calculated pKa and hydrogen bonding counts both point to the pore being an unfavourable environment for polar hydrogen atoms. This is likely due to the presence of the histidine and two arginine residues in the transmembrane domain (His72, Arg223 and Arg1034), and the apparent absence of any anionic residues in this region (Supplemental Figure S3a), consistent with an active site environment that has evolved to facilitate the transportation of derived acids from hepatocytes into bile.

Integrating computational and experimental approaches

Our proposal for the integration of computational and experimental approaches in BSEP screening is based on our experience with the QSAR model developed above

17

DMD Fast Forward. Published on September 7, 2012 as DOI: 10.1124/dmd.112.047068 This article has not been copyedited and formatted. The final version may differ from this version. DMD #47968

(SVM_AZdesc). If we were to prioritize only those compounds with high molecular weight and lipophilicity for experimental testing, there is a danger that active inhibitors will fail to be identified at an early stage (affecting 16% of active compounds). Our results show how the additional information encoded by this model facilitates the correct identification of positive compounds from the low molecular weight category.

Here we attempt to expose deeper insight into the origin of this improvement giving drug designers the confidence to use the model in prioritizing compounds for experimental testing.

In Figure 7 we show ten structures intended to illustrate from where the crucial improvements Downloaded from originate. None of these compounds were involved in training the model. The first two compounds in this figure, Benzylpenicillin and , contradict our hypothesis by dmd.aspetjournals.org exposing flaws in the model. Although they would both fall into the ‘high’ ClogP/molecular weight category, they are incorrectly classified as inactive by the model. However, these deficiencies are outweighed by the compounds in Figure 7b where exactly the reverse is

true. All eight of these compounds would be incorrectly classified according to their physical at ASPET Journals on October 10, 2021 properties alone, as according to the scheme no compound with a molecular weight below

296.2 can ever be classified as positive, regardless of its lipophilicity. Here we see how the additional information in the AZdesc set enables the model to correctly identify these compounds as active inhibitors of BSEP. Regrettably, the nature of some of the descriptors and the modeling method employed confine this computational approach to a black-box with limited interpretability. Future work should be focused on further developing transparent and predictive BSEP models which are beyond the liphophilicity and size factors.

Lipophilicity dependence and inhibition of multiple transporters

A preference for large lipophiles is an emerging theme from published analyses of cation transporter inhibition (Karlgren et al., 2012; Kido et al., 2011), with previous studies also demonstrating reasonable summaries of experimental datasets using only a handful of 18

DMD Fast Forward. Published on September 7, 2012 as DOI: 10.1124/dmd.112.047068 This article has not been copyedited and formatted. The final version may differ from this version. DMD #47968 simple descriptors (Ahlin et al., 2008). In contrast, it was reported that lipophilicity has very little effect on P-glycoprotein (P-gp) efflux (Hitchcock, 2012). In this study, only 17% of compounds remain misclassified following the simple partitioning scheme, yet they are the ones potentially holding the most information when it comes to understanding the specific recognition features that are important for BSEP. According to the published trends we would not expect small, polar compounds to inhibit other transporters, making the BSEP actives in this category ideal tool compounds for further study. Ultimately, the identification (and possibly even design) of compounds which are able to discriminate between proteins in this Downloaded from class will be important to fully understand the consequences of specific transporter inhibition in vivo. dmd.aspetjournals.org Limitations of the current analyses

Currently, the relationship between in vitro potency of BSEP inhibition in membrane vesicle assays and compound concentrations within human hepatocytes in vivo which cause functional BSEP inhibition remains undefined. Our in vitro BSEP assay conditions were at ASPET Journals on October 10, 2021 similar to those used in a study reported by Dawson et al. (2012), who evaluated BSEP inhibition by 85 drugs and found that a BSEP IC50 threshold value of 300 μM provided useful discrimination between the majority of the tested drugs which caused cholestatic or mixed hepatocellular/cholestatic DILI and drugs which caused hepatocellular DILI, or did not cause

DILI. An IC50 cut-off value of 300 μM was therefore used in the current analysis. Notably,

Morgan et al. (2011) found that numerous drugs associated with DILI in humans exhibit in vitro BSEP IC50 values below 25 μM. Two of our potent BSEP inhibitors – and

Tamoxifen – have been reported by Morgan et al. not to exhibit significant BSEP inhibition.

The differences in BSEP inhibition potency could be related to the different assay conditions

(temperature, incubation time, substrate concentration) used in the two studies. Verapamil did not show significant inhibition of human BSEP mediated [3H]-taurocholate transport in Sf9 insect vesicles up to 135 μM test concentration in the study by Morgan et al., whereas inhibition of [3H]-taurocholate uptake into canalicular membrane vesicles isolated from rat 19

DMD Fast Forward. Published on September 7, 2012 as DOI: 10.1124/dmd.112.047068 This article has not been copyedited and formatted. The final version may differ from this version. DMD #47968

liver has been demonstrated for Verapamil with an Ki value of 92.5 μM (Horikawa et al.,

2003), although a species difference cannot be excluded in this case. Similarly, Tamoxifen did not affect human BSEP mediated [3H]-taurocholate transport in High Five insect cell derived membrane vesicles up to a concentration of 30 μM (Byrne et al., 2002), which is consistent with the findings by Morgan et al. However, in a study undertaken using SK-E2 cells expressing human BSEP, both Verapamil and Tamoxifen inhibited transporter activity which was determined using the fluorescent substrates Bodipy or dihydrofluorescein (Wang et al., 2003). Downloaded from

The Sf21 insect cell expression system used in our BSEP inhibition studies has a low content of cholesterol (Paulusma et al., 2009). An increase in the cholesterol content of vesicles has dmd.aspetjournals.org been shown to increase taurocholate uptake activity, with moderate effects in insect cell derived membrane vesicles but stronger in vesicles derived from liver canalicular membranes. This typically results in an increase in the transport velocity vmax whilst having no

significant effect on the affinity of taurocholate for the BSEP transporter (Km value) at ASPET Journals on October 10, 2021

(Paulusma et al., 2009; Kis et al., 2009). Importantly, for a limited number of drugs it has been demonstrated in the Sf21 insect cell model that BSEP inhibition potencies are not affected by the membrane cholesterol content (Kis et al., 2009). We therefore consider that the BSEP inhibition potencies for the compounds evaluated in this study using the insect vesicles are likely to be indicative of values that would be observed in the presence of high concentrations of cholesterol.

The kinetics of BSEP inhibition may differ between drugs. Saito et al. (2009) have described a BSEP inhibition QSAR model which did not provide good prediction of inhibition by compounds which were non-competitive inhibitors of BSEP activity. Future studies would need to explore the effect of the mechanism of BSEP inhibition on the performance of QSAR models.

20

DMD Fast Forward. Published on September 7, 2012 as DOI: 10.1124/dmd.112.047068 This article has not been copyedited and formatted. The final version may differ from this version. DMD #47968

Conclusions

We have presented a thorough investigation of the physicochemical properties of compounds in relation to their inhibition of the BSEP transporter. We conclude that, for compounds where molecular weight is greater than 309, the most important parameter influencing BSEP inhibition potency is lipophilicity as calculated by ClogP. We also identified a statistically significant relationship with the acid dissociation constants of basic compounds, indicating that more basic compounds and those with more hydrogen bond donors tend to be

less potent inhibitors. The two properties with the most marked effect on activity (molecular Downloaded from weight and ClogP) were used to construct a simple recursive partitioning scheme which allowed improved classification (R2 = 0.5 in training, 0.36 in test) over any single property,

and served as a benchmark against which we compared more elaborate modeling strategies. dmd.aspetjournals.org

Our evaluation of a collection of molecular descriptor sets and modeling algorithms yielded a further improvement in classification of a collection of 187 compounds to which the modeling algorithm had not previously been exposed. The source of this improvement appeared to at ASPET Journals on October 10, 2021 originate from the ability of the QSAR model to identify low molecular weight actives where the partitioning scheme failed. These improved classifications appear to be the result of better molecular descriptions of the molecules as encoded by the AZdesc set, rather than the nature of the classification scheme used (i.e. linear vs. non-linear methods). We have provided the full dataset with this publication to allow others to emulate (or possibly surpass) this result.

Finally, we present a collection of interesting structural modifications giving rise to a reduction of BSEP activity. Although there is no statistical evidence to support the generalisability of these modifications, with further data they could serve as an interesting starting point for those wishing to further examine the SAR surrounding this transporter.

21

DMD Fast Forward. Published on September 7, 2012 as DOI: 10.1124/dmd.112.047068 This article has not been copyedited and formatted. The final version may differ from this version. DMD #47968

Acknowledgements

We are grateful to our AstraZeneca colleagues Sarah Dawson, John Cuff, Dearg Brown,

Mike Rolf and the team from Reagents and Assay Development for excellent scientific and technical assistance. We also thank Prof. Dr. Bruno Stieger (University of Zürich,

Switzerland) for the provision of the hBSEP baculovirus stocks. The data set used in the analysis was generated under contract by Ricerca Biosciences LLC (Bothell, WA, USA) and we thank Gonzalo Castillo for his excellent management of this activity. Downloaded from Authorship Contributions Participated in research design: D. J. Warner, H. Chen, L.-D. Cantin, J. G. Kenna, S.

Stahl, T. Noeske dmd.aspetjournals.org Conducted experiments: C. L. Walker

Contributed new reagents or analytic tools:

Performed data analysis: D. J. Warner, H. Chen

Wrote or contributed to the writing of the manuscript: D. J. Warner, H. Chen, L.-D. at ASPET Journals on October 10, 2021

Cantin, J. G. Kenna, S. Stahl, C. L. Walker, T. Noeske

22

DMD Fast Forward. Published on September 7, 2012 as DOI: 10.1124/dmd.112.047068 This article has not been copyedited and formatted. The final version may differ from this version. DMD #47968

References

Abboud G, Kaplowitz N (2007) Drug induced liver injury. Drug Saf 30:277-294.

Ahlin G, Karlsson J, Pedersen JM, Gustavsson L, Larsson R, Matsson P, Norinder U,

Bergstrom CAS, Artursson P (2008) Structural Requirements for Drug Inhibition of the

Liver Specific Human Organic Cation Transport Protein 1. J Med Chem 51:5932-5942.

Aller SG, Yu J, Ward A, Weng Y, Chittabonia S, Zhuo R, Harrell PM, Trinh YT, Zhang Q,

Urbatsch IL, Chang G (2009) Structure of P-glycoprotein reveals a molecular basis for

poly-specific drug binding. Science 323:1718-1722. Downloaded from

Arnold JR, Lerman CL, Damewood JR (2004) Functional group fingerprints: Augmenting hit

and lead identification, 228th ACS National Meeting Philadelphia, PA, August 22-26,

2004. http://acscinf.org/docs/meetings/228nm/presentations/228nm82.pdf (accessed 2012 dmd.aspetjournals.org

March)

Byrne JA, Strautnieks SS, Mieli-Vergani G, Higgins CF, Linton KJ, Thompson RJ (2002) The

human bile salt export pump: characterization of substrate specificity and identification of at ASPET Journals on October 10, 2021 inhibitors. Gastroenterology 123:1649-1658.

Bruneau P (2001) Search for Predictive Generic Model of Aqueous Solubility Using Bayesian

Neural Nets. J Chem Inf Comput Sci 41:1605-1616.

Banic-Tomisic Z, Kojic-Prodic B, Sirola I (1997) Hydrogen bonds in the crystal packings of

mesalazine and mesalazine hydrochloride. J Mol Struct 416:209-220.

BioByte ClogP, Version 4.3. Claremont, CA.

DiMasi JA, Hansen RW, Grabowski HG (2003) The price of innovation: new estimates of

drug development costs. J Health Econ 22:151-85.

Dawson S, Stahl S, Paul N, Barber J, Kenna JG (2012) In Vitro Inhibition of the Bile Salt

Export Pump Correlates with Risk of Cholestatic Drug-Induced Liver Injury in Humans.

Drug Metab Dispos 40:130-138.

Davit-Spraul A, Gonzales E, Baussan C, Jacquemin E (2009) Progressive familial

intrahepatic cholestasis. Orphanet J Rare Dis 4:1-12. 23

DMD Fast Forward. Published on September 7, 2012 as DOI: 10.1124/dmd.112.047068 This article has not been copyedited and formatted. The final version may differ from this version. DMD #47968

Fattinger K, Funk C, Pantze M, Weber C, Reichen J, Stieger B, Meier PJ (2001) The

endothelin antagonist bosentan inhibits the canalicular bile salt export pump: a potential

mechanism for hepatic adverse reactions. Clin Pharmacol Ther 69:223-231

FDA white paper 2004, Innovation/stagnation: challenge and opportunity on the critical path

to new medical products. (accessed June 2012)

Greer ML, Barber J, Eakins J, Kenna JG (2010) Cell based approaches for evalutaion of

drug-induced liver injury. Toxicology 268:125-131.

Gerloff T, Stieger B, Hagenbuch B, Madon J, Landmann L, Roth J, Hofmann AF, Meier PJ Downloaded from (1998) The sister of P-glycoprotein represents the canalicular bile salt export pump of

mammalian liver. J Biol Chem 273:10046-10050.

Ghose AK, Crippen GM (1987) Atomic Physicochemical Parameters for Three-Dimensional- dmd.aspetjournals.org

Structure-Directed Quantitative Structure-Activity Relationships. 2. Modeling Dispersive

and Hydrophobic Interactions. J Chem Inf Comput Sci 27:21-35.

Greaves P, Williams A, Eve M (2004) First dose of potential new medicines to humans: how at ASPET Journals on October 10, 2021 animals help. Nat Rev Drug Discov 3:226-236.

Hitchcock SA (2012) Structural Modifications that Alter the P-Glycoprotein Efflux Properties

of Compounds. J Med Chem doi:10.1021/jm201136z.

Horikawa M, Kato Y, Tyson CA, Sugiyama Y (2003) Potential Cholestatic Activity of Various

Therapeutic Agents Assessed by Bile Canalicular Membrane Vesicles Isolated from Rats

and Humans. Drug Metab Pharmacokin 18:16-22.

JMP, Version 8.0, SAS Institute Inc., Cary, NC.

Katritzky A, Wang Y, Sild S, Tamm T, Kalrelson M (1998) QSPR studies on vapor pressure,

aqueous solubility, and the prediction of water–air partition coefficient. J Chem Inf Comp

Sci 38:720-725.

Kis E, Ioja E, Nagy T, Szente L, Herédi-Szabó K, Krajcsi P (2009) Effect of membrane

cholesterol on BSEP/Bsep activity: species specificity studies for substrates and inhibitors.

Drug Metab Dispos 37:1878-1886.

24

DMD Fast Forward. Published on September 7, 2012 as DOI: 10.1124/dmd.112.047068 This article has not been copyedited and formatted. The final version may differ from this version. DMD #47968

Kostrubsky SE, Strom SC, Kalgutkar AS, Kulkarni S, Atherton J, Mireles R, Feng B, Kubik R,

Hanson J, Urda E, Mutlib AE (2006) Inhibition of hepatobiliary transport as a predictive

method for clinical hepatotoxicity of . Toxicol Sci 90:451-459.

Kramer JA, Sagarts JE, Morris DL (2007) The application of discovery toxicology and

pathology towards the design of safer pharmaceutical lead candidates. Nat Rev Drug

Discov 6:636-649.

Karlgren M, Ahlin G, Bergström CAS, Svensson R, Palm J, Artursson P (2012) In Vitro and

In Silico Strategies to Identify OATP1B1 Inhibitors and Predict Clinical Drug–Drug Downloaded from Interactions. Pharm Res 29:411-426.

Kido Y, Matsson P, Giacomini KM (2011) Profiling of a Library for Potential

Renal Drug–Drug Interactions Mediated by the Organic Cation Transporter 2. J Med dmd.aspetjournals.org

Chem 54:4548-4558.

Leach AG, Jones HD, Cosgrove DA, Kenny PW, Ruston L, MacFaul P, Wood JM, Colclough

N, Law B (2011) Matched Molecular Pairs as a Guide in the Optimization of at ASPET Journals on October 10, 2021 Pharmaceutical Properties; a Study of Aqueous Solubility, Plasma Protein Binding and

Oral Exposure. J Med Chem 49:6672-6682.

Leeson PD, Springthorpe B (2007) The influence of drug-like concepts on decision-making in

medicinal chemistry. Nat Rev Drug Disc 6:881-890.

Lipinski CA, Lombardo F, Dominy BW, Feeney PJ (1997) Experimental and computational

approaches to estimate solubility and permeability in drug discovery and development

settings. Adv Drug Delivery Rev 23:3-25.

Mumoli N, Cei M, Cosimi A (2006) Drug-related hepatotoxicity. N. Engl. J. Med. 354: 2191-

2193.

Morgan RE, Trauner M, van Staden CJ, Lee PH, Ramachandran B, Eschenberg M, Afshari

CA, Qualls CW Jr., Lightfoot-Dunn R, Hamadeh HK (2010) Interference with bile salt

export pump function is a susceptibility factor for human liver injury in drug development.

Toxicol Sci 118:485-500.

25

DMD Fast Forward. Published on September 7, 2012 as DOI: 10.1124/dmd.112.047068 This article has not been copyedited and formatted. The final version may differ from this version. DMD #47968

Noé J, Stieger B, Meier PJ (2002) Functional expression of the canalicular bile salt export

pump of human liver. Gastroenterology 123:1659-1666.

Orange official web site, http://www.ailab.si/orange/ (accessed 2012 March)

Paulusma CC, de Waart DR, Kunne C, Mok KS, Elferink RP (2009) Activity of the bile salt

export pump (ABCB11) is critically dependent on canalicular membrane cholesterol

content. J Biol Chem 284:9947-9954.

Pirmohamed M, Breckenridge AM, Kitteringham NR, Park BK (1998) Adverse drug reactions.

BMJ 316:1295-1298. Downloaded from Paine SW, Barton P, Bird J, Denton R, Menochet K, Smith A, Tomkinson NP, Chohan KK

(2010) A rapid computational filter for predicting the rate of human renal clearance. J Mol

Graph Model 29:529-537. dmd.aspetjournals.org

Stieger B, Fattinger K, Madon J, Kullak-Ublick GA, Meier PJ (2000) Drug- and estrogen-

induced cholestasis through inhibition of the hepatocellular bile salt export pump (Bsep) of

rat liver. Gastroenterology 118:422-430. at ASPET Journals on October 10, 2021 Saito H, Osumi M, Hirano H, Shin W, Nakamura R, Ishikawa T (2009) Technical Pitfalls and

Improvements for High-speed Screening and QSAR Analysis to Predict Inhibitors of the

Human Bile Salt Export Pump (ABCB11/BSEP). The AAPS Journal 11:581-589.

Schuster D, Laggner C, Langer T (2005) Why drugs fail - a study on side effects in new

chemical entities. Curr Pharm Des 11:3545-3559.

Stålring JC, Carlsson LA, Almedida P, Boyer S (2011) AZOrange – High performance open

source machine learning for QSAR modeling in a graphical programming environment. J

Cheminform 3:28-38.

Thompson RA, Isin EM, Li Y, Weaver R, Weidolf L, Wilson I, Claesson A, Page K, Dolgos H,

Kenna JG (2011) Risk assessment and mitigation strategies for reactive metabolites in

drug discovery and development. Chem Biol Interact 192:65-71.

Wallace JL (2004) Acetaminophen hepatotoxicity: No to the rescue. Br. J. Pharmacol. 143: 1-

2.

26

DMD Fast Forward. Published on September 7, 2012 as DOI: 10.1124/dmd.112.047068 This article has not been copyedited and formatted. The final version may differ from this version. DMD #47968

Wang EJ, Casciano CN, Clement RP, Johnson WW (2003) Fluorescent substrates of sister-

P-glycoprotein (BSEP) evaluated as markers of active transport and inhibition: evidence

for contingent unequal binding sites. Pharm Res 20:537-544.

Warner DJ, Griffen EJ, St-Gallay SA (2010) WizePairZ: A Novel Algorithm to Identify,

Encode, and Exploit Matched Molecular Pairs with Unspecified Cores in Medicinal

Chemistry. J Chem Inf Model 50:1350-1357.

Wenlock MC, Austin RP, Barton P, Davis AM, Leeson PD (2003) A Comparison of

Physiochemical Property Profiles of Development and Marketed Oral Drugs. J Med Chem Downloaded from

46:1250-1256.

Zimmerman HJ (1978) Drug-induced liver disease. Drugs 16:25-45.

dmd.aspetjournals.org

at ASPET Journals on October 10, 2021

27

DMD Fast Forward. Published on September 7, 2012 as DOI: 10.1124/dmd.112.047068 This article has not been copyedited and formatted. The final version may differ from this version. DMD #47968

Legends for Figures

Figure 1. Inhibition of BSEP mediated [3H]-taurocholate transport measured in inverted membrane vesicles from BSEP overexpressing Sf21 cells. Effects of selected pharmaceuticals on BSEP activity are shown: Bosentan (), Bezafibrate (), Labetalol (),

Acetaminophen ().

Figure 2. Distribution of IC50 values for compound effects on BSEP activity. An IC50 value of Downloaded from 300 μM was used as the threshold for significant BSEP inhibition (Dawson et al, 2012). Data are presented as geometric means from at least three test occasions, with the compounds ordered by potency. dmd.aspetjournals.org

Figure 3. Logistic regression plots to demonstrate the influence of different physicochemical properties on the likelihood of BSEP inhibition for all 624 compounds. (a) ClogP, (b) pKa for

241 negatively ionized compounds, (c) number of hydrogen bond (HB) donors, (d) MW, (e) at ASPET Journals on October 10, 2021 pKa for 338 positively ionized compounds, (f) number of hydrogen bond acceptors. Red points in the plots refer to BSEP positive compounds (POS class) and blue points refer to negatives (NEG class).

Figure 4. Pie charts depicting ion classes of 299 inactive and 325 active BSEP inhibitors

(300µM cutoff).

Figure 5. (a) Recursive partitioning scheme for the training set (437 compound) based on molecular weight and ClogP (red bars correspond the ratio of BSEP actives and blue bars correspond to the ratio of BSEP inactives). (b) Probability contour plot of being in the BSEP

POS class for the full dataset (based on the RP scheme).

Figure 6. Matched molecular pairs exhibiting unexpected activity profiles. Structures are displayed from left to right in order of increasing ClogP. The molecular weight and calculated

28

DMD Fast Forward. Published on September 7, 2012 as DOI: 10.1124/dmd.112.047068 This article has not been copyedited and formatted. The final version may differ from this version. DMD #47968 logP of each compound is expressed below the BSEP activities in the format MW / ClogP.

The most potent BSEP inhibitor in each row is represented in red.

Figure 7. Improvement in false negative classification through use of the SVM_AZdesc model vs. simple property based recursive partitioning. (a) Two active compounds which are incorrectly flagged as negative by SVM_AZdesc but not by the RP method. (b) Eight active compounds which are incorrectly flagged as negative by the RP method but not by

SVM_AZdesc. The molecular weight and calculated logP of each compound is expressed below the BSEP activities in the format molecular weight / ClogP. Downloaded from

dmd.aspetjournals.org at ASPET Journals on October 10, 2021

29

DMD Fast Forward. Published on September 7, 2012 as DOI: 10.1124/dmd.112.047068 This article has not been copyedited and formatted. The final version may differ from this version. DMD #47968

Table 1. Performance metrics for all six BSEP classification models. Those indicating an

improvement in performance of ≥ 0.05 over the simple property based recursive partitioning

scheme are highlighted in bold.

Matthews Pos. Neg. Accurac Model Sensitivity Specificity Fscore Kappa Correlation prec. prec. y Coefficient

RP_ClogP_MW 0.83 0.84 0.84 0.83 0.83 0.83 0.67 0.67

RF_AZdesc 0.83 0.82 0.82 0.83 0.82 0.82 0.65 0.65

SVM_AZdesc 0.85 0.90 0.90 0.84 0.87 0.88 0.74 0.74 Downloaded from

PLS_AZdesc 0.82 0.88 0.89 0.80 0.84 0.85 0.69 0.69

RF_AZFP_AZtop14 0.82 0.83 0.83 0.82 0.82 0.83 0.59 0.65

SVM_AZFP_AZtop14 0.84 0.85 0.85 0.84 0.84 0.85 0.69 0.69 dmd.aspetjournals.org

at ASPET Journals on October 10, 2021

30

DMD Fast Forward. Published on September 7, 2012 as DOI: 10.1124/dmd.112.047068 This article has not been copyedited and formatted. The final version may differ from this version. Downloaded from dmd.aspetjournals.org at ASPET Journals on October 10, 2021 DMD Fast Forward. Published on September 7, 2012 as DOI: 10.1124/dmd.112.047068 This article has not been copyedited and formatted. The final version may differ from this version. Downloaded from dmd.aspetjournals.org at ASPET Journals on October 10, 2021 DMD Fast Forward. Published on September 7, 2012 as DOI: 10.1124/dmd.112.047068 This article has not been copyedited and formatted. The final version may differ from this version. Downloaded from dmd.aspetjournals.org at ASPET Journals on October 10, 2021 DMD Fast Forward. Published on September 7, 2012 as DOI: 10.1124/dmd.112.047068 This article has not been copyedited and formatted. The final version may differ from this version. Downloaded from dmd.aspetjournals.org at ASPET Journals on October 10, 2021 DMD Fast Forward. Published on September 7, 2012 as DOI: 10.1124/dmd.112.047068 This article has not been copyedited and formatted. The final version may differ from this version. Downloaded from dmd.aspetjournals.org at ASPET Journals on October 10, 2021 DMD Fast Forward. Published on September 7, 2012 as DOI: 10.1124/dmd.112.047068 This article has not been copyedited and formatted. The final version may differ from this version. Downloaded from dmd.aspetjournals.org at ASPET Journals on October 10, 2021 DMD Fast Forward. Published on September 7, 2012 as DOI: 10.1124/dmd.112.047068 This article has not been copyedited and formatted. The final version may differ from this version. Downloaded from dmd.aspetjournals.org at ASPET Journals on October 10, 2021 Mitigating the inhibition of human Bile Salt Export Pump by drugs: opportunities provided by physicochemical property modulation, in-silico modeling and structural modification

Daniel J. Warner, Hongming Chen, Louis-David Cantin, J. Gerry Kenna, Simone Stahl, Clare L. Walker, Tobias Noeske and Disposition

Supplemental Data

Table S4. Complete list of all compounds used for this study.

BSEP Mean Compound Name SMILES IC50 (µM) ClogP MW Data Set Ion Class Class 1,3-BIS( 2-CHLOROETHYL)-N- 97.21 1.316 214.1 Test Neutral POS NITROSOUREA C(CCl)NC(=O)N(CCCl)N=O 4-AMINO BUTYLBENZOATE 181.1 2.981 193.2 Train Neutral POS CCCCOC(=O)c1ccc(cc1)N 4-CHLORO-3,5- 129.9 3.483 156.6 Train Acid POS DIMETHYLPHENOL Cc1cc(cc(c1Cl)C)O 5-FLUOROURACIL c1c(c(=O)[nH]c(=O)[nH]1)F >1000 -0.577 130.1 Test Neutral NEG 8-HYDROXY QUINOLINE c1cc2cccnc2c(c1)O >1000 2.077 145.2 Train Neutral NEG A-967079 CC/C(=N\O)/C(=C/c1ccc(cc1)F)/C 26.81 3.536 207.2 Train Neutral POS ACEPROMAZINE CC(=O)c1ccc2c(c1)N(c3ccccc3S2)CC 111.6 4.18 326.5 Test Base POS CN(C)C ACETAMINOPHEN CC(=O)Nc1ccc(cc1)O >1000 0.494 151.2 Train Neutral NEG CC(=O)Nc1ccccc1 >1000 1.161 135.2 Test Neutral NEG ACETARSONE >1000 -1.363 275.1 Train Acid NEG CC(=O)Nc1cc(ccc1O)[As](=O)(O)O ACETYL CC(=O)OCC[N+](C)(C)C >1000 -3.467 146.2 Train Cationic NEG ACNECLEAR c1cc(c(cc1Cl)Cl)COC(Cn2ccnc2)c3ccc <10 5.812 416.1 Train Base POS (cc3Cl)Cl ACTARIT CC(=O)Nc1ccc(cc1)CC(=O)O >1000 0.433 193.2 Test Acid NEG ACYCLOVIR c1nc2c(=O)[nH]c(nc2n1COCCO)N >1000 -2.422 225.2 Train Neutral NEG ADENINE c1[nH]c2c(ncnc2n1)N >1000 -0.148 135.1 Train Neutral NEG ADIPHENINE CCN(CC)CCOC(=O)C(c1ccccc1)c2cc 158.4 4.525 311.4 Train Base POS ccc2 ALBUTEROL >1000 0.061 239.3 Train Base NEG CC(C)(C)NCC(c1ccc(c(c1)CO)O)O CCCS(=O)CCCN(CC)CC(COc1ccc(cc 55.75 2.213 352.5 Train Base POS 1)C#N)O ALPHA- >1000 -2.264 211.2 Test Zwitterion NEG C[C@](Cc1ccc(c(c1)O)O)(C(=O)O)N CCCN(CCC)C(=O)Cc1c(nc2n1cc(cc2) 9.155 5.581 404.3 Test Neutral POS Cl)c3ccc(cc3)Cl ALTRETAMINE CN(C)c1nc(nc(n1)N(C)C)N(C)C >1000 1.665 210.3 Train Neutral NEG Erlotinib COCCOc1cc2c(cc1OCCOC)ncnc2Nc3 <10 4.342 393.4 Train Neutral POS cccc(c3)C#C >1000 -2.222 229.6 Test Base NEG c1(c(nc(c(n1)Cl)N)N)C(=O)NC(=N)N AMINOGLUTETHIMIDE >1000 0.766 232.3 Test Neutral NEG CCC1(CCC(=O)NC1=O)c2ccc(cc2)N AMINOHIPPURIC ACID c1cc(ccc1C(=O)NCC(=O)O)N >1000 -0.253 194.2 Train Acid NEG CCCCc1c(c2ccccc2o1)C(=O)c3cc(c(c( <10 8.945 645.3 Test Base POS c3)I)OCCN(CC)CC)I CCOC(=O)C1=C(NC(=C(C1c2ccccc2 16 3.434 408.9 Train Base POS Cl)C(=O)OC)C)COCCN AMODIAQUINE CCN(CC)Cc1cc(ccc1O)Nc2ccnc3c2cc 77.76 5.465 355.9 Train Base POS c(c3)Cl c1ccc2c(c1)N=C(c3cc(ccc3O2)Cl)N4C 230.1 3.406 313.8 Test Base POS CNCC4 c1cnccc1c2cc(c(=O)[nH]c2)N 370.8 -0.689 187.2 Train Neutral NEG ANTAZOLINE >1000 4.112 265.4 Train Base NEG c1ccc(cc1)CN(CC2=NCCN2)c3ccccc3 ANTIPYRINE Cc1cc(=O)n(n1C)c2ccccc2 >1000 0.204 188.2 Train Neutral NEG CN1CCc2cccc- 211.5 2.49 267.3 Train Base POS 3c2[C@H]1Cc4c3c(c(cc4)O)O ARC 68475AA c1ccc(cc1)CCOCCS(=O)(=O)NCCNC 35.69 2.523 465.6 Test Base POS Cc2ccc(c3c2sc(=O)[nH]3)O CN1CCC=C(C1)C(=O)OC >1000 0.903 155.2 Train Base NEG ARH047108 CCc1cccc(c1CNc2cc(cn3c2nc(c3C)C) <10 3.997 336.4 Test Neutral POS C(=O)N)C ARM 390 CCN(CC)C(=O)c1ccc(cc1)C(=C2CCN 105.8 3.529 348.5 Train Base POS CC2)c3ccccc3 Sorafenib CNC(=O)c1cc(ccn1)Oc2ccc(cc2)NC(= <10 5.455 464.8 Test Neutral POS O)Nc3ccc(c(c3)C(F)(F)F)Cl C[C@@H]1C[C@H]2[C@@H]3CCC4 101.7 1.785 392.5 Train Neutral POS =CC(=O)C=C[C@@]4([C@]3([C@H]( C[C@@]2([C@]1(C(=O)CO)O)C)O)F) C ASPARTAME COC(=O)[C@H](Cc1ccccc1)NC(=O)[C >1000 -2.135 294.3 Train Zwitterion NEG @H](CC(=O)O)N COc1ccc(cc1)CCN2CCC(CC2)Nc3nc4 <10 5.837 458.6 Train Base POS ccccc4n3Cc5ccc(cc5)F AZ1 c1ccc(c(c1)C(=O)NCC(=O)O)N >1000 0.347 194.2 Train Acid NEG AZ10 Cc1ccc(cc1n2cnc3ccc(cc3c2=O)N4C <10 3.403 539.6 Train Base POS CN(CC4)C)NC(=O)c5ccnc(c5)N6CCO CC6 AZ100 CC[C@@H](c1ccccc1)NC(=O)c2c3cc <10 7.385 382.5 Train Acid POS ccc3nc(c2O)c4ccccc4 AZ101 CC1(O[C@@H]2CO[C@@]3([C@H]([ >1000 0.042 339.4 Train Neutral NEG C@@H]2O1)OC(O3)(C)C)COS(=O)(= O)N)C AZ102 16.42 2.411 376.5 Train Neutral POS C[C@@H]1C[C@H]2[C@@H]3CCC4 =CC(=O)C=C[C@@]4([C@]3([C@H]( C[C@@]2([C@H]1C(=O)CO)C)O)F)C AZ103 CCCC(=O)OCOC(=O)C1=C(NC(=C(C 36.35 5.53 456.3 Train Neutral POS 1c2cccc(c2Cl)Cl)C(=O)OC)C)C AZ104 CN1[C@@H][C@H](C[C@H]1[C@ >1000 0.29 303.4 Test Base NEG H]3[C@@H]2O3)OC(=O)[C@H](CO)c 4ccccc4 AZ105 CCOC(=O)C(CCc1ccccc1)NC2CCc3c 74.86 1.824 424.5 Train Zwitterion POS cccc3N(C2=O)CC(=O)O AZ106 CN(C)c1c2c(ncn1)n(cn2)C3C(C(C(O3) >1000 -1.245 294.3 Train Base NEG CO)N)O AZ107 >1000 0.119 341.4 Train Neutral NEG CCCSc1nc(c2c(n1)n(cn2)[C@H]3[C@ @H]([C@@H]([C@H](O3)CO)O)O)N AZ108 CCC(=O)OC(C(C)C)OP(=O)(CCCCc1 <13.34 7.445 563.7 Test Acid POS ccccc1)CC(=O)N2CC(CC2C(=O)O)C3 CCCCC3 AZ109 Cc1c(c(n[nH]1)C)Cc2c(c3c(=O)n(c(=O 247.8 2.577 461.5 Test Neutral POS )n(c3s2)CC(C)C)C)C(=O)N4C[C@@H ](CO4)O AZ11 <12.85 1.929 478.9 Train Neutral POS C[C@@](C(=O)Nc1ccc(cc1Cl)S(=O)(= O)c2ccc(cc2)C(=O)N(C)C)(C(F)(F)F)O AZ110 Cc1c(c(n[nH]1)C)Cc2c(c3c(=O)n(c(=O 703 1.958 447.5 Test Neutral NEG )n(c3s2)C(C)C)C)C(=O)N4C[C@@H]( CO4)O AZ111 Cc1cc(on1)c2ccc(cc2F)N3C[C@@H]( >1000 1.017 343.3 Train Neutral NEG OC3=O)Cn4ccnn4 AZ112 >1000 -4.226 344.3 Test Neutral NEG C([C@@H]1[C@@H]([C@@H]([C@H ]([C@@H](O1)O[C@H]([C@@H](CO) O)[C@@H]([C@H](CO)O)O)O)O)O)O AZ113 c1ccc(c(c1)C(=O)NCCCCCCCC(=O)O >750.9 3.24 279.3 Test Acid NEG )O AZ114 CCOCCNS(=O)(=O)c1ccc(cc1)Nc2ncc <10 3.228 444.6 Train Neutral POS c(n2)c3cnc(n3C(C)C)C AZ115 COC(=O)NN=Cc1c[n+](c2ccccc2[n+]1[ >939.8 -1.839 262.2 Train Neutral NEG O-])[O-] AZ116 c1ccc(cc1)CCOCCCS(=O)(=O)CCCN 32.35 2.665 478.6 Train Base POS CCc2ccc(c3c2sc(=O)[nH]3)O AZ117 16.26 3.168 488.4 Test Neutral POS Cc1nc(c(n1Cc2c(c3c(=O)n(c(=O)n(c3s 2)CC(C)C)C)C(=O)N(C)OC)Cl)Cl AZ118 CNC[C@@H](Cc1ccc(c(c1)C(=O)NCC 133.2 4.008 391 Train Base POS 23CC4CC(C2)CC(C4)C3)Cl)O AZ119 c1cc(cnc1)CC[C@H](COc2ccc(cc2)c3 >1000 2.111 404.5 Test Neutral NEG ccc(s3)S(=O)(=O)N)O AZ12 Cc1ccc(cc1NC(=O)c2cccc(c2)CN3CC <10 5.259 532.6 Train Base POS N(CC3)C)NC(=O)c4cccc5c4oc6c5cccc 6 AZ120 59.51 3.937 437.6 Train Base POS Cc1ccc(cc1NC(=O)CC23CC4CC(C2)C C(C4)C3)C(=O)N5CC6CNCC(C5)O6 AZ121 Cc1ccc2c(c1)c(c(n2CC(=O)O)C)c3ccn <10 5.006 344.4 Train Acid POS c4c3cccc4C AZ122 19.09 2.696 413.9 Train Neutral POS c1ccc2c(c1)CC(C(=O)N2C[C@H](CO) O)NC(=O)c3cc4cc(ccc4[nH]3)Cl AZ123 164 0.888 654.8 Train Neutral POS CCCCCC(C1C(CC(CC(CC(CC(CC(CC (/C(=C/C=C/C=C/C=C/C=C/C(C(OC1= O)C)O)/C)O)O)O)O)O)O)O)O AZ124 CCCc1ncc(n1CC)c2ccnc(n2)Nc3ccc(c <10 3.588 444.6 Test Neutral POS c3)S(=O)(=O)NCCOC AZ125 CCc1ncc(n1C(C)C)c2ccnc(n2)Nc3ccc( <10 3.368 444.6 Train Neutral POS cc3)S(=O)(=O)NCCOC AZ126 Cc1ccc2c(c1)c(c(n2CC(=O)O)C)c3ccn <10 5.489 398.4 Train Acid POS c4c3cccc4C(F)(F)F AZ127 <10.95 2.402 419.9 Train Neutral POS c1ccc2c(c1)CC(C(=O)N2C[C@H](CO) O)NC(=O)c3cc4cc(sc4[nH]3)Cl AZ128 Cc1ccc2c(c1)c(c(n2CC(=O)O)C)c3ccn <10 5.271 364.8 Train Acid POS c4c3cccc4Cl AZ129 CN1[C@@H]2CC[C@H]1C[C@H](C2) 188.6 3.52 307.4 Train Base POS OC(c3ccccc3)c4ccccc4 AZ13 c1cc(c(cc1Cn2c3ccsc3cc2C(=O)O)Cl) 13.72 5.233 326.2 Train Acid POS Cl AZ130 C[C@@H](CN1c2ccccc2Sc3c1cc(cc3) 127 4.831 328.5 Test Base POS OC)CN(C)C AZ131 c1cc(ccc1c2ccc(o2)/C=N/N3CC(=O)N 145.5 1.631 314.3 Test Neutral POS C3=O)[N+](=O)[O-] AZ132 C[C@H]([C@@H]1[C@H]2CC(=C(N2 >1000 -1.353 299.3 Train Zwitterion NEG C1=O)C(=O)O)SCCNC=N)O AZ133 CN1CCC[C@H]1c2cccnc2 >1000 0.883 162.2 Train Base NEG AZ134 <10 2.21 455.5 Train Neutral POS c1ccc(cc1)C2=C(C(=O)N(C2=O)Cc3cc c(nc3)N)Nc4ccc(cc4)N5CCOCC5 AZ135 CC(C)(CO)[C@H](C(=O)NCCCO)O >1000 -1.648 205.3 Test Neutral NEG AZ136 CCCSc1nc(c2c(n1)n(cn2)[C@H]3[C@ >1000 0.391 648.2 Train Acid NEG @H]([C@@H]([C@H](O3)COP(=O)(O )OP(=O)(C(P(=O)(O)O)(Cl)Cl)O)O)O) N AZ137 C[C@H](CO)NCCCc1ccc(c(c1)C(=O) 65.56 4.916 419 Test Base POS NCC23CC4CC(C2)CC(C4)C3)Cl AZ138 CC(C)(C)NC[C@H](c1ccc(c(c1)CO)O) >1000 0.061 239.3 Train Base NEG O AZ139 CC(=O)Nc1ccccc1OC[C@](C)(CNC2C 71.54 2.285 446 Train Base POS CN(CC2)Cc3ccc(cc3)Cl)O AZ14 c1cc(c(cc1Cn2c3ccc(cc3cc2C(=O)O) 18.49 4.977 369.7 Train Acid POS O)C(F)(F)F)Cl AZ140 CC(=O)Nc1ccccc1OC[C@H](CNC2C 55.57 1.886 432 Test Base POS CN(CC2)Cc3ccc(cc3)Cl)O AZ141 CC[C@@H]1[C@@]([C@@H]([C@H] <12.88 2.374 748 Train Base POS (C(=O)[C@@H](C[C@]([C@@H]([C@ H]([C@H]([C@H](C(=O)O1)C)O[C@H] 2C[C@@]([C@H]([C@@H](O2)C)O)( C)OC)C)O[C@H]3[C@@H]([C@H](C[ C@H](O3)C)N(C)C)O)(C)OC)C)C)O)( C)O AZ142 CS(=O)(=O)Oc1ccc(cc1)OCCc2ccc(cc <13.36 4.823 516.6 Train Acid POS 2)C[C@@H](C(=O)O)SCCc3ccc(cc3) O AZ143 <10 1.262 469 Test Neutral POS c1cnc(nc1)CCC[C@@H](CS(=O)(=O) N2CCN(CC2)c3ccc(cn3)Cl)N(C=O)O AZ144 CC(=O)OC1C(Sc2ccccc2N(C1=O)CC 82.02 3.647 414.5 Test Base POS N(C)C)c3ccc(cc3)OC AZ145 <10 4.45 310.4 Train Neutral POS C[C@]12CC[C@@H]3c4ccc(cc4CC[C @H]3[C@@H]1CC[C@]2(C#C)O)OC AZ146 >1000 -6.523 615.6 Test Base NEG C1[C@H]([C@@H]([C@H]([C@@H]([ C@H]1N)O[C@@H]2[C@@H]([C@H] ([C@@H]([C@H](O2)CO)O)O)N)O[C @H]3[C@@H]([C@@H]([C@H](O3)C O)O[C@@H]4[C@@H]([C@H]([C@@ H]([C@@H](O4)CN)O)O)N)O)O)N AZ147 C[C@H]1C[C@H]2[C@@H]3CC[C@ 33.03 1.742 374.5 Train Neutral POS @]([C@]3(C[C@@H]([C@@H]2[C@ @]4(C1=CC(=O)C=C4)C)O)C)(C(=O) CO)O AZ148 C[C@H]1[C@@H]([C@H]([C@H]([C >1000 -1.901 584.7 Train Neutral NEG @@H](O1)O[C@H]2C[C@H]([C@@] 3([C@@H]4[C@@H](CC[C@@]3(C2) O)[C@]5(CC[C@@H]([C@]5(C[C@H] 4O)C)C6=CC(=O)OC6)O)CO)O)O)O) O AZ149 CCCCC[C@@H](/C=C/[C@H]1[C@@ >1000 2.013 352.5 Test Acid NEG H](CC(=O)[C@@H]1C/C=C\CCCC(=O )O)O)O AZ15 CCCC(c1ccccc1)(c2ccccc2)C(=O)OC <13.19 5.982 353.5 Test Base POS CN(CC)CC AZ150 C[C@]12C[C@@H]([C@]3(C(C1C[C 235.6 0.708 394.4 Train Neutral POS @H]([C@@]2(C(=O)CO)O)O)CCC4= CC(=O)C=C[C@@]43C)F)O AZ151 CCS(=O)(=O)c1ccc(c(c1)F)c2cc(ccc2 168.3 2.718 372.8 Test Acid POS OCC(=O)O)Cl AZ152 C[C@]12CC[C@H]3[C@H]([C@@H]1 <10 3.927 337.5 Train Neutral POS CC[C@]2(C#C)O)CCC4=Cc5c(cno5)C [C@]34C AZ153 <27.03 3.71 408.9 Train Base POS C[C@H](CC(=O)Nc1c2ccc(nc2ccc1Cl) N3CC[C@@H](C3)N)c4ccccc4 AZ154 234.8 2.959 404.4 Train Acid POS c1ccc(cc1)C(=O)N[C@@H](Cc2ccc(cc 2)O)C(=O)Nc3ccc(cc3)C(=O)O AZ155 c1cc(nc2c1c(c(cc2)Cl)C(=O)NCC3CC 106.9 3.489 386.9 Train Base POS CCC3)N4CC[C@@H](C4)N AZ156 c1ccc(c(c1)CCNC(=O)c2cc(ccc2Cl)n3 34.42 3.686 405.2 Train Neutral POS c(=O)[nH]c(=O)cn3)Cl AZ157 Cc1c(c2c3c4c1O[C@@](C4=O)(O/C= <10 3.71 822.9 Test Zwitterion POS C/[C@@H]([C@H]([C@H]([C@@H]([ C@@H]([C@@H]([C@H]([C@H](/C= C/C=C(\C(=O)Nc(c2O)c(c3O)/C=N/N5 CCN(CC5)C)/C)C)O)C)O)C)OC(=O)C) C)OC)C)O AZ158 CCOC(=O)[C@H](CCc1ccccc1)N[C@ 15.5 1.736 438.5 Test Zwitterion POS @H](C)C(=O)N2Cc3ccccc3C[C@H]2C (=O)O AZ159 Cc1c(c(no1)c2c(cccc2Cl)Cl)C(=O)N[C <10 2.981 470.3 Train Acid POS @H]3[C@@H]4N(C3=O)[C@H](C(S4) (C)C)C(=O)O AZ16 <10 5.76 519.6 Train Base POS Cc1ccc(cc1n2cnc3ccc(cc3c2=O)N4C CN(CC4)C)NC(=O)c5cccc(c5)c6ccco6 AZ160 C[C@]1(c2cccc(c2C(=O)C3=C([C@]4( >1000 -1.78 444.4 Test Zwitterion NEG C(CC31)[C@@H](C(=C(C4=O)C(=O) N)O)N(C)C)O)O)O)O AZ161 CN1[C@@H]2CC[C@H]1C[C@H](C2) >1000 1.299 289.4 Test Base NEG OC(=O)C(CO)c3ccccc3 AZ162 CN1[C@@H]2CC[C@H]1C[C@H](C2) >1000 1.427 275.3 Test Base NEG OC(=O)C(c3ccccc3)O AZ163 CN1CCN(CC1)c2ccccc2C=C3C(=O)N <10 5.668 448.4 Train Base POS (CCS3)c4ccc(c(c4)Cl)Cl AZ164 >1000 -3.75 547.6 Train Zwitterion NEG CC(C)(C(=O)O)O/N=C(/c1csc(n1)N)\C (=O)N[C@H]2[C@@H]3N(C2=O)C(=C (CS3)C[n+]4ccccc4)C(=O)O AZ165 >1000 -7.138 685.7 Train Base NEG C1[C@@H](C(=O)N[C@H](C(=O)N[C @H](C(=O)N/C(=C/NC(=O)N)/C(=O)N[ C@H](C(=O)N1)C2CC(NC(=N)N2)O)C O)CO)NC(=O)CC(CCCN)N AZ166 C[C@H]([C@@H]1[C@H]2CC(=C(N2 >1000 -1.353 299.3 Train Zwitterion NEG C1=O)C(=O)O)SCCNC=N)O AZ167 CC1[C@@]([C@H](C(O1)OC2C(C(C( >1000 -4.279 581.6 Train Base NEG C(C2O)O)NC(=N)N)O)NC(=N)N)OC3[ C@@H]([C@H]([C@@H]([C@H](O3) CO)O)O)NC)(C=O)O AZ168 c1ccc(cc1)[C@@H]2C[C@H]2N >1000 1.478 133.2 Test Base NEG AZ169 C[C@]1([C@H]2CC[C@H](C2)C1(C)C >1000 2.825 167.3 Train Base NEG )NC AZ17 C=CC1CN2CCC1CC2C(c3ccnc4c3cc >1000 2.49 294.4 Train Base NEG cc4)O AZ170 COC(=O)[C@H]1[C@H](CC[C@@H]2 254 2.171 354.4 Train Base POS [C@@H]1C[C@H]3c4c(c5ccccc5[nH] 4)CCN3C2)O AZ171 CN1[C@@H]2CC[C@H]1C[C@H](C2) >1000 1.299 289.4 Train Base NEG OC(=O)[C@H](CO)c3ccccc3 AZ172 CC[C@@H](CO)NC(=O)[C@H]1CN([ 33.07 2.224 353.5 Test Base POS C@@H]2Cc3cn(c4c3c(ccc4)C2=C1)C )C AZ173 C[N+]1([C@@H]2C[C@H](C[C@H]1[ >1000 -1.706 392.5 Test Cationic NEG C@H]3[C@@H]2O3)OC(=O)C(c4cccs 4)(c5cccs5)O)C AZ174 c1cc(c2c3c1C[C@@H]4[C@]5([C@]3( >1000 0.359 341.4 Train Base NEG CCN4CC6CC6)[C@@H](O2)C(=O)CC 5)O)O AZ175 c1c(c(=O)[nH]c(=O)n1[C@H]2C[C@@ >1000 -0.413 296.2 Train Neutral NEG H]([C@H](O2)CO)O)C(F)(F)F AZ176 C[C@H](CCC(=O)O)[C@H]1CC[C@ <10 4.514 392.6 Train Acid POS @H]2[C@@]1(CC[C@H]3[C@H]2[C @H](C[C@H]4[C@@]3(CC[C@H](C4) O)C)O)C AZ177 549 0.159 327.4 Train Base NEG C=CCN1CC[C@]23c4c5ccc(c4O[C@ H]2C(=O)CC[C@]3([C@H]1C5)O)O AZ178 <10 2.707 430.4 Train Neutral POS C[C@](CS(=O)(=O)c1ccc(cc1)F)(C(=O )Nc2ccc(c(c2)C(F)(F)F)C#N)O AZ179 Cc1cccc(c1NC(=O)[C@H]2CCCCN2C >1000 2.104 246.4 Test Base NEG )C AZ18 Cc1ccc(cc1NC(=O)c2cccc(c2)OC3CC <10 6.048 561.7 Test Base POS N(CC3)C(C)C)NC(=O)c4cccc5c4oc6c 5cccc6 AZ180 <10 3.341 470.5 Train Acid POS Cc1ccc(c(c1)[N+](=O)[O- ])OC(C)(C)[C@H]2OC[C@H]([C@H]( O2)c3cccnc3)C/C=C\CCC(=O)O AZ181 CCCCCC(C)(/C=C/[C@H]1[C@@H](C 328.7 2.67 368.5 Train Acid NEG [C@@H]([C@@H]1C/C=C\CCCC(=O) O)O)O)O AZ182 CC1=C(N2C([C@@H](C2=O)NC(=O) >1000 -1.727 349.4 Test Zwitterion NEG C(C3=CCC=CC3)N)SC1)C(=O)O AZ183 CC1([C@@H](N2[C@H](S1)[C@@H]( >1000 -1.872 365.4 Train Zwitterion NEG C2=O)NC(=O)[C@@H](c3ccc(cc3)O) N)C(=O)O)C AZ184 CC[C@@H]1[C@@]([C@@H]([C@H] >983.1 1.611 733.9 Train Base NEG (C(=O)[C@@H](C[C@@]([C@@H]([C @H]([C@@H]([C@H](C(=O)O1)C)O[C @H]2C[C@@]([C@H]([C@@H](O2)C )O)(C)OC)C)O[C@H]3[C@@H]([C@H ](C[C@H](O3)C)N(C)C)O)(C)O)C)C)O )(C)O AZ185 COc1ccc2c(c1)c(ccn2)[C@H](C3CC4 98.83 2.785 324.4 Train Base POS CCN3CC4C=C)O AZ186 CN(C)c1c2c(ncn1)n(cn2)C3[C@@H]([ >787.1 0.268 471.5 Test Base NEG C@@H]([C@H](O3)CO)NC(=O)C(Cc4 ccc(cc4)OC)N)O AZ187 C[C@]12C[C@@H]([C@H]3[C@H]([C 91.47 1.423 360.4 Train Neutral POS @@H]1CC[C@@]2(C(=O)CO)O)CCC 4=CC(=O)C=C[C@]34C)O AZ188 C[C@]12CCC(=O)C=C1CC[C@@H]3[ 251.7 1.887 362.5 Train Neutral POS C@@H]2[C@H](C[C@]4([C@H]3CC[ C@@]4(C(=O)CO)O)C)O AZ189 <10 3.864 296.4 Train Neutral POS C[C@]12CC[C@@H]3c4ccc(cc4CC[C @H]3[C@@H]1CC[C@]2(C#C)O)O AZ19 CC12CCN(C1N(c3c2cc(cc3)OC(=O)N >1000 1.95 275.4 Test Base NEG C)C)C AZ190 C[C@]12CC[C@H]3[C@H]([C@@H]1 17.06 3.44 330.5 Train Neutral POS CC[C@@H]2C(=O)CO)CCC4=CC(=O )CC[C@]34C AZ191 35.84 3.61 456.4 Test Acid POS C[C@@H](C(=O)O)Oc1ccc(cc1c2ccc( c(c2)C(F)(F)F)S(=O)(=O)C)C(F)(F)F AZ192 C=CC1=C(N2[C@@H]([C@@H](C2= >1000 0.252 453.5 Test Acid NEG O)NC(=O)/C(=N\OCC(=O)O)/c3csc(n3 )N)SC1)C(=O)O AZ193 c1c(n(c(=N)nc1N2CCCCC2)O)N >1000 -2.134 209.3 Train Neutral NEG AZ194 <10 4.732 847 Train Zwitterion POS Cc1c(c2c3c4c1O[C@@](C4=O)(O/C= C/[C@@H]([C@H]([C@H]([C@@H]([ C@@H]([C@@H]([C@H]([C@H](/C= C/C=C(\C(=O)NC(=C5C3=NC6(N5)CC N(CC6)CC(C)C)C2=O)/C)C)O)C)O)C) OC(=O)C)C)OC)C)O AZ195 c1ccc(cc1)OCCC[N+]23CCC(CC2)[C 127.1 2.341 484.7 Test Cationic POS @H](C3)OC(=O)C(c4cccs4)(c5cccs5) O AZ196 C[C@]12CC[C@H]3[C@H]([C@@H]1 18.46 2.97 298.4 Test Neutral POS CC[C@]2(C#C)O)CCC4=CC(=O)CC[C @H]34 AZ197 471.9 1.659 361.4 Train Neutral NEG C[C@@H](C(=O)N[C@H]1c2ccccc2C CN(C1=O)C)NC(=O)[C@H](C(C)C)O AZ198 196.6 4.116 470.5 Train Neutral POS CC1(C(=O)Nc2c(ccc(n2)Nc3c(cnc(n3) Nc4cc(c(c(c4)OC)OC)OC)F)O1)C AZ199 CCCCCCCCCCCCc1ccc(cc1)S(=O)(= 21.04 7.114 409.6 Train Acid POS O)Nc2nncs2 AZ2 c1ccc(c(c1)C(=O)O)S >1000 2.438 154.2 Train Acid NEG AZ20 CC[C@@H]1[C@@]([C@@H]([C@H] >1000 2.638 749 Train Base NEG (N(C[C@@H](C[C@@]([C@@H]([C@ H]([C@@H]([C@H](C(=O)O1)C)O[C@ H]2C[C@@]([C@H]([C@@H](O2)C)O )(C)OC)C)O[C@H]3[C@@H]([C@H]( C[C@H](O3)C)N(C)C)O)(C)O)C)C)C) O)(C)O AZ200 <10 6.058 487.9 Train Acid POS Cc1cc(cc(c1c2[nH]c3cc(ccc3n2)c4nnc (o4)Nc5cccc(c5)Cl)C)CCC(=O)O AZ201 COc1cccc(c1)N2C(=NC(=NC23CCCC >1000 1.408 287.4 Test Base NEG C3)N)N AZ21 c1ccc(cc1)CCC(CS(=O)(=O)N2CCN(C <10 3.651 435.5 Train Neutral POS C2)c3ccc(cc3)F)N(C=O)O AZ22 c1cnc(nc1)CCCC(CS(=O)(=O)N2CCN( <16.42 1.726 451.5 Train Neutral POS CC2)c3ccc(cc3)F)N(C=O)O AZ23 Cc1cccc(c1NC(=O)NC(=N)NC)C >1000 1.249 220.3 Train Base NEG AZ24 CC(=O)NC[C@H]1CN(C(=O)O1)c2ccc >1000 0.168 337.3 Train Neutral NEG (c(c2)F)N3CCOCC3 AZ25 c1cnc(nc1)CCC(CS(=O)(=O)N2CCN(C <10 0.733 454.9 Train Neutral POS C2)c3ccc(cn3)Cl)N(C=O)O AZ26 CN(C)C(=N)NC(=N)N >1000 -1.633 129.2 Test Base NEG AZ27 CN1CCc2cc3c(c(c2C1C4c5ccc(c(c5C( 23.91 2.27 413.4 Train Base POS =O)O4)OC)OC)OC)OCO3 AZ28 c1cnc(nc1)CCCC(CS(=O)(=O)N2CCC( <10 1.29 484 Train Neutral POS CC2)Oc3ccc(cn3)Cl)N(C=O)O AZ29 CC1(CC(CC(N1[O])(C)C)O)C >1000 0.949 172.2 Train Neutral NEG AZ3 <10 3.776 509.4 Train Neutral POS Cc1cc(ncn1)N2CCN(CC2)C(=O)N3CC N(CC3)S(=O)(=O)c4ccc(cc4)Br AZ30 c1ccc2c(c1)[nH]c(n2)c3[nH]c4ccccc4n 31.33 3.505 234.3 Train Neutral POS 3 AZ31 CCOc1cc(ccc1C(=O)OC)NC(=O)C 478.5 1.938 237.3 Train Neutral NEG AZ32 202.2 2.325 402.5 Train Acid POS CC(CCC(=O)O)C1CCC2C1(C(=O)CC 3C2C(=O)CC4C3(CCC(=O)C4)C)C AZ33 CC(C)(C)c1cc2c(=O)cc(oc2c(c1)C(C)( <10 4.952 302.4 Train Acid POS C)C)C(=O)O AZ34 c1ccc(cc1)CCC(CS(=O)(=O)N2CCN(C <10 3.303 435.5 Train Neutral POS C2)c3ccc(cc3)F)C(=O)NO AZ35 c1c(cc(cc1[N+](=O)[O-])[N+](=O)[O- 184.5 0.627 211.1 Train Neutral POS ])C(=O)N AZ36 CCOCn1cnc2c1c(=O)n(c(=O)n2C)CC 304.9 0.846 338.4 Train Neutral NEG CCC(C)(C)O AZ37 CC1CC2C3CC(C4=CC(=O)C=CC4(C3 33.56 2.522 434.5 Test Neutral POS C(CC2(C1(C(=O)COC(=O)C)O)C)O)C) F AZ38 c1cc2c(cc(n2Cc3ccc(c(c3)Cl)Cl)C(=O) <14.01 4.833 394.3 Test Acid POS O)c(c1)OCCCO AZ39 CCCc1ncc(c(n1)N)C[n+]2ccccc2C >1000 -3.891 243.3 Test Cationic NEG AZ4 c1cc(oc1)c2nc3nc(nc(n3n2)N)NCCN4 >1000 1.055 330.4 Train Base NEG CCOCC4 AZ40 Cc1cc2c([nH]1)ccc(c2F)Oc3c4cc(c(cc 22.73 5.835 464.5 Train Base POS 4ncn3)OCCCN5CCCCC5)OC AZ41 CN1CCc2cc3c(cc2C1C4c5ccc(c(c5C( 65.64 2.278 383.4 Train Base POS =O)O4)OC)OC)OCO3 AZ42 CNCC(c1cccc(c1)O)O >1000 -0.088 167.2 Train Base NEG AZ43 c1ccc(cc1)CCC(CS(=O)(=O)N2CCN(C <10 3.187 453 Train Neutral POS C2)c3ccc(cn3)Cl)N(C=O)O AZ44 CN1CCc2cccc- 390.9 2.49 267.3 Train Base NEG 3c2C1Cc4c3c(c(cc4)O)O AZ45 CC(CO)c1ccc2cc(ccc2c1)OC 97.58 2.825 216.3 Test Neutral POS AZ46 CC(C)(C)NCC(COc1c(nsn1)N2CCOC >1000 1.214 316.4 Train Base NEG C2)O AZ47 >1000 0.7 200.6 Train Neutral NEG c1cc(c(cc1[N+](=O)[O-])Cl)C(=O)N AZ48 CC(C)(C)NCC(COc1cccc2c1CC(C(C2) >1000 0.379 309.4 Train Base NEG O)O)O AZ49 c1cc(cnc1)CCC(CS(=O)(=O)N2CCN(C <10 1.69 453.9 Test Neutral POS C2)c3ccc(cn3)Cl)N(C=O)O AZ5 c1ccnc(c1)CCC(CS(=O)(=O)N2CCN(C <10 1.69 453.9 Test Neutral POS C2)c3ccc(cn3)Cl)N(C=O)O AZ50 CC(=O)Oc1ccc2c(c1)cc(n2Cc3ccc(c(c 55.21 4.742 378.2 Train Acid POS 3)Cl)Cl)C(=O)O AZ51 >1000 0.041 409.4 Train Neutral NEG CC(=O)NC[C@H]1CN(C(=O)O1)c2cc( c(c(c2)F)C3=CCN(CC3)C(=O)CO)F AZ52 Cc1cc2c(o1)cc(n2Cc3ccc(c(c3)Cl)Cl)C 15.89 5.219 324.2 Train Acid POS (=O)O AZ53 27.55 1.761 412.8 Train Neutral POS CC(=O)NC[C@H]1CN(C(=O)O1)c2ccc (cc2)C(=O)c3cnc4n3cc(cc4)Cl AZ54 CC(=O)NC[C@H]1CN(C(=O)O1)c2ccc 349.2 1.373 397.2 Train Neutral NEG (c(c2)F)n3cc(nc3)Br AZ55 c1cc(cnc1)CCC(CS(=O)(=O)N2CCC(C <10 1.718 469 Train Neutral POS C2)Oc3ccc(cn3)Cl)N(C=O)O AZ56 c1cc(c(cc1Cn2c(cc3c2ncs3)C(=O)O)C 22.39 4.172 327.2 Train Acid POS l)Cl AZ57 COc1c2cc(ccc2n(c1C(=O)O)Cc3ccc(c 39.57 4.428 366.2 Train Acid POS (c3)Cl)Cl)O AZ58 CN(Cc1cnc2c(n1)c(nc(n2)N)N)c3ccc(c >1000 -0.529 454.4 Train Acid NEG c3)C(=O)N[C@@H](CCC(=O)O)C(=O) O AZ59 Cc1cc([nH]n1)Nc2c(cnc(n2)NCc3cc(n <10 2.939 382.8 Train Neutral POS o3)c4ccccn4)Cl AZ6 c1cc(c(cc1Cn2c3ccc(cc3cc2C(=O)O) 49.84 4.887 336.2 Train Acid POS O)Cl)Cl AZ60 CCN(CC)C(=O)c1ccc(cc1)C(=C2CCN 86.07 3.672 366.5 Train Base POS CC2)c3cccc(c3)F AZ61 CCC1C(COC1=O)Cc2cncn2C >1000 -0.203 208.3 Train Base NEG AZ62 C[N+](C)(C)CC(CC(=O)O)O >1000 -4.767 162.2 Train Zwitterion NEG AZ63 CC(=CCC1C(O1)(C)C2C(C(CCC23CO 25.59 2.913 458.5 Train Acid POS 3)OC(=O)C=CC=CC=CC=CC(=O)O)O C)C AZ64 c1cc(oc1C=NN2CC(=O)NC2=O)[N+](= >1000 -0.467 238.2 Test Neutral NEG O)[O-] AZ65 CC(C)N=c1cc- 134.9 7.699 473.4 Train Neutral POS 2n(c3ccccc3nc2cc1Nc4ccc(cc4)Cl)c5c cc(cc5)Cl AZ66 CNC(=C[N+](=O)[O- >1000 0.67 314.4 Train Base NEG ])NCCSCc1ccc(o1)CN(C)C AZ67 161.3 1.909 245.3 Test Base POS CC(C(C)(C)C)NC(=NC#N)Nc1ccncc1 AZ68 >1000 0.19 252.3 Train Base NEG Cc1c(nc[nH]1)CSCCN=C(NC)NC#N AZ69 C1C(C(C(C(C1N)OC2C(C(C(C(O2)CN >1000 -6.466 614.6 Train Base NEG )O)O)N)OC3C(C(C(O3)CO)OC4C(C(C (C(O4)CN)O)O)N)O)O)N AZ7 27.35 2.242 466.9 Train Neutral POS C[C@@](C(=O)Nc1ccc(cc1Cl)S(=O)(= O)c2ccc(cc2)NCCO)(C(F)(F)F)O AZ70 C[n+]1ccccc1C=NO >1000 -3.666 137.2 Test Cationic NEG AZ71 CCCCCCCCC=CCCCCCCCC(=O)OC <16.04 12.16 787.1 Train Zwitterion POS C(COP(=O)(O)OCC[N+](C)(C)C)OC(= O)CCCCCCCC=CCCCCCCCC AZ72 CCN(c1ccccc1C)C(=O)C=CC 109.5 3.429 203.3 Train Neutral POS AZ73 <10 6.379 510.6 Train Acid POS CCCc1c(ccc(c1O)C(=O)C)OCCCOc2c cc3c(=O)cc(oc3c2CCC)CCC(=O)O AZ74 CCCc1c2c(cc3c1oc(cc3=O)C(=O)O)C 606.4 4.429 286.3 Train Acid NEG CCC2 AZ76 CN(C)N=Nc1c(nc[nH]1)C(=O)N >1000 0.467 182.2 Train Neutral NEG AZ77 CCCc1c2c(cc3c1oc(cc3=O)C(=O)O)c( >892.2 3.886 361.7 Test Acid NEG cc(n2)C(=O)O)Cl AZ78 c1cc(c(cc1[C@H](CN)O)O)O >1000 -0.989 169.2 Test Base NEG AZ79 CCCc1c2c(cc3c1oc(cc3=O)C(=O)O)c( >1000 2.102 398.4 Train Acid NEG cc(n2)C(=O)O)N(C)C(=O)C AZ8 <10 4.081 481 Test Neutral POS c1cc(ccc1c2ccncc2)C(=O)N3CCN(CC 3)S(=O)(=O)c4cc5cc(ccc5[nH]4)Cl AZ80 CCCc1c2c(cc3c1oc(cc3=O)C(=O)O)c( >1000 0.893 343.3 Train Acid NEG =O)cc([nH]2)C(=O)O AZ81 CCCc1c2c(cc3c1oc(cc3=O)C(=O)O)c( >1000 2.031 344.3 Test Acid NEG =O)cc(o2)C(=O)O AZ82 CCCc1c2c(cc3c1oc(cc3=O)C(=O)O)c( 364.7 4.055 395.3 Test Acid NEG cc(n2)C(=O)O)C(F)(F)F AZ83 CC1=C([C@H](C(=C(N1)C)C(=O)OC( <10 3.916 371.4 Train Neutral POS C)C)c2cccc3c2non3)C(=O)OC AZ84 23.03 4.093 265.3 Train Neutral POS COc1ccc(cc1)Nc2ccn(n2)c3ccccc3 AZ85 Cc1cccc(c1S(=O)c2[nH]c3ccccc3n2)c <10 3.758 363.4 Test Neutral POS 4cc(ccn4)OC AZ86 Cc1cc(cc(c1OC(=O)C)C)Nc2ccn(n2)c <10 4.521 321.4 Train Neutral POS 3ccccc3 AZ87 CCc1cc(c2c(c1O)c(=O)cc(o2)C(=O)O) 111.7 3.903 262.3 Train Acid POS CC AZ88 CCCc1ccc(c2c1oc(cc2=O)C(=O)O)OC 270.6 2.335 306.3 Train Acid POS C(C)O AZ89 >1000 -0.445 226.2 Train Acid NEG C[C@](Cc1ccc(c(c1)O)O)(C(=O)O)NN AZ9 c1cc(c(cc1Cn2c3ccc(c(c3cc2C(=O)O) 19.3 4.916 354.2 Test Acid POS F)O)Cl)Cl AZ90 c1cc(c(c(c1)Cl)C=NNC(=N)N)Cl 343.7 1.35 231.1 Test Base NEG AZ91 c1ccc(cc1)[C@H]2C[C@@H]2N 102.7 1.478 133.2 Train Base POS AZ92 c1cc(c(cc1CN2CCNCC2)C(=O)NCC34 124 4.997 402 Train Base POS CC5CC(C3)CC(C5)C4)Cl AZ93 c1cc(ccc1[C@H]([C@@H](CO)NC(=O >957.9 1.283 323.1 Test Neutral NEG )C(Cl)Cl)O)[N+](=O)[O-] AZ94 c1ccc2c(c1)C(=O)O[Bi](O2)O >1000 0.339 362.1 Train Neutral NEG AZ95 CC(C)C(=O)N[C@@H](CS)C(=O)O >1000 0.214 191.3 Train Acid NEG AZ96 C1CNC(=N)NC1C2C(=O)NCC(C(=O) >1000 -6.704 668.7 Train Base NEG NC(C(=O)NC(C(=O)N/C(=C\NC(=O)N) /C(=O)N2)CNC(=O)CC(CCCN)N)CO) N AZ97 CCCC(=O)c1cnc2c(c1Nc3ccc(cc3C)F) <10 5.861 428.5 Test Neutral POS cccc2OCCS(=O)C AZ98 CC1=C(N2[C@@H]([C@@H](C2=O) >1000 -2.507 363.4 Train Zwitterion NEG NC(=O)[C@H](c3ccc(cc3)O)N)SC1)C( =O)O AZ99 Cc1c(n2ccc3c(c2n1)OC(CC3)c4ccccc <10 3.12 294.4 Train Neutral POS 4)CO AZATHIOPRINE Cn1cnc(c1Sc2c3c(nc[nH]3)ncn2)[N+]( >1000 0.512 277.3 Train Neutral NEG =O)[O-] AZD0865 Cc1cccc(c1CNc2cc(cn3c2nc(c3C)C)C <10 3.195 366.5 Test Neutral POS (=O)NCCO)C AZD-4407 C[C@H]1C[C@](CCO1)(c2cc(sc2)Sc3 <10.93 2.13 375.5 Train Neutral POS ccc4c(c3)CC(=O)N4C)O AZD7009 CC(C)(C)C(=O)CN1CC2CN(CC(C1)O 513.6 3.573 384.5 Train Base NEG 2)CCCNc3ccc(cc3)C#N BECLOMETHASONE 23.67 2.125 408.9 Train Neutral POS C[C@H]1C[C@H]2[C@@H]3CCC4=C C(=O)C=C[C@@]4([C@]3([C@H](C[C @@]2([C@]1(C(=O)CO)O)C)O)Cl)C BENDROFLUMETHIAZIDE c1ccc(cc1)CC2Nc3cc(c(cc3S(=O)(=O) 69.55 1.727 421.4 Test Neutral POS N2)S(=O)(=O)N)C(F)(F)F CC(c1ccc2c(c1)nc(o2)c3ccc(cc3)Cl)C( 103.7 3.824 301.7 Test Acid POS =O)O BENOXINATE CCCCOc1cc(ccc1N)C(=O)OCCN(CC) 309.1 4.36 308.4 Test Base NEG CC BENSERAZIDE 116.6 -2.899 257.2 Train Base POS c1cc(c(c(c1CNNC(=O)C(CO)N)O)O)O CCOC(=O)c1ccc(cc1)N >1000 1.923 165.2 Train Neutral NEG BENZONATATE <10 2.54 603.7 Train Neutral POS CCCCNc1ccc(cc1)C(=O)OCCOCCOC COCCOCCOCCOCCOCCOCCOC BENZOYLPAS c1ccc(cc1)C(=O)Nc2ccc(c(c2)O)C(=O) >1000 2.784 257.2 Train Acid NEG O BENZTHIAZIDE c1ccc(cc1)CSCC2=Nc3cc(c(cc3S(=O) 57.43 2.172 431.9 Train Base POS (=O)N2)S(=O)(=O)N)Cl BENZYLPENICILLIN <10 1.747 334.4 Test Acid POS (PENICILLIN G) CC1([C@@H](N2[C@H](S1)[C@@H]( C2=O)NC(=O)Cc3ccccc3)C(=O)O)C CC(C)COCC(CN(Cc1ccccc1)c2ccccc2 <16.46 6.2 366.5 Train Base POS )N3CCCC3 BETA CAROTENE CC1=C(C(CCC1)(C)C)/C=C/C(=C/C= <10 15.23 536.9 Test Neutral POS C/C(=C/C=C/C=C(/C=C/C=C(/C=C/C2 =C(CCCC2(C)C)C)\C)\C)/C)/C BETAINE C[N+](C)(C)CC(=O)O >1000 -4.173 118.2 Train Zwitterion NEG CC(C)NCC(COc1ccc(cc1)CCOCC2CC >1000 2.319 307.4 Test Base NEG 2)O CC(C[N+](C)(C)C)OC(=O)N >1000 -4.01 161.2 Train Cationic NEG BEZAFIBRATE CC(C)(C(=O)O)Oc1ccc(cc1)CCNC(=O 180.8 3.698 361.8 Train Acid POS )c2ccc(cc2)Cl BISACODYL CC(=O)Oc1ccc(cc1)C(c2ccc(cc2)OC( 56.36 2.848 361.4 Train Neutral POS =O)C)c3ccccn3 BISMUTH SUBSALICYLATE c1ccc(c(c1)C(=O)O)O >1000 2.187 138.1 Train Acid NEG BOSENTAN <25.3 4.167 551.6 Train Acid POS CC(C)(C)c1ccc(cc1)S(=O)(=O)Nc2c(c( nc(n2)c3ncccn3)OCCO)Oc4ccccc4OC CC[N+](C)(C)Cc1ccccc1Br >1000 -1.246 243.2 Train Cationic NEG BROMHEXINE 20.27 4.884 376.1 Train Base POS CN(Cc1cc(cc(c1N)Br)Br)C2CCCCC2 CCCCNc1cc(cc(c1Oc2ccccc2)S(=O)(= 526.4 3.372 364.4 Test Acid NEG O)N)C(=O)O BUPHENINE CC(CCc1ccccc1)NC(C)C(c2ccc(cc2)O 280.3 3.006 299.4 Test Base POS )O >1000 3.211 239.7 Train Base NEG CC(C(=O)c1cccc(c1)Cl)NC(C)(C)C c1cnc(nc1)N2CCN(CC2)CCCCN3C(= 59.63 2.185 385.5 Train Base POS O)CC4(CCCC4)CC3=O BUSULFAN CS(=O)(=O)OCCCCOS(=O)(=O)C >1000 -0.592 246.3 Train Neutral NEG CCCCN(CCCC)CCCOC(=O)c1ccc(cc1 487.5 5.007 306.4 Test Base NEG )N Cn1cnc2c1c(=O)n(c(=O)n2C)C >1000 -0.04 194.2 Train Neutral NEG CSCCNc1c2c(nc(n1)SCCC(F)(F)F)n(c >1000 3.024 776.4 Train Acid NEG n2)[C@H]3[C@@H]([C@@H]([C@H]( O3)COP(=O)(O)OP(=O)(C(P(=O)(O)O )(Cl)Cl)O)O)O CAPOBENIC ACID COc1cc(cc(c1OC)OC)C(=O)NCCCCC >1000 1.045 325.4 Test Acid NEG C(=O)O CC(C)/C=C/CCCCC(=O)NCc1ccc(c(c1 <10 3.75 305.4 Train Neutral POS )OC)O CARBACHOL C[N+](C)(C)CCOC(=O)N >1000 -4.319 147.2 Train Cationic NEG >1000 2.38 236.3 Test Neutral NEG c1ccc2c(c1)C=Cc3ccccc3N2C(=O)N CARBIMAZOLE CCOC(=O)n1ccn(c1=S)C >1000 1.679 186.2 Train Neutral NEG >1000 2.668 290.8 Train Base NEG CN(C)CCOC(c1ccc(cc1)Cl)c2ccccn2 CC(C)(C)NCC(COc1cccc2c1CCC(=O) >1000 1.289 292.4 Test Base NEG N2)O CHLORIMIPRAMINE CN(C)CCCN1c2ccccc2CCc3c1cc(cc3) 37.4 5.921 314.9 Train Base POS Cl CCN(CC)CCOC(=O)c1ccc(cc1Cl)N >1000 2.909 270.8 Train Base NEG CN(C)CCCN1c2ccccc2Sc3c1cc(cc3)C 90.97 5.3 318.9 Test Base POS l >1000 2.35 276.7 Test Acid NEG CCCNC(=O)NS(=O)(=O)c1ccc(cc1)Cl CN(C)CC/C=C/1\c2ccccc2Sc3c1cc(cc 27.47 5.482 315.9 Train Base POS 3)Cl CCCCOc1cc(c2ccccc2n1)C(=O)NCCN 41.65 5.354 343.5 Train Base POS (CC)CC CINCHOPHEN >813.5 4.281 249.3 Train Acid NEG c1ccc(cc1)c2cc(c3ccccc3n2)C(=O)O CIPROFLOXACIN c1c2c(cc(c1F)N3CCNCC3)n(cc(c2=O) >1000 -0.725 331.3 Train Zwitterion NEG C(=O)O)C4CC4 CLAVULANATE POTASSIUM C1[C@@H]2N(C1=O)[C@H](/C(=C/C >1000 -1.065 199.2 Test Acid NEG O)/O2)C(=O)O CLOMIPHENE CCN(CC)CCOc1ccc(cc1)/C(=C(\c2ccc <10 7.151 406 Train Base POS cc2)/Cl)/c3ccccc3 CLOTRIMAZOLE c1ccc(cc1)C(c2ccccc2)(c3ccccc3Cl)n4 <10 5.004 344.8 Test Base POS ccnc4 CLOXACILLIN Cc1c(c(no1)c2ccccc2Cl)C(=O)N[C@H 41.57 2.516 435.9 Train Acid POS ]3[C@@H]4N(C3=O)[C@H](C(S4)(C) C)C(=O)O CLOXYQUIN c1cc2c(ccc(c2nc1)O)Cl 356.7 2.913 179.6 Train Acid NEG CN1CCN(CC1)C2=Nc3cc(ccc3Nc4c2c 75.14 3.714 326.8 Train Base POS ccc4)Cl CO-DIOVAN 31.07 4.859 435.5 Test Acid POS CCCCC(=O)N(Cc1ccc(cc1)c2ccccc2c 3[nH]nnn3)[C@@H](C(C)C)C(=O)O COLCHICINE CC(=O)N[C@H]1CCc2cc(c(c(c2- 298.7 1.195 399.4 Train Neutral POS c3c1cc(=O)c(cc3)OC)OC)OC)OC C[C@]12CCC(=O)C=C1CC[C@@H]3[ 108.1 1.485 360.4 Test Neutral POS C@@H]2C(=O)C[C@]4([C@H]3CC[C @@]4(C(=O)CO)O)C c1ccc2c(c1)ccc(=O)o2 >1000 1.412 146.1 Train Neutral NEG CROMOLYN >1000 1.481 468.4 Train Acid NEG c1cc2c(c(c1)OCC(COc3cccc4c3c(=O) cc(o4)C(=O)O)O)c(=O)cc(o2)C(=O)O CC1CC(CC(C1)(C)C)OC(=O)C(c2cccc 15.03 4.643 276.4 Train Neutral POS c2)O CN(C)CCC=C1c2ccccc2C=Cc3c1cccc 107.4 5.097 275.4 Train Base POS 3 C1[C@H](C(=O)NO1)N >1000 -1.192 102.1 Train Base NEG CYCLOSPORIN A CCC1C(=O)N(CC(=O)N(C(C(=O)NC(C <10 14.36 1203 Train Neutral POS (=O)N(C(C(=O)NC(C(=O)NC(C(=O)N( C(C(=O)N(C(C(=O)N(C(C(=O)N(C(C(= O)N1)C(C(C)C/C=C/C)O)C)C(C)C)C)C C(C)C)C)CC(C)C)C)C)C)CC(C)C)C)C( C)C)CC(C)C)C)C CYSTEAMINE C(CS)N 331 -0.252 77.15 Test Base NEG DAPSONE >1000 0.886 248.3 Test Neutral NEG c1cc(ccc1N)S(=O)(=O)c2ccc(cc2)N DASATINIB Cc1cccc(c1NC(=O)c2cnc(s2)Nc3cc(nc <10 2.529 488 Train Base POS (n3)C)N4CCN(CC4)CCO)Cl DAZOXIBEN c1cc(ccc1C(=O)O)OCCn2ccnc2 >1000 1.716 232.2 Train Acid NEG DEBRISOQUIN c1ccc2c(c1)CCN(C2)C(=N)N >951.8 0.895 175.2 Test Base NEG COc1ccc(cc1F)c2cc(nn2c3ccc(cc3)S( 20.18 3.122 397.4 Train Neutral POS =O)(=O)N)C(F)F CNCCCN1c2ccccc2CCc3c1cccc3 150.4 4.468 266.4 Train Base POS DEVAZEPIDE CN1c2ccccc2C(=NC(C1=O)NC(=O)c3 <10 3.694 408.5 Train Neutral POS cc4ccccc4[nH]3)c5ccccc5 DEXTROTHYROXINE c1c(cc(c(c1I)Oc2cc(c(c(c2)I)O)I)I)C[C <10 3.508 776.9 Train Zwitterion POS SODIUM @@H](C(=O)O)N CC1=Nc2ccc(cc2S(=O)(=O)N1)Cl >1000 1.201 230.7 Train Base NEG DIBENZOTHIOPHENE c1ccc2c(c1)c3ccccc3s2 387.9 4.556 184.3 Train Neutral NEG DICHLORPHENAMIDE c1c(cc(c(c1S(=O)(=O)N)Cl)Cl)S(=O)(= >919.1 0.241 305.2 Train Neutral NEG O)N 64.97 4.726 296.2 Test Acid POS c1ccc(c(c1)CC(=O)O)Nc2c(cccc2Cl)Cl DICUMAROL c1ccc2c(c1)c(c(c(=O)o2)Cc3c(c4ccccc 39.93 3.658 336.3 Test Neutral POS 4oc3=O)O)O DICYCLOMINE CCN(CC)CCOC(=O)C1(CCCCC1)C2C 40.04 6.138 309.5 Train Base POS CCCC2 DIFTALONE c1ccc2c(c1)CN3C(=O)c4ccccc4CN3C >1000 2.408 264.3 Train Neutral NEG 2=O DIGITOXIN CC1C(C(CC(O1)OC2C(OC(CC2O)OC <10 2.854 764.9 Train Neutral POS 3C(OC(CC3O)OC4CCC5(C(C4)CCC6 C5CCC7(C6(CCC7C8=CC(=O)OC8)O )C)C)C)C)O)O DIGOXIN 264.1 1.423 780.9 Train Neutral POS C[C@@H]1[C@H]([C@H](C[C@@H]( O1)O[C@@H]2[C@H](O[C@H](C[C@ @H]2O)O[C@@H]3[C@H](O[C@H](C [C@@H]3O)O[C@H]4CC[C@]5([C@ @H](C4)CC[C@@H]6[C@@H]5C[C @H]([C@]7([C@@]6(CC[C@@H]7C8 =CC(=O)OC8)O)C)O)C)C)C)O)O c1ccc2c(c1)c(nnc2NN)NN >1000 1.754 190.2 Train Neutral NEG DIIODOHYDROXYQUINOLIN >1000 4.139 397 Test Acid NEG E c1cc2c(cc(c(c2nc1)O)I)I CN(C)CCOC(c1ccccc1)c2ccccc2 >1000 3.452 255.4 Test Base NEG DIMETHADIONE CC1(C(=O)NC(=O)O1)C >1000 0.507 129.1 Train Neutral NEG DIMETHISOQUIN 68.68 4.824 272.4 Train Base POS CCCCc1cc2ccccc2c(n1)OCCN(C)C 508.8 3.206 281.4 Train Base NEG CN1CCC(CC1)OC(c2ccccc2)c3ccccc3 Cn1c2c(c(=O)n(c1=O)C)n(cn2)CC(CO >1000 -1.286 254.2 Test Neutral NEG (DYPHYLLINE) )O DISULFIRAM CCN(CC)C(=S)SSC(=S)N(CC)CC <10 3.88 296.5 Train Neutral POS DITIOCARB CCN(CC)C(=S)S 69.26 0.59 149.3 Train Neutral POS CC(CCc1ccc(cc1)O)NCCc2ccc(c(c2)O >1000 2.433 301.4 Train Base NEG )O DOCOSANOL CCCCCCCCCCCCCCCCCCCCCCO >1000 10.35 326.6 Test Neutral NEG CN(CCc1ccc(cc1)NS(=O)(=O)C)CCOc 708.4 1.994 441.6 Train Base NEG 2ccc(cc2)NS(=O)(=O)C DONEPEZIL COc1cc2c(cc1OC)C(=O)C(C2)CC3CC 145.6 4.598 379.5 Test Base POS N(CC3)Cc4ccccc4 c1cc(c(cc1CCN)O)O >1000 0.169 153.2 Train Base NEG c1ccc(cc1)CCNCCCCCCNCCc2ccc(c( >1000 3.316 356.5 Train Base NEG c2)O)O >1000 2.354 270.4 Train Base NEG CC(c1ccccc1)(c2ccccn2)OCCN(C)C c1ccc2c(c1)[nH]c(=O)n2C3=CCN(CC3 <10 3.056 379.4 Train Base POS )CCCC(=O)c4ccc(cc4)F DYCLONINE CCCCOc1ccc(cc1)C(=O)CCN2CCCC 275.4 4.832 289.4 Train Base POS C2 EBSELEN <10 3.704 274.2 Test Neutral POS c1ccc(cc1)n2c(=O)c3ccccc3[se]2 ECONAZOLE c1cc(ccc1COC(Cn2ccnc2)c3ccc(cc3Cl <10 5.099 381.7 Train Base POS )Cl)Cl ENALAPRILAT >1000 0.748 348.4 Train Zwitterion NEG C[C@@H](C(=O)N1CCC[C@H]1C(=O )O)N[C@@H](CCc2ccccc2)C(=O)O >1000 0.396 194.2 Train Neutral NEG CCCn1c2c(c(=O)[nH]c1=O)[nH]cn2 ENROFLOXACIN CCN1CCN(CC1)c2cc3c(cc2F)c(=O)c( >1000 0.379 359.4 Train Zwitterion NEG cn3C4CC4)C(=O)O c1ccc(c(c1)C#N)OCC(CNCCNC(=O)C 244.4 1.04 369.4 Train Base POS c2ccc(cc2)O)O ESTRIOL C[C@]12CC[C@@H]3c4ccc(cc4CC[C >1000 3.2 288.4 Train Neutral NEG @H]3[C@@H]1C[C@H]([C@@H]2O) O)O ESTRONE C[C@]12CC[C@@H]3c4ccc(cc4CC[C >1000 3.382 270.4 Test Neutral NEG @H]3[C@@H]1CCC2=O)O ETHACRIDINE 43.48 3.372 253.3 Train Neutral POS CCOc1ccc2c(c1)c(c3ccc(cc3n2)N)N ETHIONAMIDE CCc1cc(ccn1)C(=S)N >1000 1.729 166.2 Train Neutral NEG ETHOPROPAZINE CCN(CC)C(C)CN1c2ccccc2Sc3c1ccc 139.2 5.458 312.5 Test Base POS c3 CCC1(CC(=O)NC1=O)C >1000 0.395 141.2 Train Neutral NEG CCN1C(=O)C(NC1=O)c2ccccc2 >791.9 1.482 204.2 Train Neutral NEG ETHYL CCOC(=O)N >1000 -0.175 89.09 Train Neutral NEG ETIDRONATE CC(O)(P(=O)(O)O)P(=O)(O)O >1000 -2.543 206 Train Acid NEG CCOC(=O)c1cncn1C(C)c2ccccc2 61.61 2.417 244.3 Train Neutral POS >1000 0.497 238.2 Train Neutral NEG c1ccc(cc1)C(COC(=O)N)COC(=O)N 235.7 4.708 297.1 Train Acid POS c1ccc(c(c1)CC(=O)O)Oc2ccc(cc2Cl)Cl c1cc(ccc1c2nc(cs2)CC(=O)O)Cl >1000 2.576 253.7 Train Acid NEG CN1CC(=O)N=C1NC(=O)Nc2cccc(c2) >1000 2.195 266.7 Test Base NEG Cl FENOLDOPAM c1cc(ccc1C2CNCCc3c2cc(c(c3Cl)O)O >1000 2.4 305.8 Train Zwitterion NEG )O CC(Cc1ccc(cc1)O)NCC(c2cc(cc(c2)O) >1000 0.984 303.4 Train Base NEG O)O c1cc(c(cc1OCC(F)(F)F)C(=O)NCC2C 217.6 3.66 414.3 Train Base POS CCCN2)OCC(F)(F)F FLUCLOXACILLIN Cc1c(c(no1)c2c(cccc2Cl)F)C(=O)N[C <10 2.661 453.9 Train Acid POS @H]3[C@@H]4N(C3=O)[C@H](C(S4) (C)C)C(=O)O C[C@]12CCC(=O)C=C1[C@H](C[C@ 71.19 2.873 436.5 Train Neutral POS @H]3[C@@H]2[C@H](C[C@]4([C@H ]3C[C@@H]5[C@]4(OC(O5)(C)C)C(= O)CO)C)O)F FLUMAZENIL CCOC(=O)c1c2n(cn1)- >733.5 1.291 303.3 Train Neutral NEG c3ccc(cc3C(=O)N(C2)C)F C[C@@H]1C[C@H]2[C@@H]3C[C@ 68.46 1.828 410.5 Train Neutral POS @H](C4=CC(=O)C=C[C@@]4([C@]3( [C@H](C[C@@]2([C@]1(C(=O)CO)O) C)O)F)C)F c1ccc(cc1)/C=C/CN2CCN(CC2)C(c3c <10 6.337 404.5 Train Base POS cc(cc3)F)c4ccc(cc4)F C[C@H]1C[C@H]2[C@@H]3CC[C@ 100.9 2.111 376.5 Train Neutral POS @]([C@]3(C[C@@H]([C@@]2([C@@ ]4(C1=CC(=O)C=C4)C)F)O)C)(C(=O) C)O CNCCC(c1ccccc1)Oc2ccc(cc2)C(F)(F) 143.8 4.566 309.3 Train Base POS F c1ccc2c(c1)N(c3cc(ccc3S2)C(F)(F)F) 25.82 4.118 437.5 Train Base POS CCCN4CCN(CC4)CCO 724.7 3.754 244.3 Train Acid NEG CC(c1ccc(c(c1)F)c2ccccc2)C(=O)O FLUTAMIDE CC(C)C(=O)Nc1ccc(c(c1)C(F)(F)F)[N+ 60.78 3.335 276.2 Train Neutral POS ](=O)[O-] FOLIC ACID c1cc(ccc1C(=O)N[C@@H](CCC(=O)O >1000 -2.376 441.4 Train Acid NEG )C(=O)O)NCc2cnc3c(n2)c(=O)nc([nH] 3)N Folinic acid c1cc(ccc1C(=O)N[C@@H](CCC(=O)O >1000 -1.452 473.4 Train Acid NEG )C(=O)O)NCC2CNc3c(c(=O)[nH]c(n3) N)N2C=O FPL-55712 <10 5.191 498.5 Train Acid POS CCCc1c(ccc(c1O)C(=O)C)OCC(COc2 ccc3c(=O)cc(oc3c2CCC)C(=O)O)O c1cc(oc1)CNc2cc(c(cc2C(=O)O)S(=O) >964.8 1.9 330.7 Train Acid NEG (=O)N)Cl C1CCC(CC1)(CC(=O)O)CN >1000 -0.66 171.2 Train Zwitterion NEG c1ccc(cc1)NC(=O)/C=C/c2c3c(cc(cc3 <10 5.188 375.2 Test Acid POS Cl)Cl)[nH]c2C(=O)O GEMIFLOXACIN CO/N=C/1\CN(CC1CN)c2c(cc3c(=O)c( >1000 -1.245 389.4 Train Zwitterion NEG cn(c3n2)C4CC4)C(=O)O)F c1ccc(c(c1)C(=O)OCC(CO)O)Nc2ccnc <14.93 4.189 372.8 Train Neutral POS 3c2ccc(c3)Cl <10 4.239 494 Train Acid POS COc1ccc(cc1C(=O)NCCc2ccc(cc2)S(= O)(=O)NC(=O)NC3CCCCC3)Cl Cc1ccc(cc1)S(=O)(=O)NC(=O)NN2CC 590.8 1.093 323.4 Train Acid NEG 3CCCC3C2 CCC1=C(CN(C1=O)C(=O)NCCc2ccc( <10 3.962 490.6 Train Acid POS cc2)S(=O)(=O)NC(=O)N[C@H]3CC[C @@H](CC3)C)C GLYCERYL GUAIACOLATE >1000 0.102 198.2 Test Neutral NEG COc1ccccc1OCC(CO)O GUAIACOL COc1ccccc1O >1000 1.324 124.1 Train Neutral NEG C1CCCN(CCC1)CCNC(=N)N >1000 1.339 198.3 Train Base NEG GW 572016 CS(=O)(=O)CCNCc1ccc(o1)c2ccc3c(c <10 5.965 581.1 Train Base POS 2)c(ncn3)Nc4ccc(c(c4)Cl)OCc5cccc(c 5)F C[C@]12CCC(=O)C=C1CC[C@@H]3[ <10 3.311 455 Train Neutral POS C@@]2([C@H](C[C@]4([C@H]3C[C @@H]5[C@]4(OC(O5)(C)C)C(=O)CCl )C)O)F c1cc(ccc1C(=O)CCCN2CCC(CC2)(c3 22.76 3.849 375.9 Train Base POS ccc(cc3)Cl)O)F HEXAMETHYLENE >1000 -0.734 200.3 Test Neutral NEG BISACETAMIDE CC(=O)NCCCCCCNC(=O)C c1cc(c(cc1C(CNCCCCCCNCC(c2ccc( >1000 0.053 420.5 Train Base NEG c(c2)O)O)O)O)O)O 505.7 3.944 261.4 Train Base NEG HYDROCHLORIDE CC(CNC1CCCCC1)OC(=O)c2ccccc2 HISTAMINE c1c(nc[nH]1)CCN >1000 -0.968 111.1 Train Base NEG HYCANTHONE CCN(CC)CCNc1ccc(c2c1c(=O)c3cccc 29.8 3.657 356.5 Train Base POS c3s2)CO c1ccc2c(c1)cnnc2NN >1000 1.173 160.2 Train Neutral NEG HYDROFLUMETHIAZIDE c1c(c(cc2c1NCNS2(=O)=O)S(=O)(=O) >1000 -0.21 331.3 Train Neutral NEG N)C(F)(F)F c1ccc(cc1)C(c2ccc(cc2)Cl)N3CCN(CC 72.91 3.995 374.9 Train Base POS 3)CCOCCO CC(C)Cc1ccc(cc1)C(C)C(=O)O >789 3.679 206.3 Train Acid NEG IMATINIB <10 4.529 493.6 Train Base POS Cc1ccc(cc1Nc2nccc(n2)c3cccnc3)NC( =O)c4ccc(cc4)CN5CCN(CC5)C CCn1ccnc1CC2COc3ccccc3O2 61.65 2.297 244.3 Train Base POS INDALPINE >1000 3.391 228.3 Train Base NEG c1ccc2c(c1)c(c[nH]2)CCC3CCNCC3 INDAPAMIDE CC1Cc2ccccc2N1NC(=O)c3ccc(c(c3) 144.7 2.957 365.8 Test Neutral POS S(=O)(=O)N)Cl INDOMETHACIN Cc1c(c2cc(ccc2n1C(=O)c3ccc(cc3)Cl) 49.32 4.18 357.8 Train Acid POS OC)CC(=O)O c1ccc(cc1)C(=O)NC2CCN(CC2)CCc3 109 2.836 347.5 Test Base POS c[nH]c4c3cccc4 IODIPAMIDE MEGLUMINE >969.9 4.191 1140 Train Acid NEG c1c(c(c(c(c1I)NC(=O)CCCCC(=O)Nc2 c(cc(c(c2I)C(=O)O)I)I)I)C(=O)O)I IPRONIAZID CC(C)NNC(=O)c1ccncc1 >952.6 0.256 179.2 Train Neutral NEG c1ccc2c(c1)C(=O)N(S2(=O)=O)CCCC 31.17 1.354 401.5 Train Base POS N3CCN(CC3)c4ncccn4 ISOETHARINE >1000 0.991 239.3 Test Base NEG CCC(C(c1ccc(c(c1)O)O)O)NC(C)C c1cnccc1C(=O)NN >1000 -0.668 137.1 Train Neutral NEG ISOPROTERENOL CC(C)NCC(c1ccc(c(c1)O)O)O >986.2 0.153 211.3 Train Base NEG CC(COc1ccccc1)NC(C)C(c2ccc(cc2)O 572.5 2.615 301.4 Test Base NEG )O

IVERMECTIN CC[C@H](C)[C@@H]1[C@H](CC[C@ <10 5.388 875.1 Train Neutral POS @]2(O1)C[C@@H]3C[C@H](O2)C/C= C(/[C@H]([C@H](/C=C/C=C/4\CO[C@ H]5[C@@]4([C@@H](C=C([C@H]5O) C)C(=O)O3)O)C)O[C@H]6C[C@@H]( [C@H]([C@@H](O6)C)O[C@H]7C[C @@H]([C@H]([C@@H](O7)C)O)OC) OC)\C)C CC(=O)N1CCN(CC1)c2ccc(cc2)OC[C 4.1 3.635 531.4 Train Base POS @H]3CO[C@](O3)(Cn4ccnc4)c5ccc(c c5Cl)Cl CN1CCC(=C2c3ccccc3CC(=O)c4c2cc >428.6 3.593 309.4 Train Base NEG s4)CC1 KS-01019 <10 4.481 418.6 Test Neutral POS CCC(C)(C)C(=O)O[C@H]1C[C@H](C =C2[C@H]1[C@H]([C@H](C=C2)C)C C[C@@H]3C[C@H](CC(=O)O3)O)C LABETALOL CC(CCc1ccccc1)NCC(c2ccc(c(c2)C(= 649.1 2.5 328.4 Train Base NEG O)N)O)O C[C@H](CN(c1ccccn1)C(=O)c2ccc(cc <10 3.193 483.6 Test Base POS 2)C#N)N3CCN(CC3)c4cccc5c4OCCO 5 LEFLUNOMIDE 110.4 2.324 270.2 Test Neutral POS Cc1c(cno1)C(=O)Nc2ccc(cc2)C(F)(F)F LEMAKALIM CC1([C@H]([C@@H](c2cc(ccc2O1)C 353.3 1.106 286.3 Train Neutral NEG #N)N3CCCC3=O)O)C LETROZOLE c1cc(ccc1C#N)C(c2ccc(cc2)C#N)n3cn 225.2 1.429 285.3 Test Neutral POS cn3 LEVAMISOLE c1ccc(cc1)[C@H]2CN3CCSC3=N2 >1000 1.844 204.3 Train Base NEG CC[C@@H](C(=O)N)N1CCCC1=O >1000 -0.344 170.2 Test Neutral NEG LEVOFLOXACIN C[C@H]1COc2c3n1cc(c(=O)c3cc(c2N >1000 -0.508 361.4 Train Zwitterion NEG 4CCN(CC4)C)F)C(=O)O LIGUSTRAZINE Cc1c(nc(c(n1)C)C)C >1000 1.584 136.2 Test Neutral NEG LIOTHYRONINE SODIUM c1cc(c(cc1Oc2c(cc(cc2I)C[C@@H](C( <11.27 2.631 651 Test Zwitterion POS =O)O)N)I)I)O C1CSSC1CCCCC(=O)O >1000 2.386 206.3 Test Acid NEG LONAPALENE CC(=O)Oc1c2ccc(cc2c(c(c1OC)OC)O 14.39 1.758 338.7 Test Neutral POS C(=O)C)Cl LOSARTAN CCCCc1nc(c(n1Cc2ccc(cc2)c3ccccc3 <10 3.852 422.9 Train Acid POS c4[nH]nnn4)CO)Cl LY 171883 CCCc1c(ccc(c1O)C(=O)C)OCCCCc2[ <10.74 3.664 318.4 Train Acid POS nH]nnn2 27.24 6.406 296.2 Train Acid POS Cc1ccc(c(c1Cl)Nc2ccccc2C(=O)O)Cl CC1=C[C@@H]2[C@H](CC[C@]3([C <10 3.772 384.5 Test Neutral POS @H]2CC[C@@]3(C(=O)C)OC(=O)C)C )[C@@]4(C1=CC(=O)CC4)C MELPHALAN 93.75 -0.207 305.2 Train Zwitterion POS c1cc(ccc1CC(C(=O)O)N)N(CCCl)CCCl CC12CC3CC(C1)(CC(C3)(C2)N)C >1000 3.033 179.3 Train Base NEG MEPENZOLATE C[N+]1(CCCC(C1)OC(=O)C(c2ccccc2 >1000 0.157 340.4 Test Cationic NEG )(c3ccccc3)O)C Cc1ccccc1OCC(CO)O >1000 0.862 182.2 Train Neutral NEG >1000 2.001 218.3 Test Neutral NEG CCC1(C(=O)N(C(=O)N1)C)c2ccccc2 MEPINDOLOL Cc1cc2c([nH]1)cccc2OCC(CNC(C)C) 35.87 2.17 262.4 Train Base POS O CCCNC(C)(C)COC(=O)c1ccccc1 >1000 3.37 235.3 Train Base NEG MEPYRAMINE >1000 3.225 285.4 Test Base NEG CN(C)CCN(Cc1ccc(cc1)OC)c2ccccn2 MESALAZINE c1cc(c(cc1N)C(=O)O)O 381.4 1.056 153.1 Test Acid NEG Cc1c(c(c(cn1)CO)CO)O >1000 -0.345 169.2 Train Neutral NEG METAPROTERENOL CC(C)NCC(c1cc(cc(c1)O)O)O >1000 0.083 211.3 Train Base NEG >1000 -0.083 167.2 Train Base NEG C[C@@H]([C@@H](c1cccc(c1)O)O)N Cn1cc2c3c1cccc3[C@H]4C[C@H](CN <10 4.694 403.5 Train Base POS ([C@@H]4C2)C)CNC(=O)OCc5ccccc 5 CN(C)CCN(Cc1cccs1)c2ccccn2 >1000 2.952 261.4 Train Base NEG METHIMAZOLE Cn1cc[nH]c1=S >1000 -0.34 114.2 Train Neutral NEG COc1ccccc1OCC(COC(=O)N)O >1000 0.146 241.2 Train Neutral NEG CC(C(c1cc(ccc1OC)OC)O)N >1000 0.592 211.3 Train Base NEG METHYLTHIOURACIL Cc1cc(=O)[nH]c(=S)[nH]1 >1000 -0.087 142.2 Train Neutral NEG METRIZAMIDE >1000 -1.699 789.1 Test Neutral NEG CC(=O)Nc1c(c(c(c(c1I)N(C)C(=O)C)I) C(=O)NC2C(C(C(OC2O)CO)O)O)I CC(C)(c1cccnc1)C(=O)c2cccnc2 229.3 1.459 226.3 Test Neutral POS MIANSERIN 46.82 3.762 264.4 Test Base POS CN1CCN2c3ccccc3Cc4ccccc4C2C1 MINIZIDE COc1cc2c(cc1OC)nc(nc2N)N3CCN(C 171.8 2.028 383.4 Train Neutral POS C3)C(=O)c4ccco4 MINOCROMIL CCCc1c2c(cc3c1oc(cc3=O)C(=O)O)c( >1000 3.662 356.3 Train Acid NEG cc(n2)C(=O)O)NC MITOTANE <10 6.06 320 Test Neutral POS c1ccc(c(c1)C(c2ccc(cc2)Cl)C(Cl)Cl)Cl MOXIFLOXACIN COc1c2c(cc(c1N3C[C@@H]4CCCN[ >1000 -0.082 401.4 Train Zwitterion NEG C@@H]4C3)F)c(=O)c(cn2C5CC5)C(= O)O N-ACETYL CCN(CC)CCNC(=O)c1ccc(cc1)NC(=O >1000 1.643 277.4 Test Base NEG )C NAFRONYL CCN(CC)CCOC(=O)C(Cc1cccc2c1ccc <11.04 5.439 383.5 Train Base POS c2)CC3CCCO3 COc1ccccc1N2CCN(CC2)CC(COc3cc <10 4.433 392.5 Test Base POS cc4c3cccc4)O c1ccc2c(c1)cccc2CC3=NCCN3 >1000 3.826 210.3 Train Base NEG C[C@@H](c1ccc2cc(ccc2c1)OC)C(=O >1000 2.816 230.3 Train Acid NEG )O NEDOCROMIL CCCc1c2c(cc3c1oc(cc3=O)C(=O)O)c( >1000 1.495 371.3 Train Acid NEG =O)cc(n2CC)C(=O)O NEFAZODONE CCc1nn(c(=O)n1CCOc2ccccc2)CCCN <10 5.725 470 Train Base POS 3CCN(CC3)c4cccc(c4)Cl NEOSTIGMINE >1000 -2.812 223.3 Test Cationic NEG CN(C)C(=O)Oc1cccc(c1)[N+](C)(C)C NEVIRAPINE Cc1ccnc2c1NC(=O)c3cccnc3N2C4CC 294.6 2.65 266.3 Train Neutral POS 4 CC1=C(C(C(=C(N1)C)C(=O)OCCN(C) <10 5.23 479.5 Train Base POS Cc2ccccc2)c3cccc(c3)[N+](=O)[O- ])C(=O)OC CC1=C(C(C(=C(N1)C)C(=O)OC)c2ccc 18.65 3.125 346.3 Test Neutral POS cc2[N+](=O)[O-])C(=O)OC <10 3.038 385.4 Train Neutral POS CC1=C(C(C(=C(N1)C#N)C(=O)OC)c2 cccc(c2)[N+](=O)[O-])C(=O)OC(C)C NOMIFENSIN CN1Cc2c(cccc2N)C(C1)c3ccccc3 >1000 2.372 238.3 Train Base NEG OFLOXACIN CC1COc2c3n1cc(c(=O)c3cc(c2N4CC >1000 -0.508 361.4 Train Zwitterion NEG N(CC4)C)F)C(=O)O Cc1cc2c(s1)Nc3ccccc3N=C2N4CCN( 54.57 3.009 312.4 Train Base POS CC4)C >1000 3.901 269.4 Train Base NEG Cc1ccccc1C(c2ccccc2)OCCN(C)C OXEDRINE CNCC(c1ccc(cc1)O)O >1000 -0.088 167.2 Train Base NEG OXETHAZAINE <10 4.721 467.7 Test Base POS CC(C)(Cc1ccccc1)N(C)C(=O)CN(CCO )CC(=O)N(C)C(C)(C)Cc2ccccc2 OXIBENDAZOLE CCCOc1ccc2c(c1)nc([nH]2)NC(=O)O >1000 3.073 249.3 Train Neutral NEG C CC(C)NCC(COc1ccccc1OCC=C)O >1000 2.092 265.4 Test Base NEG PANTOLACTONE CC1(COC(=O)C1O)C >1000 -0.361 130.1 Train Neutral NEG CC(C)(CO)[C@H](C(=O)NCCC(=O)O) >1000 -1.494 219.2 Test Acid NEG O COc1ccc(cc1OC)Cc2c3cc(c(cc3ccn2) <10 3.782 339.4 Train Neutral POS OC)OC PARBENDAZOLE CCCCc1ccc2c(c1)nc([nH]2)NC(=O)O >1000 3.795 247.3 Test Neutral NEG C PARGYLINE CN(CC#C)Cc1ccccc1 >1000 2.576 159.2 Test Base NEG c1cc(ccc1[C@@H]2CCNC[C@H]2CO 68.91 4.238 329.4 Test Base POS HYDROCHLORIDE c3ccc4c(c3)OCO4)F c1cc(ccc1C(=N)N)OCCCCCOc2ccc(cc >1000 2.308 340.4 Train Base NEG 2)C(=N)N PENTYLENETETRAZOLE C1CCc2nnnn2CC1 >1000 1.195 138.2 Train Neutral NEG C1CCC(CC1)C(CC2CCCCN2)C3CCC 14.36 7.153 277.5 Train Base POS CC3 c1ccc2c(c1)N(c3cc(ccc3S2)Cl)CCCN4 34.84 3.807 404 Train Base POS CCN(CC4)CCO PF-1913539 Cc1ccc2c(c1)c3c(n2CCc4ccc(nc4)C)C 135.9 4.095 319.5 Train Base POS CN(C3)C c1ccc(cc1)CC(=O)NC(=O)N >1000 0.874 178.2 Train Neutral NEG c1ccc(cc1)CCNN >897.2 1.033 136.2 Test Neutral NEG PHENETHICILLIN CC(C(=O)N[C@H]1[C@@H]2N(C1=O <10 2.253 364.4 Test Acid POS )[C@H](C(S2)(C)C)C(=O)O)Oc3ccccc 3 PHENFORMIN c1ccc(cc1)CCNC(=N)NC(=N)N >1000 -0.669 205.3 Train Base NEG MALEATE CN(C)CCC(c1ccccc1)c2ccccn2 >1000 2.435 240.3 Train Base NEG CN1C(=O)CC(C1=O)c2ccccc2 >1000 0.942 189.2 Test Neutral NEG Cc1ccc(cc1)N(CC2=NCCN2)c3cccc(c >1000 3.809 281.4 Test Base NEG 3)O PHENYLMERCURY <10 0.711 336.7 Train Neutral POS ACETATE CC(=O)O[Hg]c1ccccc1 PHTHALYLSULFACETAMIDE CC(=O)NS(=O)(=O)c1ccc(cc1)NC(=O) >1000 -0.445 362.4 Test Acid NEG c2ccccc2C(=O)O PHYTOMENADIONE >1000 12.01 450.7 Train Neutral NEG CC1=C(C(=O)c2ccccc2C1=O)C/C=C(\ C)/CCCC(C)CCCC(C)CCCC(C)C <10 6.401 461.6 Train Base POS c1ccc2c(c1)[nH]c(=O)n2C3CCN(CC3) CCCC(c4ccc(cc4)F)c5ccc(cc5)F PIOGLITAZONE CCc1ccc(nc1)CCOc2ccc(cc2)CC3C(= 0.2621 3.533 356.4 Train Neutral POS O)NC(=O)S3 PIPERACILLIN 197.4 1.697 517.6 Train Acid POS CCN1CCN(C(=O)C1=O)C(=O)NC(c2c cccc2)C(=O)N[C@H]3[C@@H]4N(C3 =O)[C@H](C(S4)(C)C)C(=O)O PIPERONYL BUTOXIDE CCCCOCCOCCOCc1cc2c(cc1CCC)O <10 4.479 338.4 Train Neutral POS CO2 PIRACETAM C1CC(=O)N(C1)CC(=O)N >1000 -1.182 142.2 Test Neutral NEG CN1CCN(CC1)CC(=O)N2c3ccccc3C( >1000 -0.349 351.4 Test Base NEG =O)Nc4c2nccc4 c1ccc(cc1)Oc2c(cc(cc2S(=O)(=O)N)C( >790.6 2.497 362.4 Train Acid NEG =O)O)N3CCCC3 >1000 0.755 266.3 Train Base NEG CC(C)NCC(COc1ccc(cc1)NC(=O)C)O PRAZIQUANTEL c1ccc2c(c1)CCN3C2CN(CC3=O)C(=O 60.27 3.36 312.4 Train Neutral POS )C4CCCCC4 PRIMAQUINE CC(CCCN)Nc1cc(cc2c1nccc2)OC 53.12 2.598 259.4 Test Base POS PROBENECID CCCN(CCC)S(=O)(=O)c1ccc(cc1)C(= 586.8 3.371 285.4 Test Acid NEG O)O PROBUCOL CC(C)(C)c1cc(cc(c1O)C(C)(C)C)SC(C >1000 10.97 516.9 Train Neutral NEG )(C)Sc2cc(c(c(c2)C(C)(C)C)O)C(C)(C) C CCN(CC)CCOC(=O)c1ccc(cc1)N >1000 2.538 236.3 Test Base NEG CN1CCN(CC1)CCCN2c3ccccc3Sc4c2 <10 4.382 374 Train Base POS cc(cc4)Cl PROGESTERONE CC(=O)[C@H]1CC[C@@H]2[C@@]1( <10 3.965 314.5 Train Neutral POS CC[C@H]3[C@H]2CCC4=CC(=O)CC[ C@]34C)C CCCNCC(COc1ccccc1C(=O)CCc2ccc 53.7 3.636 341.4 Test Base POS cc2)O CC(C)[N+](C)(CCOC(=O)C1c2ccccc2 291.3 1.493 368.5 Train Cationic POS Oc3c1cccc3)C(C)C PROPRANOLOL CC(C)NCC(COc1cccc2c1cccc2)O >1000 2.753 259.3 Train Base NEG CC(Cc1ccc2c(c1)OCO2)NCC(c3ccc(c( >982.3 1.686 331.4 Train Base NEG HYDROCHLORIDE c3)O)O)O 128.6 4.865 263.4 Train Base POS CNCCCC1c2ccccc2C=Cc3c1cccc3 PROXICROMIL CCCc1c2c(c(c3c1oc(cc3=O)C(=O)O) 30.48 5.026 302.3 Train Acid POS O)CCCC2 PS-291822 CC[C@H](c1ccc(o1)C)Nc2c(c(=O)c2= <10 1.463 397.4 Train Acid POS O)Nc3cccc(c3O)C(=O)N(C)C PTK-787 c1ccc2c(c1)c(nnc2Nc3ccc(cc3)Cl)Cc4 <10 4.848 346.8 Test Neutral POS ccncc4 PYRIDOSTIGMINE BROMIDE C[n+]1cccc(c1)OC(=O)N(C)C >1000 -4.264 181.2 Test Cationic NEG PYRIMETHAMINE CCc1c(c(nc(n1)N)N)c2ccc(cc2)Cl 200.1 3.003 248.7 Test Neutral POS QUILOSTIGMINE <10 4.335 377.5 Test Base POS C[C@@]12CCN([C@@H]1N(c3c2cc(c c3)OC(=O)N4CCc5ccccc5C4)C)C QUINACRINE CCN(CC)CCCC(C)Nc1c2ccc(cc2nc3c 101.1 6.723 400 Train Base POS 1cc(cc3)OC)Cl C[C@@H](C(=O)N)NCc1ccc(cc1)OCc 174.7 2.49 302.3 Train Base POS 2ccccc2F 62.4 3.972 416.5 Train Acid POS c1ccc2c(c1)c3c(n2CCC(=O)O)CC[C@ H](C3)NS(=O)(=O)c4ccc(cc4)F CCC(=O)NCC[C@@H]1CCc2c1c3c(c 85.37 2.485 259.3 Test Neutral POS c2)OCC3 288.1 2.552 268.4 Test Base POS CC(Cc1ccccc1)(c2ccccc2)NC(=O)CN REMOXIPRIDE CCN1CCC[C@H]1CNC(=O)c2c(ccc(c <10 3.254 371.3 Train Base POS 2OC)Br)OC RESERPINE COc1ccc2c(c1)[nH]c3c2CCN4[C@@H <10 3.859 608.7 Train Base POS ]3C[C@H]5[C@@H](C4)C[C@H]([C@ @H]([C@H]5C(=O)OC)OC)OC(=O)c6 cc(c(c(c6)OC)OC)OC c1cc2c(cc1OC(F)(F)F)sc(n2)N 139 3.235 234.2 Train Neutral POS RIMONABANT Cc1c(n(nc1C(=O)NN2CCCCC2)c3ccc( <10 6.471 463.8 Train Neutral POS cc3Cl)Cl)c4ccc(cc4)Cl Cc1c(c(=O)n2c(n1)CCCC2)CCN3CCC <10.25 2.711 410.5 Test Base POS (CC3)c4c5ccc(cc5on4)F RO 1130830 c1cc(ccc1Oc2ccc(cc2)Cl)S(=O)(=O)C 23.36 1.852 425.9 Train Neutral POS C3(CCOCC3)C(=O)NO CS(=O)(=O)c1ccc(cc1)C2=C(C(=O)O 95.36 1.798 314.4 Train Neutral POS C2)c3ccccc3 CCCN1CCCC[C@H]1C(=O)Nc2c(cccc 646.4 3.162 274.4 Train Base NEG 2C)C ROSIGLITAZONE CN(CCOc1ccc(cc1)CC2C(=O)NC(=O) 6.362 3.02 357.4 Test Neutral POS S2)c3ccccn3 ROXARSONE c1cc(c(cc1[As](=O)(O)O)[N+](=O)[O- >1000 -0.233 263 Train Acid NEG ])O >1000 -1.059 286.3 Train Neutral NEG c1ccc(c(c1)CO)O[C@H]2[C@@H]([C @H]([C@@H]([C@H](O2)CO)O)O)O c1ccc(c(c1)C(=O)N)O >1000 1.277 137.1 Train Acid NEG c1ccc(c(c1)C(=O)Oc2ccccc2C(=O)O) >1000 3.334 258.2 Train Acid NEG O CN1C2CC(CC1C3C2O3)OC(=O)C(CO >1000 0.29 303.4 Train Base NEG )c4ccccc4 SEMATILIDE CCN(CC)CCNC(=O)c1ccc(cc1)NS(=O >1000 1.363 313.4 Test Zwitterion NEG )(=O)C CC1=C(C(=O)C(=C(C1=O)C)C(CCCC <14.31 4.95 354.4 Train Acid POS CC(=O)O)c2ccccc2)C SERTRALINE CN[C@H]1CC[C@H](c2c1cccc2)c3cc 29.34 5.347 306.2 Train Base POS c(c(c3)Cl)Cl SIBENADET c1ccc(cc1)CCOCCCS(=O)(=O)CCNC <10 2.453 464.6 Test Base POS Cc2ccc(c3c2sc(=O)[nH]3)O 16.43 1.981 474.6 Train Base POS CCCc1c2c(c(=O)[nH]c(n2)c3cc(ccc3O CC)S(=O)(=O)N4CCN(CC4)C)n(n1)C SIROLIMUS <10 7.039 914.2 Test Neutral POS C[C@@H]1CC[C@H]2C[C@@H](/C( =C/C=C/C=C/[C@H](C[C@H](C(=O)[ C@@H]([C@@H](/C(=C/[C@H](C(=O )C[C@H](OC(=O)[C@@H]3CCCCN3 C(=O)C(=O)[C@@]1(O2)O)[C@H](C) C[C@@H]4CC[C@H]([C@@H](C4)O C)O)C)/C)O)OC)C)C)/C)OC SITAGLIPTIN c1c(c(cc(c1F)F)F)C[C@H](CC(=O)N2 147.5 0.691 407.3 Test Base POS CCn3c(nnc3C(F)(F)F)C2)N SIVELESTAT CC(C)(C)C(=O)Oc1ccc(cc1)S(=O)(=O) 142.2 2.771 434.5 Train Acid POS Nc2ccccc2C(=O)NCC(=O)O SODIUM ARSANILATE c1cc(ccc1N)[As](=O)(O)O >1000 -1.172 217.1 Train Acid NEG SODIUM FUSIDATE <10 7.284 516.7 Test Acid POS C[C@H]1[C@@H]2CC[C@]3([C@H]([ C@]2(CC[C@H]1O)C)[C@@H](C[C@ @H]\4[C@@]3(C[C@@H](/C4=C(/CC C=C(C)C)\C(=O)O)OC(=O)C)C)O)C c1ccc(cc1)N2CNC(=O)C23CCN(CC3) <10 2.815 395.5 Test Base POS CCCC(=O)c4ccc(cc4)F SPIRAPRIL CCOC(=O)[C@H](CCc1ccccc1)N[C@ 47.11 1.046 466.6 Train Zwitterion POS HYDROCHLORIDE @H](C)C(=O)N2CC3(C[C@H]2C(=O) O)SCCS3 SU11248 CCN(CC)CCNC(=O)c1c(c([nH]c1C)/C 45.81 2.998 398.5 Test Base POS =C\2/c3cc(ccc3NC2=O)F)C SUCCINYLSULFATHIAZOLE c1cc(ccc1NC(=O)CCC(=O)O)S(=O)(= >1000 0.727 355.4 Train Acid NEG O)Nc2nccs2 SULFACETAMIDE CC(=O)NS(=O)(=O)c1ccc(cc1)N >1000 -0.976 214.2 Train Acid NEG SULFACHLORPYRIDAZINE >1000 0.706 284.7 Test Acid NEG c1cc(ccc1N)S(=O)(=O)Nc2ccc(nn2)Cl SULFADIAZINE >1000 0.1 250.3 Train Acid NEG c1cnc(nc1)NS(=O)(=O)c2ccc(cc2)N SULFADIMETHOXINE COc1cc(nc(n1)OC)NS(=O)(=O)c2ccc( >1000 1.981 310.3 Train Acid NEG cc2)N SULFADIMIDINE Cc1cc(nc(n1)NS(=O)(=O)c2ccc(cc2)N) >1000 1.097 278.3 Test Acid NEG C SULFADOXINE COc1c(ncnc1OC)NS(=O)(=O)c2ccc(cc >1000 1.231 310.3 Train Acid NEG 2)N SULFAMERAZINE >1000 0.599 264.3 Train Acid NEG Cc1ccnc(n1)NS(=O)(=O)c2ccc(cc2)N SULFAMETHIZOLE >1000 0.418 270.3 Train Acid NEG Cc1nnc(s1)NS(=O)(=O)c2ccc(cc2)N SULFAMETHOXAZOLE >1000 0.563 253.3 Test Acid NEG Cc1cc(no1)NS(=O)(=O)c2ccc(cc2)N SULFISOXAZOLE >1000 0.222 267.3 Train Acid NEG Cc1c(noc1NS(=O)(=O)c2ccc(cc2)N)C CC\1=C(c2cc(ccc2/C1=C\c3ccc(cc3)S 63.69 3.157 356.4 Train Acid POS (=O)C)F)CC(=O)O SULPIRIDE CCN1CCCC1CNC(=O)c2cc(ccc2OC)S 276.2 1.11 341.4 Train Base POS (=O)(=O)N SURAMIN 107 -8.579 1297 Test Acid POS Cc1ccc(cc1NC(=O)c2cccc(c2)NC(=O) Nc3cccc(c3)C(=O)Nc4cc(ccc4C)C(=O) Nc5ccc(c6c5c(cc(c6)S(=O)(=O)O)S(= O)(=O)O)S(=O)(=O)O)C(=O)Nc7ccc(c 8c7c(cc(c8)S(=O)(=O)O)S(=O)(=O)O) S(=O)(=O)O TACRINE c1ccc2c(c1)c(c3c(n2)CCCC3)N >1000 3.274 198.3 Train Neutral NEG TALNIFLUMATE c1ccc2c(c1)C(OC2=O)OC(=O)c3cccn >1000 5.085 414.3 Test Neutral NEG c3Nc4cccc(c4)C(F)(F)F TAMOXIFEN CC/C(=C(\c1ccccc1)/c2ccc(cc2)OCCN <10 6.818 371.5 Train Base POS (C)C)/c3ccccc3 Cc1c2c(cs1)C(=O)Nc3ccccc3N2C(=O) 315.3 0.657 370.5 Train Base NEG CN4CCN(CC4)C <17.47 6.073 471.7 Test Base POS CC(C)(C)c1ccc(cc1)C(CCCN2CCC(C C2)C(c3ccccc3)(c4ccccc4)O)O 512.9 3.834 264.4 Train Base NEG CCCCNc1ccc(cc1)C(=O)OCCN(C)C TETROQUINONE C1(=C(C(=O)C(=C(C1=O)O)O)O)O >1000 -1.849 172.1 Train Acid NEG THALIDOMIDE c1ccc2c(c1)C(=O)N(C2=O)C3CCC(=O >1000 0.528 258.2 Test Neutral NEG )NC3=O THIABENDAZOLE c1ccc2c(c1)[nH]c(n2)c3cscn3 560 2.358 201.3 Train Neutral NEG CN1CCCCC1CCN2c3ccccc3Sc4c2cc( 24.33 6.003 370.6 Train Base POS cc4)SC THIOTEPA C1CN1P(=S)(N2CC2)N3CC3 >1000 0.524 189.2 Train Neutral NEG THIOTHIXENE CN1CCN(CC1)CC/C=C\2/c3ccccc3Sc 30.4 3.228 443.6 Train Base POS 4c2cc(cc4)S(=O)(=O)N(C)C 26.95 4.388 263.8 Train Base POS c1ccc(c(c1)CN2CCc3c(ccs3)C2)Cl TIPREDANE <10 4.115 410.6 Train Neutral POS CCS[C@@]1(CC[C@@H]2[C@@]1(C [C@@H]([C@]3([C@H]2CCC4=CC(= O)C=C[C@@]43C)F)O)C)SC Cc1ccc(cc1)S(=O)(=O)NC(=O)NN2CC 285.7 1.338 311.4 Train Acid POS CCCC2 c1ccc(cc1)CC2=NCCN2 >1000 2.652 160.2 Train Base NEG >1000 2.497 270.4 Train Acid NEG CCCCNC(=O)NS(=O)(=O)c1ccc(cc1)C TOLCAPONE Cc1ccc(cc1)C(=O)c2cc(c(c(c2)O)O)[N <33.92 3.245 273.2 Train Acid POS +](=O)[O-] TOLNAFTATE Cc1cccc(c1)N(C)C(=S)Oc2ccc3ccccc3 <10.32 5.339 307.4 Test Neutral POS c2 TOLRESTAT CN(CC(=O)O)C(=S)c1cccc2c1ccc(c2 185.3 3.431 357.4 Test Acid POS C(F)(F)F)OC TORSEMIDE Cc1cccc(c1)Nc2ccncc2S(=O)(=O)NC( 130.7 3.358 348.4 Train Acid POS =O)NC(C)C TRAPIDIL CCN(CC)c1cc(nc2n1ncn2)C 627.8 1.752 205.3 Test Neutral NEG TRETINOIN CC1=C(C(CCC1)(C)C)/C=C/C(=C/C= <10 6.744 300.4 Train Acid POS C/C(=C/C(=O)O)/C)/C CN1CCN(CC1)CCCN2c3ccccc3Sc4c2 <10 4.692 407.5 Train Base POS cc(cc4)C(F)(F)F TRIFLUPERIDOL c1cc(cc(c1)C(F)(F)F)C2(CCN(CC2)CC 19.7 4.019 409.4 Train Base POS CC(=O)c3ccc(cc3)F)O CN(C)CCCN1c2ccccc2Sc3c1cc(cc3)C 39.01 5.61 352.4 Train Base POS (F)(F)F CCC(COC(=O)c1cc(c(c(c1)OC)OC)O <10 4.258 387.5 Test Base POS C)(c2ccccc2)N(C)C TRIMEPRAZINE 73.72 4.798 298.5 Train Base POS CC(CN1c2ccccc2Sc3c1cccc3)CN(C)C TRIMETHOPRIM 833.1 0.981 290.3 Train Base NEG COc1cc(cc(c1OC)OC)Cc2cnc(nc2N)N TRIMETOZINE COc1cc(cc(c1OC)OC)C(=O)N2CCOC >1000 0.275 281.3 Test Neutral NEG C2 TRIOXSALEN Cc1cc(=O)oc2c1cc3cc(oc3c2C)C >525.1 3.469 228.2 Train Neutral NEG Cc1ccc(cc1)/C(=C\CN2CCCC2)/c3ccc 290.8 3.634 278.4 Train Base POS cn3 TROGLITAZONE 6.9 5.585 441.5 Train Neutral POS Cc1c(c2c(c(c1O)C)CCC(O2)(C)COc3c cc(cc3)CC4C(=O)NC(=O)S4)C CCN(Cc1ccncc1)C(=O)C(CO)c2ccccc 150.9 1.181 284.4 Test Neutral POS 2 CC(C)(C)NCC(c1ccccc1Cl)O >1000 2.529 227.7 Test Base NEG Cn1c(cc(=O)n(c1=O)C)NCCCN2CCN( <11.3 2.443 387.5 Test Base POS CC2)c3ccccc3OC UTIBAPRIL CCOC(=O)[C@H](CCc1ccccc1)N[C@ <13.51 2.211 449.6 Test Zwitterion POS @H](C)C(=O)N2[C@@H](SC(=N2)C( C)(C)C)C(=O)O VALPROIC ACID CCCC(CCC)C(=O)O >1000 2.76 144.2 Train Acid NEG VENLAFAXINE CN(C)CC(c1ccc(cc1)OC)C2(CCCCC2) >1000 3.269 277.4 Train Base NEG O VERAPAMIL CC(C)C(CCCN(C)CCc1ccc(c(c1)OC)O 20.5 4.466 454.6 Train Base POS HYDROCHLORIDE C)(C#N)c2ccc(c(c2)OC)OC VX 745 c1cc(c(c(c1)Cl)c2c3ccc(nn3cnc2=O)S <10 5.467 436.3 Test Neutral POS c4ccc(cc4F)F)Cl CC(=O)CC(c1ccccc1)c2c(c3ccccc3oc 115.3 2.901 308.3 Train Neutral POS 2=O)O XALIPRODEN c1ccc2cc(ccc2c1)CCN3CCC(=CC3)c4 117.1 6.669 381.4 Train Base POS cccc(c4)C(F)(F)F XANTRYL <10 9.04 973.7 Train Acid POS c1c2c(c(c(c1I)O)I)Oc3c(cc(c(c3I)O)I)C 24c5c(c(c(c(c5Cl)Cl)Cl)Cl)C(=O)O4 Cc1cccc(c1NC2=NCCCS2)C >1000 0.52 220.3 Test Base NEG >796.1 5.376 244.4 Test Base NEG Cc1cc(cc(c1CC2=NCCN2)C)C(C)(C)C ZD-2486 C[C@H](CC(=O)O)COc1ccc(cc1)N2C >1000 3.041 355.4 Test Acid NEG CN(CC2)c3ccncc3 ZD-2574 Cc1cnc(c(n1)OC)NS(=O)(=O)c2cccnc 19.45 5.495 412.5 Train Acid POS 2c3ccc(cc3)CC(C)C ZD-6169 C[C@](C(=O)Nc1ccc(cc1)C(=O)c2ccc 21.39 3.786 337.3 Train Neutral POS cc2)(C(F)(F)F)O ZD-9379 Cc1cc(ccc1n2c(=O)c3c(c(=O)[nH]2)[n 490 1.842 383.8 Test Neutral NEG H]c4cc(ccc4c3=O)Cl)OC ZD-9583 Cc1ccc(c(c1)C#N)OC(C)(C)[C@H]2O <10 3.336 450.5 Train Acid POS C[C@H]([C@H](O2)c3cccnc3)C/C=C\ CCC(=O)O ZILEUTON CC(c1cc2ccccc2s1)N(C(=O)N)O >751.2 2.483 236.3 Train Neutral NEG ZIMELDINE 229.7 3.194 317.2 Train Base POS CN(C)C/C=C(/c1ccc(cc1)Br)\c2cccnc2 c1cc2c(cc1Cl)nc(o2)N >1000 1.893 168.6 Train Neutral NEG DMD #47068

Mitigating the inhibition of human Bile Salt Export Pump by drugs: opportunities provided by physicochemical property modulation, in-silico modeling and structural modification

Daniel J. Warner, Hongming Chen, Louis-David Cantin, J. Gerry Kenna, Simone Stahl, Clare

L. Walker, Tobias Noeske

Drug Metabolism and Disposition

Supplemental Data

Table S1. Metrics used in the assessment of the classification performance in Table 1 (main article).

Measure Formulaea Description

Probability of correctly

Accuracy classifying compounds.

Probability of predicting Sensitivity

positive when true class is (Recall) positive.

Probability of predicting

Specificity negative when true class is

negative.

Probability of correctly Positive

classifying compounds precision predicted to be positive.

Probability of correctly Negative

classifying compounds precision predicted to be negative.

Harmonic average of positive Fscore

precision and sensitivity.

1

DMD #47068

True accuracy which is

corrected by probability of Kappa obtaining agreement by

chance.

Correlation coefficient Matthews ( ) between the observed and Correlation predicted binary Coefficient classifications.

Note: TP: true positive, FP: false positive, TN: true negative, FN: false negative

Table S2. Five-fold cross validation results for different BSEP modeling strategies.

Matthews Neg. Pos. Model Sensitivity Specificity Accuracy Fscore Kappa Correlation prec. prec. Coefficient

RF_AZdesc 0.88 0.85 0.86 0.87 0.87 0.87 0.73 0.73

SVM_AZdesc 0.88 0.86 0.87 0.87 0.87 0.87 0.73 0.73

PLS_AZdesc 0.84 0.85 0.86 0.83 0.85 0.85 0.69 0.69

RF_AZFP_AZtop14 0.87 0.86 0.87 0.86 0.87 0.87 0.73 0.73

SVM_AZFP_AZtop14 0.89 0.84 0.84 0.88 0.86 0.86 0.72 0.72

2

DMD #47068

Figure S3. Transmembrane section of a BSEP homology model based on the X-ray crystal structure of P-glycoprotein (Aller, et al. 2009). (a) View through a portal illustrating basic (green carbon atoms) and acidic (magenta carbon atoms) residues. (b) View through a portal illustrating hydrophobic (brown carbon atoms) and polar (cyan carbon atoms) residues. Highlighted region illustrates an approximately 20Å section expected to be entirely embedded within the membrane. Backbone atoms are omitted for clarity.

3