ARTICLE
pubs.acs.org/jced
A Predictive Model for the Solubility and Octanol-Water Partition Coefficient of Pharmaceuticals
Chieh-Ming Hsieh,† Shu Wang,‡ Shiang-Tai Lin,*,† and Stanley I. Sandler‡
†Department of Chemical Engineering, National Taiwan University, Taipei, 10617, Taiwan
‡Center for Molecular and Engineering Thermodynamics, Department of Chemical Engineering, University of Delaware, Newark, Delaware 19716, United States
Supporting Information
ABSTRACT: The prediction of drug solubility in various pure and mixed solvents and of the octanol-water partition coefficient ($K_{OW}$) is evaluated using the recently revised conductor-like screening segment activity coefficient (COSMO-SAC) model. The solubilities of 51 drug compounds in 37 different solvents and their combinations over a temperature range of 273.15 K to 323.15 K (300 systems, 2918 data points) are calculated from the COSMO-SAC model and compared to experiments. The solubility data cover a wide range, from $10^{-1}$ to $10^{-6}$ in mole fraction. When only the heat of fusion and the normal melting temperature of the drug are used, the average absolute error from the revised model is found to be 236 %, a significant reduction from that (388 %) of the original COSMO-SAC model. When the pure drug properties (heat of fusion and melting temperature) are not available, predictions can still be made with a similar accuracy using the solubility data of the drug in any other solvent or solvent mixture. The accuracy in prediction of the solubility in a mixed solvent can be greatly improved (average error of 70 %) if measured solubility data in one pure solvent are used. Also, the standard deviation in log $K_{OW}$ of 89 drugs, whose values range from -3.7 to 5.25, is found to be 1.14 from the revised COSMO-SAC model. Our results show that the COSMO-SAC model can provide reliable predictions of properties of pharmaceuticals and is a useful tool for drug discovery.
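The way the two pure-drug inputs named in the abstract enter a prediction can be illustrated with a minimal Python sketch. It assumes the standard solid-liquid equilibrium relation (eq 4 of the Theory section, with heat-capacity corrections neglected) and the infinite-dilution expression for log $K_{OW}$ (eq 8). The function names and all numerical inputs are hypothetical placeholders, and the activity coefficients are treated as given constants rather than computed from COSMO-SAC. The last helper is an inference from eq 4, not a formula quoted from this work: at a fixed temperature the fusion term is the same in every solvent, so $x_i\gamma_i$ is constant and one measured solubility can stand in for the missing heat of fusion and melting temperature.

```python
import math

R = 8.314462618  # gas constant, J/(mol*K)


def ln_solubility(dh_fus, t_melt, temperature, ln_gamma=0.0):
    """ln x_i = -(dH_fus/R) * (1/T - 1/T_m) - ln(gamma_i).

    dh_fus in J/mol, temperatures in K. ln_gamma is supplied by the caller
    (assumption: treated as independent of x_i, i.e. a dilute solution).
    """
    return -(dh_fus / R) * (1.0 / temperature - 1.0 / t_melt) - ln_gamma


def log10_kow(gamma_inf_water, gamma_inf_octanol,
              c_octanol_phase=8.37, c_water_phase=55.5):
    """log K_OW from infinite-dilution activity coefficients.

    Default total concentrations (mol/L) are the values quoted in the text.
    """
    ratio = (c_octanol_phase * gamma_inf_water) / (c_water_phase * gamma_inf_octanol)
    return math.log10(ratio)


def solubility_from_reference(x_ref, gamma_ref, gamma_target):
    """Transfer one measured solubility to another solvent at the same T.

    Uses x_ref * gamma_ref = x_target * gamma_target (fusion term cancels).
    """
    return x_ref * gamma_ref / gamma_target


# Hypothetical drug: dH_fus = 25 kJ/mol, T_m = 400 K, ideal solution at 298.15 K.
x_ideal = math.exp(ln_solubility(25_000.0, 400.0, 298.15))
print(f"ideal mole-fraction solubility: {x_ideal:.4f}")

# Hypothetical activity coefficients, for illustration only.
print(f"log K_OW: {log10_kow(gamma_inf_water=1.0e4, gamma_inf_octanol=5.0):.2f}")
print(f"transferred solubility: {solubility_from_reference(1.0e-3, 20.0, 200.0):.1e}")
```

In practice $\gamma_i$ depends on composition, so eq 4 would be solved iteratively for $x_i$; the constant-$\gamma$ form above is only adequate for sparingly soluble drugs.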
Special Issue: John M. Prausnitz Festschrift
Received: August 27, 2010
Accepted: October 21, 2010
Published: November 11, 2010

1. INTRODUCTION
The knowledge of solubility and the octanol-water partition coefficient ($K_{OW}$) of a drug are important in drug discovery, development, and manufacturing.¹⁻³ For example, in the area of rational drug design, a strategy in the discovery of new drugs is to use information about the structure of a drug receptor or ligands to identify candidate drugs.⁴ The octanol-water partition coefficient may be used for this process, as it is one index in the rule of five, in particular that log $K_{OW}$ should not exceed five for a candidate drug. Another example of the use of the octanol-water partition coefficient is in quantitative structure-activity relationships (QSARs), where it is assumed that there is a simple relationship between $K_{OW}$ and biological responses, such as the lethal dose for 50 % of test organisms, LD50.⁴ For the synthesis, production, and formulation of a drug, solubility is an important factor, as solubility requirements can be very different in different stages of the process. For example, high (or intermediate) solubility is desired in the reaction stage, while the low solubility in the solvent is desired in the purification and crystallization stage. The solubility of a drug in water is also important to estimate the drug access to biological membranes.⁵ Solvent screening (i.e., finding the optimal solvent or solvent combinations for desired solubility) requires making many measurements, a costly and time-consuming task. Although the techniques of solubility and $K_{OW}$ measurement have improved,⁶⁻⁸ it is impractical to measure these data for all drug candidates at all possible operating conditions (temperature, composition of a mixed solvent, etc.). Furthermore, as a result of the wide range of values of solubility and $K_{OW}$, especially in cases of extremely large or small values, the accuracy of the measurements is sometimes questionable. It is not uncommon that reported values for the same compound may differ by a factor of 10 (one log unit) or more. Thus, a predictive thermodynamic model, that does not rely on experimental data, would be very useful for the design and development of drugs.⁹

In the literature several ways to estimate values of $K_{OW}$¹⁰ and the solubility of organic compounds⁹,¹¹⁻¹⁶ have been reported with varying degrees of accuracy. In particular, the diverse chemical structures of drug molecules and the complexity in drug-solvent interactions present great challenges. Most predictive methods for $K_{OW}$ are based on an arbitrarily defined set of fragments and a correlation using approximately 1000 compounds (training set) for which there were reliable experimental $K_{OW}$ data.¹⁷ While such structure-based methods can be very accurate, their accuracy deteriorates for compounds that are outside the training set. This is especially so for compounds having complex structures, such as pharmaceuticals, for which it may also be difficult to divide the molecule systematically into the fragments or groups that were defined in the training set. Furthermore, some methods have not been verified for calculating the solubility of drugs.

Theoretically based approaches allow for the prediction of both $K_{OW}$ and solubility of drugs simultaneously. One very
successful example is the nonrandom two-liquid segment activity coefficient (NRTL-SAC) model of Chen and Song.¹³ In this model, each compound is characterized using four species-specific parameters that can be determined from the regression of few experimental vapor-liquid equilibrium or solubility data. Once the parameters are available for both the drug and the solvent, satisfactory predictions can be achieved. This method has been shown to be a practical tool in the design of drug purification processes.¹⁸ Another example is the conductor-like screening (COSMO)-based activity coefficient models,¹¹,¹²,¹⁶ in which the drug-solvent interactions are determined from quantum mechanical solvation calculations. Therefore, predictions can be made prior to, and in the absence of, any experimental measurements. Mullins et al.¹¹ examined the accuracy of the conductor-like screening segment activity coefficient (COSMO-SAC) model of Lin and Sandler¹⁹ using 33 drugs in 37 solvents and solvent mixtures. Shu and Lin¹⁶ have shown that the accuracy of COSMO-SAC in the prediction of drug solubility in mixed solvents can be greatly improved when experimental solubility data in relevant pure solvents are used. Kaemmerer et al.²⁰ used the COSMO-SAC model²¹ for solvent or antisolvent screening and developed a selective crystallization process for chiral pharmaceuticals. Recently, the COSMO-SAC activity coefficient model has been improved to provide better descriptions of vapor-liquid and liquid-liquid equilibria.²²

In this study, we examine the accuracy of the latest COSMO-SAC model [denoted here as COSMO-SAC(2010)] in the prediction of solubilities and octanol-water partition coefficients of drugs. A larger set of solubility data (51 drugs in 37 solvents and their mixtures) and the $K_{OW}$ (of 89 drugs) is used, and the performance of the model on drugs of different isomeric structures is tested. No experimental data are needed for the prediction of $K_{OW}$. Heat of fusion and melting temperature are used for the prediction of drug solubilities. However, when the needed data for the drug (heat of fusion and melting temperature) are missing, we show that consistent predictions can be achieved in both pure and mixed solvents using a single solubility data point. When the correction method of Shu and Lin¹⁶ is used, the COSMO-SAC(2010) model predictions of solubility data are within 70 % of the experiment.

© 2010 American Chemical Society. J. Chem. Eng. Data 2011, 56, 936–945. dx.doi.org/10.1021/je1008872

2. THEORY
The solubility limit of a solid drug $i$ in a solvent S (whose composition is denoted by $\mathbf{x}$) is determined from the equality of chemical potentials of the drug $i$ in the solid (assuming a pure phase) and the liquid solvent at equilibrium, that is,

$$\mu_i^{\mathrm{solid}}(T, P) = \mu_i^{\mathrm{liquid}}(T, P, \mathbf{x}) \tag{1}$$

The chemical potential of the species in the liquid mixture is related to its activity coefficient as

$$\mu_i^{\mathrm{liquid}}(T, P, \mathbf{x}) = \mu_i^{\mathrm{liquid}}(T, P) + RT \ln x_i \gamma_i(T, P, \mathbf{x}) \tag{2}$$

and the difference in chemical potentials of the compound in the liquid and solid states can be estimated from its heat of fusion ($\Delta_{\mathrm{fus}}H_i$) and normal melting temperature ($T_{m,i}$)²³

$$\mu_i^{\mathrm{liquid}}(T, P) - \mu_i^{\mathrm{solid}}(T, P) \cong \Delta_{\mathrm{fus}}H_i \left(1 - \frac{T}{T_{m,i}}\right) \tag{3}$$

The correction terms involving the heat capacities have been neglected in eq 3 because their contributions are much smaller compared to the contribution from the heat of fusion. Substituting eqs 2 and 3 into eq 1, we obtain an expression for calculating the solubility of drug $i$, mole fraction $x_i$ of drug in the solvent, as²⁴

$$\ln x_i = -\frac{\Delta_{\mathrm{fus}}H_i}{R}\left(\frac{1}{T} - \frac{1}{T_{m,i}}\right) - \ln \gamma_i(T, P, \mathbf{x}) \tag{4}$$

Therefore, to evaluate the solubility $x_i$, it is necessary to have $\Delta_{\mathrm{fus}}H_i$ and $T_{m,i}$ of the drug and its activity coefficient $\gamma_i$ in the liquid mixture. The experimental values for $\Delta_{\mathrm{fus}}H_i$ and $T_{m,i}$ are used when available. Most of these data can be found in the DECHEMA²⁵,²⁶ and/or National Institute of Standards and Technology (NIST)²⁷ databases. When experimental data are not available, a group contribution method, such as that of Chickos and Acree,²⁸,²⁹ can be used (only for 2-(4-methylphenyl)acetic acid in this work).

Water and 1-octanol are partially miscible at ambient conditions. When mixed, an octanol-rich phase (containing approximately 0.725 mole fraction 1-octanol and 0.275 mole fraction water) coexists with another water-rich phase (which is nearly pure water) at equilibrium. The octanol-water partition coefficient of a drug is the ratio of its distribution in these two phases when a trace amount of the drug is added, that is,

$$K_{OW,i} = \lim_{C_i \to 0} \frac{C_i^{O}}{C_i^{W}} \tag{5}$$

where $C_i^{O}$ and $C_i^{W}$ are the molar concentrations of drug $i$ in the octanol-rich and water-rich phases, respectively. In the limit of low solute concentrations, each concentration can be estimated from the mole fraction of the drug and the total molar concentration of the liquid, that is, $C_i^{O} \approx x_i^{O} C^{O}$ and $C_i^{W} \approx x_i^{W} C^{W}$. Furthermore, the equilibrium criteria require the equivalence of chemical potential of the drug in both phases

$$\mu_i^{O}(T, P, \mathbf{x}^{O}) = \mu_i^{W}(T, P, \mathbf{x}^{W}) \tag{6}$$

Therefore (from eq 2), the mole fraction of the drug is related to the activity coefficient as

$$\frac{x_i^{O}}{x_i^{W}} = \frac{\gamma_i^{W}(T, P, \mathbf{x}^{W})}{\gamma_i^{O}(T, P, \mathbf{x}^{O})} \tag{7}$$

and the octanol-water partition coefficient can be determined from

$$\log K_{OW,i} = \log\left(\lim_{x_i \to 0} \frac{x_i^{O} C^{O}}{x_i^{W} C^{W}}\right) = \log\left(\frac{C^{O}\,\gamma_i^{W,\infty}}{C^{W}\,\gamma_i^{O,\infty}}\right) = \log\left(\frac{8.37\,\gamma_i^{W,\infty}}{55.5\,\gamma_i^{O,\infty}}\right) \tag{8}$$

where $\gamma_i^{O,\infty}$ and $\gamma_i^{W,\infty}$ are the infinite dilution activity coefficients of the drug in the octanol-rich phase and the water-rich phase, respectively. Under ambient conditions, the octanol-rich phase contains approximately 6.07 mol·L⁻¹ 1-octanol and 2.3 mol·L⁻¹ water,³⁰ consistent with the mole fractions given above. Therefore, the total concentration $C^{O}$ is 8.37 mol·L⁻¹. For pure water $C^{W}$ = 55.5 mol·L⁻¹.

The activity coefficient $\gamma_i$ needed in eqs 4 and 8 is determined from the COSMO-SAC model originally developed by Lin and Sandler [denoted as COSMO-SAC(2002)].¹⁹ The COSMO-SAC
model is a refinement of the COSMO-RS model of Klamt and his colleagues.³¹⁻³³ In such models¹⁹,²¹,³¹⁻³⁵ the interactions between solute and solvent are considered as the interactions between the screening charges on the paired surface segments of the same size. These surface charges on the molecular cavity are determined from a two-step first principles calculation. First, the optimal conformation of the molecule in vacuum is determined by geometry optimization using quantum mechanical (QM) density functional theory (DFT). Then, a COSMO solvation calculation³² is used to determine the total energy in the perfect conductor (solvent with infinite dielectric constant) and the ideal screening charges on the molecular surface. The σ-profile is the probability of finding a surface segment with screening charge density $\sigma$ and is defined as

$$p_i(\sigma) = \frac{A_i(\sigma)}{A_i} \tag{9}$$

where $A_i$ and $A_i(\sigma)$ are the total surface area of molecule $i$ and the sum of the surface areas of all segments with charge density $\sigma$, respectively; $p_i(\sigma)$ is a specific property of each pure component. The σ-profile of a mixture is the summation of the σ-profile of each substance in the system weighted by its surface area ($A_i$) and mole fraction ($x_i$), that is,

$$p_S(\sigma) = \frac{\sum_i x_i A_i\, p_i(\sigma)}{\sum_i x_i A_i} \tag{10}$$

To have a better description of hydrogen-bonding (hb) interactions, Hsieh et al.²² suggested dividing the σ-profile into three components: contributions from surfaces of non-hydrogen-bonding atoms, from surfaces of hydroxyl (OH) groups, and from surfaces of all other types of hydrogen bond donating and accepting atoms, that is,

$$p(\sigma) = p_0^{\mathrm{nhb}}(\sigma) + p_0^{\mathrm{OH}}(\sigma) + p_0^{\mathrm{OT}}(\sigma) \tag{11}$$

where $p_0^{\mathrm{nhb}}(\sigma)$ is the collection of all non-hydrogen-bonding segments; $p_0^{\mathrm{OH}}(\sigma)$ are the segments on the hydroxyl groups; and $p_0^{\mathrm{OT}}(\sigma)$ includes all of the other hb segments, that is, the segments belonging to oxygen, nitrogen, fluorine, or hydrogen atoms connected to N or F atoms. For interactions between atoms of hydrogen bonding donors and acceptors, Wang et al.²¹ suggested refining the hydrogen bonding σ-profiles by removing the less polarized surfaces using a Gaussian-type function

$$P^{\mathrm{HB}}(\sigma) = 1 - \exp\left(-\frac{\sigma^2}{2\sigma_0^2}\right) \tag{12}$$

with $\sigma_0$ = 0.7 e·nm⁻².²¹ The σ-profile becomes

$$p(\sigma) = p^{\mathrm{nhb}}(\sigma) + p^{\mathrm{OH}}(\sigma) + p^{\mathrm{OT}}(\sigma) \tag{13}$$

with $p^{\mathrm{OH}}(\sigma) = p_0^{\mathrm{OH}}(\sigma) \cdot P^{\mathrm{HB}}(\sigma)$, $p^{\mathrm{OT}}(\sigma) = p_0^{\mathrm{OT}}(\sigma) \cdot P^{\mathrm{HB}}(\sigma)$, and $p^{\mathrm{nhb}}(\sigma) = p_0^{\mathrm{nhb}}(\sigma) + [p_0^{\mathrm{OH}}(\sigma) + p_0^{\mathrm{OT}}(\sigma)][1 - P^{\mathrm{HB}}(\sigma)]$.

Once the σ-profile is established, the activity coefficient ($\Gamma$) for surface segment $\sigma_m$ can be determined from

$$\ln \Gamma_S^{t}(\sigma_m^{t}) = -\ln\left\{ \sum_{s}^{\mathrm{nhb,OH,OT}} \sum_{\sigma_n} p_S^{s}(\sigma_n^{s})\, \Gamma_S^{s}(\sigma_n^{s}) \exp\left[ -\frac{\Delta W(\sigma_m^{t}, \sigma_n^{s})}{RT} \right] \right\} \tag{14}$$

where the subscript S denotes the solution of interest (S = $i$ for pure liquid $i$), the superscripts $s$ and $t$ can be nhb, OH, or OT, and the segment exchange energy $\Delta W$ measures the interaction energy between two surface segments of the same area $a_{\mathrm{eff}}$ with charge densities $\sigma_m$ and $\sigma_n$

$$\Delta W(\sigma_m^{t}, \sigma_n^{s}) = c_{\mathrm{ES}}\,(\sigma_m^{t} + \sigma_n^{s})^2 - c_{\mathrm{hb}}(\sigma_m^{t}, \sigma_n^{s})\,(\sigma_m^{t} - \sigma_n^{s})^2 \tag{15}$$

The electrostatic interaction parameter $c_{\mathrm{ES}}$ is a temperature-dependent function

$$c_{\mathrm{ES}} = A_{\mathrm{ES}} + \frac{B_{\mathrm{ES}}}{T^2} \tag{16}$$

where $A_{\mathrm{ES}}$ and $B_{\mathrm{ES}}$ are adjustable parameters and their values are taken from literature.²² The hydrogen-bonding interaction parameter $c_{\mathrm{hb}}(\sigma_m^{t}, \sigma_n^{s})$ is independent of temperature and given by

$$c_{\mathrm{hb}}(\sigma_m^{t}, \sigma_n^{s}) = \begin{cases} c_{\mathrm{OH\text{-}OH}} & \text{if } s = t = \mathrm{OH} \text{ and } \sigma_m^{t} \cdot \sigma_n^{s} < 0 \\ c_{\mathrm{OT\text{-}OT}} & \text{if } s = t = \mathrm{OT} \text{ and } \sigma_m^{t} \cdot \sigma_n^{s} < 0 \\ c_{\mathrm{OH\text{-}OT}} & \text{if } s = \mathrm{OH},\ t = \mathrm{OT}, \text{ and } \sigma_m^{t} \cdot \sigma_n^{s} < 0 \\ 0 & \text{otherwise} \end{cases} \tag{17}$$

where the values of three hydrogen-bonding interaction parameters have been given earlier.²² With the segment activity coefficient determined above, the activity coefficient of species $i$ in mixture S is determined from

$$\ln \gamma_{i/S} = \frac{A_i}{a_{\mathrm{eff}}} \sum_{t}^{\mathrm{nhb,OH,OT}} \sum_{\sigma_m} p_i^{t}(\sigma_m^{t}) \left[ \ln \Gamma_S^{t}(\sigma_m^{t}) - \ln \Gamma_i^{t}(\sigma_m^{t}) \right] + \ln \gamma_{i/S}^{\mathrm{comb}} \tag{18}$$

where $\gamma_{i/S}^{\mathrm{comb}}$, the combinatorial contribution to the activity coefficient, is used to take into account the molecular size and shape differences of the species and is given by

$$\ln \gamma_{i/S}^{\mathrm{comb}} = \ln\frac{\phi_i}{x_i} + \frac{z}{2}\, q_i \ln\frac{\theta_i}{\phi_i} + l_i - \frac{\phi_i}{x_i} \sum_j x_j l_j \tag{19}$$

with $\phi_i = (x_i r_i)/(\sum_j x_j r_j)$, $\theta_i = (x_i q_i)/(\sum_j x_j q_j)$, and $l_i = 5(r_i - q_i) - (r_i - 1)$, where $x_i$ is the mole fraction of component $i$; $r_i$ and $q_i$ are the normalized volume and surface area parameters for component $i$ (the standard volume and area used for normalization are 0.06669 nm³ and 0.7953 nm²); the summation is over all of the species in the mixture. This revised COSMO-SAC model is denoted as the COSMO-SAC(2010) model in this study. The values of all universal parameters in the COSMO-SAC(2010) model are given in ref 22 and have not been changed. All of the species-specific quantities [$A_i$, $r_i$, $q_i$, $p_i(\sigma)$] are obtained from first principle COSMO calculations. Two changes have been made in COSMO-SAC(2010) compared to the original COSMO-SAC(2002) model. First, the electrostatic interaction parameter $c_{\mathrm{ES}}$ is made to be temperature-dependent (eq 16). Second, the varia-