Predicting the Pka of Small Molecules Matthias Rupp*,1,2, Robert Körner1 and Igor V

Combinatorial Chemistry & High Throughput Screening, 2011, 14, 307-327 307 Predicting the pKa of Small Molecules Matthias Rupp*,1,2, Robert Körner1 and Igor V. Tetko1 1Helmholtz Zentrum München, German Research Center for Environmental Health, Ingolstädter Landstraße 1, D-85764 Neuherberg, Germany 2Present Address: Machine Learning Group, Technische Universität Berlin, FR 6-9, Franklinstr. 28/29, D-10587 Berlin, Germany, and Institute for Pure and Applied Mathematics, University of California at Los Angeles, 460 Portola Plaza, Los Angeles, CA 90095-7121, USA Abstract: The biopharmaceutical profile of a compound depends directly on the dissociation constants of its acidic and basic groups, commonly expressed as the negative decadic logarithm pKa of the acid dissociation constant (Ka). We survey the literature on computational methods to predict the pKa of small molecules. In this, we address data availability (used data sets, data quality, proprietary versus public data), molecular representations (quantum mechanics, descriptors, structured representations), prediction methods (approaches, implementations), as well as pKa-specific issues such as mono- and multiprotic compounds. We discuss advantages, problems, recent progress, and challenges in the field. Keywords: pKa, acid dissociation constant, QSPR, quantitative structure-property relationships. 1. INTRODUCTION energy relationships for equilibrium constants of meta- and The acid dissociation constant (also protonation or para-substituted benzoic acids, log K / K0 = , where K0 , ionization constant) Ka is an equilibrium constant defined as K are the equilibrium constants of the substituted and the ratio of the protonated and the deprotonated form of a unsubstituted compound, and, and are constants compound; it is usually stated as pKa = log10 Ka. The pKa depending only on the substituent and the reaction, value of a compound strongly influences its pharmacokinetic respectively [10, 11]. Robert Taft modified this equation by and biochemical properties. Its accurate estimation is separating steric from polar and resonance effects [12]. therefore of great interest in areas such as biochemistry, Later, Corwin Hansch and Toshio Fujita introduced an medicinal chemistry, pharmaceutical chemistry, and drug additional parameter =logP log P for the substituent development. Aside from the pharmaceutical industry, it also 0 has relevance in environmental ecotoxicology, as well as the effect on hydrophobicity, where P , P0 are the octanol-water agrochemicals and specialty chemicals industries. In this partition ratios of the substituted and unsubstituted work, we survey approaches to the computational estimation compound [13, 14]. In the same year, Spencer Free and of pKa values of small compounds in an aqueous James Wilson [15, 16] published a closely related approach, environment. For related aspects like the prediction of pKa later improved by Toshio Fujita and Takashi Ban [17], with values of proteins, the prediction of pKa values in solvents structural features (presence and absence of substituents) other than water, or, the experimental determination of pKa instead of experimentally determined properties. values, we refer to the literature (Table 1). Table 1. Reviews of pKa Prediction and Related Topics, Sorted by Year and First Author Name. ADME = Absorption, 1.1. History Distribution, Metabolism, and Excretion 1.1.1. Quantitative Structure-Property Relationships The empirical estimation (as opposed to ab initio Ref. Author (Year) Coverage/Emphasis calculations) of pK values belongs to the field of a [1] Ho and Coote (2010) Continuum solvent pK calculations quantitative structure-property relationships (QSPR). The a basic postulate in QSPR modeling (and the closely related [2] Cruciani et al. (2009) pKa prediction and ADME profiling field of quantitative structure-activity relationships, QSAR) [3] Lee and Crippen Prediction of pKa values (proteins is that a compound’s physico-chemical properties are a (2009) and small molecules) function of its structure as described by (computable) [4] Manallack (2007) Distribution of pK values in drugs features. The idea that physiological activity of a compound a is a (mathematical) function of the chemical composition and [5] Fraczkiewicz (2006) In silico prediction of ionization constitution of the compound dates back at least to the work (theory and software) by Brown and Fraser [9] in 1868. Major break-throughs [6] Wan and Ulander High-throughput pKa screening, include the work by Louis Hammett, who established free (2006) and pKa prediction [7] Tomasi (2005) Quantum mechanical continuum solvation models *Address correspondence to this author at Technische Universität Berlin, FR 6-9, Franklinstr. 28/29, 10587 Berlin, Germany; Tel: ++49-30- [8] Selassie (2003) History of quantitative 31424927; Fax: ++49-30-31478622; E-mail: [email protected] structure-property relationships 1386-2073/11 $58.00+.00 © 2011 Bentham Science Publishers Ltd. 308 Combinatorial Chemistry & High Throughput Screening, 2011, Vol. 14, No. 5 Rupp et al. 1.1.2. pKa Estimation + c(A )c(H3O ) K a O . (3) QSPR studies involving pKa values were published in the c(HA)c early 1940s [18, 19]. Since then, a vast number of books, book chapters, conference contributions, and journal articles Taking the negative decadic logarithm have been published on the topic (Section 3). pKa = log10(K a ) yields the Henderson-Hasselbalch [25] equation 1.2. Definition c(HA) pKa pH + log10 , (4) 1.2.1. pKa-Values c(A ) According to the Brønsted-Lowry theory of acids and + + where pH= log a(H O ) log (c(H O )/cO ) . In an bases, an acid HA is a proton (hydrogen cation) donor, 10 3 10 3 ideal solution, the pK of a monoprotic weak acid is therefore HA H+ A , and a base B is a proton acceptor, a + the pH at which 50% of the substance is in deprotonated + + . For a weak acid in aqueous solution, the B + H BH form, and Equation 4 is an approximation of the mass action + law applicable to low-concentration aqueous solutions of a dissociation HA + H2O A + H3O is reversible. In the forward reaction, the acid HA and water, acting as a base, single monoprotic compound [26, 27]. + yield the conjugate base A and oxonium H3O 1.2.2. pKb-Values (protonated water) as conjugate acid. In the backward The protonation of a base B + H O HB+ + HO can reaction, oxonium acts as acid and A as base. The 2 corresponding equilibrium constant [20], known as the acid be described in the same terms as the deprotonation of an acid, leading to the base association constant dissociation constant Ka, is the ratio of the activities of + products and reagents, Kb = a(HB )a(HO )/(a(B)a(H2O)) . Adding the reaction equations for the deprotonation of HA and the protonation a(A )a(H O+ ) K = 3 , (1) of its conjugate base A gives 2H O H O+ + OH , with a a(HA)a(H O) 2 3 2 + 2 equilibrium constant Kw = a(H3O )a(OH )/a (H2O) . It where a() is the activity of a species under the given follows that K w = K a K b , and therefore conditions. The form of Equation 1 follows from the law of pK =p K pK 14 pK , where pK 14 from mass action for elementary (one-step) reactions like the b w a a w + 7 considered proton transfer reaction. Activity is a measure of c(H3O )=c(HO ) 10 mol/L at T = 298.15K and under “effective concentration”, a unitless quantity defined in the same assumptions as for Equation 3. Since pKa and pKb terms of chemical potential [21, 22], and can be expressed use the same scale, pKa-values are used for both acids and relative to a standard concentration: bases; however, data in older references is sometimes given O as pK -values. For prediction, one should not mix pK and μ(x) μ (x) c(x) b a a(x)=exp = (x) O , (2) pKb values. RT c 1.2.3. Multiprotic Compounds where μ() is the chemical potential of a species under the A multiprotic (also polyprotic) compound has more than 1 O given conditions (partial molar Gibbs energy ), μ () is the one ionizable center, i.e., it can donate or accept more than chemical potential of the species in a standard state (molar one proton. For n protonation sites, there are 2n Gibbs energy), R = 8.314472(15) JK 1mol1 is the gas microspecies (each site is either protonated or not, yielding n n1 constant, T is the temperature in kelvin, () is a 2 combinations) and n2 micro- pKas, i.e., equilibrium n dimensionless activity coefficient, c() is the molar (or constants between two microspecies (for each of the 2 O microspecies, each of the n protonation sites can change its molal) concentration of a species, and, c =1mol/L (or state; division by 2 corrects for counting each transition 1 mol / kg ) is a standard concentration. Values of () 1 twice). All microspecies with the same number of bound indicate deviations from ideality. Note that the activity of an protons form one of the n +1 possible macrostates acid can depend on its concentration [24]. In an ideal ( 0,1,…,n protons bound). Fig. (1) presents cetirizine as an solution ()=1, and effective concentrations equal example. For n >3, micro- pKas cannot be derived from analytical ones. With the assumptions ()=1 and titration curves without additional information or O assumptions, such as from symmetry considerations [28, 29]. c(H2O) = c =1 mol/L, inserting Equation 2 into Equation 1 yields an approximation valid for low concentrations of 1.2.4. Remarks HA in water: Compounds are called amphiprotic if they can act as both acid and base, e.g., water, or are multiprotic compounds with both acidic and basic groups. Neutral compounds with formal unit charges of opposite sign are called zwitterions; 1(Partial) molar Gibbs energy is also called (partial) molar free enthalpy [23]. Predicting the pKa of Small Molecules Combinatorial Chemistry & High Throughput Screening, 2011, Vol.

Predicting the Pka of Small Molecules Matthias Rupp*,1,2, Robert Körner1 and Igor V

Solution Is a Homogeneous Mixture of Two Or More Substances in Same Or Different Physical Phases. the Substances Forming The

Physical Chemistry

Simple Mixtures

Chapter 9 Ideal and Real Solutions

9.2 Raoult's Law and Henry's

The Dissolution Process Page 1 of 2 Chemistry 2E 11: Solutions And

Calculation of the Gibbs Free Energy of Solvation and Dissociation of Hcl in Water Via Monte Carlo Simulations and Continuum

Articles Plays Relative Humidity (RH), and Hygroscopic Growth Factors at an Important Role in Numerous Atmospheric Processes Such As 90% RH Ranged from 1.08 to 1.17

3. Ideal Solution Dissolution Equilibrium of Gas in Dissolution Equilibrium State of Gas Into Water, A(G)  A(Aq)

Solutions Notes.Notebook February 24, 2015

SOLUTIONS and SOLUBILITY a Saturated Solution Is One in Which the Solute Is in Equilibrium with the Solid Phase (Solute)

Raoult's Law Po Po B B Henry's Law