Very Fast Empirical Prediction and Rationalization of Protein pKa Values
PROTEINS: Structure, Function, and Bioinformatics 61:704–721 (2005)

Hui Li,1† Andrew D. Robertson,2 and Jan H. Jensen1*
1Department of Chemistry and Center for Biocatalysis and Bioprocessing, The University of Iowa, Iowa City, Iowa
2Department of Biochemistry, The University of Iowa, Iowa City, Iowa

ABSTRACT

A very fast empirical method is presented for structure-based protein pKa prediction and rationalization. The desolvation effects and intra-protein interactions, which cause variations in pKa values of protein ionizable groups, are empirically related to the positions and chemical nature of the groups proximate to the pKa sites. A computer program is written to automatically predict pKa values based on these empirical relationships within a couple of seconds. Unusual pKa values at buried active sites, which are among the most interesting protein pKa values, are predicted very well with the empirical method. A test on 233 carboxyl, 12 cysteine, 45 histidine, and 24 lysine pKa values in various proteins shows a root-mean-square deviation (RMSD) of 0.89 from experimental values. Removal of the 29 pKa values that are upper or lower limits results in an RMSD = 0.79 for the remaining 285 pKa values. Proteins 2005;61:704–721. © 2005 Wiley-Liss, Inc.

The Supplementary Materials referred to in this article can be found at www.interscience.wiley.com/jpages/0887-3585/suppmat/
Grant sponsor: the National Science Foundation; Grant number: MCB 0209941; Grant sponsor: the National Institutes of Health; Grant number: GM 46869.
†Current address: Department of Chemistry, Iowa State University, Ames, Iowa 50011.
*Correspondence to: Jan H. Jensen, Department of Chemistry and Center for Biocatalysis and Bioprocessing, The University of Iowa, Iowa City, Iowa 52242. E-mail: [email protected]
Received 11 March 2005; Revised 12 May 2005; Accepted 2 June 2005
Published online 17 October 2005 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/prot.20660

INTRODUCTION

Proteins possess ionizable groups, which are important for intraprotein, protein–solvent, and protein–ligand interactions [1], and play key roles in protein solubility, folding, stability, binding ability, and catalytic activity. The pKa values of the ionizable residues are thus the basis for understanding the pH-dependent characteristics of proteins and the catalytic mechanisms of many enzymes.

Theoretical interpretation and prediction of protein pKa values are useful for understanding many biochemical problems. Ionizable groups with unusually low or high pKa values tend to occur at protein active sites [2–4], so identification of unusual pKa values may facilitate the identification of protein/enzyme active sites [5,6], as well as their functional mechanisms [3,7]. Rational design of drugs and new proteins will benefit tremendously from fast pKa prediction methods. Some theoretical studies of proteins, such as force field simulations, need the pKa values of ionizable groups to determine their charge states.

The most popular pKa prediction methods for proteins [8–13] are based on electrostatic continuum models that numerically solve the linearized Poisson-Boltzmann equation (LPBE). In these methods, the protein is treated by a molecular mechanics force field, embedded in a uniform dielectric continuum with dielectric constants of 80 for the solvent and 4–20 for the protein interior. A pKa shift is calculated from the difference in electrostatic energy of a residue in its charged and neutral form, and this shift is added to a model pKa value [8,14–17]. These models usually have a root-mean-square deviation (RMSD) from experiment of <1. However, the currently available LPBE models tend to overestimate the intraprotein charge–charge interactions [18,19] and underestimate the hydrogen-bonding and desolvation effects [20] in calculating pKa shifts. The LPBE methods typically require tens of minutes to hours of computer time.

The pKa values of small organic acids and bases can be predicted with the Hammett-Taft method [21], which is based on an empirical relationship between pKa shifts and substituents. The Hammett-Taft method is accurate for molecules similar to the ones included in the parameterization and is very fast. However, it is not generally applicable to proteins because the protein pKa shifts are not solely due to substituent effects.

In this article, we present a study on the empirical relationships between protein pKa shifts and structures. These empirical relationships are in turn used for structure-based protein pKa prediction and rationalization. We show that this empirical method is able to accurately predict most protein pKa values in a matter of seconds.

This article is organized as follows: first, we describe the empirical method for fast protein pKa prediction and the rationale behind the approach. Second, we present the pKa predictions made with the empirical method and compare them to those made with other methods. Based on the predictions, we analyze the structural determinants of carboxyl pKa values in proteins. Some interesting pKa sites in various proteins are discussed in detail. Finally, we summarize our work and discuss future directions.

THEORY

The pKa value of an ionizable group in a protein is predicted by applying an environmental perturbation, ΔpKa, to the unperturbed intrinsic pKa value of the group, pKModel:

$$\mathrm{p}K_\mathrm{a} = \mathrm{p}K_\mathrm{Model} + \Delta\mathrm{p}K_\mathrm{a} \tag{1}$$

Both the pKModel and ΔpKa values are determined empirically. We use pKModel values similar to those used in other studies [13,16,17] (Table I). For ammonium groups of N-termini and carboxyl groups of C-termini and Asp residues, the substituent effects of the backbone peptide bonds significantly decrease their pKa values [21]. These effects are assumed to be constants and included in the pKModel values. For example, we use a pKModel = 3.8 for Asp, which is lower than the pKModel = 4.5 for Glu by 0.7 units.

TABLE I. pKModel Values, Definitions of Center Points, and Radii Used to Compute Desolvation Effects Using Equations 6 and 7

Group        pKModel   Center point (CT)                     RLocal
C-ter         3.20     (rO + rOXT)/2                         4.5
Asp           3.80     (rOD1 + rOD2)/2                       4.5
Glu           4.50     (rOE1 + rOE2)/2                       4.5
Surface His   6.50     (rCG + rND + rCE + rNE + rCD)/5       4.0
Buried His    6.50     (rCG + rND + rCE + rNE + rCD)/5       6.0
N-ter         8.00     rN                                    4.5
Cys           9.00     rSG                                   3.5
Tyr          10.00     rOH                                   3.5
Lys          10.50     rNZ                                   4.5
Arg          12.50     rCZ                                   5.0

Next we present the empirical relationships between the ΔpKa and protein structures. The functional forms of Equations 3–5, 7, and 11, as well as the numerical values of the associated parameters, were ultimately chosen based on trial and error. We endeavored to develop the simplest possible model capable of predicting pKa values within ca. 1 pH unit of experiment for a majority of the cases [cf. Fig. 4].

Hydrogen Bonding

Our previous study [22] of the molecular determinants of the pKa values of Asp and Glu residues in the protein turkey ovomucoid third domain (OMTKY3) identified hydrogen bonding as the prime pKa determinant. For example, Asp7 forms two hydrogen bonds to the side-chain hydroxyl group and the backbone amide proton of Ser9, thus lowering the pKa of Asp7 to 2.4 compared to a model value of 3.8.

... agreement with the experimental values of 2.4, 3.2, and 2.2. No hydrogen bonds are found for Glu10 and Glu43 in OMTKY3, thus their pKa values are predicted to be 4.5 (i.e., pKModel values), within 0.4 units of the experimental values of 4.1 and 4.8. This simple approach also works well for several other proteins (data not shown).

However, Equation 2 is not always applicable because hydrogen-bonding strength and, hence, its effect on the pKa shift are distance and angle dependent [23]. For example, the amide N–H···O hydrogen bond is strong if the H···O distance is ~2.0 Å and the angle ∠NHO is ~180°, while it is significantly weaker if the H···O distance is >3.0 Å or the angle ∠NHO is ~90°. Including weaker hydrogen bonds (larger NHB) may lead to an overestimation of ΔpKHB, while excluding weaker hydrogen bonds (smaller NHB) may lead to an underestimation. Furthermore, different types of hydrogen bonding may have significantly different effects on pKa shifts and need different CHB values.

To include distance and angle corrections for hydrogen-bonding effects we use a distance function (Fig. 2) to describe the pKa shifts due to side-chain hydrogen bonds (SDC–HB),

$$\Delta\mathrm{p}K_\mathrm{SDC\text{-}HB} = \begin{cases} C_\mathrm{HB} & \text{if } D \le d_1 \\ C_\mathrm{HB} \cdot \dfrac{D - d_2}{d_1 - d_2} & \text{if } d_1 < D < d_2 \\ 0 & \text{if } d_2 \le D \end{cases} \tag{3}$$

and a distance/angle function to describe the pKa shifts due to backbone hydrogen bonds (BKB–HB),

$$\Delta\mathrm{p}K_\mathrm{BKB\text{-}HB} = \begin{cases} -\cos\theta \cdot C_\mathrm{HB} & \text{if } D \le d_1,\ \theta > 90^\circ \\ -\cos\theta \cdot C_\mathrm{HB} \cdot \dfrac{D - d_2}{d_1 - d_2} & \text{if } d_1 < D < d_2,\ \theta > 90^\circ \\ 0 & \text{if } d_2 \le D,\ \theta \le 90^\circ \end{cases} \tag{4}$$

The functional forms of Equations 3 and 4 are the simplest we could think of that describe the weakening of hydrogen bonds with increasing distance and changing orientation [23]. By comparison to experimental pKa values (Fig. 4) we found that the orientation dependence is necessary for accurate pKa predictions in the case of hydrogen bonds between the ionizable groups and the backbone, but not to the side chains. This may be due to a greater conformational
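As an illustration only, Equations 1, 3, and 4 can be sketched in a few lines of Python. This is not the authors' program: the function names are hypothetical, and the values of C_HB, d1, and d2 in the usage lines are placeholders chosen for the example, since the text above does not give them. Only the pKModel values come from Table I. The sketch also assumes that the last case of Equation 4 means the shift vanishes whenever D is at least d2 or the angle is at most 90°.

```python
import math

# pKModel values taken from Table I (surface and buried His share 6.50).
PK_MODEL = {
    "C-ter": 3.20, "Asp": 3.80, "Glu": 4.50, "His": 6.50, "N-ter": 8.00,
    "Cys": 9.00, "Tyr": 10.00, "Lys": 10.50, "Arg": 12.50,
}


def distance_factor(D, d1, d2):
    """Shared distance dependence of Eqs. 3 and 4: 1 up to d1, linear decay to 0 at d2."""
    if D <= d1:
        return 1.0
    if D < d2:
        return (D - d2) / (d1 - d2)
    return 0.0


def dpk_sdc_hb(D, c_hb, d1, d2):
    """Side-chain hydrogen-bond shift (Eq. 3)."""
    return c_hb * distance_factor(D, d1, d2)


def dpk_bkb_hb(D, theta_deg, c_hb, d1, d2):
    """Backbone hydrogen-bond shift (Eq. 4); theta_deg is the angle in degrees.

    Assumption: the shift is zero whenever theta <= 90 degrees or D >= d2.
    """
    if theta_deg <= 90.0:
        return 0.0
    return -math.cos(math.radians(theta_deg)) * c_hb * distance_factor(D, d1, d2)


def predict_pka(group, shifts):
    """Eq. 1: model value plus the sum of the environmental perturbations."""
    return PK_MODEL[group] + sum(shifts)


# Hypothetical parameters (placeholders, not values from the paper).
C_HB, D1, D2 = -0.8, 3.0, 4.0
shifts = [
    dpk_sdc_hb(2.8, C_HB, D1, D2),         # a short side-chain hydrogen bond
    dpk_bkb_hb(2.9, 170.0, C_HB, D1, D2),  # a nearly linear backbone hydrogen bond
]
asp_pka = predict_pka("Asp", shifts)
```

Factoring the distance dependence into one helper keeps the two equations visibly parallel: Equation 4 is Equation 3 scaled by the angular term, so a hydrogen bond at 180° contributes the full C_HB and one at 90° contributes nothing.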