Solvation Parameters for Predicting the Structure of Surface Loops in Proteins: Transferability and Entropic Effects
Total Page:16
File Type:pdf, Size:1020Kb
PROTEINS: Structure, Function, and Genetics 51:470–483 (2003) Solvation Parameters for Predicting the Structure of Surface Loops in Proteins: Transferability and Entropic Effects Bedamati Das and Hagai Meirovitch* Center for Computational Biology and Bioinformatics and Department of Molecular Genetics and Biochemistry, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania ABSTRACT A new procedure for optimizing nuclear magnetic resonance (NMR) and X-ray crystallogra- parameters of implicit solvation models introduced phy experiments, where for the latter it is reflected in by us has been applied successfully first to cyclic terms of large B-factors1 or a complete disorder. Thus, the peptides and more recently to three surface loops of conformational change between a free and a bound anti- ribonuclease A (Das and Meirovitch, Proteins 2001; body shows the flexibility of the antibody-combining site, ؍ ؍ 43:303–314) using the simplified model Etot EFF( which typically includes hypervariable loops; this provides ؉⌺ nr) i iAi , where i are atomic solvation parame- an example of induced fit as a mechanism for antibody- ters (ASPs) to be optimized, Ai is the solvent acces- antigen recognition (see, e.g., Refs. 2 and 3). Alternatively, ؍ sible surface area of atom i, EFF( nr)isthe the selected-fit mechanism has been suggested, where the AMBER force-field energy of the loop-loop and loop- free active site interconverts among different states, and template interactions with a distance-dependent one of them is selected upon binding4; the same also -nr, where n is a parameter. applies for loops. Dynamic NMR experiments and molecu؍dielectric constant, The loop is free to move while the protein template lar dynamics (MD) simulations6 of HIV protease have is held fixed in its X-ray structure; an extensive found a strong correlation between the flexibility of certain conformational search for energy minimized loop segments of the protein and the movement of the flaps structures is carried out with our local torsional (that cover the active site) upon ligation.7 Loops are known deformation method. The optimal ASPs and n are to form “lids” over active sites of proteins, and mutagenesis those for which the structure with the lowest mini- experiments show that residues within these loops are mized energy [Etot(n, i)] becomes the experimental crucial for substrate binding or enzymatic catalysis; again, X-ray structure, or less strictly, the energy gap these loops are typically flexible (see review by Fetrow8). between these structures is within 2–3 kcal/mol. To The interest in surface loops has yielded extensive check if a set of ASPs can be defined, which is theoretical work where one avenue of research has been transferable to a large number of loops, we optimize the classification of loop structures.8–16 However, to under- ؍ individual sets of ASPs (based on n 2) for 12 stand various recognition mechanisms like those men- surface loops from which an “averaged” best-fit set tioned above, it is mandatory to be able to predict the is defined. This set is then applied to the 12 loops structure (or structures) of a loop by theoretical/computa- and an independent “test” group of 8 loops leading tional procedures. As discussed in detail below, this is not in most cases to very small RMSD values; thus, this a trivial task due to the irregular structures of loops, their set can be useful for structure prediction of loops in flexibility, and exposure to the solvent, which requires homology modeling. For three loops we also calcu- developing adequate modeling of solvation. In fact, struc- late the free energy gaps to find that they are only ture prediction of loops constitutes a challenge in protein slightly smaller than their energy counterparts, engineering, where a loop undergoes mutations, inser- indicating that only larger n will enable reducing tions, or deletions of amino acids. Determination of the too large gaps. Because of its simplicity, this model structure of large loops is still an unsolved problem in allowed carrying out an extensive application of our homology modeling.17–19 methodology, providing thereby a large number of Loop structures are commonly predicted by a compara- benchmark results for comparison with future calcu- tive modeling approach based on known loop structures lations based on n > 2 as well as on more sophisti- cated solvation models with as yet unknown perfor- mance for loops. Proteins 2003;51:470–483. Grant sponsor: National Institutes of Health; Grant number: © 2003 Wiley-Liss, Inc. ROIGM61916. *Correspondence to: Hagai Meirovitch, Center for Computational INTRODUCTION Biology and Bioinformatics and Department of Molecular Genetics and Biochemistry, School of Medicine, University of Pittsburgh, PA Surface loops in proteins take part in protein-protein 15213. E-mail: [email protected] and protein-ligand interactions, where their flexibility in Received 12 August 2002; Accepted 21 November 2002 many cases is essential for these recognition processes. The loop flexibility is demonstrated in multidimensional © 2003 WILEY-LISS, INC. SOLVATION PARAMETERS FOR SURFACE LOOPS 471 from the Protein Data Bank (PDB),20,21 an energetic The foregoing discussion indicates that to date the approach, or methods that are hybrid of these two ap- energetic approach is the best way for predicting the proaches. The first approach is based in most cases on structure of large loops in homology modeling and protein matching segments from the database with the length of a engineering, and it constitutes the only alternative for target loop and the relative positions of its adjacent studying the flexibility of loops. Recently, we developed a residues. Hence, this approach is especially appropriate statistical mechanics methodology46–49for treating flexibil- for homology studies, where the protein framework is not ity, which was used successfully to predict the solution known exactly. However, only short loops (up to five structures and populations of cyclic peptides in DMSO.50,51 residues) could be treated effectively by comparative mod- This methodology relies on a novel and general method for eling.22–26 To date, statistical and hybrid methods cannot optimizing parameters of implicit solvation models, which handle loops of more than n ϭ 9 residues because of the was used to optimize atomic solvation parameters (ASPs), 27–30 lack of sufficiently large databases (see a detailed i, of the simplified well-studied solvation model, discussion in our previous article, Ref. 31). ϭ ͑ε ϭ ͒ ϩ ϭ ͑ε ϭ ͒ ϩ With the energetic approach, loop structures are gener- Etot EFF nr Esolv EFF nr i Ai (1) ated by conformational search methods (simulated anneal- i ing, bond relaxation algorithm, and others) subject to the where E is the force-field energy and A is the solvent- spatial restrictions imposed by the known three-dimen- FF i accessible surface area of atom i; ε ϭ nr is a distance- sional (3D) structure of the rest of the protein (the tem- dependent dielectric constant where n is an additional plate). The quality of the prediction depends on the quality parameter. This optimization requires extensive conforma- of the loop-loop and loop-template interaction energy, the tional search for low-energy minimized structures, which modeling of the solvent, and the extent of conformational 27–39 is carried out with our highly efficient local torsional search applied. These methods are not expected to deformations (LTD) method.46,52,53 From a large sample of handle efficiently large loops because of the lack of confor- structures generated by LTD (using Etot with the opti- mational search capabilities, which in many cases is mized ASPs) one identifies a relatively small group of partially caused by complex construction procedures based low-energy structures that are significantly different. Each on at least two stages, where the side-chains are added to of the latter becomes a “seed” for Monte Carlo simulation an initially generated backbone (see a more detailed which spans its vicinity, the free energies (hence the discussion in our Ref. 31). relative populations) are calculated from the MC samples Modeling of the solvent is of special importance. In some with the local states (LS) method,54,55 and averages of of these studies, the solvation problem is not addressed at various properties over the samples’ contributions weighted all, whereas most of them only use a distance-dependent by the populations can be calculated and compared with ε ϭ dielectric constant ( r). Better treatments of solvation the experiment. Our long-range objective is to extend this 28 40 were applied by Moult and James and Mas et al. A methodology to surface loops of proteins in water, and an systematic comparison of solvation models was first car- initial step in this direction was carried out in our previous 41 ε ϭ ried out by Smith and Honig who tested the r model article31 (called here article I), where the optimization against results obtained by the Finite Difference Poisson procedure [based on Eq. 1 with the all-atom AMBER force Boltzmann calculation including a hydrophobic term; the field (EFF)] was tested as applied to two surface loops of 42 implicit solvation model of Wesson and Eisenberg with ε RNase A, leading to optimal n ϭ 2 and two independent ϭ r was also studied by them. More recently, the GB/SA but similar sets of ASPs. 43 44 model was applied to loops of Ribonuclease (RNase) A. It should be pointed out that although our parameter Comparing the efficiency of these methods is not straight- optimization is general, its practical applicability depends forward. However, as expected, the structure prediction is on the complexity of the solvation model and the loop size. found to improve as the ratio (loop length)/(distance be- Therefore, we have initially chosen to treat the relatively tween ends) decreases, and the conformational restrictions simplified model defined in Eq.