Protein Structure Prediction and Potential Energy Landscape Analysis Using Continuous Global Minimization
Total Page:16
File Type:pdf, Size:1020Kb
Protein Structure Prediction and Potential Energy Landscape Analysis using Continuous Global Minimization KEN A. DILL Departmentof Pharmaceutical Chemistry, University of California at San Francisco, San Francisco, CA 94118 <[email protected]~edu~ ANDREW T. PHILLIPS ComputerScience Department, United StatesNaval Academy,Annapolis, MD 21402 <[email protected]> J. BEN ROSEN Computer Science and Engineering Department, University of California at San Diego, San Diego, CA 92093 <[email protected]> INTRODUCTION form of energy minimization technique, such as simu- lated annealing and genetic algorithms, rather than the Proteins require specific three-dimensional conforma- computationally more efficient continuous minimiza- tions to function properly. These “native” conforma- tion techniques. Each such method correctly predicts a tions result primarily from intramolecular interactions few protein structures, but missesmany others, and the between the atoms in the macromolecule, and also process is both slow and limited to structures of rela- intermolecular interactions between the macromole- tively small size. cule and the surrounding solvent. Although the folding The purpose of this paper is to show how one can process can be quite complex, the instructions guiding apply more efficient continuous minimization tech- this process are specified by the one-dimensional pri- niques to the energy minimization problem by using mary sequenceof the protein or nucleic acid: external an accurate continuous approximation to the discrete factors, such as helper (chaperone) proteins, present at information provided for known protein structures. In the time of folding have no effect on the final state of addition, we will show how the results of one particu- the protein. Many denatured proteins spontaneously lar computational method for protein structure predic- refold into functional conformations once denaturing tion (the CGU algorithm), which is based on this conditions are removed. Indeed, the existence of a continuous minimization technique, can be used both unique native conformation, in which residues distant to accurately determine the global minimum of poten- in sequence but close in proximity exhibit a densely tial energy function and also to offer a quantitative packed hydrophobic core, suggests that this three- analysis of all of the local (and global) minima on the dimensional structure is largely encoded within the energy landscape. sequential arrangement of these specific amino acids. Computational results using a continuous global In any case, the native structure is often the conforma- minimization technique on small to medium size pro- tion at the global minimum energy (see [ 11). tein structures has already proven successful. This In addition to the unique native (minimum energy) method has been extensively tested on a variety of structure, other less stable structures exist as well, computational platforms including the Intel Paragon, each with a corresponding potential energy. These Cray T3D, an 8 workstation Dee Alpha cluster, and a structures, in conjunction with the native structure, heterogeneousnetwork of 13 Sun SparcStations and 7 make up an energy landscape that can be used to char- SGI Indys. Protein structures with as many as 46 resi- acterize various aspects of the protein structure. dues have been computed in under 40 hours on the 20 Over 20 years of research into this “protein folding workstation heterogeneousnetwork. problem” has resulted in numerous important algo- rithms that aim to predict native three-dimensional THE POLYPEPTIDE MODEL AND POTENTIAL ENERGY protein structures (see [2], [3], [4], [8], [9], [lo], [12], FUNCTION [14], [15], [18], [19], [20], [21], and [22]). Suchmeth- ods assumethat the native structure is a balance of var- Since computational search methods are not yet fast ious interactions. These methods invariably use some enough to find global optima in real-space representa- 109 K. Dill et al. tions using accurate all-atom models and potential neighborhood of an experimentally determined value. functions, a practical conformational search strategy Among these are the 3n-1 backbone bond lengths 1 requires both a simplified, yet sufficiently realistic, between the pairs of consecutive atoms N-C’, C’-C,, molecular model with an associated potential energy and C,-N. Also, the 3n-2 backbone bond angles 8 function which consists of the dominant forces defined by N-&-C’, C&‘-N, and C/-N-C, are also involved in protein folding, and also a global optimiza- fixed at their ideal values. Finally, the n-l peptide bond tion method which takes full advantage of any special dihedral angles o are fixed in the trans (1 SO’) confor- properties of this kind of energy function. mation. This leaves only the n-l backbone dihedral Each residue in the primary sequenceof a protein is angle pairs (cp,yr)in the reduced representation model. characterized by its backbone components NH-C,H- These also are not completely independent; in fact, C’O and one of 20 possible amino acid sidechains they are severely constrained by known chemical data attached to the central C, atom. The three-dimensional (the Ramachandranplot) for each of the 20 amino acid structure of macromolecules is determined by internal residues. Furthermore, since the atoms from one C, to molecular coordinates consisting of bond lengths 1 the next C, along the backbone can be grouped into (defined by every pair of consecutive backbone rigid planar peptide units, there are no extra parame- atoms), bond angles 8 (defined by every three consecu- ters required to express the three-dimensional position tive backbone atoms), and the backbone dihedral of the attached 0 and H peptide atoms. Hence, these angles cp,v, and o, where cpgives the position of C’ rel- bond lengths and bond angles are also known and ative to the previous three consecutive backbone atoms fixed. C-N-C,, v gives the position of N relative to the pre- A key element of this simplified polypeptide vious three consecutive backbone atoms N-Cc-C’, and model is that each sidechain is classified as either <I)gives the position of C, relative to the previous three hydrophobic or polar, and is represented by only a sin- consecutive backbone atoms C&‘-N. Figure 1 illus- gle “virtual” center of mass atom. Since each trates this model. sidechain is represented by only the single center of mass “virtual atom” C,, no extra parameters are needed to define the position of each sidechain with respect to the backbone mainchain. The twenty amino acids are thus classified into two groups, hydrophobic and polar, according to the scale given by Miyazawa and Jemigan in [ll]. Corresponding to this simplified polypeptide model is a potential energy function also characterized by its simplicity. This function includes four components: a contact energy term favoring pairwise H-H residues, a second contact term favoring hydrogen bond forma- tion between donor NH and acceptor C-0 pairs, a steric repulsive term which rejects any conformation that would permit unreasonably small interatomic dis- tances, and a main chain torsional term that allows only certain preset values for the backbone dihedral angle pairs (9,~). Since the residues in this model come in only two forms, H (hydrophobic) and P (polar), where the H-type monomers exhibit a strong pairwise attraction, the lowest free energy state is Figure 1 Simple Polypeptide Model obtained by those conformations which obtain the Fortunately, these 9n-6 parameters (for an n-resi- greatest number of H-H “contacts” (see [5]) and due structure) do not all vary independently. In fact, intrastrand hydrogen bonds. Despite its simplicity, the some of these (7~2-4of them) are regarded as fixed use of this type of potential function has already since they are found to vary within only a very small proven successful in studies conducted independently by Sun, Thomas, and Dill [ 181 and by Srinivasan and 110 Protein Structure Prediction and Potential Energy Landscape Analysis Rose [ 161. Both groups have demonstrated that this type of potential function is sufficient to accurately model the forces which are most responsible for fold- ing proteins. The specific potential function used ini- tially in this study is a simple modification of the Sun/ Thomas/Dill energy function and has the following form: E total = Eex + Ehp + Ebb + Eqny where E,, is the steric repulsive term which rejects any Figure 2 Combined Potential Function conformation that would permit unreasonably small Energy Terms E, + E,,p interatomic distances, Ehp is the contact energy term the most favorable hydrogen bond is one in which favoring pairwise H-H residues, EM is the contact energy term favoring pairwise hydrogen bonding, and all of the bonded pairs C/O, O/H, and H/N lie E,,,, is the main chain torsional term that allows only along a straight line, since the energies between those (cp,yr)pairs which are permitted by the Ram- the pairs C’/H and O/N are repulsive while the achandran maps. In particular, the excluded volume energiesbetween C/N and O/H are attractive (see energy term E, and the hydrophobic interaction Figure 3). If for the #i pair of carbonyl and amide energy term Ehp are defined in this case as follows: Cl Eex = C 9 and 1.0 + eXp((dij - d,ff j/d,) Figure 3 Schematic Illustration of Most ij Favorable Hydrogen Bond Alignment to be the distancebetween Ehp E C Eiif(dij> where groups, we define d(‘)ii 2)aato be the distance [i-J1 >2 ~t~~e~dC~ zltr$j)f ti be he distance r between 0 and H, and dt jii to be the distance .fCdij) = 1.0 + exp((~j- d,)/dt) ’ between the 0 and N, then the energy term Ebb can be representedas follows (with an additional The excluded volume term Eex is a soft sigmoidal steric repulsive energy term, not shown, between potential where dii is the interatomic distance between the O/H pair as well): two C, atoms or between two sidechain center of mass 4 atoms C,, &,, determines the rate of decrease of E,,, E and de8 determines the midpoint of the function (i.e.