
Proc. Natl. Acad. Sci. USA
Vol. 88, pp. 3661-3665, May 1991
Biophysics

Calculation of conformation as an assembly of stable overlapping segments: Application to bovine pancreatic trypsin inhibitor
(conformational energy calculations/short-range interactions/build-up procedure/"conformon")

ISTVAN SIMON*, LESLIE GLASSER†, AND HAROLD A. SCHERAGA‡
Baker Laboratory of Chemistry, Cornell University, Ithaca, NY 14853-1301

Contributed by Harold A. Scheraga, January 24, 1991

ABSTRACT Conformations of bovine pancreatic trypsin inhibitor were calculated by assuming that the final structure as well as properly chosen overlapping segments thereof are simultaneously in low-energy (not necessarily the lowest-energy) conformational states. Therefore, the whole chain can be built up from building blocks whose conformations are determined primarily by short-range interactions. Our earlier buildup procedure was modified by taking account of a statistical analysis of known sequences that indicates that there is nonrandom pairing of amino acid residues in short segments along the chain, and by carrying out energy minimization on only these segments and on the whole chain [without minimizing the energies of intermediate-size segments (20-30 residues long)]. Results of this statistical analysis were used to determine the variable sizes of the overlapping oligopeptide building blocks used in the calculations; these varied from tripeptides to octapeptides, depending on the amino acid sequence. Successive stages of approximations were used to combine the low-energy conformations of these building blocks in order to keep the number of variables in the computations to a manageable size. The calculations led to a limited number of conformations of the protein (only two different groups, with very similar structure within each group), most residues of which were in the same conformational state as in the native structure.

Abbreviation: BPTI, bovine pancreatic trypsin inhibitor.
*On leave from: Institute of Enzymology, B.R.C., Hungarian Academy of Sciences, H-1518 Budapest, P.O. Box 7, Hungary, 1989-90.
†On leave from: Department of Chemistry, University of the Witwatersrand, Wits 2050, South Africa, 1989-90.
‡To whom reprint requests should be addressed.
The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

To overcome the large entropy difference between the unique native state and the ensemble of conformations constituting the unfolded states, the native conformation of a protein must correspond to a deep minimum in its conformational energy hypersurface. The whole conformational space available to the protein cannot possibly be explored within any reasonable amount of time, so that the success of refolding experiments implies that the minimum in the Gibbs free energy function which corresponds to the native state must be significantly deeper than any other minima attainable during the course of refolding. When the whole of conformational space is considered, however, there are so many minima (1) that there is no hope of finding the one minimum corresponding to the native conformation by attempting to examine all of them (2). Therefore, the native conformation must be identified in a computationally feasible way that does not require comparison of the energies of all the minimum-energy conformations.

In this paper, we present a procedure that uses information only about the amino acid sequence and locations of the disulfide bonds of the protein and that results in a very limited number of conformations, which are close enough to the native one for application of further refinement techniques (1). This procedure has been applied to bovine pancreatic trypsin inhibitor (BPTI), a 58-residue protein.

Designation of the Native Conformation

We assume that the native conformation is not only the one of highest stability but also the one in which all properly chosen segments of the polypeptide chain are simultaneously in low-energy (not necessarily the lowest-energy) conformations (3). This implies that short-range interactions play a dominant role in determining the conformations of these segments (4).

The number of low-energy conformations of an oligopeptide is usually several orders of magnitude smaller than the number of combinations constructed by combining the low-energy conformations of the individual residues of which the oligopeptide is constituted. Since the common part of two overlapping segments must be in the same conformation in both of the overlapping oligopeptides (5), the number of conformations of the whole polypeptide chain in which most of the overlapping segments are simultaneously in a low-energy conformation is rather limited (3). Such whole polypeptide conformations are designated as "conformons" [formerly called X-conformations (3)], and it has been suggested that, for properly defined segment sizes and energy ranges, the native conformation is the only conformon of the whole polypeptide chain (3); for short-chain polypeptides, this unique conformon would be the native conformation, but for long chains it would be the native conformation only of an independently folding domain. Thus, we search the conformational space for the conformon of the whole chain (defined by short-range interactions) and assume that it corresponds to the native conformation.

On the other hand, we must keep in mind that, whereas short-range interactions dominate in determining the minimum-energy conformation corresponding to the conformon of the whole chain, long-range and protein-solvent interactions affect the stabilization free energy and contribute in determining the exact conformation of the protein; these additional interactions are incorporated by minimizing the energy of the whole structure (with inclusion of solvent and disulfide bonds) in the final stage of our procedure.

Low-energy conformations of oligopeptides can be obtained by using a buildup procedure (6). Vásquez and Scheraga (7, 8) built up the whole BPTI conformation from low-energy fragments by using a limited number of distance constraints from simulated NMR spectra. In that approach, the low-energy conformations of all the constituent tetrapeptides were calculated independently of one another and then overlapped. Since the number of conformations of larger fragments built up from various combinations of these tetrapeptide conformations was exceedingly large, about 200 distance constraints (from simulated NMR spectra) were required to reduce the magnitude of the computational problem to select the final, native structure.

In this paper, we modify the buildup procedure by introducing statistical information expressing the correlation between the types of residues that can exist at various positions along the chain; this information dictates the sizes of the fragments (which depend on the amino acid sequence) that should be used in the buildup procedure and obviates the previous need (7, 8) to introduce distance constraints. At the present stage of development of this modified procedure, however, the final results are not yet as good as those of refs. 7 and 8.

Initial Screening of the Conformational Space

We first considered the influence of neighboring residues on each other's conformations, without introducing correlations among the types of amino acid residues at this stage. For this purpose, we began with tripeptides as the initial building blocks.

When the conformational energies of tripeptides (and, later, of larger oligopeptides) were calculated, their N and C termini were blocked with acetyl and methylamide groups, respectively. These blocking groups simulated the adjacent peptide groups within a protein. The total energy was calculated with the ECEPP/2 potential (Empirical Conformational Energy Program for Peptides; refs. 9-11) together with the SUMSL minimizer (Secant Unconstrained Minimization Solver; ref. 12). All backbone and side-chain dihedral angles were varied in the minimization. At the oligopeptide stage, no solvent effects were included in the calculation because, at this stage, it was not known which residues were exposed to solvent. In the final stage, when the conformational energy of the whole BPTI molecule was calculated and minimized, the solvation free energy of its constituent groups, proportional to the water-accessible surface area, was included in the total energy (13) with an average penalty of 0.025 kcal/(mol·Å²) for both polar and nonpolar groups (14).

Unless noted otherwise, two conformations of a residue were considered to be the same if their backbone dihedral angles fell in the same region of the (φ, ψ) map, as defined by the conformational code of Zimmerman et al. (15) [which divides the (φ, ψ) map into 16 regions], and the first side-chain dihedral angles, χ¹, were in the same rotational isomeric state, g+, g-, or t.

The conformational energies were minimized for sequential tripeptides, starting at the N terminus of BPTI, as follows. All of the low-energy conformations of the terminally blocked tripeptide RPD were calculated first, by combining all low-energy conformations (16) of the single terminally blocked amino acid residues R, P, and D that fell within a 3.0-kcal/mol range of the global minimum; all single residues differing only in χʲ (j > 1) were replaced by the lowest-energy one in each set. All tripeptide conformations with minimized energies less than 8 kcal/mol above the global minimum were retained. These energy ranges were chosen in order to retain a manageable number of conformations.

To minimize the low-energy conformations of the next overlapping tripeptide, PDF, not as an isolated one but as one whose conformation reflects the presence of the preceding residue R, the full list of low-energy conformations of RPD was reduced by selecting only the lowest-energy conformations (within the 8-kcal/mol range) of RPD for which the PD conformations differed from each other (i.e., in φ, ψ, χ¹), irrespective of the conformation of residue R. These selected PD dipeptide conformations were combined with all of the low-energy conformations of the next residue, F. With this overlapping new tripeptide, the whole of the above procedure was repeated. The process was then repeated for each successive overlapping tripeptide of the whole BPTI polypeptide chain. An 8-kcal/mol cutoff was used for tripeptides that did not contain half-cystine, C, whereas a higher, 10-kcal/mol cutoff was used for those that did contain C, because of the extra constraining covalent bond that can compensate for higher conformational energies.

It is important to note that, when the conformations of the second tripeptide, PDF, were calculated, the lowest-energy conformation was not necessarily the known global-minimum one of the isolated tripeptide PDF. This is because we used only those initial PD conformations which appeared among the low-energy conformations of the first tripeptide, RPD, and this ensemble of PD conformations reflects the influence of the residue R. Some of the conformations of isolated PDF might have lower energies than those computed here but, when the PD portion is combined with R, they would have higher energies than the 8-kcal/mol cutoff for RPD. The cutoff level was measured with respect to the minimum of this modified set of PDF conformations (influenced by residue R), rather than from the global minimum of isolated PDF, and a range of energies (8-10 kcal/mol), smaller than that used previously (7, 8), was enough to retain the native conformations of all tripeptides. By using this overlapping procedure, all successive tripeptides, up to the C-terminal one, feel the influence of all of the preceding residues.

When all of the tripeptides of the BPTI polypeptide chain had been generated in this manner (i.e., when the C-terminal tripeptide, GGA, was reached), we had retained about 10 to 10,000 conformations with an average of about 1000 for each tripeptide. At this point, without carrying out any further energy calculations, we introduced the following screening process in the reverse direction to reduce the number of conformations even further. From the low-energy conformations of the C-terminal tripeptide, GGA, the different conformations of the dipeptide GG (reflecting the influence of the preceding residues and of the C-terminal residue A) were selected by the same criterion as that used for PD in the forward chain generation. Moving backward from the C terminus to the N terminus, we retained only those conformations of the preceding tripeptide, CGG, for which the conformation of the GG dipeptide portion appeared in the set of low-energy conformations of the GGA tripeptide, irrespective of the conformation of residue A. With the new, reduced list for tripeptide CGG, the backward selection (based on overlapping tripeptides) was continued until the N-terminal tripeptide, RPD, was reached. At the end of this backward selection, an average of about 300 conformations of each tripeptide remained, with the number varying from about 10 to 3000; the native conformation of each tripeptide [in terms of the code of Zimmerman et al. (15)] still appeared in each ensemble.

Introduction of Statistical Information: Identification of Low-Energy Building Blocks

In order to build up the conformon of the whole protein from low-energy conformations of smaller segments, we must first determine the sizes of the low-energy segments (to be built from the foregoing tripeptides) that will serve as the building blocks. For this purpose, we next introduced the use of statistical information about regularities in amino acid sequences.

It has been shown that short-range regularities (nonrandomness) exist in the amino acid sequences of proteins, and that these regularities lead to certain structural features (17-19). Recently, the range of these regularities, measured by the nonrandomness of amino acid pairing (or the correlation between the types of residues) as a function of the separation between two residues in an amino acid sequence, has been determined (20). If it is assumed that the range of nonrandom pairing reflects the separation within which the residues influence each other's conformations significantly, the lengths of the overlapping segments (building blocks) that simultaneously must be in low-energy conformations are found to vary from tripeptides to octapeptides, depending on the amino acid sequences of the relevant segments (20).

To obtain these building blocks, we combined the foregoing reduced set of tripeptide conformations into larger segments. The sizes chosen for these building blocks reflect the separation distances for nonrandom pairing of the residues, as found from a large protein sequence data base (20). For example, the N-terminal sequence of BPTI is RPDFCLEPPY.... Starting with R at position i, there are nonrandom correlations of R with P at position i + 1, with D at position i + 2, with F at position i + 3, and with C at position i + 4, but the correlation becomes random with residue L at position i + 5; therefore, the first building block is RPDFC. The remaining building blocks, selected by this criterion, are each shifted along the chain by at least one residue and are illustrated schematically in Fig. 1. To demonstrate how the second building block was obtained, we note that, for P taken as residue i, there is a random correlation with the next three residues, D, F, and C (20); likewise for D with respect to F, C, and L; however, the same logic, applied to F as residue i, yields the hexapeptide FCLEPP. Therefore, these two oligopeptides (RPDFC and FCLEPP), rather than the nonapeptide RPDFCLEPP, were considered as building blocks because R is correlated up to C but not beyond.

FIG. 1. The sizes of the segments that were used as stable building blocks in the buildup procedure, and their distribution along the polypeptide chain of BPTI. The residues of the C-terminal tripeptide, G, G and A, are in random pairing with the preceding residues (20); therefore, this figure shows segments only up to residue 55.

The building blocks were assembled by again starting at the N terminus of the protein. For example, for RPDFC, all of the tripeptide conformations of the previous reduced set for RPD, PDF, and DFC were combined (with no further screening at this point) to form the pentapeptide; the energies of all of these conformations of RPDFC were then minimized with respect to all backbone and side-chain dihedral angles [the starting values of χʲ, for j > 1, were the lowest-energy ones for single residues (16) for the given backbone conformation]. It was observed that, after the earlier screening at the tripeptide level, it was possible to retain the native conformations of the building blocks with a lower energy cutoff than 8-10 kcal/mol. Thus, a 5-kcal/mol cutoff was chosen in the energy minimization for each building block that did not include half-cystine, while a 7-kcal/mol cutoff was chosen for those having half-cystine. Only structures having energies below the cutoffs were retained as building blocks for further buildup. When the C terminus was reached, with an accumulation of a few hundred to hundreds of thousands of oligopeptide conformations, the same backward selection (with no energy minimization) as was described for the tripeptides was applied; in this backward screening, the degree of overlap shown in Fig. 1 was used. This screening resulted in 2 to 10,000 conformations per oligopeptide, with the retention of the native conformation of each building block.

Buildup of Chain with Low-Energy Building Blocks: Use of Disulfide Loops

To proceed further, it was necessary to invoke additional chemical information to limit the number of conformations retained during the construction of the whole molecule; this information was the locations of the disulfide-bonded loops. We began the build-up with the N terminus of the smallest disulfide-bonded loop, Cys30-Cys51 (actually from Gly28 to Arg53 to avoid splitting a building block, and then truncating back to Cys30-Cys51). Further reduction of the ensembles of building blocks was achieved by considering the nondegenerate minima, i.e., by selecting only one (lowest-energy) side-chain conformation for the given backbone conformation (21) [according to the code of Zimmerman et al. (15)] for each residue in the building blocks.

In order to combine the ensembles of building blocks in all combinations [based on the conformational letter code (15)] according to the overlaps in Fig. 1 to form larger segments, the dihedral angles of the overlapping residues in the two building blocks were averaged (since the conformations being averaged were similar, no change occurred in the conformational letter code). For values close to 180°, however, the deviations from 180° were averaged. This procedure of combination (without energy minimization) resulted in 3000 different backbone conformations (including the native) for the segment between residues 30 and 51.

Of these 3000 conformations, only 47 had a separation between the two sulfur atoms of <10 Å. The choice of 10 Å was a compromise between 5 Å, for which no conformations were found, and 13 Å, for which too many remained, especially when combined with later fragments. With the 10-Å cutoff, the native backbone conformation of the Cys30-Cys51 loop was lost, but several backbone conformations (very similar to the corresponding part of the native protein) were retained. Energy minimization with a disulfide loop-closing potential [without constraints on the (φ, ψ) dihedral angles] was not applied at this stage because it might have shifted the conformational codes (15), thereby destroying the building blocks and preventing further implementation of the essential feature of this modified buildup procedure (the simultaneous existence of all the building blocks of the whole chain in low-energy conformations). In the final stage of the procedure, however, after the whole structure was built, energy minimization (with a disulfide loop-closing potential) was applied, and the conformational codes were allowed to change.
Among these 47 conformations, there were only 2 different conformations for the nonapeptide from residues 30 to 38, which is the C-terminal portion of a second loop, the one involving Cys14 and Cys38. These two nonapeptide conformations were combined, successively, with the building blocks of Fig. 1, moving toward the N terminus of Cys14 to form the Cys14-Cys38 segment. Of the resulting ≈2500 conformations of the 14-38 segment, only 49 had a separation distance between the sulfur atoms of <10 Å. Combination of the 47 conformations of segment 30-51 and the 49 conformations of segment 14-38 resulted in 136 backbone conformations for the 14-51 portion of the polypeptide. These conformations were extended in both directions with the building blocks of Fig. 1, resulting in >4000 conformations of the 5-55 portion of the chain; only 28 of these, which had a separation distance between the sulfur atoms of Cys5 and Cys55 within the 10-Å limit, were retained.

Consideration of the Side Chains of the Building Blocks

In each of the 28 backbone structures of the 5-55 segment, the side chains were attached in all possible combinations (with respect to χ¹); however, we retained only those side-chain conformations (i.e., values of χ¹) that were found earlier for the residues in the respective building blocks [the values of χʲ (j > 1) were taken from the lowest-energy conformations (for the given χ¹) of each single residue (16)]. This selection procedure resulted in >300 conformations, but most of them were very similar in the backbone codes of Zimmerman et al. (15) and in the rotational isomeric states of χ¹ of their side chains. These conformations essentially fell into only two different groups, reflecting the two different conformations for residues 30-38. We selected 20 structures of segment 5-55 randomly (10 from each group), and combined them with the N-terminal building blocks of Fig. 1 and the original C-terminal tripeptides (the C-terminal residues, G, G, and A being noncorrelated with the preceding residues). The terminal oligopeptides, especially the C-terminal tripeptide GGA, exhibited great flexibility; i.e., hundreds of conformations could be fitted to the rest of the molecule. Therefore, we combined 10 randomly selected conformations of each of the N- and C-terminal oligopeptides with the 20 randomly selected conformations of segment 5-55 to produce 2000 conformations of the whole BPTI molecule; 60 of these 2000 conformations, chosen randomly from the two groups (more from the group with lower energy), were selected for further consideration.

Energy Minimization of the Whole Molecule

Once the energies of the building blocks of Fig. 1 had been minimized, no further energy minimization had been applied up to this point; the combination of the building blocks, thus far, involved only selective screening. Now, the energies [including a pseudopotential (9-11) to form the disulfide bonds] of the 60 selected conformations of the whole molecule were minimized without solvation; only 24 of these 60 conformations had energies <10^4 kcal/mol (this high cutoff reflected high-energy local minima). Only 8 of these 24 had different conformations of the whole molecule, the remaining differences being only in the conformations of the N- and C-terminal blocking groups. The energies of these 8 conformations were minimized again, but now including the solvation free energy (13, 14); this reminimization required about 30 times more computational time but resulted only in some reordering of the conformational energies with very little conformational change.

The whole procedure resulted in two different families of conformations. Within each of these two conformational families, the structures differed only in the conformations of the terminal parts or in the side-chain conformations, but their backbones were very similar.

Results and Discussion

The lowest-energy structures of each of the two different conformational families were compared with the x-ray structure of native BPTI (22), using the coordinate set 5PTI from the Brookhaven Data Bank (23, 24). Both calculated conformations are much less compact than the x-ray structure. The two loops, from Cys14 to Cys38 and from Cys30 to Cys51, appear to be similar to those in the native structure, although the Cys14-Cys38 loop is slightly less twisted and the Cys30-Cys51 loop is more planar in the calculated conformations. The relative placements of the segments have been less accurately established. For example, the overlapping part of the two loops, the nonapeptide 30-38, was one of the central elements of our calculation procedure. When the nonapeptide from one of the families of the whole molecule is optimally superimposed on the corresponding portion of the x-ray structure, the rms deviation of the backbone atoms is <1 Å (the nonapeptide from the other family is not as good); however, when the whole molecule is superimposed on the x-ray structure, these central nonapeptides lie far from one another, with a rms distance of 10 Å. This demonstrates that when two structures are not sufficiently similar, the rms deviation can be a very misleading measure of agreement because a single parameter cannot distinguish between, on the one hand, two completely different structures and, on the other, two structures consisting of very similar segments that are poorly placed with respect to each other. Therefore, at the present stage of the development of our procedure (where our results are mainly of conceptual interest), we use only the conformational codes of Zimmerman et al. (15), rather than the rms deviation or a more detailed differential geometric comparison (25), for comparing two conformations. This type of comparison not only illustrates the conformational differences but also indicates whether the differences are distributed evenly or are confined to a few positions along the polypeptide chain. Such a comparison is given in Fig. 2. For both groups of calculated conformations, about two-thirds of the residues are in the same conformational code region as in the standard-geometry version (7, 8) of the x-ray structure. Moreover, in structure no. 2, half of the residues with incorrect backbone conformations are located at the two ends of the chain, which are more flexible than the interior of the chain. While such good agreement of conformational status (15) is encouraging, it should be recalled that a perfect-prediction model [albeit only a 5-state one (26), compared to the 16-state one of Zimmerman et al. (15)] led to a poor structure (26); however, further refinements by procedures outlined in ref. 1 can improve the structure.

FIG. 2. Backbone conformational codes (15) of the BPTI residues, tabulated for residues 1-14, 15-29, 30-44, and 45-58. First row, structure 1; second row, standard-geometry version (7, 8) of the x-ray structure; and third row, structure 2. The boxes with full lines enclose identical conformational codes, whereas the boxes with broken lines enclose immediately adjacent conformational codes (15). The question marks arise because dihedral angle φ of residue 1 and dihedral angle ψ of residue 58 are not defined in the x-ray structure.

Conclusions

Polypeptides with special, naturally selected amino acid sequences adopt unique, native conformations that have lower free energy than does the unfolded state. Therefore, when we attempt to solve the protein-folding problem, we must consider only those polypeptides having these special, naturally selected sequences. It therefore makes sense to incorporate the special features of naturally selected sequences in calculations of protein conformation. This was the basis for modifying the buildup procedure by selection of the sizes of the overlapping building blocks based on nonrandom correlations of amino acid sequences, from which the whole BPTI molecule was built.

The whole procedure is based on the premise that short-range interactions determine which of the minimum-energy structures correspond to the conformon, which is assumed to be the native structure. It is believed that chain-folding initiation sites (CFISs) are formed in various parts of the unfolded polypeptide chain in response to short-range interactions (27-29), and that further folding around these CFISs is also dominated by short-range interactions. In fact, CFISs have been shown to exist in short peptide fragments (29). It is further assumed that long-range and protein-solvent interactions only enhance the stability of the folded structure. The use of energy minimization (with inclusion of solvent and disulfide bonds) in the final stage of the procedure conforms to the collapse phenomenon discussed by Chan and Dill (30).

Polypeptide sequences that do not lead to a single structure whose free energy is significantly lower than that of any other accessible conformation may be presumed to be unsuitable from a biological point of view, so that natural selection will not retain such amino acid sequences. The procedure presented in this paper has taken advantage of this presumed natural selection to obtain a unique or, perhaps, a limited number of conformations for a polypeptide with selected amino acid sequences in which the overlapping building blocks are simultaneously in low-energy conformations.

We thank K. D. Gibson, A. Nayeem, G. Némethy, M. R. Pincus, S. Rackovsky, and M. Vásquez for helpful comments on the manuscript. This work was supported by research grants from the National Institute of General Medical Sciences of the National Institutes of Health (GM14312) and from the National Science Foundation (DMB84-01811). Support was also received from the National Foundation for Cancer Research and from the U.S.-Hungary International Cooperative Science Program (NSF-INT88-22275). Support from the Hungarian Academy of Sciences (OTKA 318) is also acknowledged. The computations were carried out at the Cornell National Supercomputer Facility, a resource of the Cornell Center for Theory and Simulation in Science and Engineering, which receives major funding from the National Science Foundation and the IBM Corporation, with additional support from New York State and members of the Corporate Research Institute. L.G. acknowledges additional support for sabbatical leave from both the University of the Witwatersrand and the Foundation for Research Development.

1. Scheraga, H. A. (1989) Chem. Scr. 29A, 3-13.
2. Levinthal, C. (1968) J. Chim. Phys. Phys.-Chim. Biol. 65, 44-45.
3. Simon, I. (1985) J. Theor. Biol. 113, 703-710.
4. Scheraga, H. A. (1973) Pure Appl. Chem. 36, 1-8.
5. Scheraga, H. A. (1983) Biopolymers 22, 1-14.
6. Simon, I., Némethy, G. & Scheraga, H. A. (1978) Macromolecules 11, 797-804.
7. Vásquez, M. & Scheraga, H. A. (1988) J. Biomol. Struct. Dyn. 5, 705-755.
8. Vásquez, M. & Scheraga, H. A. (1988) J. Biomol. Struct. Dyn. 5, 757-784.
9. Momany, F. A., McGuire, R. F., Burgess, A. W. & Scheraga, H. A. (1975) J. Phys. Chem. 79, 2361-2381.
10. Némethy, G., Pottle, M. S. & Scheraga, H. A. (1983) J. Phys. Chem. 87, 1883-1887.
11. Sippl, M. J., Némethy, G. & Scheraga, H. A. (1984) J. Phys. Chem. 88, 6231-6233.
12. Gay, D. M. (1983) ACM Trans. Math. Software 9, 503-524.
13. Vila, J., Williams, R. L., Vásquez, M. & Scheraga, H. A. (1990) Proteins Struct. Funct. Genet., in press.
14. Chothia, C. (1974) Nature (London) 248, 338-339.
15. Zimmerman, S. S., Pottle, M. S., Némethy, G. & Scheraga, H. A. (1977) Macromolecules 10, 1-9.
16. Vásquez, M., Némethy, G. & Scheraga, H. A. (1983) Macromolecules 16, 1043-1049.
17. Vonderviszt, F., Matrai, G. & Simon, I. (1986) Int. J. Pept. Protein Res. 27, 483-492.
18. Vonderviszt, F. & Simon, I. (1986) Biochem. Biophys. Res. Commun. 139, 11-17.
19. Tudos, E., Cserzo, M. & Simon, I. (1990) Int. J. Pept. Protein Res. 36, 236-239.
20. Cserzo, M. & Simon, I. (1989) Int. J. Pept. Protein Res. 34, 184-195.
21. Pincus, M. R., Klausner, R. D. & Scheraga, H. A. (1982) Proc. Natl. Acad. Sci. USA 79, 5107-5110.
22. Wlodawer, A., Deisenhofer, J. & Huber, R. (1987) J. Mol. Biol. 193, 145-156.
23. Bernstein, F. C., Koetzle, T. F., Williams, G. J. B., Meyer, E. F., Jr., Brice, M. D., Rodgers, J. R., Kennard, O., Shimanouchi, T. & Tasumi, M. (1977) J. Mol. Biol. 112, 535-542.
24. Abola, E. E., Bernstein, F. C., Bryant, S. H., Koetzle, T. F. & Weng, J. (1987) in Crystallographic Databases: Information Content, Software Systems, Scientific Applications, eds. Allen, F. H., Bergerhoff, G. & Sievers, R. (Data Commission of the Int. Union of Crystallography, Bonn), pp. 107-132.
25. Rackovsky, S. & Scheraga, H. A. (1980) Macromolecules 13, 1440-1453.
26. Burgess, A. W. & Scheraga, H. A. (1975) Proc. Natl. Acad. Sci. USA 72, 1221-1225.
27. Matheson, R. R. & Scheraga, H. A. (1978) Macromolecules 11, 819-829.
28. Scheraga, H. A. (1980) in Protein Folding, ed. Jaenicke, R. (Elsevier, Amsterdam), pp. 261-288.
29. Montelione, G. T. & Scheraga, H. A. (1989) Acc. Chem. Res. 22, 70-76.
30. Chan, H. S. & Dill, K. A. (1991) Annu. Rev. Biophys. Biophys. Chem. 20, 447-490.
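The backward selection described under Initial Screening of the Conformational Space, in which a tripeptide conformation is kept only if its last two residues match a dipeptide conformation still present in the next overlapping tripeptide, can be sketched in Python as follows. This is an illustrative reconstruction under stated assumptions and not the authors' code: each conformation is represented as a tuple of per-residue state labels (for example a conformational letter code, optionally combined with the χ¹ state), and the function name and data layout are hypothetical.

```python
from typing import List, Tuple

# One conformation = one state label per residue (assumed encoding).
Conformation = Tuple[str, ...]


def backward_screen(
    ensembles: List[List[Conformation]],
) -> List[List[Conformation]]:
    """Filter overlapping tripeptide ensembles from the C terminus backward.

    ensembles[k] holds the retained conformations of the tripeptide that
    starts at residue k; consecutive tripeptides share two residues.
    """
    screened = [list(e) for e in ensembles]
    for k in range(len(screened) - 2, -1, -1):
        # Dipeptide states (first two residues) surviving in the next tripeptide.
        allowed = {conf[:2] for conf in screened[k + 1]}
        # Keep a conformation only if its last two residues match one of them.
        screened[k] = [conf for conf in screened[k] if conf[1:] in allowed]
    return screened


# Toy ensembles for three consecutive tripeptides (labels are made up).
toy = [
    [("A", "E", "A"), ("A", "E", "C"), ("C", "A", "C")],
    [("E", "A", "A"), ("E", "C", "A"), ("A", "A", "C")],
    [("A", "A", "E"), ("C", "A", "F")],
]
reduced = backward_screen(toy)
```

Because the pass runs from the last tripeptide to the first, a conformation removed near the C terminus can eliminate conformations of every preceding tripeptide that depended on it, which is the propagation effect the screening relies on. No energies are evaluated in this step.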