Generalized ensemble methods for de novo structure prediction

Alena Shmygelska1 and Michael Levitt1

Department of Structural Biology, Stanford University, Stanford, CA 94305-5126

Contributed by Michael Levitt, December 11, 2008 (sent for review October 12, 2008)

Current methods for predicting structure depend on two interrelated components: (i) an energy function that should have a low value near the correct structure and (ii) a method for searching through different conformations of the polypeptide chain. Identification of the most efficient search methods is essential if we are to be able to apply such methods broadly and with confidence. In addition, efficient search methods provide a rigorous test of existing energy functions, which are generally knowledge-based and contain different terms added together with arbitrary weights. Here, we test different search methods with one of the most accurate and predictive energy functions, namely Rosetta, the knowledge-based force field from Baker's group [Simons K, Kooperberg C, Huang E, Baker D (1997) J Mol Biol 268:209–225]. We use an implementation of a generalized ensemble search method to scale relevant parts of the energy function. This method, known as Hamiltonian Replica Exchange Monte Carlo, outperforms the original Monte Carlo Simulated Annealing used in the Rosetta package in terms of sampling low-energy states. It also outperforms another widely used generalized ensemble search method known as Temperature Replica Exchange Monte Carlo. Our results reveal clear deficiencies in the low-resolution Rosetta energy function in that the lowest energy structures are not necessarily the most native-like. By using a set of nonnative low-energy structures found by our extensive sampling, we discovered that the long-range and short-range backbone hydrogen-bonding energy terms of the Rosetta energy discriminate between the nonnative and native-like structures significantly better than the low-resolution score used in Rosetta.

conformational search | Rosetta force field

www.pnas.org/cgi/doi/10.1073/pnas.0812510106 PNAS | February 3, 2009 | vol. 106 | no. 5 | 1415–1420

BIOPHYSICS

Author contributions: A.S. and M.L. designed research; A.S. performed research; A.S. analyzed data; and A.S. and M.L. wrote the paper.

The authors declare no conflict of interest.

Freely available online through the PNAS open access option.

1 To whom correspondence may be addressed. E-mail: [email protected] or [email protected].

This article contains supporting information online at www.pnas.org/cgi/content/full/0812510106/DCSupplemental.

© 2009 by The National Academy of Sciences of the USA

Predicting the functional 3-dimensional structure (the native state) of a protein from its amino acid sequence is of central importance to structural and functional biology and has enormous applications in alleviating human disease. Even if the structures of all proteins were known, we would still not be able to answer questions related to diseases directly caused by protein misfolding, such as certain types of cancer and Alzheimer's and Parkinson's diseases. For this we would need to understand the physical basis of the energy terms that make the native state so special. Such understanding of the energetics of the system would also lead to more efficient and comprehensive drug design. Structure prediction depends on solving two problems: (i) describing the energy function with sufficient accuracy and (ii) searching the conformational space sufficiently well. These problems are particularly severe for proteins of biologically relevant lengths (>150 aa). In this work we focus on conformational sampling, which has been recognized as the critical step in high-resolution structure prediction (1–3).

Most widely used standard methods for de novo structure prediction are based on variants of the Monte Carlo method (4–6) and are unable to explore low-energy regions efficiently because of the ruggedness of the potential energy surface. To overcome these problems, a number of generalized ensemble Monte Carlo methods have been developed (7–10). These methods strive to search energy space better by computing the density of states, sampling expanded ranges of temperatures, or computing other physical quantities affecting transitions between the states during the search. In particular, advanced methods such as Temperature Replica Exchange Monte Carlo (TREM) (8) and Hamiltonian Replica Exchange Monte Carlo (HREM) (10) have been shown to outperform standard Monte Carlo in terms of sampling for both simplified and all-atom force fields of small proteins (8, 10, 11). For longer proteins, the computational cost and ruggedness of the all-atom energy function make solving this problem particularly challenging, as evidenced by the modest success of full-atom refinement (12–14). For this reason, multiscale approaches have been developed (4, 6, 12, 13) that start with low-resolution or reduced-model energy functions and then use all-atom energy functions on a few selected conformations [often relying on additional steps such as use of sequence homologs (2) or clustering (3, 4)]. These approaches often fail to generate low-resolution models within the "radius of convergence" (rmsd <3 Å) of the native state that is necessary for the success of subsequent full-atom refinement (2).

In this work, we test whether enhanced conformational sampling of low-resolution models can improve structure prediction. Specifically, we apply generalized Monte Carlo methods to one of the most powerfully predictive de novo protein potential energy functions, the low-resolution Rosetta force field (1). We compare the performance of two of the best-performing search methods, Temperature Replica Exchange Monte Carlo and Hamiltonian Replica Exchange Monte Carlo, with the four-stage Monte Carlo Simulated Annealing protocol used in the original Rosetta algorithm. We show that for a representative set of 40 proteins containing α, β, α+β, and α/β folds, both the HREM and, to a lesser degree, TREM methods enhance sampling of low-energy states as compared with the original Rosetta method. More importantly, we are able to use the nonnative-like low-energy structures sampled by generalized ensemble methods to suggest improvements of the low-resolution scoring function used in Rosetta. Our analysis of energy landscapes and structure clusters shows that HREM outperforms other search methods, not only in terms of finding more low-energy states, but also in sampling a more diverse set of compact structures for use in optimization of energy functions.

Results and Discussion

Four Stages of the Rosetta Scoring Function. Rosetta's low-resolution Monte Carlo method (known here as ROSETTA) employs a hierarchical protocol consisting of four sequential searches that involve swapping fragments of length 9 and then 3 residues. Each stage employs a different scoring function. These four different energy-scoring functions involve (i) replacement of the extended chain (score0), (ii) buildup of the secondary structure (score1), (iii) alternation of high (score2) and low (score5) sheet weights, and (iv) low-resolution centroid refinement (score3) (15). Finally, structures are selected according to another low-resolution centroid refinement score (score4). Each subsequent scoring function used in ROSETTA adds new terms while leaving many energy contributions unchanged; this provides significant overlap of the energy values of conformations sampled by different scoring functions. In addition, the cumulative nature of the energy functions used consecutively in ROSETTA allows one to represent each scoring function as a scaled variant of the full energy function (score3). Additional information about specific energy contributions and scaling parameters for each energy component used by ROSETTA is provided in Materials and Methods.

Our observation of overlap between the scoring functions used in ROSETTA led us to introduce a new HREM implementation for Rosetta. Overlap provides a number of similar Hamiltonians that are related by a global scale. In our implementation of HREM, we assign each replica to one of the four scoring functions and attempt exchanges between the replicas. We find that HREM's "low-effective-temperature" replicas (replicas that use the full, nonscaled energy potential) sample lower energies than those sampled by the final stage of the low-resolution protocol in ROSETTA. Moreover, the overlap between the distributions of conformations sampled by the four different scoring functions is increased by our HREM scheme [supporting information (SI) Fig. S1].

Low-Energy and Low-rmsd Conformational Sampling. To study energy landscape features of each of the three search methods, we examined the energy and rmsd values of conformations sampled during 20,000 runs starting from an extended state. Fig. 1 shows results for 40 sequences of different lengths (55–208 aa) belonging to the four structural classes α, β, α/β, and α+β. Analyzing the low energies sampled (Fig. 1A), we found that the HREM search method generally outperforms the other search methods in terms of sampling low-energy states on all sequences. In particular, the performance differences between the generalized ensemble methods (HREM and TREM) and ROSETTA, measured by the lowest energy, the energy level below which 10% of the structures lie, and the lowest energy among the five most highly populated clusters, become more marked as the length of the protein increases and seem to be larger for β-folds. In comparison with ROSETTA, HREM (consistently) and TREM (often) gave rise to significant improvement in terms of lower energy values. This did not always lead to an improvement in rmsd because of false minima in the energy landscapes (Fig. 1B).

Fig. 1. Energy and rmsd differences for HREM and TREM as compared to ROSETTA. (A) Difference in energy value between conformations sampled during 20,000 independent runs by HREM and TREM and those conformations independently generated by ROSETTA for 40 selected proteins from the four structural classes of SCOP (55–208 aa). In each case we show differences for (i) the lowest energy values (min), (ii) the cutoff energy value for the 90th percentile of low-energy structures (p90, best 10% of structures), and (iii) the lowest energy values from the five largest clusters (Cbest). In all cases, HREM gets lower energy values than ROSETTA (energy differences <0), whereas TREM is better than ROSETTA in just 50% of the cases. (B) Difference in Cα root mean square deviation (rmsd) values between the same conformations. In each case we show differences for (i) the rmsd for the lowest energy structure (min), (ii) the mean rmsd for the 90th percentile of low-energy structures (p90), and (iii) the cluster centroid rmsd from the five largest clusters (Cbest).

Energy Landscapes Sampled by ROSETTA and HREM. To gain additional insight into the energy landscape encountered during the search for a given protein, we examined the 2-dimensional distribution of conformations as a function of the low-resolution Rosetta score (score4) on the y axis and of the Cα rmsd or Cα global distance test total score [GDT_TS (16)] to the native state on the x axis. We used the density of states to reveal the free energy of the underlying landscape when folding with HREM and ROSETTA for all 40 proteins. Particular insight comes from comparing results obtained starting from an extended state and from the native state: for most proteins, both starting states converged to a similar structure in the lower-energy range. However, simulations from the native state, which show the location of the near-native states (rmsd <3 Å), usually reveal a false region of attraction with rmsd from 3.5 to 17.4 Å (average value, 8.7 ± 4.5 Å) having energies 54.0 ± 23.8 kT lower than the near-native conformations (energy differences range from 14.4 to 135.3 kT). Longer proteins (>90 aa) tend to be at the upper end of this range for both rmsd and energy differences. By contrast, shorter proteins tend to have energy landscapes with a flat, false lowest-energy region: many states spanning a wide range of rmsd values have almost the same low energy, which is lower than the energy of the near-native states.

Fig. 2. Shown are distributions of conformations (blue to red for low to high density, a measure of the underlying free energy) generated by HREM and ROSETTA as a function of the low-resolution Rosetta score (score4) and the fit to the native structure as measured by either the Cα root mean square deviation (rmsd) or the Cα Global Distance Test Total Score (GDT_TS). A total of 20,000 structures were generated for each method starting from (i) an extended state or (ii) the native state of the all-β protein 1e43a1 (Alpha Amylase, C-terminal β-sheet domain from Bacillus licheniformis; 90 aa). Clearly, HREM generates a much better sampling of conformations than ROSETTA.
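The caption above treats the density of conformations as a measure of the underlying free energy. A minimal sketch of that conversion, assuming the common convention F = −kT ln p with p the fraction of conformations falling in a (score4, rmsd) cell, is shown below. The bin widths, the function name, and the choice to shift the most populated cell to F = 0 are illustrative assumptions; the binning actually used for Fig. 2 is not stated here.

```python
import math
from collections import Counter


def free_energy_map(scores, rmsds, score_bin=5.0, rmsd_bin=1.0, kT=1.0):
    """Histogram (score, rmsd) pairs and convert cell populations to free energies.

    F(cell) = -kT * ln(p(cell)), where p is the fraction of conformations in the
    cell; values are shifted so the most populated cell has F = 0. Empty cells are
    left out because their free energy would be infinite.
    """
    if len(scores) != len(rmsds) or not scores:
        raise ValueError("need equal-length, non-empty score and rmsd lists")
    counts = Counter(
        (math.floor(s / score_bin), math.floor(r / rmsd_bin))
        for s, r in zip(scores, rmsds)
    )
    total = len(scores)
    free = {cell: -kT * math.log(n / total) for cell, n in counts.items()}
    f_min = min(free.values())
    return {cell: f - f_min for cell, f in free.items()}


# Toy input: four conformations in one basin and one outlier (not data from the paper).
landscape = free_energy_map([-100.0, -98.0, -97.0, -99.0, -50.0], [2.2, 2.5, 2.9, 2.1, 10.4])
for cell, f in sorted(landscape.items()):
    print(cell, round(f, 2))
```

Cells are keyed by integer bin indices, so the same function works for GDT_TS on the x axis by passing GDT_TS values in place of rmsd.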

Fig. 2 shows typical energy landscapes sampled from the extended and the native state by HREM and ROSETTA. ROSETTA tends to sample only a local part of the energy landscape, whereas HREM samples much more extensively. Nevertheless, because of false regions of attraction, HREM simulations from an extended state do not sample the near-native conformations of interest (rmsd <3 Å). Together with the analysis presented in the previous section, these results show that the low-resolution energy function in Rosetta cannot reliably recognize near-native states.

Differences in Top Cluster Centers for ROSETTA and HREM. In principle, an accurate energy function should always recognize near-native conformations and discriminate them from nonnative conformations. In practice, scoring functions are inaccurate, and de novo structure prediction methods must use structural clustering to identify native-like structures (2, 4, 13). This makes two assumptions: (i) that the native conformation should have more structural neighbors than any other conformation because of the loss in configurational entropy on folding; and (ii) that this near-native energy basin is detected by the knowledge-based scoring functions used in Rosetta, in that the basin results from the long-range hydrophobic interactions associated with native globular proteins (17). In our work, we used the "LEADER" clustering algorithm extensively tested with the Rosetta protocol in the Critical Assessment of Structure Prediction (CASP) competition (2, 12, 13, 18). We clustered the 5,000 lowest-energy structures of the 20,000 lowest-energy conformations found in independent runs by each method. As seen in Fig. 3, HREM finds more diverse and larger clusters than TREM or ROSETTA. It is worth noting that agreement within HREM clusters is stronger than for TREM or ROSETTA (see Tables S4 and S5). A possible explanation for these observations, also supported by the energy landscape analysis in the previous section, is that HREM samples a more diverse set of highly populated low-energy basins of conformations. This indicates that the basins associated with false local minima are highly populated and thus represent conformations against which the scoring functions should be improved.

Fig. 3. Differences in the top five clusters sampled by ROSETTA, HREM, and TREM. (A) Dependence of the average rmsd of the top five clusters on protein length. (B) Dependence of the average cluster size (for the five largest clusters) on protein length. In both cases we present data for the entire set of 40 proteins for the three methods: ROSETTA, HREM, and TREM. Superior performance of HREM is clear for both measures.

Deficiencies of the Rosetta Low-Resolution Scoring Function. To further understand how the individual energy terms of the Rosetta energy function discriminate near-native states from incorrect low-energy states, we examined two sets of 1,000 low-rmsd states and one set of 1,000 low-energy (score4) states generated by all three methods for each of the 40 proteins in our dataset. The first set of low-rmsd states (low-rmsd_nat, mean rmsd 3.98 Å) was generated by sampling from the native state (structures closer than 1.0 Å were removed to prevent possible artifacts in recognition by an energy function parameterized on

Table 1. Native-like average Z score (top portion of each block) and Pearson's correlation coefficient between rmsd and energy score (bottom portion of each block) for low-rmsd_nat vs. low-score4_ext and low-rmsd_ext vs. low-score4_ext discrimination

Low-rmsd_nat vs. low-score4_ext discrimination

        score4         hb_srbb        hb_lrbb        rama
Zα       9.21 ± 6.45   −0.61 ± 3.40   −0.90 ± 1.76   −0.58 ± 2.43
Zβ       8.44 ± 8.06   −0.48 ± 1.86   −1.41 ± 2.38   −0.78 ± 1.53
Zα/β     7.80 ± 4.23   −0.19 ± 1.37   −2.37 ± 3.26   −1.05 ± 2.63
Zα+β     6.10 ± 6.74   −0.23 ± 2.10   −1.12 ± 2.26   −1.12 ± 1.41
Zall     8.19 ± 6.87   −0.38 ± 2.17   −1.44 ± 2.42   −0.87 ± 1.97
rα      −0.76 ± 0.16    0.07 ± 0.63    0.26 ± 0.46    0.40 ± 0.52
rβ      −0.51 ± 0.53    0.15 ± 0.57    0.18 ± 0.29    0.26 ± 0.42
rα/β    −0.56 ± 0.26    0.05 ± 0.36    0.33 ± 0.34    0.21 ± 0.62
rα+β    −0.34 ± 0.52    0.07 ± 0.42    0.21 ± 0.29    0.32 ± 0.34
rall    −0.52 ± 0.42    0.08 ± 0.47    0.24 ± 0.33    0.28 ± 0.47

Low-rmsd_ext vs. low-score4_ext discrimination

        score4         hb_srbb        hb_lrbb        rama
Zα      10.19 ± 5.60   −1.08 ± 1.97    0.97 ± 0.95   −0.38 ± 0.39
Zβ      10.18 ± 4.84   −0.64 ± 1.08    1.52 ± 0.91   −0.21 ± 0.74
Zα/β     8.53 ± 4.16   −0.21 ± 0.88    1.26 ± 0.79   −0.16 ± 0.48
Zα+β     9.44 ± 5.61   −0.69 ± 1.45    0.88 ± 0.98   −0.28 ± 0.46
Zall     9.92 ± 5.45   −0.64 ± 1.36    1.13 ± 0.90   −0.23 ± 0.52
rα      −0.72 ± 0.15    0.26 ± 0.32   −0.31 ± 0.29    0.17 ± 0.18
rβ      −0.73 ± 0.17    0.16 ± 0.30   −0.46 ± 0.21    0.03 ± 0.28
rα/β    −0.60 ± 0.18    0.07 ± 0.25   −0.38 ± 0.22    0.03 ± 0.20
rα+β    −0.65 ± 0.24    0.17 ± 0.32   −0.29 ± 0.32    0.14 ± 0.21
rall    −0.67 ± 0.19    0.16 ± 0.29   −0.35 ± 0.27    0.08 ± 0.23

Mean and standard deviation are provided.

high-resolution crystal structures), the second set of low-rmsd states (low-rmsd_ext, mean rmsd 8.56 Å) and the set of low-score4 energy states (low-score4_ext, mean rmsd 13.12 Å) were generated by sampling from an extended state. With its much greater efficiency, HREM contributed most (85%) to the set of low-energy structures found by starting with an extended state. Because these low-scoring decoys were produced by rigorously sampling the energy function, they represent a challenging set of local minima of Rosetta's low-resolution energy function.

We calculated two independent statistical measures that capture the ability of the scoring function to discriminate native-like conformations from nonnative-like ones: (i) the Z score of the rmsd values of native-like conformations; and (ii) the Pearson correlation coefficient between rmsd and score. In Table 1, we give these values for two discrimination tasks, (i) discriminating low-rmsd_nat from low-score4_ext and (ii) discriminating low-rmsd_ext from low-score4_ext, for the original low-resolution score (score4) as well as for a selected set of Rosetta's low-resolution energy score terms that were identified as having the most discriminatory power. We analyze each of the four structural classes separately, in addition to a combined analysis for all proteins in our dataset. As seen in Table 1, good native-like average Z scores (Z < −1.0) and higher Pearson's correlation coefficients (r > 0.3) indicate the enhanced discrimination power of the hydrogen bond backbone–backbone scores for both tasks for all structural classes.

For discrimination between perturbed native states (low-rmsd_nat) and incorrectly scored low-energy states (low-score4_ext), the "long-range" hydrogen bond term (hb_lrbb), where the donor and acceptor of a backbone–backbone hydrogen bond are separated by at least 5 amino acids along the sequence, is more successful. For the more challenging discrimination between near-native states sampled during ab initio folding (low-rmsd_ext) and incorrectly scored low-energy states (low-score4_ext), the "short-range" backbone–backbone hydrogen bond term (hb_srbb), where the donor and acceptor of a hydrogen bond are 4 or fewer amino acids apart along the sequence, is more successful. Although this holds for all protein folds, it is less marked for α/β proteins; in our dataset these folds have longer lengths and larger rmsd values for both near-native and low-energy sets. Enhanced discrimination is also shown by the Ramachandran score (rama).

We observed that low-rmsd_nat conformations differed from the low-rmsd_ext conformations in having more favorable long-range hydrogen bonds for β, α+β, and α/β folds (mean Z score is −1.53 ± 1.33) and lower Ramachandran energies for β-folds (Z score is −1.58 ± 0.79), as well as in having less favorable short-range hydrogen bonds for β-folds (Z score is 1.30 ± 1.76) and higher contact order for all folds (mean Z score is 1.27 ± 1.36). Thus, the low-rmsd_ext decoys are less native-like and have fewer nonlocal interactions, resulting in less favorable long-range and more favorable short-range hydrogen bonds; this suggests that, as folding from an extended state proceeds to form more long-range interactions, the discriminatory power of hydrogen bonds shifts from short-range to long-range.

Fig. 4 shows how the long-range and short-range hydrogen-bonding backbone–backbone potentials transform the original low-resolution Rosetta energy (score4) landscape, assigning lower energies to closer-to-native (low-rmsd) conformations. This is shown separately for all of the proteins in each of the four fold classes: α, β, α+β, and α/β. In Fig. 4A, we show discrimination between the low-rmsd_nat set (started from the native state) and the low-score4_ext set (started from an extended state) by the original low-resolution energy (score4) and by the long-range hydrogen bond score (hb_lrbb). In Fig. 4B, we show a more difficult discrimination test for the low-rmsd_ext and low-score4_ext decoys (both started from an extended state). Decoys in the low-rmsd_ext set are less native-like than those in the low-rmsd_nat set and are thus harder to distinguish from the low-score4_ext decoys. In Fig. 4B, we see that the score4 energy function is unable to distinguish structures with low rmsd values. In fact, the structures with the lowest score4 energies are generally >10 Å from the native structure; there is a distinct pattern of anticorrelation, with the energy becoming more favorable as the rmsd increases. A different energy term, the short-range backbone–backbone hydrogen bond energy (hb_srbb) shown in Fig. 4B, Right, is generally able to reverse this anticorrelation, but the energy of the low-rmsd decoys is now about the same as that of the decoys with higher rmsd. These results indicate that, with efficient search methods such as HREM, the discrimination power of low-resolution energy functions can be improved. A promising methodology to improve the discrimination power further is to use efficient methods like HREM to locate decoys that are energy minima and then to optimize the energy function against these decoys. This paradigm, pioneered in 1996 (19, 20), proved important in the formulation of Rosetta (1) and will likely be as important for future improvement of methods for structure prediction.

Fig. 4. Discrimination between native-like and nonnative-like conformations. (A) Comparing the ability of the low-resolution Rosetta scoring function (score4) and the long-range hydrogen-bonding backbone–backbone potential (hb_lrbb) to discriminate between native-like low-rmsd_nat (starting from the native state) and nonnative low-score4_ext (starting from an extended state) conformations. Colors used for each fold class are as follows: all-α, bright and dark red; all-β, bright and dark green; α/β, bright and dark orange; and α+β, bright and dark purple. The darker color is always used for the near-native decoys (low rmsd). Note that the hb_lrbb score discriminates better than score4 does. (B) Comparing the ability of the low-resolution Rosetta scoring function (score4) and the short-range hydrogen-bonding backbone–backbone potential (hb_srbb) to discriminate between native-like low-rmsd_ext and nonnative low-score4_ext conformations (both starting from an extended state). The hb_srbb score generally raises the energy of the nonnative-like conformations (low-score4_ext, high rmsd) relative to the near-native conformations (low-rmsd_ext); this makes the lower edge of the distribution slope toward rather than away from the native state.

An orientation-dependent hydrogen-bonding energy term was first added to the Rosetta force field to enhance discrimination between native-like and nonnative conformations just before and during full-atom refinement (21). This energy is a linear combination of four terms that are parameterized by using a set of high-resolution protein crystal structures: (i) a distance-dependent energy term derived from the distribution of distances between the hydrogen and acceptor atoms (distances range from 1.4 to 2.6 Å), (ii) an angular energy measuring the angle at the hydrogen atom, (iii) an angular energy measuring the angle at the acceptor atom, and (iv) a dihedral angle term corresponding to rotation around the acceptor–acceptor base bond in the case of an sp2 hybridized acceptor (21). It has also been shown that this knowledge-based hydrogen-bonding potential in Rosetta is consistent with quantum mechanical calculations, unlike molecular mechanics force fields including CHARMM27, OPLS-AA, and MM3-2000 (22). Recently, a modified version of Rosetta's hydrogen-bonding potential was successfully used for structure refinement of protein homology models (23). That study showed that the modified Rosetta hydrogen-bonding potential, in combination with two other statistical potentials, can discriminate near-native models (obtained by using temperature replica exchange) with an accuracy comparable to that of Rosetta's full-atom score (23). In our work, we show that backbone–backbone hydrogen-bonding energy terms significantly enhance discrimination of near-native and misfolded native-like states for the ab initio protocol.

Conclusion

In this work we have shown that development of search methods that more efficiently sample local minima is important for two reasons: (i) better protein structure prediction and (ii) better optimization of the energy function. We have found that the Hamiltonian Replica Exchange Monte Carlo method is the most promising search method for de novo protein structure prediction with low-resolution force fields; it outperforms Temperature Replica Exchange Monte Carlo and the original Rosetta Monte Carlo method. A better set of local minima provides a more challenging decoy set against which the energy function can be optimized. Thus, our results reveal some of the deficiencies of the existing energy terms in Rosetta, including the presence of false local minima and a general flatness of the energy landscape near the native states. Only through better understanding of these deficiencies, as revealed by our very powerful search method, will we be able to develop better energy functions and representations.

We used an implementation of HREM that utilizes the four scoring functions from the existing Rosetta protocol; in future work we will investigate other implementations of HREM that will scale individual energy contributions of different energy terms. Our results confirm that the Hamiltonian Replica Exchange Monte Carlo method and its variants are promising and deserve further study.

Materials and Methods

Protein Dataset Used. To evaluate and compare the algorithms, a set of 40 nonhomologous folds was selected from the Structural Classification of Proteins (SCOP) (24) structural domain database (ranging in length from 55 to 208 aa). Protein families in the test set span four SCOP class categories (all α, all β, α/β, and α+β) and are of different sequence lengths to ensure the generality of the reported results. We generated six independent sets of 20,000 decoys for each protein sequence for each search method, starting from the completely extended state and starting from the native state.

The Rosetta Energy Function. All of the search methods developed and implemented were tested with Rosetta's low-resolution protein structure representation and scoring functions. Rosetta is a protein structure prediction program developed in Baker's group and made freely available to the academic community (1). Rosetta incorporates (i) a low-resolution representation of a protein that uses the main chain atoms and a side-chain centroid and (ii) a high-resolution representation that uses all atoms. The low-resolution Rosetta energy function includes the van der Waals hard sphere repulsion (vdw), environment (env), pair (pair), Cβ packing density (cb), secondary structure packing [helix–helix pairing (hh), helix–strand pairing (hs), strand–strand pairing (ss), strand pair distance/register (rsigma), and strand arrangement into sheets (sheet)], radius of gyration (rg) energetic contributions, contact order (co), and Ramachandran torsion angle filters (rama) (2, 14). Additional hydrogen-bonding energy terms (short-range (hb_srbb) and long-range (hb_lrbb) backbone–backbone hydrogen bonds) are added right before (score6) and used during full-atom refinement (score12). All of the energy scoring components of Rosetta's energy score are described in detail elsewhere (15).

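The Results section uses the LEADER clustering algorithm to find the five largest clusters of low-energy structures. The exact variant used in the Rosetta/CASP pipeline is not described in this paper, so the sketch below implements the classic single-pass leader algorithm as an assumption: conformations are visited in a given order (for example, sorted by energy), each one joins the first existing leader within a distance cutoff, and otherwise it becomes a new leader. The distance function is a placeholder for a structural measure such as Cα rmsd, and the function name is hypothetical.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def leader_cluster(
    items: Sequence[T], distance: Callable[[T, T], float], cutoff: float
) -> List[List[int]]:
    """Classic single-pass leader clustering.

    Items are visited in the given order. Each item joins the first existing
    leader within `cutoff`; otherwise it becomes the leader of a new cluster.
    Returns clusters as lists of item indices, largest cluster first.
    """
    leaders: List[int] = []
    clusters: List[List[int]] = []
    for idx, item in enumerate(items):
        for c, lead in enumerate(leaders):
            if distance(items[lead], item) <= cutoff:
                clusters[c].append(idx)
                break
        else:  # no leader close enough: start a new cluster
            leaders.append(idx)
            clusters.append([idx])
    return sorted(clusters, key=len, reverse=True)


# Toy 1-D "structures" with |a - b| standing in for a structural distance such as rmsd.
points = [0.0, 0.5, 5.0, 0.8, 5.2, 10.0]
clusters = leader_cluster(points, lambda a, b: abs(a - b), cutoff=1.0)
top_five = clusters[:5]  # Fig. 3 summarizes the five largest clusters
print(top_five)  # [[0, 1, 3], [2, 4], [5]]
```

Because the result depends on the visiting order, the choice of ordering (energy, random, or otherwise) is part of the method and should be reported alongside the cutoff.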
TREM. A standard implementation (8) of Temperature Replica Exchange Monte Carlo (TREM) is used here. Eight replicas running at exponentially distributed temperatures (βi: 1.40, 1.95, 2.72, 3.79, 5.29, 7.38, 10.31, and 14.39 kT), chosen to ensure efficient exchanges, underwent four different stages of Monte Carlo interrupted by attempted exchanges after each stage (see Additional Methods Description: TREM in the SI). These specific temperature settings were optimized in a number of short preliminary runs. Following the general criterion of choosing the exchange frequency between replicas by integrating the autocorrelation time of the higher-temperature simulation (25), exchanges between replicas were attempted after every 2,000 steps.

HREM. Hamiltonian Replica Exchange Monte Carlo uses several related Hamiltonians for different replicas, where only some of the terms of the potential energy function, U(X), are modified across replicas through scaling parameters λi (10). Similarly to TREM, exchanges between pairs of replicas are attempted with a certain frequency, which allows the interactions responsible for the ruggedness of the landscape to be weakened. Whereas the number of replicas that regular TREM requires to guarantee optimal overlap scales with the square root of the total number of degrees of freedom, for HREM it scales with the square root of only the relevant subsystem degrees of freedom; HREM is therefore preferable for large systems. The key difference between the standard implementation of HREM (10) and ours is that λi is a vector of weights and not a scalar:

$$U_i(X) = U_A(X) + \lambda_i \cdot U_B(X),$$

where

$$U_A(X) = U_{vdw}(X),$$

$$U_B(X) = U_{env}(X) + U_{pair}(X) + U_{sheet}(X) + U_{hs}(X) + U_{ss}(X) + U_{cb}(X) + U_{rsigma}(X) + U_{rg}(X).$$

The weights are:

$$\lambda_i = (\lambda_{i,env}, \lambda_{i,pair}, \lambda_{i,sheet}, \lambda_{i,hs}, \lambda_{i,ss}, \lambda_{i,cb}, \lambda_{i,rsigma}, \lambda_{i,rg}), \quad i \in \{0, 1, 2, 3\},$$

so that the product λi · UB(X) multiplies each term of UB(X) by the matching component of λi. For HREM to be effective, the Hamiltonians should differ in only a limited number of energy components. The four low-resolution scores of the Rosetta energy function satisfy this condition, with the following sets of Rosetta scaling parameters:

Score     λenv   λpair  λsheet  λhs   λss   λcb   λrsigma  λrg
score0    0      0      0       0     0     0     0        0
score1    1      1      1       1     0.3   0     0        0
score2    1      1      1       1     1     0.5   0        0
score3    1      1      1       1     1     1     1        1

To satisfy detailed balance, the acceptance probability of attempted pairwise exchanges between replicas follows the equation:

$$W(X_i, X_j \to X_i', X_j') = \min\left(1, e^{-\Delta(X_i, X_j \to X_i', X_j')}\right),$$

where

$$\Delta(X_i, X_j \to X_i', X_j') = \beta\left[U_i(X') - U_j(X') + U_j(X) - U_i(X)\right],$$

with X the conformation held by replica i and X′ the conformation held by replica j before the exchange. The exchange frequency between replicas was chosen by integrating the autocorrelation time of the highest "effective temperature" (score0) simulation (25); exchanges between replicas were tried after every 2,000 Monte Carlo steps. The inverse temperature, β, was set to 2.0 kT as in ROSETTA.

ACKNOWLEDGMENTS. We thank members of the Levitt lab for helpful discussions. This work was supported by Natural Sciences and Engineering Council of Canada Postdoctoral Fellowship PGS-D (to A.S.) and National Institutes of Health Grant GM063817 (to M.L.). National Science Foundation Award CNS-0619926 provided computer resources.

1. Simons K, Kooperberg C, Huang E, Baker D (1997) Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and bayesian scoring function. J Mol Biol 268:209–225.
2. Bradley P, Misura K, Baker D (2005) Toward high-resolution de novo structure prediction for small proteins. Science 309:1868–1871.
3. Schueler-Furman O, Wang C, Bradley P, Misura K, Baker D (2005) Progress in modeling of protein structures and interactions. Science 310:638–642.
4. Zhang Y, Arakaki AK, Skolnick J (2005) TASSER: An automated method for the prediction of protein tertiary structures in CASP6. Proteins Suppl 7:91–108.
5. Ortiz AR, Kolinski A, Rotkiewicz P, Ilkowski B, Skolnick J (1999) Ab initio folding of proteins using restrains derived from evolutionary information. Proteins Suppl 3:177–185.
6. Zhang Y, Kihara D, Skolnick J (2002) Local energy landscape flattening: Parallel hyperbolic monte carlo sampling of protein folding. Proteins 48:192–201.
7. Swendsen RH, Wang JS (1986) Replica Monte Carlo simulation of spin-glasses. Phys Rev Lett 57:2607–2609.
8. Okamoto Y (2004) Generalized-ensemble algorithms: Enhanced sampling techniques for Monte Carlo and molecular dynamic simulations. J Mol Graphics Model 22:425–439.
9. Hansmann UHE (1999) Protein folding simulations in a deformed energy landscape. Eur Phys J B 12:607–611.
10. Fukunishi H, Watanabe O, Takada S (2002) On the Hamiltonian replica exchange method for efficient sampling of biomolecular systems: Application to protein structure prediction. J Chem Phys 116:9058–9067.
11. Liu P, Kim B, Friesner RA, Berne BJ (2005) Replica exchange with solute tempering: A method for sampling biological systems in explicit water. Proc Natl Acad Sci USA 102:13749–13754.
12. Das R, Baker D (2008) Macromolecular modeling with Rosetta. Annu Rev Biochem 77:363–382.
13. Misura KMS, Baker D (2005) Progress and challenges in high-resolution refinement of protein structure models. Proteins 59:15–29.
14. Jagielska A, Wroblewska L, Skolnick J (2008) Protein model refinement using an optimized physics-based all-atom force field. Proc Natl Acad Sci USA 105:8268–8273.
15. Rohl CA, Strauss CEM, Misura KMS, Baker D (2004) Protein structure prediction using Rosetta. Methods Enzymol 383:66–93.
16. Zemla A (2003) LGA: A method for finding 3D similarities in protein structures. Nucleic Acids Res 31:3370–3374.
17. Shortle D, Simons K, Baker D (1998) Clustering of low-energy conformations near the native structures of small proteins. Proc Natl Acad Sci USA 95:11158–11162.
18. Das R, et al. (2007) Structure prediction for CASP7 targets using extensive all-atom refinement with Rosetta@home. Proteins 69(Suppl 8):118–128.
19. Huang ES, Subbiah S, Tsai J, Levitt M (1996) Using a hydrophobic contact potential to evaluate native and near-native folds generated by molecular dynamics simulations. J Mol Biol 257(33):716–725.
20. Park B, Levitt M (1996) Energy functions that discriminate x-ray and near-native folds from well-constructed decoys. J Mol Biol 258:367–392.
21. Kortemme T, Morozov AV, Baker D (2003) An orientation-dependent hydrogen bonding potential improves prediction of specificity and structure for proteins and protein-protein complexes. J Mol Biol 326:1239–1259.
22. Morozov AV, Kortemme T, Tsemekhman K, Baker D (2004) Close agreement between the orientation dependence of hydrogen bonds observed in protein structures and quantum mechanical calculations. Proc Natl Acad Sci USA 101:6946–6951.
23. Zhu J, Fan H, Periole X, Honig B, Mark AE (2008) Refining homology models by combining replica-exchange molecular dynamics and statistical potentials. Proteins 72:1171–1188.
24. Murzin A, Brenner SE, Hubbard TJP, Chothia C (1995) SCOP: A structural classification of proteins database for the investigation of sequences and structures. J Mol Biol 247:536–540.
25. Newman MEJ, Barkema GT (1999) Monte Carlo Methods in Statistical Physics (Clarendon, Oxford).

www.pnas.org/cgi/doi/10.1073/pnas.0812510106