
PERSPECTIVE

The nature of folding pathways

S. Walter Englander1 and Leland Mayne

Johnson Research Foundation, Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104

Edited by Alan R. Fersht, Medical Research Council Laboratory of Molecular Biology, Cambridge, United Kingdom, and approved September 23, 2014 (received for review June 24, 2014)

How do proteins fold, and why do they fold in that way? This Perspective integrates earlier and more recent advances over the 50-y history of the problem, emphasizing unambiguously clear structural information. Experimental results show that, contrary to prior belief, proteins are multistate rather than two-state objects. They are composed of separately cooperative foldon building blocks that can be seen to repeatedly unfold and refold as units even under native conditions. Similarly, foldons are lost as units when proteins are destabilized to produce partially unfolded equilibrium molten globules. In kinetic folding, the inherently cooperative nature of foldons predisposes the thermally driven residue-level search to form an initial foldon and subsequent foldons in later assisted searches. The small size of foldon units, ∼20 residues, resolves the Levinthal time-scale search problem. These microscopic-level search processes can be identified with the disordered multitrack search envisioned in the “new view” model for protein folding. Emergent macroscopic foldon–foldon interactions then collectively provide the structural guidance and free energy bias for the ordered addition of foldons in a stepwise pathway that sequentially builds the native protein. These conclusions reconcile the seemingly opposed new view and defined pathway models; the two models account for different stages of the protein folding process. Additionally, these observations answer the “how” and the “why” questions. The protein folding pathway depends on the same foldon units and foldon–foldon interactions that construct the native structure.

protein folding | hydrogen exchange |

Author contributions: S.W.E. and L.M. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
1To whom correspondence should be addressed. Email: engl@mail.med.upenn.edu.

Proteins must fold to their active native state when they emerge from the ribosome and when they repeatedly unfold and refold during their lifetime (1, 2). The folding process is difficult (3, 4) and potentially dangerous (5). Biological health depends on its success and disease on its failure. However, more than 50 y after the formative demonstration that protein folding is a straightforward biophysical process (6), there is not general agreement on the overarching questions of how proteins fold and why they fold in that way. Given this uncertainty, one is not sure how to even think about many related biophysical and biological problems.

Early in the history of the folding field, experimentalists simply assumed that proteins fold through distinct intermediate states in a distinct pathway (Fig. 1A), as seen for classical biochemical pathways. Following Anfinsen’s demonstration that proteins can fold all by themselves without outside help (6), Levinthal perceived that no undirected folding process would be able to find the native structure by random searching through the vast number of structural options (3, 4). Proteins must solve the problem, he believed, by folding through predetermined pathways, although one had no clue how or why that should occur.

A realization of the inability to equilibrate to a common structure (3, 4) and the ensemble nature of partially folded forms led the theoretical community to a very different, more statistical “new view” (7–11). It was inferred that proteins must fold to their unique native state through multiple unpredictable routes and intermediate conformations. Another prominent inference configured the Anfinsen thermodynamic hypothesis (6) in terms of a funnel-shaped diagram (Fig. 1B), which pictures that proteins must fold energetically downhill (the Z axis) and shrink in conformational extent (the generalized XY plane) as they go (9, 12–14). To fill out the landscape picture, classical rate-determining kinetic barriers are often replaced by qualitative concepts such as ruggedness, frustration, and traps, and major species by deep wells, all forming a kind of metalanguage known as “energy landscape theory” (10, 15–17). The graphic funnel picture is a generic representation, independent of structural and thermodynamic detail and equally applicable to any protein, RNA, or other compact polymer. Although it provides no constraints that would exclude any realistic folding scenario, even a defined pathway model, it has been widely interpreted to require that proteins fold through many independent pathways.

R. L. Baldwin took up the challenge and led the field in a multiyear effort to experimentally define kinetic folding intermediates and pathways (18–20). In a thoughtful protein folding review 20 y ago, Baldwin considered the disparate insights available at the time from both theory and experiment (21). He highlighted uncertainties in the experimental evidence for classical pathways. Kinetic folding intermediates seemed to form asynchronously over a range of time scales. Equilibrium analogs of folding intermediates called molten globules yielded mixed results, sometimes agreeing with kinetic folding information and sometimes not. Baldwin’s article served to alert the experimental protein folding community to the new view of heterogeneous folding and helped to establish the current paradigm of a multipath funneled energy landscape.

The distinction between the classical view of a more or less single pathway through defined intermediates and the disordered many-pathway new view has broad significance for the understanding of protein biophysics and biological function. The question could be resolved by determining experimentally the structure of the intermediate forms that bridge between unfolded and native states in real proteins, but this effort has turned out to be exceptionally difficult. The usual methods, crystallography and NMR, cannot define partial structures that form and decay in less than 1 s. Experimentalists have been forced to depend on spectroscopic methods (fluorescence, CD, IR) that can follow kinetic folding in real time but are blind to the specifics of structure and so allow the possibility of alternative folding mechanisms.
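As a rough illustration of the Levinthal search problem described in this section, and of the abstract's point that foldon-sized units of ∼20 residues make an undirected search tractable, the following sketch compares the time for a random search over a whole chain with the time for a single foldon. The three conformations per residue and the sampling rate of 10^13 conformations per second are conventional illustrative assumptions, not values taken from this article, and the function name is hypothetical.

```python
# Back-of-the-envelope Levinthal estimate (illustrative assumptions only).
# STATES_PER_RESIDUE and SAMPLES_PER_SECOND are textbook-style guesses,
# not measured values and not figures from the article.

STATES_PER_RESIDUE = 3        # assumed backbone conformations per residue
SAMPLES_PER_SECOND = 1e13     # assumed conformations sampled per second


def random_search_time(n_residues: int) -> float:
    """Seconds needed to visit every conformation of an n-residue segment once."""
    return STATES_PER_RESIDUE ** n_residues / SAMPLES_PER_SECOND


if __name__ == "__main__":
    whole_chain = random_search_time(100)   # one undirected 100-residue search
    one_foldon = random_search_time(20)     # search for a single ~20-residue foldon
    five_foldons = 5 * one_foldon           # five foldon searches, one after another
    print(f"100-residue random search: {whole_chain:.2e} s")
    print(f"single 20-residue foldon:  {one_foldon:.2e} s")
    print(f"five sequential foldons:   {five_foldons:.2e} s")
```

Under these assumptions the whole-chain search would take on the order of 10^34 s, whereas each foldon-sized search finishes in well under a millisecond, so a stepwise sequence of small searches stays within realistic folding times.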

www.pnas.org/cgi/doi/10.1073/pnas.1411798111 PNAS | November 11, 2014 | vol. 111 | no. 45 | 15873–15880

Fig. 1. (A) The classical view of a defined folding pathway, and (B) the new view of multiple routes through a funneled landscape. Reprinted with permission from ref. 13. Dashed line in A illustrates the insertion of an optional error-dependent kinetic barrier, which can affect some population fraction and not others and thus mimic multipathway folding.

Theorists have attempted to avoid these difficulties by simulating the folding process in computers. Theory-based computer simulations can be remarkably powerful. For example, one can compute the path of a multiton rocket through 150 million miles of free space to a pinpoint landing on Mars. The equations that govern space flight are known precisely (22), computer power is ample, and the track to be controlled is clear. Computing the structural journey of minuscule protein molecules through submicrons of space has proved to be more difficult. The computer power required to track the folding process at the level of thermally driven residue-level dynamics is immense. The forces that direct protein folding are delicately balanced, interlocking, and not describable in exact terms. The reaction path(s) to be mined from the mass of computer data are unknown.

For both the classical and new view models, Fig. 1 implies that the structure of folding intermediates and their pathway connections might be determined in three different ways: (i) as intermediates that reach significant occupancy during kinetic folding; (ii) as conformationally excited states that exist at their equilibrium Boltzmann level in the high free energy space above the native protein; and (iii) as modified molten globule forms made by destabilizing the native protein so that higher energy states become the lowest free energy equilibrium form. Experimental advances have accomplished all three of these approaches. At any time point during kinetic folding, a briefly present folding intermediate can be marked in a structure-sensitive way by hydrogen exchange (HX) pulse labeling and defined by later analysis. The structure of partially folded states minimally populated in the high free energy space can be determined by HX labeling, sulfhydryl labeling, and NMR methods. Partially unfolded molten globules can be labeled in a structure-sensitive way by hydrogen–deuterium (H-D) exchange and analyzed later by site-resolved NMR in the reformed native state or directly by mass spectrometry.

These advances now make it possible to determine the structure and properties of intermediate protein folding states and their pathway connections and so place the study of folding pathways on solid ground. Experiment can now ask whether proteins fold through a limited number of distinct obligatory intermediate structures in an ordered kinetic sequence as suggested in Fig. 1A, or through a heterogeneous collection of independent multiply parallel forms and routes as in Fig. 1B, or through some other combination of conformations.

Intermediates During Kinetic Folding

HX Pulse Labeling and NMR Analysis. It first became possible to obtain detailed structural information on briefly present protein folding intermediates with the development of the HX pulse labeling method (23, 24). The initially unfolded and D-exchanged protein is mixed into folding conditions and then, at various times during folding, is subjected to a short, selective D to H exchange labeling pulse. The protein folds to the native state, and D vs. H placement is analyzed by NMR to identify amide sites that were already protected (still D-labeled) or not yet protected (H-labeled) at the time of the labeling pulse. The results provide a series of snapshots during the time course over which folding converts identifiable main chain amides to a protected H-bonded condition. The results will detect intermediates that encounter a sizeable kinetic barrier and so reach significant population.

Initial results obtained for cytochrome c (Cyt c; Fig. 2) showed that approximately half of the molecules form their sequentially remote but structurally contiguous N- and C-terminal helical segments early (12 ms), suggesting the formation of a specific on-pathway native-like intermediate. However, Baldwin’s review (21) emphasized the asynchrony in these kinetic results; some of the molecules protect their N- and C-terminal helices early, whereas others do so at later times, along with other regions. Other proteins similarly studied have often yielded analogous results. This heterogeneous behavior conflicts with a well-defined sequential pathway model but seems more consistent with the new view of different routes, rates, and traps.

One now knows that the heterogeneous folding seen in kinetic HX pulse labeling experiments can be due to previously unrecognized experimental issues. One problem concerns the tendency of refolding proteins to transiently aggregate, especially at the high concentrations used to facilitate the preparation of samples for NMR analysis (25, 26). Another unexpected effect was revealed in a sophisticated analysis of the HX pulse labeling experiment, which showed that intermediates populated during kinetic folding may repeatedly unfold and refold on a fast time scale. Sites that are already folded and protected can nevertheless become H-labeled during the intense high pH interrogation pulse even with only a single reversible unfolding during the pulse (the so-called EX1 HX regime; 50-ms pulse, back-unfolding rate 12 s⁻¹ for Fig. 2) (27, 28). Other HX NMR pulse labeling studies have been compromised by similar aggregation and HX EX1 behavior, and also by the inability to differentiate mixtures of states due to the ensemble averaging that occurs when NMR is used to obtain a single measurement for each individual residue.

HX Pulse Labeling and MS Analysis. A recently developed variant of the HX pulse labeling experiment can produce a more explicit description of the kinetic folding process. The new technology replaces NMR analysis with a mass spectrometry technique (HX MS) that allows folding experiments at 1,000-fold lower concentration and thus excludes aggregation. As before, the unfolded and D-exchanged protein is mixed into folding conditions and is subjected to a D to H exchange labeling pulse after various folding times. The labeling pulse can be adjusted to avoid or to study the back-unfolding behavior of transiently populated intermediates.
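To make the back-unfolding artifact concrete, the sketch below estimates how much of an already protected intermediate would become H-labeled during a labeling pulse. It assumes the EX1 limit, in which every unfolding event during the pulse leads to labeling, and simple first-order unfolding, so that the labeled fraction is 1 − exp(−k_unf · t_pulse). The 12 s⁻¹ rate and the 50-ms pulse are the values quoted above for Fig. 2; the 10-ms pulse is included for comparison, and the function names are hypothetical.

```python
import math


def ex1_labeled_fraction(k_unfold_per_s: float, pulse_s: float) -> float:
    """Fraction of protected sites labeled during the pulse in the EX1 limit.

    Assumes first-order unfolding and that every unfolding event within the
    pulse results in exchange (a simplification, not the authors' full analysis).
    """
    return 1.0 - math.exp(-k_unfold_per_s * pulse_s)


def unfolding_rate_from_fraction(labeled_fraction: float, pulse_s: float) -> float:
    """Invert the same expression to get an apparent unfolding rate (s^-1)."""
    return -math.log(1.0 - labeled_fraction) / pulse_s


if __name__ == "__main__":
    for pulse_ms in (50, 10):
        frac = ex1_labeled_fraction(12.0, pulse_ms / 1000.0)
        print(f"{pulse_ms} ms pulse, k_unf = 12 s^-1: {frac:.1%} spuriously labeled")
```

With these inputs roughly 45% of the intermediate would be labeled during a 50-ms pulse but only about 11% during a 10-ms pulse, which illustrates why shortening the pulse reduces the artifact.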

Fig. 2. Initial HX NMR pulse labeling results for Cyt c (24). A brief D to H labeling pulse imposed after various folding times was used to track the increasing protection (decreasing H-labeling) of individual residues and the segments that they represent. The results suggested early formation of a native-like N/C bihelical folding intermediate. Baldwin’s review (21) noted the kinetic asynchrony, with the N- and C-terminal helical segments in different molecules folding at different rates. Later work shows that the asynchrony is caused by protein aggregation and by HX pulse breakthrough due to back-unfolding of the transiently populated intermediate during the H-labeling pulse (50 ms).

Fig. 3. Pulse labeling HX MS results for maltose binding protein (29). (A) The time-dependent folding (HX protection) of 116 highest-quality MBP peptide fragments representing different protein regions. Black kinetic curves show the slow time course for folding of peptide fragments that are most protected in the initial collapse. (B–D) Representative HX-labeled MS fragments from different protein regions (colored) define the separate folding steps, display their concerted two-state nature, measure their formation rates, and show that the entire protein population (>95%) experiences the same steps. (E) The course of folding. On dilution from denaturant into folding conditions, MBP rapidly collapses into a heterogeneous polyglobular state (SAXS envelope reconstruction in gray) with widespread low level HX protection, then slowly folds (kinetic curves in A) through an initial native-like intermediate (blue, τ = 7 s) and later kinetically unresolved steps (green, gray, red; τ ∼60 s to 120 s; fastest green segments shown in C and E). Mutations known to greatly slow folding (stars) are all within the 7 s intermediate.

To terminate labeling and prepare for analysis, the selectively labeled protein is plunged into slow HX conditions (low pH and temperature), then cleaved into short fragments, and the fragments are separated and analyzed by fast HPLC and mass spectrometry. The two examples so far published, illustrated in Figs. 3 and 4, provide detailed pathway information.

When the large (370 residues) two-domain and aggregation-prone maltose binding protein (MBP) is diluted into folding conditions at <1 μM concentration it does not aggregate, but it does rapidly collapse into a dynamic polyglobular state with heterogeneous low level HX protection (Fig. 3) (29). This condition might be expected to spawn multiple folding routes as in the new view model, but it does not. The microsecond and millisecond time scales pass with no indication of native-like structure formation, perhaps because conformational searching in the collapsed state is difficult. Ultimately, the entire protein population assembles sequentially remote segments into a specific native-like intermediate with a single exponential time constant of 7 s (blue in Fig. 3). Other segments then report on later folding events that move to the native state over a broader time scale (60–120 s), suggesting several folding steps, but their kinetics are too compressed to allow clear resolution. These experiments largely avoided the back-unfolding HX labeling artifact by using a short labeling pulse (12 ms). Longer pulses (up to 42 ms) allowed the back-unfolding of the weakly protected regions in the initially collapsed form to be studied (29). Higher protection seems to correlate with the amphipathic nature of different segments and their tendency to form helical structure. The more protected segments (black curves in Fig. 3) are not the ones that form the emergent 7 s native-like foldon.

The same technology was able to resolve the entire folding trajectory of Ribonuclease H (155 residues; Fig. 4) in structural and temporal detail (30). The overlapping MS results allow transiently formed intermediates to be defined at near amino acid resolution. In each case they are composed of sets of residues that form well-defined H-bonded elements in the native protein (foldons). The results display a stepwise assembly of the native structure, first helix A + strand 4 (blue in Fig. 4), then the neighboring helix D + strand 5 (green), then the interacting B/C helix (yellow), and finally the terminal segments (red). The yellow foldon does not reach complete protection because of some back-unfolding (∼20%) during the 10-ms HX labeling pulse which, fortuitously, helps to distinguish the yellow and green foldons along with the small difference in their formation rates seen in the renormalized kinetic phases (Fig. 4, Inset).

We used the HX MS method to reexamine the ambiguous kinetic folding results of Cyt c measured before by HX NMR (Fig. 2). Low folding concentration (2 μM) avoided the previous transient aggregation problem, and a short labeling pulse (10 ms rather than 50 ms) minimized spurious labeling due to back-unfolding during the pulse. The results confirm that all of the proteins fold and dock their two terminal helices in a single early step (∼12 ms), and the rest of the native structure folds later.

Fig. 4. Pulse labeling HX MS results for Ribonuclease H (30). (A and B) Kinetic curves for time-dependent HX protection of peptide fragments that define the blue, green, yellow, and red foldons. (C–F) HX MS pulse labeling results for representative peptide fragments show the time course and two-state concerted nature of foldon folding steps, and that the entire protein population (>95%) experiences the same sequence of concerted steps in a single dominant pathway. The yellow foldon does not reach complete protection because of partial labeling due to back-unfolding during the 10-ms labeling pulse, which helps to distinguish the yellow foldon from the green foldon, along with the small difference in their formation rates seen in the renormalized kinetic phases (A, Inset).

A method for studying kinetic folding intermediates at equilibrium, known as native state HX, described below, independently confirms this result and elucidates the entire subsequent folding pathway.

Unlike all ensemble-level measurements including HX NMR, these pulse labeling HX MS results provide snapshots that show the structurally different populations that are already formed and not yet formed at any time point during kinetic folding rather than a potentially misleading population average. The HX MS data show that folding occurs in a stepwise manner and that each kinetic step is individually two-state, representing the cooperative formation of an additional folding unit (foldon) (MS data in Figs. 3 and 4). The time-dependent MS data show that, once each foldon unit is formed, it remains in place as subsequent foldons are added, demonstrating a stepwise buildup through distinct, progressively more folded forms. Essentially the entire refolding population joins synchronously in the same stepwise sequence of intermediate structures, indicating a single dominant folding pathway. The data show explicitly that less than 5% of the protein population folds through any other pathway(s). However, other Cyt c results do detect minimal branching in the special case where the prior structure can support two different but essentially equivalent subsequent steps; either step can occur before the other (31).

These results support a picture of protein folding in which the entire protein population folds through the same distinct intermediates and kinetic barriers in the same defined pathway, as in Fig. 1A. A seminal observation is that the intermediates form by assembling pieces of the native protein, called foldons.

Other Kinetic Studies. A large fraction of the protein folding literature is directed at finding the determinants of folding rates. Prominent issues, highlighted by reviewers, concern the nucleation–condensation model, the φ value analysis method, and two-state folding. Is the distinct pathway model consistent with current kinetic information?

The nucleation–condensation model suggests that folding is initiated by a nucleation event that potentiates subsequent structural consolidation (32, 33). The φ value analysis method attempts to define the parts of a protein that gain structure in the initial rate-limiting transition state, the nucleating event, by measuring the effect of specific mutations on folding rate (34). The usual result, that φ values are small and fractional (∼0.3 ± 0.2) (35), can be explained either by multiple pathways or by the likelihood that flexible partially folded structures can accommodate disruptive mutations more easily than the rigid native state. Thus, implications for the question of one pathway vs. many are ambiguous. A related ψ value analysis method, although much less used, is more definitive. It finds the same distinct partially formed native-like structure for the entire folding population (36). These results favor the distinct pathway hypothesis.

Many proteins, especially small ones, tend to fold and unfold in a kinetically two-state manner, each with a single exponential rate. The same kinetic barrier is rate-limiting in both folding and unfolding directions, and their ratio gives the correct equilibrium stability constant. In this case, intermediates will not be seen to populate either before or after the barrier, whether they exist or not, and the usual kinetic folding experiment simply cannot distinguish whether separate pathway steps do or do not occur. For example, the defined pathway model in Fig. 1A will produce two-state kinetic folding and unfolding (and linear chevron plots) in the absence of the inserted misfolding barrier noted. Unfortunately, the absence of explicit evidence for multiple kinetic steps is often taken, incorrectly, as evidence for their absence. However, again here one can note that the observation of the same folding rate for the whole protein population tends to favor a single common pathway rather than multiple independent paths.

Thus, much of available kinetic information is unable to distinguish alternative pathway behaviors, although some observations can be deemed supportive of the distinct pathway model.

Multiple Pathways and Misfolding. Some other optically measured kinetic results have been thought to support multiple pathways, although only a small number. The conflict is often due to the chance occurrence of partial misfolding, which inserts an optional kinetic barrier into the folding pathway, differently affecting the folding of different population fractions (37). In this case kinetic folding will appear heterogeneous and asynchronous, even when all of the molecules fold through the same sequence of intermediate structures. This barrier-based problem is common and has greatly confused protein folding studies. Known optional errors include aggregation (26), partial proline mis-isomerization (38), incorrect pairing (39), nonnative hydrophobic clustering (40), and partial heme mis-ligation (24). (Note: The term “misfolding” has become associated with amyloid formation; we use it in a more general sense.)

In a prime example, folding experiments on the large TIM barrel protein α-Trp synthase found several kinetically distinct population fractions and intermediates, suggesting four parallel folding tracks (41).
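The optional-barrier argument of this section can be sketched with a toy kinetic model. It follows a single sequential pathway U → I → N in which every molecule forms the same intermediate, but a fraction of the population meets an error-dependent barrier that slows the second step. All rate constants and the affected fraction below are hypothetical values chosen for illustration, not fits to any data in this article.

```python
import math


def sequential_native_fraction(k1: float, k2: float, t: float) -> float:
    """Fraction in N at time t for irreversible U -> I -> N (requires k1 != k2)."""
    return 1.0 - (k2 * math.exp(-k1 * t) - k1 * math.exp(-k2 * t)) / (k2 - k1)


def native_fraction(t: float,
                    k_form: float = 80.0,    # U -> I, hypothetical (s^-1)
                    k_fast: float = 10.0,    # I -> N without the error (s^-1)
                    k_slow: float = 0.1,     # I -> N behind the optional barrier (s^-1)
                    f_error: float = 0.4) -> float:  # hypothetical fraction hitting the barrier
    """Population-averaged native fraction for one pathway with an optional barrier."""
    unblocked = sequential_native_fraction(k_form, k_fast, t)
    blocked = sequential_native_fraction(k_form, k_slow, t)
    return (1.0 - f_error) * unblocked + f_error * blocked


if __name__ == "__main__":
    for t in (0.01, 0.1, 1.0, 10.0, 100.0):
        print(f"t = {t:6.2f} s  native fraction = {native_fraction(t):.3f}")
```

The averaged signal shows a fast phase and a slow phase even though every molecule passes through the same intermediate, which is why kinetic heterogeneity of this kind cannot by itself distinguish parallel pathways from a single pathway with an optional barrier.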

Subsequent work found that each additional track could be suppressed, one at a time, by mutational replacement of one or more proline residues or by addition of a prolyl isomerase (42), as expected for a defined pathway interrupted in some fraction of the folding population by optional mis-isomerized proline barriers. In similar work, multiple kinetic folding phases observed by optical methods for hen egg lysozyme (43, 44) and Staphylococcal nuclease (45) were also fit by the authors to multiple pathway models, but it was shown that the data can be fit at least as well by a single pathway in which some fraction of the molecules experience an error that slows their folding (37, 46). In the absence of structural information, it is not possible to distinguish between a multiple pathway interpretation and a given pathway with optional barriers. This can be seen intuitively by considering the insertion of an optional barrier at any step in a well-defined pathway as in Fig. 1A (dashed line).

The common occurrence of on-pathway optional errors has led to other incorrect suggestions: that well-populated kinetic intermediates are grossly misfolded artifacts rather than constructive on-pathway structures with some particular misfolding error; that visible intermediates hinder rather than promote folding because visible intermediates and slowed folding occur together. Other literature results have been interpreted in terms of multiple pathways, either during unfolding at conditions far from native, or during folding but potentially confounded by ensemble averaging, or by complex spectroscopic phases that allow different interpretations, as well as by spurious barriers due to optional errors. In all of these cases, the structural information that is necessary to support a definitive conclusion is absent.

More definitive information comes from the kinetic HX MS experiments illustrated above, which do document a distinct pathway, and from a number of equilibrium-based methods described in the following, which have been able to reveal multiple native-like partially folded on-pathway intermediates, even when simple folding seems to be kinetically two-state.

Intermediates Observed at Equilibrium

Intermediates as Conformationally Excited States. An experiment called equilibrium native state HX, explained in Fig. 5, first detected and described cooperative foldon units (2, 47). The experiment uses low concentrations of denaturant (or other destabilant) to promote sizeable unfolding reactions to the point where they come to dominate the H-exchange of the amides that they expose. The results, reproduced in Fig. 5A, showed that specific structural elements of Cyt c (Fig. 6) repeatedly unfold and refold, accessing partially unfolded high energy states with ΔG° of 4–13 kcal/mol above the native state, corresponding to steady-state populations between 10⁻³ and 10⁻⁹ of the total protein. These results identify foldon unfolding units in terms of their detailed residue composition, specify the free energy of the partially unfolded states relative to the native state, and can measure unfolding and refolding rates.

However, these results do not fully identify the different partially unfolded forms (PUFs). At each intermediate state, sites that have already exchanged to D are invisible (NMR); one cannot tell whether they are structured or not in the given intermediate. Therefore, one cannot tell whether the different foldons simply unfold independently or in a pathway sequence, as posed in Fig. 5E. This is unlike the kinetic HX MS experiments in Figs. 3 and 4, where the pulse labeling approach directly provides a snapshot of the folded condition of all of the residues during the folding process. Ultimately, a series of “stability labeling” experiments showed that the high energy states seen for Cyt c do represent a quantized stepwise series of progressively more unfolded PUFs, as pictured in Fig. 5E (48). In the unfolding direction, the transition to each higher energy PUF unfolds one more foldon in a sequential pathway manner. Because these experiments were done under equilibrium native conditions (pD 7, 30 °C), each uphill unfolding step must be matched by an equivalent refolding step. The downhill sequence defines a stepwise sequential folding pathway.

In detailed confirmation, these equilibrium results identified the same N/C bihelical foldon (blue) as did the kinetic pulse labeling experiment. The pulse labeling experiment places this state as first in the folding sequence; the native state HX experiment places it as last in the unfolding sequence. The initially folded N/C bihelical PUF accumulates in Cyt c kinetic folding when it encounters a histidine to heme mis-ligation barrier; both peripheral histidines of Cyt c are placed on and therefore block formation of the green foldon segment, which is programmed to fold next. An independent kinetic mode native state HX experiment showed that the various foldons unfold in the kinetic order shown in the rising ladder in Fig. 5E. The unfolding rate for a first unfolding step (by EX1 HX) accurately matches the independently measured Cyt c global unfolding rate in two-state unfolding conditions (49).

Distinct native-like pathway intermediates have been found for other proteins by HX pulse labeling, native state HX, and non-HX methods.

Fig. 5. Initial equilibrium native state HX NMR results for Cyt c (47). (A–D) HX rates of many individual Cyt c residues, measured by NMR as a function of low levels of added denaturant far below the melting transition, are plotted in terms of the free energy of the exposure reaction that determines each amide HX rate. HX governed by a small local fluctuation is insensitive to denaturant and produces a horizontal curve. HX determined by a large unfolding reaction is sharply promoted by denaturant and can come to dominate the exchange of the residues that it exposes. The residues that join each cooperative unfolding (large slope) specify the identity of that unfolding unit. The intercept of each HX isotherm defines the free energy level of each PUF at zero denaturant; the slope relates to its surface exposure. These data identified four large unfolding units (foldons), coded as blue, green, yellow, and red. The less definitive red foldon and the infrared foldon not seen here (gray in Fig. 6) were better defined in later work. (E) The free energy levels of the PUFs produced by the individual cooperative unfolding reactions place them on a free energy ladder. The data in A–D specify the identity of each foldon unfolding unit but do not specify the complete PUF produced by each unfolding. Therefore, one cannot tell whether the foldons unfold independently or sequentially or in some other manner. A series of stability labeling experiments defined the PUFs shown (far right). They constitute a stepwise unfolding and refolding pathway, as in Fig. 1A.

pulse labeling, native state HX, and non-HX methods. Silverman and Harbury (50) designed a proteomics method to measure the reactivity of 25 SH side chains in triose phosphate isomerase, analogous to the native state HX experiment. The experiment identified three partial unfolding reactions, and stability labeling experiments showed that they stack up in an unfolding/refolding sequence, as for Cyt c. Sekhar and Kay (51) used NMR dispersion to identify individual partially unfolded forms in several small, supposedly two-state proteins, and supported their role as defined folding pathway intermediates.

All of these results are fully consistent with a classical folding pathway with each intermediate PUF separated from its neighbors by the folding or unfolding of one more foldon.

Fig. 6. The foldon construction of Ribonuclease H and Cyt c. The order of folding is blue, green, yellow, and red, and finally gray for the large bottom Cyt c loop.

Molten Globules as Lowest Energy Folding Intermediates. In his 1995 new view paper (21), Baldwin considered the then-current status of the molten globule hypothesis. Earlier thinking shaped by protein denaturation studies had supposed that proteins are highly cooperative two-state structures and can only occupy, at equilibrium, either their fully native or fully unfolded condition (although see ref. 52). Ptitsyn and coworkers and others found that certain destabilizing conditions, especially low pH, could induce a new, more dynamic, and somewhat expanded protein form, with loose tertiary structure but often considerable secondary structure, which came to be called the molten globule (53–55). Ptitsyn suggested that molten globules represent equilibrium analogs of kinetic folding intermediates. In some cases, HX NMR connected the secondary structure with native-like helical elements, consistent with the Ptitsyn hypothesis. However, Baldwin (21) compared this proposal with expectations from classical and theoretical models, and again here ambiguity prevailed. For example, the pH 4 equilibrium molten globule of apomyoglobin contains the very same native-like A, G, and H helices found by HX pulse labeling during kinetic folding, in line with the Ptitsyn hypothesis, but a Cyt c molten globule contains all three of its native helical segments and not just the N/C bihelical intermediate observed in kinetic HX pulse labeling.

Later work connects the enigmatic structural character of molten globules with the foldon construction of native proteins. A number of partially structured proteins have now been prepared by synthesis or mutation, whether guided by foreknowledge of kinetic folding intermediates or not (56). They mimic natively structured pieces of the native protein. Most incisively, Bai and coworkers (57) used native-state HX to define partially unfolded intermediates of apoCyt b562, and then inserted mutations that selectively destabilize the native state. In the present terms, one or more of the less stable foldons, on the lower rungs of the energy state ladder (as in Fig. 5E), were made to remain unfolded, which caused some higher energy partly unfolded state to become the dominant lowest energy form. Feng et al. solved the structures of two re-engineered versions by NMR and found that both are close mimics of the folding intermediates indicated by native state HX. They are both partially folded and clearly native-like, although they energy minimize to produce some nonnative distortions that shield otherwise exposed hydrophobic side chains.

These results provide a clear picture of the structure of an authentic folding intermediate. They also explain the molten globule ambiguities described in Baldwin’s new view article (21) and elsewhere. A molten globule may emulate, as a free-standing equilibrium species, any one of the quantized intermediate PUFs seen by the kinetic and equilibrium methods just described, depending on how lower energy states are destabilized. Evidently the foldon concept has broad applicability for understanding the range of protein structures.

Protein Folding

The ability to simulate protein folding has been hampered by the immense computer power necessary, by incompletely adequate force fields, and by the difficulty of discerning a meaningful course of events (reaction coordinate) within the vast data files generated. Until recently most efforts have attempted to evade the computational problems by using simplified nonphysical force fields and models. They have not found cooperative foldons and discrete foldon-dependent pathways.

In one exception Weinkam et al. (58) simulated the folding course of a Cyt c mimic without side chains using a modified Go model. The computer was initially told what the target native structure looks like, the calculation was instructed to assign more favorable energy as the mock residues draw closer to their normal partners, a multiatom cooperativity term was added, and outsized influence was given to the heme. The presumed shape and properties of the folding landscape did not enter the calculation except for the energetically downhill tendency. These instructions caused foldon units to emerge and associate to produce a stepwise folding pathway, resembling the Cyt c experiments. This success was considered to show that the experimental Cyt c result depends especially on the influence of the heme group, but other proteins with no prosthetic group are now known to fold through distinct intermediates and pathways. The significance of the Cyt c calculation is that it tends to identify the factors that determine the foldon-based behavior. As for any mathematical derivation, the factors that determine the output result must be coded into the initial premises. In the mock Cyt c simulation, the important factor seems to be the added cooperativity term, as emphasized in the foldon hypothesis.

A new generation of theoretical analysis with real proteins in realistic force fields and enhanced computer capabilities is overcoming the calculational difficulties in other ways (59–61). Gathering results from these

approaches tend to emulate the foldon-based distinct pathway picture.

Discussion

This article considers the fundamental questions of protein folding, previously answered so differently by the classical and new view models. How do proteins fold, and why do they fold in that way? Extensive experience with the folding problem over a 50-y period has shown that clear structural information on the intermediate states that bridge between the unfolded and native states will be required. Experimentation has developed three useful approaches. Folding intermediates can be studied as significantly populated forms during kinetic folding, or as conformationally excited forms present at equilibrium under native conditions, or as equilibrium molten globule forms. Structural results from these different approaches converge on the same conclusions.

The Foldon Hypothesis. In all of these observations, cooperative foldon units play a pivotal role. Foldon units were first discovered and characterized in the initial native state HX experiment (2, 47). The experiment showed that native Cyt c at equilibrium under native conditions repeatedly unfolds and refolds. A series of experiments showed that the foldon unfolding reactions occur in a sequential pathway-like manner (48) rather than independently (Fig. 5). That chain of research was rather complex; it developed over a period of years and has evidently been difficult for most investigators to follow. However, reversible partial unfolding and refolding steps have now been seen in various ways for many proteins, and they have often been connected to the protein folding process. Most pointedly, a recently advanced HX MS capability made it possible to observe matching behavior as it occurs during kinetic folding for MBP (29), RNase H (30), and Cyt c, as just described. In all cases one sees that unfolding and refolding proceed in steps that subtract or add one more native-like cooperative foldon unit. The detailed foldon construction of Cyt c and RNase H is illustrated in Fig. 6. Both fold by first forming their blue foldon, then an immediately adjacent foldon to form the blue + green PUF, and so on.

The centrally important point is this: contrary to previous belief, proteins are multistate objects built from separately cooperative foldon units. This fundamental insight leads to a foldon-based hypothesis that suggests the “how” and the “why” of protein folding. The cooperative foldon construction of proteins predisposes them to unfold and refold through foldon-determined steps. The discrete steps produce an ordered repeatable macroscopic folding pathway because previously formed foldons tend to guide and stabilize the formation of incoming foldons that they are designed to interact with in the native protein.

Time and Energy. A successful folding model must resolve major questions concerning folding time and energy. Levinthal pointed out that the vast array of protein conformations in unfolded space cannot simply reequilibrate and reach the unique native state by an undirected random search in any reasonable time (3, 4). Early theoretical work therefore focused on the downhill energetic drive and the many independent routes that heterogeneity and microscopic thermal searching alone seemed to require. The new view answer to the “why” question is that, from the microscopic point of view, there seems to be no other viable choice.

Experimental work recounted here reveals an emergent macroscopic behavior that provides a previously unrecognized mechanism. Random search does not have to carry the protein all of the way to the native state. It only needs to accomplish the formation of a first native-like foldon. This process is thermodynamically downhill and is guided by the inherent cooperativity of native foldon units. Present information indicates that the first-formed foldon tends to be stable in the context of the rest of the protein (27, 62). The still-unfolded regions can shield and energy minimize unfavorably exposed groups, as in the molten globule situation described before. The time scale for forming a first foldon unit by an unguided search, perhaps two segments ∼20 residues in length, is shorter by far than for a reference 100-residue protein [$3^{100}/(2 \times 3^{20}) \sim 10^{40}$]. The formation of subsequent foldons must proceed by way of similar microscopic searching but in a more guided way analogous to the process of “folding upon binding.” The concept that proteins start folding by forming a native-like structural nucleus has been widely accepted (33). This minimal structure can be sufficient to seed subsequent foldon–foldon interaction steps in a sequence of more guided searches that follow through, rapidly, to the native target.

Does this process have the energetic bias necessary to select specific folding steps and drive folding to completion in a short time? Zwanzig et al. (63) calculated that a free energy bias of 2 kT toward correct interactions is necessary for a folding sequence to complete on a time scale of seconds. It should be appreciated that this degree of bias, more than 1 kcal/mol, is unreasonable at the individual residue level. A single residue has very low probability for finding its correct native partners in a sea of nonnative alternatives. Certainly, microscopic thermal searching must underlie any structure formation process. However, given the required energy bias computed by Zwanzig et al., it seems that microscopic-level searching alone cannot swiftly reach the native state.

By contrast, in a more macroscopic foldon-based scenario each correct native-like choice is driven by the collective energy of many interaction sites held stereochemically in a native-like geometry in partner foldons. This mechanism has been described before as sequential stabilization (48). It is analogous to the well-known folding upon binding process, except that here the incoming disordered segment is advantageously tethered to its already structured partner. The macroscopic foldon-level factors provide both the qualitative structural basis and the quantitative energetic bias required to rapidly and repeatably select discrete determinate pathway steps in competition with all of the other possible alternatives.

Conclusions

The supposed conflict between the classical and new views can be resolved by the realization that they touch on different but equally essential parts of the folding mechanism. Laboratory experiment is able to discern macroscopic molecular behavior, but it is blind to the microscopic thermally driven amino acid-level searching behavior that has been the domain of theoretical analysis. The disordered microscopic multitrack search envisioned in the paradigmatic new view model describes the initial stage amino acid-level search to form cooperative native-like foldon structures, but not the final native state. Experiment displays an emergent foldon-based macroscopic behavior that provides the structural guidance and free energy bias for the ordered stepwise formation of discrete native-like intermediates in a folding pathway that leads to the native state.

Folding in moderately small, separately cooperative units may be necessary for proteins to fold at all. A much larger step size would confront the Levinthal time scale problem; much smaller steps cannot assemble the energy bias required by the Zwanzig criterion for fast folding. Thus, as before for the microscopic view, it may be that there is no other viable choice. Efficient folding may well require foldon-based protein folding pathways. However, here a related constraint enters. Because the essential folding intermediates closely duplicate native structure, as perhaps they must in a reasonable pathway sequence, it seems that the same requirement has reciprocally shaped the foldon-based nature of native protein structure. In respect to foldon-based folding and foldon-based native structure, it seems that each necessitates the other, and that protein-based biology may require both.

ACKNOWLEDGMENTS. We thank K. A. Dill, M. F. Gellert, S. Lund-Katz, V. S. Pande, M. C. Phillips, G. D. Rose, T. R. Sosnick, A. Szabo, A. J. Wand, and members of our laboratory for helpful contributions. This work was supported by National Institutes of Health Grant RO1 GM031847, National Science Foundation Grant MCB1020649, and the Mathers Charitable Foundation.

Englander and Mayne PNAS | November 11, 2014 | vol. 111 | no. 45 | 15879
www.pnas.org/cgi/doi/10.1073/pnas.1411798111
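The two numerical statements in the Time and Energy discussion, the search-space ratio between a 100-residue chain and a first foldon, and the size of a 2 kT bias in kcal/mol, can be checked with a few lines of arithmetic. The Python sketch below is only an illustration under stated assumptions: the function names are hypothetical, the three conformations per residue are read off the bracketed expression in the text, and the temperature of 298.15 K is chosen here and is not taken from the cited calculations.

```python
import math

# Molar gas constant in kcal/(mol*K); kT per mole equals R*T.
GAS_CONSTANT_KCAL = 1.987204e-3


def log10_search_ratio(protein_len=100, segment_len=20,
                       n_segments=2, states_per_residue=3):
    """Return log10 of states^protein_len / (n_segments * states^segment_len).

    Assumption: every residue samples `states_per_residue` conformations
    independently, as in a Levinthal-style counting argument.
    """
    whole_chain = protein_len * math.log10(states_per_residue)
    first_foldon = (math.log10(n_segments)
                    + segment_len * math.log10(states_per_residue))
    return whole_chain - first_foldon


def bias_kcal_per_mol(n_kt=2.0, temperature_k=298.15):
    """Convert an energy bias given in units of kT into kcal/mol."""
    return n_kt * GAS_CONSTANT_KCAL * temperature_k


if __name__ == "__main__":
    print(f"search-space ratio ~ 10^{log10_search_ratio():.1f}")
    print(f"2 kT at 298.15 K = {bias_kcal_per_mol():.2f} kcal/mol")
```

Under these assumptions the bracketed expression evaluates to roughly 10^38, the same astronomically large scale as the rounded figure quoted in the text, and a 2 kT bias comes to about 1.2 kcal/mol, in line with the statement that it exceeds 1 kcal/mol.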

1 Bai Y, Englander JJ, Mayne L, Milne JS, Englander SW (1995) Thermodynamic parameters from hydrogen exchange measurements. Methods Enzymol 259:344–356.
2 Bai Y, Englander SW (1996) Future directions in folding: The multi-state nature of protein structure. Proteins 24(2):145–151.
3 Levinthal C (1968) Are there pathways for protein folding. J Chim Phys 65:44–45.
4 Levinthal C (1969) How to fold graciously. Mossbauer Spectroscopy in Biological Systems. Proceedings University of Illinois Bulletin (University of Illinois Press, Urbana, IL), pp 22–24.
5 Luheshi LM, Crowther DC, Dobson CM (2008) Protein misfolding and disease: From the test tube to the . Curr Opin Chem Biol 12(1):25–31.
6 Anfinsen CB, Haber E, Sela M, White FH, Jr (1961) The kinetics of formation of native ribonuclease during oxidation of the reduced polypeptide chain. Proc Natl Acad Sci USA 47:1309–1314.
7 Sali A, Shakhnovich E, Karplus M (1994) How does a protein fold? Nature 369(6477):248–251.
8 Bryngelson JD, Onuchic JN, Socci ND, Wolynes PG (1995) Funnels, pathways, and the energy landscape of protein folding: A synthesis. Proteins 21(3):167–195.
9 Dill KA, Chan HS (1997) From Levinthal to pathways to funnels. Nat Struct Biol 4(1):10–19.
10 Plotkin SS, Onuchic JN (2002) Understanding protein folding with energy landscape theory. Part I: Basic concepts. Q Rev Biophys 35(2):111–167.
11 Kussell E, Shimada J, Shakhnovich EI (2003) Side-chain dynamics and protein folding. Proteins 52(2):303–321.
12 Leopold PE, Montal M, Onuchic JN (1992) Protein folding funnels: A kinetic approach to the sequence-structure relationship. Proc Natl Acad Sci USA 89(18):8721–8725.
13 Wolynes PG, Onuchic JN, Thirumalai D (1995) Navigating the folding routes. Science 267(5204):1619–1620.
14 Oliveberg M, Wolynes PG (2005) The experimental survey of protein-folding energy landscapes. Q Rev Biophys 38(3):245–288.
15 Onuchic JN, Luthey-Schulten Z, Wolynes PG (1997) Theory of protein folding: The energy landscape perspective. Annu Rev Phys Chem 48:545–600.
16 Onuchic JN, Nymeyer H, García AE, Chahine J, Socci ND (2000) The energy landscape theory of protein folding: Insights into folding mechanisms and scenarios. Adv Protein Chem 53:87–152.
17 Plotkin SS, Onuchic JN (2002) Understanding protein folding with energy landscape theory. Part II: Quantitative aspects. Q Rev Biophys 35(3):205–286.
18 Kim PS, Baldwin RL (1982) Specific intermediates in the folding reactions of small proteins and the mechanism of protein folding. Annu Rev Biochem 51:459–489.
19 Kim PS, Baldwin RL (1990) Intermediates in the folding reactions of small proteins. Annu Rev Biochem 59:631–660.
20 Baldwin RL (2008) The search for folding intermediates and the mechanism of protein folding. Annu Rev Biophys 37:1–21.
21 Baldwin RL (1995) The nature of protein folding pathways: the classical versus the new view. J Biomol NMR 5(2):103–109.
22 Withers P (2013) Landing spacecraft on Mars and other planets: An opportunity to apply introductory physics. Am J Phys 81:565–569.
23 Udgaonkar JB, Baldwin RL (1988) NMR evidence for an early framework intermediate on the folding pathway of ribonuclease A. Nature 335(6192):694–699.
24 Roder H, Elöve GA, Englander SW (1988) Structural characterization of folding intermediates in cytochrome c by H-exchange labelling and proton NMR. Nature 335(6192):700–704.
25 Nawrocki JP, Chu R-A, Pannell LK, Bai Y (1999) Intermolecular aggregations are responsible for the slow kinetics observed in the folding of cytochrome c at neutral pH. J Mol Biol 293(5):991–995.
26 Silow M, Oliveberg M (1997) Transient aggregates in protein folding are easily mistaken for folding intermediates. Proc Natl Acad Sci USA 94(12):6084–6086.
27 Krishna MM, Lin Y, Mayne L, Englander SW (2003) Intimate view of a kinetic protein folding intermediate: Residue-resolved structure, interactions, stability, folding and unfolding rates, homogeneity. J Mol Biol 334(3):501–513.
28 Krishna MMG, Hoang L, Lin Y, Englander SW (2004) Hydrogen exchange methods to study protein folding. Methods 34(1):51–64.
29 Walters BT, Mayne L, Hinshaw JR, Sosnick TR, Englander SW (2013) Folding of a large protein at high structural resolution. Proc Natl Acad Sci USA 110(47):18898–18903.
30 Hu W, et al. (2013) Stepwise protein folding at near amino acid resolution by hydrogen exchange and mass spectrometry. Proc Natl Acad Sci USA 110(19):7684–7689.
31 Krishna MM, Maity H, Rumbley JN, Englander SW (2007) Branching in the sequential folding pathway of cytochrome c. Protein Sci 16(9):1946–1956.
32 Sosnick TR, Mayne L, Englander SW (1996) Molecular collapse: The rate-limiting step in two-state cytochrome c folding. Proteins 24(4):413–426.
33 Fersht AR (1997) Nucleation mechanisms in protein folding. Curr Opin Struct Biol 7(1):3–9.
34 Fersht AR, Sato S (2004) Phi-value analysis and the nature of protein-folding transition states. Proc Natl Acad Sci USA 101(21):7976–7981.
35 Naganathan AN, Muñoz V (2010) Insights into protein folding mechanisms from large scale analysis of mutational effects. Proc Natl Acad Sci USA 107(19):8611–8616.
36 Pandit AD, Krantz BA, Dothager RS, Sosnick TR (2007) Characterizing protein folding transition states using Psi-analysis. Methods Mol Biol 350:83–104.
37 Krishna MM, Englander SW (2007) A unified mechanism for protein folding: predetermined pathways with optional errors. Protein Sci 16(3):449–464.
38 Wedemeyer WJ, Welker E, Scheraga HA (2002) Proline cis-trans isomerization and protein folding. Biochemistry 41(50):14637–14644.
39 Song MC, Scheraga HA (2000) Formation of native structure by intermolecular thiol-disulfide exchange reactions without oxidants in the folding of bovine A. FEBS Lett 471(2-3):177–181.
40 Klein-Seetharaman J, et al. (2002) Long-range interactions within a nonnative protein. Science 295(5560):1719–1722.
41 Bilsel O, Zitzewitz JA, Bowers KE, Matthews CR (1999) Folding mechanism of the alpha-subunit of , an alpha/ protein: Global analysis highlights the interconversion of multiple native, intermediate, and unfolded forms through parallel channels. Biochemistry 38(3):1018–1029.
42 Wu Y, Matthews CR (2003) Proline replacements and the simplification of the complex, parallel channel folding mechanism for the alpha subunit of Trp synthase, a TIM barrel protein. J Mol Biol 330(5):1131–1144.
43 Bieri O, Wildegger G, Bachmann A, Wagner C, Kiefhaber T (1999) A salt-induced kinetic intermediate is on a new parallel pathway of lysozyme folding. Biochemistry 38(38):12460–12470.
44 Radford SE, Dobson CM, Evans PA (1992) The folding of hen lysozyme involves partially structured intermediates and multiple pathways. Nature 358(6384):302–307.
45 Kamagata K, Sawano Y, Tanokura M, Kuwajima K (2003) Multiple parallel-pathway folding of proline-free Staphylococcal nuclease. J Mol Biol 332(5):1143–1153.
46 Bédard S, Krishna MM, Mayne L, Englander SW (2008) Protein folding: Independent unrelated pathways or predetermined pathway with optional errors. Proc Natl Acad Sci USA 105(20):7182–7187.
47 Bai Y, Sosnick TR, Mayne L, Englander SW (1995) Protein folding intermediates: Native-state hydrogen exchange. Science 269(5221):192–197.
48 Englander SW, Mayne L, Krishna MM (2007) Protein folding and misfolding: Mechanism and principles. Q Rev Biophys 40(4):287–326.
49 Hoang L, Bedard S, Krishna MM, Lin Y, Englander SW (2002) Cytochrome c folding pathway: Kinetic native-state hydrogen exchange. Proc Natl Acad Sci USA 99(19):12173–12178.
50 Silverman JA, Harbury PB (2002) The pathway of a (beta/alpha)8 barrel. J Mol Biol 324(5):1031–1040.
51 Sekhar A, Kay LE (2013) NMR paves the way for atomic level descriptions of sparsely populated, transiently formed biomolecular conformers. Proc Natl Acad Sci USA 110(32):12867–12874.
52 Mayne L, Englander SW (2000) Two-state vs. multistate protein unfolding studied by optical melting and hydrogen exchange. Protein Sci 9(10):1873–1877.
53 Ptitsyn OB (1995) How the molten globule became. Trends Biochem Sci 20(9):376–379.
54 Ptitsyn OB (1995) Molten globule and protein folding. Adv Protein Chem 47:83–229.
55 Arai M, Kuwajima K (2000) Role of the molten globule state in protein folding. Adv Protein Chem 53:209–282.
56 Peng ZY, Wu LC (2000) Autonomous protein folding units. Adv Protein Chem 53:1–47.
57 Feng H, Zhou Z, Bai Y (2005) A protein folding pathway with multiple folding intermediates at atomic resolution. Proc Natl Acad Sci USA 102(14):5026–5031.
58 Weinkam P, Zong C, Wolynes PG (2005) A funneled energy landscape for cytochrome c directly predicts the sequential folding route inferred from hydrogen exchange experiments. Proc Natl Acad Sci USA 102(35):12401–12406.
59 Lindorff-Larsen K, Piana S, Dror RO, Shaw DE (2011) How fast-folding proteins fold. Science 334(6055):517–520.
60 Pande VS (2014) Understanding protein folding using Markov state models. Adv Exp Med Biol 797:101–106.
61 Adhikari AN, Freed KF, Sosnick TR (2013) Simplified protein models: Predicting folding pathways and structure using amino acid sequences. Phys Rev Lett 111(2):028103.
62 Hughson FM, Wright PE, Baldwin RL (1990) Structural characterization of a partly folded apomyoglobin intermediate. Science 249(4976):1544–1548.
63 Zwanzig R, Szabo A, Bagchi B (1992) Levinthal’s paradox. Proc Natl Acad Sci USA 89(1):20–22.
