<<

PHYSICS, BIOPHYSICS AND COMPUTATIONAL BIOLOGY BIOPHYSICS AND COMPUTATIONAL BIOLOGY Correction for ‘‘The role of fluctuations and stress on the Correction for ‘‘Exploring the folding energy landscape of a effective viscosity of cell aggregates,’’ by Philippe Marmottant, series of designed consensus tetratricopeptide repeat ,’’ Abbas Mgharbel, Jos Ka¨fer,Benjamin Audren, Jean-Paul Rieu, by Yalda Javadi and Ewan R. G. Main, which appeared in issue Jean-Claude Vial, Boudewijn van der Sanden, Athanasius F. M. 41, October 13, 2009, of Proc Natl Acad Sci USA (106:17383– Mare´e, Franc¸ois Graner, and He´le`ne Delanoe¨-Ayari, which 17388; first published October 1, 2009; 10.1073/pnas. appeared in issue 41, October 13, 2009, of Proc Natl Acad Sci 0907455106). USA (106:17271–17275; first published September 25, 2009; The authors note that, due to a printer’s error, some text 10.1073/pnas.0902085106). appeared incorrectly on page 17384. In the left column, the The authors note that, due to a printer’s error, the affiliation second sentence in the third full paragraph, ‘‘This yielded [D]50% for Abbas Mgharbel, Benjamin Audren, Jean-Paul Rieu, and (midpoint of unfolding) and mD-N (change in solvent-accessible ⌬ H2O ⌬ H2O He´le`ne Delanoe¨-Ayari at Universite´ de Lyon appeared incor- surface area upon unfolding) from which GD–N/ G03j rectly in part. The name of the authors’ laboratory, Laboratoire (free energy of unfolding in water) was calculated (Table S1),’’ Mate´riaux et Phe´nome`nes Quantiques, should instead have should instead have appeared as ‘‘This yielded [D]50% (midpoint appeared as Laboratoire de Physique de la Matie`re Condense´e of unfolding) and mD-N (change in solvent-accessible surface ⌬ H2O et Nanostructures. The corrected affiliation line appears below. area upon protein unfolding) from which GD–N (free energy of unfolding in water) was calculated (Table S1).’’ In the right a b ¨ a Philippe Marmottant , Abbas Mgharbel , Jos Kafer , column, the sentence beginning on line 17, ‘‘All of the values of b b a Benjamin Audren , Jean-Paul Rieu , Jean-Claude Vial , ⌬ H2O ⌬ H2O GD–N/ G03j calculated are within error for urea and GuHCl Boudewijn van der Sandenc, Athanasius F. M. Mare´ ed, denaturations,’’ should instead have appeared as ‘‘All of the a,e b ⌬ H2O Franc¸ois Graner , and He´ le` ne Delanoe¨ -Ayari values of G03j calculated are within error for urea and GuHCl denaturations.’’ These errors do not affect the conclusions of the aLaboratoire de Spectrome´trie Physique, Unite´Mixte de Recherche 5588, Universite´Grenoble I and Centre National de la Recherche Scientifique, 140 article. Avenue de la Physique, F-38402 Martin d’He`res Cedex, France; bLaboratoire de Physique de la Matie`re Condense´e et Nanostructures, Centre National www.pnas.org/cgi/doi/10.1073/pnas.0911552106 de la Recherche Scientifique, Unite´Mixte de Recherche 5586, Universite´de Lyon, Universite´Lyon 1, 43 Boulevard du 11 novembre 1918, F-69622 Villeurbanne Cedex, France; cInstitut des Neurosciences Grenoble, Institute National de la Sante´et de la Recherche Me´dicale U836, Chemin Fortune´ Ferrini, BP 170, F-38042 Grenoble Cedex 9, France; dTheoretical Biology/Bioinformatics, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands; and eBiologie du De´veloppement, Unite´Mixte de Recherche 3215 Institut Curie and Centre National de la Recherche Scientifique, Institut National de la Sante´et de la Recherche Me´dicale U934, 26 Rue d’Ulm, F-75248 Paris Cedex 05, France

www.pnas.org/cgi/doi/10.1073/pnas.0911892106

19204 ͉ www.pnas.org Downloaded by guest on September 29, 2021 Exploring the folding energy landscape of a series of designed consensus tetratricopeptide repeat proteins

Yalda Javadi and Ewan R. G. Main1

Centre for Structural and Chemical Biology, Department of Chemistry and Biochemistry, University of Sussex, Falmer near Brighton BN1 9QJ, United Kingdom

Edited by Peter G. Wolynes, University of California, San Diego, La Jolla, CA, and approved August 18, 2009 (received for review July 8, 2009) Repeat proteins contain short, tandem arrays of simple structural 16). Recently, these studies have been complemented by folding motifs (20؊40 aa). These stack together to form nonglobular simulations that show and predict that the cooperativity of structures that are stabilized by short-range interactions from repeat decouples on increasing repeat number or residues close in primary sequence. Unlike globular proteins, they where high local frustration occurs (17–19). have few, if any, long-range nonlocal stabilizing interactions. One Designed consensus repeat proteins provide an excellent ubiquitous repeat is the tetratricopeptide motif (TPR), a 34-aa system for investigating the fundamental properties of repeat helix-turn-helix motif. In this article we describe the folding kinet- proteins, because each repeat has identical intra- and inter- ics of a series of 7 designed TPR proteins that are assembled from repeat interactions. Thus, designed repeat proteins are more arraying identical designed consensus repeats (CTPRan). These structurally symmetric than natural repeat proteins and can range from the smallest 2-repeat protein to a large 10-repeat easily be extended or shortened by adding or removing whole protein (Ϸ350 aa). In particular, we describe how the energy repeats. This ability provides a unique and exquisitely tuneable landscape changes with the addition of repeat units. The data perturbation that differs radically from normal amino acid reveal that although the CTPRa proteins have low local frustration, mutations and enables a wider exploration of repeat protein their highly symmetric, modular native structure is reflected in folding energy landscapes. For example, dependence of stability, their multistate kinetics of unfolding and folding. Moreover, al- folding rate, cooperativity, and thus folding pathway on repeat though the initial folding of all CTPRan proteins involves a nucleus number can be explored by engineering a series of proteins that with similar solvent accessibility, their subsequent folding to the increase in size through addition of identical consensus motifs. native structure depends directly on repeat number. This corre- In particular, we and the Regan laboratory have used 2 series of sponds to an increasingly complex landscape that culminates in designed consensus TPRs (a 34-aa, helix-turn-helix motif), called CTPRa10 populating a misfolded, off-pathway intermediate. These CTPR and CTPRa proteins, to investigate the dependence of results extend our current understanding of the malleable folding thermodynamic characteristics and denatured state on increas- pathways of repeat proteins and highlight the consequences of ing repeat number (Fig. 1; the 2 series only differ by a 2-aa adding identical repeats to the energy landscape. substitution per repeat) (20–23). These studies have highlighted the 1-dimensionality and changing equilibrium properties of design protein ͉ kinetic traps ͉ misfolding ͉ protein folding increasing repeat number by showing that the thermodynamic unfolding transition can be described and, importantly, predicted by an Ising model that uses nearest-neighbor interactions within epeat proteins consist of tandem arrays of simple structural a 1-dimensional lattice (22). Such a description implies that motifs that consist of 20–40 residues. These modules stack R thermodynamically, individual repeats unfold independently of together to form elongated, nonglobular protein folds. They are each other and thus can populate partially folded configurations. highly abundant, widespread in nature and, second only to This observation was confirmed when Cortajarena et al. used immunoglobulins, are the most common protein class to spe- equilibrium, NMR-detected hydrogen/deuterium exchange to cialize in protein–protein interactions (1). Examples of repeat observe sequential unfolding transitions for 2 CTPR proteins proteins include tetratricopeptide (TPR), ankyrin, and leucine- containing 2 and 3 repeats, respectively (20). However, we have rich (LRR) repeats (2, 3). Unusually, unlike globular proteins, also previously shown that kinetic folding of these 2 proteins at these repeat proteins do not rely on complex long-range stabi- 20 °C was 2-state over the limited denaturant concentration lizing interactions. Instead their recurring modular architectures range measured (23). These studies have been further comple- are dominated by regularized short-range interactions (both mented by folding simulations showing that although the CTPRs inter- and intrarepeat). This distinctive feature, which results in possess very low inter- and intrarepeat frustration, their coop- a quasi–1-dimensional structure, has made them extremely erativity of folding as interpreted by correlation length is roughly attractive targets as models for protein folding and design 3 repeats (18). studies. In this study we investigated the kinetic folding cooperativity The keen interest in repeat proteins has led to successful and folding energy landscapes on increasing repeat protein protein design of consensus TPR (4), ankyrin (5, 6), and LRR length by comparing the folding kinetics of the largest series of proteins (7) and extensive stability and folding studies on natural highly symmetric designed repeat proteins to date. This consists ␤ proteins (a 33-residue repeat that forms a -turn of 7 designed CTPRa proteins that range from 2.5 to 10.5 repeats ␣ followed by 2 antiparallel -helices) (8–14). These studies have (86–358 aa). Our results show that as repeat number increases begun to dissect the equilibrium cooperativity and kinetic fold- the landscape becomes more complex, with intermediates in- BIOPHYSICS AND ing pathways of repeat proteins. They show that folding is COMPUTATIONAL BIOLOGY initiated in the most thermodynamically stable unit, with the subsequent route through the energy landscape to the native Author contributions: Y.J. and E.R.G.M. designed research; Y.J. performed research; Y.J. state being governed by a competition between the stability of and E.R.G.M. analyzed data; and Y.J. and E.R.G.M. wrote the paper. individual repeat units and the interactions between repeats. For The authors declare no conflict of interest. example, if repeats within a protein have similar stabilities and This article is a PNAS Direct Submission. weak inter-repeat interactions the protein is more likely to have 1To whom correspondence should be addressed. E-mail: [email protected]. equilibrium intermediates and to fold via a very malleable This article contains supporting information online at www.pnas.org/cgi/content/full/ pathway (more than 1 folding pathway can be accessed) (8, 9, 15, 0907455106/DCSupplemental. www.pnas.org͞cgi͞doi͞10.1073͞pnas.0907455106 PNAS ͉ October 13, 2009 ͉ vol. 106 ͉ no. 41 ͉ 17383–17388 Fig. 1. Ribbon representation of the crystal structures of (A) CTPR2 [ (PDB) entry: 1NA3], (B) CTPR3 (PDB entry: 1NA0), and (C) CTPR8 (PDB entry: 2AVP). The figure was prepared using PYMOL. creasingly populated, and interestingly, suggest that off-pathway intermediates become populated for the largest protein. Thus, although the CTPRa constructs have low intra- and inter-repeat frustration, on increasing repeat number their highly symmetric and modular native topology causes their cooperative kinetic folding to uncouple and eventually misfold to form kinetic traps. Finally, the folding pathway of this unique TPR system is compared with recently published work on the folding of other types of repeat protein (8, 17, 18).

Results Fig. 2. GuHCl-induced equilibrium unfolding experiments of the CTPRa proteins at 10 °C monitored by (A) ellipticity and (B) fluorescence [CTPRa2 (3 Structure of CTPRa Proteins. The consensus TPR proteins ␮M) (ⅷ), CTPRa3 (3 ␮M) (⅜), CTPRa4 (3 ␮M) (f), CTPRa5 (1 ␮M) (Ⅺ), CTPRa6 (1 (CTPRan) were built from arraying multiple copies (n) of a 34-aa ␮M) (᭜), CTPRa8 (1 ␮M) (‚), and CTPRa10 (1 ␮M) (Œ)]. Solid lines correspond idealized sequence with a C-terminal single ‘‘solvating’’ helix (22) to the global best fits to a description based on the 1-dimensional Ising model [Fig. 1 and supporting information (SI) Fig. S1A]. All proteins are (see SI Appendix for details on analysis) (22). stable (Fig. 2) and adopt the distinctive TPR fold with the unique feature of possessing identical modular structures. Recently a number of studies have analyzed multistate equi- Equilibrium Stability of CTPRa Proteins at 10 °C. Equilibrium chem- librium repeat unfolding using an Ising model (22, 26). This has ical denaturations using guanidine hydrochloride (GuHCl) and been used, as the repetitive and modular repeat structure of urea were performed at 10 °C and at pH 7.0. Ellipticity at 222 nm repeat proteins coupled with the systematic variation of transi- and fluorescence at 340 nm was monitored as a function of tion midpoint and mD-N mimic the nearest-neighbor coupling of denaturant concentration to follow each structural transition the Ising model. Therefore the data presented here were globally (Fig. 2 and Fig. S1D). All of the CTPRa proteins underwent a fit using the same Ising model as used when the data were fit at ␣ single reversible transition that corresponded to the concurrent 25 °C (22). This gave xc (midpoint of unfolding of a single -helix ␣ loss of native secondary [far-UV circular dichroism (CD)–␣- in the protein), m1 (denaturant dependence of a single -helix in helical] and tertiary (fluorescence) structure, to a denatured the protein), J (the coupling energy between ␣-helices), from state that lacked both. This was consistent with previously which H (half of the difference in free energy between the folded reported data at 20 °C and 25 °C (22, 23). and denatured states of a single helix in the absence of coupling ⌬ H2O Data from these denaturations were initially analyzed by to its neighbors) (Table S2). From these data G03j (the free individually fitting each protein to a 2-state model (24). This energy of folding in water for a protein with j ␣-helices) was yielded [D]50% (midpoint of unfolding) and mD-N (change in calculated for a helix, a repeat (2 helices), 1.5 repeats (3 helices) solvent-accessible surface area upon protein unfolding) from (Ϫ2.4 kcal molϪ1, Ϫ1.0 kcal molϪ1, and 0.4 kcal molϪ1, respec- ⌬ H2O ⌬ H2O tively), and each CTPRa protein (Table 1 and Table S2). All of which GD-N/ G03j (free energy of unfolding in water) was ⌬ H2O ⌬ H2O calculated (Table S1). However, previous NMR and equilibrium the values of GD-N/ G03j calculated are within error for urea denaturation studies on such designed TPRs have shown that, and GuHCl denaturations. As can be seen, the Ising analysis despite the apparent cooperative equilibrium unfolding, inter- correctly shows that individual helices or repeats are not stable/ mediate states are populated through the denaturation transition unfolded at 10 °C. To obtain a folded and stable protein at least (21–23). Our data support this because our mD-N values do not 1.5 consecutive repeats are required [as shown previously (23, increase in direct proportion to chain length of the CTPRa 27)]. It is interesting to note that for CTPRa2 and CTPRa3 the ⌬ H2O ⌬ H2O proteins. Moreover, differential scanning calorimetry was per- GD-N/ G03j from 2-state and Ising models are within error, formed that showed the larger proteins to be no longer thermally but as the proteins increase in size there is a disparity between unfolded in a 2-state manner (25). results, as one might expect when intermediates are present.

17384 ͉ www.pnas.org͞cgi͞doi͞10.1073͞pnas.0907455106 Javadi and Main Table 1. Comparison of kinetic ⌬G with equilibrium ⌬G and ␤-Tanford values

⌬ H2O Ϫ1 ⌬ H2O Ϫ1 † ⌬ H2O Ϫ1 ‡ ⌬ H2O Ϫ1 § ␤ ␤ ␤ CTPRan G03j (kcal mol )* GI-U (kcal mol ) GI-N (kcal mol ) GU-N (kcal mol ) T TS1 T Intermediate T TS2 2 3.3 Ϯ 0.9 - - 2.8 Ϯ 0.4 0.56 - - 3 6.1 Ϯ 1.2 4.0 Ϯ 1.5 5.1 Ϯ 1.5 9.1 Ϯ 2.1 0.35 0.49 0.58 4 9.0 Ϯ 1.5 3.8 Ϯ 1.4 5.9 Ϯ 1.7 9.8 Ϯ 2.2 0.37 0.47 0.49 5 11.8 Ϯ 1.9 4.2 Ϯ 1.2 6.4 Ϯ 2.4 10.6 Ϯ 2.7 0.30 0.45 0.46 6 14.7 Ϯ 2.2 4.3 Ϯ 0.6 10.7 Ϯ 2.5 15.0 Ϯ 2.6 0.24 0.34 0.36 8 20.4 Ϯ 2.8 4.0 Ϯ 0.3 14.6 Ϯ 2.6 18.6 Ϯ 2.6 0.16 0.25 0.26 10 26.1 Ϯ 3.5 4.3 Ϯ 0.4 17.8 Ϯ 3.7 22.2 Ϯ 3.7 0.15 0.23 0.23

All errors from kinetic data are obtained from propagation of errors obtained from the fitting of the data (shown in Table S3). *Values averaged from the Ising fit of all equilibrium data and using Eq. 5 (SI Appendix); errors obtained from the propagation of a standard deviation of 3 averaged data sets (shown in Table S2). † ⌬ H2O ϭϪ H2O H2O Calculated from kinetic data using GU-1 RTln(kIU /kUI ). ‡ ⌬ H2O ϭϪ H2O H2O Calculated from kinetic data using GI-N RTln(kNI /kIN ). § ⌬ H2O ϭϪ H2O H2O H2O H2O ␤ ͚ ͚ ͚ Calculated from kinetic data using GU-N RTln(kIU kNI /kUI kIN ). T values were obtained using xmx/ imi, where xmx is the sum of kinetic m-values between the unfolded state and state X on the reaction coordinate, and ͚imi is the sum of all kinetic m-values along the reaction coordinate.

Unfolding/Refolding Kinetics of CTPRa Proteins. The unfolding and interesting to note that the point at which the refolding limb for refolding kinetics of CTPRa2 to CTPRa10 were measured using each protein kinks is essentially the same and coincides with the stopped-flow fluorescence spectroscopy as a function of GuHCl linear refolding arm of CTPRa2. In a similar manner to the concentration. This follows the changes in the fluorescence of refolding kinetics, the unfolding arms of CTPRa3 to CTPRa10 the tryptophan and tyrosine residues found in each repeat. The chevrons have a downward curvature that becomes more curved unfolding and refolding kinetics of the repeat protein constructs with increasing repeat number. However, the scale of the were observed to be very rapid, therefore experiments were rollover and the kinks is less severe. performed at 10 °C to expand the range of GuHCl concentration over which data could be collected. This permitted the refolding of CTPRa4 to CTPRa10 to be followed down to 0 M GuHCl, with CTPRa2 and CTPRa3 measured down to 0.7 M and 0.54 M GuHCl, respectively. Within these ranges all of the CTPRa protein kinetics were best described by a monophasic process that fit well to single exponential equation (Fig. S2 A and B show typical kinetic traces for unfolding and refolding, respectively, overlaid with the unfolded or folded baselines). The natural logarithms of the observed rate constants measured as a function of GuHCl concentration for each CTPRa protein are shown as chevrons plots in Fig. 3 and Fig. S3. A number of striking features and trends are immediately apparent from studying the chev- rons. These can be broken down into the effects that the addition of TPR motifs have on the rates of refolding and unfolding and the relationship of ln kobs as a function of GuHCl concentration. Rates of refolding and unfolding. It is obvious that the main effect of adding TPR motifs is a considerable decrease in the unfolding rate. Thus, the increase in stability of each protein on increasing repeat number can be attributed to a decrease in the rate of unfolding. This was consistent with previously reported data for 2 designed TPRs at 20 °C (23) and agrees well with studies on the effects of adding of ankyrin repeats to ankyrin-containing proteins (8, 28). In comparison, the refolding rates, when measured before any rollover, increase slightly before becoming equally fast. Nonlinear relationship of ln kobs as a function of [GuHCl]. The most striking feature and trend of the CTPRa chevrons is the changing nonlinearity of both ln kF and ln kU as a function of GuHCl concentration. Although the smallest protein in the series, CTPRa2, exhibits a linear dependence for both refolding and unfolding over the denatured concentration range measured, which is indicative of 2-state behavior, all of the other proteins BIOPHYSICS AND

in the series do not. COMPUTATIONAL BIOLOGY When rollover in the refolding arms of the chevron plots are compared, the nonlinearity observed for each progressively Fig. 3. (A) Chevron plots for all of the CTPRa proteins in the series. CTPRa2 bigger CTPRa construct becomes more pronounced until the is fitted to a linear 2-state model of folding (SI Appendix Eq. 6). CTPRa3 to rollover is no longer a smooth curve but is kinked (Fig. 3 and CTPRa10 are fitted to a sequential 3-state on-pathway model (Fig. S1B using SI Appendix Eqs. 8–10). (B) Chevron plots for CTPRa8 and CTPRa10. Here the Fig. S3). The slope of the refolding arm after the kink for the refolding kinetics are fitted to a minimal dead-end scheme, whereby a com- smaller proteins is negative (as one would expect), yet as the pact off-pathway intermediate species I equilibrates with the denatured state repeats become larger the slope becomes flat and for the largest (Fig. S1C and SI Appendix Eq. 12). In A and B: CTPRa2 (ⅷ), CTPRa3 (⅜), CTPRa4 protein (CTPRa10) positive (Fig. 3, Fig. S3, and Table S3). It is (f), CTPRa5 (Ⅺ), CTPRa6 (᭜), CTPRa8 (‚), and CTPRa10 (Œ).

Javadi and Main PNAS ͉ October 13, 2009 ͉ vol. 106 ͉ no. 41 ͉ 17385 Whereas previous studies have shown that such nonlinearities ing kUI ϩ kIU are much faster than kIN and for unfolding the kIN can be caused by transient aggregation, ionic effects, and in- and kNI are much faster than kIU). Fig. 3A shows that our data strumental dead-time (29–31), here they are not. The refolding fit well to this model, and Table S3 shows the rate constants rate constants for each CTPRa protein were found to be obtained for each CTPRa protein. independent of protein concentration over a 100-fold range (measured at 0 M and 0.54 M GuHCl by both pH and [GuHCl] Energetics of the Folding Landscape on Addition of Repeats. Impor- jump experiments, 0.1 ␮Mto10␮M, and shown in Fig. S3), tantly, the results shown in Table S3 enable a direct comparison independent of ionic effects (urea denaturations gave similar of the folding pathways for each protein through the calculation rollover to GuHCl denaturations; Fig. S1E) and not limited by of the relative stabilities of each populated state on it. These are instrumental dead-time (folding rates recorded for CTPRa2 are shown in Table 1 and consist of the stability of the intermediate ⌬ H2O ⌬ H2O faster than for the curved chevrons of larger CTPRa proteins). relative to the denatured or native state ( GU-I and GI-N , Thus, all observed changes in slope of the chevrons can be respectively) and the stability of the native state relative to the related to denaturant-dependent changes in either structure or ⌬ H2O denatured state ( GU-N). These highlight 3 important points: (i) population of differing states in the folding energy landscape. when repeat number is increased the overall stability of each protein (⌬GH2O) increases (consistent with equilibrium data and Confirmation of Significantly Populated Intermediate States During U-N ⌬GH2O Refolding. To confirm that the curvature observed in the refold- within error of the equilibrium stability, 03j , calculated from ing plots of the larger CTPRa proteins is due to the population fitting to the Ising model); (ii) the populated intermediates of of intermediate states, refolding experiments were performed each CTPRa protein have the same stability relative to the ⌬ H2O Ϸ Ϫ1 with the hydrophobic dye 1-anilinonaphthalene-8-sulfonate denatured state ( GU-N 4 kcal mol ) and are therefore (ANS). ANS is known to bind to exposed clusters of hydrophobic independent of repeat number; and (iii) consequently, as the residues frequently found in transient folding intermediates (32). repeat proteins become larger the change in stability between ⌬ H2O On binding to such a hydrophobic surface, ANS undergoes a the intermediate and native state increases ( GI-N ). large change in fluorescence and so is a sensitive probe of the Interestingly, the fact that the stability of each CTPRa’s ⌬ H2O formation of transient species during folding (32). Therefore, intermediate relative to the denatured state ( GU-I ) does not refolding experiments in the presence of ANS at a final con- change with increasing repeat number shows that all CTPRa centration of 0.54 M GuHCl were performed. Under these proteins form a nucleus of comparable stability. Moreover, if the conditions, if the curvature observed in Fig. 3 is due to the stabilities of helices/repeats obtained from the Ising model are population of intermediate states, the intermediates should bind ⌬ H2O Ϸ used, the stability of each CTPRa’s intermediate ( GU-I 4 ANS and show a change in fluorescence. This is observed for all kcal molϪ1) would correspond to the formation of a nucleus of of the larger CTPRa proteins with significant curvature (CT- Ϸ2.5 repeats (5 helices). PRa4 to CTPRa10; Fig. S2 C and D and Table S4). In contrast, CTPRa2 and CTPRa3 do not. This is expected for CTPRa2, Comparison of Compactness of Transition States and Intermediates of because the refolding arm of its chevron is linear over the the CTPRa Proteins. To characterize and compare the compactness denatured concentration range measured. The lack of signal of the transition states and intermediates for each CTPRa from CTPRa3 might be caused by its rapid folding kinetics at protein, ␤T values were calculated from the kinetic data (Table 0.54 M GuHCl (350 sϪ1 Ϯ 40). When a higher final GuHCl 1). A ␤T value is a measure of the average degree of exposure in concentrations was used on each CTPRa protein (corresponding state X on the reaction coordinate relative to that of the to the linear refolding region of each chevron) no signal was denatured state from the native state (34, 35). It is defined as the observed (Fig. S2 E and F), as expected. The lack of signal at ratio ͚xmx/͚imi, where ͚xmx is the sum of kinetic m-values higher GuHCl concentration reflects the fact that the interme- between the unfolded state and state X on the reaction coordi- diate state is destabilized relative to the denatured state with nate, and ͚imi is the sum of all kinetic m-values along the increasing concentrations of GuHCl. reaction coordinate. A value of 1 corresponds to a state X that is as solvent exposed as the native state, whereas a value of 0 Characterization of Folding Energy Landscapes. To analyze the suggests that a state X is as solvent exposed as the denatured energy landscapes of the CTPRa proteins the kinetic data were state. It is obvious from Table 1 that as repeats are added the fit in 2 ways: (i) CTPRa2: a 2-state folding scheme because no amount of surface area buried relative to their native state of ANS binding was observed; and (ii) CTPRa3 to CTPRa10: a both transition and intermediates states decreases with increas- 3-state folding scheme because curved chevron plots and/or ANS ing number of repeats. However, as each protein is increasing in binding was observed (thus populating refolding intermediates). size, this actually corresponds to a similar burial of surface area Two-state folding of CTPRa2. The data were fitted to a linear 2-state of transition and intermediates states that is constant throughout H2O model of folding (SI Appendix Eq. 6) (24). The extrapolated kF all of the CTPRa proteins. This is consistent with the invariant H O H O and k 2 from the fit were then used to calculate a kinetic ⌬G 2 ⌬ H2O Ϸ U U-N stability of intermediates across the CTPRa proteins ( GU-I ⌬ H2O ⌬ H2O Ϫ1 (Table 1 and Table S3). This was consistent with GD-N/ G035 4 kcal mol ) and implies that the addition of repeats does not calculated from equilibrium data, showing that in the range greatly change the initial folding process. If the ␤T values are measured CTPRa2 folds kinetically in a 2-state manner. related to the formation of complete repeats and helices (thus Three-state folding of CTPRa3 to CTPRa10. Because of the kinked native burial of side chains), the values obtained would equate nature of the chevron plots, the data could not be fitted to the formation of between 1 and 1.5 repeats (2 to 3 helices) in accurately to the sum of 2 quadratic equations (SI Appendix Eq. the first transition state, between 2 and 2.5 (4 to 5 helices) in the 7). Therefore, each dataset was fitted to a simple sequential, intermediate, and between 2 and 2.5 (4 to 5 helices) in the second on-pathway 3-state model whereby the same intermediate is transition state. populated in both refolding and unfolding experiments (SI Appendix Eqs. 8–10 and Fig. S1B) (33). Role of Each CTPRa Protein Intermediate in the Folding Landscape. This model is consistent with our ANS results, because it Further information on the nature of the intermediate states can assumes an intermediate is populated in the dead-time of the be obtained by analyzing the slope of the rollover of the refolding stopped-flow instrument, causing the rate to be limited (i.e., arm of each chevron (mI-N; Table S3). In this region of the where the chevron deviates from ‘‘2-state’’ linearity, the refold- chevron plot, the slope reports on the transition from the

17386 ͉ www.pnas.org͞cgi͞doi͞10.1073͞pnas.0907455106 Javadi and Main intermediate to the native state, such that the slope depends on the changing solvent-accessible surface area between the inter- mediate (I) relative to the folding transition state (TS). A negative slope shows that the TS has less solvent-accessible surface area than I, no slope shows that the I and TS have the same solvent-accessible surface area, and a positive slope shows that the TS has more solvent-accessible surface area than I. This would mean a negative dependence on denaturant corresponds to a more collapsed TS relative to I, no slope corresponds to no change in compactness of the TS relative to I, and positive dependence shows that I is more compact relative to the TS. As described above, when repeats are added the rollover turns from negative (CTPRa3 to CTPRa4) to flat (CTPRa5 to CTPRa8) and finally positive (CTPRa10) (Fig. 3 and Table S3). In general, on-pathway folding proceeds through each progressive state being at least as compact if not more compact as the last. Thus, although an off-pathway intermediate cannot be ruled out, the compactness of the intermediates of CTPRa3 to CTPRa8 as Fig. 4. Schematic illustration of the proposed folding pathways of CTPRa judged by the slopes of the rollover observed are consistent with proteins as they increase in repeat number. Cylinders are coloured from the N being on pathway (Fig. S1B). In contrast, the positive slope of the terminus in red and correspond to 1 helix. (A) Folding of CTPRa2: 2-state folding rollover observed for CTPRa10 is consistent with an off-pathway over the conditions studied with a transition state (T.S.) that is Ϸ50% solvent intermediate that has to partially unfold through a less-compact exposed as the native state. The transition state is drawn, for illustrative purposes, TS to fold to the native state (Fig. S1C). as 3 formed helices arranged as a formed repeat and 1 partially formed helix. (B) To a first approximation the data are consistent with a Folding of CTPRa proteins greater than 3 repeats: multistate folding through a minimal dead-end scheme whereby a compact off-pathway stable intermediate. Although there is no evidence to support a structure for the intermediate state, the proposed structures (i, ii, and iii) are shown and corre- intermediate species equilibrates with the denatured state in the Ϫ1 spond to the ⌬GU-I Ϸ 4 kcal mol . If the stabilities of helices/repeats obtained dead-time of the experiment (Fig. 3B, Fig. S1C, and Table S5). from the Ising model are used, this would correspond to the formation of a unit Thus, although positive slopes have been observed for carbonic equal to Ϸ2.5 repeats. Folding from the intermediate requires a rearrangement anhydrase in the absence of salt (36) and S6 in the presence of that has no change in compaction when passing through the final transition state salt (37); here, we have explicitly shown that, without any on route to the native state. This could consist of the docking of preformed additives, repeat proteins can also exhibit this behavior. modules. (C) Folding of CTPRa proteins of at least 10 repeats: folding is hampered by the population of a misfolded intermediate (drawn as wrongly docked re- Discussion peats). The protein has to unfold from this state to continue to fold productively Multistate Kinetic Folding Pathways. In this study we have performed to the final native structure. a comprehensive characterization of the folding pathway of a series of designed CTPRa proteins by investigating their un/folding ki- docking of folded repeats/helices units that are not in the correct netics and equilibrium denaturations. In particular, our system native topologic sequence, or the misfolded state could arise from allowed for full denaturation and thus complete kinetic analysis of the fact that the CTPRa constructs have unusually compact dena- 7 proteins (CTPRa2 to CTPRa10) that ranged from 86 aa to 358 tured species (21). However, because the compact denatured aa. The results on these constructs, which differ greatly in size and structures are found in all of the CTPRa proteins’ denatured states stability, has enabled us to shed light on the systematic changes and and this does not seem to hamper or be prevalent in the kinetic similarities in folding pathway on increasing repeat number. In folding of the smaller proteins, we believe the former explanation particular we have been able to show that when the CTPRa of wrongly docked modules to be more probable. proteins’ identical modular structure and their modular equilibrium folding thermodynamics are taken into account, our kinetic data are consistent with the following scheme (Fig. 4): Comparison of Kinetics with Equilibrium and Computational Studies. (i) All CTPRa proteins begin folding with the formation of a Our analysis shows that the modular multistate equilibrium folding nucleus that has approximately similar burial of surface area (as of the CTPRa proteins (20, 23, 38) is also observed in their kinetic folding and unfolding. Excitingly, our results are also completely judged by comparing ␤T values). (ii) In the case of a CTPRa protein that contains at least 3 consistent with the 2 computational studies on the folding CTPR repeats and is more stable than 4 kcal molϪ1, stable substructures proteins. First, they mirror the simulations that predict that the form to produce metastable intermediates. For CTPRa3, folding kinetic folding pathway of CTPR proteins should be multistate (17, from this intermediate still requires further collapse of the 18). Second, they are consistent with the sequential process ob- protein chain through the final transition state to the native state. tained from computational simulations by Hagai and Levy (17). (iii) When the CTPRa protein has more than 4 repeats the These show that as CTPRa proteins increase in repeat number, the intermediate folds to the native state through a transition state occurrence of independently folding intermediates (formed from that has no change in compaction. This could consist of the consecutive folded repeats) increases. Third, our results show that docking of preformed modules. multistate kinetics are only observed when a CTPRa construct

(iv) If the protein has at least 10 repeats it causes the contains at least 3 CTPR motifs, and Ferreiro et al. (18) showed that BIOPHYSICS AND population of a misfolded intermediate. The protein has to the folding cooperativity of the larger CTPRa proteins, defined by COMPUTATIONAL BIOLOGY unfold from this state to continue to fold productively to the final correlation length, is roughly 3 repeats. native structure. This is an explicit demonstration of misfolding for a repeat Comparison with Ankyrin Repeat Folding Studies. Although there protein. In particular, these results provide insight into the kinetic have been a number published studies on the folding of ankyrin traps that are a direct result of constructing a protein from identical proteins (11, 13), there are few that have characterized such large structural and energetic repeated motifs. It is interesting to spec- repeat proteins (9, 10, 15, 16) or compared such a range of repeat ulate on the structure of the misfolded intermediate. Two schemes protein sizes (8, 28). However, these studies coupled with studies are probable: the misfolded intermediate could be caused by the on smaller ankyrin proteins have elegantly shown that folding is

Javadi and Main PNAS ͉ October 13, 2009 ͉ vol. 106 ͉ no. 41 ͉ 17387 controlled through the thermodynamic interplay between the causes off-pathway kinetic traps to predominate. Thus, although stability of individual repeats and the interactions between proteins constructed of identical repeats have low local frustra- repeats. In general, each natural ankyrin protein seems to have tion, their modular and symmetric structures produce energy a stable core of a repeat or a few repeats that initiates folding and landscapes that are prone to kinetic traps. These results com- around which the rest of the structure condenses. Whether plement the existing notion of repeat proteins having very folding proceeds with no intermediates being populated, a malleable folding pathways and highlight how the structure of populated kinetic intermediate, or an intermediate populated consensus repeats affects their folding pathway. even under equilibrium conditions seems to be determined by the stability and size of the core compared with the rest of the Materials and Methods protein. The results for CTPRa presented here agree well with Cloning, Protein Production, and Purification. The designed CTPRa proteins this scheme and highlight how the difference in structure were cloned, expressed, and purified as previously described (22). between natural repeats and our designed constructs affects their folding (i.e., the CTPRa proteins are built from identical Equilibrium Experiments. Fluorescence and far-UV CD equilibrium unfolding measurements were performed and analyzed as described in detail in SI repeats and therefore have no repeat/unit that is more stable or Appendix. less stable than the rest of the protein). This difference manifests itself in the ready accumulation of intermediates along the Kinetic Experiments. All experiments were performed as described in detail in folding and unfolding pathway and, when the protein is large SI Appendix. In brief, both unfolding and folding phases fitted well to a enough, an off-pathway intermediate. It is interesting to specu- single-exponential process. No slow, proline isomerization phases were ob- late that Nature may have evolved away from complete consen- served in the refolding experiments over a 200-s time scale. It is quite possible sus proteins, not only for binding, but also to avoid such kinetic that slower phases exist but are difficult to detect owing to instrumental drift. traps. Data Analysis. The dependence of ln kobs on [denaturant] for each CTPRa Conclusions protein was fitted to either a 2-state model or a sequential 3-state model whereby an intermediate is either on pathway (Fig. S1B) or off pathway (Fig. Designed TPR motifs have unique properties: identical struc- S1C). For full details of equations used see SI Appendix. tural modularity, sequence simplicity, and modular linear struc- ture. We have exploited these properties to explore their folding ACKNOWLEDGMENTS. We thank Dr. Sophie Jackson, Dr. Alan Lowe, and Prof. landscapes by elegantly using the perturbation of adding and Lynne Regan for insightful discussion, helpful comments, and suggestions; subtracting whole repeat motifs. This facilitates a wider explo- and members of the Department of Chemistry and Biochemistry (University of ration of repeat protein folding energy landscapes. Initially on Sussex), Dr. Lowe, Dr. Jackson, and Dr. Phillips, for critical reading of the manuscript. Y.J. was supported by an Engineering and Physical Sciences increasing the repeat number, on-pathway intermediates are Research Council studentship. This work was partly funded by Biotechnology kinetically populated. Subsequent addition of modular repeats and Biological Sciences Research Council Grant E005187/1.

1. Grove TZ, Cortajarena AL, Regan L (2008) Ligand binding by repeat proteins: Natural 20. Cortajarena AL, Mochrie SGJ, Regan L (2008) Mapping the energy landscape of repeat and designed. Curr Opin Struct Biol 18:507–515. proteins using NMR-detected hydrogen exchange. J Mol Biol 379:617–626. 2. Andrade MA, Perez-Iratxeta C, Ponting CP (2001) Protein repeats: Structures, func- 21. Cortajarena AL, et al. (2008) Non-random-coil behaviour as a consequence of extensive tions, and evolution. J Struct Biol 134:117–131. PPII structure in the denatured state. J Mol Biol 382:203–212. 3. Groves MR, Barford D (1999) Topological characteristics of helical repeat proteins. Curr 22. Kajander T, Cortajarena AL, Main ERG, Mochrie SGJ, Regan L (2005) A new folding Opin Struct Biol 9:383–389. paradigm for repeat proteins. J Am Chem Soc 127:10188–10190. 4. Main ERG, Jackson SE, Regan L (2003) The folding and design of repeat proteins: 23. Main ERG, Stott K, Jackson SE, Regan L (2005) Local and long-range stability in Reaching a consensus. Curr Opin Struct Biol 13:482–489. tandemly arrayed tetratricopeptide repeats. Proc Natl Acad Sci USA 102:5721–5726. 5. Kohl A, et al. (2003) Designed to be stable: Crystal structure of a consensus ankyrin 24. Fersht AR (1999) Structure and Mechanism in Protein Science: Guide to Enzyme repeat protein. Proc Natl Acad Sci USA 100:1700–1705. Catalysis and Protein Folding (W. H. Freeman, New York) 3rd Ed, p 650. 6. Mosavi LK, Minor DL, Peng ZY (2002) Consensus-derived structural determinants of the 25. Javadi Y (2009) Structural stability and the folding pathway of a series of consensus ankyrin repeat motif. Proc Natl Acad Sci USA 99:16029–16034. tetratricopeptide repeat proteins. Ph.D. thesis (University of Sussex, Falmer near 7. Stumpp MT, Forrer P, Binz HK, Pluckthun A (2003) Designing repeat proteins: Modular Brighton, UK). leucine-rich repeat protein libraries based on the mammalian ribonuclease inhibitor 26. Aksel T, Barrick D (2009) Analysis of repeat-protein folding using nearest-neighbor family. J Mol Biol 332:471–487. statistical mechanical models. Methods Enzymol 455:95–125. 8. Wetzel SK, Settanni G, Kenig M, Binz HK, Pluckthun A (2008) Folding and unfolding 27. Main ERG, Xiong Y, Cocco MJ, D’Andrea L, Regan L (2003) Design of stable alpha-helical mechanism of highly stable full-consensus ankyrin repeat proteins. J Mol Biol 376:241–257. arrays from an idealized TPR motif. Structure 11:497–508. 9. Werbeck ND, Rowling PJE, Chellamuthu VR, Itzhaki LS (2008) Shifting transition states in 28. Tripp KW, Barrick D (2007) Enhancing the stability and folding rate of a repeat protein the unfolding of a large ankyrin repeat protein. Proc Natl Acad Sci USA 105:9982–9987. through the addition of consensus repeats. J Mol Biol 365:1187–1200. 10. Tripp KW, Barrick D (2008) Rerouting the folding pathway of the notch ankyrin domain 29. Silow M, Oliveberg M (1997) Transient aggregates in protein folding are easily mis- by reshaping the energy landscape. J Am Chem Soc 130:5681–5688. taken for folding intermediates. Proc Natl Acad Sci USA 94:6084–6086. 11. Kloss E, Courtemanche N, Barrick D (2008) Repeat-protein folding: New insights into 30. Krantz BA, Sosnick TR (2000) Distinguishing between two-state and three-state models origins of cooperativity, stability, and topology. Arch Biochem Biophys 469:83–99. for ubiquitin folding. Biochemistry 39:11696–11701. 12. Courtemanche N, Barrick D (2008) Folding thermodynamics and kinetics of the leucine- 31. Went HM, Benitez-Cardoza CG, Jackson SE (2004) Is an intermediate state populated rich repeat domain of the virulence factor Internalin B. Prot Sci 17:43–53. on the folding pathway of ubiquitin? FEBS Lett 567:333–338. 13. Barrick D, Ferreiro DU, Komives EA (2008) Folding landscapes of ankyrin repeat proteins: Experiments meet theory. Curr Opin Struct Biol 18:27–34. 32. Jones BE, Jennings PA, Pierre RA, Matthews CR (1994) Development of nonpolar 14. Lowe AR, Itzhaki LS (2007) Rational redesign of the folding pathway of a modular surfaces in the folding of Escherichia coli dihydrofolate reductase detected by 1-ani- protein. Proc Natl Acad Sci USA 104:2679–2684. linonaphthalene-8-sulfonate binding. Biochemistry 33:15250–15258. 15. Low C, et al. (2008) Structural insights into an equilibrium folding intermediate of an 33. Sanchez IE, Kiefhaber T (2003) Evidence for sequential barriers and obligatory inter- archaeal ankyrin repeat protein. Proc Natl Acad Sci USA 105:3779–3784. mediates in apparent two-state protein folding. J Mol Biol 325:367–376. 16. Werbeck ND, Itzhaki LS (2007) Probing a moving target with a plastic unfolding 34. Tanford C (1968) Protein denaturation. Adv Protein Chem 23:121–282. intermediate of an ankyrin-repeat protein. Proc Natl Acad Sci USA 104:7863–7868. 35. Tanford C (1970) Protein denaturation. C. Theoretical models for the mechanism of 17. Hagai T, Levy Y (2008) Folding of elongated proteins: Conventional or anomalous? denaturation. Adv Protein Chem 24:1–95. J Am Chem Soc 130:14253–14262. 36. McCoy LF, Jr, Rowe ES, Wong KP (1980) Multiparameter kinetic study on the unfolding 18. Ferreiro DU, Walczak AM, Komives EA, Wolynes PG (2008) The energy landscapes of and refolding of bovine carbonic anhydrase B. Biochemistry 19:4738–4743. repeat-containing proteins: Topology, cooperativity, and the folding funnels of one- 37. Otzen DE, Oliveberg M (1999) Salt-induced detour through compact regions of the dimensional architectures. PLoS Comput Biol 4:e1000070. protein folding landscape. Proc Natl Acad Sci USA 96:11746–11751. 19. Ferreiro DU, Cho SS, Komives EA, Wolynes PG (2005) The energy landscape of modular 38. Main ERG, Lowe AR, Mochrie SGJ, Jackson SE, Regan L (2005) A recurring theme in repeat proteins: Topology determines folding mechanism in the ankyrin family. JMol protein engineering: The design, stability and folding of repeat proteins. Curr Opin Biol 354:679–692. Struct Biol 15:464–471.

17388 ͉ www.pnas.org͞cgi͞doi͞10.1073͞pnas.0907455106 Javadi and Main