Quantitative description of residual helical structure for λ-repressor N-terminal domain in the unfolded state

by

Kan Li

Department of Biochemistry Duke University

Date: Approved:

Terrence G. Oas, Supervisor

David Richardson

Pei Zhou

Patrick Charbonneau

Scott Schmidler

Dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Biochemistry in the Graduate School of Duke University

2017

Copyright © 2017 by Kan Li. All rights reserved except the rights granted by the Creative Commons Attribution-Noncommercial Licence

Abstract

Proteins can form residual compactness in the unfolded state. Among different types of residual compactness, residual helical structure is an important type of local compactness that can propagate through the formation of helical hydrogen bonds. Residual helicity has been observed for different unfolded state proteins. In order to accurately determine the contributions of individual residues to the overall helicity, accurate determination of residue-specific information and quantitative analysis methods are needed. The projects in this dissertation aim at quantitatively describing the residual helical conformation in the unfolded state of λ-repressor N-terminal domain. The residue-specific helicity values and backbone amide proton hydrogen bonding populations are analyzed using improved methods based on Bayesian inference. Generally, these values are higher for the helix 1 region in the context of the N-terminal domain than as an isolated peptide. Experimentally determined residue-specific helicity values of unfolded state λ-repressor N-terminal domain show similarity to the theoretical prediction using the helix-coil model. These results show that, in the unfolded state of λ-repressor N-terminal domain, the propagation of residual helicity does not significantly depend on tertiary interactions. The results support the hypothesis that λ-repressor N-terminal domain folds by “diffusion-collision” of nascently formed helices.

This thesis is dedicated to my parents.

Contents

Abstract
List of Tables
List of Figures
List of Abbreviations and Symbols
Acknowledgements

1 Introduction
1.1 The importance of studying unfolded state
1.1.1 Two-state folding and conformational heterogeneity
1.1.2 Theories of protein folding
1.1.3 The unfolded state is not necessarily a random coil
1.1.4 IUP: the unfolded state can be functional
1.1.5 Residue-specific helicity of protein unfolded state
1.2 Evidence of residual structure in the unfolded state
1.3 λ repressor protein
1.4 Advancement in theories and experimental techniques to describe nascent helicity
1.4.1 Helix-coil transition theory
1.4.2 Circular Dichroism
1.4.3 CSI (chemical shift index)
1.4.4 Amide proton exchange
1.4.5 Other NMR methods

2 Amide proton exchange and fraction protection
2.1 LRH1x peptide introduction
2.2 CD spectra of LRH1x
2.2.1 Materials and methods
2.2.2 Residue-averaged helicity of LRH1x
2.3 Backbone amide proton protection factors in LRH1x
2.3.1 Apparent amide proton exchange rate constant measurement
2.3.2 kobs values of LRH1x residues
2.3.3 Fitting for fraction protection values
2.3.4 Bayesian inference to determine reference parameters
2.3.5 Posterior parameter distributions
2.4 Discussion

3 Backbone chemical shift and residue specific helicity
3.1 Backbone chemical shifts of LRH1x
3.1.1 Materials and Methods
3.1.2 Backbone chemical shift data
3.2 Prediction of residue helicity using δ2D
3.3 Bayesian-CSI model
3.3.1 The calculation of residue helicity by helix-coil model
3.3.2 Bayesian-CSI model
3.3.3 Posterior distributions
3.3.4 Discussion

4 Residual helicity in λ repressor N-terminal domain
4.1 Backbone chemical shifts
4.1.1 Materials and Methods
4.1.2 Results
4.2 Amide exchange rate constants
4.2.1 Materials and Methods
4.2.2 Results
4.3 Discussion
4.3.1 Implication of missing crosspeaks
4.3.2 The difference in helicity between λ*1-85 and LRH1x
4.3.3 High helicity areas in λ*1-85
4.3.4 Future directions

5 Conclusion

A Supplementary figures

Bibliography

Biography

List of Tables

3.1 Backbone assignments for LRH1x residues at pH 6.0, 15 °C and 0 M urea
4.1 Backbone assignments for λ*1-85 residues at pH 5.3, 20 °C and 0 M urea

List of Figures

1.1 Structure of λ-repressor N-terminal domain
1.2 An example of a helix-coil calculation
1.3 pH dependence of exchange rate constant
1.4 Example SOLEXSY intensity profile
2.1 The mechanism of CNBr cleavage at methionine residue
2.2 Normalized CD spectra of LRH1x peptide at pH 6.0
2.3 Normalized CD signals of urea containing LRH1x samples
2.4 The change of helicity with increasing temperature
2.5 Helicity estimates for CD signals of urea containing samples
2.6 Peak intensities observed in a SOLEXSY experiment
2.7 Examples of simulated SOLEXSY data
2.8 kobs values of LRH1x residues at pH 7.23, 20 °C
2.9 The relationship between box plots and probability distributions
2.10 Comparison of fraction protection values
2.11 Posterior distribution of γB parameters
3.1 Example 15N-HSQC spectrum of LRH1x
3.2 15N-HSQC spectra overlay for 5-50 °C with 5 °C increments
3.3 CA chemical shifts of residues Arg11 and Leu12
3.4 CS differences for each residue and nucleus
3.5 Helicity values estimated using δ2D algorithm
3.6 Posterior distribution of global parameter m
3.7 Reference chemical shift deviations from δ2D values
3.8 Helicity comparison between δ2D scores and posterior distributions
3.9 w value posterior distributions at 20 °C and 0 M urea
4.1 The products of methionine oxidation
4.2 15N-HSQC of λ*1-85 at pH 5.3, 20 °C and 0 M urea
4.3 Experimental estimated and theoretical calculated helicity
4.4 Comparison of helicity between LRH1x and λ*1-85
4.5 Comparison of observed rate constants
4.6 Ratios of observed exchange rate constants
4.7 Comparison of fraction protection between LRH1x and λ*1-85
4.8 Capping box in folded state λ-repressor
A.1 All the fprot parameter posterior distributions
A.2 (Figure A.1 continued)
A.3 (Figure A.1 continued)
A.4 Posterior distributions of ∆S0 parameters

List of Abbreviations and Symbols

Symbols

N Normal distribution

kobs Observed exchange rate constant

kint Intrinsic exchange rate constant

fprot Fraction protection

Abbreviations

CD Circular dichroism

NMR Nuclear magnetic resonance

CLEANEX CLEAN Chemical EXchange Spectroscopy

SOLEXSY SOLvent EXchange SpectroscopY

LRH1x Extended helix 1 peptide of λ-repressor N-terminal domain

CS Chemical shift

CSI Chemical shift index

BE Bayesian-Englander

IUP Intrinsically unfolded protein

λ*1-85 A methionine-oxidized, monomeric λ-repressor N-terminal domain variant that predominately samples the unfolded state under physiological conditions

xii Acknowledgements

I would like to thank Dr. Terrence Oas for providing the research opportunity. I would like to thank all the current and past members of the Oas lab that I’ve worked with, including Andrew Hagarman, Billy Franch, Roy Hughes, Kyle Daniels, Pei-Fen Liu, Yang Qi, Pamela Mosley, Rachel Kimbrough, Jo Anna Capp and Lindsay Deis. I would also like to thank other past members of the Oas lab, including Preeti Chugha, Jeffrey Myers and Randall Burton. I would like to thank my graduate committee for providing me with guidance and help: Dr. David Richardson, Dr. Pei Zhou, Dr. Patrick Charbonneau and Dr. Scott Schmidler. I would also like to thank the NMR center staff for their help during experiments and troubleshooting: Dr. Leonard Spicer, Dr. Ronald Venters, Dr. Anthony Ribeiro, Donald Mika and Dr. Benjamin Bobay. I would like to thank Qinglin Wu for helping me set up the NMR sparse sampling scheme, and Joseph Marion for helping me with Bayesian data analysis.

1 Introduction

1.1 The importance of studying unfolded state

1.1.1 Two-state folding and conformational heterogeneity

In a protein, the primary amino acid sequence dictates the formation of its native, functional conformation. For a large number of small globular proteins, the native state can be represented by a well-folded three-dimensional structure. For many of these proteins, folding is a fast process that occurs on a time scale of milliseconds to seconds. When removed from denaturing conditions, proteins with stable native structures spontaneously refold into their native states. How the primary sequence determines the native conformation and promotes rapid folding is the protein folding problem.

Simply speaking, assuming a two-state reaction, the protein folding reaction can be written as U ⇌ N, with U and N being the unfolded state and the native state. The Gibbs free energy difference (∆G) between the reactant and the product determines whether the reaction is spontaneous. It also determines the population of each state at equilibrium. This free energy difference is relatively small, considering that the free energy contribution of a single hydrogen bond can be of similar magnitude. ∆G also varies among different proteins. Indeed, the native states of some proteins are not the folded state. ∆G of folding can be deliberately changed by mutating or modifying important residues, or by changing the environment of the protein molecules, including temperature, pH, ionic strength, osmolyte concentrations and binding partners. With the appropriate modifications or conditions, the ∆G between the two states can change so that the unfolded state becomes the dominant species.

Structurally, a protein molecule can sample a large number of sterically allowed conformations. Both the unfolded state and the folded state contain ensembles of structures, with members of an ensemble having similar thermostabilities. Interconversion of conformations within an ensemble occurs on a much faster time scale than conversion between ensembles. However, different ensembles can have significant differences in their thermostabilities. Therefore, not all conformations will be sampled with similar probabilities. Conformations with favorable interactions, and therefore lower free energies, will be sampled more frequently. In solution, the conformations are in thermodynamic equilibrium. When one or a few of the conformations dominate, there is still a small population of the other conformations, although that population might be negligible. When solution conditions are modified, the relative populations might also change.

Conformational heterogeneity of proteins can even be observed in seemingly static crystal structures (DePristo et al., 2004; Smith et al., 1986). Sidechains, in a collection of crystallized protein molecules, can sample multiple conformations, which will be reflected in the electron density map. A residue sidechain may occupy alternate locations, as can be observed in some crystal structures (Arpino et al., 2012; Caballero et al., 2016).
For flexible sidechains, the conformations are often difficult to identify clearly from electron density using conventional crystallographic modeling (Miao and Cao, 2016). Currently, because of the limited resolution of many electron density maps, isotropic conformation modeling is still employed when determining the majority of crystal structures. As a consequence, anisotropic information might have been artificially eliminated during modeling even when it exists at the electron density level (DePristo et al., 2004). Careful modeling based on the electron density map is important for experimentally determining conformational heterogeneity.

The conformational heterogeneity of proteins has functional importance. For example, proteins like ubiquitin need to be flexible and sample different conformations in order to recognize different binding partners. Binding to various partners can cause ubiquitin to adopt different conformations. It has been shown that conformational heterogeneity in 46 ubiquitin crystal structures can be explained by a single RDC-based ubiquitin conformational ensemble (Husnjak and Dikic, 2012; Lange et al., 2008). Conformational heterogeneity was also shown to aid DNA recognition (Adhikary et al., 2017) and regulate apoptosis (Kao et al., 2017).

Conformational heterogeneity does not necessarily mean that a protein will not be dominated by a small number of conformations under a given physiological condition. In fact, for a large number of proteins, the folding process is typically highly cooperative (Jackson, 1998; Zwanzig, 1997). Conformations in the transition state for these proteins are usually highly unstable. Therefore, in a given condition, a single protein molecule will mainly sample either the folded state or the unfolded state. The ensembles will also contain mainly members of the ground states and no significant amount of the transition state.

Of course, two-state cooperative folding is the typical depiction of the folding mechanism when experimental evidence does not indicate more than two states. The cooperativity of folding is reduced when the protein deviates from two-state folding. It was shown that the cooperativity of folding can be tuned by mutations and experimental conditions (Malhotra and Udgaonkar, 2016; Hao and Scheraga, 1998).
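
As a rough illustration of how ∆G sets the equilibrium populations in the two-state picture described in this section, the sketch below evaluates the fraction of unfolded molecules for a few folding free energies. The function name, the sign convention (∆G taken as G(N) − G(U), so that negative values favor N) and the example values are assumptions made for illustration; they are not quantities measured in this work.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def fraction_unfolded(delta_g_fold, temperature_k=298.15):
    """Fraction of molecules in U for a two-state U <-> N equilibrium.

    delta_g_fold is the folding free energy G(N) - G(U) in kcal/mol
    (assumed convention: negative values favor the native state).
    """
    # K = [N]/[U] = exp(-dG/RT); fraction unfolded = 1 / (1 + K)
    k_fold = math.exp(-delta_g_fold / (R_KCAL * temperature_k))
    return 1.0 / (1.0 + k_fold)


# Hypothetical folding free energies, chosen only to show the trend.
for dg in (-5.0, -1.0, 0.0, 1.0):
    print(f"dG = {dg:+.1f} kcal/mol -> fraction unfolded = {fraction_unfolded(dg):.4f}")
```

Under these assumptions, a shift in ∆G of only a few kcal/mol is enough to move the protein from predominantly folded to predominantly unfolded, which is why mutations or solution conditions can make the unfolded state the dominant species.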

1.1.2 Theories of protein folding

The path a protein molecule takes from the unfolded state to the folded state is referred to as the folding mechanism. Because conformations with favorable interactions are sampled more frequently, proteins are able to follow the most efficient routes for folding. If the energy landscape is described as a rugged funnel, with the depth of each position on the funnel correlated with free energy, then conformational ensembles can “reside” on the funnel according to their free energy levels. Any path from conformational ensembles at the top of the funnel to folded state ensembles at the bottom of the funnel is a potential folding pathway. A protein uses the most efficient pathway as its dominant folding mechanism.

A large body of work has been done to determine which types of interactions are the most important to form in the early stage of a folding pathway (Dill et al., 2008b; Englander and Mayne, 2014; Matthews, 1993; Dill et al., 2008a; Ahluwalia et al., 2013; Karplus and Weaver, 1979, 1994). Several folding mechanisms have been proposed. Based on the argument that hydrophobic residues prefer to be excluded from polar water, the hydrophobic collapse mechanism was proposed, stating that a protein initiates structural stabilization by collapsing to form a hydrophobic interior. In other words, water is a “poor” solvent for proteins. A second mechanism also emphasizes the importance of hydrophobic residues. In this case, however, only a few key hydrophobic residues are responsible for the initial nucleation, which leads to the formation of all remaining interactions. The mechanism is hence named nucleation condensation. A third mechanism addresses the important role of secondary structure units. Named diffusion-collision, it assumes that local structural units form first, followed by random collision of these units until the protein finds a more stable tertiary topology.
While a number of small protein molecules fold by a two-state mechanism, protein folding is not necessarily a two-state process. There can be stable intermediates along the folding reaction. In fact, folding through diffusion-collision and folding through hydrophobic collapse can both involve folding intermediates.

Whether or not the folding process for a protein molecule is two-state can be assessed using several approaches. One indicator is the m-value of folding, which is correlated with the rate at which solvent exposure changes as the protein undergoes thermal or chemical denaturation. A low m-value extracted from fitting the experimental data to a two-state folding model could indicate that the folding process deviates from two-state behavior (Soulages, 1998; Spudich and Marqusee, 2000). Another indicator is whether the results obtained for the same unfolding process using different probes agree with one another. A lack of agreement between probes could also indicate a deviation from two-state folding (Tsytlonok and Itzhaki, 2013).

Non-two-state folding proteins populate intermediate states when folding. One type of conformation observed in intermediate states, the molten globule, has attracted the attention of researchers (Christensen and Pain, 1991; Fink, 2001). This type of conformation seems to retain most of the restrictions on secondary structure but lacks the close packing of tertiary structure. The so-called “molten-globule states”, which are usually detected experimentally under mild denaturing conditions, are thermodynamically stable with loosely packed globular conformations. Depending on whether water has entered the interior of the structure, the molten globule can be further classified as a dry molten globule or a wet molten globule. During unfolding, the native state can first transform into a dry molten globule and then into a wet molten globule (Mishra and Jha, 2017; Sarkar et al., 2013).
With an increasing number of studies on small two-state folders, another folding mechanism called “downhill folding” was proposed. In essence, selective mutations can be engineered into some two-state folding proteins to speed up the folding kinetics to the point that the folding energy barrier disappears. This type of folding process, which does not require crossing a kinetic barrier, is called “downhill folding” (Kubelka et al., 2004). Reaching downhill folding has been described as pushing the folding “speed limit”. Some natural proteins are thought to fold downhill (Bryngelson et al., 1995). However, in most cases, this type of artificially achieved folding kinetics might not be relevant to the folding mechanisms of naturally occurring amino acid sequences.

Folding cooperativity is a crucial factor correlated with the nature of the folding process and of the ensembles at equilibrium. Two-state folding proteins usually have high folding cooperativity. In order to fold, the protein molecule has to cross a fairly high free energy barrier and land in a relatively sharp energy minimum. The structural ensembles of the folded state are also not easily affected by small changes in solution conditions. In contrast, structural elements formed before the protein crosses the main energy barrier only require crossing low free energy barriers and experience a broad landscape. One such process is the helix-coil transition. A helix-coil transition is the process by which residues in a peptide chain initiate helix formation by forming the first backbone helical H-bond and then elongate the helical stretch. The cooperativity of the helix-coil transition is limited, and the ensembles at equilibrium can be easily affected by changing solution conditions.

Originally, the unfolded state of proteins was assumed to have no significant structural preference. Efforts to prove or disprove the proposed folding mechanisms relied heavily on assessing the folding intermediates and transition states. For a number of two-state folders, Φ-analysis has been carried out to determine which residue(s) affect the folding transition states the most, thereby describing the conformations of the transition states (Fersht and Sato, 2004; Ternstrom et al., 1999; Baldwin and Rose, 1999).
For instance, the nucleation condensation mechanism has been supported by Φ-analysis on chymotrypsin inhibitor 2 (CI2) (Nolting, 1999). However, the diffusion-collision mechanism also seems to be supported by other Φ-analysis results (Ozkan et al., 2001; Mayor et al., 2003). It was also pointed out that Φ-analysis, being a method targeted at the middle step(s) of folding, is not sufficient on its own to describe the folding pathway.
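
To make the Φ-analysis mentioned in this section more concrete, the sketch below computes a Φ-value as the ratio of the mutation-induced change in transition-state free energy (estimated from folding rate constants) to the change in equilibrium stability. This is the commonly used textbook definition rather than a procedure taken from this dissertation; the function name, argument names and sign conventions are assumptions made for illustration.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def phi_value(kf_wt, kf_mut, dg_unfold_wt, dg_unfold_mut, temperature_k=298.15):
    """Phi-value for folding from wild-type and mutant data.

    kf_*        : folding rate constants (same units for both)
    dg_unfold_* : unfolding free energies in kcal/mol (positive = stable)
    """
    rt = R_KCAL * temperature_k
    # Destabilization of the transition state relative to U.
    ddg_ts = rt * math.log(kf_wt / kf_mut)
    # Destabilization of N relative to U.
    ddg_eq = dg_unfold_wt - dg_unfold_mut
    if abs(ddg_eq) < 1e-9:
        raise ValueError("mutation does not change stability; Phi is undefined")
    return ddg_ts / ddg_eq


# Hypothetical numbers: the mutant folds 3-fold slower and loses 1.0 kcal/mol.
print(f"Phi = {phi_value(900.0, 300.0, 4.0, 3.0):.2f}")
```

In this convention, a Φ-value near 1 suggests that the interactions of the mutated residue are formed in the transition state to the same extent as in the native state, and a value near 0 suggests that they are not yet formed.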

1.1.3 The unfolded state is not necessarily a random coil

For a protein molecule to fold correctly, an intricate network of interactions among residues has to be formed. Because the amino acid sequence dictates the folded structure, it is conceivable that the primary sequence would favor metastable interactions even in the unfolded state. The unfolded state is disordered, but it is not necessarily random coil like.

Advances in experimental techniques have enabled detailed structural description at the residue level. More and more evidence supports the existence of residual structure in the unfolded state. However, based on measurements of several proteins in the unfolded state (Lietzow et al., 2002), the radii of gyration of these proteins match the predictions of the polymer physics random coil model (Kohn et al., 2004). This contradiction has been called the reconciliation problem. Fitzkee & Rose offered an explanation of the seemingly contradictory results (Fitzkee et al., 2004). In Monte Carlo simulations of proteins in which only 8% of each molecule was assumed to be flexible, the end-to-end distances and radii of gyration calculated from the simulated results surprisingly matched random coil values. In another study, the results indicated that the ∆Cp of a compact unfolded state protein might not be distinguishable from the value expected of a random coil like unfolded state (Shimizu and Chan, 2002). Therefore, observations of unfolded states that match the random coil expectation do not exclude the formation of residual structures.

Structural preferences in the unfolded state are a direct manifestation of conformational heterogeneity, and provide useful insight into how proteins initiate the folding process. Understanding the types of secondary and tertiary residual structures will help determine the role of residual structure in the early stages of protein folding.
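
The random coil expectation referred to in this section is usually expressed as a power law, Rg = R0·N^ν, where N is the number of residues. The sketch below evaluates this scaling for a few chain lengths. The default prefactor and exponent (R0 = 1.927 Å, ν = 0.598) are the values commonly quoted from the chemically denatured protein data set of Kohn et al. (2004); they are treated here as assumptions, and the function name and chain lengths are hypothetical.

```python
def random_coil_rg(n_residues, r0=1.927, nu=0.598):
    """Random-coil radius of gyration in angstroms, Rg = R0 * N**nu.

    r0 and nu default to commonly quoted values for chemically
    denatured proteins (assumed here, not re-derived).
    """
    if n_residues < 1:
        raise ValueError("n_residues must be at least 1")
    return r0 * n_residues ** nu


# Arbitrary chain lengths chosen only to show the scaling.
for n in (20, 50, 100):
    print(f"N = {n:3d} residues -> expected random-coil Rg = {random_coil_rg(n):.1f} A")
```

Because Rg is a single chain-averaged number, agreement with this scaling constrains only the overall dimensions of the ensemble; as argued above, it does not rule out local residual structure.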

1.1.4 IUP: the unfolded state can be functional

The existence of intrinsically unfolded proteins (IUPs) is the most direct evidence of the conformational significance of the unfolded state (Tompa, 2011). Proteins such as P53 and α-synuclein, as opposed to the classical examples of proteins, mainly sample the unfolded state under physiological conditions without a binding partner (Bussell and Eliezer, 2001; Alderson and Markley, 2013; Ullman et al., 2011). Transient long-range structural preference has been observed in IUPs, as in the example of β-synuclein (Allison et al., 2014). This conformational heterogeneity is the reason some IUPs can adapt to different binding partners with differently populated conformations, as in the example of P53 (Kannan et al., 2016).

The traditional view is that a protein forms “a structure”. This structure represents the combination of interactions and topology that makes the protein most stable under physiological conditions. But some proteins cannot be depicted by a single structure. Instead, these proteins predominately sample the unfolded state under physiological conditions. The “native state” for these proteins is the unfolded state. IUPs can associate with and dissociate from a target rapidly, and have exposed motifs ready to interact with targets (Babu et al., 2011). The online repository DisProt is dedicated to curating information on IUPs and intrinsically disordered regions. To date, over 800 IUPs have been deposited (Sickmeier et al., 2007; Piovesan et al., 2017).

When discussing folded proteins, the functionality of the protein is usually attributed to unique structural elements, such as an active site or a structural motif. This is usually referred to as the “structure-function paradigm”. This paradigm was revisited in recent years following the discovery of functional IUPs (Dunker et al., 2008; Tompa, 2012). A number of IUPs have been found to participate in signaling and regulatory pathways (Wright and Dyson, 2015). One example of functional IUPs is the CREB-binding protein and its paralogue P300, which contain intrinsically disordered regions and can interact with more than 400 partners (Dyson and Wright, 2016).

There are numerous reasons why IUPs did not attract attention until recently. Conventional protein expression and purification procedures are designed to produce proteins with folded structures. Proteins that are intrinsically unfolded are more susceptible to degradation than proteins that are predominately folded under physiological conditions (Dyson and Wright, 2005). Intrinsically disordered regions are also often “invisible” in X-ray crystallography structures. Additionally, IUPs tend to be “low complexity” in terms of amino acid composition, and are avoided in homology searches (Nishikawa, 2009).

1.1.5 Residue-specific helicity of protein unfolded state

In the unfolded state, proteins sample a large number of conformations. The experimental observations are usually population-averaged signals. For residue-specific observations of the protein backbone, each observation contains contributions from several different backbone conformations. Residual helical structure is a local structural preference populated through local hydrogen bonds. It is one of the main contributors to the observed signals. In order to understand the extent of residual helicity in the unfolded state, it is important to accurately estimate the contribution of the helical state to the observed signals on a residue-by-residue basis.

Many studies have demonstrated the existence of residual helicity, and some of these studies are discussed in the next section. However, the existing studies either showed residual helicity at the chain level, or fell short of an accurate estimate of residue-specific helical content. Most residue-specific studies showed experimental evidence related to residual helicity without giving an estimate of the helical population (Bruun et al., 2010; Eliezer et al., 1998). The rare studies that did try to estimate residue-specific helicity values did so using popular methods without scrutinizing the results for their own specific sequences (Rosner and Poulsen, 2010; Pashley et al., 2012). Therefore, one of the contributions of the projects in this dissertation is to provide methods to accurately estimate helical content for individual residues. The methods described in the following chapters estimate residue-specific helical content by fitting values from experimental observations and by choosing or developing algorithms specifically for unfolded state proteins. These methods have provided residue-specific helicity estimates whose accuracy surpasses that of estimates in the existing literature.
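
The idea that each observation is a population-weighted average can be sketched as follows. The example assumes the simplest possible case, in which a residue interconverts rapidly between only a helical and a coil state, so that the helical fraction follows from linear interpolation between two reference signals. This is a deliberately simplified illustration and not the Bayesian methods developed in the following chapters; the function names and all numerical values are hypothetical.

```python
def population_average(populations, signals):
    """Population-weighted average of per-conformation signals."""
    total = sum(populations)
    if total <= 0:
        raise ValueError("populations must sum to a positive value")
    return sum(p * s for p, s in zip(populations, signals)) / total


def helix_fraction(observed, coil_ref, helix_ref):
    """Two-state estimate: fraction helix from an averaged signal."""
    if helix_ref == coil_ref:
        raise ValueError("reference values must differ")
    return (observed - coil_ref) / (helix_ref - coil_ref)


# Hypothetical secondary chemical shifts (ppm) for one residue:
# 3.1 in a fully helical state, 0.0 in the coil state.
observed = population_average([0.3, 0.7], [3.1, 0.0])
print(f"observed signal = {observed:.2f} ppm")
print(f"estimated helix fraction = {helix_fraction(observed, 0.0, 3.1):.2f}")
```

The estimate is only as good as the two reference values, which is one reason why the choice of reference parameters for a specific sequence matters for residue-specific helicity.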

1.2 Evidence of residual structure in the unfolded state

With the ability to synthesize short polymers of amino acids, researchers have collected spectroscopic evidence describing the structural preferences of individual amino acids, artificial sequences and protein fragments.

It is evident from experimental data that amino acids already show significant structural preferences in peptides as short as two residues. Studies on dipeptides and tripeptides revealed that there is no significant population of α-helical conformation for almost all of these peptides (Hagarman et al., 2010). Instead, the major populations of backbone conformation are distributed between β-strand and polyproline II, a form of left-handed helix with no internal helical hydrogen bonds (Adzhubei et al., 2013). The distribution of conformations is also influenced by the bulkiness of the sidechains (Bywater and Veryazov, 2015). Therefore, the Ramachandran plots of amino acids in these extremely short peptides are significantly different from those generated by protein library statistics.

When the peptide chain becomes longer than four residues, the peptides start to be able to form helical hydrogen bonds and other local interactions. As the most abundant type of secondary structure in documented protein structures, the helical conformation is assumed to form in the early stage of protein folding in diffusion-collision theory, and was indeed observed in small peptides, protein fragments and unfolded proteins (Wang and Shortle, 1997; Padmanabhan et al., 1996; Dyson et al., 1992; Cao et al., 2004; Zagrovic et al., 2002; Sommese et al., 2010). Among the three types of helical secondary structure, namely 3₁₀ helix, α-helix and π-helix, the α-helix is the most prevalent due to the tightness of the helix packing and the arrangement of the hydrogen-bonding pattern.

Examples of highly helical protein fragments are C-peptide and S-peptide. C-peptide and S-peptide correspond to residues 1-13 and 1-20 of RNase A. In the protein structure, an α-helical structure unit is formed by residues 3-13. At low temperature, C-peptide shows ~30% residual helicity at neutral pH, higher than the predicted value (Shoemaker et al., 1987). This significant helicity is partially explained by stabilizing electrostatic interactions (Kim and Baldwin, 1984). Substituting with other amino acids also decreases the amount of helicity (Strehlow and Baldwin, 1989). Studies of C-peptide and S-peptide in the 1980s (Strehlow and Baldwin, 1989; Shoemaker et al., 1987; Kim and Baldwin, 1984) and the later study of myohemerythrin peptide fragments (Dyson et al., 1992) showed that, without the rest of the protein chain, some protein fragments can still show a relatively high degree of helicity under physiological conditions. In the case of myohemerythrin, all peptide fragments corresponding to the four helices in the folded structure show a significant amount of nascent helicity. α-helical type backbone conformations are also sampled by loop regions. Similar results have been obtained for the peptide corresponding to the first helix of λ-repressor (Marqusee and Sauer, 1994).

Alongside studies of protein fragments, the synthesis of artificial host-guest sequences has enabled studies that illustrate the contribution of each amino acid type to stabilizing helical structure (Padmanabhan et al., 1994; Maison et al., 2001; Yang et al., 1998; Scholtz and Baldwin, 1992). A large body of work has combined experimental observations and helix-coil transition theory to parametrize the propensity of amino acid types to form helical conformation with or without the help of sidechain interactions (Lifson and Roig, 1961; Chakrabartty et al., 1994; Munoz and Serrano, 1994; McCammon et al., 1980; Scholtz and Baldwin, 1992). The most helix-promoting amino acid was found to be alanine, hence a number of studies used alanine as the host residue (Scholtz et al., 1991b) or as one of the co-polymer residues to increase helicity.

In folded state protein structures, helical residues do not form one long helix along the entire sequence. One reason is the need to form a tertiary structure. The tertiary fold allows long-range interactions and the exclusion of hydrophobic groups from bulk solvent. A second reason is that not all residues along the chain are capable of extending a helical stretch. In particular, amino acids such as glycine and proline can terminate a helical stretch. The third reason is the existence of so-called “helix stop signals”. These are specific interactions that define the boundaries of helices in the folded state (Doig and Baldwin, 1995; Aurora and Rose, 1998). One classical example is the capping box (Harper and Rose, 1993; Jimenez et al., 1994), which has the sequence Ser/Thr-X-X-Glu/Gln.

Another form of local residual structure is local hydrophobic collapse. This type of interaction refers to a protein burying hydrophobic residues to avoid unfavorable contacts with solvent molecules.
The result of the collapse is a loosely globular structure without distinctive favorable interactions or local residual structures. Currently, it is still challenging to detect hydrophobic collapse experimentally. Several studies have detected hydrophobic collapse early in the folding process (Mok et al., 2007; Lapidus et al., 2007).

Longer peptides allow the formation of long-range tertiary interactions. Long-range van der Waals interactions of hydrophobic residues and long-range electrostatic interactions are essential for a protein molecule to fold into the correct topology. In the unfolded state, long-range interactions might not completely disappear in a protein molecule (Meng et al., 2013), although the interactions might not resemble those observed in the native state (Thukral et al., 2015).

Long-range hydrogen bonding interactions include β-sheet formation. However, the β-sheet structure has been identified as a core element of the cross-β structure in amyloid formation (Uversky, 2008; Rambaran and Serpell, 2008). Judging from this observation, β-sheet formation may contribute more toward protein misfolding. Although β-sheet and α-helix are the two dominant types of secondary structure in the folded state, α-helix is easier to form in the unfolded state because its interactions do not require tertiary proximity.

With the advancement of NMR techniques, especially nuclear Overhauser enhancement (NOE) and paramagnetic relaxation enhancement (PRE), there is accumulating residue-specific evidence of long-range contacts in the unfolded state (Yi et al., 2000; Gillespie and Shortle, 1997a; Salmon et al., 2010). For example, with the help of PRE experiments, long-range contacts between the N-terminus and the C-terminus were shown to exist in acid-unfolded apomyoglobin (Lietzow et al., 2002).

For full-length protein molecules, hydrophobic residues are likely to be excluded from the solvent. A large number of studies focus on characterizing residual hydrophobic interactions in the unfolded state (Crowhurst and Forman-Kay, 2003; Stumpe and Grubmuller, 2009).
Most notably, studies by David Shortle and coworkers on ∆131∆, an unfolded variant of staphylococcal nuclease, show the non-random topology and local structural preferences detected by NOE, RDC and R2 measurements (Francis et al., 2006; Gillespie and Shortle, 1997a,b; Zhang et al., 1997; Ohnishi and Shortle, 2003). The Shortle group claimed an overall topological similarity between the native state and the unfolded state of ∆131∆ as well as Eglin C (Shortle and Ackerman, 2001; Ohnishi et al., 2004), although this claim is worth further investigation.

1.3 λ repressor protein

λ-repressor is a phage-encoded DNA binding protein. The gene encoding λ-repressor is one of the bacteriophage λ genes (Ptashne, 2011), which are integrated into the chromosome of the infected bacterium after gene injection. λ-repressor binds to phage DNA operators, suppresses the expression of lytic phage genes and also enhances the transcription of its own gene. Upon external stimulation such as UV light, λ-repressor can be inactivated, which initiates the lytic cycle and the escape of mature λ phage (Sauer et al., 1990). There are two operators for λ-repressor to bind, OL and OR. λ-repressor is only functional as a dimer.

The intact λ-repressor protein consists of an N-terminal domain, a C-terminal domain and a linker region, and is shaped like a dumbbell (Stayrook et al., 2008). The C-terminal domain consists of β-sheet structure and is responsible for dimer interface formation and for inactivation upon stimulation. Inactivation is initiated by the self-cleavage of λ-repressor mediated by the C-terminal domain, resulting in the separation of the C-terminal domain from the N-terminal domain. Structurally, in contrast to the C-terminal domain, the N-terminal domain is a five-helix bundle (Beamer and Pabo, 1992). The second and third helices form a helix-turn-helix motif that is responsible for the majority of the protein-DNA interactions. Several N-terminal residues also form contacts with the DNA substrate. The fifth helix forms the dimer interface of the N-terminal domain (Sauer et al., 1990). Without the help of the C-terminal domain,

the N-terminal domain, which corresponds to residues 1-92 of the intact protein, can still form a dimer-DNA complex. The dimer interface can be abolished by removing the last 7 residues of the N-terminal domain, which shortens the fifth helix.

Several important residues were identified during early studies of the protein. The work of Lim and Sauer (Lim and Sauer, 1989, 1991) pointed out the importance of 7 hydrophobic residues (Leu18, Val36, Met40, Val47, Phe51, Leu57 and Leu65) through mutation studies. It is important that these core residues stay hydrophobic and that their sidechain geometries be compatible in order to stabilize the core. Later on, two important sidechain-sidechain hydrogen bonds (Asp14 - Arg17 and Asp14 - Ser77) that stabilize the protein were discovered (Marqusee and Sauer, 1994).

The monomeric version of λ-repressor N-terminal domain (residues 1-85) has long been used as a model system to understand protein folding mechanisms. By truncating the five N-terminal residues that are disordered in the original structure, λ6-85 was constructed and is structurally equivalent to the full-length N-terminal domain. Using CD and NMR data, it was shown that λ6-85 is a two-state folder with no populated intermediates (Huang and Oas, 1995b). It has been stated that λ6-85 mutants carrying stabilizing mutations can follow downhill folding by design (Liu and Gruebele, 2007). λ6-85 folds on a submillisecond time scale, with a folding rate constant of 3600 ± 400 s⁻¹ (Huang and Oas, 1995a).

The folding mechanism of λ6-85 was explored experimentally by several studies. When two Glycine residues on the third helix (Gly46 and Gly48) were mutated to Alanines, the folding rate increased about 12.5-fold compared to wild type λ6-85, and the solvent accessibility of the double mutant was also significantly decreased (Burton et al., 1996). The study found helix 3 to be important in the folding process of λ6-85. The proposed folding mechanisms of both the wild type and the G46A/G48A mutant were depicted as diffusion-collision models (Burton et al., 1998). The two substituted residues stabilize helix 3 and therefore significantly increase the rate constants for effective collisions involving helix 3 in the early folding steps, which in turn increases the overall folding rate constant. A following study also showed that the sidechain hydrogen bonding pair Asp14 - Ser77 should form after the formation of helix 3 in wild type λ6-85 (Myers and Oas, 1999).

Figure 1.1: Structure of λ-repressor N-terminal domain. The ribbon representation of the structure is based on the crystal structure with PDB ID 1LMB. The helices are numbered.

The Gruebele lab introduced additional mutations into wild type λ6-85 and the G46A/G48A mutant in an attempt to increase the folding rate constants and even change the folding mechanism (Kim et al., 2009; Liu and Gruebele, 2007; Liu et al., 2010). The mutations Q33Y or D14A in addition to G46A/G48A made the protein fold even faster. These studies suggest that these mutants fold in a downhill fashion instead of being two-state folders (Yang and Gruebele, 2004).

By contrast, the Oas group introduced mutations and modifications to destabilize λ6-85 in an attempt to populate the unfolded state under physiological conditions (Chugha et al., 2006). The S77A substitution abolishes the Asp14 - Ser77 sidechain hydrogen bond and destabilizes the protein. The substitution pair I54K/A56K was also introduced to solubilize and further destabilize the protein. Re-introducing the original 5 residues at the N-terminus of the intact λ-repressor protein also helped solubilize the protein. Most importantly, after hydrogen peroxide treatment, two methionine residues in the protein, Met40 and Met42, one of which is an important hydrophobic core residue, were oxidized to methionine sulfoxide. This oxidation destabilizes the protein by approximately 6 kcal/mol. This collection of modifications created a version of the protein called MetO-λLS, of which less than 1% populates the folded state at 25 °C.

Helix 1 of λ6-85 is very likely to form at the beginning of the folding process. Marqusee and Sauer observed that the first helix remains highly helical as a protein fragment (Marqusee and Sauer, 1994). According to the AGADIR intrinsic helicity prediction program, helix 1 has the highest intrinsic helicity among the 5 helices. In a 5 µs folding MD simulation of the D14A/Y22W/Q33Y/G46A/G48A mutant, helices 1 and 4 have a much higher probability of resembling the native folded structure than the other helices (Larios et al., 2006). A second simulation study of the same mutant also suggests that helix 1 and helix 4 form early in the folding pathway (Prigozhin et al., 2011). In an NMR backbone dynamics study of MetO-λLS (Chugha and Oas, 2007), helix 1 was found to show significant residual helicity. Helix 4 also showed residual compactness, which was interpreted to be a hydrophobic cluster.

The projects of this dissertation aim to quantitatively describe the residual helical content of λ-repressor N-terminal domain in the unfolded state. MetO-λLS will instead be called λ*1-85 in Chapter 4. The theories and techniques used in the analysis and their current developments are discussed in the section below.

1.4 Advancement in theories and experimental techniques to describe nascent helicity

1.4.1 Helix-coil transition theory

Helix-coil transition theory uses a one-dimensional statistical mechanics model equivalent to the Ising model. The Ising model was originally proposed by Wilhelm Lenz in 1920 to describe cooperative magnetic alignment. In the original model, each magnet has two possible states: spin up or spin down. Here the model is instead adapted to describe the nearest-neighbor effect on the conformational transition of a residue from the coil to the helix state (Vitalis and Caflisch, 2012). Each residue therefore also has two simplified states to sample from. In terms of backbone dihedral angles, a residue is defined to be in the coil state as long as it is not in a helical conformation. Note that because it is an abstract model, the dihedral angle boundaries for the two states are not precisely defined.

There are both entropic and enthalpic contributions to helix formation. When a residue initiates the transition from the coil to the helix state, the step is energetically unfavorable because it decreases the entropy of the peptide chain: each residue needs to restrict two backbone dihedral angles. This step is called the nucleation of a helical stretch. Altogether six backbone dihedral angles need to be restricted for an α-helical hydrogen bond to form. As the helical stretch becomes longer, namely in the elongation steps, the formation of helical hydrogen bonds makes the coil-to-helix transition more favorable because of the enthalpic contribution, while each elongation step only requires the restriction of two more backbone dihedral angles. Therefore, it is more likely for a peptide chain to have one long helical region than to have several short ones. Protein molecules may have more types of favorable interactions interrupting helical hydrogen bonding patterns. One might, therefore, expect one or several regions of a protein molecule in the unfolded state to be significantly helical. “Helix-breaking” residues in the sequence, such as Glycine and Proline, will also interrupt helix propagation and segregate the helical stretches.

In terms of mathematical formulation, a relatively simple model is the Zimm-Bragg model (Zimm and Bragg, 1959). For a homopolymer, only two parameters are required to calculate the partition function: the nucleation parameter σ and the propagation parameter s. A coil state residue is assigned a reference weight of 1; a helix state residue preceded by a coil state residue is assigned a weight of σs; a helix state residue preceded by a helix state residue is assigned a weight of s. These weights are directly related to the helix transition probabilities. All the possible outcomes and

associated weights for each residue can be expressed as a 2×2 matrix, in which the rows refer to the state of the preceding residue (coil, C, or helix, H) and the columns to the state of the residue itself:

\[
\begin{array}{cc}
 & \begin{array}{cc} C & H \end{array} \\
\begin{array}{c} C \\ H \end{array} &
\begin{pmatrix} 1 & \sigma s \\ 1 & s \end{pmatrix}
\end{array}
\tag{1.1}
\]

Using recursive matrix multiplication, one can then calculate the partition function and estimate the helicity for each residue. For a homopolymer model, the middle residue(s) have the largest helicity value, and helicities gradually decrease toward either end of the peptide. The equation for calculating residue helicity is:

\[
f_{\mathrm{hel}}^{\,j} =
\frac{
\begin{bmatrix} 1 & 0 \end{bmatrix}
\cdot \prod_{i=1}^{j-1} \begin{bmatrix} 1 & \sigma s \\ 1 & s \end{bmatrix}
\cdot \begin{bmatrix} 0 & \sigma s \\ 0 & s \end{bmatrix}
\cdot \prod_{i=j+1}^{N} \begin{bmatrix} 1 & \sigma s \\ 1 & s \end{bmatrix}
\cdot \begin{bmatrix} 1 \\ 1 \end{bmatrix}
}{
\begin{bmatrix} 1 & 0 \end{bmatrix}
\cdot \prod_{i=1}^{N} \begin{bmatrix} 1 & \sigma s \\ 1 & s \end{bmatrix}
\cdot \begin{bmatrix} 1 \\ 1 \end{bmatrix}
}
\tag{1.2}
\]

where N is the number of residues and j is the residue for which the helicity is being calculated.

Theoretically, when the first and last residues in a sequence are not capped, meaning the amino group and the carboxyl group are freely exposed, they are not allowed to be in the helical state in this model. If either end is “capped” with an additional chemical group, usually an N-acetyl group at the N-terminus or an amide group at the C-terminus (Doig et al., 1994), the terminal residue of that end can sample the helical state. For heteropolymers, each type of amino acid has its own σ and s values. In practice, σ for all sequences is usually assumed to be approximately 0.05.

The Lifson-Roig model (Lifson and Roig, 1961) is similar to the Zimm-Bragg model, but with a different weight assignment. The model considers not only the influence of the preceding residue, but also that of the nearest following residue.

Therefore, instead of a 2×2 matrix, the model uses a 4×4 weight matrix to describe all the possible outcomes of a three-residue window and the associated weights of the middle residue:

\[
\begin{array}{cc}
 & \begin{array}{cccc} hh & hc & ch & cc \end{array} \\
\begin{array}{c} hh \\ hc \\ ch \\ cc \end{array} &
\begin{pmatrix}
w & v & 0 & 0 \\
0 & 0 & 1 & 1 \\
v & v & 0 & 0 \\
0 & 0 & 1 & 1
\end{pmatrix}
\end{array}
\tag{1.3}
\]

In this formalism, the row labels refer to the four combinations of helix and coil states for the 1st and 2nd residues of 3 consecutive residues and the column labels refer to the 2nd and 3rd residues; w is the statistical weight for helix propagation; v is the weight for helix nucleation; and any combination with the 2nd residue in the coil state has a weight of 1. The method to calculate the helicity of each residue is similar to equation 1.2 in the Zimm-Bragg formalism. This formalism can more accurately estimate residue-specific helicity, especially for heteropolymers.

The Lifson-Roig formalism can be used to predict both residue-specific helicity

and the population of each backbone amide proton being hydrogen bonded (fraction protection). As a demonstration, for a hypothetical 20-residue homopolymer with w = 1.2 and v = 0.05, the residue-specific helicity and fraction protection values calculated using the helix-coil model are shown in Figure 1.2.

Figure 1.2: An example of a helix-coil calculation. Left panel: helicity versus residue number. Right panel: protection fraction versus residue number. The first and last residues in this hypothetical peptide are assumed to be able to sample the helical state.

Several types of modification can be made to the weight matrix. The first and last helical state residues in a helical stretch are the “capping” residues and may be assigned unique weights. Here, the “capping” residue refers to the helix initiation residue (Doig et al., 1994; Schmidler et al., 2007). Therefore, the initiation weight can have more than one value. To include the effect of stabilizing sidechain interactions on helicity, the matrix can be expanded to describe more situations. This modification concerns the propagation weight w.

Efforts to optimize helix-coil parameters are still ongoing. The Zimm-Bragg model was proposed in 1958 and the Lifson-Roig model in 1961. The relationship between σ/s and v/w was detailed later (Qian and Schellman, 1992). A set of elongation parameters (s and w) was then assigned to all amino acid types at 0 °C without

considering sidechain interactions (Chakrabartty et al., 1994). One of the conclusions from this set of values is that, except for Ala, Arg and Leu, all amino acid types are helix breakers. Because of the simplifying assumptions made when calculating the w values, these elongation parameters cannot successfully describe all observed helicities.

In the meantime, a third model, AGADIR, was developed by Muñoz and Serrano to predict the chain helicity and residue-specific helicities of any given sequence (Munoz and Serrano, 1994, 1995a,b). The algorithm separates the coil-to-helix transition energy into four components: the residues’ intrinsic tendency to adopt helical φ/ψ angles, the contribution from helical H-bonds, the contribution from sidechain-sidechain interactions and the contribution from non-helical residues. The calculation of residue-specific helicity does not involve matrix calculation. The authors concluded that, given equivalent parameters, the AGADIR model and the Lifson-Roig model produce identical results.

Later on, to further optimize the helix-coil model implementation, a study used Bayesian statistics to analyze a large set of experimental observations simultaneously and thereby better optimize the parameter values of an improved version of the Lifson-Roig model (Schmidler et al., 2007). In this improved version, the weight matrix was extended to a 16×16 matrix to include additional parameters beyond the basic Lifson-Roig formalism, modeling ionic interactions between sidechains, helix capping and the influence of the first residue’s charge state on helix formation (Schmidler et al., 2007).
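The transfer-matrix calculations described in this section can be illustrated with a short numerical sketch. The Python code below is an illustration under stated assumptions, not the implementation used in this work: it assumes a homopolymer, assumes that every residue including the termini may sample the helical state (as in Figure 1.2), and treats the positions flanking the chain as coil. The function names are hypothetical. For the Zimm-Bragg matrix, rows are taken to refer to the preceding residue and columns to the current residue, so that a helix residue preceded by coil carries σs and one preceded by helix carries s.

```python
import numpy as np


def _helix_fractions(start, matrix, helix_matrix, end, n_residues):
    """Per-residue helix fractions from a transfer-matrix product.

    helix_matrix is `matrix` with every entry removed in which the residue
    being weighted is in the coil state.
    """
    left = [start]    # left[i]  = start @ matrix**i
    right = [end]     # right[i] = matrix**i @ end
    for _ in range(n_residues):
        left.append(left[-1] @ matrix)
        right.append(matrix @ right[-1])
    partition = left[n_residues] @ end
    weighted = [
        left[j] @ helix_matrix @ right[n_residues - 1 - j]
        for j in range(n_residues)
    ]
    return np.array(weighted) / partition


def zimm_bragg_helicity(sigma, s, n_residues):
    """Zimm-Bragg: rows = preceding residue (c, h), columns = current (c, h)."""
    matrix = np.array([[1.0, sigma * s],
                       [1.0, s]])
    helix_matrix = matrix.copy()
    helix_matrix[:, 0] = 0.0          # drop "current residue is coil"
    start = np.array([1.0, 0.0])      # position before the chain is coil
    end = np.array([1.0, 1.0])        # last residue may be coil or helix
    return _helix_fractions(start, matrix, helix_matrix, end, n_residues)


def lifson_roig_helicity(w, v, n_residues):
    """Lifson-Roig: pair states ordered hh, hc, ch, cc (as in equation 1.3)."""
    matrix = np.array([[w,   v,   0.0, 0.0],
                       [0.0, 0.0, 1.0, 1.0],
                       [v,   v,   0.0, 0.0],
                       [0.0, 0.0, 1.0, 1.0]])
    helix_matrix = matrix.copy()
    helix_matrix[[1, 3], :] = 0.0     # rows hc and cc: middle residue is coil
    start = np.array([0.0, 0.0, 1.0, 1.0])   # position before the chain is coil
    end = np.array([0.0, 1.0, 0.0, 1.0])     # position after the chain is coil
    return _helix_fractions(start, matrix, helix_matrix, end, n_residues)


# Parameters of the hypothetical 20-residue homopolymer discussed in the text.
helicity = lifson_roig_helicity(w=1.2, v=0.05, n_residues=20)
for residue, value in enumerate(helicity, start=1):
    print(f"residue {residue:2d}: helicity = {value:.3f}")
```

The prefix and suffix products are stored so that each residue's helicity costs only one extra matrix-vector product. Residue-specific weights for a heteropolymer would require one matrix per position, which this sketch does not attempt.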

1.4.2 Circular Dichroism

Circular dichroism (CD) signals come from the difference in absorption between left-handed and right-handed circularly polarized light. Because natural amino acids except Glycine are in their L-form and each type of secondary structure has its unique pattern of backbone dihedral angles, CD spectra can be used to probe the secondary structures in proteins and peptides. Far-UV CD (180 nm - 300 nm) spectra are usually used to provide secondary structure information. Each far-UV CD spectrum shows the population-averaged signal from the protein backbone dihedral angles.

Each secondary structure type has its own basis spectrum. Because of the prevalence of α-helix, β-sheet and random coil structure units, the basis spectra for these three types of secondary structure are the best characterized (Greenfield, 2006). The spectrum for α-helical structure has two minima at 222 nm and 208 nm and a maximum near 190 nm. The spectrum for β-sheet structure has one minimum at ~216 nm and a maximum near 190 nm. By contrast, the spectrum for random coil has a maximum near 215 nm and a minimum near 190 nm. For less common secondary structure types, such as β-turn, polyproline II and 3/10 helix, the basis spectra are less well characterized (Andersen et al., 1996; Adzhubei et al., 2013; Rucker and Creamer, 2002; Bush et al., 1978).

Several methods for analyzing CD spectra for helicity content are commonly used in recent literature (Greenfield, 1996). The first one is singular value decomposition (SVD), proposed by Hennessey and Johnson in the 1980s (Hennessey and Johnson, 1981). SVD extracts basis spectra from a series of measured spectra. The more modes (i.e., more minimum and maximum peaks) contained in the spectra, the more accurately SVD is able to calculate the correct basis spectra. SVD converts one spectra matrix into three matrices, where one of the matrices contains singular vectors representing basis spectra, and a second, diagonal matrix contains singular values representing the weights of the basis spectra. However, the basis spectra generated from a single SVD do not always equal the basis spectra of individual secondary structure types.
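The decomposition can be illustrated with a small numerical sketch. The example below is not an analysis of real data: the two component spectra are made-up sums of Gaussian bands and the mixing fractions are arbitrary, both chosen only to show how `numpy.linalg.svd` separates a set of spectra into orthonormal basis spectra, singular values and coefficients.

```python
import numpy as np

wavelengths = np.arange(190.0, 251.0, 1.0)   # nm, 61 points


def gaussian_band(center, width, amplitude):
    """A single Gaussian band on the wavelength grid (illustrative shape only)."""
    return amplitude * np.exp(-0.5 * ((wavelengths - center) / width) ** 2)


# Two made-up component spectra; these are NOT real helix or coil basis spectra.
component_a = (gaussian_band(222, 6, -30.0) + gaussian_band(208, 5, -32.0)
               + gaussian_band(192, 5, 70.0))
component_b = gaussian_band(198, 6, -40.0) + gaussian_band(218, 8, 4.0)

# Nine synthetic "measurements": columns are spectra, rows are wavelengths.
fractions_a = np.linspace(0.1, 0.9, 9)
spectra = np.outer(component_a, fractions_a) + np.outer(component_b, 1.0 - fractions_a)

# spectra = basis @ diag(singular_values) @ coefficients
basis, singular_values, coefficients = np.linalg.svd(spectra, full_matrices=False)

# Count the singular values that stand clearly above numerical noise.
n_significant = int(np.sum(singular_values > 1e-8 * singular_values[0]))

# Rebuild the data from the significant components only.
reconstructed = (basis[:, :n_significant] * singular_values[:n_significant]) \
    @ coefficients[:n_significant, :]

print("significant components:", n_significant)
print("max reconstruction error:", np.max(np.abs(reconstructed - spectra)))
```

Because the synthetic data set is an exact mixture of two components, only two singular values are significant. The columns of `basis` are orthonormal linear combinations of the two components rather than the components themselves, which is why SVD basis spectra need not match the spectra of individual secondary structure types.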
Another method to estimate helicity, which is less affected by buffer conditions, is to analyze the single-wavelength CD signal at 222 nm. When β-sheet structure is not present, the CD signal at this wavelength comes almost exclusively from the helical content of the protein. One first needs to convert the raw CD signal to mean residue ellipticity so that measurements are comparable. One also needs to take into account that temperature affects the values for 0% and 100% helicity (Luo and Baldwin, 1997). The relationship between helicity and CD signal is assumed to be linear. This method does not require more than one signal to be analyzed at a time. However, because other secondary structure types also contribute to the 222 nm signal, the calculated helicity is only accurate when only helical and coil structures exist.

There are also methods that utilize CD spectra databases of proteins with known structures, including K2D, CDSSTR, Contin, VARSLC and Selcon (Sreerama and Woody, 1993; Sreerama et al., 1999; Provencher and Glockner, 1981; van Stokkum et al., 1990; Compton and Johnson, 1986; Manavalan and Johnson, 1987; Andrade et al., 1993; Sreerama and Woody, 2000; Micsonai et al., 2015). Selcon is a modified version of VARSLC that incorporates SVD. Each of these methods has its own strengths. These methods can be used to analyze the populations of all the secondary structures considered. For methods such as CDSSTR, Contin and Selcon, the secondary structure types considered and the prediction outcomes can be influenced by the selection of the reference/training spectra set from proteins with known structures. For example, protein structures that contain more 3/10 helix residues can be chosen for the reference set in order to analyze CD spectra with significant 3/10 helix content.
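The single-wavelength analysis at 222 nm described earlier in this section amounts to a unit conversion followed by a linear interpolation between two baselines, which the sketch below illustrates. The function names are hypothetical, and the baseline values passed in the example are placeholders rather than the temperature-dependent values of Luo and Baldwin (1997), which should be substituted in practice. The conversion assumes the raw signal is in millidegrees, the concentration in mol/L and the path length in cm.

```python
def mean_residue_ellipticity(theta_mdeg, conc_molar, path_cm, n_residues):
    """Convert a raw CD signal (mdeg) to mean residue ellipticity
    (deg cm^2 dmol^-1 per residue)."""
    return theta_mdeg / (10.0 * conc_molar * path_cm * n_residues)


def fraction_helix(mre_obs, mre_coil, mre_helix):
    """Linear two-state interpolation between the 0% and 100% helix baselines."""
    return (mre_obs - mre_coil) / (mre_helix - mre_coil)


# Placeholder inputs for illustration only (not measured values).
mre_222 = mean_residue_ellipticity(theta_mdeg=-20.0, conc_molar=50e-6,
                                   path_cm=0.1, n_residues=20)

# Placeholder baselines; real analyses need temperature-dependent values.
helix_fraction = fraction_helix(mre_222, mre_coil=0.0, mre_helix=-40000.0)

print(f"[theta]222 = {mre_222:.0f} deg cm^2 dmol^-1")
print(f"helix fraction = {helix_fraction:.2f}")
```

Keeping the two baselines as explicit arguments makes the dependence on the assumed 0% and 100% helix values visible, which is the main source of uncertainty in this kind of estimate.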

1.4.3 CSI (chemical shift index)

It is well known that the detailed chemical environment of individual residues influences the chemical shifts of backbone atoms. Among all the factors, the backbone dihedral angles are the most important determinant of the backbone chemical shift values. Therefore, backbone chemical shifts can be used to predict whether the corresponding residue samples significant populations of α-helix or β-sheet conformation. When a residue is in the coil state, the chemical shifts of its backbone atoms are called its random coil values. When the residue deviates from the random coil state, the changes in chemical shifts relative to their random coil values are called the secondary chemical shifts (SCS). SCSs are frequently used to predict protein structure information.

In the early years, before two-dimensional and multidimensional NMR techniques matured, ¹H chemical shifts were already used for secondary structure prediction (Asakura et al., 1977, 1995). It has been observed that the ¹Hα chemical shifts move upfield upon formation of helical conformation and downfield upon formation of β-sheet conformation. Similarly, it was later found that the chemical shifts of other backbone atoms also move in opposite directions with the formation of helical vs. β-sheet conformation (Avbelj et al., 2004b; Wishart et al., 1992). With an increasing number of NMR structures deposited in the BMRB database, chemical shift statistics were generated from the deposited values for 100% α-helix, 100% β-sheet and 100% random coil. Representative values of these statistics, e.g. the means of chemical shift subsets, were then used to calculate the secondary structure propensities of a residue given one or more of its backbone chemical shifts (Labudde et al., 2003; Zhang et al., 2003; Cheung et al., 2010). These means are called the reference chemical shifts.

Several methods have been developed for calculating secondary structure propensities, including δ2D, SSP, PSSI, PsiCSI and Talos+ (Wang and Jardetzky, 2002; Shen et al., 2009; Hung and Samudrala, 2003; Marsh et al., 2006; Camilloni et al., 2012). Each has a distinct mathematical model. For example, Talos+ uses an artificial neural network, SSP corrects for inaccuracy in Cα and Cβ chemical shift referencing, and PSSI and δ2D use Gaussian distributions to calculate individual propensities. SSP and δ2D were developed in recent years to calculate accurate propensities for unfolded state proteins and peptides.

Despite the different mathematical formulations, the original assumption is a linear

relationship between secondary chemical shift and secondary structure propensity:

\[
CS_{\mathrm{obs}} = f_{\mathrm{hel}} \cdot CS_{\mathrm{hel}} + f_{\mathrm{sht}} \cdot CS_{\mathrm{sht}} + (1 - f_{\mathrm{hel}} - f_{\mathrm{sht}}) \cdot CS_{\mathrm{col}}
\tag{1.4}
\]

where CSobs is the observed chemical shift; CShel, CSsht and CScol are the reference chemical shifts for the 100% α-helical state, 100% β-sheet state and 100% random coil state, respectively. The parameters fhel and fsht are the fractions of the population in which a residue samples the α-helical state or the β-sheet state, respectively.

In terms of mathematical modeling, weighting functions are usually incorporated in order to minimize the influence of less indicative nucleus types. For α-helix, the order of usefulness among the six backbone nucleus types in propensity prediction has been found to be ¹³Cα > ¹³C′ > ¹Hα > ¹³Cβ > ¹⁵N > ¹HN, and that for β-sheet has been found to be ¹Hα > ¹³Cβ > ¹HN ~ ¹³Cα ~ ¹³C′ ~ ¹⁵N (Wang and Jardetzky, 2002). For predicting α-helix propensity, several factors contribute to the usefulness of different nuclei: 1) The separations between reference state shifts are of different magnitudes, with ¹³Cα usually having the largest separation. A larger separation makes small changes in secondary shifts easier to resolve. 2) Some nucleus types are more affected by factors other than dihedral angle changes. For example, ¹⁵N and ¹HN chemical shifts are strongly affected by hydrogen bonding (Hong et al., 2013; Cierpicki and Otlewski, 2001; Kjaergaard et al., 2011; Merutka et al., 1995), rendering them potentially less useful when used alone. 3) Related to the previous factor, temperature also affects chemical shifts to different degrees. Therefore, different nucleus types show different apparent temperature coefficients.

The correct values for reference chemical shifts are critical for calculating secondary structure propensities. However, reference shifts determined by different studies do not provide a unified set of values. The methods used in these studies, including the ones mentioned above, include experimental observations, theoretical calculations (Weinstock et al., 2008) and statistical analysis of curated versions of the BMRB database. The parameters being determined are the reference state values for different amino acid types and the changes of these values when different types of neighboring amino acids are present. At first glance, these values are comparable. However, because the differences between random coil reference shifts and reference shifts of the ordered states are small, small changes to these values can cause significant differences in

the calculated propensity. The difference can be as large as ~30%, taking ¹³Cα for example. In addition to differences in the calculation methods, possible reasons for the discrepancy among neighbor-independent reference values include: 1) the zero ppm chemical shift is inaccurately referenced; 2) differences in the conditions used lead to differences in reference values; 3) in the database methods, neighboring effects and the conditions under which the observations were made skew the reference shifts.

The neighboring effect is even less well characterized than the neighbor-independent shifts. If a database is used, different neighboring situations are represented by different numbers of instances. Neighboring effects with a small number of instances in the database become poorly determined. If small peptide measurements are used, the neighboring effects determined do not necessarily reflect the effects on residues accurately.

Prediction of secondary structure propensities with the currently existing methods is still less effective than desired. The majority of these methods use a curated chemical shift database to calculate reference state shifts. However, these methods are based on different sets of reference chemical shifts, depending on the database size available at the time of method development and the specific criteria used to filter the database. This factor alone limits the predictive power of some methods, without considering the inherent deficiencies of the methods themselves. For a protein in the folded state, the inaccuracy problem is less pronounced. Only a small fraction of residues in folded proteins are partially ordered. A qualitative interpretation of the result is able to distinguish between the three reference states. The result is therefore usually satisfactory for analyzing folded proteins. In comparison, the results for disordered proteins and peptides need to be more quantitative to reflect the percentages of partial order formed. A better algorithm to accurately assign the reference shifts thus needs to be developed.
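Equation 1.4 can be applied to several nuclei of one residue at once by solving for fhel and fsht in the least-squares sense, as the sketch below illustrates. All reference shifts in it are illustrative placeholders, not curated values, the function name is hypothetical, and no weighting of nucleus types is applied.

```python
import numpy as np


def propensities(cs_obs, cs_hel, cs_sht, cs_col):
    """Least-squares solution of equation 1.4 for (f_hel, f_sht).

    Each argument is an array with one entry per nucleus type.
    """
    cs_obs, cs_hel, cs_sht, cs_col = map(
        np.asarray, (cs_obs, cs_hel, cs_sht, cs_col))
    # Rearranged equation 1.4:
    # CS_obs - CS_col = f_hel (CS_hel - CS_col) + f_sht (CS_sht - CS_col)
    design = np.column_stack([cs_hel - cs_col, cs_sht - cs_col])
    solution, *_ = np.linalg.lstsq(design, cs_obs - cs_col, rcond=None)
    return solution


# Placeholder reference shifts (ppm) for three nuclei: Calpha, C', Halpha.
cs_hel = np.array([55.0, 179.5, 4.00])
cs_sht = np.array([51.5, 176.0, 4.80])
cs_col = np.array([52.5, 177.5, 4.30])

# Synthetic observation built from equation 1.4 with f_hel = 0.30, f_sht = 0.05.
cs_obs = 0.30 * cs_hel + 0.05 * cs_sht + 0.65 * cs_col

f_hel, f_sht = propensities(cs_obs, cs_hel, cs_sht, cs_col)
print(f"f_hel = {f_hel:.2f}, f_sht = {f_sht:.2f}")
```

With noise-free synthetic input the fit returns the fractions used to build the observation. With real data, the result depends directly on the reference shifts supplied, which is the limitation discussed in this section.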

1.4.4 Amide proton exchange

Amide proton exchange is the exchange between amide protons and the protons in bulk solution. The chemical reaction was initially introduced as a protein chemistry tool by Linderstrøm-Lang (Hvidt and Linderstrøm-Lang, 1954). Traditionally, amide proton exchange has been used to distinguish residues that are protected from solvent from solvent-exposed residues (Englander et al., 1997). Thus it is used as a probe of protein conformations. Experiments also use deuterium and sometimes tritium (Thevenon-Emeric et al., 1992) to create differences in the signal so that the exchange process can be detected.

Ionized protons, hydroxide ions and non-ionized water all participate in the exchange process. Ionized protons and hydroxide ions mediate acid-catalyzed and base-catalyzed amide proton exchange, respectively. The exchange rate constants of these two mechanisms are, as expected, strongly dependent on the pH of the aqueous solution. The intrinsic exchange rate constant can be expressed as the sum of the rate constants from all three mechanisms. Consequently, the acid-catalyzed reaction dominates the rate constant in acidic conditions and the base-catalyzed reaction in basic conditions. The relationship between pH and exchange rate constant can be described as a chevron plot (Bai et al., 1993) (Figure 1.3), where the junction of the two arms represents the minimal rate constant possible for the particular amide proton being described. The intrinsic rate constant due to the non-ionized water mediated reaction is negligible except when the overall constant approaches a minimum. The non-ionized water

mediated reaction usually contributes less than 25% of the overall rate constant at the pH of the chevron plot minimum.

Figure 1.3: pH dependence of exchange rate constant. The plot shows log(kex) (/min) versus pH for the exchange rate constant profile of a realistic, hypothetical residue.

The apparent exchange rate constant, mathematically, is the product of the intrinsic exchange rate constant and the unprotected fraction of the amide proton, i.e. one minus its fraction protection. The intrinsic exchange rate constant is the rate constant when the amide proton is completely unprotected. Several factors affect the value of the intrinsic exchange rate constant, including pH, the amino acid type and the amino acid types of the neighboring residues. Note that the unprotected amide protons still experience the conformational preferences of the peptide chain. Fraction protection is essentially a description of the percentage of hydrogen-bonded species of the amide proton being measured. If the intrinsic exchange rate constant is accurately known, one can easily calculate the fraction protection by measuring the apparent exchange rate constant.

Traditional H-D exchange detected by NMR monitors the disappearance of proton peaks when protonated proteins are dissolved in deuterated water. The solution pH values are usually slightly acidic, so that the intrinsic rate constants of most residues approach their minimum values. Due to the high concentration of deuterons provided by the bulk solution, deuterons will gradually replace protons in the amide bonds by exchange. A solvent-exposed amide proton will have an exchange rate constant

orders of magnitude faster than that of a protected amide proton. Therefore, protons in solvent-exposed amide bonds will usually finish exchanging within minutes, sometimes even within the dead time of the experiment. For a protected amide proton, however, the exchange process can take hours, days or even years. Therefore, one can easily isolate the labile amide protons by measuring their real-time peak intensity change. The distinction between protected and unprotected amides is usually more qualitative than quantitative.

For proteins in the unfolded state and flexible peptides, one needs to apply other methods. The methods described above are not suitable because 1) amides in unfolded state proteins are rarely well protected; 2) the exchange of most amide protons is complete within the dead time of the experiment. Two notable methods for measuring relatively fast exchange rate constants are CLEAN-EX and SOLEXSY (Hwang et al., 1997; Chevelkov et al., 2010). Both methods are 15N-HSQC based. Samples used in these methods are already chemically equilibrated before the experiments start, and the methods are able to measure exchange rate constants near 1 per second.

The mathematical equation to determine exchange rate constants from 2D-type exchange NMR spectra, which is applicable to CLEAN-EX and SOLEXSY, was initially worked out by Jeener et al. (Jeener et al., 1979). As protons in two chemical environments are exchanged, four proton species will experience changing populations: the protons remaining in their starting chemical environments will have decreasing populations until equilibrium is reached; the protons exchanged into the new chemical environments will have increasing populations until equilibrium is reached. Therefore, the change of longitudinal magnetization with time can be plotted as a decay curve or a build-up curve depending on the species.
Parameters are shared in the equations describing the intensity changes of the two curves. If the magnetization changes can be monitored, they can then be analyzed to obtain the exchange rate constant. Usually the change of proton concentration in bulk solution is negligible, which leaves the magnetization change of the amide attached protons as the only detectable change.

Both CLEAN-EX and SOLEXSY use one time point within the pulse sequence as time zero of exchange. For CLEAN-EX, a pulsed field gradient is used to eliminate the magnetization of all amide protons at time zero, so the initial "build-up" of peak intensities is exclusively due to exchange. The observables of CLEAN-EX are peak intensities that can be turned into build-up curves of amide attached protons. SOLEXSY takes a slightly different approach, taking advantage of the lack of signal from deuterated amide bonds. H2O and D2O are mixed to form the aqueous environment of the sample solution, with a recommended final proton:deuteron ratio of 1:1. In the chemically equilibrated sample, roughly half of the amide bonds carry a proton and the other half a deuteron. At time zero, the chemical shifts of NH and ND are recorded. Due to the different chemical environments of the two amide species, their chemical shifts are different. Then a mixing period allows the proton/deuteron exchange to happen. The species initially marked as NH will have a decreasing population, whereas the species initially marked as ND will be exchanged into an increasing population of NH. Therefore, with a series of spectra of different mixing periods, one can simultaneously obtain a decay curve and a build-up curve. The ability to obtain both curves in one experiment makes SOLEXSY able to determine exchange rate constants more accurately than CLEAN-EX.

Fast amide proton exchange rate constants can be turned into fraction protection values for poorly protected amide protons. Fraction protection is a measurement of the hydrogen-bonded population of an amide proton. For a nascently helical sequence, therefore, it is an indirect indication of the residual helicity.
In recent years, it has become one of the most widely used approaches to probe residual secondary structures (Buck et al., 1994; Nkari and Prestegard, 2009; Smith et al., 2013). One important factor in determining fraction protection values is the intrinsic exchange rate constants (kint) of residues in the chosen sequence. Accurate kint values are especially important for residues in unfolded state proteins.

Figure 1.4: Example SOLEXSY intensity profile

Normalized peak intensity against time (s). The apparent exchange rate constant for this hypothetical residue is 5/s.

The most common approach to determine kint values currently is to predict the values using SPHERE, a web server that calculates kint values when an amino acid sequence is provided. The server was built based on the work of the Englander group (Bai et al., 1993). The related studies measured the kint value of the middle alanine residues in an alanine stretch and used the value as a reference value. Then the studies determined, using combinations of dipeptides, how a non-alanine residue affects the minimum pH and the minimum value of the exchange rate constant chevron plot. Residue type influences the chevron plot of the target residue as well as that of the residue on the left or on the right, and the neighbor effect is shown to be additive. Finally, considering only the influences of amino acid type coming from the residue being calculated and the nearest neighbors, the SPHERE model predicts the kint values of any sequence given the target pH and temperature.

Currently, the accuracy of the prediction provided by SPHERE has been shown to be less than desired. SOLEXSY measurements on the drkN SH3 domain (Chevelkov et al., 2010) showed several isolated amide protons with unrealistically low fraction protection values. Therefore, an effort to develop an improved prediction program for kint values is currently needed.

1.4.5 Other NMR methods

A number of other NMR techniques are also frequently used to probe residual compactness in unfolded state proteins, although they are less relevant to providing residual helicity values of residues. Some of these techniques are briefly discussed in this section.

Backbone conformation

Backbone dynamics is a method to probe the flexibility of residue backbones. This method usually concerns the measurements of backbone relaxation rate constants and intra-residue NOE (nuclear Overhauser effect) intensities. Using these values, one can calculate the order parameter (S²) values for the backbone of each residue. Scaled from 0 to 1, these S² values describe, on average, how rigid (value close to 1) or flexible (value close to 0) a backbone residue is. For protein molecules in solution, it is relatively straightforward to use this approach to identify relatively disordered regions in a folded state protein or relatively rigid regions in an unfolded state protein (Chugha and Oas, 2007; Bai et al., 2001; Bhattacharya et al., 1999). However, this method is not able to reveal, for the relatively rigid regions, what types of structure elements have formed.

Backbone J-coupling constants can be used to estimate the population-averaged backbone torsion angles. Existing methods can measure J-coupling constants of atom pairs that are three, two or one covalent bonds apart. For example, Wirmer & Schwalbe used a "J-modulated" version of HSQC to measure J-coupling constants between N and Cα (Wirmer and Schwalbe, 2002). More studies have focused on the measurements of ³J coupling constants (Li et al., 2015; Avbelj et al., 2006; Markwick et al., 2009). Theoretically, these constants follow various forms of Karplus equations. Karplus equations are generally linear sums of cosine functions of φ and ψ. Thus, using measured J-coupling constants, one can estimate the population-averaged backbone conformation of each residue on the peptide chain. Currently, the accuracy of torsion angle prediction is still less than desired. This method seems to be better suited for qualitative analysis or for use in combination with other methods.

Inter-residue distances

Inter-residue NOE signal detection is a method to detect inter-residue contacts that are usually within a few angstroms. Proton-proton NOESY of backbone amide protons is a common approach in this category. Traditionally, proton-proton NOESY can detect proton pairs only when the separation is within 6 angstroms. Because the intensity of the NOE signal is proportional to 1/R^6, R being the distance between a pair of atoms, NOE is known for its detection limit in distance. Recent method development has led to a version of NOESY capable of detecting "very long-distance NOEs", that is, NOEs up to 8 angstroms (Koharudin et al., 2003). The method utilizes the deuteration of protein to decrease dipolar relaxation of proton pairs, resulting in a slower longitudinal relaxation and a longer detection range.

Paramagnetic relaxation enhancement (PRE) is another method to detect inter-residue distances (Salmon et al., 2010; Lietzow et al., 2002). The enhancement of transverse relaxation, in this case, is caused by an unpaired electron, usually artificially attached by means such as nitroxide spin labelling. The relationship between the relaxation rate constant increase of the nucleus of interest and its distance to the spin center also follows a 1/R^6 dependence. In contrast to NOE, the PRE effect can be observed up to 35 angstroms (Clore and Iwahara, 2009). If a residue is within this distance from the spin center, one will observe a stretch of surrounding resonances (usually 1H) with increased R2. In practice, the 15N-HSQC peak volume will decrease due to broadening. PRE is a very effective tool to probe long range contact in flexible folded protein systems (Felitsky et al., 2008).
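The steepness of the 1/R^6 dependence described above can be illustrated with a short Python sketch. The function name is hypothetical, and the calculation only compares signals within one method at different distances. It does not compare NOE with PRE, whose longer reach comes from the much larger magnetic moment of the unpaired electron rather than from a different distance dependence.

```python
def relative_r6_signal(distance, reference_distance):
    """Ratio of a 1/R^6-dependent signal at `distance` to the signal at
    `reference_distance` (both in the same length unit, e.g. angstroms)."""
    if distance <= 0 or reference_distance <= 0:
        raise ValueError("distances must be positive")
    return (reference_distance / distance) ** 6


# Signal remaining at 8 angstroms relative to the traditional 6 angstrom limit.
print(f"{relative_r6_signal(8.0, 6.0):.3f}")
```

Under this scaling, extending detection from 6 to 8 angstroms requires recovering a signal that has dropped to less than a fifth of its value at 6 angstroms.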

Sidechain movements

Sidechain dynamics is a method to probe fast sidechain motions for hydrophobic sidechains in the ps-ns range. This is in contrast with backbone dynamics, where the ps-ns range motion is often not large (Best et al., 2005). The measurements of sidechain dynamics were restricted to 2H-labeled methyl sidechains (Choy et al., 2003; Mittermaier et al., 1999). The measurements are done through spin relaxation experiments. Therefore the investigation is limited to several hydrophobic amino acid types. Similar to backbone dynamics, one can calculate the S²axis values for each methyl group, with 1 indicating completely rigid and 0 indicating completely flexible. Later development has also allowed for the order parameter analysis of aromatic sidechains (Boyer and Lee, 2008).

2

Amide proton exchange and fraction protection

This chapter first focuses on estimating the chain-level helicity of LRH1x in a wide range of temperatures and urea concentrations. The chain-level information is important to guide the experimental designs for collecting residue-specific information. Then the chapter focuses on estimating the population of helical H-bonded protons for the backbone amide proton of each residue. The ultimate goal is to depict the helical hydrogen bonding network of LRH1x under physiological conditions. The sample preparation and experimental measurements described in this chapter were all carried out by me. The development of the statistical method to determine intrinsic rate constants and its implementation were carried out by Dr. Roy Hughes.

2.1 LRH1x peptide introduction

Evidence has shown that when the helix 1 region (residues 9-23) of λ-repressor is expressed as a standalone peptide (Marqusee and Sauer, 1994), the peptide shows a significant amount of helicity as indicated by CD measurement. Combined with NMR data from unfolded λ6-85, this evidence suggests that the helix 1 region is also significantly helical in the unfolded state. Residues 25-30 form a kink between helix 1 and 2 in the folded structure. The kink is relatively rigid but contains non-α-helical mainchain hydrogen bonds. Helix 1, therefore, is also sometimes defined as residues 9-30 (Burton et al., 1998). Residues 8-30 correspond to the extended helix 1 region of λ-repressor N-terminal domain.

In the lab, in order to further study the behavior of the helix 1 region outside the context of the N-terminal domain, a peptide corresponding to residues 8-30 is expressed. This peptide is the biological material used in this chapter. The purpose of including residues 24-30 is to extend the helical hydrogen bonding network. The peptide is named LRH1x. The amino acid sequence of LRH1x is GTQEQLEDARRLKAIYEKKKNELG.

LRH1x sequence with peptide residue numbering: residues 1-12, GTQEQLEDARRL; residues 13-24, KAIYEKKKNELG.

2.2 CD spectra of LRH1x

2.2.1 Materials and methods

Peptide expression

A modified TrpLE fusion peptide (Yansura, 1990), TrpLE-LRH1x, was inserted after the T7 promoter (Studier et al., 1990). The recombinant plasmid, based on the pAED-4 vector (Doering, 1992), was transformed into BL21(DE3) competent cells, which were grown in 1 L of LB media at 37 °C to an OD600 of 1.0. Overexpression of LRH1x protein was induced by adding IPTG to a concentration of 0.8 mM. After incubation at 37 °C for an additional 5 hours, the cells were harvested by centrifugation and then resuspended in 1 volume of lysis buffer (15 ml per liter of culture, composed of 25% sucrose, 1 mM EDTA, 20 mM DTT, 50 mM Tris pH 7.8).

Cell lysates were generated by pretreating the resuspended cell pellets with lysozyme (1 mg/ml) and DNAse I, and then passing them through a French pressure cell at 12,000 lb/in2. Cell lysates were mixed with 2 volumes of detergent buffer (0.2 M NaCl, 1% deoxycholate, 1% Nonidet P-40, 20 mM DTT, 20 mM Tris pH 7.8, 2 mM EDTA) before centrifugation. Then the pellets were resuspended and re-pelleted through centrifugation in a series of solutions, sequentially including 1 volume of Triton wash solution (0.5% Triton X-100, 20 mM DTT, 1 mM EDTA), 1 volume of deoxycholate solution (0.1% deoxycholate, 20 mM DTT, 1 mM EDTA) and 1 volume of deionized water. The deoxycholate washes and the water washes were each repeated twice.

The washed pellets were resuspended in Buffer A (6 M guanidine hydrochloride, 10 mM Tris, 50 mM sodium phosphate pH 8.0, 10 mM TCEP) before centrifugation. The resulting supernatant was loaded onto a Ni-NTA column and the column was washed with Buffer A at pH 8.0. Then the Ni-NTA column was washed with Buffer A at pH 5.5 to elute the target protein. Eluted fractions containing the target protein were dialyzed against 2% acetic acid 3 times and then lyophilized.

A methionine residue was engineered between the TrpLE leader sequence and the LRH1x sequence. The lyophilized protein was then cleaved at the methionine position using cyanogen bromide (CNBr) to separate the target peptide from the fusion protein. The chemical reactions involved in CNBr cleavage are shown in Figure 2.1. The cleavage generates a normal N-terminus and a C-terminus that contains a homoserine lactone unit. If the methionine residue is followed by a serine or threonine, the sidechain of serine or threonine can prevent the formation of homoserine lactone and therefore prevent cleavage. In the case of LRH1x, the Met residue is followed by Gly.

The lyophilized proteins were dissolved in 10 ml of 70% formic acid containing 100 mg/ml CNBr. Then the acidic solution was evaporated under vacuum to remove CNBr and part of the formic acid.
The resulting slurry was dissolved in buffer B (6 M guanidine hydrochloride, 50 mM Tris pH 8.0) and adjusted to pH 8.0 using sodium hydroxide solution before centrifugation. The supernatant was loaded onto a Ni-NTA column and washed with buffer B to remove cleaved TrpLE peptide. The flow-through from the Ni-NTA column was dialyzed against 2% acetic acid 3 times, lyophilized, and resuspended in 1.5 ml of 1% acetic acid before centrifugation. The resulting supernatant was loaded onto a Sephadex G-25 size exclusion column and washed with 1% acetic acid. Fractions containing LRH1x were lyophilized. The molecular weight and purity of the lyophilized protein were validated by gel electrophoresis, HPLC and mass spectrometry.

Figure 2.1: The mechanism of CNBr cleavage at methionine residue.

CD measurements

For samples without urea, 40 µM of LRH1x peptide was dissolved in buffer of pH 6.0, consisting of 20 mM sodium phosphate and 100 mM potassium fluoride. Wavelength scans in the range of 190 nm to 260 nm were carried out at temperatures ranging from 2 °C to 87 °C, in steps of 5 °C. Data points with dynode voltage higher than 500 volts were ignored in the following analysis.

For samples with urea, 40 µM of LRH1x peptide was dissolved in buffers of pH 6.0. For each temperature, a peptide solution with a high concentration of urea was titrated into a solution with a lower urea concentration. Both the titrating solution and the titrated solution consisted of 20 mM sodium phosphate and 100 mM potassium fluoride at pH 6.0. For each titration point, the signal at 222 nm was collected. The concentration of urea in the titrating solution and in the initial titrated solution before titration was determined using refractive index.

Results

Figure 2.2 shows that at low temperature (e.g. 2 °C) and neutral pH, the CD spectrum of LRH1x peptide shows the features of a highly helical protein/peptide, with two minima at 222 nm and around 208 nm, and a maximum around 190 nm. As temperature increases, the highly helical features gradually disappear, the 222:208 signal ratio decreases, and the absolute ellipticity at 222 nm gradually decreases, indicating a decrease of residue-averaged helicity with increasing temperature. An isodichroic point can be observed by overlaying spectra of different temperatures.

In the presence of urea, CD measurements have significant noise at wavelengths lower than ~210 nm. For urea containing samples, therefore, the signals at 222 nm were collected instead of wavelength scans. The results are shown in Figure 2.3. The titration experiments at 5 °C, 25 °C and 45 °C are used to inspect how the combination of high urea concentration and high temperature influences helicity. These titrations started at 3 M urea. The analysis of the data from urea containing samples incorporated the correction for the effect of temperature and urea concentration on 222 nm signals (Luo and Baldwin, 1997; Scholtz et al., 1995).

Figure 2.2: Normalized CD spectra of LRH1x peptide at pH 6.0.

Far-UV CD spectra of LRH1x (mean residue ellipticity against wavelength in nm) are shown at selected temperatures (2 °C, 12 °C, 27 °C, 42 °C, 57 °C and 87 °C). The background signals have been subtracted.

2.2.2 Residue-averaged helicity of LRH1x

For CD data collected in the absence of urea, the fractional helicity values of the peptide were predicted from the CD spectra using CDPro, a software package that determines secondary structure fractions using algorithms including SELCON3, CDSSTR and CONTIN. Figure 2.4 shows the prediction results using CDPro for LRH1x peptide CD spectra at various temperatures. Reference set SP37A was chosen because of its ability to estimate the population of PPII secondary structure. The prediction using the CDSSTR algorithm is presented here because its result is the closest to a continuous decrease of helicity with increasing temperature, which explains the data best.

Figure 2.3: Normalized CD signals of urea containing LRH1x samples

CD signals at 222 nm of LRH1x peptide (mean residue ellipticity against urea concentration in M) in various urea concentration and temperature combinations (2 °C, 5 °C, 25 °C and 45 °C) at pH 6.0.

As shown in Figure 2.4, on the chain level, the peptide displays a significant amount of helicity, as high as ~60%, at low temperatures with no urea. In the absence of urea, even at high temperature (87 °C), the peptide displays ~18% helicity. High temperature alone is thus insufficient to eliminate the chain-averaged helicity of the peptide. Helicity values in the presence of urea were analyzed using 222 nm signals. The equation is:

$$f_{hel} = \frac{MRE_{obs} - \left(2250 - \frac{4250}{80}\,T + 280\,U\right)}{\left(-44000 + 250\,T - 620\,U\right) - \left(2250 - \frac{4250}{80}\,T + 280\,U\right)} \tag{2.1}$$

where fhel is fraction helicity, MREobs is the observed mean residue ellipticity (deg×cm2/dmol), T is temperature in Celsius and U is urea concentration in molar.

Figure 2.4: The change of helicity with increasing temperature

Helicity against temperature (°C). The chain-level helicity values are estimated by CDPro analysis of experimental data.

As shown in Figure 2.5, increasing urea concentration decreases helicity. Above 8 M urea, the helicity values are below 15% for all 4 temperatures used in the measurements. The data points for higher urea concentrations are noisier because of an increase in the dynode voltage.

On the chain level, the peptide displays a significant amount of helicity, as high as ~60%, at low temperatures with no urea, and it continues to display a significant amount of helicity at high temperatures. This is a remarkable helicity for a naturally occurring 24-residue peptide sequence. Temperature increase alone was insufficient to decrease the helicity to a negligible amount. High concentrations of urea, in comparison, decrease the helicity below 15%. Based on these observations, residue specific measurements at the lowest possible chain-level helicity must include samples with high concentrations of urea.

In fact, according to the Lifson-Roig formalism, the lowest chain-averaged helicity

a 24-residue peptide can have is ~5%. With all propagation weights w approaching zero, the initiation weight v becomes the dominant contribution to the helicity calculation. The absolute value chosen for v determines the exact number for the final result. For v = 0.05, the lowest chain-averaged helicity is 4.75%. However, decreasing all w parameters to close to 0 requires biophysically unrealistic temperature and urea concentration. Therefore, it is unsurprising that the lowest chain-averaged helicity for LRH1x is higher than 5% under the conditions used for these studies. The lowest helicity predicted is heavily dependent on the value chosen for v.

Figure 2.5: Helicity estimates for CD signals of urea containing samples

Helicity against urea concentration (M) at 2 °C, 5 °C, 25 °C and 45 °C.
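The two-state conversion of Equation 2.1, used earlier in this section to turn 222 nm signals into helicity, can be sketched in Python as follows. This is a minimal illustration rather than the analysis code used for the thesis. The function names are hypothetical, and the term printed as "4250 ÷ 80 * T" is assumed to mean (4250/80)·T.

```python
def coil_baseline(temp_c, urea_m):
    """Mean residue ellipticity at 222 nm assigned to the fully coil peptide
    (deg*cm^2/dmol), with temperature in Celsius and urea in molar.
    Assumption: the coil temperature slope is (4250 / 80) per degree."""
    return 2250.0 - (4250.0 / 80.0) * temp_c + 280.0 * urea_m


def helix_baseline(temp_c, urea_m):
    """Mean residue ellipticity at 222 nm assigned to the fully helical peptide."""
    return -44000.0 + 250.0 * temp_c - 620.0 * urea_m


def fraction_helix(mre_obs, temp_c, urea_m):
    """Equation 2.1: position of the observed signal between the two baselines."""
    coil = coil_baseline(temp_c, urea_m)
    helix = helix_baseline(temp_c, urea_m)
    return (mre_obs - coil) / (helix - coil)


# A signal halfway between the two baselines corresponds to 50% helicity.
midpoint = 0.5 * (coil_baseline(25.0, 4.0) + helix_baseline(25.0, 4.0))
print(fraction_helix(midpoint, 25.0, 4.0))
```

Because both baselines depend on temperature and urea concentration, the same observed ellipticity maps to different helicity values under different solution conditions.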

2.3 Backbone amide proton protection factors in LRH1x

2.3.1 Apparent amide proton exchange rate constant measurement

One way to detect and describe residual helicity is to quantify the extent to which each backbone amide proton is protected from exchange with solvent by helical hydrogen bonding. Assuming that the predominant intramolecular hydrogen bond for a particular amide proton is due to helix formation, the fractional protection of a backbone amide proton reflects the helicity in its vicinity. The fractional protection (fprot) at a given residue is linearly related to the observed exchange rate constant (kobs) and the intrinsic rate constant when the residue is fully unprotected (kint):

$$k_{obs} = k_{int} \cdot (1 - f_{prot}) \tag{2.2}$$

In this section, I will describe how I have used the NMR pulse sequence SOLEXSY to detect amide proton exchange of backbone amide protons, therefore measuring values of kobs. All rate constants were measured in solutions whose pH was between 6 and 8. Under these conditions the proton exchange is dominated by base catalysis.
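Equation 2.2 can be rearranged to give the fractional protection directly from the two rate constants. The following Python sketch shows the rearrangement. The function name and the numerical values are hypothetical and are not taken from the measurements in this chapter.

```python
def fraction_protected(k_obs, k_int):
    """Rearranged Equation 2.2: f_prot = 1 - k_obs / k_int.
    Both rate constants must be in the same units (e.g. per second)."""
    if k_int <= 0:
        raise ValueError("k_int must be positive")
    return 1.0 - k_obs / k_int


# Hypothetical residue: observed 5 /s against an intrinsic 20 /s.
print(fraction_protected(5.0, 20.0))
```

The result is only as reliable as kint. If the predicted kint is lower than the measured kobs, the function returns a negative, physically meaningless value, which is the kind of unrealistically low fraction protection that motivates a more accurate kint prediction.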

Solution condition and kint

Intrinsic exchange rate constant is influenced by various factors, including solution pH, temperature and urea concentration. First, the intrinsic exchange rate constant of a given backbone amide proton depends on the concentration of hydronium ion and hydroxide anion in the bulk solution. The pH dependence of kint can be expressed as:

$$k_{int} = k_A \cdot 10^{-pH} + k_B \cdot 10^{pH - pK_w} + k_{water} \tag{2.3}$$

where kA is the rate constant for acid catalyzed exchange, kB is the rate constant for base catalyzed exchange and kwater is the rate constant for water catalyzed exchange. When the solution pH is close to neutral, the base catalyzed reaction dominates the exchange process. Therefore, kint can be approximated as:

$$k_{int} = k_B \cdot 10^{pH - pK_w} \tag{2.4}$$
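The pH dependence above can be explored numerically. The sketch below evaluates the three-term expression and the base-only approximation, and locates the pH at which the sum of the acid and base terms is smallest, obtained by setting the derivative with respect to pH to zero. All rate constants in the example are hypothetical placeholders rather than measured values, and pKw = 14.0 is an assumption because pKw varies with temperature.

```python
import math


def k_int_ph(ph, k_a, k_b, k_water, pk_w=14.0):
    """Equation 2.3: acid, base and water catalyzed contributions to k_int."""
    return k_a * 10.0 ** (-ph) + k_b * 10.0 ** (ph - pk_w) + k_water


def k_int_base_only(ph, k_b, pk_w=14.0):
    """Equation 2.4: base catalyzed term only, for near-neutral pH."""
    return k_b * 10.0 ** (ph - pk_w)


def ph_of_minimum(k_a, k_b, pk_w=14.0):
    """pH where the acid and base terms are equal, i.e. the chevron minimum:
    k_a * 10**(-pH) = k_b * 10**(pH - pK_w)."""
    return 0.5 * (pk_w + math.log10(k_a / k_b))


# Hypothetical rate constants, chosen only to give a minimum near pH 3.
K_A, K_B, K_WATER = 50.0, 1.0e10, 1.0e-3
print(ph_of_minimum(K_A, K_B))
print(k_int_ph(7.0, K_A, K_B, K_WATER), k_int_base_only(7.0, K_B))
```

With these placeholder values the base-only approximation at pH 7 differs from the full expression by a negligible amount, which is why Equation 2.4 is adequate between pH 6 and 8.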

The intrinsic exchange rate constant also increases with temperature. Their relationship can be expressed as:

$$k_{int}(T) = k_{int}(T_0) \cdot e^{-\frac{E_a}{R}\left(\frac{1}{T} - \frac{1}{T_0}\right)} \tag{2.5}$$

where Ea is the activation energy and R is the gas constant. Additionally, the intrinsic exchange rate constant decreases with increasing urea concentration. Urea can form a hydrogen bond to the amide nitrogen and block hydroxide access to the amide proton. The relationship between urea concentration and the exchange rate constant (Lim et al., 2009) can be expressed as:

$$k_{int}^{urea} = k_{int}^{0} \cdot \left(\frac{1}{1 + K \cdot \varphi_{urea}/\varphi_{water}}\right)^{3} \tag{2.6}$$

where kint(urea) and kint(0) are respectively the intrinsic exchange rate constants with and without the presence of urea; φwater and φurea are the volume fractions of water and urea for the given urea concentration; and K is the exchange constant.
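The temperature correction of Equation 2.5 is a standard Arrhenius extrapolation from a reference temperature, and can be sketched as follows. The function name is hypothetical, temperatures are assumed to be in kelvin and the activation energy in J/mol, and the numerical values in the example are placeholders rather than values used in this work. The urea correction is not included in this sketch.

```python
import math

R_GAS = 8.314462618  # gas constant in J/(mol*K)


def k_int_at_temperature(k_ref, temp_k, ref_temp_k, activation_energy):
    """Equation 2.5: scale k_int from ref_temp_k to temp_k (both in kelvin)
    using an activation energy in J/mol."""
    exponent = -(activation_energy / R_GAS) * (1.0 / temp_k - 1.0 / ref_temp_k)
    return k_ref * math.exp(exponent)


# Hypothetical example: 10 /s at 25 C, extrapolated to 35 C with Ea = 70 kJ/mol.
print(k_int_at_temperature(10.0, 308.15, 298.15, 70.0e3))
```

Because the activation energy is positive, the extrapolated rate constant increases with temperature, in line with the statement accompanying Equation 2.5.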

Protein expression and purification

The expression and purification of 15N and 13C labeled LRH1x peptide was similar to the expression and purification of unlabeled LRH1x peptide. The difference is in the overexpression step. For 15N and 13C labeled LRH1x peptide, the LRH1x protein was overexpressed in M9 media with 15N labeled ammonium chloride (NH4Cl) and 13C labeled glucose. The other components of the M9 media include 6 g/L Na2HPO4, 3 g/L KH2PO4, 0.5 g/L NaCl, 1 mM MgSO4 and 0.1 mM CaCl2.

Theory of the SOLEXSY experiment

SOLEXSY experiments rely on the exchange of amide protons and deuterons in aqueous solution containing D2O. Unprotected and partially protected backbone amide protons can exchange with either protons or deuterons in the aqueous solution. Therefore, at equilibrium, the overall ratio of NH bonds to ND bonds for a particular residue becomes close to the ratio of ionizable protons to ionizable deuterons in the bulk solution. Additionally, at equilibrium, there are four types of ongoing amide proton/deuteron exchange reactions: 1) NH + H*+ → NH* + H+; 2) NH + D*+ → ND* + H+; 3) ND + H*+ → NH* + D+; 4) ND + D*+ → ND* + D+. In these equations, the symbol "*" indicates that the annotated reactant is part of the solvent.

The SOLEXSY pulse sequence can detect reactions 2 and 3 by measuring the exchange of magnetization between solvent and the amide group. The pulse sequence simultaneously detects the "decay" of proton signals due to reaction 2 and the "buildup" of signals due to reaction 3. Because the ongoing reactions are at equilibrium, the beginning of each measurement, which magnetically labels both amide and solvent protons, can be viewed as time 0 of the kinetic experiment. The frequencies of the 15N resonances are different for nitrogens participating in an NH bond versus an ND bond due to the isotope effect. These nitrogen resonances are "recorded" in the measurements and correlated with proton resonances. Therefore, two groups of crosspeaks with different nitrogen frequencies can be used to monitor the two separate reactions.

Additionally, two spectra are collected simultaneously, with one phase for "decay" peaks and the opposite phase for "buildup" peaks. The addition and subtraction of crosspeak intensities in the two spectra can separate crosspeaks corresponding to the two reactions into two separate spectra.

An important parameter, τ, in the pulse sequence equates to the allowed reaction time before the resulting crosspeak intensities are measured. A set of crosspeak intensity measurements for the same solution condition at different τ values can be fit to obtain exchange rate constants.

The magnetization transfer pathway of the SOLEXSY experiment is shown below. The magnetization first transfers from Hα to Cα to C′ to N(H/D). The frequencies of N(H) and N(D) are labelled. Then, using x/x and x/−x filters, the magnetization of N(H) is separated into N(H) and −N(H) (steps 1 and 2 below), while the magnetization of N(D) remains N(D) (step 3 below). During the mixing period τmix (see above), exchange occurs. Then the magnetization is transferred from N to H and the proton signal is detected.

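$$
\begin{aligned}
& H^{\alpha}_{z} \xrightarrow{90^{\circ}_{x}(H)} -H^{\alpha}_{y} \xrightarrow{\tau_a - 180^{\circ}_{x}(H + C^{\alpha}) - \tau_a} -2H^{\alpha}_{x}C^{\alpha}_{z} \xrightarrow{90^{\circ}_{y}(H)} 2H^{\alpha}_{z}C^{\alpha}_{z} \xrightarrow{90^{\circ}_{x}(C^{\alpha})} -2H^{\alpha}_{z}C^{\alpha}_{y} \\
& \xrightarrow[2\tau_b - H\ \mathrm{decouple}]{\tau_c - 180^{\circ}_{x}(C^{\alpha} + C') - \tau_c} -2C^{\alpha}_{y}C'_{z} \xrightarrow[90^{\circ}_{x}(C')]{90^{\circ}_{x}(C^{\alpha})} 2C^{\alpha}_{z}C'_{y} \xrightarrow{\tau_d - 180^{\circ}_{x}(C' + N) - \tau_d} 2C'_{y}N^{H/D}_{z} \\
& \xrightarrow[90^{\circ}_{x}(C')]{90^{\circ}_{x}(N)} 2C'_{z}N^{H/D}_{x} \xrightarrow[t_1\ \mathrm{evolution}\ (N)]{\tau_d - 180^{\circ}_{x}(C' + N) - \tau_d} N^{H/D}_{x} \rightarrow
\end{aligned}
$$

$$
\begin{aligned}
1)\quad & N^{H}_{x} \xrightarrow{2\tau_e - 90^{\circ}_{x/x}(H)\,180^{\circ}_{x}(N) - 2\tau_e} -N^{H}_{x} \xrightarrow{\tau_{mix}} -N^{H}_{x} \xrightarrow{\tau_f - 180^{\circ}_{x}(H + N) - \tau_f} -2H_{z}N^{H}_{y} \\
& \xrightarrow[90^{\circ}_{x}(H)]{90^{\circ}_{x}(N)} -2H_{x}N^{H}_{z} \xrightarrow[\mathrm{watergate}]{\tau_f - 180^{\circ}_{x}(H + N) - \tau_f} -H_{x} \\
2)\quad & N^{H}_{x} \xrightarrow{2\tau_e - 90^{\circ}_{x/-x}(H)\,180^{\circ}_{x}(N) - 2\tau_e} N^{H}_{x} \xrightarrow{\tau_{mix}} N^{H}_{x} \xrightarrow{\tau_f - 180^{\circ}_{x}(H + N) - \tau_f} 2H_{z}N^{H}_{y} \\
& \xrightarrow[90^{\circ}_{x}(H)]{90^{\circ}_{x}(N)} 2H_{x}N^{H}_{z} \xrightarrow[\mathrm{watergate}]{\tau_f - 180^{\circ}_{x}(H + N) - \tau_f} H_{x} \\
3)\quad & N^{D}_{x} \rightarrow N^{D}_{x} \xrightarrow{\tau_{mix}} N^{D}_{x} \xrightarrow{\tau_f - 180^{\circ}_{x}(H + N) - \tau_f} 2H_{z}N^{D}_{y} \\
& \xrightarrow[90^{\circ}_{x}(H)]{90^{\circ}_{x}(N)} 2H_{x}N^{D}_{z} \xrightarrow[\mathrm{watergate}]{\tau_f - 180^{\circ}_{x}(H + N) - \tau_f} H_{x}
\end{aligned}
$$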

Figure 2.6: Peak intensities observed in a SOLEXSY experiment

An example of peak intensity changes due to changes in τ for a pair of ND and NH crosspeaks. The intensities were taken from experimental data. Values of τ from left to right are: 0.7, 40.7, 70.7, 110.7, 160.7, 220.7 and 500.7 ms.

Obtaining kobs values from SOLEXSY data

For a given solution condition, several factors influence the intensities for "buildup" peaks and "decay" peaks when measured with different τ values, including R1 relaxation rate constants for 15N-H and 15N-D, and deuteron/proton fractionation factors in amide bonds. In theory, the evolution of peak intensities with respect to τ follows the equation below:

$$\frac{d}{d\tau}\begin{pmatrix} N_z^{H}(\tau) \\ N_z^{D}(\tau) \end{pmatrix} = \begin{pmatrix} -R_1^{NH} - k_{HD} & k_{DH} \\ k_{HD} & -R_1^{ND} - k_{DH} \end{pmatrix} \begin{pmatrix} N_z^{H}(\tau) \\ N_z^{D}(\tau) \end{pmatrix} \tag{2.7}$$

where R1(NH) and R1(ND) are the longitudinal relaxation rate constants for 15N-H and 15N-D, respectively; and kHD and kDH are exchange rate constants for amide proton and amide deuteron, respectively. The relationship between kHD and kDH is determined by the fractionation factor, due to an isotope effect on the strength of the nitrogen-hydrogen covalent bond:

$$k_{HD} = f \cdot k_{DH} \tag{2.8}$$

Therefore, the fitting equation for “buildup” peaks and “decay” peaks can be expressed as the following:

$$\alpha_{NH}(\tau) = A \cdot \frac{k_{DH}}{k_{HD} + k_{DH}}\, e^{-\sigma\tau} \left[ \cosh(D\tau) - \frac{\delta}{D} \sinh(D\tau) \right] \tag{2.9}$$

$$\alpha_{ND}(\tau) = B \cdot \frac{\sqrt{k_{DH} \cdot k_{HD}}}{D}\, e^{-\sigma\tau} \sinh(D\tau) \tag{2.10}$$

with

$$\sigma = \tfrac{1}{2}\left(R_1^{NH} + k_{HD} + R_1^{ND} + k_{DH}\right) \tag{2.11}$$

$$\delta = \tfrac{1}{2}\left(R_1^{NH} + k_{HD} - R_1^{ND} - k_{DH}\right) \tag{2.12}$$

$$D = \sqrt{\delta^2 + k_{HD} \cdot k_{DH}} \tag{2.13}$$

where $\alpha_{NH}$ and $\alpha_{ND}$ are the peak intensities for the “decay” peak and the “buildup” peak, respectively; and A and B are scaling factors used to fit the intensities. Simultaneous fitting of the “decay” peaks and “buildup” peaks measured with different τ settings for a single residue gives the kobs value for that residue.

A simulation of the “decay” and “buildup” peak intensities versus increasing mixing time is shown in Figure 2.7 for different exchange rate constants. Here, it is assumed that A = B = 1 and that $R_1^{NH} = R_1^{ND} = 1.5$.
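The time evolution described by Equation 2.7 can also be obtained numerically, which gives a way to check a closed-form fitting function. The following is a minimal sketch, not the fitting code used in this work: the function name `simulate_solexsy` is hypothetical, the starting state (all magnetization on the N-H species, scaling factors of 1) is an assumption, and the default relaxation rate constants simply reuse the value 1.5 quoted above.

```python
import numpy as np

def simulate_solexsy(tau, k_dh, f=1.0, r1_nh=1.5, r1_nd=1.5):
    """Propagate Eq. 2.7 from the assumed initial state (N-H, N-D) = (1, 0).

    tau  : mixing times in seconds (scalar or array)
    k_dh : exchange rate constant for the amide deuteron (/s)
    f    : fractionation factor, so that k_hd = f * k_dh (Eq. 2.8)
    Returns the N-H ("decay") and N-D ("buildup") intensities.
    """
    k_hd = f * k_dh
    # Rate matrix of Eq. 2.7
    m = np.array([[-r1_nh - k_hd, k_dh],
                  [k_hd, -r1_nd - k_dh]])
    eigvals, eigvecs = np.linalg.eig(m)
    # Expand the initial state in the eigenvector basis
    coeffs = np.linalg.solve(eigvecs, np.array([1.0, 0.0]))
    tau = np.atleast_1d(np.asarray(tau, dtype=float))
    # x(tau) = V diag(exp(lambda * tau)) V^-1 x(0)
    modes = np.exp(np.outer(tau, eigvals)) * coeffs
    traj = modes @ eigvecs.T
    return traj[:, 0].real, traj[:, 1].real

if __name__ == "__main__":
    times = np.array([0.0407, 0.1607, 0.5007, 2.0007])
    decay, buildup = simulate_solexsy(times, k_dh=1.0)
    for t, a, b in zip(times, decay, buildup):
        print(f"tau = {t:.4f} s  decay = {a:.4f}  buildup = {b:.4f}")
```

When the two relaxation rate constants are equal and f = 1, the result reduces to $e^{-(R_1 + k)\tau}\cosh(k\tau)$ and $e^{-(R_1 + k)\tau}\sinh(k\tau)$, which is a convenient sanity check.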

Choice of solution conditions

The solution conditions of the NMR samples, including pH, urea concentration and temperature, are chosen such that, for the majority of residues, values of kobs for amide protons are within 0.3/s to 5/s. In this range, SOLEXSY data can be used to most accurately determine the values of kobs. The quality of fitting significantly decreases when the rate constant for longitudinal relaxation dominates the rate constant for amide exchange or vice versa. Therefore, the best range for kobs measurement is when kobs values are comparable with R1 values.

[Figure 2.7: three panels (top left, top right, bottom), each plotting normalized intensity against time (s) from 0.0 to 2.0 s]

Figure 2.7: Examples of simulated SOLEXSY data

Assumed values are kobs = kDH: 0.1/s for the top left, 1/s for the top right and 10/s for the bottom figure; A = B = 1; and $R_1^{NH} = R_1^{ND} = 1.5$. The x axis is the mixing time, τ.

Because pH, temperature and urea concentration all affect both kint and fprot, thereby affecting kobs, not all combinations of pH, temperature and urea concentration are suitable for SOLEXSY measurements of LRH1x. Solution conditions used in the final analysis are documented below.

NMR measurements

Eight NMR samples were prepared for measurements. The pH, urea concentration and the temperatures at which rate constant data were collected are as follows: sample 1 - 7.23 & 0 M (10 °C, 15 °C, 20 °C, 25 °C, 30 °C); sample 2 - 6.19 & 0 M (25 °C, 30 °C, 35 °C, 40 °C, 45 °C, 50 °C); sample 3 - 7.49 & 1.84 M (20 °C, 25 °C, 30 °C, 35 °C); sample 4 - 7.49 & 4.22 M (25 °C, 30 °C, 35 °C, 40 °C); sample 5 - 7.54 & 5.94 M (30 °C, 35 °C); sample 6 - 7.53 & 6.92 M (45 °C); sample 7 - 7.58 & 7.83 M (30 °C, 40 °C, 45 °C); sample 8 - 7.63 & 7.88 M (35 °C). Concentrations of all samples ranged from 1 mM to 2 mM. SOLEXSY experiments were carried out according to the protocol suggested by the Forman-Kay group (Chevelkov et al., 2010). The values of τ used were 40.7, 70.7, 110.7, 160.7, 220.7, 500.7, 1000.7, 1500.7 and 2000.7 ms for a given solution condition. For each solution condition, an HA(CA)CO 2D spectrum was collected for the determination of the fractionation factor f (LiWang and Bax, 1996). All experiments for exchange rate constant determination were performed on a Bruker 700 MHz instrument with a cryogenic probe.

NMR data post-processing

The SOLEXSY data processing procedure was adapted from the protocol suggested by the Forman-Kay group (Chevelkov et al., 2010). The conversion from raw data to time domain signals was done using NMRPipe. NMRPipe was also used to add and subtract the two spectra collected in a single measurement to separate “buildup” peaks from “decay” peaks. NMRViewJ was then used for peak picking and peak intensity calculation.

2.3.2 kobs values of LRH1x residues

At equilibrium, residues along the LRH1x chain are undergoing helix-coil transitions.

The timescale of helix-coil transition is ~100 ns (Brooks, 1996; Huang et al., 2001). For NMR measurements, this is a fast conformational exchange. Therefore, only a single chemical shift for each backbone nucleus is observed for a given residue. The measured kobs is also a population-averaged property.

The amide HX rate constants for all measurable residues of LRH1x, except residues Gln3 and Asn21, are comparable for a specific solution condition. Rate constants for Gln3 and Asn21 are significantly higher, which is an expected outcome because their neighboring residues elevate their values of kint (Bai et al., 1993). Increasing pH, increasing temperature and decreasing urea concentration all increase kobs values.

All experiments were done in the pH 6.5 to 7.5 range. This range should not encompass the pKa of any titratable group in LRH1x. Therefore, no perturbation to the conformational ensemble is expected from changing pH in this range. The titration experiment monitored by CD (not shown) also confirms that the conformational ensemble in this pH range is unperturbed.

As described above, temperature and urea concentration affect both the helicity level and kint. The SOLEXSY experiment’s mixing time dependence is influenced by the longitudinal relaxation rate constants of ND and NH, and the exchange rate constant can be fitted most accurately if its value is comparable to the longitudinal relaxation rate constants. According to Chevelkov et al. (2010), the optimal range of rate constants is 0.3-5/s. Therefore, we used combinations of pH, temperature and urea concentration to constrain the rate constants of most residues to this range.

Based on the helicity estimations from CD measurements, varying temperature

alone is unable to decrease the helicity below a certain level, estimated to be ~18% by CD data. As shown in the analysis of CD urea titration data, high concentrations (>8 M) of urea can significantly decrease the chain-average helicity to below 15%. Therefore, the combination of high temperature and urea concentration enables the measurement of kobs values that are close to kint values under such conditions.

[Figure 2.8: plot of kobs (/s) against residue number]

Figure 2.8: kobs values of LRH1x residues at pH 7.23, 20 °C

2.3.3 Fitting for fraction protection values

Theoretically, if both kobs and kint are accurately known, the value of fprot can be precisely calculated. In practice, no algorithm or dataset known to date can accurately predict kint values given a random peptide sequence. Therefore, in this project, we have attempted to simultaneously fit for fprot and kint values from multiple kobs measurements.

In order to simultaneously fit for fraction protection values and intrinsic exchange rate constants, which are two sets of highly correlated parameters, we use Bayesian inference to improve the fitting efficiency. Apparent exchange rate constants for measurable residues were obtained in solution conditions of multiple temperatures and multiple denaturing osmolyte (urea) concentrations. As indicated by CD measurements, residue-averaged helicity gradually decreases with increasing temperature or increasing urea concentration. The instrument limits the accessible temperature range to 0-50 °C in NMR experiments.

The fitting model used here is called the “Bayesian-Englander model”. The Bayesian-Englander model uses Bayesian inference to globally analyze amide exchange data for LRH1x across solution conditions, in order to simultaneously fit for kint and fprot values. Distributions of fprot for neighboring residues within the same condition are correlated by a first-order autoregressive model (AR(1)). Distributions of kint for each residue across solution conditions are correlated by pH, temperature and urea concentration dependence. The mathematical relationships among kint parameters use a modified version of the Englander calculation.

In this model, as shown in Equation 2.3, kint is calculated as a sum of contributions from all three catalytic mechanisms. The kint parameters are dependent on position-specific parameters γA and γB, which are corrections to the Englander calculations.

$$k_{int}^{ik} = \hat{k}_{int}(T_i, u_i, \varrho_i, pH_i, \xi_i, \gamma_{A,k}, \gamma_{B,k}) \tag{2.14}$$

where

$$\hat{k}_{int}(T_i, u_i, \varrho_i, pH_i, \xi_i, \gamma_{A,k}, \gamma_{B,k}) = \left[ 10^{X^{ik} - pH_i} + 10^{Y^{ik} + pH_i - pK_{w,i}} + 10^{Z^{ik}} \right] \cdot \xi_i \tag{2.15}$$

and

$$X^{ik} = \log k_a + \log A_L^k + \log A_R^k + \frac{E_a}{4.57}\left(\frac{1}{293} - \frac{1}{T_i}\right) + \gamma_{A,k}$$

$$Y^{ik} = \log k_b^u + \log B_L^k + \log B_R^k + \frac{E_b}{4.57}\left(\frac{1}{293} - \frac{1}{T_i}\right) + \gamma_{B,k}$$

$$Z^{ik} = \log k_w + \log B_L^k + \log B_R^k + \frac{E_w}{4.57}\left(\frac{1}{293} - \frac{1}{T_i}\right) + \gamma_{B,k}$$

$$k_b^u = k_b \left( 1 + K (\phi_{urea} / \phi_{water})^3 \right)^{-1} \tag{2.16}$$

where $\phi_{urea} = 0.046 u_i$ and $\phi_{water} = 1 - \phi_{urea}$ are the volume fractions of urea and water, respectively, and are related to $u_i$ according to volumetric information obtained from (Schellman, 2003; Kawahara and Tanford, 1966), and K = 2.8. The $\hat{k}_{int}(T_i, u_i, \varrho_i, pH_i, \xi_i, \gamma_{A,k}, \gamma_{B,k})$ function is nearly identical to the relation used by the Englander model (Bai et al., 1993) to calculate kint values from a sequence and experimental condition.

In the above equations, i is the index for solution conditions and k is the index for residue number. Experimental condition i consists of values for temperature ($T_i$), urea concentration ($u_i$), ionic strength ($\varrho_i$), percentage of D2O present in a mixed D2O/H2O solvent ($\xi_i$), and pH ($pH_i$). $k_a$, $k_b$, and $k_w$ are the reference rate constants for acid, base, and water catalyzed amide exchange. $E_a$, $E_b$, and $E_w$ are the activation energies for acid, base, and water catalyzed exchange. $A_R^k$ and $B_R^k$ describe perturbations to the exchange rate constant for the amide of residue k due to the side chain of residue k − 1 for acid and base catalyzed exchange, respectively. $A_L^k$ and $B_L^k$ describe perturbations to the exchange rate constant for the amide of residue k due to the side chain of residue k for acid and base catalyzed exchange, respectively. $pK_{w,i}$ is the negative logarithm of the self-dissociation constant of the solvent for condition i, and 293 K is taken as the reference temperature. All of these variables are described in the original Englander model (Bai et al., 1993). The dependence of $k_{int}^{ik}$ on ionic strength is modeled by having different $k_a$, $k_b$, and $k_w$ values for high and low ionic strength conditions (Bai et al., 1993).

The modifications to the original Englander model that appear in the expressions above are as follows:

• $k_b^u$ appears in $Y^{ik}$ in place of $k_b$ in the original Englander model. $k_b^u$ contains a multiplicative correction that captures the dependence of $k_b$ on urea concentration as described by Lim et al. (Lim et al., 2009).

• $\gamma_{A,k}$ represents a position-specific correction to the Englander model estimate for $\log A_L^k + \log A_R^k$

• $\gamma_{B,k}$ represents a position-specific correction to the Englander model estimate for $\log B_L^k + \log B_R^k$

Parameter $\xi_i$ is the D2O mole fraction of the solvent for condition i. The multiplication by $\xi_i$ in Equation (2.15) is intended to account for the effect of mixed solvents, and assumes the exchange reaction being monitored is proton to deuteron (H → D). If the reaction were D → H, the $\xi_i$ factor would be replaced by $(1 - \xi_i)$. This modification has been used to model the effects of mixed solvents on kint (Chevelkov et al., 2010).

The $\hat{k}_{int}(T_i, u_i, \varrho_i, pH_i, \xi_i, \gamma_{A,k}, \gamma_{B,k})$ function constitutes a modified Englander Model that lies embedded within the BE Model. The $\gamma_{A,k}$ and $\gamma_{B,k}$ represent parameters of the BE Model itself and are estimated from the $k_{obs}^{ik}$ observations simultaneously with the $f_{pro}^{ik}$ parameters. In this way, instead of estimating the $k_{int}^{ik}$ values from scratch, the BE model leverages the knowledge contained in the Englander model while simultaneously allowing the Englander predictions to be updated by the new data collected from the sequence of interest through the new $\gamma_{A,k}$ and $\gamma_{B,k}$ parameters. Aside from the benefits obtained by leveraging the empirical knowledge contained in the Englander model, this parameterization reduces the number of potential parameters from $2 N_c N_r$ to $(N_c + 2) N_r$, where $N_c$ is the number of conditions and $N_r$ is the number of residues. The incorporation of urea dependence extends the applicability of the BE model to experiments employing urea to reduce nascent structure.

Eqn. (2.15) requires the specification of $pK_{w,i}$ values to calculate the kint values. Using a single condition-independent value for the $pK_{w,i}$ (as commonly done when using the Englander Model) has proven inadequate when fitting the BE model to experimental data, as have $pK_{w,i}$ values calculated using the simple temperature-dependent empirical model. In both cases the $f_{pro}$ posterior estimates are highly unreasonable. For this reason Eqn. (2.15) is modified as follows when fitting the model against experimental data:

$$\hat{k}_{int}(T_i, u_i, \varrho_i, pH_i, \xi_i, \gamma_{A,k}, \gamma_{B,k}) = \left[ 10^{X^{ik} - pH_i} + 10^{Y^{ik} + pH_i - pK_{w,i} + \Delta pK_{w,i}} + 10^{Z^{ik}} \right] \cdot \xi_i$$

Here the $pK_{w,i}$ values are calculated ionization constants. The $\Delta pK_{w,i}$ values are model parameters that capture deviation of the calculated $pK_{w,i}$ values from their true values ($pK_{w,i}^{(true)}$):

$$pK_{w,i}^{(true)} = pK_{w,i} + \Delta pK_{w,i} \tag{2.17}$$

Temperature-dependent $pK_{w,i}$ values for pure D2O ($pK_{w,d2o,i}$) and H2O ($pK_{w,h2o,i}$) were calculated using previously published theory (Covington et al., 1966). The value of $pK_{w,i}$ was then calculated using a linear interpolation between the values for the pure solvents:

$$pK_{w,i} = \xi_i \, pK_{w,d2o,i} + (1 - \xi_i) \, pK_{w,h2o,i}$$

This linear approximation has been used by previous workers to apply the Englander calculation to mixed solvents (Chevelkov et al., 2010). Physically reasonable estimates can be obtained for real experimental data sets when these errors in $pK_{w,i}$ are accounted for through the new $\Delta pK_{w,i}$ parameters.
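The structure of the modified Englander function (Equations 2.15 and 2.16, together with the mixed-solvent pKw interpolation) can be illustrated with a short sketch. This is not the code used for the analysis: the function names, the dictionary layout and all numerical values in the usage example are hypothetical placeholders, not the Englander reference constants of Bai et al. (1993). The `pkw` argument is whatever solvent pKw the caller decides to use for the condition, including any correction.

```python
import math

def k_b_with_urea(k_b, urea_molar, big_k=2.8):
    """Eq. 2.16: urea-corrected reference rate constant for base catalysis."""
    phi_urea = 0.046 * urea_molar
    phi_water = 1.0 - phi_urea
    return k_b / (1.0 + big_k * (phi_urea / phi_water) ** 3)

def mixed_solvent_pkw(xi, pkw_d2o, pkw_h2o):
    """Linear interpolation of pKw between pure D2O and pure H2O."""
    return xi * pkw_d2o + (1.0 - xi) * pkw_h2o

def k_int_hat(temp_k, urea_molar, ph, xi, pkw, ref, side,
              gamma_a=0.0, gamma_b=0.0):
    """Eq. 2.15 for one residue and one condition (H -> D exchange).

    ref  : dict with k_a, k_b, k_w and E_a, E_b, E_w (caller-supplied)
    side : dict with log_AL, log_AR, log_BL, log_BR (caller-supplied)
    """
    def temp_term(e_act):
        # Temperature correction relative to 293 K, as in the X, Y, Z terms
        return (e_act / 4.57) * (1.0 / 293.0 - 1.0 / temp_k)

    k_b_u = k_b_with_urea(ref["k_b"], urea_molar)
    x = (math.log10(ref["k_a"]) + side["log_AL"] + side["log_AR"]
         + temp_term(ref["E_a"]) + gamma_a)
    y = (math.log10(k_b_u) + side["log_BL"] + side["log_BR"]
         + temp_term(ref["E_b"]) + gamma_b)
    z = (math.log10(ref["k_w"]) + side["log_BL"] + side["log_BR"]
         + temp_term(ref["E_w"]) + gamma_b)
    return (10 ** (x - ph) + 10 ** (y + ph - pkw) + 10 ** z) * xi

if __name__ == "__main__":
    # Placeholder numbers chosen only to exercise the function
    ref = {"k_a": 1.0, "k_b": 1e10, "k_w": 0.1,
           "E_a": 14000.0, "E_b": 17000.0, "E_w": 19000.0}
    side = {"log_AL": 0.0, "log_AR": 0.0, "log_BL": 0.0, "log_BR": 0.0}
    pkw = mixed_solvent_pkw(0.5, pkw_d2o=15.0, pkw_h2o=14.0)
    print(k_int_hat(298.15, 4.0, 7.2, 0.5, pkw, ref, side, gamma_b=0.3))
```

Because $\gamma_{B,k}$ enters both $Y^{ik}$ and $Z^{ik}$, a value of 1 multiplies the base and water catalyzed terms by 10 while leaving the acid term unchanged.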

The experimentally determined kobs values are then fitted with:

$$k_{obs}^{ik} = (1 - f_{pro}^{ik}) \, k_{int}^{ik} \, (1 + \epsilon_{obs}^{i}) \tag{2.18}$$

$$\epsilon_{obs}^{i} \sim N(0, (\sigma_{obs}^{i})^2) \tag{2.19}$$

The magnitude of the observation error $\epsilon_{obs}^{i}$ is proportional to the magnitude of the kobs values and is taken from the normal distribution shown above. The $\sigma_{obs}^{i}$ parameter describes the observation error variability for measurements in experimental condition i, and is estimated from the data.

2.3.4 Bayesian inference to determine reference parameters

Bayesian inference optimizes joint probability distributions of model parameters based on experimental data. Bayesian inference describes each parameter with a probabilistic distribution of values, and attempts to maximize the likelihood that the combination of these distributions reflects the existing knowledge of parameters as well as the influence of new data on parameters. In comparison, the frequentist method minimizes the sum of squared errors. Prior parameter distributions, which are distributions reflecting existing knowledge of parameters, and experimental data are needed to generate posterior parameter distributions. The relationship can be simplified as:

$$P(\vec{\theta} \mid \vec{d}) \propto P(\vec{\theta}) \, P(\vec{d} \mid \vec{\theta}) \tag{2.20}$$

where $P(\vec{\theta})$ is the prior distribution of the parameter vector ($\vec{\theta}$), $P(\vec{d} \mid \vec{\theta})$ is the likelihood distribution, which reflects the likelihood that the data is predicted by the model, and $P(\vec{\theta} \mid \vec{d})$ is the posterior distribution, which reflects the probability of the model parameters, given the data. $\vec{\theta}$ is the parameter vector and $\vec{d}$ is the data vector.

Figure 2.9: The relationship between box plots and probability distributions

Source: www.kanat.net/verdicts

Therefore, the outcome of posterior distributions depends both on the experimental data and the assumptions for the prior distributions. In practice, prior distributions can be a normal distribution centered at 0, usually indicating a lack of prior knowledge.
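The proportionality in Equation 2.20 can be made concrete with a one-parameter toy problem, unrelated to the exchange data: a normal prior centered at 0 is multiplied by a normal likelihood on a grid of candidate values and then normalized. The function name `grid_posterior` and all numbers below are illustrative assumptions; the actual model has many correlated parameters and is not evaluated on a grid.

```python
import numpy as np

def grid_posterior(data, sigma, prior_sd, grid):
    """Posterior density of a single parameter theta on an evenly spaced grid.

    Prior:      theta ~ Normal(0, prior_sd^2)
    Likelihood: each datum ~ Normal(theta, sigma^2), independent
    """
    data = np.asarray(data, dtype=float)
    log_prior = -0.5 * (grid / prior_sd) ** 2
    resid = (data[:, None] - grid[None, :]) / sigma
    log_like = -0.5 * np.sum(resid ** 2, axis=0)
    # Posterior is proportional to prior times likelihood (Eq. 2.20)
    log_post = log_prior + log_like
    post = np.exp(log_post - log_post.max())
    step = grid[1] - grid[0]
    return post / (post.sum() * step)

if __name__ == "__main__":
    grid = np.linspace(-5.0, 5.0, 20001)
    post = grid_posterior([1.0, 1.2, 0.8], sigma=0.5, prior_sd=2.0, grid=grid)
    step = grid[1] - grid[0]
    print("posterior mean:", np.sum(grid * post) * step)
```

The posterior mean lands slightly below the sample mean because the prior pulls the estimate toward 0; a wider prior weakens that pull.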

In the Bayesian-Englander model, the likelihood $P(\vec{d} \mid \vec{\theta})$ describing the probability of observing $\vec{d}$ given a candidate value for $\vec{\theta}$ is given by

$$P(\vec{d} \mid \vec{\theta}) = \prod_{i=1}^{N_c} \prod_{k=1}^{N_r} N\left( \frac{\hat{k}_{obs}^{ik} - k_{obs}^{ik}}{\hat{k}_{obs}^{ik}} ; 0, (\sigma_{obs}^{i})^2 \right) \tag{2.21}$$

where $\hat{k}_{obs}^{ik} = (1 - f_{pro}^{ik}) k_{int}^{ik}$ is the value of $k_{obs}^{ik}$ calculated under the model, and $N(\cdot ; \mu, \sigma^2)$ denotes the probability density function of a normal distribution with mean µ and variance σ².
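In practice a product of many densities such as Equation 2.21 is evaluated as a sum of logarithms. The sketch below shows one way to do this with arrays indexed by condition (rows) and residue (columns). The function name and the convention of marking unmeasured residues with NaN are assumptions made for illustration, not a description of the software used in this work.

```python
import numpy as np

def log_likelihood(k_obs, k_int, f_pro, sigma_obs):
    """Logarithm of Eq. 2.21.

    k_obs, k_int, f_pro : arrays of shape (Nc, Nr)
    sigma_obs           : array of shape (Nc,), one value per condition
    Entries of k_obs that are NaN (not measured) are skipped.
    """
    k_obs = np.asarray(k_obs, dtype=float)
    k_int = np.asarray(k_int, dtype=float)
    f_pro = np.asarray(f_pro, dtype=float)
    sd = np.asarray(sigma_obs, dtype=float)[:, None]

    k_hat = (1.0 - f_pro) * k_int      # model prediction of k_obs
    rel = (k_hat - k_obs) / k_hat      # relative residual
    # Log density of Normal(0, sd^2) evaluated at the relative residual
    logpdf = -0.5 * np.log(2.0 * np.pi * sd ** 2) - 0.5 * (rel / sd) ** 2
    measured = ~np.isnan(k_obs)
    return float(np.sum(logpdf[measured]))

if __name__ == "__main__":
    k_int = np.array([[2.0, 4.0], [1.0, 3.0]])
    f_pro = np.array([[0.5, 0.25], [0.0, 0.5]])
    k_obs = (1.0 - f_pro) * k_int * np.array([[1.05, 0.97], [1.10, np.nan]])
    print(log_likelihood(k_obs, k_int, f_pro, sigma_obs=[0.1, 0.2]))
```

Dividing the residual by the predicted value is what makes the observation error proportional to the magnitude of kobs.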

2.3.5 Posterior parameter distributions

Some figures beginning from this section feature box plots. The quartiles used for the box and whiskers of each box plot in this dissertation are explained in Figure 2.9. The use of whiskers in the figures follows the “1.5 IQR rule”.

The model used here attempts to simultaneously fit kint values and fraction protection values fprot. It can be observed from Equation 2.2 that kint and fprot are highly correlated and are impossible to decorrelate by analyzing a single kobs data point. The conventional approach is to use the kint values provided by the Englander model, thereby avoiding fitting both parameters. In this work, instead, we globally analyze kobs data across conditions, with kint parameters correlated by pH, temperature and urea concentration.
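The correlation between the two parameters is easy to see numerically, taking the relation kobs = (1 − fprot) kint as given. In the small sketch below (hypothetical function names and made-up rate constants), two very different parameter pairs reproduce the same observed rate constant, and the conventional approach simply solves for fprot once a kint value is supplied.

```python
def predicted_k_obs(f_prot, k_int):
    """Observed rate constant for a given protected fraction and k_int."""
    return (1.0 - f_prot) * k_int

def fraction_protected(k_obs, k_int):
    """Conventional estimate of f_prot when k_int is taken as known."""
    return 1.0 - k_obs / k_int

if __name__ == "__main__":
    # Two parameter pairs that a single measurement cannot tell apart
    print(predicted_k_obs(0.50, 4.0))
    print(predicted_k_obs(0.75, 8.0))
    # With k_int fixed (for example, from a reference model), f_prot follows
    print(fraction_protected(2.0, 5.0))
```

Any error in the assumed kint therefore propagates directly into fprot, which is the motivation for analyzing kobs values from several conditions together.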

Because fprot values in high urea concentration conditions are presumably small, kobs values from these conditions are used to provide more information for the inference process and enhance the inference of kint-related parameters.

The fitting model incorporates a modified version of the Englander model but does not incorporate the helix-coil model. Instead, the fitting model uses auto-regression to influence the fprot parameters of neighboring residues within the same condition. This means there is a strict mathematical relationship for all the kint parameters of the residues. For each residue, the kint parameters are pH, temperature and urea concentration dependent. However, no such relationship exists in the model for fprot parameters.

Figure 2.10 compares the fraction protection values at 20 °C generated using Englander kint values and the fraction protection parameter posterior distributions from Bayesian inference. The fprot values calculated using Englander kint values show sudden changes from one residue to the next, which is biophysically unrealistic. Above 6 M urea, no significant fraction protection can be observed in the posterior distributions (see supplementary figures). Because three consecutive residues need to sample the helical state for one helical hydrogen bond to form, the fraction protection values are perceivably more sensitive to changes in urea concentration than residue-specific helicity is.

For all conditions, the pH values are close to neutral. At such pH, the exchange of all residues is dominated by base-catalyzed exchange. Therefore, the posterior distributions of γB parameters, which are the corrections to base-catalyzed exchange rate constants on the logarithmic scale, can be accurately determined using the collected experimental data.

[Figure 2.10: plot of fraction protection (0.0 to 1.0) against residue number]

Figure 2.10: Comparison of fraction protection values

Fraction protection values of LRH1x backbone amide protons at pH 7.23, 20 °C and 0 M urea. The blue dots are values estimated using Englander values; the box plots are posterior distributions of the Bayesian fitting. The fprot of residue 1 is not fitted. The data for posterior distribution boxplots were contributed by Roy Hughes.

Figure 2.11 shows the posterior distributions of γB parameters. The addition of γ parameters to the Englander model is a log-scale adjustment to the Englander kint values. Therefore, each kint value is adjusted by a factor of $10^{\gamma_B}$. There is no inference of the γB parameters for residues 2 and 22 because of the lack of data. For the other 21 γB parameters in the model, the means of 16 γB posterior distributions deviate significantly from 0. These deviations are sequence-specific kint corrections that are based on experimental data specifically measured for the peptide. The “corrected” kint values therefore more accurately describe the backbone amide exchange of LRH1x.

[Figure 2.11: box plots of gammaB against residue number (1 to 23)]

Figure 2.11: Posterior distribution of γB parameters

γB parameter for residue 1 is not fitted. The data for posterior distribution boxplots were contributed by Roy Hughes.

2.4 Discussion

Studying peptides and unfolded state proteins under physiological conditions is especially important in order to gain evidence of naturally occurring unfolded state conformations. The effect of solution condition on residual structure is well supported by the change in chain-level and residue-specific helicity of LRH1x when exposed to changing temperature and urea concentrations.

Far-UV CD spectroscopy is known for its advantages in providing secondary structure information. The analysis of far-UV CD spectra gives an estimate of the percentages of secondary structure types. The estimate from CD spectra works for both folded state and unfolded state proteins. However, it is difficult to attribute each basis spectrum to particular parts of the sequence without further evidence. For LRH1x, the CD data describe a peptide that shows a high amount of helicity under physiological conditions and retains a significant amount of helicity under extreme conditions. The helicity estimates from CD data are quantitative and can guide the further exploration of residue-specific information.

In order to know the contribution of helicity from individual residues along the chain, residue-specific information is needed. According to helix-coil transition theory, a peptide propagates helicity along the chain: the residues in the middle of the chain can be highly helical, while the residues towards both termini are less helical in comparison. The population of helical hydrogen bonds for each backbone amide proton follows a similar profile. The posterior distributions of fprot parameters agree with the basic premise of helix-coil transition theory. The simultaneous fitting of fprot parameters and kint parameters ensures that the determination of kint values is also influenced by experimental data of LRH1x. The improvement can be observed from a more biophysically reasonable set of fprot parameter distributions.

3

Backbone chemical shift and residue specific helicity

3.1 Backbone chemical shifts of LRH1x

This chapter focuses on using backbone chemical shifts to estimate the residue-specific helicity values for LRH1x residues. Each residue of LRH1x is in fast exchange among a large number of conformations. The helix-coil transition happens on a ns-µs time scale, which is fast exchange on the NMR time scale. Therefore, only a single peak for each residue can be observed for each nucleus.

3.1.1 Materials and Methods

Protein expression

15N and 13C labeled LRH1x peptide was expressed in E. coli cells using minimal media. The expression and purification protocol is the same as the protocol used for the fraction protection project.

Backbone resonance assignments

Backbone assignments were determined initially for a sample at pH 6.0 and 15 °C. The NMR sample contains 10% D2O, 20 mM sodium phosphate, 100 mM NaCl and 1 mM TMSP (trimethylsilylpropanoic acid). The spectra collected include: 15N-HSQC, HNCA, HN(CO)CA, HNCO, HN(CA)CO, H(N)CACB, H(NCO)CACB, HACACO and HA(CACO)N. Raw data were converted to frequency domain data using NMRPipe; 15N-HSQC and 3D projections were analyzed using NMRViewJ; 3D NMR data were analyzed using CARA.

After the initial assignments at 15 °C, assignments were also determined for 49 other solution conditions. The differences among the 50 conditions are in temperatures and urea concentrations. Five NMR samples were prepared in total, each having a different urea concentration. All samples contain 10% D2O, 20 mM sodium phosphate, 100 mM NaCl and 1 mM TMSP. The pH value and urea concentration of each sample are respectively 6.02 & 0 M, 5.95 & 2.21 M, 6.02 & 3.97 M, 5.95 & 6.16 M, 6.03 & 8.18 M. The concentrations of urea were determined by refractive index measurements. For each sample, NMR experiments were carried out at 10 different temperatures (5 °C, 10 °C, 15 °C, 20 °C, 25 °C, 30 °C, 35 °C, 40 °C, 45 °C, 50 °C). For each sample and each temperature, a set of 2D NMR spectra was collected, including 15N-HSQC, H(N)CA, H(NCO)CA, H(NCO)CACB, HA(CA)CO, H(N)CO, H(NCA)CO and HA(CON)H. Raw data were converted to frequency domain data using NMRPipe; data analysis was carried out using NMRViewJ.

Assignments at other temperatures and urea concentrations were determined by overlaying 2D NMR spectra of the same type. For example, for assigning amide proton and amide nitrogen resonances, the 10 15N-HSQC spectra for the 2.21 M urea NMR sample can be overlaid. For each residue, because of the change in secondary chemical shift, the crosspeak will gradually shift in both dimensions and form a recognizable trajectory (Figure 3.2). Similar trajectories can also be obtained by overlaying other sets of 15N-HSQC spectra. With the assignments at 0 M urea and 15 °C known, the assignments can be extended to other conditions by following the trajectories of crosspeaks.

Figure 3.1: Example 15N-HSQC spectrum of LRH1x

15N-HSQC spectrum of 2.21 M urea LRH1x sample at 5 °C; only backbone resonances are shown.

The experiments for the initial backbone chemical shift assignments were performed on a Varian 800 MHz instrument with a cryogenic probe. All other experiments for backbone chemical shift assignments were performed on a Bruker 700 MHz instrument with a cryogenic probe.

3.1.2 Backbone chemical shift data

The complete set of backbone chemical shift assignments at 15 °C, pH 6.0 and 0 M urea is shown in Table 3.1.

Unlike kint values, without conformational change, chemical shift is not dependent on pH. All samples are of the same pH. For all residues, except residues 1 and 24, all six backbone nuclei display continuous changes in chemical shift when either temperature or urea concentration is increasing. Examples of the continuously changing chemical shifts of 13Cα are shown in Figure 3.3.

Figure 3.2: 15N-HSQC spectra overlay for 5-50 °C with 5 °C increments.

The top figure shows the overlay at 2.21 M urea; the bottom figure shows the overlay at 8.18 M urea. Only backbone resonances are shown. Crosspeaks measured at different temperatures are shown with different colors, with the blue/green end of the trace starting from lower temperatures for both overlays.

[Figure 3.3: two panels, each plotting chemical shift (ppm) against temperature (°C) from 0 to 50 °C]

Figure 3.3: CA chemical shifts of residues Arg11 and Leu12.

The chemical shift values systematically change with increasing temperature and urea concentration. The colors blue, orange, green, red and purple represent the data collected at 0 M, 2.21 M, 3.97 M, 6.16 M and 8.18 M urea concentration, respectively.

The observed temperature dependence of chemical shift at 8 M urea can be modelled as linear for all observed nuclei. The slope of the temperature dependence is not necessarily zero. The non-zero slopes are due to the non-zero temperature coefficient of random coil state chemical shifts (Kjaergaard et al., J Biomol NMR, 2011). This observation suggests that residue helicities at 8 M urea are approaching their lowest values, validating the calculation of chain-averaged helicities from CD measurements. Also, this observation supports the results from

Bayesian-Englander model fitting, where no significant fprot values are observed with urea higher than 6 M.

For all residues and nuclei, the slope for temperature dependence at 6 M urea shows little to no change when compared to the slope at 8 M urea. If the slope at 8 M can be interpreted as temperature dependence of chemical shifts that is independent of helicity change, this observation indicates that when urea concentration is higher than 6 M, the influence of temperature on helicity change is insignificant. Note that the lack of temperature dependence does not necessarily indicate a lack of helicity, but rather that the helicity change is negligible.

Depending on the specific nucleus and amino acid type, the helical state chemical shift value could be either bigger or smaller than the coil state chemical shift value (Avbelj et al., 2004a). All nuclei are sensitive to changes in residue helicity. However, the scale of the secondary shift, i.e. the deviation from coil state chemical shifts, is significantly different depending on the type of nucleus. Hence the same change in chemical shift has different mathematical significance depending on the nucleus type.

The differences between the maximum value of chemical shift and the minimum value of chemical shift for each residue and nucleus are shown in Figure 3.4. The maximum difference for each nucleus across residues is 2.73, 0.236, 1.27, 2.64, 0.724 and 4.40 ppm for 13Cα, 1Hα, 13Cβ, 13C′, 1HN and 15NH, respectively.

Table 3.1: Backbone assignments for LRH1x residues at pH 6.0, 15 °C & 0 M urea

Res Type  Res Number  HN     N        CA      CB      CO       HA
Thr       2           8.766  114.621  62.987  69.968  175.623  4.319
Gln       3           8.909  122.027  57.835  28.581  177.401  4.193
Glu       4           8.572  120.983  58.95   29.454  177.929  4.142
Gln       5           8.211  120.217  57.545  28.82   178.148  4.253
Leu       6           8.254  122.762  56.965  41.994  179.11   4.236
Glu       7           8.336  120.966  58.237  29.542  178.226  4.18
Asp       8           8.385  121.198  56.564  40.942  177.844  4.485
Ala       9           8.125  122.556  54.735  18.366  180.173  4.176
Arg       10          8.112  118.927  58.705  18.366  178.604  4.109
Arg       11          8.112  120.99   58.46   30.42   178.21   4.146
Leu       12          8.096  120.389  56.899  42.007  178.725  4.17
Lys       13          8.006  120.566  58.081  32.755  177.469  4.192
Ala       14          7.941  122.215  53.961  18.632  179.425  4.254
Ile       15          7.848  119.156  63.207  38.672  177.365  3.909
Tyr       16          8.091  122.041  59.218  38.501  176.969  4.462
Glu       17          8.296  121.08   57.389  30.218  176.997  4.112
Lys       18          8.061  121.661  57.411  32.794  177.432  4.205
Lys       19          8.228  121.677  57.01   32.81   177.462  4.234
Lys       20          8.276  121.812  57.3    32.776  177.011  4.169
Asn       21          8.363  118.752  53.687  38.868  175.538  4.685
Gln       22          8.318  121.054  57.032  30.195  176.639  4.291
Leu       23          8.257  122.217  55.315  42.423  177.136  4.383
Gly       24          7.97   115.367  61.626  NA      179.154  NA

[Figure 3.4: six panels (CA, CO, HA, HN, NH and CB), each plotting chemical shift difference (ppm) against residue number]

Figure 3.4: CS differences for each residue and nucleus.

For each residue and nucleus, the figure shows the difference between the maximum and the minimum chemical shift among the data collected. The unit for the difference is ppm.

3.2 Prediction of residue helicity using δ2D

Among existing algorithms designed to predict secondary structure propensities, the δ2D method developed by Vendruscolo et al. is particularly well adapted to provide the reference chemical shift values that are derived from a large set of experimentally determined chemical shifts (Camilloni et al., 2012). The set of random coil reference values was especially derived from residues in the loop regions and disordered regions.

According to the δ2D estimations using LRH1x chemical shift data, shown in Figure 3.5, the maximum helicity is contributed by Arg10. The helicity gradually decreases from Arg10 to Leu23 and eventually reaches 0%. Note that the estimated score for one residue depends on several neighboring residues. Therefore, the method has a “smoothing” effect on the scores, and the most helical residue being Arg10 is arguable. According to the helix-coil theory, the residues close to the N-terminus should also be experiencing low helicity. The high helicity estimations for residues 3-5 might result from abnormal chemical shifts of terminal residues.

[Figure 3.5: helicity (0% to 100%) against residue number, one trace per temperature. Legend with chain-level helicity: 05C (58.1%), 10C (52.7%), 15C (45.6%), 20C (36.6%), 25C (28.1%), 30C (21.4%), 35C (16.5%)]

Figure 3.5: Helicity values estimated using δ2D algorithm

The chain-level helicity values calculated using the scores are shown in parentheses.

In terms of the temperature dependence of helicity for a specific residue, the overall decrease with increasing temperature agrees with the chain-level decrease of helicity with increasing temperature observed by CD. However, the chain-level helicity calculated using the δ2D scores does not agree with CD measurements. This is because the reference shifts used in δ2D do not accurately reflect the reference shifts for LRH1x.

3.3 Bayesian-CSI model

3.3.1 The calculation of residue helicity by helix-coil model

As introduced in the introduction chapter, helix-coil transition theory can be used to predict and analyze the residue-level helicity in helical peptides. For the analysis of LRH1x backbone chemical shifts, the basic form of the Lifson-Roig formalism has been used. The weight for helix elongation w is dependent on the free energy for transition from the coil to the helical state. This transition free energy is assumed to be dependent on temperature and urea concentration, that is:

\[ w^{i,j} = \exp\left(-\frac{\Delta G^{i,j}}{R \cdot T^{i}}\right) \tag{3.1} \]

\[ \Delta G^{i,j} = \Delta H^{i} - T^{i}\,\Delta S^{i,j} + M \cdot U^{i} \tag{3.2} \]

where j is the residue number; i is the condition index; $T^{i}$ and $U^{i}$ are respectively the temperature and urea concentration of condition i; $\Delta H^{i}$ is the enthalpy at temperature $T^{i}$; $\Delta S^{i,j}$ is the entropy at temperature $T^{i}$ for residue j; M is the linear dependence of free energy on urea concentration; R is the gas constant. Terms $\Delta H^{i}$ and $\Delta S^{i,j}$ can be further expanded as:

\[ \Delta H^{i} = \Delta H_{0} + \Delta C_{p}\,(T^{i} - T_{0}) \tag{3.3} \]

\[ \Delta S^{i,j} = \Delta S_{0}^{j} + \Delta C_{p} \log\frac{T^{i}}{T_{0}} \tag{3.4} \]

where T0 is the reference temperature; ∆H0 and ∆S0 are the energy parameters at the reference temperature. The transition matrix for each residue then becomes:

\[
M^{i,j} =
\begin{array}{c|cccc}
   & hh & hc & ch & cc \\ \hline
hh & w^{i,j} & v & 0 & 0 \\
hc & 0 & 0 & 1 & 1 \\
ch & v & v & 0 & 0 \\
cc & 0 & 0 & 1 & 1
\end{array}
\tag{3.5}
\]

Because residues 2 and 23 also show continuous change in observed chemical shifts, it is assumed that all residues can sample the helical state. Therefore the partition function for the Lifson-Roig model, which is also the equation for calculating chain-averaged helicity, is:

\[ Z^{i} = (0, 0, 1, 1)\left(\prod_{j=1}^{N} M^{i,j}\right)(0, 1, 0, 1)^{T} \tag{3.6} \]

The calculation for residue specific helicity is done by marginalizing the weights for helical state for the residue under consideration. That is:

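\[ f_{hel}^{i,k} = (0, 0, 1, 1)\left(\prod_{j=1}^{k-1} M^{i,j}\right) M_{hel}^{i,k} \left(\prod_{j=k+1}^{N} M^{i,j}\right)(0, 1, 0, 1)^{T} \Big/ Z^{i} \tag{3.7} \]

where

\[
M_{hel}^{i,j} =
\begin{bmatrix}
w^{i,j} & v & 0 & 0 \\
0 & 0 & 0 & 0 \\
v & v & 0 & 0 \\
0 & 0 & 0 & 0
\end{bmatrix}
\tag{3.8}
\]

3.3.2 Bayesian-CSI model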

The purpose of the model is to enable Bayesian inference of free-energy-related parameters and reference chemical shift parameters from the chemical shifts measured for LRH1x under various conditions. The model incorporates the Lifson-Roig formalism, so that the residue-specific helicity in every solution condition is governed by a small set of global parameters (∆H, m for urea dependence, and ∆Cp) together with residue-specific ∆S values.
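As an illustration of how these parameters determine residue-specific helicity, the Lifson-Roig calculation of Equations 3.1-3.8 can be sketched in Python with NumPy. This is only a minimal sketch and not the code used in this work (the inference itself was run in Stan). The function names, the initiation weight v = 0.05, the value m = 0.030 kcal/mol/M and the uniform ∆S values are hypothetical choices made for the example; ∆H0 = -0.955 kcal/mol is the fixed value quoted in this section, and ∆Cp is taken as 0.

```python
import numpy as np

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)


def propagation_weights(dH0, dS0, m_urea, temp_k, urea_molar):
    """Eqs. 3.1-3.2 with dCp fixed at 0: w_j = exp(-dG_j / (R*T))."""
    dG = dH0 - temp_k * np.asarray(dS0, dtype=float) + m_urea * urea_molar
    return np.exp(-dG / (R_KCAL * temp_k))


def transfer_matrix(w, v, helical_only=False):
    """Eq. 3.5 (or Eq. 3.8 when helical_only); rows/columns: hh, hc, ch, cc."""
    coil_row = [0.0, 0.0, 0.0, 0.0] if helical_only else [0.0, 0.0, 1.0, 1.0]
    return np.array([[w, v, 0.0, 0.0], coil_row, [v, v, 0.0, 0.0], coil_row])


def residue_helicity(w, v):
    """Eqs. 3.6-3.7: partition function Z and per-residue helical fraction."""
    left = np.array([0.0, 0.0, 1.0, 1.0])
    right = np.array([0.0, 1.0, 0.0, 1.0])
    mats = [transfer_matrix(wj, v) for wj in w]
    n = len(mats)
    prefix = [left]                  # prefix[k] = left @ M_1 ... M_k
    for mat in mats:
        prefix.append(prefix[-1] @ mat)
    suffix = [right]                 # accumulated from the C-terminal end
    for mat in reversed(mats):
        suffix.append(mat @ suffix[-1])
    suffix.reverse()                 # suffix[k] = M_{k+1} ... M_N @ right
    z = float(prefix[n] @ right)
    f_hel = np.array([
        prefix[k] @ transfer_matrix(w[k], v, helical_only=True) @ suffix[k + 1] / z
        for k in range(n)
    ])
    return f_hel, z


# Hypothetical inputs: 23 residues, dS0 chosen so that every w equals 1.5 at 20 C.
temp_k = 293.15
dS0 = np.full(23, (-0.955 + R_KCAL * temp_k * np.log(1.5)) / temp_k)
w = propagation_weights(dH0=-0.955, dS0=dS0, m_urea=0.030,
                        temp_k=temp_k, urea_molar=0.0)
f_hel, z = residue_helicity(w, v=0.05)
print(np.round(f_hel, 3))
```

Caching the prefix and suffix products keeps the cost linear in chain length, which matters when the helicity of every residue has to be re-evaluated for every condition at each sampling step.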

It is difficult to determine the value of ∆Cp using the existing data. The chemical shift data were collected at temperatures between 0 °C and 50 °C. In this range, the curvature of ∆G versus temperature is insignificant. Because fitting ∆Cp relies on this curvature, the value of ∆Cp cannot be accurately determined using the existing chemical shift data. When fitting LRH1x data, ∆Cp is fixed at 0.
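The size of this curvature can be checked with a short calculation. Substituting Equations 3.3 and 3.4 into Equation 3.2 shows that ∆Cp enters ∆G only through the multiplier (T - T0) - T log(T/T0). The sketch below evaluates that multiplier; the reference temperature of 25 °C is a hypothetical choice for illustration, not a value taken from this work.

```python
import math

T0_K = 298.15  # hypothetical reference temperature (25 C), for illustration only


def dcp_multiplier(temp_k, t0_k=T0_K):
    """Factor multiplying dCp in dG: (T - T0) - T*ln(T/T0), from Eqs. 3.2-3.4."""
    return (temp_k - t0_k) - temp_k * math.log(temp_k / t0_k)


for celsius in (0, 10, 20, 30, 40, 50):
    temp_k = 273.15 + celsius
    print(f"{celsius:2d} C: multiplier = {dcp_multiplier(temp_k):.3f} K")
```

Under this assumption the multiplier stays within about 1.1 K of zero over the whole 0-50 °C range, so the ∆Cp contribution to ∆G is only about ∆Cp times 1 K at the extremes, which is consistent with the weak curvature described above.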

During testing, the posterior mean of ∆H0 was found to be between -0.8 kcal/mol and -1 kcal/mol. This range is biophysically plausible (Scholtz et al., 1991b,a). However, because ∆H0 is relatively highly correlated with the ∆S0 parameters, the prior distribution of ∆H0 can influence the posterior distribution to a certain degree. ∆H0, therefore, cannot be precisely determined using the existing data. When fitting chemical shift data, the value of ∆H0 is fixed at -0.955 kcal/mol (Scholtz et al., 1991b).

Each chemical shift is assumed to be defined by two reference chemical shifts, the helicity of the residue under consideration and a temperature dependence of the coil state chemical shift. This relationship can be expressed as:

\[ CS_{obs}^{i,j,m} = CS_{hel}^{j,m} \cdot f_{hel}^{i,j} + \left(CS_{col}^{j,m} + \Delta CS_{col}^{j,m} \cdot T^{i}\right) \cdot \left(1 - f_{hel}^{i,j}\right) + \epsilon^{m} \tag{3.9} \]

where $\epsilon^{m}$ is the error associated with chemical shift measurements, assuming that a single distribution is sufficient to describe the errors associated with all the measurements for a single type of nucleus. $CS_{obs}^{i,j,m}$ are the observed chemical shifts; $CS_{hel}^{j,m}$ and $CS_{col}^{j,m}$ are the helix state and coil state reference chemical shifts. $\Delta CS_{col}^{j,m}$ describes the temperature coefficients for the change in coil state reference chemical shifts.

The terms $CS_{hel}^{j,m}$ and $CS_{col}^{j,m}$ can be expanded as:

\[ CS_{hel}^{j,m} = CS_{hel_o}^{j,m} + CS_{hel_\gamma}^{j,m} \tag{3.10} \]

\[ CS_{col}^{j,m} = CS_{col_o}^{j,m} + CS_{col_\gamma}^{j,m} \tag{3.11} \]

where $CS_{hel_o}^{j,m}$ and $CS_{col_o}^{j,m}$ are the fixed experimental chemical shifts for a specific nucleus at 5 °C, 0M urea and at 5 °C, 8.18M urea, respectively; $CS_{hel_\gamma}^{j,m}$ and $CS_{col_\gamma}^{j,m}$ are the correcting parameters to $CS_{hel_o}^{j,m}$ and $CS_{col_o}^{j,m}$.

The error parameters are assumed to follow normal distributions, that is:

\[ \epsilon^{m} \sim \mathcal{N}\left(0, (\sigma^{m})^{2}\right) \tag{3.12} \]

In the Bayesian-CSI model, the likelihood $P(\vec{d} \mid \vec{\theta})$ describing the probability of observing $\vec{d}$ given a candidate value for $\vec{\theta}$ is given by

\[ P(\vec{d} \mid \vec{\theta}) = \prod_{i=1}^{N_c} \prod_{j=1}^{N_r} \prod_{m=1}^{N_n} \mathcal{N}\left(CS_{calc}^{i,j,m} - CS_{obs}^{i,j,m};\ 0, (\sigma^{m})^{2}\right) \tag{3.13} \]

where

\[ CS_{calc}^{i,j,m} = CS_{hel}^{j,m} \cdot f_{hel}^{i,j} + \left(CS_{col}^{j,m} + \Delta CS_{col}^{j,m} \cdot T^{i}\right) \cdot \left(1 - f_{hel}^{i,j}\right) \tag{3.14} \]

$N_c$, $N_r$ and $N_n$ are respectively the number of conditions, the number of residues and the number of nucleus types. $\mathcal{N}(\cdot;\ \mu, \sigma^{2})$ denotes the probability density function of a normal distribution with mean $\mu$ and variance $\sigma^{2}$. The inference was done using the MCMC package STAN (http://mc-stan.org) in the R environment.
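To make the structure of Equations 3.13 and 3.14 concrete, the log of the likelihood can be sketched with NumPy as below. This is an illustrative re-expression under stated assumptions, not the Stan model used for the inference: the function names, the array layout (conditions x residues x nuclei), the temperature unit and all numerical values in the toy example are hypothetical.

```python
import numpy as np


def cs_calc(cs_hel, cs_col, dcs_col, temp, f_hel):
    """Eq. 3.14. cs_hel, cs_col, dcs_col: (Nr, Nn); temp: (Nc,); f_hel: (Nc, Nr)."""
    f = f_hel[:, :, None]                                        # (Nc, Nr, 1)
    coil = cs_col[None] + dcs_col[None] * temp[:, None, None]    # (Nc, Nr, Nn)
    return cs_hel[None] * f + coil * (1.0 - f)


def log_likelihood(cs_obs, cs_hel, cs_col, dcs_col, temp, f_hel, sigma):
    """Log of Eq. 3.13: independent normal errors, one sigma per nucleus type."""
    resid = cs_calc(cs_hel, cs_col, dcs_col, temp, f_hel) - cs_obs
    var = sigma[None, None, :] ** 2
    return float(np.sum(-0.5 * np.log(2.0 * np.pi * var) - 0.5 * resid ** 2 / var))


# Toy example with hypothetical numbers: 2 conditions, 3 residues, 2 nucleus types.
temp = np.array([5.0, 20.0])
f_hel = np.array([[0.2, 0.6, 0.3], [0.1, 0.4, 0.2]])
cs_hel = np.array([[4.0, 8.0], [4.1, 8.1], [4.2, 8.2]])
cs_col = np.array([[4.4, 8.4], [4.5, 8.5], [4.6, 8.6]])
dcs_col = np.full((3, 2), -0.005)
sigma = np.array([0.01, 0.03])
cs_obs = cs_calc(cs_hel, cs_col, dcs_col, temp, f_hel)  # noise-free "observations"
print(log_likelihood(cs_obs, cs_hel, cs_col, dcs_col, temp, f_hel, sigma))
```

In the full model the array f_hel is not a free input but is computed from the helix-coil parameters through Equation 3.7, which is what couples all residues and conditions.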

3.3.3 Posterior distributions

The model contains global parameters, residue-specific parameters, and residue- and nucleus-specific parameters. Global parameters include ∆H, ∆Cp and m (urea dependence of free energy), where ∆H and ∆Cp are fixed. ∆S parameters are residue specific. Residue- and nucleus-specific parameters include reference chemical shifts and parameters for the temperature dependence of coil state chemical shifts.

Similar to the fitting for exchange rate constants, the model simultaneously fits reference chemical shifts and helicity values. Because the fitting model incorporates the helix-coil model, it constrains the helicity parameters of all residues within the same condition. It also constrains the helicity parameters of the same residue across conditions.

Figure 3.6: Posterior distribution of global parameter m

The histogram is drawn from 1000 posterior samples. (Axes: m in cal/mol/M versus frequency.)

The posterior distributions of the global thermodynamic parameters are biophysically plausible for describing the helix-coil transition. The mean for m is 30.0 cal·mol⁻¹·M⁻¹, with the width of the distribution being 0.3 in the same unit (Figure 3.6). The helix-coil model used here is the most simplified form of the Lifson-Roig formalism. It does not distinguish the type of helix initiation, and no specific sidechain interaction terms are incorporated. The purpose of the embedded helix-coil model is to constrain the helicity parameters in order to fit the reference chemical shifts, rather than to understand the thermodynamic contributions to the helix-coil transition.

Compared with the Bayesian-Englander model, the Bayesian-CSI model has less complexity in its reference parameters. For each residue, the helix state reference chemical shift parameters are only nucleus type dependent; the coil state reference chemical shift parameters are nucleus type and temperature dependent. Therefore, most of the constraints are provided by the helix-coil model, including temperature dependence, urea concentration dependence and the propagation of helicity throughout the chain.

Figure 3.7: Reference chemical shift deviations from δ2D values

Values for residues 1 and 24 are not fitted. (Two sets of six panels, CA, HA, CB, CO, HN and NH: distributions of helical state and of coil state reference chemical shift deviations in ppm, plotted against residue number.)

Figure 3.7 shows the deviation of the chemical shift posterior means from the reference values calculated using the Vendruscolo method. The reference chemical shifts for the first and last residues of LRH1x were not fitted. The following discussion focuses on the means of the distributions. Generally speaking, there are significant deviations for all six nuclei, for both helix state references and coil state references.

Distinctive patterns can be observed for the deviations of helix state reference values. The deviations are relatively small for residues in the middle of the chain, especially for residues from Ala9 to Ala14.

Deviations for helix references are relatively big for terminal residues. For ¹³Cα, ¹³Cβ, ¹³C′ and ¹Hα, the most significant deviations are observed for residues 2-4. For ¹HN and ¹⁵NH, the most significant deviations are observed for residues 18-23.

The patterns are less clear for deviations of coil state chemical shifts. The most significant deviation for each nucleus is about an order of magnitude smaller than its counterpart for helix references, except for ¹⁵NH. The absolute values of the ¹⁵NH deviations are also significantly bigger than those of the rest of the nuclei. Compared to the ¹Hα deviations, the deviations for ¹HN are also significantly bigger. This trend is unsurprising. The chemical shifts of the amide proton and nitrogen are influenced not only by secondary structure propensity, but also by other factors such as the strength of the hydrogen bond with the bonding partner. Therefore, the database used for training δ2D references involves chemical shifts from various sources and might not be appropriate for LRH1x residues.

The terminal residues experience relatively low helicity under all conditions. Therefore, the observed chemical shifts of terminal residues do not contain a significant amount of information about helical state reference chemical shifts. The helix reference deviations for terminal residues might therefore be less accurate than the values for residues toward the middle of the chain. For example, the posterior

means for ¹HN helical state chemical shifts of residues 22 and 23 are 3.76 ppm and 4.30 ppm, respectively. These values are not typically expected for amide protons. Nonetheless, the deviations of helix reference values of residues 2-5 can explain the abnormally high helicity predictions for residues 3-6 when using the δ2D algorithm. The fitted helicity posterior means are more biophysically realistic.

This fitting has incorporated all data from all six nuclei. Therefore, the model is assumed to have the ability to distinguish helicity-relevant information contained in the observations from noise. The estimates of noise level are manifested in the σ parameters. The posterior means for the σ parameters are 0.069, 0.0071, 0.039, 0.051, 0.029 and 0.15 for ¹³Cα, ¹Hα, ¹³Cβ, ¹³C′, ¹HN and ¹⁵NH, respectively. Here, the posterior means of σ for the three carbon types are comparable. The posterior mean of σ for ¹HN is significantly bigger (about four times) than that of ¹Hα, and the posterior mean of σ for ¹⁵NH is the biggest among all nucleus types. However, this does not necessarily mean that the levels of noise are different among nuclei. In fact, in the same order, the ratios of the σ posterior mean to the maximum difference of chemical shift (see Figure 3.4) are 0.025, 0.030, 0.031, 0.019, 0.040 and 0.034. This means that, comparatively, observations for all six nuclei contributed similar amounts of information to the inference process.

Figure 3.8: Helicity comparison between δ2D scores and posterior distributions

Helicity values estimated using δ2D algorithm and posterior distributions from Bayesian fitting for LRH1x residues at 20 °C and 0 M urea. (Axes: Residue Number versus Helicity.)

Figure 3.8 shows the comparison between posterior distributions of helicity from the Bayesian fitting and the scores calculated using the δ2D algorithm. Because all helicity parameter posterior distributions are relatively narrow, the following discussion will focus on comparing the means of the distributions with the δ2D scores. The first difference is the residue with the highest helicity, being residue 10 in the δ2D estimation and residue 11 for the posterior means. However, this difference might not be statistically significant because the distributions for residues 10 and 11 overlap to a large extent. The second difference is the helicity values for residues 3-6. As discussed previously, the helical state chemical shift references used in the δ2D algorithm are significantly different for residues 2-5 compared with the fitted means, causing the helicity values to differ. The third difference is related to the propagation of helicity. Because of the incorporation of the helix-coil model in fitting, the propagation of helicity indicated by the posterior means can be easily described by a biophysical model, while it would be difficult to give such a description for the δ2D scores.

Figure 3.9 shows the helix propagation weight w for individual residues at 20 °C and 0M urea. The amino acid types with the highest w distribution means are: Ala (9/14), Arg (10/11), Leu (6/12/23) and Lys (13/18/19/20). This pattern matches the w values published by Chakrabartty et al. (Chakrabartty et al., 1994). The absolute values of w value posterior means are significantly bigger than the published values,

however. The w posterior means are especially high for residues 9-12 and residue 14. They are significantly higher than 1.61, which is the published w value for Alanine. This does not mean that the published values are unreasonable, but rather that adjustments need to be made for each residue in order to better explain the helicities associated with the observed data.

Figure 3.9: w value posterior distributions at 20 °C and 0 M urea

Values for residues 1 and 24 are not fitted. (Axes: Residue Number versus Helix propagation weight.)
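For orientation, a propagation weight can be translated into a coil-to-helix free energy by inverting Equation 3.1, and into an entropy by rearranging Equation 3.2 at 0M urea. The sketch below does this for the published Alanine value of 1.61. The function names are hypothetical, and the temperature at which a published w applies has to be supplied by the user; 20 °C is used here only as an example.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)


def dg_from_w(w, temp_k):
    """Invert Eq. 3.1: dG = -R*T*ln(w), in kcal/mol."""
    return -R_KCAL * temp_k * math.log(w)


def ds_from_w(w, temp_k, dH):
    """Rearrange Eq. 3.2 at 0 M urea: dS = (dH - dG) / T, in kcal/(mol*K)."""
    return (dH - dg_from_w(w, temp_k)) / temp_k


temp_k = 293.15            # example temperature (20 C), an assumption
dG_ala = dg_from_w(1.61, temp_k)
dS_ala = ds_from_w(1.61, temp_k, dH=-0.955)
print(f"dG = {dG_ala:.3f} kcal/mol, dS = {1000.0 * dS_ala:.3f} cal/(mol*K)")
```

Any w above 1 corresponds to a negative ∆G, so a posterior mean larger than 1.61 implies a more favorable propagation step than the published Alanine value at the same temperature.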

3.3.4 Discussion

Similar to the kint values, the reference chemical shift values here are estimated from experimental data. The collection of data over a wide range of conditions enables global fitting of highly correlated parameters. Therefore, the reference values are directly determined by the experimental data for LRH1x, rather than by a database of proteins that are not directly relevant to the current sequence.

Helix 1 was proposed to be one of the early folding elements in the folding process of λ-repressor N-terminal domain. Combining experimentally determined chain-level helicity, residue-specific helicity and the extent of amide proton protection, it is apparent that the helix 1 region can remain significantly helical as an isolated peptide over a wide range of temperatures and urea concentrations. The residue-specific information can be explained by the classical helix-coil model.

Alanine, Arginine and Leucine are the most “helix-forming” sidechain types. The posterior distribution of the w parameters confirms this pattern. With residues 9-14 in LRH1x being ARRLKA, this sequence was naturally selected to populate the helical state.

4

Residual helicity in λ repressor N-terminal domain

Experimental results from the previous two chapters (Chapters 2-3) have provided evidence that amide exchange rate constants (kobs) and backbone chemical shifts are useful probes for residual helicity. Values for kint and reference chemical shifts have also been estimated for LRH1x residues through global fitting. This chapter concerns the residual helicity of λ repressor N-terminal domain. Backbone chemical shifts and kobs values are experimentally extracted for λ*1-85 residues in the unfolded state. Experimental results for residues 8-30, corresponding to residues 2-24 in LRH1x, are compared with the results for peptide LRH1x residues.

4.1 Backbone chemical shifts

4.1.1 Materials and Methods

Protein expression, purification and oxidation

The gene sequence of λ*1-85 was inserted into a pET9A plasmid vector containing the T7 promoter. The recombinant plasmid was transformed into BL21(DE3) competent cells, and the cells were incubated in 1 L of M9 minimal media (6 g/L Na2HPO4, 3 g/L KH2PO4, 0.5 g/L NaCl, 1 mM MgSO4, 0.1 mM CaCl2, 1 g/L 15NH4Cl and 2 g/L 13C-Glucose) at 37 °C to an OD600 of 1.0. Overexpression of 13C- and 15N-labeled λ*1-85 protein was induced by adding IPTG to a concentration of 0.8 mM. After incubation at 37 °C for an additional 6 hours, the cells were harvested by centrifugation and then resuspended in 1 volume of lysis buffer (30 ml per liter of culture, composed of 10 mM Tris, 5 mM DTT, 10 mM EDTA at pH 8.0). Suspended cells were passed through a French pressure cell at 12 000 lb/in² to create cell lysates. After the removal of insoluble components through centrifugation, cell lysates were purified through a DEAE Sephacel column to further remove other components, including cell debris, nucleic acid and other proteins. The flow-through from the DEAE Sephacel column was then purified through an Affigel Blue column. The target protein bound to the column when equilibrated in low salt buffer (20 mM Potassium Phosphate buffer, 0.1 mM EDTA and 10 mM KCl) and was eluted from the column with 300 mM KCl. The fractions containing the target protein were dialyzed against distilled H2O 3 times and lyophilized. The lyophilized protein was resolubilized in 1 ml of 1% acetic acid and was purified by passing through a Sephadex G-50 column equilibrated in 1% acetic acid. The fractions containing the target protein were lyophilized.

The methionine residue of the lyophilized protein was then oxidized using hydrogen peroxide. The products of methionine oxidation are shown in Figure 4.1.

With the oxidation condition used here to produce λ*1-85, only the first oxidation step will occur. Therefore, the methionine was oxidized to methionine sulfoxide in λ*1-85. Protein oxidation follows the protocol suggested by Chugha et al. (Chugha et al., 2006). Before oxidation, the lyophilized protein was solubilized at 1 mg/ml concentration in distilled H2O. 0.1 M perchloric acid was added to the solution to a final concentration of 20 µL/ml. Then 30% hydrogen peroxide was added to the solution to a final concentration of 0.05%. The final solution was thoroughly mixed and incubated at room temperature for 45 minutes. After incubation, the solution was lyophilized. Purity and the oxidation state of the protein were checked by gel electrophoresis, HPLC and mass spectrometry.

Figure 4.1: The products of methionine oxidation.

(Scheme: methionine is converted by oxidant to methionine sulfoxide, which is converted by oxidant to methionine sulfone.)

Backbone chemical shift measurements

Backbone chemical shifts of unfolded state λ*1-85 were assigned at 3 conditions with 3 samples. All 3 samples are at pH 5.3 and contain 20 mM Sodium Acetate buffer and 5 mM Imidazole. The urea concentrations are 0M, 1.0M and 2.0M, respectively. HN, NH, Cα, Cβ, C′ and Hα chemical shifts were assigned for the sample at 0M urea; HN, NH, Cα, Cβ and C′ chemical shifts were assigned for the samples at 1.0M and 2.0M urea. NMR experiments for assignments include HNCA, HN(CO)CA, HN(CA)CB, HN(COCA)CB, HNCACB, CBCA(CO)NH, HN(CA)CO, HNCO, HA(CA)NH and HA(CACO)NH. The protein concentration is 0.25 mM for all samples.

Then 15N-HSQC and HNCO titrations were carried out for unfolded state λ*1-85 at pH 5.3 (containing 20 mM Sodium Acetate buffer and 5 mM Imidazole). Both spectra were collected at 0M, 0.5M, 1.0M, 1.6M and 2.0M urea. Missing assignments at 0M urea were estimated using linearly extrapolated chemical shifts (including HN, NH, Cα, Cβ and C′) based on assignments for the 1.0M and 2.0M urea samples.
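The linear extrapolation to 0M urea amounts to drawing a straight line through the shifts assigned at 1.0M and 2.0M urea and reading off its intercept. A minimal sketch follows; the function name and the example shift values are hypothetical and are not taken from the assignments.

```python
def extrapolate_to_zero_urea(cs_a, cs_b, urea_a=1.0, urea_b=2.0):
    """Intercept at 0 M urea of the line through (urea_a, cs_a) and (urea_b, cs_b)."""
    slope = (cs_b - cs_a) / (urea_b - urea_a)
    return cs_a - slope * urea_a


# Hypothetical amide proton shifts (ppm) at 1.0 M and 2.0 M urea.
hn_at_0m = extrapolate_to_zero_urea(8.30, 8.35)
print(f"{hn_at_0m:.3f} ppm")
```

With the default urea concentrations this reduces to twice the 1.0M shift minus the 2.0M shift, so any error in the two assigned shifts is amplified in the extrapolated value.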

All experiments for λ*1-85 backbone chemical shift assignments were performed on a Varian 800MHz instrument with a cryogenic probe.

4.1.2 Results

Stability of λ*1-85 in solution

Protein λ*1-85 has poor stability in solution. Undesirable solution compositions can cause protein aggregation. The stability can be significantly affected by protein concentration, salt concentration, incubation temperature, temperature change, incubation pH, pH change and buffer composition. Stability tests were carried out in order to maximize the stability of NMR samples. The main findings of these tests are the following:

1. Minimizing temperature change can prolong the lifetime of an initially soluble sample. Increasing temperature decreases the time needed to start observing visible precipitation.

2. Protein solutions are stable without buffer and added salt, both in acidic conditions and at pH close to neutral. Such samples in acidic conditions can also withstand extreme temperatures and rapid temperature changes, with the highest concentration tested being 1.2 mM.

3. Added salt negatively affects the stability of λ*1-85 in solution. The salt concentration tested was 100 mM for NaCl or KCl.

4. After incubation at 37 °C for 24 hours, commonly used buffer compositions (without added salt), including sodium phosphate buffer and sodium cacodylate buffer, can cause the protein to aggregate. The protein is relatively stable in sodium acetate buffer. A mixture of sodium acetate buffer and imidazole can be used to adjust solution pH over a wide range.

5. According to the ProtParam tool, the theoretical isoelectric point is 9.74. Therefore, theoretically, the protein becomes more stable as the solution pH is lowered further away from the isoelectric point. In practice, the protein solution was found to be relatively stable at pH lower than 6.0 (it can be incubated at 37 °C for 24 hours without observable precipitation).

6. Addition of urea can significantly increase the lifetime of the protein solutions. The urea concentrations tested were 1.0M and 2.0M.

7. An increase in protein concentration decreases the time needed to observe precipitation. For example, after incubation at pH 6 with 1.0M urea at 37 °C overnight, a 500 µM sample will form visible precipitation while a 250 µM sample will not.

Backbone chemical shifts

The following table summarizes the backbone chemical shift assignments for λ*1-85 at pH 5.3 and 20 °C with no urea present in the solution.

Table 4.1: Backbone assignments (chemical shifts in ppm) for λ*1-85 residues at pH 5.3, 20 °C and 0M urea

Res Type  Res Number  HN     N        CA      CB      CO       HA
Ser       1           NA     NA       57.359  63.568  171.298  4.286
Thr       2           8.679  116.032  62.057  70.208  174.334  4.437
Lys       3           8.474  124.942  56.377  33.557  176.495  4.360
Lys       4           8.427  123.942  56.201  33.620  176.422  4.334
Lys       5           8.461  125.108  54.501  NA      174.610  NA
Pro       6           NA     NA       62.856  32.517  177.070  4.492
Leu       7           8.347  122.801  55.315  42.959  178.188  4.509
Thr       8           8.519  113.212  61.106  71.275  175.282  4.408
Gln       9           8.853  120.682  58.825  28.639  178.210  4.113
Glu       10          8.628  119.428  59.530  29.561  178.695  4.118
Gln       11          7.904  119.483  58.335  29.679  179.493  4.215
Leu       12          8.467  122.460  57.721  42.150  179.617  4.127
Glu       13          8.280  120.624  58.667  29.397  178.919  4.185
Asp       14          8.294  120.886  57.030  40.927  178.152  4.518
Ala       15          8.059  121.815  55.049  18.439  180.333  4.157
Arg       16          8.057  118.988  59.306  30.684  179.202  4.088
Arg       17          8.140  121.024  59.094  30.812  178.749  4.150
Leu       18          8.103  119.958  57.407  42.089  178.908  4.138
Lys       19          8.030  119.973  58.868  33.053  177.853  4.119
Ala       20          7.824  121.072  54.467  18.651  180.300  4.276
Ile       21          7.818  119.265  64.129  38.710  177.823  3.867
Tyr       22          8.124  121.331  60.205  38.612  177.402  4.367
Glu       23          8.374  119.501  58.121  30.237  177.829  4.028
Lys       24          7.909  120.495  58.119  33.068  178.104  4.211
Lys       25          8.100  120.183  57.438  32.903  177.939  4.242
Lys       26          8.162  120.625  58.149  32.878  177.669  4.054
Asn       27          8.214  118.180  54.288  39.146  176.447  4.671
Glu       28          8.251  120.818  57.451  30.211  177.330  4.263
Leu       29          8.093  120.844  55.755  42.631  178.217  4.342
Gly       30          8.196  108.383  45.825  NA      174.701  3.995
Leu       31          8.007  120.866  55.241  42.741  177.825  4.424
Ser       32          8.345  116.257  58.547  64.125  174.945  4.479
Gln       33          8.482  122.043  56.233  29.742  176.329  4.369
Glu       34          8.440  121.369  56.927  30.299  176.724  4.325
Ser       35          8.406  117.031  58.519  64.221  175.046  4.517
Val       36          8.168  121.538  62.854  32.971  176.461  4.121
Ala       37          8.276  126.252  53.136  19.566  178.066  4.294
Asp       38          8.216  119.125  54.584  41.340  176.848  4.589
Lys       39          8.231  121.231  56.890  33.075  177.262  4.301
Met       40          8.498  118.999  55.861  26.971  176.192  4.497
Gly       41          8.432  109.524  45.668  NA      174.472  4.028
Met       42          8.449  118.848  55.533  27.415  176.184  4.587
Gly       43          8.626  110.061  45.650  NA      174.466  4.008
Gln       44          8.379  120.044  56.401  29.754  176.706  4.367
Ser       45          8.443  116.940  58.970  64.018  175.071  4.437
Ala       46          8.401  126.136  53.503  19.555  178.703  4.315
Val       47          7.936  117.835  63.796  33.059  176.705  NA
Ala       48          8.124  125.357  53.839  NA      178.517  NA
Ala       49          8.026  121.136  53.242  19.563  178.521  NA
Leu       50          7.873  119.266  55.885  42.551  177.731  NA
Phe       51          8.024  118.870  58.194  39.736  175.870  NA
Asn       52          8.227  119.845  53.218  39.397  175.980  4.740
Gly       53          8.136  108.637  45.908  NA      174.710  NA
Lys       54          8.223  120.536  56.935  33.104  176.778  4.311
Asn       55          8.296  118.181  53.411  39.256  175.246  4.736
Lys       56          8.177  121.109  56.745  33.427  176.840  4.299
Leu       57          8.262  122.573  55.417  42.591  177.323  NA
Asn       58          8.364  119.465  53.240  39.093  175.403  NA
Ala       59          8.269  122.667  54.161  19.603  178.853  NA
Tyr       60          8.159  119.330  59.782  38.996  176.736  NA
Asn       61          8.295  119.484  54.262  39.497  176.610  NA
Ala       62          8.497  123.861  55.315  19.005  179.355  NA
Ala       63          7.961  120.423  54.414  18.794  179.976  NA
Leu       64          7.708  119.600  57.120  42.526  178.912  NA
Leu       65          7.949  119.795  57.022  42.845  178.179  NA
Ala       66          7.935  120.680  54.272  18.926  179.058  NA
Lys       67          7.692  117.726  58.026  32.942  178.119  NA
Ile       68          7.934  120.096  62.918  38.566  177.254  NA
Leu       69          NA     NA       NA      NA      NA       NA
Lys       70          NA     NA       NA      NA      NA       NA
Val       71          NA     NA       61.911  33.862  176.156  NA
Ser       72          8.352  118.450  58.405  64.294  175.072  NA
Val       73          8.282  121.872  63.112  31.797  176.600  NA
Glu       74          NA     NA       NA      NA      NA       NA
Glu       75          NA     NA       56.894  30.723  176.293  NA
Phe       76          8.053  119.680  57.487  40.432  175.099  NA
Ala       77          8.137  126.188  50.785  18.865  NA       NA
Pro       78          NA     NA       63.662  32.446  177.459  4.425
Ser       79          8.386  115.214  58.847  63.998  175.027  4.428
Ile       80          7.989  122.275  61.369  39.229  176.212  4.184
Ala       81          8.261  127.179  52.827  19.555  177.786  4.292
Arg       82          8.164  119.789  56.221  31.370  176.324  4.329
Glu       83          8.350  121.690  56.436  30.561  176.222  4.339
Ile       84          8.215  122.833  61.261  38.774  175.418  4.192
Arg       85          7.985  130.668  57.371  32.061  180.919  4.233

In the unfolded state, the backbone chemical shifts of λ*1-85 are relatively degenerate. For example, the degeneracy can be observed in the 15N-HSQC spectrum. Excluding the first residue and the two Proline residues, 57 residues can be observed in the 15N-HSQC spectrum of the 0M urea sample, whereas 78 residues can be observed in the 15N-HSQC spectra of either the 1.0M or the 2.0M urea sample. This difference in number is due to peak intensities that gradually decrease with decreasing urea concentration for several residues. Some of these residues have their peaks completely disappear in the 0M urea sample. In relation to the folded structure, most of the missing peaks in the 0M urea sample correspond to linker residues between helix 4 and helix 5. Compared with the assignments used in the backbone dynamics study (Chugha and Oas, 2007), these assignments have been corrected and validated among multiple conditions. Therefore, in the following discussions of residual helicity, the conclusions do not completely agree with the conclusions of the backbone dynamics study.

Figure 4.2: 15N-HSQC of λ*1-85 at pH 5.3, 20 °C and 0M urea

Crosspeaks are labeled with corresponding assignments

Helicity estimation by δ2D algorithm

The δ2D algorithm was first used to estimate residue-specific helicity values for residues throughout the chain. Then this estimation was compared to the helix-coil model prediction for the same sequence. The comparison between experimental estimation and theoretical prediction is shown in Figure 4.3.

Figure 4.3: Experimentally estimated and theoretically calculated helicity

This figure shows the comparison of experimentally estimated and theoretically calculated helicity values for λ*1-85 at 20 °C. For the theoretical calculation, the ∆S value for each sidechain type is extracted from the w values provided by the Baldwin group (Chakrabartty et al., 1994). (Axes: Residue Number versus Helicity.)

As shown in Figure 4.3, there is a pattern similarity for highly helical areas between estimation and prediction. Three regions with high helicity have been identified. In terms of folded state structure, two of the three areas approximately correspond to the helix 1 region and the helix 4 region. The third region with a significant amount of helicity corresponds approximately to the helix 3 region, although the biggest helicity value in the region is only 21% in the experimental estimation. Similar to the CSI fitting model used in Chapter 3, the calculated helicity values use global ∆H and ∆Cp and sidechain-specific ∆S values. The absolute values of helicity in the prediction depend heavily on the values chosen for the thermodynamic parameters. In particular, the value of ∆H can significantly influence the overall magnitude of helicity. Therefore, the absolute values from the experimental estimation are more relevant to the discussion.

There are 12 Alanines in the entire sequence. Being the top “helix former” among all amino acid types, Alanines have contributed to the helicity in all 3 regions, with 2 Alanines in the helix 1 region, 3 Alanines in the helix 3 region and 4 Alanines in the helix 4 region. According to Chakrabartty et al., Valine is among the amino acid types least likely to sample the helical state. Therefore, Valine being the neighboring residue to Alanine greatly reduces the residue helicity values in the vicinity. This is evident for residues 36-37 (VA) and 46-49 (AVAA), with the latter contributing to the observed helicity in the helix 3 region. Four Glycine residues contribute to the segmentation of helical regions along the chain: they are G30, G41, G43 and G53. Although the segmentation in experimental estimation and theoretical prediction is not identical, for each segmentation, the boundaries are no more than 4 residues away from each other.

The boundaries of the first four helices of λ*1-85, as defined by Burton et al. (Burton et al., 1998), are respectively residues 9-30, 33-39, 44-52 and 59-69. It is not surprising that G30, G41, G43 and G53 play the role of terminating helices 1, 2 and 3, given Glycine's unique accessibility of Ramachandran space. Therefore, the separations of helical regions both in the folded state and in the unfolded state are primarily caused by Glycine residues.

One feature that differs between experimental estimation and theoretical prediction is the helicity comparison between the helix 1 region and the helix 4 region. In the experimental estimation, the residues with the highest helicity in the helix 1 region (residues 11 and 16, both 97%) are more helical than the residues with the highest helicity in the helix 4 region (residues 63 and 64, both 87%). The order of this helicity comparison is reversed in the theoretical prediction. Note that the theoretical calculation is carried out in the form of the classic Lifson-Roig formalism, without sidechain interactions built into the recursively multiplied matrices. Therefore, one possible reason for the mismatch between estimation and calculation is the lack of consideration for sidechain interactions in the model.

˚ Comparison of helicity between LRH1x and λ1´85

Residues 2-24 in LRH1x correspond to residues 8-30 in λ*1-85. Fig 4.4 shows the comparison between the helicity of LRH1x residues at 20 °C, pH 6 and the helicity of λ*1-85 residues at 20 °C and pH 5.3. Neither sequence contains a group that titrates between pH 5 and 7. Theoretically, a pH change in this range should therefore not influence the conformational equilibrium. The first comparison uses helicity estimated by the δ2D algorithm. All residues in the figure have a higher helicity in the context of the protein than in the context of the peptide. In the second comparison, the helicity values are estimated using the posterior distributions of reference chemical shifts from Chapter 3. Again, all residues have higher helicity means in λ*1-85, with the means for Arg 11 being close.

[Figure 4.4: two panels plotting helicity (0.0 to 1.0) against residue number, with separate series for the protein and the peptide.]

Figure 4.4: Comparison of helicity between LRH1x and λ*1-85

Comparison of helicity between residues 2-23 in LRH1x and residues 8-30 in λ*1-85 at 20 °C. The values in the left panel are estimated using the δ2D algorithm; the distributions in the right panel are generated using the posterior distributions of reference chemical shifts in Chapter 3.

The only exception is Thr (residue 2 in LRH1x). Therefore, both comparisons agree that all residues except one are more helical in the protein. The two comparisons differ in the helicity values and in the identity of the most helical residues. Based on the results in Figure 4.4, the δ2D algorithm overestimates the helicity of the most helical residues and, in the case of LRH1x, underestimates the helicity of the least helical residues. The helicity of Thr2 in LRH1x was not estimated by δ2D because the algorithm depends on the chemical shifts of neighboring residues in both directions. The only comparison for the Threonine residue is therefore provided by the fitted reference shifts. The estimated mean for this residue is smaller in the context of the protein. This difference could be biophysically real, or it could arise because the reference shifts determined for the Thr residue in the peptide are not appropriate for this residue in the protein.
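The reference-shift approach can be sketched as follows, assuming a two-state fast-exchange picture in which the observed chemical shift of a residue is the population-weighted average of its coil and helix reference shifts. This is an illustrative assumption about the general form of the calculation rather than the exact procedure of Chapter 3. The function names, the Gaussian stand-ins for posterior samples and all numerical values are hypothetical.

```python
import random
import statistics

def two_state_helicity(delta_obs, delta_coil, delta_helix):
    """Helical fraction f from delta_obs = (1 - f) * delta_coil + f * delta_helix."""
    return (delta_obs - delta_coil) / (delta_helix - delta_coil)

def helicity_samples(delta_obs, coil_samples, helix_samples):
    """Propagate reference-shift uncertainty by evaluating the helicity
    for each paired draw of the coil and helix reference shifts."""
    return [two_state_helicity(delta_obs, c, h)
            for c, h in zip(coil_samples, helix_samples)]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
# Hypothetical stand-ins for posterior draws of the reference shifts (ppm).
coil = [rng.gauss(52.5, 0.1) for _ in range(5000)]
helix = [rng.gauss(55.5, 0.1) for _ in range(5000)]

# Hypothetical observed shift (ppm) for one residue.
samples = helicity_samples(54.9, coil, helix)
print(round(statistics.mean(samples), 2), round(statistics.stdev(samples), 2))
```

Representing the result as a set of samples rather than a single number is what allows a comparison of helicity means between the peptide and the protein to carry its own uncertainty.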

4.2 Amide exchange rate constants

4.2.1 Materials and Methods

The amide exchange rate constants of λ*1-85 backbone amide protons were measured using CLEANEX experiments. The solution contained 7 mM acetic acid, 20 mM imidazole and 10% D2O. The pH was adjusted by adding NaOH and read as pH 7.5 using pH paper. The concentration of λ*1-85 was 250 µM. The CLEANEX mixing times were 0.5 ms, 40 ms, 70 ms, 110 ms, 160 ms, 220 ms, 500 ms, 1 s, 1.5 s and 2 s. For each mixing time, the spectral resolution in the nitrogen dimension was 0.025 ppm. In addition, a 15N-HSQC spectrum with the same carrier positions, spectral widths and indirect-dimension resolution was collected. The time-domain data were converted to frequency-domain data using NMRPipe. The intensities of the crosspeaks were then analyzed using NMRViewJ. The intensities of the CLEANEX crosspeaks were normalized to the intensities of the corresponding crosspeaks in the 15N-HSQC spectrum. The fitting procedure followed Hwang et al. (1998).

All experiments for λ*1-85 backbone amide proton exchange rate constant measurements were performed on a Bruker 600 MHz instrument with a cryogenic probe.
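For reference, the build-up of the normalized CLEANEX intensity with mixing time is commonly described by the two-rate expression of Hwang et al. (1998). The sketch below assumes that form, V/V0 = k/(R1A,app + k - R1B,app) × {exp(-R1B,app τm) - exp[-(R1A,app + k) τm]}, with the water relaxation term R1B,app held fixed. The fixed value, the grid-search fit and the synthetic data are illustrative assumptions and not the analysis performed here.

```python
import math

# Mixing times listed in this section, converted to seconds.
MIXING_TIMES = [0.0005, 0.04, 0.07, 0.11, 0.16, 0.22, 0.5, 1.0, 1.5, 2.0]

def cleanex_buildup(tau_m, k, r1a_app, r1b_app):
    """Normalized CLEANEX intensity V/V0 at mixing time tau_m (s)."""
    decay = r1a_app + k
    return k / (decay - r1b_app) * (
        math.exp(-r1b_app * tau_m) - math.exp(-decay * tau_m)
    )

def fit_exchange_rate(taus, ratios, r1b_app=0.6):
    """Coarse least-squares grid search for k and R1A,app (both in /s).

    r1b_app is held fixed; 0.6 /s is an assumed placeholder value.
    """
    best = (float("inf"), None, None)
    for ki in range(1, 201):        # k from 0.1 to 20.0 /s
        k = 0.1 * ki
        for ri in range(1, 61):     # R1A,app from 1 to 60 /s
            r1a = float(ri)
            sse = sum((cleanex_buildup(t, k, r1a, r1b_app) - y) ** 2
                      for t, y in zip(taus, ratios))
            if sse < best[0]:
                best = (sse, k, r1a)
    return best[1], best[2]

# Synthetic, noise-free data generated from hypothetical parameters.
synthetic = [cleanex_buildup(t, 3.0, 20.0, 0.6) for t in MIXING_TIMES]
k_fit, r1a_fit = fit_exchange_rate(MIXING_TIMES, synthetic)
print(k_fit, r1a_fit)
```

At short mixing times the expression reduces to V/V0 ≈ k τm, so the earliest points constrain k most directly, while the longer mixing times mainly constrain the relaxation terms.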

4.2.2 Results

The CLEANEX experiment works similarly to SOLEXSY, and the fitted rate constants are directly comparable. Figure 4.5 shows the exchange rate constants of LRH1x residues versus the exchange rate constants of residues 8-30 of λ*1-85 at the same temperature (20 °C). The comparison is shown after correcting for the differences in pH and D2O concentration. Figure 4.6 shows the ratio of rate constants. Each value is the ratio of the rate constant for a residue in the peptide to that for the same residue in the protein. As shown in Figure 4.6, for 18 of the 20 residues compared, the exchange rate constant is equal or higher in the peptide than in the protein. Only residues 11 and 17 have slightly lower rate constants in the peptide. As in the chemical shift comparisons, the ratio of kobs values for the Thr residue (Thr2 in LRH1x) is not calculated due to the lack of data for Thr2 in LRH1x. Figure 4.7 shows the comparison of fraction protection distributions calculated using the kobs values in Figure 4.6. The error bars of the kobs ratios are relatively large and the ratios lack a recognizable pattern along the sequence. The uncertainties and lack of pattern could come from inaccuracy in the kobs estimates from CLEANEX data, which is known to be a less accurate method than SOLEXSY. They could also indicate that kint values are different for helix 1 residues in λ*1-85, which in turn would mean that the estimates of fprot values for λ*1-85 residues are not completely reliable.

Taken together, the helicity comparison and the exchange comparison both indicate that residues of the helix 1 region in λ*1-85 experience higher compactness than in the isolated peptide.

[Figure 4.5: observed exchange rate constant kobs (/s) plotted against residue number, with separate series for the peptide and the protein.]

Figure 4.5: Comparison of observed rate constants.

The kobs of Gln3 in LRH1x is not shown because of the magnitude of its value. Both sets of values were measured at 20 °C. The pH discrepancy was corrected by normalizing against the pH used for LRH1x.

[Figure 4.6: ratio of kobs (LRH1x / λ*1-85) plotted against residue number.]

Figure 4.6: Ratios of observed exchange rate constants

Ratios of observed exchange rate constants between residues 3-24 in LRH1x and residues 9-30 in λ*1-85. Both sets of values were measured at 20 °C.

[Figure 4.7: fraction protection (0.0 to 1.0) plotted against residue number, with separate series for the protein and the peptide.]

Figure 4.7: Comparison of fraction protection between LRH1x and λ*1-85

The fraction protection distributions of λ*1-85 residues are calculated from the posterior distributions of fprot parameters for LRH1x residues, assuming the same kint for each residue in both contexts. The temperature is 20 °C.
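The two steps behind this comparison can be written compactly. The sketch below assumes (i) that exchange in this pH range is base catalyzed, so that kobs scales with 10^pH, and (ii) that kobs = kint (1 - fprot), so that with the same kint in both contexts the fraction protection in the protein follows from the peptide value and the kobs ratio. Both relations are stated here as working assumptions, the D2O correction is not included, and the function names and numerical inputs are hypothetical rather than measured values.

```python
def correct_to_reference_ph(k_obs, ph_measured, ph_reference):
    """Rescale a base-catalyzed exchange rate constant to a reference pH."""
    return k_obs * 10.0 ** (ph_reference - ph_measured)

def protein_fraction_protection(f_prot_peptide, k_obs_peptide, k_obs_protein):
    """Fraction protection in the protein, assuming k_obs = k_int * (1 - f_prot)
    and the same k_int for the residue in the peptide and in the protein."""
    ratio = k_obs_peptide / k_obs_protein
    return 1.0 - (1.0 - f_prot_peptide) / ratio

# Hypothetical numbers for a single residue.
k_protein = correct_to_reference_ph(k_obs=4.0, ph_measured=7.5, ph_reference=7.2)
f_protein = protein_fraction_protection(f_prot_peptide=0.40,
                                        k_obs_peptide=4.0,
                                        k_obs_protein=k_protein)
print(round(k_protein, 3), round(f_protein, 3))
```

Under these assumptions a ratio above one raises the fraction protection relative to the peptide and a ratio below one lowers it, so the uncertainty in each kobs ratio carries over directly into the fraction protection of the corresponding residue.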

4.3 Discussion

4.3.1 Implication of missing crosspeaks

λ*1-85 is prone to aggregation. At pH 5.3 without urea, certain residues do not have corresponding peaks in the 15N-HSQC spectrum. The Redfield group observed that crosspeaks for unfolded state proteins disappeared with decreasing concentration of urea (Wijesinha-Bettoni et al., 2001; Redfield, 2004). The disappearance of the peaks has been attributed to intermediate exchange and is said to indicate the existence of molten globule structures. Currently, there is no evidence that any state other than the folded and the unfolded state is populated during the folding of λ-repressor N-terminal domain. Although the lack of evidence does not necessarily exclude the existence of a third state, existing experimental results point to a two-state folding mechanism (Huang and Oas, 1995b).

Another possible reason for the disappearing crosspeaks is related to transverse relaxation rate constants (R2). It is likely that the residues without corresponding crosspeaks experience two environments in solution. The molecules equilibrate between the isolated and the aggregated form, and the aggregation interface might be primarily composed of the residues that have no corresponding peaks in the absence of urea. The two environments cause each of these residues to undergo conformational exchange between two species with significantly different R2 values. The majority of these residues form the loop between helix 4 and helix 5 in the folded state, with several others in helix 4. Therefore, it is likely that this area has become the polymerization interface that makes the protein aggregation prone.

4.3.2 The difference in helicity between λ*1-85 and LRH1x

Both the helicity estimates from chemical shifts and the fraction protection estimates from amide exchange rate constants agree that the residual helicity of the helix 1 region is higher when the rest of the protein is present than in the isolated peptide. In residue-specific comparisons, the fraction protection values are especially elevated in the context of the protein for residues Gln 9 to Glu 13. These differences can be explained by four possible hypotheses.

The first possible reason is that the residues preceding Threonine are different in the protein and in the peptide. In LRH1x, the Threonine residue is at position 2 and is preceded by Glycine at position 1. Being at the first position, Gly1 can only initiate a helical stretch and has a limited population in the helical state. This limits the further propagation of the helical state along the chain. In comparison, the Threonine residue in λ*1-85 is at position 8 and is preceded by more residues. The preceding seven residues include three Lysines and one Leucine, which have higher propagation weights w than the majority of other sidechain types. Even with one “helix-breaking” Proline residue, these residues help propagate the helical state and increase the helicity of residues 9-30 in λ*1-85.

The second possible reason for the elevated residual helicity of these residues is a pair of sidechain-mainchain interactions that are potentially populated in the unfolded state. In the folded state structure of λ-repressor N-terminal domain (PDB ID: 1LMB), the Thr 8 sidechain is hydrogen bonded to Gln 11 and vice versa (Figure 4.8). This pair of interactions serves as the N-terminal capping interaction for helix 1. This type of capping was termed the “Capping Box” (Seale et al., 1994). This pair of interactions might also be present in the unfolded state. There, however, it would not serve as a capping box, but as a pair of interactions that increases the population of residues sampling the helical state. In the unfolded state, this sidechain-mainchain interaction pair can create an alternative nucleation site for propagating helicity. The residues preceding Threonine are different in the protein than in the peptide. If this difference causes the nucleation site to be populated differently, this interaction pair can contribute to the difference in helicity.

A third possible reason is a potential sidechain-sidechain interaction between Lysine 4 and Threonine 8. The electrostatic interaction between the two sidechains in a KxxxT sequence was shown to cause a negative ∆∆G (Schmidler et al., 2007). This negative ∆∆G will contribute to the coil-to-helix transition of the third residue in this 5-residue window. In λ-repressor, this third residue is Proline. Proline has the lowest coil-to-helix transition probability because of its inability to form a helical hydrogen bond. This sidechain-sidechain interaction can increase the transition probability for Proline and increase the helicity values of the following residues.

Figure 4.8: Capping box in folded state λ-repressor

A fourth possible reason is the exclusion of bulk solvent through the formation of tertiary interactions. The probability of forming helical hydrogen bonds will increase if long-range compactness causes the exclusion of bulk solvent. Also, if tertiary interactions affect the solvent accessibility of helix 1 residues, the kint values of helix 1 residues could differ from those of the corresponding residues in LRH1x. This potential difference can explain why the comparison of kobs values lacks a recognizable pattern. The large ratios for residues 9-13 may also be caused by a significant difference in kint values.
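The third hypothesis can be made more concrete with a simple Boltzmann-factor sketch. It assumes, for illustration only, that a sidechain interaction with free energy ∆∆G multiplies the propagation weight w of the affected residue by exp(-∆∆G/RT). This is one common way to add an interaction term to a helix-coil model and is not necessarily the treatment of Schmidler et al. (2007). The function names and the ∆∆G and w values are hypothetical.

```python
import math

R_KCAL = 1.9872e-3  # gas constant in kcal/(mol K)

def boost_factor(ddg_kcal_per_mol, temperature_k):
    """Multiplicative change in a statistical weight caused by an
    interaction free energy (negative values are stabilizing)."""
    return math.exp(-ddg_kcal_per_mol / (R_KCAL * temperature_k))

def adjusted_weight(w, ddg_kcal_per_mol, temperature_k=293.15):
    """Propagation weight after including the interaction term."""
    return w * boost_factor(ddg_kcal_per_mol, temperature_k)

# Hypothetical inputs: a very low proline-like weight and a
# -0.5 kcal/mol interaction at 20 degrees Celsius.
print(adjusted_weight(w=0.001, ddg_kcal_per_mol=-0.5))
```

With these placeholder numbers the weight increases by a factor of roughly 2.4. Whether a change of that size is enough to affect the helicity of the following residues would depend on the actual w of Proline and on the actual ∆∆G.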

4.3.3 High helicity areas in λ*1-85

In the unfolded state, the areas corresponding to helix 1 and helix 4 in the native state dominate the contribution to the overall helicity. This observation supports the speculation that helix 1 and helix 4 form early in the folding pathway. The residual compactness in the unfolded state significantly limits the conformational space the N-terminal domain needs to explore in order to find the correct topology. The experimentally determined high helicity areas match those predicted by the classical Lifson-Roig formalism. This comparison thus supports the hypothesis that λ-repressor N-terminal domain folds via “diffusion-collision”. Secondary structure elements are very likely to be formed before extensive tertiary contacts.

4.3.4 Future directions

The hypotheses listed above need experimental evidence to be validated. Hypotheses 2 and 3 can be tested by creating single-residue mutations in the sequence. One possibility is to mutate the Threonine residue to a hydrophobic residue and abolish any hydrogen bond to the sidechain of this residue. The potential effect of tertiary contacts on residual helix formation can be tested by studying the molecule in a “good solvent”, as opposed to aqueous solution, which is a “poor solvent”.

5

Conclusion

This dissertation focuses on quantitatively describing the residual helicity of the λ*1-85 helix 1 region as an isolated peptide and in the context of the intact protein. The residual helicity of LRH1x is resistant to harsh conditions. In the context of the N-terminal domain, the residual helicity of each residue in the unfolded state is elevated compared to LRH1x. The overall pattern of helix propagation along the chain of λ*1-85 can be explained by helix-coil transition theory, using a basic Lifson-Roig formalism. These observations demonstrate native-state-like compactness in the unfolded state. They show that residual compactness can be local without involvement of extensive tertiary contacts.

The analysis results for λ*1-85 provide direct experimental evidence supporting the “diffusion-collision” folding model. They also show the extent of residual helicity that can exist in the unfolded state of a protein and the utility of the helix-coil model in explaining residual helicity observations.

In terms of method development, kint values and reference chemical shifts can be accurately estimated using Bayesian inference, leveraging existing biophysical knowledge and residue-specific observations in a variety of solution conditions. These reference values are therefore estimated specifically for the sequence of interest and are significantly more accurate than literature values. Bayesian inference also enables each parameter to be represented by a probability distribution.

The residue-specific helicity values of λ*1-85 residues were first estimated using δ2D, an algorithm specifically developed for unfolded state proteins, and then optimized for the helix 1 region using fitted reference shifts. Compared with the existing literature, the results documented here are the first case of residue-specific helical content estimation using extensive experimental observations and optimized estimates of reference values. It is also a rare case in which the estimation of residue-specific helicity values is attempted for unfolded state residues.

Appendix A

Supplementary figures

[Figure A.1: twelve panels, each plotting fraction protection (0.0 to 1.0) against residue number. Panel conditions (temperature in K, pH, urea in M): 283.2, 7.23, 0; 288.2, 7.23, 0; 293.2, 7.23, 0; 298.2, 7.23, 0; 303.2, 7.23, 0; 298.2, 6.19, 0; 303.2, 6.19, 0; 308.2, 6.19, 0; 313.2, 6.19, 0; 318.2, 6.19, 0; 323.2, 6.19, 0; 293.2, 7.49, 1.84.]

Figure A.1: All the fprot parameter posterior distributions.

The 3 numbers on the top of each sub-figure are the temperature in Kelvin, pH and urea concentration in Molar for the corresponding parameter set. fprot for residue 1 is not fitted.

[Figure A.2: twelve panels, each plotting fraction protection (0.0 to 1.0) against residue number. Panel conditions (temperature in K, pH, urea in M): 298.2, 7.49, 1.84; 303.2, 7.49, 1.84; 308.2, 7.49, 1.84; 298.2, 7.49, 4.22; 303.2, 7.49, 4.22; 308.2, 7.49, 4.22; 313.2, 7.49, 4.22; 303.2, 7.54, 5.94; 308.1, 7.54, 5.94; 318.2, 7.53, 6.92; 303.2, 7.58, 7.83; 308.2, 7.63, 7.88.]

Figure A.2: (Figure A.1 continued)

[Figure A.3: two panels, each plotting fraction protection (0.0 to 1.0) against residue number. Panel conditions (temperature in K, pH, urea in M): 313.2, 7.58, 7.83; 318.2, 7.58, 7.83.]

Figure A.3: (Figure A.1 continued)

[Figure A.4: ∆S0 (cal/M/K) plotted against residue number.]

Figure A.4: Posterior distributions of ∆S0 parameters

Posterior distributions of ∆S0 parameters from Bayesian-CSI fitting. ∆S0 for residues 1 and 24 are not fitted.

Bibliography

Adhikary, R., Tan, Y. X., Liu, J., Zimmermann, J., Holcomb, M., Yvellez, C., Dawson, P. E., and Romesberg, F. E. (2017), “Conformational Heterogeneity and DNA Recognition by the Morphogen Bicoid,” Biochemistry, 56, 2787–2793, PMID: 28547993.

Adzhubei, A. A., Sternberg, M. J. E., and Makarov, A. A. (2013), “Polyproline-II helix in proteins: structure and function,” Journal of molecular biology, 425, 2100–2132.

Ahluwalia, U., Katyal, N., and Deep, S. (2013), “MODELS OF PROTEIN FOLDING,” Journal of Proteins Proteomics, 3.

Alderson, T. R. and Markley, J. L. (2013), “Biophysical characterization of alpha-synuclein and its controversial structure,” Intrinsically Disord Proteins, 1, 18–39.

Allison, J. R., Rivers, R. C., Christodoulou, J. C., Vendruscolo, M., and Dobson, C. M. (2014), “A Relationship between the Transient Structure in the Monomeric State and the Aggregation Propensities of α-Synuclein and β-Synuclein,” Biochemistry, 53, 7170–7183, PMID: 25389903.

Andersen, N. H., Liu, Z., and Prickett, K. S. (1996), “Efforts toward deriving the CD spectrum of a 3/10 helix in aqueous medium,” FEBS Letters, 399, 47–52.

Andrade, M. A., Chacon, P., Merelo, J. J., and Moran, F. (1993), “Evaluation of secondary structure of proteins from UV circular dichroism spectra using an unsupervised learning neural network,” Protein Eng., 6, 383–390.

Arpino, J. A., Rizkallah, P. J., and Jones, D. D. (2012), “Crystal structure of enhanced green fluorescent protein to 1.35 Å resolution reveals alternative conformations for Glu222,” PLoS ONE, 7, e47132.

Asakura, T., Ando, I., and Nishioka, A. (1977), “The proton chemical shifts of α-helical poly-L-alanine,” Die Makromolekulare Chemie, 178, 1111–1132.

Asakura, T., Taoka, K., Demura, M., and Williamson, M. P. (1995), “The relationship between amide proton chemical shifts and secondary structure in proteins,” J. Biomol. NMR, 6, 227–236.

Aurora, R. and Rose, G. D. (1998), “Helix capping,” Protein Sci., 7, 21–38.

Avbelj, F., Kocjan, D., and Baldwin, R. L. (2004a), “Protein chemical shifts arising from alpha-helices and beta-sheets depend on solvent exposure,” Proc. Natl. Acad. Sci. U.S.A., 101, 17394–17397.

Avbelj, F., Kocjan, D., and Baldwin, R. L. (2004b), “Protein Chemical Shifts Arising from α-Helices and β-Sheets Depend on Solvent Exposure,” Proceedings of the National Academy of Sciences of the United States of America, 101, 17394–17397.

Avbelj, F., Grdadolnik, S. G., Grdadolnik, J., and Baldwin, R. L. (2006), “Intrinsic backbone preferences are fully present in blocked amino acids,” 103, 1272–1277.

Babu, M. M., van der Lee, R., de Groot, N. S., and Gsponer, J. (2011), “Intrinsically disordered proteins: regulation and disease,” Curr. Opin. Struct. Biol., 21, 432–440.

Bai, Milne, Mayne, and Englander (1993), “Primary structure effects on peptide group hydrogen exchange,” Proteins: Structure, Function, and Genetics, 17, 75–86.

Bai, Y., Chung, J., Dyson, H. J., and Wright, P. E. (2001), “Structural and dynamic characterization of an unfolded state of poplar apo-plastocyanin formed under nondenaturing conditions,” Protein Sci., 10, 1056–1066.

Baldwin, R. L. and Rose, G. D. (1999), “Is protein folding hierarchic? II. Folding intermediates and transition states,” Trends Biochem. Sci., 24, 77–83.

Beamer and Pabo (1992), “Refined 1.8 Å crystal structure of the λ repressor-operator complex,” Journal of Molecular Biology, 227, 177–196.

Best, R. B., Clarke, J., and Karplus, M. (2005), “What contributions to protein side-chain dynamics are probed by NMR experiments? A molecular dynamics simulation analysis,” J. Mol. Biol., 349, 185–203.

Bhattacharya, S., Falzone, C. J., and Lecomte, J. T. (1999), “Backbone dynamics of apocytochrome b5 in its native, partially folded state,” Biochemistry, 38, 2577–2589.

Boyer, J. A. and Lee, A. L. (2008), “Monitoring aromatic picosecond to nanosecond dynamics in proteins via 13C relaxation: expanding perturbation mapping of the rigidifying core mutation, V54A, in eglin c,” Biochemistry, 47, 4876–4886.

Brooks, C. L. (1996), “Helix-Coil Kinetics: Folding Time Scales for Helical Peptides from a Sequential Kinetic Model,” The Journal of Physical Chemistry, 100, 2546–2549.

Bruun, S. W., Iešmantavičius, V., Danielsson, J., and Poulsen, F. M. (2010), “Cooperative formation of native-like tertiary contacts in the ensemble of unfolded states of a four-helix protein,” Proceedings of the National Academy of Sciences, 107, 13306–13311.

Bryngelson, J. D., Onuchic, J. N., Socci, N. D., and Wolynes, P. G. (1995), “Funnels, pathways, and the energy landscape of protein folding: a synthesis,” Proteins, 21, 167–195.

Buck, M., Radford, S. E., and Dobson, C. M. (1994), “Amide Hydrogen Exchange in a Highly Denatured State: Hen Egg-white Lysozyme in Urea,” Journal of Molecular Biology, 237, 247–254.

Burton, R. E., Huang, G. S., Daugherty, M. A., Fullbright, P. W., and Oas, T. G. (1996), “Microsecond protein folding through a compact transition state,” J. Mol. Biol., 263, 311–322.

Burton, R. E., Myers, J. K., and Oas, T. G. (1998), “Protein folding dynamics: quantitative comparison between theory and experiment,” Biochemistry, 37, 5337–5343.

Bush, C. A., Sarkar, S. K., and Kopple, K. D. (1978), “Circular dichroism of beta turns in peptides and proteins,” Biochemistry, 17, 4951–4954.

Bussell, R. and Eliezer, D. (2001), “Residual structure and dynamics in Parkinson’s disease-associated mutants of alpha-synuclein,” J. Biol. Chem., 276, 45996–46003.

Bywater, R. P. and Veryazov, V. (2015), “The dipeptide conformations of all twenty amino acid types in the context of biosynthesis,” Springerplus, 4, 668.

Caballero, D., Virrueta, A., O’Hern, C., and Regan, L. (2016), “Steric interactions determine side-chain conformations in protein cores,” Protein Engineering, Design and Selection, 29, 367–376.

Camilloni, C., Simone, A. D., Vranken, W. F., and Vendruscolo, M. (2012), “Determination of secondary structure populations in disordered states of proteins using nuclear magnetic resonance chemical shifts,” Biochemistry, 51, 2224.

Cao, W., Bracken, C., Kallenbach, N. R., and Lu, M. (2004), “Helix formation and the unfolded state of a 52-residue helical protein,” Protein Sci., 13, 177–189.

Chakrabartty, A., Kortemme, T., and Baldwin, R. L. (1994), “Helix propensities of the amino acids measured in alanine-based peptides without helix-stabilizing side-chain interactions,” Protein Science, 3, 843–852.

Cheung, M. S., Maguire, M. L., Stevens, T. J., and Broadhurst, R. W. (2010), “DANGLE: A Bayesian inferential method for predicting protein backbone dihedral angles and secondary structure,” J. Magn. Reson., 202, 223–233.

Chevelkov, V., Xue, Y., Rao, D. K., Forman-Kay, J., and Skrynnikov, N. (2010), “15NH/D-SOLEXSY experiment for accurate measurement of amide solvent exchange rates: application to denatured drkN SH3,” Journal of Biomolecular NMR, 46, 227–244.

Choy, Shortle, and Kay (2003), “Side Chain Dynamics in Unfolded Protein States: an NMR Based 2H Spin Relaxation Study of ∆131∆,” Journal of the American Chemical Society, 125, 1748–1758.

Christensen, H. and Pain, R. H. (1991), “Molten globule intermediates and protein folding,” Eur. Biophys. J., 19, 221–229.

Chugha and Oas (2007), “Backbone Dynamics of the Monomeric λ Repressor Denatured State Ensemble under Nondenaturing Conditions,” Biochemistry, 46, 1141–1151.

Chugha, P., Sage, H. J., and Oas, T. G. (2006), “Methionine oxidation of monomeric lambda repressor: the denatured state ensemble under nondenaturing conditions,” Protein Sci., 15, 533–542.

Cierpicki, T. and Otlewski, J. (2001), “Amide proton temperature coefficients as hydrogen bond indicators in proteins,” Journal of Biomolecular NMR, 21, 249–261.

Clore, G. M. and Iwahara, J. (2009), “Theory, Practice, and Applications of Paramagnetic Relaxation Enhancement for the Characterization of Transient Low-Population States of Biological Macromolecules and Their Complexes,” Chemical Reviews, 109, 4108–4139, PMID: 19522502.

Compton, L. A. and Johnson, W. C. (1986), “Analysis of protein circular dichroism spectra for secondary structure using a simple matrix multiplication,” Anal. Biochem., 155, 155–167.

Covington, A. K., Robinson, R. A., and Bates, R. G. (1966), “The Ionization Constant of Deuterium Oxide from 5 to 50°,” The Journal of Physical Chemistry, 70, 3820–3824.

Crowhurst, K. A. and Forman-Kay, J. D. (2003), “Aromatic and methyl NOEs highlight hydrophobic clustering in the unfolded state of an SH3 domain,” Biochemistry, 42, 8687–8695.

DePristo, M. A., de Bakker, P. I. W., and Blundell, T. L. (2004), “Heterogeneity and Inaccuracy in Protein Structures Solved by X-Ray Crystallography,” Structure, 12, 831–838.

Dill, K. A., Ozkan, S. B., Shell, M. S., and Weikl, T. R. (2008a), “The Protein Folding Problem,” Annual Review of Biophysics, 37, 289–316, PMID: 18573083.

Dill, K. A., Ozkan, S. B., Shell, M. S., and Weikl, T. R. (2008b), “The protein folding problem,” Annu Rev Biophys, 37, 289–316.

Doig, A. J. and Baldwin, R. L. (1995), “N- and C-capping preferences for all 20 amino acids in alpha-helical peptides,” Protein Sci., 4, 1325–1336.

Doig, A. J., Chakrabartty, A., Klingler, T. M., and Baldwin, R. L. (1994), “Determination of Free Energies of N-Capping in α-Helixes by Modification of the Lifson-Roig Helix-Coil Theory To Include N- and C-Capping,” Biochemistry, 33, 3396–3403, PMID: 8136377.

Dunker, A. K., Silman, I., Uversky, V. N., and Sussman, J. L. (2008), “Function and structure of inherently disordered proteins,” Curr. Opin. Struct. Biol., 18, 756–764.

Dyson and Wright (2005), “Intrinsically unstructured proteins and their functions,” Nature Reviews Molecular Cell Biology, 6, 197–208.

Dyson, Merutka, Waltho, Lerner, and Wright (1992), “Folding of peptide fragments comprising the complete sequence of proteins I. Myohemerythrin,” Journal of Molecular Biology, 226, 795–817.

Dyson, H. J. and Wright, P. E. (2016), “Role of Intrinsic Protein Disorder in the Function and Interactions of the Transcriptional Coactivators CREB-binding Protein (CBP) and p300,” J. Biol. Chem., 291, 6714–6722.

Eliezer, D., Yao, J., Dyson, H. J., and Wright, P. E. (1998), “Structural and dynamic characterization of partially folded states of apomyoglobin and implications for protein folding,” Nat. Struct. Biol., 5, 148–155.

Englander, S. W. and Mayne, L. (2014), “The nature of protein folding pathways,” 111, 15873–15880.

Englander, S. W., Mayne, L., Bai, Y., and Sosnick, T. R. (1997), “Hydrogen exchange: The modern legacy of Linderstrøm-Lang,” Protein Science, 6, 1101–1109.

Felitsky, D. J., Lietzow, M. A., Dyson, H. J., and Wright, P. E. (2008), “Modeling transient collapsed states of an unfolded protein to provide insights into early folding events,” Proc. Natl. Acad. Sci. U.S.A., 105, 6278–6283.

Fersht, A. R. and Sato, S. (2004), “Phi-Value analysis and the nature of protein-folding transition states,” .

Fink, A. L. (2001), “Molten Globule,” .

Fitzkee, N. C., Rose, G. D., and Englander, S. W. (2004), “Reassessing Random-Coil Statistics in Unfolded Proteins,” Proceedings of the National Academy of Sciences of the United States of America, 101, 12497–12502.

Francis, C. J., Lindorff-Larsen, K., Best, R. B., and Vendruscolo, M. (2006), “Characterization of the residual structure in the unfolded state of the Delta131Delta fragment of staphylococcal nuclease,” Proteins, 65, 145–152.

Gillespie, J. R. and Shortle, D. (1997a), “Characterization of long-range structure in the denatured state of staphylococcal nuclease. I. Paramagnetic relaxation enhancement by nitroxide spin labels,” J. Mol. Biol., 268, 158–169.

Gillespie, J. R. and Shortle, D. (1997b), “Characterization of long-range structure in the denatured state of staphylococcal nuclease. II. Distance restraints from paramagnetic relaxation and calculation of an ensemble of structures,” J. Mol. Biol., 268, 170–184.

Greenfield, N. J. (1996), “Methods to Estimate the Conformation of Proteins and Polypeptides from Circular Dichroism Data,” Analytical Biochemistry, 235, 1–10.

Greenfield, N. J. (2006), “Using circular dichroism spectra to estimate protein secondary structure,” Nat Protoc, 1, 2876–2890.

Hagarman, A., Measey, T. J., Mathieu, D., Schwalbe, H., and Schweitzer-Stenner, R. (2010), “Intrinsic propensities of amino acid residues in GxG peptides inferred from amide I’ band profiles and NMR scalar coupling constants,” Journal of the American Chemical Society, 132, 540.

Hao, M.-H. and Scheraga, H. A. (1998), “Theory of Two-State Cooperative Folding of Proteins,” Accounts of Chemical Research, 31, 433–440.

Harper, E. T. and Rose, G. D. (1993), “Helix stop signals in proteins and peptides: the capping box,” Biochemistry, 32, 7605–7609.

Hennessey and Johnson (1981), “Information content in the circular dichroism of proteins,” Biochemistry, 20, 1085–1094.

Hong, J., Jing, Q., and Yao, L. (2013), “The protein amide H(N) chemical shift temperature coefficient reflects thermal expansion of the N-H···O=C hydrogen bond,” J. Biomol. NMR, 55, 71–78.

Huang and Oas (1995a), “Submillisecond folding of monomeric lambda repressor.” Proceedings of the National Academy of Sciences, 92, 6878–6882.

Huang, C.-Y., Getahun, Z., Wang, T., DeGrado, W. F., and Gai, F. (2001), “Time-Resolved Infrared Study of the Helix-Coil Transition Using 13C-Labeled Helical Peptides,” Journal of the American Chemical Society, 123, 12111–12112, PMID: 11724630.

Huang, G. S. and Oas, T. G. (1995b), “Structure and stability of monomeric lambda repressor: NMR evidence for two-state folding,” Biochemistry, 34, 3884–3892.

Hung, L. and Samudrala, R. (2003), “Accurate and automated classification of protein secondary structure with PsiCSI,” Protein Science, 12, 288–295.

Husnjak, K. and Dikic, I. (2012), “Ubiquitin-binding proteins: decoders of ubiquitin-mediated cellular functions,” Annu. Rev. Biochem., 81, 291–322.

Hvidt, A. and Linderstrøm-Lang, K. (1954), “Exchange of hydrogen atoms in insulin with deuterium atoms in aqueous solutions,” Biochim. Biophys. Acta, 14, 574–575.

Hwang, Mori, Shaka, and van Zijl (1997), “Application of Phase-Modulated CLEAN Chemical EXchange Spectroscopy (CLEANEX-PM) to Detect Water–Protein Proton Exchange and Intermolecular NOEs,” Journal of the American Chemical Society, 119, 6203–6204.

Hwang, T.-L., van Zijl, P., and Mori, S. (1998), “Accurate Quantitation of Water-amide Proton Exchange Rates Using the Phase-Modulated CLEAN Chemical EXchange (CLEANEX-PM) Approach with a Fast-HSQC (FHSQC) Detection Scheme,” Journal of Biomolecular NMR, 11, 221–226.

Jackson, S. E. (1998), “How do small single-domain proteins fold?” Folding and Design, 3, R81–R91.

Jeener, Meier, Bachmann, and Ernst (1979), “Investigation of exchange processes by two-dimensional NMR spectroscopy,” The Journal of Chemical Physics, 71.

Jimenez, M. A., Munoz, V., Rico, M., and Serrano, L. (1994), “Helix stop and start signals in peptides and proteins. The capping box does not necessarily prevent helix elongation,” J. Mol. Biol., 242, 487–496.

Kannan, S., Lane, D. P., and Verma, C. S. (2016), “Long range recognition and selection in IDPs: the interactions of the C-terminus of p53,” Sci Rep, 6, 23750.

Kao, T. Y., Tsai, C. J., Lan, Y. J., and Chiang, Y. W. (2017), “The role of conformational heterogeneity in regulating the apoptotic activity of BAX protein,” Phys Chem Chem Phys, 19, 9584–9591.

Karplus, M. and Weaver, D. L. (1979), “Diffusion–collision model for protein folding,” Biopolymers, 18, 1421–1437.

Karplus, M. and Weaver, D. L. (1994), “Protein folding dynamics: the diffusion-collision model and experimental data,” Protein Sci., 3, 650–668.

Kawahara, K. and Tanford, C. (1966), “Viscosity and density of aqueous solutions of urea and guanidine hydrochloride,” J. Biol. Chem., 241, 3228–3232.

Kim and Baldwin (1984), “A helix stop signal in the isolated S-peptide of ribonuclease A,” Nature, 307, 329–334.

Kim, S. J., Matsumura, Y., Dumont, C., Kihara, H., and Gruebele, M. (2009), “Slowing down downhill folding: a three-probe study,” Biophys. J., 97, 295–302.

Kjaergaard, M., Brander, S., and Poulsen, F. (2011), “Random coil chemical shift for intrinsically disordered proteins: effects of temperature and pH,” Journal of Biomolecular NMR, 49, 139–149.

Koharudin, L. M. I., Bonvin, A. M. J. J., Kaptein, R., and Boelens, R. (2003), “Use of very long-distance NOEs in a fully deuterated protein: an approach for rapid protein fold determination,” Journal of Magnetic Resonance, 163, 228–235.

Kohn, J. E., Millett, I. S., Jacob, J., Zagrovic, B., Dillon, T. M., Cingel, N., Dothager, R. S., Seifert, S., Thiyagarajan, P., Sosnick, T. R., Hasan, M. Z., Pande, V. S., Ruczinski, I., Doniach, S., Plaxco, K. W., and Baldwin, R. L. (2004), “Random-Coil Behavior and the Dimensions of Chemically Unfolded Proteins,” Proceedings of the National Academy of Sciences of the United States of America, 101, 12491–12496.

Kubelka, J., Hofrichter, J., and Eaton, W. A. (2004), “The protein folding ‘speed limit’,” Curr. Opin. Struct. Biol., 14, 76–88.

Labudde, D., Leitner, D., Krüger, M., and Oschkinat, H. (2003), “Prediction algorithm for amino acid types with their secondary structure in proteins (PLATON) using chemical shifts,” Journal of Biomolecular NMR, 25, 41–53.

Lange, O. F., Lakomek, N. A., Fares, C., Schroder, G. F., Walter, K. F., Becker, S., Meiler, J., Grubmuller, H., Griesinger, C., and de Groot, B. L. (2008), “Recognition dynamics up to microseconds revealed from an RDC-derived ubiquitin ensemble in solution,” Science, 320, 1471–1475.

Lapidus, L. J., Yao, S., McGarrity, K. S., Hertzog, D. E., Tubman, E., and Bakajin, O. (2007), “Protein hydrophobic collapse and early folding steps observed in a microfluidic mixer,” Biophys. J., 93, 218–224.

Larios, E., Pitera, J. W., Swope, W. C., and Gruebele, M. (2006), “Correlation of early orientational ordering of engineered λ6–85 structure with kinetics and thermodynamics,” Chemical Physics, 323, 45–53, Nonequilibrium Dynamics in Biomolecules.

Li, F., Lee, J. H., Grishaev, A., Ying, J., and Bax, A. (2015), “High accuracy of Karplus equations for relating three-bond J couplings to protein backbone torsion angles,” Chemphyschem, 16, 572–578.

Lietzow, M. A., Jamin, M., Dyson, H. J., and Wright, P. E. (2002), “Mapping Long-range Contacts in a Highly Unfolded Protein,” Journal of Molecular Biology, 322, 655–662.

Lifson, S. and Roig, A. (1961), “On the Theory of Helix–Coil Transition in Polypeptides,” The Journal of Chemical Physics, 34, 1963–1974.

Lim and Sauer (1989), “Alternative packing arrangements in the hydrophobic core of lambda repressor,” Nature, 339, 31–36.

Lim and Sauer (1991), “The role of internal packing interactions in determining the structure and stability of a protein,” Journal of Molecular Biology, 219, 359–376.

Lim, W. K., Rösgen, J., and Englander, S. W. (2009), “Urea, but Not Guanidinium, Destabilizes Proteins by Forming Hydrogen Bonds to the Peptide Group,” Proceedings of the National Academy of Sciences of the United States of America, 106, 2595–2600.

Liu, F. and Gruebele, M. (2007), “Tuning λ6–85 Towards Downhill Folding at its Melting Temperature,” Journal of Molecular Biology, 370, 574–584.

Liu, F., Gao, Y. G., and Gruebele, M. (2010), “A survey of lambda repressor fragments from two-state to downhill folding,” J. Mol. Biol., 397, 789–798.

LiWang, A. C. and Bax, A. (1996), “Equilibrium Protium/Deuterium Fractionation of Backbone Amides in U13C/15N Labeled Human Ubiquitin by Triple Resonance NMR,” Journal of the American Chemical Society, 118, 12864–12865.

Luo, P. and Baldwin, R. L. (1997), “Mechanism of helix induction by trifluoroethanol: a framework for extrapolating the helix-forming properties of peptides from trifluoroethanol/water mixtures back to water,” Biochemistry, 36, 8413–8421.

Maison, W., Kennedy, R. J., Miller, J. S., and Kemp, D. S. (2001), “C-terminal helix capping propensities in a polyalanine context for amino acids bearing nonpolar aliphatic side chains,” Tetrahedron Letters, 42, 4975–4977.

Malhotra, P. and Udgaonkar, J. B. (2016), “How cooperative are protein folding and unfolding transitions?” Protein Sci., 25, 1924–1941.

Manavalan, P. and Johnson, W. C. (1987), “Variable selection method improves the prediction of protein secondary structure from circular dichroism spectra,” Anal. Biochem., 167, 76–85.

Markwick, P. R. L., Showalter, S. A., Bouvignies, G., Brüschweiler, R., and Blackledge, M. (2009), “Structural dynamics of protein backbone φ angles: extended molecular dynamics simulations versus experimental 3J scalar couplings,” Journal of Biomolecular NMR, 45, 17–21.

Marqusee and Sauer (1994), “Contributions of a hydrogen bond/salt bridge network to the stability of secondary and tertiary structure in λ repressor,” Protein Science, 3, 2217–2225.

Marsh, Singh, Jia, and Forman-Kay (2006), “Sensitivity of secondary structure propensities to sequence differences between α- and γ-synuclein: Implications for fibrillation,” Protein Science, 15, 2795–2804.

Matthews, C. R. (1993), “Pathways of Protein Folding,” Annual Review of Biochemistry, 62, 653–683, PMID: 8352599.

Mayor, U., Guydosh, N. R., Johnson, C. M., Grossmann, J. G., Sato, S., Jas, G. S., Freund, S. M., Alonso, D. O., Daggett, V., and Fersht, A. R. (2003), “The complete folding pathway of a protein from nanoseconds to microseconds,” Nature, 421, 863–867.

McCammon, J. A., Northrup, S. H., Karplus, M., and Levy, R. M. (1980), “Helix–coil transitions in a simple polypeptide model,” Biopolymers, 19, 2033–2045.

Meng, W., Luan, B., Lyle, N., Pappu, R. V., and Raleigh, D. P. (2013), “The denatured state ensemble contains significant local and long-range structure under native conditions: analysis of the N-terminal domain of ribosomal protein L9,” Biochemistry, 52, 2662–2671.

Merutka, G., Dyson, H. J., and Wright, P. E. (1995), “‘Random coil’ 1H chemical shifts obtained as a function of temperature and trifluoroethanol concentration for the peptide series GGXGG,” Journal of Biomolecular NMR, 5, 14–24.

Miao, Z. and Cao, Y. (2016), “Quantifying side-chain conformational variations in protein structure,” Sci Rep, 6, 37024.

Micsonai, A., Wien, F., Kernya, L., Lee, Y. H., Goto, Y., Refregiers, M., and Kardos, J. (2015), “Accurate secondary structure prediction and fold recognition for circular dichroism spectroscopy,” Proc. Natl. Acad. Sci. U.S.A., 112, E3095–3103.

Mishra, P. and Jha, S. K. (2017), “An Alternatively Packed Dry Molten Globule-like Intermediate in the Native State Ensemble of a Multidomain Protein,” J Phys Chem B, 121, 9336–9347.

Mittermaier, A., Kay, L., and Forman-Kay, J. (1999), “Analysis of deuterium relaxation-derived methyl axis order parameters and correlation with local structure,” Journal of Biomolecular NMR, 13, 181–185.

Mok, K. H., Kuhn, L. T., Goez, M., Day, I. J., Lin, J. C., Andersen, N. H., and Hore, P. J. (2007), “A pre-existing hydrophobic collapse in the unfolded state of an ultrafast folding protein,” Nature, 447, 106–109.

Munoz, V. and Serrano, L. (1994), “Elucidating the folding problem of helical peptides using empirical parameters,” Nat. Struct. Biol., 1, 399–409.

Munoz, V. and Serrano, L. (1995a), “Elucidating the folding problem of helical peptides using empirical parameters. II. Helix macrodipole effects and rational modification of the helical content of natural peptides,” J. Mol. Biol., 245, 275–296.

Munoz, V. and Serrano, L. (1995b), “Elucidating the folding problem of helical peptides using empirical parameters. III. Temperature and pH dependence,” J. Mol. Biol., 245, 297–308.

Myers, J. K. and Oas, T. G. (1999), “Contribution of a Buried Hydrogen Bond to λ Repressor Folding Kinetics,” Biochemistry, 38, 6761–6768.

Nishikawa, K. (2009), “Natively unfolded proteins: An overview,” Biophysics (Nagoya-shi), 5, 53–58.

Nkari, W. K. and Prestegard, J. H. (2009), “NMR Resonance Assignments of Sparsely Labeled Proteins: Amide Proton Exchange Correlations in Native and Denatured States,” Journal of the American Chemical Society, 131, 5344–5349.

Nolting, B. (1999), “Analysis of the folding pathway of chymotrypsin inhibitor by correlation of phi-values with inter-residue contacts,” J. Theor. Biol., 197, 113–121.

Ohnishi, S. and Shortle, D. (2003), “Effects of denaturants and substitutions of hydrophobic residues on backbone dynamics of denatured staphylococcal nuclease,” Protein Sci., 12, 1530–1537.

Ohnishi, S., Lee, A. L., Edgell, M. H., and Shortle, D. (2004), “Direct demonstration of structural similarity between native and denatured eglin C,” Biochemistry, 43, 4064–4070.

Ozkan, S. B., Bahar, I., and Dill, K. A. (2001), “Transition states and the meaning of Phi-values in protein folding kinetics,” Nat. Struct. Biol., 8, 765–769.

Padmanabhan, S., York, E. J., Gera, L., Stewart, J. M., and Baldwin, R. L. (1994), “Helix-forming tendencies of amino acids in short (hydroxybutyl)-L-glutamine peptides: an evaluation of the contradictory results from host-guest studies and short alanine-based peptides,” Biochemistry, 33, 8604–8609.

Padmanabhan, S., York, E. J., Stewart, J. M., and Baldwin, R. L. (1996), “Helix propensities of basic amino acids increase with the length of the side-chain,” J. Mol. Biol., 257, 726–734.

Pashley, C. L., Morgan, G. J., Kalverda, A. P., Thompson, G. S., Kleanthous, C., and Radford, S. E. (2012), “Conformational properties of the unfolded state of Im7 in nondenaturing conditions,” J. Mol. Biol., 416, 300–318.

Piovesan, D., Tabaro, F., Mieti, I., Necci, M., Quaglia, F., Oldfield, C. J., Aspromonte, M. C., Davey, N. E., Davidović, R., Dosztányi, Z., Elofsson, A., Gasparini, A., Hatos, A., Kajava, A. V., Kalmar, L., Leonardi, E., Lazar, T., Macedo-Ribeiro, S., Macossay-Castillo, M., Meszaros, A., Minervini, G., Murvai, N., Pujols, J., Roche, D. B., Salladini, E., Schad, E., Schramm, A., Szabo, B., Tantos, A., Tonello, F., Tsirigos, K. D., Veljković, N., Ventura, S., Vranken, W., Warholm, P., Uversky, V. N., Dunker, A. K., Longhi, S., Tompa, P., and Tosatto, S. C. (2017), “DisProt 7.0: a major update of the database of disordered proteins,” Nucleic Acids Research, 45, D1123–D1124.

Prigozhin, M. B., Sarkar, K., Law, D., Swope, W. C., Gruebele, M., and Pitera, J. (2011), “Reducing Lambda Repressor to the Core,” The Journal of Physical Chemistry B, 115, 2090–2096, PMID: 21319829.

Provencher, S. W. and Glockner, J. (1981), “Estimation of globular protein secondary structure from circular dichroism,” Biochemistry, 20, 33–37.

Ptashne, M. (2011), “Principles of a switch,” Nature Chemical Biology, 7, 484.

Qian and Schellman (1992), “Helix-coil theories: a comparative study for finite length polypeptides,” The Journal of Physical Chemistry, 96, 3987–3994.

Rambaran, R. N. and Serpell, L. C. (2008), “Amyloid fibrils: abnormal protein assembly,” Prion, 2, 112–117.

Redfield, C. (2004), “Using nuclear magnetic resonance spectroscopy to study molten globule states of proteins,” Methods, 34, 121–132.

Rosner, H. I. and Poulsen, F. M. (2010), “Residue-specific description of non-native transient structures in the ensemble of acid-denatured structures of the all-beta protein c-src SH3,” Biochemistry, 49, 3246–3253.

Rucker, A. L. and Creamer, T. P. (2002), “Polyproline II helical structure in protein unfolded states: lysine peptides revisited,” Protein Sci., 11, 980–985.

Salmon, L., Nodet, G., Ozenne, V., Yin, G., Jensen, M. R., Zweckstetter, M., and Blackledge, M. (2010), “NMR characterization of long-range order in intrinsically disordered proteins,” Journal of the American Chemical Society, 132, 8407.

Sarkar, S. S., Udgaonkar, J. B., and Krishnamoorthy, G. (2013), “Unfolding of a small protein proceeds via dry and wet globules and a solvated transition state,” Biophys. J., 105, 2392–2402.

Sauer, R. T., Jordan, S. R., Pabo, C. O., and C.B. Anfinsen, J. T. E. (1990), Lambda Repressor: A Model System for Understanding Protein–DNA Interactions and Protein Stability, pp. 1–61, Academic Press.

Schellman, J. A. (2003), “Protein stability in mixed solvents: a balance of contact interaction and excluded volume,” Biophys. J., 85, 108–125.

Schmidler, S. C., Lucas, J. E., and Oas, T. G. (2007), “Statistical estimation of statistical mechanical models: helix-coil theory and peptide helicity prediction,” J. Comput. Biol., 14, 1287–1310.

Scholtz, J. M. and Baldwin, R. L. (1992), “The Mechanism of alpha-Helix Formation by Peptides,” Annual Review of Biophysics and Biomolecular Structure, 21, 95–118.

Scholtz, J. M., Marqusee, S., Baldwin, R. L., York, E. J., Stewart, J. M., Santoro, M., and Bolen, D. W. (1991a), “Calorimetric determination of the enthalpy change for the alpha-helix to coil transition of an alanine peptide in water,” Proc. Natl. Acad. Sci. U.S.A., 88, 2854–2858.

Scholtz, J. M., Qian, H., York, E. J., Stewart, J. M., and Baldwin, R. L. (1991b), “Parameters of helix–coil transition theory for alanine-based peptides of varying chain lengths in water,” Biopolymers, 31, 1463–1470.

Scholtz, J. M., Barrick, D., York, E. J., Stewart, J. M., and Baldwin, R. L. (1995), “Urea unfolding of peptide helices as a model for interpreting protein unfolding,” Proceedings of the National Academy of Sciences of the United States of America, 92, 185–189.

Seale, J. W., Srinivasan, R., and Rose, G. D. (1994), “Sequence determinants of the capping box, a stabilizing motif at the N-termini of α-helices,” Protein Science, 3, 1741–1745.

Shen, Y., Delaglio, F., Cornilescu, G., and Bax, A. (2009), “TALOS+: a hybrid method for predicting protein backbone torsion angles from NMR chemical shifts,” Journal of Biomolecular NMR, 44, 213–223.

Shimizu, S. and Chan, H. S. (2002), “Origins of protein denatured state compactness and hydrophobic clustering in aqueous urea: inferences from nonpolar potentials of mean force,” Proteins, 49, 560–566.

Shoemaker, Kim, York, Stewart, and Baldwin (1987), “Tests of the helix dipole model for stabilization of alpha-helices,” Nature, 326, 563–567.

Shortle, D. and Ackerman, M. S. (2001), “Persistence of native-like topology in a denatured protein in 8 M urea,” Science, 293, 487–489.

Sickmeier, M., Hamilton, J. A., LeGall, T., Vacic, V., Cortese, M. S., Tantos, A., Szabo, B., Tompa, P., Chen, J., Uversky, V. N., Obradovic, Z., and Dunker, A. K. (2007), “DisProt: the Database of Disordered Proteins,” Nucleic Acids Res., 35, D786–793.

Smith, A. E., Sarkar, M., Young, G. B., and Pielak, G. J. (2013), “Amide proton exchange of a dynamic loop in cell extracts,” Protein Sci., 22, 1313–1319.

Smith, J. L., Hendrickson, W. A., Honzatko, R. B., and Sheriff, S. (1986), “Structural heterogeneity in protein crystals,” Biochemistry, 25, 5018–5027.

Sommese, R. F., Sivaramakrishnan, S., Baldwin, R. L., and Spudich, J. A. (2010), “Helicity of short E-R/K peptides,” Protein Sci., 19, 2001–2005.

Soulages, J. L. (1998), “Chemical Denaturation: Potential Impact of Undetected Intermediates in the Free Energy of Unfolding and m-Values Obtained from a Two-State Assumption,” Biophysical Journal, 75, 484–492.

Spudich, G. and Marqusee, S. (2000), “A Change in the Apparent m Value Reveals a Populated Intermediate under Equilibrium Conditions in Escherichia coli Ribonuclease HI,” Biochemistry, 39, 11677–11683, PMID: 10995235.

Sreerama, N. and Woody, R. W. (1993), “A self-consistent method for the analysis of protein secondary structure from circular dichroism,” Anal. Biochem., 209, 32–44.

Sreerama, N. and Woody, R. W. (2000), “Estimation of protein secondary structure from circular dichroism spectra: comparison of CONTIN, SELCON, and CDSSTR methods with an expanded reference set,” Anal. Biochem., 287, 252–260.

Sreerama, N., Venyaminov, S. Y., and Woody, R. W. (1999), “Estimation of the number of alpha-helical and beta-strand segments in proteins using circular dichroism spectroscopy,” Protein Sci., 8, 370–380.

Stayrook, Jaru-Ampornpan, Ni, Hochschild, and Lewis (2008), “Crystal structure of the λ repressor and a model for pairwise cooperative operator binding,” Nature, 452, 1022–1025.

Strehlow and Baldwin (1989), “Effect of the substitution Ala → Gly at each of five residue positions in the C-peptide helix,” Biochemistry, 28, 2130–2133.

Studier, F. W., Rosenberg, A. H., Dunn, J. J., and Dubendorff, J. W. (1990), “[6] Use of T7 RNA polymerase to direct expression of cloned genes,” Methods in Enzymology, 185, 60–89.

Stumpe, M. C. and Grubmuller, H. (2009), “Urea impedes the hydrophobic collapse of partially unfolded proteins,” Biophys. J., 96, 3744–3752.

Ternstrom, T., Mayor, U., Akke, M., and Oliveberg, M. (1999), “From snapshot to movie: phi analysis of protein folding transition states taken one step further,” Proc. Natl. Acad. Sci. U.S.A., 96, 14854–14859.

Thevenon-Emeric, G., Kozlowski, J., Zhang, Z., and Smith, D. L. (1992), “Determination of amide hydrogen exchange rates in peptides by mass spectrometry,” Analytical Chemistry, 64, 2456–2458, PMID: 1466454.

Thukral, L., Schwarze, S., Daidone, I., and Neuweiler, H. (2015), “Beta-Structure within the Denatured State of the Helical BBL,” J. Mol. Biol., 427, 3166–3176.

Tompa (2011), “Unstructural biology coming of age,” Current Opinion in Structural Biology, 21, 419–425.

Tompa, P. (2012), “Intrinsically disordered proteins: a 10-year recap,” Trends Biochem. Sci., 37, 509–516.

Tsytlonok, M. and Itzhaki, L. S. (2013), “The how’s and why’s of protein folding intermediates,” Arch. Biochem. Biophys., 531, 14–23.

Ullman, O., Fisher, C. K., and Stultz, C. M. (2011), “Explaining the structural plasticity of alpha-synuclein,” J. Am. Chem. Soc., 133, 19536–19546.

Uversky, V. N. (2008), “Amyloidogenesis of natively unfolded proteins,” Curr Alzheimer Res, 5, 260–287.

van Stokkum, I. H., Spoelder, H. J., Bloemendal, M., van Grondelle, R., and Groen, F. C. (1990), “Estimation of protein secondary structure and error analysis from circular dichroism spectra,” Anal. Biochem., 191, 110–118.

Vitalis, A. and Caflisch, A. (2012), “50 Years of Lifson–Roig Models: Application to Molecular Simulation Data.”

Wang, Y. and Jardetzky, O. (2002), “Probability-based protein secondary structure identification using combined NMR chemical-shift data,” Protein Science, 11, 852–861.

Wang, Y. and Shortle, D. (1997), “Residual helical and turn structure in the denatured state of staphylococcal nuclease: analysis of peptide fragments,” Fold Des, 2, 93–100.

Weinstock, D. S., Narayanan, C., Baum, J., and Levy, R. M. (2008), “Correlation between 13Calpha chemical shifts and helix content of peptide ensembles,” Protein Sci., 17, 950–954.

Wijesinha-Bettoni, R., Dobson, C. M., and Redfield, C. (2001), “Comparison of the denaturant-induced unfolding of the bovine and human alpha-lactalbumin molten globules,” J. Mol. Biol., 312, 261–273.

Wirmer, J. and Schwalbe, H. (2002), “Angular dependence of 1J(Ni,Cαi) and 2J(Ni,Cα(i−1)) coupling constants measured in J-modulated HSQCs,” Journal of Biomolecular NMR, 23, 47–55.

Wishart, D. S., Sykes, B. D., and Richards, F. M. (1992), “The chemical shift index: a fast and simple method for the assignment of protein secondary structure through NMR spectroscopy,” Biochemistry, 31, 1647–1651, PMID: 1737021.

Wright, P. E. and Dyson, H. J. (2015), “Intrinsically disordered proteins in cellular signalling and regulation,” Nat. Rev. Mol. Cell Biol., 16, 18–29.

Yang, J., Zhao, K., Gong, Y., Vologodskii, A., and Kallenbach, N. R. (1998), “α-Helix Nucleation Constant in Copolypeptides of Alanine and Ornithine or Lysine,” Journal of the American Chemical Society, 120, 10646–10652.

Yang, W. Y. and Gruebele, M. (2004), “Folding lambda-repressor at its speed limit,” Biophys. J., 87, 596–608.

Yansura, D. G. (1990), “[14] Expression as trpE fusion,” Methods in Enzymology, 185, 161–166.

Yi, Q., Scalley-Kim, M. L., Alm, E. J., and Baker, D. (2000), “NMR characterization of residual structure in the denatured state of protein L,” J. Mol. Biol., 299, 1341–1351.

Zagrovic, B., Snow, C. D., Khaliq, S., Shirts, M. R., and Pande, V. S. (2002), “Native-like mean structure in the unfolded ensemble of small proteins,” J. Mol. Biol., 323, 153–164.

Zhang, H., Neal, S., and Wishart, D. S. (2003), “RefDB: A database of uniformly referenced protein chemical shifts,” Journal of Biomolecular NMR, 25, 173–195.

Zhang, O., Kay, L. E., Shortle, D., and Forman-Kay, J. D. (1997), “Comprehensive NOE characterization of a partially folded large fragment of staphylococcal nuclease Δ131Δ, using NMR methods with improved resolution,” Journal of Molecular Biology, 272, 9–20.

Zimm and Bragg (1959), “Theory of the Phase Transition between Helix and Random Coil in Polypeptide Chains,” The Journal of Chemical Physics, 31.

Zwanzig, R. (1997), “Two-state models of protein folding kinetics,” Proceedings of the National Academy of Sciences, 94, 148–150.

Biography

Kan “Jonathan” Li was born in 1988 in Tianjin, China. He attended Yaohua High School and then Nankai University in China, where he majored in Biotechnology and obtained his B.S. degree. During his undergraduate education, he gained experience in protein expression and purification by conducting research in Zihe Rao’s lab.

He joined the Department of Biochemistry at Duke University for his graduate education. His Ph.D. dissertation projects have focused on the biophysical characterization of the λ-repressor protein in the unfolded state. He also collaborated with the Pratt School of Engineering at Duke University on a project involving the structural characterization of ELP-fusion protein (ELP stands for elastin-like polypeptide).
