<<

CONFORMATIONAL TRANSITION MECHANISMS OF FLEXIBLE PROTEINS

A dissertation submitted to Kent State University in partial fulfillment of the requirements for the degree of Doctor of Philosophy

by

Swarnendu Tripathi

December 2010

Dissertation written by

Swarnendu Tripathi

B.Sc., St. Xavier’s College, University of Calcutta, 2000

M.Sc., University of Pune, 2002

M.Tech., Indian Institute of Technology Bombay, 2004

M.A., Kent State University, 2006

Ph.D., Kent State University, 2010

Approved by

Dr. John J. Portman, Chair, Doctoral Dissertation Committee

Dr. Robin L. B. Selinger, Members, Doctoral Dissertation Committee

Dr. Almut Schroeder

Dr. Hamza Balci

Dr. Arne Gericke

Accepted by

Dr. James T. Gleeson, Chair, Department of Physics

Dr. Timothy Moerland, Dean, College of Arts and Sciences

TABLE OF CONTENTS

LIST OF FIGURES ...... viii

ACKNOWLEDGMENTS ...... xxviii

Dedication ...... xxxi

1 INTRODUCTION ...... 1

1.1 Overview ...... 1

1.2 Protein Structure ...... 2

1.3 Theory of Protein Folding from Energy Landscape Perspective ...... 5

1.3.1 The funnel landscape ...... 6

1.4 Remodeled Energy Landscapes of Functional Transitions in Proteins ...... 7

1.5 Protein Motions ...... 8

1.5.1 Fast timescale (small amplitude) motions ...... 10

1.5.2 Slow timescale (large amplitude) motions ...... 11

1.6 Proteins Studied in this Dissertation ...... 13

1.6.1 Calmodulin ...... 13

1.6.2 Nitrogen regulatory protein C (NtrC) ...... 18

1.7 Organization of Dissertation ...... 21

2 BACKGROUND INFORMATION ON COARSE-GRAINED MODELING OF PROTEIN CONFORMATIONAL TRANSITIONS ...... 24

2.1 Introduction ...... 24

2.2 Elastic Network Models for a Single Minimum ...... 26

2.3 Elastic Network Models for Two Minima ...... 30

2.4 Coarse-grained MD Models for Conformational Transitions ...... 34

2.4.1 Takada-Onuchic-Wolynes model ...... 34

2.4.2 Best-Hummer model ...... 36

3 MODEL AND METHODS ...... 39

3.1 Variational Model of Conformational Transitions ...... 40

3.1.1 Hamiltonian of the protein system ...... 40

3.1.2 Reference Hamiltonian for two known structures ...... 42

3.1.3 Approximating the free energy surface ...... 46

3.1.4 Modeling the energy of conformational transition ...... 49

3.1.5 Analysis of conformational transition route ...... 50

3.1.6 Order parameters ...... 51

3.2 Variational Model of Folding ...... 54

3.2.1 Analysis of folding route ...... 55

3.2.2 Order parameters ...... 57

3.3 Model Parameters ...... 58

4 THE OPEN/CLOSED CONFORMATIONAL TRANSITION OF THE N-TERMINAL DOMAIN OF CALMODULIN ...... 60

4.1 Introduction ...... 60

4.2 Methods ...... 64

4.3 Conformational Flexibility and Binding ...... 65

4.3.1 Binding loops ...... 66

4.3.2 Helices B and C and the B/C linker ...... 69

4.4 Conformational Transition Mechanism ...... 70

4.4.1 Binding loops I and II ...... 72

4.4.2 Methionine residues ...... 75

4.5 Conformational Transition Rate and Order Parameters ...... 77

4.6 Conclusion ...... 79

5 CONFORMATIONAL TRANSITION MECHANISMS OF THE EF-HANDS OF CALMODULIN ...... 81

5.1 Introduction ...... 81

5.2 Methods ...... 87

5.3 N-Terminal and C-Terminal Domains of CaM ...... 88

5.3.1 Conformational flexibility of the CaM domains ...... 88

5.3.2 Cracking in the conformational transition of cCaM ...... 90

5.3.3 Open/closed transition mechanism of the CaM domains ...... 92

5.4 CaM2/3 Fragment ...... 93

5.4.1 Conformational flexibility and cracking of CaM2/3 ...... 93

5.4.2 Conformational transition mechanism of CaM2/3 ...... 97

5.5 Discussion ...... 97

6 INTERPLAY AMONG TOPOLOGY, PLASTICITY AND ENERGETICS IN THE FUNCTIONAL TRANSITIONS OF THE CALMODULIN DOMAINS ...... 102

6.1 Introduction ...... 102

6.2 Strain Energy Analysis ...... 104

6.3 Analysis of Conformational Flexibility and Cracking ...... 110

6.4 Relationship between Free Energy Barrier and Inherent Flexibility ...... 115

6.5 Conclusions ...... 117

7 CONFORMATIONAL TRANSITION AND FOLDING MECHANISMS OF THE N-TERMINAL RECEIVER DOMAIN OF PROTEIN NTRC ...... 118

7.1 Introduction ...... 118

7.2 Methods ...... 121

7.3 Folding Mechanism of NtrCr ...... 122

7.3.1 Folding of the inactive-NtrCr ...... 122

7.3.2 Folding of the active-NtrCr ...... 125

7.4 Conformational Transition of NtrCr ...... 127

7.4.1 Inactive to active state transition routes of NtrCr ...... 127

7.4.2 Conformational flexibility of NtrCr ...... 130

7.4.3 Inactive to active transition mechanism of NtrCr ...... 132

7.4.4 Strain energy analysis of inactive NtrCr ...... 135

7.4.5 Comparison with experimentally predicted transition mechanism ...... 139

7.5 Discussion ...... 142

7.6 Conclusions ...... 143

8 CONCLUSIONS ...... 145

9 APPENDIX ...... 148

9.1 The Stiff Chain Model for Polypeptide Backbone ...... 148

9.2 Variational Free Energy Approximation ...... 150

BIBLIOGRAPHY ...... 151

LIST OF FIGURES

1.1 Schematic illustration of different levels of protein structures. Primary structure is a chain of amino acid sequence (shown in three letter code) that contains all the information needed to specify the protein's structure (a). Secondary structures such as α-helix and β-sheet are formed from the regular repeating patterns of backbone hydrogen bonds (b). The way secondary structural elements pack together to form the overall three-dimensional fold of the protein is tertiary structure (P13 protein) (c). The relative arrangement of two or more individual polypeptide chains is called quaternary structure (hemoglobin) (d). Adapted from Wikimedia [1] ...... 4

1.2 The funnel-shape free energy landscape for protein folding. Folding occurs through the progressive organization of ensembles of structures [example is src-SH3 domain (left)] on a free energy landscape. Conformational entropy loss during folding is compensated by the free energy gained as more native interactions are formed. Adapted from Brooks et al. [2] ...... 7

1.3 Remodeled energy landscape to accommodate protein functional dynamics. (A) Schematic illustration of the free energy landscape (red: higher energy, blue: lower energy). The box region encloses conformational states that are energetically available for interconversion under physiological conditions. (B) A signal, such as a ligand, can remodel the energy landscape by narrowing down the number of ensemble states in a single energy well through structural rigidification of the average conformation. (C) Alternatively, a protein may already exist in equilibrium between conformationally distinct states and a ligand can alter the relative energies or population of the states, resulting in redistribution of their occupancies. (D) A slight variation on (C) may occur if the population of the higher-energy state shifts toward a ligand-induced conformation in the absence of ligand. Adapted from Smock and Gierasch [3] ...... 9

1.4 Types of protein motions on different timescales and the experimental methods to characterize fluctuations on each timescale. Adapted from Henzler-Wildman and Kern [4] ...... 11

1.5 Function of calcium-binding EF-hand protein calmodulin. (a) Calcium binding occurs to the EF-hand subdomains of calmodulin, which favors binding to a target molecule such as myosin light chain kinase. Binding of the target to one site on calmodulin enhances binding affinity to a second binding site [5]. Adapted from Smock and Gierasch [3]. (b) The EF-hand helix-loop-helix motif. The index finger represents helix-E (shown in yellow), the thumb is helix-F (shown in blue), and the rest of the fingers are the binding loop (shown in red, with calcium in green). Adapted from Ref. [6] ...... 14

1.6 Calcium induced conformational change in calmodulin. The EF-hands of the two domains of unbound-calmodulin [PDB: 1cfd, (left)] undergo large structural rearrangement upon calcium binding and expose a hydrophobic surface on each domain to bind target proteins. The calcium ions are shown as white spheres in bound-calmodulin [PDB: 1cll, (right)]. These three-dimensional protein figures and others in this dissertation are made using the Visual Molecular Dynamics (VMD) program [7] ...... 17

1.7 The transcriptional activation mechanism by NtrC. (a) The glnA gene is transcribed by the σ54-containing polymerase which alone cannot initiate transcription. The unphosphorylated NtrC dimers can bind only one site at the enhancer, still insufficient to stimulate transcription. (b) The phosphorylated NtrC dimers can bind both sites of the enhancer. (c) Their binding induces DNA looping. Contact between the activator and the polymerase stabilizes the interaction between the polymerase and DNA, thereby initiating transcription. Adapted from web-book [8] ...... 19

1.8 Phosphorylation induces large conformational change in the N-terminal receiver domain of NtrC at the active site Asp54 (near the C-terminal end of β3). The unphosphorylated (inactive) structure is shown in blue and the phosphorylated (active) structure in red. ...... 20

2.1 The elastic network model. The LAO binding protein shown in ribbon representation (top). The elastic network model for the LAO binding protein (bottom). Pairs of atoms < 8 Å apart are connected by harmonic springs. Adapted from Tama and Sanejouand [9] ...... 28

2.2 Schematic view of energy profiles of protein conformational dynamics from elastic network approach. (a) Conventional elastic network model only captures the conformational dynamics within the single basin of the open-state or closed-state structures (solid curves) due to the harmonic approximation of the energy. The actual energy profile of closed/open conformational transition is depicted in dotted curve. (b) Large strain energy barrier results from the linear elastic network model during the conformational change due to high stress (solid curves). This barrier height is reduced through cracking (local unfolding and refolding) to relieve high elastic stress (dotted curves), as shown by Miyashita et al. [10] using a nonlinear elastic network approach. ...... 31

2.3 An illustration of multiple-basin energy landscape of proteins. Two funnel-shape single basins that are used for model construction are displayed by dashed lines. Below, schematics of two protein configurations (unbound and ligand-bound) are shown. Conformational change occurs with the rearrangement of some contacts. Contacts specific to the unbound conformation are broken, and new contacts are formed in bound conformation. Thick solid bonds correspond to covalent linkages. Adapted from Okazaki et al. [11] ...... 35

3.1 A schematic of the variational method for the Hamiltonian of a protein system for a single conformation. The residues are represented as beads using the backbone alpha-carbon positions of a protein. The total Hamiltonian of the protein model consists of the Hamiltonian for the backbone chain and contact interactions (a). We replace the interaction Hamiltonian with a solvable Hamiltonian expressed as a quadratic function of the positions of the monomers (b). The coefficients of the quadratic function determine how localized each residue is around its mean position. This figure is inspired by Shen et al. [12] ...... 42

3.2 A schematic of the free energy profile of conformational transitions between two native (folded) sub-states 1 and 2. Conformations 1 and 2 are separated by a barrier corresponding to the transition state (TS) barrier for some value of the reaction coordinate. The residues of the protein systems are represented by the positions of the alpha-carbon as beads. ...... 44

3.3 An illustration of the free energy surface of a two-state folder at folding temperature. The native (N) (folded) state and globule (G) (unfolded) state are separated by a barrier corresponding to the transition state barrier for some value of the reaction coordinate. The main goal is to characterize the protein structure at the TSE. ...... 56

3.4 The illustration of potential well used in the variational model. See the text for more detail. ...... 59

4.1 The N-terminal domain of calmodulin (nCaM). (a) The Ca2+-free (apo, closed) structure, PDB code 1cfd. (b) The Ca2+-bound (holo, open) structure, PDB code 1cll. (c) The secondary structure of nCaM is shown with one letter amino acid sequence code for residues 4–75. The secondary structure of nCaM is as follows: helix A (5–19), Ca2+-binding loop I (20–31), helix B (29–37), B/C helix-linker (38–44), helix C (45–55), Ca2+-binding loop II (56–67), helix D (65–75). Note that the last three residues of the binding loops I and II are also part of the exiting helices B and D. There are short β-sheet structures in binding loop I (residues 26–28) and loop II (residues 62–64). ...... 62

4.2 Fluctuations $B_i(\alpha_0) = \langle \delta r_i^2 \rangle_0 = G_{ii} a^2$ vs sequence index of nCaM for selected values of the interpolation parameter α0 in the conformational transition route between open and closed. Here a = 3.8 Å is the distance between successive monomers. Different α0 are denoted by, red (α0 = 0) open; green (α0 = 0.2); blue (α0 = 0.4); magenta (α0 = 0.6); (α0 = 0.8) and black (α0 = 1) closed. The secondary structure is indicated below the plot. ...... 66

4.3 Change in fluctuations Bi(α0) in nCaM domain during the closed to open conformational transition. The 3D structure in (a) corresponds to the interpolation parameter, α0 = 1 (closed state); (b) corresponds to α0 = 0.4 (intermediate state) and (c) corresponds to α0 = 0 (open state). Red corresponds to low fluctuations and blue corresponds to high. ...... 67

4.4 Difference between the normalized native density ∆ρi (a measure of structural similarity) of each residue for different α0. The change in color from red to blue is showing the closed → open conformational transition of nCaM. This is normalized to be −1 at the open state minimum (α0 = 0; blue) and 1 at the closed state minimum (α0 = 1; red). Below the secondary structure of nCaM is shown. Here, αN in Eq. 3.32 and Eq. 3.33 is 0.5. ...... 71

4.5 Closed to open conformational transition of nCaM with different interpolation parameter α0. The 3D structure in (a) corresponds to the interpolation parameter, α0 = 0.8; (b) corresponds to α0 = 0.6; (c) corresponds to α0 = 0.4 and (d) corresponds to α0 = 0.2. The change in color from red to blue corresponds to different values of normalized native density ∆ρi(α0) (a measure of structural similarity) of each residue for different α0. Red corresponds to ∆ρi(α0) = 1 (closed conformation, α0 = 1) and blue (open conformation, α0 = 0) corresponds to ∆ρi(α0) = −1. ...... 73

4.6 Comparison of structural change in binding loops I (in bottom) and II (in top) in terms of the order parameter ∆ρi(α0). The 3D structures in (a)-(i) correspond to the interpolation parameter, α0 = 0.9 to 0.1 during the closed to open transition. The change in color from red to blue corresponds to different values of ∆ρi(α0) (a measure of structural similarity) of each residue. Red corresponds to ∆ρi(α0) = 1 (closed conformation, α0 = 1) and blue (open conformation, α0 = 0) corresponds to ∆ρi(α0) = −1. ...... 74

4.7 Dynamical behavior of residues during conformational transition of nCaM. The normalized native density difference ∆ρi(α0) vs α0 is shown for four different groups of residues. Structural transition of (a) residues in position 9 (Thr28 and Asn64) and position 12 (Glu31 and Glu67) of the two binding loops; (b) four hydrophobic Methionine residues in positions 36, 51, 71 and 72. ...... 76

4.8 Free energy along the transition route. In the lower curve the abscissa is the interpolation parameter α0. In the upper curve the abscissa is the global structural order parameter ∆Q(α0). The entropy across the transition is relatively constant, so that the free energy barrier is largely energetic. ...... 78

5.1 Three dimensional structures of calmodulin (CaM) domains and EF-hands 2 and 3 fragment. The apo-CaM and holo-CaM structures shown here correspond to human CaM with PDB ID code 1cfd and 1cll, respectively. (a) The closed, apo and open, holo conformations of N-terminal domains of CaM (nCaM) consist of helices A/B and C/D with binding loops I and II respectively. (b) The closed, apo and open, holo conformations of C-terminal domains of CaM (cCaM) consist of helices E/F and G/H with binding loops III and IV respectively. (c) The apo and holo conformations of EF-hands 2 and 3 of CaM (CaM2/3) consist of helices C/D and E/F with binding loops II and III, respectively. ...... 84

5.2 The fluctuations Bi(α0) vs residue index for selected values of the interpolation parameter α0 in the conformational transition route between apo (α0 = 1) and holo (α0 = 0) structures. Here, a = 3.8 Å is the distance between successive monomers in our model. Parts (a) and (c) correspond to nCaM and cCaM, respectively. The secondary structures for nCaM and cCaM are indicated above each plot. Helices are represented by the rectangular boxes, binding loops and helix-linker are by lines and small β-sheets are by arrows. Parts (b) and (d) show the difference between the normalized native density ∆ρi(α0) (a measure of structural similarity) vs residue index for different α0 of nCaM and cCaM, respectively. In each plot the change in color from red to blue is showing the apo to holo transition for each conformation. This is normalized to be −1 at the holo state minimum (α0 = 0; blue) and 1 at the apo state minimum (α0 = 1; red). ...... 89

5.3 Change in fluctuations Bi(α0) of the residues from binding loops and helix-linker of CaM domains along the open/closed conformational transition route for different α0. Here, a = 3.8 Å is the distance between successive monomers in our model. Residues Gly23, Gly59 and Glu45 are from binding loop I, loop II and the helix-linker of the nCaM, respectively, while residues Gly96, Gly132 and Asp118 are from binding loop III, loop IV and the helix-linker of the cCaM, respectively. ...... 91

5.4 (a) Average pair potentials $\bar{u}_i$ (in kcal/mol) of the residues from the helix-linker of CaM domains along the open/closed conformational transition route for different α0. Residue Glu45 and Asp118 are from the nCaM and cCaM, respectively. (b) $\bar{u}_i$ of the residue Asp80 from the central linker of CaM along the apo to holo conformational transition route for different α0. ...... 92

5.5 Conformational transition of CaM2/3 fragment. The fluctuations Bi(α0) vs residue index for selected values of the interpolation parameter α0 in the conformational transition route between apo- and holo-CaM2/3 (a). Here, a = 3.8 Å is the distance between successive monomers in our model. The secondary structure of CaM2/3 is indicated above the plot. Helices are represented by the rectangular boxes, binding loops and helix-linker are by lines and small β-sheets are by arrows. The difference between the normalized native density ∆ρi(α0) (a measure of structural similarity) vs residue index for different α0 of CaM2/3 (b). The change in color from red to blue is showing the apo to holo transition. This is normalized to be −1 at the holo state minimum (α0 = 0; blue) and 1 at the apo state minimum (α0 = 1; red). ...... 95

6.1 Two topologically similar domains of CaM. (a) The N-terminal domain of CaM (nCaM) and (b) the C-terminal domain of CaM (cCaM). The closed (apo)-state structures of the two domains are represented in blue and the open (holo)-state structures are shown in red. The measured Cα RMSD between the two apo-CaM domains is 4.5309 Å and for the two holo-CaM domains is 3.9384 Å, whereas the Cα RMSD between apo/holo-nCaM is 6.5039 Å and for apo/holo-cCaM is 8.4157 Å. The Cα atoms are shown by beads in the structures. ...... 103

6.2 Total strain energy of the CaM domains for the linearly interpolated apo/holo structural change. The solid curves show the change in total strain energy of nCaM domain and the dotted curves represent the change in total strain energy of cCaM domain. ...... 106

6.3 Distributions of residue strain energy for the linearly interpolated CaM domain structures. (a) and (b) are the change of the strain energy for individual residues along the apo → holo-state structural change of nCaM and cCaM, respectively. Residues in blue have no strain energy while red residues have high strain energy. (c) and (d) are the residue strain energy distributions for the deformed structure at an intermediate state, α0 = 0.4 for nCaM and cCaM, respectively. The secondary structures of nCaM and cCaM are indicated above the plots. ...... 107

6.4 Contact map of the two CaM domains, (a) for nCaM and (b) for cCaM. In each plot the contacts are separated in three sets, common contacts (), contacts only from the apo-state ( ) and contacts only from the holo-state (△) of each CaM domain. The secondary structures of nCaM and cCaM are indicated in each plot. Helices are represented by the rectangular boxes, binding loops and helix-linker are indicated by lines and small β-sheets are shown by arrows. In nCaM, the helices are denoted as (A-D) and the binding loops are by I and II, while in cCaM the helices are represented as (E-H) and the binding loops are by III and IV. Number of different contacts of the CaM domains are indicated inside each plot. ...... 109

6.5 Mean square fluctuations, $B_i = \langle \delta r_i^2 \rangle_0$, of the apo-CaM domains. The solid and dotted curves indicate the Bi of the apo-state nCaM and cCaM, respectively. Secondary structures of the nCaM and cCaM are shown at the top and bottom part of the plot, respectively. ...... 111

6.6 Ratio of fluctuations $B_i^r(\{C\}, \alpha_0)$ of the CaM domains for apo → holo conformational change. (a) and (b) are showing $B_i^r(\alpha_0)$ of each residue of nCaM and cCaM along the transition route, respectively. In (c) and (d) the linearly interpolated 3D structures of nCaM and cCaM are colored according to $B_i^r(\{C\}, \alpha_0)$ of each residue at an intermediate state α0 = 0.4 during the apo → holo transition. Cα atoms of the CaM domains are represented as beads. In all figures residues in red have higher flexibility while the residues in blue have lower flexibility with respect to the apo state of the corresponding domains during the conformational change. ...... 113

6.7 Free energy barriers of the CaM domains during the apo/holo conformational change. In (a) and (b) the free energy barriers $F(\{C\}, \alpha_0)$, along the apo → holo-state conformational change is shown by the dotted black curves, when the variational parameters $\{C_i\}$, for the residues are allowed to vary from the apo to holo-state for both domains. The free energy profiles $F(\{C^{N_{\mathrm{apo}}}\}, \alpha_0)$, denoted by the solid black curves in (a) and (b) indicate that $\{C_i\}$ of the residues of the CaM domains are fixed to their corresponding $\{C_i^{N_{\mathrm{apo}}}\}$ of the apo-state, during their apo → holo-state conformational change. The solid gray curves in (a) and (b) indicate the free energy profiles $F(\{C^{N_{\mathrm{holo}}}\}, \alpha_0)$, such that, the $\{C_i\}$ for the residues of the CaM domains are fixed to their corresponding $\{C_i^{N_{\mathrm{holo}}}\}$ of the holo-state, during their holo → apo-state conformational change. The horizontal and vertical dotted lines are drawn for estimating the free energy barrier height from the plots. ...... 116

7.1 The inactive (I) and active (A) conformations of NtrCr. (a) The I (PDB ID code 1ntr) and the A (BeF3⁻-activated, PDB ID code 1krw) conformations of NtrCr are shown by representing the secondary structure with different colors. (b) Cartoon of the secondary structure of NtrCr, where the α-helices and β-sheets are represented by rectangles and arrows respectively, along with the residue number. (c) Deviations between the two native state structures of NtrCr (the I and the A states) measured directly from the corresponding PDBs, where $|\Delta r_i^{N}| = |r_i^{N_I} - r_i^{N_A}|$ and a = 3.8 Å is the distance between adjacent monomers in the variational model. ...... 119

7.2 Folding mechanism of the inactive NtrCr. (a) Mean square fluctuations vs residue index, $B_i = G_{ii} a^2$ where a is the distance between successive monomers, plotted for different values of the order parameter Q(I). Fluctuations of the transition states (TS, dotted), minima (Min, solid) and native (N, solid) state are shown. Globule (G) state is not shown. (b) Folding route is characterized locally by the normalized native density $\rho_i(I) = [\rho_i(I) - \rho_i^{G}(I)] / [\rho_i^{N}(I) - \rho_i^{G}(I)]$ and the global order parameter $Q(I) = (1/N) \sum_i \rho_i(I)$, where I denotes the inactive state. The degree of structural localization of each residue is reflected in the colors, linearly scaled between blue [ρi(I) = 0 for G] and red [ρi(I) = 1 for N]. (c) Free energy profile along the folding route vs global order parameter Q(I). ...... 124

7.3 Folding mechanism of the active NtrCr. (a) Mean square fluctuations vs residue index, $B_i = G_{ii} a^2$ where a is the distance between successive monomers, plotted for different values of the order parameter Q(A). (b) Folding route is characterized locally by the normalized native density $\rho_i(A) = [\rho_i(A) - \rho_i^{G}(A)] / [\rho_i^{N}(A) - \rho_i^{G}(A)]$ and the global order parameter $Q(A) = (1/N) \sum_i \rho_i(A)$, where A denotes the active state. The degree of structural localization of each residue is reflected in the colors, linearly scaled between blue [ρi(A) = 0 for G] and red [ρi(A) = 1 for N]. (c) Free energy profile along the folding route vs global order parameter Q(A). ...... 126

7.4 Energy and free energy paths for conformational transition of NtrCr. The single basin energies EI and EA are for the inactive (black curve) and active (gray curve) state, respectively in (a). The dotted curve E is the energy path of the inactive (α0 = 1) → active (α0 = 0) state transition. The free energy path of the inactive (α0 = 1) → active (α0 = 0) state transition of NtrCr, (b). ...... 129

7.5 Change in conformational flexibility of NtrCr along the inactive → active conformational transition route. Mean square fluctuations Bi(α0) vs residue index for selected values of the interpolation parameter α0 are shown in (a). Ratio of mean square fluctuations, $B_i^r(\alpha_0) = B_i(\alpha_0)/B_i^I$, (b) where Bi(α0) is calculated for different α0 and $B_i^I$ is fluctuations of the inactive state at α0 = 1. The dotted curves in (a) and (b) enclose the regions from β4-α4 loop, helix-α4, α4-β5 loop and β3-α3 loop of NtrCr. ...... 131

7.6 Change in mean square fluctuations of selected residues of NtrCr along the conformational transition route. (a) Plot of fluctuations Bi(α0) vs interpolation parameter α0 of residues from β3α3 loop for inactive (I) → active (A) transition. (b) Similar plot of Bi(α0) vs α0 of residues from α4. ...... 134

7.7 Contact maps (sequence networks) of the inactive and active conformation of NtrCr. The contacts represented in squares are for the inactive state and the contacts in triangles are for the active state. The set of contacts inside the circles are for β3α3 loop and helix-α4. Note that for α4 and β3α3 loop there are more contacts in the active state than the inactive state. ...... 136

7.8 Residue strain energy distribution for linearly interpolated NtrCr structures. Strain energy of the inactive NtrCr vs residue index during the deformation from the inactive → active state conformation for different values of the interpolation parameter α0, (a). Strain energy of helix-α4 is enclosed inside the dotted curves in (a). Residue strain energy distribution at an intermediate state, α0 = 0.35, (b). ...... 138

7.9 Comparison of the experimentally measured backbone conformational dynamics of NtrCr with parameters calculated from the variational model. (a) Exchange parameter Rex vs residue index from NMR relaxation study [13]. Adapted from [14]. The blue dots indicate Rex data for residues with values larger than a threshold in the NMR experiments [13]. (b) Difference in the mean square fluctuations ∆Bi between the inactive and active conformations of each residue. (c) Difference in the energy per residue $\Delta E_i^A$ of the active conformation for two end values of the interpolation parameter α0. (d) Native density difference of the active conformation $\Delta\rho_i^A$ for two end values of α0. ...... 140

ACKNOWLEDGMENTS

I owe my gratitude to all those great people because of whom my graduate experience at Kent State University has been one that I will cherish forever. The following dissertation would not have been possible to produce without their significant contributions.

It gives me great pleasure to express my deepest gratitude to my esteemed advisor, Dr. John J. Portman. I have been amazingly fortunate to be introduced to this wonderful field of biophysics by him. He always gave me the freedom to explore independently, and at the same time guided me to recover from difficulties. Dr. Portman taught me how to write academic papers, made me a better programmer, had confidence in me when I doubted myself, and brought out the good ideas in me. He showed me different ways to approach a research problem and the need to be persistent to accomplish any goal. Not only was he readily available to me, but he also read and responded to the drafts of each chapter of my work more quickly than I could have hoped. His oral and written comments are always extremely perceptive, helpful, and appropriate.

Besides my advisor, I would like to acknowledge the rest of my dissertation committee: Dr. Robin L. B. Selinger, Dr. Almut Schroeder, Dr. Hamza Balci and Dr. Arne Gericke.

I am grateful to Dr. James T. Gleeson for giving me an opportunity to work with him in the Summer of 2005. A special thanks to Dr. Hamza Balci for including me in their group meetings, which I really enjoyed and in which I learned new things in biophysics. I would again like to thank Dr. Robin Selinger (Liquid Crystal Institute) for offering the Computational Materials Science course that really helped me to get the confidence I needed to learn programming. She also gave me freedom to choose the topic of the project in that course according to my research interest.

I am also indebted to the other faculty members of the Physics Department with whom I have interacted during the course of my graduate studies. Particularly, I would like to acknowledge Dr. Satyendra Kumar who encouraged me to apply to the Physics Department and provided enormous support as a graduate coordinator. I am grateful to Dr. Almut Schroeder for being my academic advisor and for all her suggestions and advice during my graduate studies.

My special thanks go to Cindy Miller, Loretta Hauser and all other office staff in the Physics Department. I would also like to acknowledge the help of Greg Putman (Academic Laboratory Manager) for a nice experience of the Teaching Assistantship during my graduate study.

I would like to acknowledge the help of my colleague Dr. Xianghong Qi in overcoming many problems that I faced, particularly during the initial stage of my research. She always provided a nice environment in the office during her stays.

I am thankful to my friends and colleagues Drs. Mandar Bhagwat, Vishal Pandya, Basab B. Dhar and Somik Chatterjee. Their support and care helped me overcome setbacks and stay focused on my graduate study. I greatly value their friendship and I deeply appreciate their belief in me. I am really grateful to my friend Dr. Panchapakesan Ganesh for all his help and support.

Last, but not least, my heartiest thanks to my family: my parents, Amalendu Tripathi and Chhaya Tripathi, for giving me life in the first place, for educating me, and for their unconditional support and encouragement to pursue my interests. At the same time, it is my pleasure to thank my brothers and sister for their support. Most importantly, none of this would have been possible without my wife, Archita Tripathi. She has been a constant source of love, concern, support and strength in all these years.

I gratefully acknowledge the financial support from the Ohio Board of Regents, the National Science Foundation and the Institute for Complex Adaptive Matter.

Finally, despite all the assistance provided by Dr. Portman and others, I alone remain responsible for the content of the following, including any errors or omissions which may unwittingly remain.

This dissertation is dedicated

To

My parents

&

My wife

CHAPTER 1

INTRODUCTION

1.1 Overview

Protein molecules, the most versatile macromolecules in living cells, serve crucial

functions in essentially all biological events. Their primary biochemical function is

enzymatic catalysis. In addition, regulatory proteins control gene expression, and

receptor proteins accept intercellular signals (“hormones”) that are often transmitted

by other proteins. Antibody proteins protect organisms from harmful antigens. Structural proteins form microfilaments, microtubules, fibrils and other protective coverings;

they reinforce membranes and maintain the structure of cells and tissues. Transport

proteins move molecules from one place to another around the living organisms, and

storage proteins store molecules. Proteins are also responsible for mechano-chemical

activities, such as and movement.

The basis for the enormous variety of protein functions is their ability to reversibly

bind other molecules (“ligands”) with high specificity and sensitivity to environmen-

tal conditions. These specific molecular interactions with ligands are crucial in main-

taining the high degree of order required for a living system to adaptively respond to

changing environmental and metabolic circumstances. The static three-dimensional

structure of a folded protein molecule, provided by X-ray crystallography for exam-

ple, often gives molecular level insight to its specific biological functions: structure


determines function. At the same time protein molecules are flexible and dynamic.

In fact, the key to understanding a protein’s function often lies in its conformational

dynamics [15, 16]. An important example is allostery, the coupling between ligand-

binding and protein conformational change, which is a central component of protein

.

The primary focus of my research in this dissertation is to elucidate the detailed

mechanism of large scale structural changes of specific proteins where conformational

flexibility is essential for function.

In this chapter, I first give a brief description of the different levels used to characterize protein structure, and discuss the modern view of protein folding (the “energy landscape” concept). Next, I discuss the conformational transitions of proteins from the energy landscape perspective and emphasize the functional motions of proteins on different timescales. Finally, I summarize the structural changes of two specific

flexible proteins which are the focus of the work in this dissertation.

1.2 Protein Structure

Proteins are polymers of amino acids covalently linked through peptide bonds into a chain; this was first shown by E. Fischer at the beginning of the 20th century. In

the early 1950s Sanger discovered that the sequence of amino acid residues1 is unique

for each protein. The polypeptide chain of a protein consists of a chemically regular

backbone (“main chain”) from which various side chains of residues project.

1 A “residue” is the portion of an amino acid that remains after polymerization.

Remarkably, a protein folds into a unique three-dimensional “native” structure determined by its sequence of amino acids. There are four hierarchically organized levels of protein structure, called the “primary structure”, “secondary structure”, “tertiary structure”, and “quaternary structure”. The higher-level structures are called “motifs” and “domains”.

The primary structure refers to the sequence of amino acid residues in the polypep-

tide chain from the N-terminus to the C-terminus [see Fig. 1.1(a)]. Because a protein

is a hetero-polymer, the number and identity of the amino acids composing a protein are not enough to specify it. Rather, the order or sequence in which the

specific amino acids are combined in the polypeptide chain is required to specify the

protein. The primary structure may be thought of as a complete description of all of

the covalent bonding in a polypeptide chain or protein.

The amino acid backbone forms one of a few secondary structures determined

by the interactions of the peptide bond with nearby neighbor residues. Three main

elements of well-defined secondary structure may be distinguished: the α-helix, the β-sheet, and turns, as shown in Fig. 1.1(b). These structural elements may be connected with each other by relatively unstructured loops. Secondary structure formation provides an efficient mechanism of pairing polar groups of the polypeptide backbone by hydrogen bonds. Secondary structure elements may in turn associate through side-chain interactions to form super-secondary structures, called motifs.

Figure 1.1: Schematic illustration of the different levels of protein structure. (a) The primary structure is the amino acid sequence of the chain (shown in three-letter code), which contains all the information needed to specify the protein. (b) Secondary structures such as the α-helix and the β-sheet are formed from regular repeating patterns of backbone hydrogen bonds. (c) The tertiary structure is the way secondary structural elements pack together to form the overall three-dimensional fold of the protein (P13 protein). (d) The relative arrangement of two or more individual polypeptide chains is called the quaternary structure (hemoglobin). Adapted from wikimedia [1].

The overall three-dimensional arrangement of all atoms in a protein is referred to as the tertiary structure of a protein, which results from a large number of longer-range non-covalent interactions between amino acids. In many cases, the final structure

consists of distinct, independently folded regions called domains [see Fig. 1.1(c)]. Not

all proteins have a quaternary level of structure. Quaternary structure only applies

to complex proteins that are composed of more than one polypeptide chain. Each of

the polypeptides is called a subunit. In Fig. 1.1(d), hemoglobin has four

such subunits. The subunits might be identical or very different.

1.3 Theory of Protein Folding from Energy Landscape Perspective

A seminal experiment carried out by Christian Anfinsen and colleagues in the early

1960s revealed the first evidence that the amino acid sequence of a polypeptide chain

contains all the information needed to fold spontaneously into its native, three-

dimensional structure [17, 18]. This is often summarized by the phrase “sequence

determines structure”. An immediate question, then, is how do proteins find the

right conformation out of the huge number of possible three-dimensional structures

available to a polypeptide chain? A random search through the huge phase space

would take an astronomical amount of time. This problem was first pointed out by

Cyrus Levinthal [19] in 1968 and is also known as “Levinthal’s paradox”. Levinthal

suggested that nature must have devised more efficient search methods to find the

folded conformation and postulated the existence of well-defined sets of pathways by

which protein folding can take place rapidly.
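The scale of Levinthal's argument can be illustrated with a back-of-the-envelope estimate. The following is a minimal sketch only: the chain length, the number of conformations per residue, and the sampling rate are illustrative assumptions and are not values taken from Ref. [19].

```python
def levinthal_search_time_years(n_residues=100,
                                states_per_residue=3,
                                sampling_rate_per_s=1e13):
    """Time (in years) to visit every conformation once by random search.

    All three default values are illustrative assumptions.
    """
    n_conformations = states_per_residue ** n_residues
    seconds = n_conformations / sampling_rate_per_s
    seconds_per_year = 365.25 * 24 * 3600
    return seconds / seconds_per_year


if __name__ == "__main__":
    # With the assumed defaults the estimate is of order 1e27 years,
    # vastly longer than the age of the universe.
    print(f"{levinthal_search_time_years():.2e} years")
```

Even with generous assumptions about the sampling rate, an exhaustive search is hopeless, which is the motivation for a biased (funneled) search.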

Nevertheless, it seems very unlikely that a small set of specific pathways is sufficient to describe the complex and heterogeneous dynamics involved in protein folding. Instead, a statistical description is much more appropriate. A different theoretical perspective emerged in the late 1980s that provides a global overview of the protein’s free energy surface, also known as the “free energy landscape perspective” [2,20–24].

From this perspective folding is described as a progressive organization of an ensemble of partially folded configurations rather than as a progression along one specific pathway of discrete folding intermediates. This approach gives a quantitative way of describing the multidimensional free energy surface of a protein in a statistical mechanical framework. The complex scenario of folding created by the large number of interacting degrees of freedom and the complicated network of molecular interactions is simplified in the energy-landscape description through the organization of numerous protein configurations, characterized in terms of a few collective variables (order parameters).

1.3.1 The funnel landscape

A natural protein shapes its folding energy landscape through a strong energetic bias toward the native global minimum. This energetic bias enables a protein to fold reliably within the biologically relevant timescale by reducing the conformational search for the native state. The “funnel landscape” (see Fig. 1.2) is a consequence of the correlation between the energy of a conformation and its structural similarity to the native state, known as the “principle of minimal frustration” [20, 21]. This principle suggests that the main determinant of the folding mechanism of a protein is its native state topology.

Figure 1.2: The funnel-shaped free energy landscape for protein folding. Folding occurs through the progressive organization of ensembles of structures [the example is the src-SH3 domain (left)] on a free energy landscape. Conformational entropy loss during folding is compensated by the free energy gained as more native interactions are formed. Adapted from Brooks et al. [2].

The implications of the “funnel-like” landscape of natural proteins have been extensively explored through simplified native-state biased, energetically unfrustrated Hamiltonians called Gō-models [25]. Over the last decade, many different analytical models [26–31] and simulations (see, e.g., Ref. [32] and references therein) based on Gō-models have revealed folding routes that agree qualitatively (and even quantitatively) with site-directed mutagenesis experiments that probe the transition state ensemble.
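The idea of a native-state biased, unfrustrated energy function can be sketched in a few lines. This is a toy illustration and not the specific Hamiltonian of Ref. [25] or of any model cited above: the contact cutoff, the sequence separation, the tolerance factor, and the function names are all assumptions made for the example. Each contact present in the native structure contributes a favorable energy when it is formed, and non-native contacts contribute nothing.

```python
import numpy as np


def native_contacts(native, cutoff=8.0, min_sep=3):
    """Index pairs (i, j) in contact in the native structure, plus distances.

    `cutoff` (in Å) and `min_sep` (sequence separation) are assumed values.
    """
    dists = np.linalg.norm(native[:, None, :] - native[None, :, :], axis=-1)
    i, j = np.triu_indices(len(native), k=min_sep)
    mask = dists[i, j] < cutoff
    return i[mask], j[mask], dists[i, j][mask]


def go_energy_and_q(coords, contacts, epsilon=1.0, tolerance=1.2):
    """Toy Go-type energy and fraction of native contacts Q.

    A native contact counts as formed when its distance is within
    `tolerance` times the native distance (an assumed criterion).
    """
    i, j, d_native = contacts
    d = np.linalg.norm(coords[i] - coords[j], axis=-1)
    formed = d <= tolerance * d_native
    return -epsilon * formed.sum(), formed.mean()


def ideal_helix(n):
    """Hypothetical test structure: points on a helix (radius 2.3 Å, rise 1.5 Å)."""
    t = np.arange(n)
    angle = np.radians(100.0) * t
    return np.column_stack([2.3 * np.cos(angle), 2.3 * np.sin(angle), 1.5 * t])


if __name__ == "__main__":
    native = ideal_helix(12)
    contacts = native_contacts(native)
    print(go_energy_and_q(native, contacts))        # all native contacts formed
    print(go_energy_and_q(3.0 * native, contacts))  # expanded chain, none formed
```

Because the energy decreases monotonically as Q increases, the energy of a conformation is correlated with its similarity to the native state, which is the defining property of a funneled landscape.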

1.4 Remodeled Energy Landscapes of Functional Transitions in Proteins

Understanding how a given protein functions demands a full elaboration of its energy landscape and of how this landscape is modulated by interactions with other proteins, peptides, and smaller ligands, as well as by covalent modifications such as phosphorylation [3,33–37]. From the energy landscape perspective, the functional states of a protein correspond to minima in the high-dimensional free energy landscape. Conformational exchange between structures within this native basin occurs with rates controlled by the height of the free energy barrier between them. As a protein fluctuates between these accessible states, a ligand can modulate the energy landscape, thereby shifting the population of the ensemble towards a particular conformation (see Fig. 1.3). This picture is called the “population-shift” mechanism for allostery. The distribution of these conformational substates is highly complex, and the dynamics of transitions between well-defined free energy minima are generally controlled by relatively low probability conformational ensembles. The main challenge is to describe the transition state ensembles at the residue level, giving a site-specific description of the transition mechanism.
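The population-shift picture can be made concrete with a two-state Boltzmann estimate. The sketch below assumes only two states, A and B, in equilibrium, and models the effect of a ligand simply as a change in their free energy difference; the numerical values are illustrative assumptions, not results for any protein discussed here.

```python
import math


def two_state_populations(delta_g_in_kT):
    """Equilibrium populations (p_A, p_B) for delta_g = G_B - G_A in units of kBT."""
    p_b = 1.0 / (1.0 + math.exp(delta_g_in_kT))
    return 1.0 - p_b, p_b


if __name__ == "__main__":
    # Assumed example: without ligand, state B lies 2 kBT above state A;
    # ligand binding is assumed to stabilize B by 4 kBT.
    print("no ligand:  ", two_state_populations(+2.0))
    print("with ligand:", two_state_populations(+2.0 - 4.0))
```

In this sketch both states are populated before and after the perturbation; the ligand only redistributes their occupancies, as in panel (C) of Fig. 1.3.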

The aim of my research in this thesis is to develop general analytical and computational approaches to characterize the mechanisms controlling large-scale (main-chain) conformational transitions of flexible proteins.

1.5 Protein Motions

The ability of a protein to interact with the high specificity required for regulatory control depends on the precise spatial ordering of its residues. Protein conformational dynamics can be classified in terms of the timescale of functional motions and their relationship to the average structure: from fast, small-amplitude motions to slow, large-amplitude changes (see Fig. 1.4).

Figure 1.3: Remodeled energy landscape to accommodate protein functional dynamics. (A) Schematic illustration of the free energy landscape (red: higher energy, blue: lower energy). The boxed region encloses conformational states that are energetically available for interconversion under physiological conditions. (B) A signal, such as a ligand, can remodel the energy landscape by narrowing down the number of ensemble states in a single energy well through structural rigidification of the average conformation. (C) Alternatively, a protein may already exist in equilibrium between conformationally distinct states, and a ligand can alter the relative energies or populations of the states, resulting in a redistribution of their occupancies. (D) A slight variation on (C) may occur if the population of the higher-energy state shifts toward a ligand-induced conformation in the absence of ligand. Adapted from Smock and Gierasch [3].

1.5.1 Fast timescale (small amplitude) motions

The fastest protein motions are molecular fluctuations such as interatomic vibrations and rotations about chemical bonds. The next fastest are the collective motions of bonded and non-bonded neighboring groups of atoms, such as the wig-wag motions of long side chains or the flip-flop motions of short peptide loops. These motions are local, small-amplitude dynamics on the picosecond to nanosecond timescale at physiological temperature. Protein dynamics occurring on this timescale involve motion within a single energy basin, where a large ensemble of structurally similar states are separated by energy barriers of less than kBT. Interest in this timescale arises mostly from the change in conformational entropy (∆Sconf) due to the sampling of the large number of substates. A loss of conformational entropy of a particular side chain of a protein can be critical for function [38].

X-ray diffraction is a very powerful technique to characterize the spatial distribution around the substates on this timescale. Small amplitude motion can be inferred from X-ray diffraction data through the mean square atomic displacements, called B-factors (also known as temperature factors or Debye-Waller factors). Interpreting these B-factors can be complicated because lattice disorder also contributes to the molecular fluctuations measured by X-ray diffraction. X-ray diffraction can also be used to measure correlated side-chain motions on the picosecond to nanosecond timescale [39]. Nuclear magnetic resonance (NMR) relaxation methods can characterize fast dynamics on the picosecond to nanosecond timescale through the order parameter, S², and the internal correlation time. NMR spectroscopy has been used extensively to investigate the entropic contribution to protein-ligand binding [38, 40, 41]. Computationally, fast dynamics are on a perfect timescale for modern molecular dynamics (MD) simulations. One of the advantages of MD simulations is that they can easily clarify atomistic correlated motions, a phenomenon that is obscured in experiments on ensembles.
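For reference, the relation between an isotropic B-factor and the mean square displacement mentioned above can be sketched as follows. This uses the standard isotropic relation B = 8π²⟨u²⟩, where ⟨u²⟩ is the mean square displacement along one direction; treating the atom as isotropic and ignoring lattice disorder are simplifying assumptions, and the function name is hypothetical.

```python
import math


def mean_square_displacement_from_b(b_factor):
    """Convert an isotropic B-factor (Å^2) to mean square displacements (Å^2).

    Returns (<u^2>, <dr^2>): the displacement along one axis and the total
    three-dimensional displacement, assuming isotropic fluctuations.
    """
    u2 = b_factor / (8.0 * math.pi ** 2)
    return u2, 3.0 * u2


if __name__ == "__main__":
    u2, r2 = mean_square_displacement_from_b(20.0)  # assumed example value
    print(f"<u^2> = {u2:.3f} Å^2, rms 3D displacement = {math.sqrt(r2):.2f} Å")
```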

Figure 1.4: Types of protein motions on different timescales and the experimental methods to characterize fluctuations on each timescale. Adapted from Henzler-Wildman and Kern [4].

1.5.2 Slow timescale (large amplitude) motions

Proteins often possess slower large-scale conformational dynamics such as the relative movements of relatively rigid structural elements: a loop, a helix, or a whole domain. Such motions are important for a variety of protein functions, including catalysis, signal transduction and protein-protein interactions. In contrast to the fast timescale, dynamics on a slow timescale define fluctuations between kinetically distinct conformational sub-states that are separated by an energy barrier of at least several kBT. Such transitions correspond to timescales of microseconds or slower.
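The link between barrier height and timescale can be illustrated with a simple Arrhenius-type estimate, τ = exp(ΔG‡/kBT)/k0. This is a rough sketch only: the prefactor k0 is an assumed, illustrative value (the appropriate prefactor for protein motions depends on the system and is not specified in the text), so only the exponential trend should be read from it.

```python
import math


def barrier_crossing_time(barrier_in_kT, prefactor_per_s=1e9):
    """Mean waiting time (s) to cross a free energy barrier.

    tau = exp(barrier / kBT) / k0, with k0 an assumed attempt rate.
    """
    return math.exp(barrier_in_kT) / prefactor_per_s


if __name__ == "__main__":
    for barrier in (0.5, 1.0, 5.0, 10.0):
        print(f"{barrier:5.1f} kBT -> {barrier_crossing_time(barrier):.2e} s")
```

Each additional kBT of barrier height slows the transition by a factor of e, so a barrier of several kBT separates the sub-nanosecond fluctuations within a basin from the much slower transitions between basins.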

It is possible to measure the structural ensemble directly on the slow timescale using cryo-electron microscopy and small-angle X-ray scattering (although with lower resolution than X-ray crystallography). The drawback of these methods is that they cannot characterize the timescale of interconversion between stable sub-states. The clear advantage of NMR spectroscopy [42] is its ability to provide the timescale of conformational transitions, along with atomically resolved structures. The NMR timescale for conformational exchange is defined by the rate, kex (the sum of the forward and reverse rates), relative to the difference in chemical shift of the interconverting species. In addition, detailed dynamics occurring on the microsecond to millisecond scale can be obtained from the relaxation parameter, Rex, that contributes to the measured overall transverse relaxation rate, R2eff. In conjunction with these atomic-resolution methods, lower-resolution methods, such as the classical biophysical techniques of fluorescence, circular dichroism, infrared spectroscopy, Raman spectroscopy and electron paramagnetic resonance, are also very useful because they can access a large range of timescales (Fig. 1.4) with high precision. Single-molecule fluorescence resonance energy transfer (FRET) [43] is an especially powerful technique which allows direct characterization of conformational dynamics [44,45].
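The roles of kex and the chemical-shift difference can be sketched for the simplest case of two-site exchange. The code below assumes the textbook fast-exchange limit, in which Rex ≈ pA pB Δω²/kex, with Δω the chemical-shift difference in rad/s; the function names, the crude regime boundaries, and the example numbers are assumptions for illustration and are not taken from the experiments cited above.

```python
def exchange_regime(k_ex, delta_omega):
    """Classify two-site exchange by comparing k_ex with |delta_omega| (both in 1/s)."""
    if k_ex > abs(delta_omega):
        return "fast"
    if k_ex < abs(delta_omega):
        return "slow"
    return "intermediate"


def rex_fast_exchange(p_a, delta_omega, k_ex):
    """Exchange contribution to R2 in the fast-exchange limit (two sites)."""
    p_b = 1.0 - p_a
    return p_a * p_b * delta_omega ** 2 / k_ex


def r2_effective(r2_intrinsic, r_ex):
    """Measured transverse relaxation rate: intrinsic rate plus exchange term."""
    return r2_intrinsic + r_ex


if __name__ == "__main__":
    # Assumed example: 90%/10% populations, delta_omega = 1000 rad/s, k_ex = 10000 /s.
    r_ex = rex_fast_exchange(0.9, 1000.0, 10000.0)
    print(exchange_regime(10000.0, 1000.0), r_ex, r2_effective(10.0, r_ex))
```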

Unfortunately, protein dynamics on the microsecond to millisecond timescale is currently out of reach for traditional MD simulations. To overcome this problem, alternate approaches with simplified force fields have been developed, including normal mode analysis (NMA) [46, 47], Gaussian network models (GNM) [48], floppy inclusion and rigid substructure topography (FIRST) [49], and Gō-models [50]. Some of these computational approaches are discussed in more detail in Chapter 3.

1.6 Proteins Studied in this Dissertation

In this dissertation we focus on two separate proteins with flexibility-determined allosteric transitions to illustrate our model: calmodulin and the N-terminal receiver domain of nitrogen regulatory protein C. In this section, I review the details of

large-scale structural changes of these proteins, which are crucial for their biological

functions.

1.6.1 Calmodulin

In eukaryotic cells, calmodulin (CaM) is a ubiquitous Ca2+-binding protein that plays a key role in Ca2+-mediated signal transduction. On binding Ca2+, CaM undergoes a conformational change that allows it to bind and regulate more than 300 target proteins [see Fig. 1.5(a)]. One of the most interesting characteristics of CaM is the diversity of the target proteins it regulates. CaM’s plasticity is crucial for enabling its interaction with these diverse partners [51–53]. CaM interacts with numerous proteins involved in phosphorylation/dephosphorylation, such as myosin light chain kinase, CaM-dependent protein kinases, phosphorylase kinase and the protein phosphatase calcineurin. It also regulates numerous -signaling proteins, such as synthase and cyclic nucleotide phosphodiesterase. In addition, it interacts with a variety of cytoskeletal proteins to modulate cell movement and growth [54,55].

Figure 1.5: Function of the calcium-binding EF-hand protein calmodulin. (a) Calcium binds to the EF-hand subdomains of calmodulin, which favors binding to a target molecule such as myosin light chain kinase. Binding of the target to one site on calmodulin enhances binding affinity to a second binding site [5]. Adapted from Smock and Gierasch [3]. (b) The EF-hand helix-loop-helix motif. The index finger represents helix-E (shown in yellow), the thumb is helix-F (shown in blue), and the remaining fingers are the binding loop (shown in red, with the calcium ion in green). Adapted from Ref. [6].

CaM belongs to a superfamily of homologous Ca2+-binding proteins (CaBPs), characterized by a common helix-loop-helix structural motif in their Ca2+-binding sites called the EF-hand [56], shown in Fig. 1.5(b). The basic structural/functional unit of the EF-hand CaBPs is a pair of EF-hands rather than a single EF-hand, presumably to stabilize the protein conformation and increase the Ca2+ affinity of each Ca2+-binding site over that of isolated sites [57].

Ca2+-induced conformational changes

CaM has four EF hands organized into two distinct globular domains, usually referred to as the N-terminal (nCaM) and C-terminal (cCaM) domains. nCaM is made of EF-hands 1 and 2, whereas EF-hands 3 and 4 form cCaM. The two domains are connected by a flexible linker, often called the central linker. The cCaM cooperatively binds two Ca2+ ions with a dissociation constant of ∼ 10⁻⁶ M, and nCaM also cooperatively binds two Ca2+ ions, albeit with nearly 10-fold lower affinity. The binding of Ca2+ to CaM induces large conformational changes: it causes an opening of the hydrophobic cleft via a rearrangement of helices within each domain such that hydrophobic residues key to target protein binding are exposed [58].

The two domains of CaM are homologous (46% sequence identity) and each folds into similar secondary and tertiary structures. Structures of apo-CaM (Ca2+-free), holo-CaM (Ca2+-bound) and CaM with various bound target peptides have been determined [59–61]. In the nuclear magnetic resonance (NMR) structure of apo-CaM each domain is found to be highly α-helical with two EF hands paired by a mini antiparallel β-sheet. In each domain of apo-CaM the EF-hand helices are oriented in an antiparallel fashion, giving each domain a relatively compact “closed” form while leaving the binding loops solvent exposed for easy access to Ca2+. The X-ray structure of holo-CaM shows that binding of Ca2+ causes the EF-hand helices of each domain to adopt a more perpendicular orientation, resulting in a transition to the expanded “open” conformation of the motif. Ca2+-binding to CaM occurs sequentially: the two Ca2+-binding sites of cCaM are filled first in a positively cooperative manner, followed by the nCaM Ca2+-binding sites. The unbound apo-CaM and the Ca2+-bound holo-CaM structures are shown in Fig. 1.6. Comparison of the apo-CaM with the holo-CaM structure shows that the two globular domains rotate outward on Ca2+-binding, connected by the central linker region, a long α-helix that works as a rigid spacer between the two coupled domains. This causes a structural change from an overall globular ellipsoid shape of apo-CaM to a more extended dumbbell shape of holo-CaM and exposes the hydrophobic cleft in each domain that makes direct contact with target proteins. However, NMR relaxation data demonstrated that the central linker is nonhelical and highly flexible near its midpoint [62]. In particular, the flexibility of this interdomain linker is key in allowing the two domains to come together and permitting rearrangement of the relative positions of the two domains to fit a wide array of target sites [63].

Figure 1.6: Calcium induced conformational change in calmodulin (figure labels: nCaM, cCaM, N and C termini, 4 Ca2+). The EF-hands of the two domains of unbound-calmodulin [PDB: 1cfd, (left)] undergo large structural rearrangement upon calcium binding and expose a hydrophobic surface on each domain to bind target proteins. The calcium ions are shown as white spheres in bound-calmodulin [PDB: 1cll, (right)]. These three-dimensional protein figures and others in this dissertation are made using the Visual Molecular Dynamics (VMD) program [7].

The binding of each Ca2+ ion in each CaM domain stabilizes the partially unstructured loop region of its corresponding EF-hand motif and causes an overall stabilization of the internal mobility of all of the helices. As a result each domain undergoes a large conformational change from a closed state in apo-CaM to an open state in holo-CaM. In this dissertation my goal is to explore the role of inherent flexibility of CaM for the open/closed conformational transition of each domain.

1.6.2 Nitrogen regulatory protein C (NtrC)

Reversible protein phosphorylation is a key event in cell regulation. The covalent attachment of one or more phosphate groups to the protein can influence the conformation and charge of the protein, thereby altering its activity. Phosphorylation constitutes the trigger of many protein switches. It is thought that perhaps 30% of the proteins encoded in the human genome contain covalently bound phosphate [64], and it is well known that abnormal phosphorylation is related to major human diseases, such as cancer and diabetes [65].

“Two component” regulatory systems dominate signal transduction in both eukaryotic and prokaryotic cells [66]. These systems use phosphotransfer between two highly conserved proteins, a histidine kinase and a response regulator, to propagate the signal [67]. The “receiver domain” of the response regulator is the molecular switch: under appropriate environmental conditions, the histidine kinase becomes activated and phosphorylates a conserved aspartate in the receiver domain. Nitrogen regulatory protein C (NtrC) of enteric bacteria is a response regulator and plays a central role in the control of genes involved in nitrogen . It is composed of three domains: the N-terminal receiver domain, the central ATPase domain, and the C-terminal DNA binding domain. Phosphorylation of the NtrC receiver domain (P-NtrCr) at the active site Asp54 results in large structural changes [68]. The active conformation P-NtrCr is essential for oligomerization of full-length NtrC to form the fully active ATPase rings. Consequently, the energy of ATP is used to open the DNA at the transcription initiation site of the σ54-holoenzyme form of RNA polymerase [69] (see Fig. 1.7).

Figure 1.7: The transcriptional activation mechanism by NtrC. (a) The glnA gene is transcribed by the σ54-containing polymerase which alone cannot initiate transcription. The unphosphorylated NtrC dimers can bind only one site at the enhancer, still insufficient to stimulate transcription. (b) The phosphorylated NtrC dimers can bind both sites of the enhancer. (c) Their binding induces DNA looping. Contact between the activator and the polymerase stabilizes the interaction between the polymerase and DNA, thereby initiating transcription. Adapted from web-book [8].

Phosphorylation induced conformational changes

The three-dimensional structures of the inactive-state (I-NtrCr) and active-state (P-NtrCr) conformations of NtrCr have been solved by NMR [68, 70, 71]. NtrCr adopts a (β/α)5 topology, as found in all receiver domains. Alternating β-strands and α-helices in the primary structure fold into a five-stranded parallel β-sheet that forms the core of the protein, surrounded by two α-helices on one side and three on the other [72] (see Fig. 1.8).

Figure 1.8: Phosphorylation at the active site Asp54 (near the C-terminal end of β3) induces a large conformational change in the N-terminal receiver domain of NtrC. The unphosphorylated (inactive) structure is shown in blue and the phosphorylated (active) structure in red. Secondary structure elements (α1–α5, β1–β5) and the N and C termini are labeled in the figure.

Phosphorylation (activation) induces large conformational changes involving a displacement of β4/β5 and α3/α4 away from the active site of Asp54. The α3-β4-α4-β5 region (also referred to as the “3445” face) is the potential ‘switch region’ of NtrCr. There is also substantial rearrangement of the loops above the active site, the β3-α3 loop, as well as the β4-α4 loop and the α4-β5 loop leading into and out of α4, respectively. These loops fluctuate freely in the inactive state and become more restricted in the active state conformation. In addition, upon phosphorylation α4 undergoes a substantial axial rotation and a register shift of about two residues from the N to the C terminus, corresponding to more than half a turn [68]. Subsequently, the reorientation of the side chains of α4 exposes its hydrophobic surface, which is believed to promote the oligomerization of NtrC necessary for ATP hydrolysis [68].

NMR relaxation experiments have revealed a strong correlation between phosphorylation driven activation of NtrCr and its microsecond time-scale backbone dynamics [13]. There is evidence of dynamical exchange between the inactive state and active state conformations. These studies have also suggested that both conformations are populated at room temperature, with the population of the active state being considerably smaller than that of the inactive state in the unphosphorylated form, while after phosphorylation, the active form dominates. Therefore, to understand the molecular basis of the activation, it is important to explore how the conformational transition occurs and why the phosphorylation stabilizes the active state conformation.

1.7 Organization of Dissertation

The dissertation is organized as follows:

In Chapter 2, I give a brief overview of some different approaches to study protein conformational dynamics. I mainly focus on coarse-grained protein models developed to explore the large-scale structural changes of proteins important for their functions.

In Chapter 3, I first present the variational model that I developed to study the conformational transitions of proteins in this dissertation. In this model, two different approaches are followed to combine the single-basin energies of the meta-stable states of the proteins. In the first approach, the energies of individual contacts are coupled during the transition from one basin into another, and in the second approach the total single-basin energies are coupled smoothly following the multiple-basin energy model [11]. I also discuss the original variational model that was developed by Portman, Takada and Wolynes [26,27] to characterize protein folding mechanisms.

In Chapter 4, I investigate the Ca2+-induced open/closed structural change in the N-terminal domain of calmodulin (nCaM) to examine how the conformational flexibility of proteins plays a crucial role in their functions. This study also focuses on the recently proposed EFβ-scaffold mechanism in the EF-hand family proteins [73].

In Chapter 5, I discuss how the inherent flexibility of a protein molecule influences the mechanism controlling allosteric transitions. I first compare the open/closed conformational transition mechanisms of the two domains of calmodulin (CaM), which are “odd-even” paired EF-hands, and then also study the conformational transition in the “even-odd” paired EF-hands of the engineered fragment CaM2/3. In this chapter, I mainly focus on an interesting mechanism called “cracking” (local partial unfolding and refolding) and its association with the inherent flexibility of proteins.

In Chapter 6, I extend the study presented in Chapter 5 for the two domains of CaM.

In this chapter, I mainly focus on exploring the complex interrelationship among

topology, plasticity and conformational transition mechanism of the two domains of

CaM, which are topologically similar and homologous (similar in sequence identity).

I also use an idea of simple strain energy analysis to show how a very high elastic

stress in some specific regions of proteins often results in cracking to relieve this stress

by transiently increasing its flexibility (in the transition state).

Chapter 7 focuses on the allosteric transition mechanism of the receiver domain of a response regulator protein, the nitrogen regulatory protein C (NtrC). This protein undergoes large structural rearrangement upon phosphorylation through a dynamic population shift between its preexisting inactive and active states. In addition to studying the mechanism of inactive/active conformational change, we also explore the folding mechanism of both the inactive and active states of the protein. The results from NMR relaxation experiments [13] of this protein are also compared qualitatively with our predicted results in this chapter.

In Chapter 8 (Conclusions), I first summarize the results presented in this dissertation and then provide some insights for future work based on the research accomplished here.

Finally, in Chapter 9 (Appendix) I give some detailed formulation of the variational models described in Chapter 3.

CHAPTER 2

BACKGROUND INFORMATION ON COARSE-GRAINED

MODELING OF PROTEIN CONFORMATIONAL TRANSITIONS

2.1 Introduction

Theoretical and computational techniques along with the experimental measurements

are needed to understand the detailed mechanism of large-scale structural changes

of proteins. All-atom Molecular Dynamics (MD) simulation has proved especially

successful in reproducing the experimentally observed dynamical events that occur

near the native state of proteins at the time-scale of nanoseconds [74, 75]. To model

motions (relevant to functions) on the microsecond to seconds scale, one must use

alternative approaches because of the high computational cost of all-atom MD. One approach

to extend the time-scale of MD simulation, pioneered by Levitt and Warshel [76,

77], is to use coarse-grained (low resolution) models. In this approach proteins are

represented by groups of atoms (pseudo-atoms or beads) [78] that interact

through effective force-fields, which brings microsecond time-scales within reach for small

proteins. The simplest coarse-grained models of proteins typically retain only

the Cα backbone atoms. The number of particles in coarse-grained models is

further greatly reduced by using implicit solvation rather than explicit water molecules

[79].


In recent years there have been several efforts to describe conformational transition mechanisms of proteins using such reduced, coarse-grained models. In an early approach, Gerstein and coworkers [80, 81] linearly interpolated the coordinates of two known end-state structures in a series of steps, minimizing the energy of the intermediate structure at each step. The minimization eliminates steric clashes between atoms, thereby ensuring that each intermediate structure is chemically realistic. Gerstein and coworkers have identified several hundred distinct molecular motions in the Database of Macromolecular Movements (http://MolMovDB.org). A similar approach using internal coordinates is presented in Ref. [82]. These methods can capture the conformational change qualitatively even for large systems.
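The coordinate interpolation step of this morphing approach can be sketched in a few lines of Python. This is a minimal illustration and not the implementation of Refs. [80, 81]: the function name `interpolate_structures` and the list-of-tuples coordinate format are assumptions made here, and the energy minimization applied to each intermediate structure is deliberately left out.

```python
def interpolate_structures(start, end, n_steps):
    """Linearly interpolate Cartesian coordinates between two end states.

    start, end: lists of (x, y, z) tuples, one per atom (hypothetical format).
    Returns n_steps + 1 frames; frame 0 is `start`, the last frame is `end`.
    Energy minimization of the intermediates is NOT performed here.
    """
    if len(start) != len(end):
        raise ValueError("both structures must contain the same atoms")
    if n_steps < 1:
        raise ValueError("n_steps must be at least 1")
    frames = []
    for step in range(n_steps + 1):
        t = step / n_steps  # fraction of the way from start to end
        frame = [
            tuple((1.0 - t) * a + t * b for a, b in zip(p, q))
            for p, q in zip(start, end)
        ]
        frames.append(frame)
    return frames
```

In a full morphing protocol each returned frame would then be relaxed with an energy function to remove steric clashes; that step depends on the chosen force field and is outside this sketch.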

Other attempts to characterize structural change are based on normal mode analysis (NMA) [83,84]. In NMA, one focuses on small deviations from a conformation that is a local energetic minimum. Approximating the energy surface by a harmonic potential, one uses the normal modes to gain atomic-level insight into the mechanism of biologically relevant conformational changes of proteins and protein complexes [85]. The low-frequency normal modes, which are often correlated with large-scale structural rearrangements, have been helpful in rationalizing the motion that often occurs upon protein-ligand binding.

A simplified harmonic model, called the Elastic Network Model (ENM), is a convenient and efficient way to calculate normal modes. In this model, introduced by Tirion [86], harmonic bonds represent interactions between atoms in proximity in the native structure. This potential can be viewed as a simple one-parameter harmonic Gō-model. The motions relevant to protein function are identified as one of the many low-frequency normal modes. Bahar and coworkers [87] later simplified the original Tirion potential by coarse-graining to the residue level, representing the protein using the Cα atoms. The resulting protein model is similar to Flory’s earlier work on polymer networks [88]. The main advantage of an ENM based on the Tirion potential is that it frees the calculation from requiring that the native structure (determined experimentally) be a minimum of any particular all-atom empirical potential.

In this chapter, I review some approaches based on the normal modes from a single minimum and discuss how these models have been extended to accommodate two minima, such as an active and an inactive state. Next, I briefly present some coarse-grained simulation models that can capture a protein’s large-scale structural changes between multiple metastable conformations relevant to biomolecular function.

2.2 Elastic Network Models for a Single Minimum

In the elastic network model, the total energy of a conformation of a molecule is represented as

\[
E_{\mathrm{Tirion}} = \sum_{i,j}^{N}
\begin{cases}
\dfrac{k}{2}\left(|\mathbf{r}_{ij}| - |\mathbf{r}_{ij}^{0}|\right)^{2} & \text{for } |\mathbf{r}_{ij}^{0}| \le r_c, \\[6pt]
0 & \text{otherwise,}
\end{cases}
\tag{2.1}
\]

where $\mathbf{r}_{ij} = \mathbf{r}_i - \mathbf{r}_j$ is the vector connecting atoms $i$ and $j$, and the zero superscript indicates the equilibrium conformation defined by the protein data bank (PDB) structure. Atoms $i$ and $j$ in a molecule that contains $N$ atoms are connected by a Hookean spring if their separation in the native structure is less than a cutoff distance $r_c$ (for example, see Fig. 2.1). The force constant $k$ in Eq. 2.1 is chosen so that the overall magnitude of the harmonic fluctuations agrees with experimental measurements, such as the temperature factors (B-factors) of X-ray characterized structures [86,87].

Note that the energy function, $E_{\mathrm{Tirion}}$, has its minimum, by construction, at the chosen reference configuration of any system, thus eliminating the need for minimization prior to NMA.
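As a concrete reading of Eq. 2.1, the sketch below evaluates the elastic network energy of a deformed structure relative to a native one. It is an illustration under stated assumptions rather than Tirion's original code: the name `tirion_energy`, the coordinate format, and the choice to count each pair once (i < j) are made here for simplicity.

```python
import math
from itertools import combinations


def tirion_energy(coords, native, k, cutoff):
    """Elastic network energy of `coords` relative to the `native` structure.

    coords, native: lists of (x, y, z) tuples in the same atom order.
    k: spring constant; cutoff: r_c, applied to NATIVE pair distances.
    Assumption: each pair (i, j) is counted once (i < j).
    """
    energy = 0.0
    for i, j in combinations(range(len(native)), 2):
        d0 = math.dist(native[i], native[j])  # native separation
        if d0 <= cutoff:                      # spring exists only for native neighbors
            d = math.dist(coords[i], coords[j])
            energy += 0.5 * k * (d - d0) ** 2
    return energy
```

Evaluating the function with `coords` equal to `native` returns zero for any structure, which is the property noted above: no prior minimization is needed.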

Several studies have shown that this simple Tirion potential is sufficient to reproduce the low-frequency normal modes of proteins obtained from more complete potential energy functions (for example, see Ref. [89]). This consistency suggests that low-frequency normal modes are predominantly a property of the shape of the molecular system [90–92]. NMA provides a simple and appealing explanation of biologically relevant conformational changes as the natural motion along a small set of low-frequency modes [9].

The Gaussian Network Model (GNM) is an isotropic elastic model that constrains the magnitudes, but not the directions, of the fluctuations of residues [87]. In the GNM, each residue of the protein is represented by its corresponding Cα backbone atom. The total energy of the protein is

\[
E_{\mathrm{GNM}} = \sum_{i,j}^{N}
\begin{cases}
\dfrac{k}{2}\left(\mathbf{r}_{ij} - \mathbf{r}_{ij}^{0}\right)^{2} & \text{for } |\mathbf{r}_{ij}^{0}| \le r_c, \\[6pt]
0 & \text{otherwise.}
\end{cases}
\tag{2.2}
\]

Here, we have used the same notation as in Eq. 2.1. Constraining deviations in bond vectors (rather than bond distances) allows Eq. 2.2 to be written as

\[
E_{\mathrm{GNM}} = \frac{k}{2}\,\Delta\mathbf{R}^{T}\,\Gamma\,\Delta\mathbf{R},
\tag{2.3}
\]

where $\Delta\mathbf{R}$ represents the 3N-dimensional vector of residue fluctuations $\Delta\mathbf{r}_i = \mathbf{r}_i - \mathbf{r}_i^{0}$ and $\Delta\mathbf{R}^{T}$ is its transpose. Here, $\Gamma$ is the $N \times N$ Kirchhoff matrix given by

\[
\Gamma_{ij} =
\begin{cases}
-1 & \text{if } i \ne j \text{ and } |\mathbf{r}_{ij}| \le r_c, \\
0 & \text{if } i \ne j \text{ and } |\mathbf{r}_{ij}| > r_c, \\
-\sum_{j,\, j \ne i} \Gamma_{ij} & \text{if } i = j,
\end{cases}
\tag{2.4}
\]

for a pair of residues $i$ and $j$, where $r_c$ is the cutoff distance (usually 7–8 Å).

Figure 2.1: The elastic network model. The LAO binding protein shown in ribbon representation (top). The elastic network model for the LAO binding protein (bottom). Pairs of atoms < 8 Å apart are connected by harmonic springs. Adapted from Tama and Sanejouand [9].

The mean-square fluctuations of residue $i$ are given by

\[
\langle \Delta r_i^{2} \rangle = \frac{3 k_B T}{k}\left[\Gamma^{-1}\right]_{ii}.
\tag{2.5}
\]

The mean-square fluctuations of residues are experimentally measurable (X-ray B-factors or root-mean-square difference [rmsd] between different models from NMR), and several studies have demonstrated that the fluctuations predicted by the GNM are in good agreement with experimental B-factors [87,93], given by

\[
B_i = \frac{8\pi^{2}}{3}\,\langle \Delta r_i^{2} \rangle.
\tag{2.6}
\]

Since the inverse of the Kirchhoff matrix can be expressed in terms of the eigenvalues $\lambda_l$ and eigenvectors $\mathbf{u}_l$ of $\Gamma$, the mean-square fluctuations of residue $i$ can also be written as

\[
\langle \Delta r_i^{2} \rangle = \frac{3 k_B T}{k} \sum_{l=2}^{N} \lambda_l^{-1} u_{li}^{2}.
\tag{2.7}
\]

There are in total $N - 1$ modes generated by the GNM (the first eigenvalue, representing rigid-body motion, being zero), with the eigenvectors representing the magnitudes of the fluctuations of the modes without information regarding the directions of fluctuations.
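The construction of the Kirchhoff matrix in Eq. 2.4 and the B-factor conversion in Eq. 2.6 can be sketched as follows. The function names and the coordinate format are hypothetical, chosen here for illustration. Inverting the matrix for Eq. 2.5 requires a pseudo-inverse because of the zero eigenvalue; that step would normally be delegated to a linear algebra library and is not shown.

```python
import math


def kirchhoff_matrix(coords, cutoff):
    """Build the N x N Kirchhoff (connectivity) matrix of Eq. 2.4.

    coords: list of (x, y, z) C-alpha positions (hypothetical format).
    Off-diagonal entries are -1 for residue pairs within `cutoff`;
    each diagonal entry is the number of neighbors of that residue.
    """
    n = len(coords)
    gamma = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                gamma[i][j] = gamma[j][i] = -1
                gamma[i][i] += 1
                gamma[j][j] += 1
    return gamma


def b_factor(mean_square_fluctuation):
    """Convert a mean-square fluctuation into a B-factor (Eq. 2.6)."""
    return 8.0 * math.pi ** 2 / 3.0 * mean_square_fluctuation
```

Every row of the resulting matrix sums to zero, which is the origin of the zero eigenvalue associated with rigid-body motion.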

2.3 Elastic Network Models for Two Minima

Normal modes describe conformational fluctuations only in the vicinity of a single minimum energy conformation and therefore cannot describe the complete pathways between two metastable states. Nevertheless, some insight into a conformational transition mechanism can be obtained from the overlap between the structural difference of the two X-ray crystal structures (say, the open and closed states) and the normal modes (of the open or the closed state) of the protein [9]. This gives a simple explanation of the conformational dynamics as the natural motion along a particular soft normal mode. Although intuitive, normal mode dynamics are only one component of the appropriate description of the transition mechanism when the two metastable states are distinct and well-defined. In this case, the mechanism is described by the structural ensemble corresponding to the free energy barrier between the two states rather than by the harmonic motions near the energy minima.
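The overlap mentioned above is commonly taken as the normalized projection of a normal mode onto the displacement between the two structures. The sketch below assumes that definition; the function name `mode_overlap` and the flattened 3N-component list format are illustrative choices, and the two structures are assumed to have been superimposed beforehand.

```python
import math


def mode_overlap(mode, displacement):
    """Normalized overlap between one normal mode and a structural difference.

    mode, displacement: flat lists of 3N components (hypothetical format).
    Returns a value between 0 (orthogonal) and 1 (mode parallel to the
    observed displacement).
    """
    if len(mode) != len(displacement):
        raise ValueError("vectors must have the same length")
    dot = sum(m * d for m, d in zip(mode, displacement))
    norm_mode = math.sqrt(sum(m * m for m in mode))
    norm_disp = math.sqrt(sum(d * d for d in displacement))
    return abs(dot) / (norm_mode * norm_disp)
```

A mode with an overlap close to 1 accounts for most of the observed structural change on its own, which is the sense in which a single soft mode is said to describe the transition.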

The energy barrier between the two states can be estimated using an elastic network model for each metastable state. Within the normal mode picture, structural deformations away from a stable conformation increase the elastic strain energy, which gives the stable conformation its structural integrity. Each metastable basin has its particular elastic response to conformational deformations, as illustrated in Fig. 2.2(a). The barrier can be estimated from the intersection of the strain energies obtained from each state. This is a rather crude estimate because, as a protein moves away from one minimum into the basin of another minimum (e.g., from the open-state conformation to the closed-state conformation), the pathways between these two distinct states need to be smoothly coupled together. This is not possible within a completely harmonic description of the energy basins. Moreover, large-scale conformational changes in proteins often break some contacts and form others. That is to say, the elastic networks defined by the two metastable states must be coupled.

Figure 2.2: Schematic view of energy profiles of protein conformational dynamics from the elastic network approach (energy versus reaction coordinate for the open and closed states). (a) The conventional elastic network model captures the conformational dynamics only within the single basin of the open-state or closed-state structure (solid curves) because of the harmonic approximation of the energy. The actual energy profile of the closed/open conformational transition is depicted by the dotted curve. (b) A large strain energy barrier results from the linear elastic network model during the conformational change because of high stress (solid curves). This barrier height is reduced through cracking (local unfolding and refolding) to relieve the high elastic stress (dotted curves), as shown by Miyashita et al. [10] using a nonlinear elastic network approach.

Miyashita et al. [10, 94] presented the first such coupled elastic model of conformational transitions. In this study, conformational deformations were defined in an iterative manner. Instead of moving the structure from the initial to the final form by uniformly interpolating the coordinates, the deformation is made in small steps, and normal modes are calculated for each deformed structure. They used combinations of one or three normal modes that are most relevant to the conformational change (high overlap modes). Using only these low-frequency normal modes, they generated conformational transitions that resemble the minimum energy pathway of the potential energy surface. They implemented this approach to study the open/closed conformational change of the protein adenylate kinase. The calculated elastic strain energy barrier for the open/closed transition resulting from their approach was very high ($\sim 20\,k_BT$). This is illustrated schematically by the solid lines in Fig. 2.2(b).

Further analysis revealed that the strain energy is not uniformly distributed among the residues when the structure is deformed from either the open- or the closed-state minimum conformation. In particular, they observed that the high strain energy barrier of this protein came from specific contacts, which were under very high elastic stress during the deformation. This interesting observation led them to hypothesize that these particular residues may “crack”, that is, unfold partially, during the conformational change to relieve the high strain energy. To include this possibility, they considered a cracking model in which a residue is allowed to unfold if the strain energy of the residue exceeds a threshold value. An unfolded residue loses its contacts but gains structural entropy through local unfolding. Including this partial unfolding leads to a much lower energy barrier between local energy minima, facilitating faster transition kinetics. The adaptive contact map that defines the elastic network along the transition route provides the coupling in this model.

Maragakis and Karplus [95] offered another way to couple two harmonic potentials representing two states of a transition. The conformational transition is modeled by energetically connecting the basins at their common energy point, and the barrier crossing part is made differentiable using an analogy to the quantum mechanical coupling of two potential energy surfaces as

\[
E = \frac{E_1 + E_2 - \sqrt{(E_1 - E_2)^{2} + 4\epsilon^{2}}}{2}.
\tag{2.8}
\]

Here, $E_1$ and $E_2$ are the elastic energies due to the deformation from the minimal structure in basins 1 and 2, respectively. The small parameter $\epsilon$ controls the sharpness of the coupling near $E_1 \approx E_2$ and the barrier height relative to the stable conformations. The minimum energy path of the conformational change is then sought by steepest descent from the saddle point (the intersection of the basins) towards the minima of the single basins, minimizing the integral of $E$ in Eq. 2.8 along the path using CHARMM modules [96].

Chu and Voth [97] used an approach very similar to the one described above. Instead of generating a single, multidimensional double-well potential as in Eq. 2.8, their model uses a set of networked, one-dimensional double-well potentials to connect two protein conformations. As a result, the interpolated potential of this latter method has many more intermediate states and saddle points than that of Maragakis and Karplus [95].

2.4 Coarse-grained MD Models for Conformational Transitions

Coarse-grained MD approaches have been used to study large-scale structural changes

between multiple conformations. In these approaches the interactions between Cα monomers are modeled with an energetic bias towards the respective native structure. The folding free energy surface corresponding to each protein structure can be described by a smooth funnel-shaped basin with the native conformation having the minimum energy at the bottom of the basin. The main challenge in modeling conformational changes from one equilibrium conformation to another lies in combining these well-separated energy basins into a continuous energy surface. Two such models have been proposed recently, with promising extensions to multiple basins (to include additional intermediate conformations).

2.4.1 Takada-Onuchic-Wolynes model

In this model two or more off-lattice Gō-potentials [98] are connected smoothly [11], following an approach very similar to the model defined in Eq. 2.8. The model defines the single-basin energies $E(\Gamma|\Gamma_1)$ and $E(\Gamma|\Gamma_2)$, constructed using the contact maps obtained from the native structures in basin 1 and basin 2, respectively. Here, $\Gamma$ collectively represents the conformation of the protein, and $\Gamma_1$ and $\Gamma_2$ correspond to the native-state structures of the protein in basins 1 and 2, respectively (Fig. 2.3). For this two-basin model, the relative stability and energy barrier are adjusted via the parameters $\Delta E$ and $\Delta$ as follows: $\Delta E$ modulates the relative stability of the two basins by making one energy minimum higher than the other, and the coupling constant $\Delta$ makes a smooth double-basin potential,

\[
E_{\mathrm{MB}} = \frac{E(\Gamma|\Gamma_1) + E(\Gamma|\Gamma_2) + \Delta E}{2}
- \sqrt{\left(\frac{E(\Gamma|\Gamma_1) - E(\Gamma|\Gamma_2) - \Delta E}{2}\right)^{2} + \Delta^{2}}.
\tag{2.9}
\]

Figure 2.3: An illustration of the multiple-basin energy landscape of proteins. The two funnel-shaped single basins that are used for model construction are displayed by dashed lines. Below, schematics of two protein configurations (unbound and ligand-bound) are shown. Conformational change occurs with the rearrangement of some contacts. Contacts specific to the unbound conformation are broken, and new contacts are formed in the bound conformation. Thick solid bonds correspond to covalent linkages. Adapted from Okazaki et al. [11].

Ref. [11] reports simulation results from this model for the conformational transitions of four proteins (glutamine-binding protein, HIV-1 protease, dihydrofolate reductase and a structural analog of calmodulin) between their respective unbound and ligand-bound conformations. In this work, the transition barrier height was adjusted by increasing the coupling constant $\Delta$ (lowering the barrier height) until frequent visits to both basins could be observed. The relative stability $\Delta E$ of the two basins was also adjusted to give equally frequent transitions between basins. Sudden and infrequent transitions between energy basins are reported, with one protein showing signs of cracking.
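To make the roles of the two parameters in Eq. 2.9 concrete, the sketch below evaluates the multiple-basin energy from two single-basin energies. The function and argument names are hypothetical; the single-basin energies are passed in as plain numbers, so the Gō-potentials themselves are not modeled.

```python
import math


def multi_basin_energy(e1, e2, delta_e, coupling):
    """Smoothly couple two single-basin energies as in Eq. 2.9.

    e1, e2: single-basin energies E(G|G1) and E(G|G2) of one conformation.
    delta_e: offset that shifts basin 2 relative to basin 1.
    coupling: coupling constant Delta; larger values lower the barrier.
    """
    mean = 0.5 * (e1 + e2 + delta_e)
    half_gap = 0.5 * (e1 - e2 - delta_e)
    return mean - math.sqrt(half_gap ** 2 + coupling ** 2)
```

With zero coupling the expression reduces to the lower of `e1` and `e2 + delta_e`, which is the cusp-shaped intersection of the two basins. At the crossing point, where the two are equal, a nonzero coupling lowers the energy by exactly `coupling`, which is how the barrier height is tuned.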

2.4.2 Best-Hummer model

In the Best-Hummer model two or more single-basin off-lattice Gō potentials [99] are merged by summing the corresponding Boltzmann weights [100]. This approach is physically equivalent to pooling the accessible conformational sub-states defined by the individual energy functions. The combined energy of the system for this model is expressed as

\[
E_{\mathrm{MB}} = -\frac{1}{\beta}\ln\left[e^{-\beta E(\Gamma|\Gamma_1)} + e^{-\beta\left(E(\Gamma|\Gamma_2) + \epsilon_0\right)}\right],
\tag{2.10}
\]

where $\beta = 1/k_BT$ and $\epsilon_0$ is the offset parameter that balances the relative stability of the energy basins. The energy surfaces $E(\Gamma|\Gamma_1)$ and $E(\Gamma|\Gamma_2)$ correspond to single-basin minima at the two conformers $\Gamma_1$ and $\Gamma_2$, respectively, and can be extended to use a harmonic ENM (as done by Zheng et al. [101]) or all-atom transferable potentials.
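A numerically stable evaluation of the Boltzmann-weighted mixing in Eq. 2.10 can be sketched as below. The function and argument names are assumptions made for illustration, and subtracting the larger exponent before exponentiating is a standard numerical precaution rather than something prescribed by the model.

```python
import math


def boltzmann_mixed_energy(e1, e2, offset, beta):
    """Combine two single-basin energies by summing Boltzmann weights (Eq. 2.10).

    e1, e2: single-basin energies of one conformation.
    offset: the offset added to basin 2 to tune relative stability.
    beta: inverse temperature, 1 / (k_B T).
    """
    a = -beta * e1
    b = -beta * (e2 + offset)
    m = max(a, b)  # factor out the larger exponent to avoid overflow
    return -(m + math.log(math.exp(a - m) + math.exp(b - m))) / beta
```

For large `beta` the result approaches the lower of the two basin energies, while smaller values smooth the crossover between basins.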

This model was first applied to investigate the transition between the beta-sheet

(wild-type structure) and helix (the N11L-L12N double mutant structure) forms of the

N-terminal domain (switch region) of the dimeric Arc repressor. That study was able to

capture the conformational transition occurring on the micro- to millisecond time

scale (as suggested by NMR), with local unfolding of the switch region followed by

rapid refolding.

The Best-Hummer model was later modified to study the apo/holo conformational transition mechanism of the C-terminal domain of calmodulin (cCaM). Here, the combined potential is defined through

\[
E_{\mathrm{MB}} = -\frac{1}{\beta'}\ln\left[e^{-\beta' E(\Gamma|\Gamma_{\mathrm{apo}})} + e^{-\beta'\left(E(\Gamma|\Gamma_{\mathrm{holo}}) + \epsilon_0\right)}\right],
\tag{2.11}
\]

where $\beta'$ is not the inverse simulation temperature but determines the barrier height, and $\epsilon_0$ controls the relative population of the two states as before. This approach is very similar to the MB model of Takada-Onuchic-Wolynes, and the parameters $\beta'$ and $\epsilon_0$ play roles very similar to those of $\Delta$ and $\Delta E$ in Eq. 2.9, respectively. In these simulations an unfolded apo-state of cCaM with a minor population was observed, in agreement with experimental studies [102,103].

CHAPTER 3

MODEL AND METHODS

The kinetics of conformational transitions between well folded basins are generally

controlled by relatively low probability partially ordered ensembles. The main chal-

lenge is to describe the transition state ensembles at the residue level, giving site

specific description of the transition mechanism. In this thesis, large-scale conforma-

tional changes of proteins are investigated through a coarse-grained analytical model.

This approach is based on the variational model previously developed to study fold-

ing by Portman, Takada and Wolynes [26, 27, 104]. The site resolved variational

model has proven to be very reliable in characterizing the partially ordered ensembles in

the folding mechanism of two-state proteins. This model has been used successfully

to predict the transition state ensemble (TSE) of folding of the protein azurin, as judged by

comparison with the experimentally measured φ-values [105, 106]. The variational model has also been applied to the folding of a set of α, β and αβ proteins, for which theory and experiment agree well on the overall folding rates [107]. Recently, this model was also used to predict the evolution of the folding nucleus for a set of two-state fast-folding proteins in terms of capillarity theory [108].

In this chapter, I first describe the variational model, which I developed to study protein conformational transitions between two known structures. Next, I give a brief overview of the original variational model used to characterize folding.


3.1 Variational Model of Conformational Transitions

The variational method is based on a coarse-grained free energy functional that em-

phasizes the dominance of native interactions consistent with the principle of minimal

frustration [15, 20, 24, 109]. In this model, the configuration of a protein is modeled as a

collection of beads, represented by the Cα position vectors of N amino acid residues

of the polypeptide backbone. The interactions between beads are restricted to those

nearby in the native structures.

3.1.1 Hamiltonian of the protein system

The variational approach starts with a simple model Hamiltonian of the protein

[Fig. 3.1(a)],

H = Hchain + Hint. (3.1)

The first term ($H_{\mathrm{chain}}$) models a collapsed stiff chain of monomers located at positions $\{\mathbf{r}_i\}$,

\[
\beta H_{\mathrm{chain}} = \frac{3}{2a^{2}}\sum_{ij}\mathbf{r}_i \cdot \Gamma_{ij} \cdot \mathbf{r}_j + \frac{3}{2a^{2}}\,B\sum_{i}\mathbf{r}_i^{2},
\tag{3.2}
\]

where $\beta = 1/k_BT$ is the inverse temperature, $a$ is a microscopic length scale taken to be the root-mean-square distance between adjacent monomers in the chain, and $B$ is conjugate to the radius of gyration of the chain. The polymeric correlations between any two Cα positions are given by $\Gamma^{-1}$. (See Section 9.1 in the Appendix for an explicit form of $\Gamma$ for a freely-rotating stiff chain.) The second term in the Hamiltonian ($H_{\mathrm{int}}$) represents the (two-body) interactions between monomers distant in sequence. These interactions are modeled by a pairwise potential $u(r_{ij})$,

\[
H_{\mathrm{int}} = \sum_{[ij]}\epsilon_{ij}\,u(|\mathbf{r}_{ij}|),
\tag{3.3}
\]

where $\epsilon_{ij}$, the strength of the interaction, depends on the identity of residues $i$ and $j$ and is parametrized with Miyazawa–Jernigan contact energies [110]. The sum in Eq. 3.3 is restricted to pairs of residues in contact in the native structure. For convenience, the interaction potential $u(|\mathbf{r}_{ij}|)$ is modeled as a sum of three Gaussian potentials representing short (s)-, intermediate (i)-, and long (l)-range parts,

\[
u(r) = \sum_{k=(s,i,l)}\gamma_k \exp\left(-\frac{3}{2a^{2}}\,\alpha_k r^{2}\right),
\tag{3.4}
\]

where ($\alpha_s > \alpha_i > \alpha_l$) set the ranges of the short-, intermediate-, and long-range interactions, respectively. The intermediate-range term is repulsive ($\gamma_i > 0$) and the

long-range term is attractive (γl < 0); the intermediate- and long-ranged potential

parameters are chosen so that the sum of these two terms gives a potential well at

an appropriate distance for contacts in the native structure. The short-range term

is repulsive (γs > 0) and represents the hard core repulsion between residues. The

interaction Hamiltonian defined in Eq. 3.3 can describe the dynamics of folding to a single native conformation. To model conformational transitions between two known structures, the interaction potential, $H_{\mathrm{int}}$, needs to be modified. In particular, the sum over residue-pair contacts $[ij]$ in Eq. 3.3 is different for two distinct reference structures. In this case, the sets of contacts for the two reference structures, structure 1 and structure 2, are denoted as $[ij]_1$ and $[ij]_2$, respectively. Each set of contacts defines a distinct, folded energy basin. In this dissertation, I follow two different ways to model the energy of conformational transitions between two basins, which are discussed in Section 3.1.4.

Figure 3.1: A schematic of the variational method for the Hamiltonian of a protein system for a single conformation. The residues are represented as beads using the backbone alpha-carbon positions of a protein. (a) The total Hamiltonian of the protein model, $H = H_{\mathrm{chain}} + H_{\mathrm{int}}$, consists of the Hamiltonian for the backbone chain and the contact interactions. (b) We replace the interaction Hamiltonian with a solvable Hamiltonian expressed as a quadratic function of the positions of the monomers, $H_0 = H_{\mathrm{chain}} + H_{\mathrm{var}}$. The coefficients of the quadratic function determine how localized each residue is around its mean position. This figure is inspired by Shen et al. [12].

3.1.2 Reference Hamiltonian for two known structures

The partition function for the system with the model Hamiltonian (in Eq. 3.1),

\[
Z = \int \prod_i d\mathbf{r}_i \exp(-\beta H),
\tag{3.5}
\]

cannot be evaluated exactly. Instead, the free energy is approximated with the help of a reference density

\[
P_0(\{\mathbf{r}_i\}) \propto \exp\left[-\beta H_0(\{\mathbf{r}_i\})\right]
\tag{3.6}
\]

that corresponds to a partially ordered polymer. The reference Hamiltonian, $H_0$ [as depicted in Fig. 3.1(b)], is taken as a summation of single-body terms that localize each amino-acid residue (monomer) around its native position, $\mathbf{r}_i^{N}$,

\[
\beta H_0 = \beta H_{\mathrm{chain}} + \frac{3}{2a^{2}}\sum_i C_i\left(\mathbf{r}_i - \mathbf{r}_i^{N}\right)^{2}.
\tag{3.7}
\]

Here the $N$ variational parameters $\{C_i\}$ are a set of harmonic constraints conjugate to the fluctuations of the polymer about each of the native Cα positions $\{\mathbf{r}_i^{N}\}$. In general, the values $\{C_i\}$ control the inherent flexibility of each monomer in the ensemble.

For folding, the magnitude of these fluctuations distinguishes the two stable phases of the protein: the globule (or unfolded) state corresponds to large fluctuations (weak constraints, small $\{C_i\}$) and the native (folded) state corresponds to small fluctuations (strong constraints, large $\{C_i\}$). To model a conformational transition between two known native structures, we introduce another set of $N$ variational parameters, $\{\alpha_i\}$ ($0 \le \alpha_i \le 1$), to specify residue positions as an interpolation between $\{\mathbf{r}_i^{N_1}\}$ and $\{\mathbf{r}_i^{N_2}\}$ for structures 1 and 2, respectively (Fig. 3.2). As a result, the native positions $\mathbf{r}_i^{N}$ in Eq. 3.7 are redefined as

\[
\mathbf{r}_i^{N} \equiv \mathbf{r}_i^{N}(\alpha_i) = \alpha_i\,\mathbf{r}_i^{N_1} + (1 - \alpha_i)\,\mathbf{r}_i^{N_2}.
\tag{3.8}
\]

Figure 3.2: A schematic of the free energy profile of conformational transitions between two native (folded) sub-states 1 and 2, plotted as free energy versus reaction coordinate. Conformations 1 and 2 are separated by a barrier, with the transition state (TS) at the top of the barrier for some value of the reaction coordinate. The residues of the protein systems are represented as beads at the positions of the alpha-carbons.

The reference Hamiltonian with the set of constraints $[\{C_i\}, \{\alpha_i\}]$ presented here may be viewed as modeling the structural change of a protein from conformation 1 to conformation 2 through a change in the interpolation parameters $\{\alpha_i\}$ from 1 to 0. At the same time, the other set of variational parameters $\{C_i\}$ controls the magnitude of the fluctuations of each residue around the corresponding interpolated native positions, $\{\mathbf{r}_i^{N}(\alpha_i)\}$, during the transition. Unlike folding, the values of the constraints $\{C_i\}$ that describe conformational transitions are expected to be large for both of the folded native-state conformations 1 and 2. Still, each state has its distinct pattern of flexibility, and the flexibility of each residue can adapt along the transition route. In particular, it is possible for residues to be less localized near the transition state, revealing “cracking”, or local unfolding and refolding, during the conformational change.

Since the reference Hamiltonian (Eq. 3.7) is quadratic in the positions $\{\mathbf{r}_i\}$, $H_0$ can be expressed as

\[
\beta H_0 = \frac{3}{2a^{2}}\sum_{ij}\delta\mathbf{r}_i \cdot G^{-1}_{ij} \cdot \delta\mathbf{r}_j + \mathrm{const.}
\tag{3.9}
\]

In this notation, the Boltzmann weight for a constrained chain described by $H_0$ is proportional to

\[
\omega(\{C\}, \{\alpha\}) \propto \exp\left[-\frac{3}{2a^{2}}\sum_{ij}\delta\mathbf{r}_i \cdot G^{-1}_{ij} \cdot \delta\mathbf{r}_j\right],
\tag{3.10}
\]

where $G_{ij}$ denotes the correlations of monomers $i$ and $j$ relative to their mean locations,

\[
G_{ij} = \langle \delta\mathbf{r}_i \cdot \delta\mathbf{r}_j \rangle_0 / a^{2},
\tag{3.11}
\]

with $\delta\mathbf{r}_i$ the position of the $i$th monomer relative to its mean position $\mathbf{s}_i$,

\[
\delta\mathbf{r}_i = \mathbf{r}_i - \langle \mathbf{r}_i \rangle_0 = \mathbf{r}_i - \mathbf{s}_i.
\tag{3.12}
\]

Here, the average over the reference Hamiltonian $H_0$ is denoted by

\[
\langle \cdots \rangle_0 = \int \prod_i d\mathbf{r}_i \left[\cdots \exp(-\beta H_0)\right].
\tag{3.13}
\]

Connecting to the parameters of the reference Hamiltonian, the correlations $G_{ij}$ are given by the matrix inverse

\[
G^{-1}_{ij} = \Gamma_{ij} + (B + C_i)\,\delta_{ij},
\tag{3.14}
\]

while the mean position of each monomer is given by

\[
\mathbf{s}_i = \langle \mathbf{r}_i \rangle_0 = \sum_j G_{ij}\,C_j\,\mathbf{r}_j^{N}(\alpha_j).
\tag{3.15}
\]

Here, the reference structure interpolates between the coordinates of each native structure,

\[
\mathbf{s}_i = \sum_j G_{ij}\,C_j\left[\alpha_j\,\mathbf{r}_j^{N_1} + (1 - \alpha_j)\,\mathbf{r}_j^{N_2}\right].
\tag{3.16}
\]

3.1.3 Approximating the free energy surface
N1 N2 si = GijCj[αjrj + (1 − αj)rj ]. (3.16) j X 3.1.3 Approximating the free energy surface

The variational free energy surface is developed through the Feynman–Gibbs–Peierls–Bogoliubov expression

\[
F[\{C\}, \{\alpha\}] \approx -k_BT \log Z_0 + \langle H - H_0 \rangle_0,
\tag{3.17}
\]

where

\[
Z_0 = \int \prod_i d\mathbf{r}_i \exp(-\beta H_0)
\tag{3.18}
\]

is the partition function of the reference Hamiltonian. (The detailed derivation is shown in Section 9.2 in the Appendix.)

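Substituting the expressions for $H$ and $H_0$ gives the variational free energy

\[
F[\{C\}, \{\alpha\}] = E[\{C\}, \{\alpha\}] - T S[\{C\}, \{\alpha\}],
\tag{3.19}
\]

where the energy

\[
E[\{C\}, \{\alpha\}] = \sum_{[ij]}\epsilon_{ij}\,\langle u(|\mathbf{r}_{ij}|) \rangle_0
\tag{3.20}
\]

and the entropy

\[
S[\{C\}, \{\alpha\}]/k_B = \log Z_0 + \frac{3}{2a^{2}}\sum_i C_i\,\langle (\mathbf{r}_i - \mathbf{r}_i^{N})^{2} \rangle_0
\tag{3.21}
\]

are expressed as functions of the variational constraints $[\{C\}, \{\alpha\}]$.

Since $H_0$ is quadratic, $Z_0$ and all of the averages over $H_0$ can be expressed in terms of the correlations $G$ and the average positions $\{\mathbf{s}_i\}$ of the monomers. One useful way to calculate these averages is by introducing approximations to the density of monomer $i$, $n_i(\mathbf{r}) = \langle \delta(\mathbf{r} - \mathbf{r}_i) \rangle_0$,

\[
n_i(\mathbf{r}) = \left(\frac{3}{2\pi a^{2} G_{ii}}\right)^{3/2} \exp\left[-\frac{3}{2a^{2}}\frac{(\mathbf{r} - \mathbf{s}_i)^{2}}{G_{ii}}\right],
\tag{3.22}
\]

and the pair density between $i$ and $j$, $n_{ij}(\mathbf{r}) = \langle \delta(\mathbf{r} - (\mathbf{r}_i - \mathbf{r}_j)) \rangle_0$,

\[
n_{ij}(\mathbf{r}) = \left(\frac{3}{2\pi a^{2}\,\delta G_{ij}}\right)^{3/2} \exp\left[-\frac{3}{2a^{2}}\frac{(\mathbf{r} - (\mathbf{s}_i - \mathbf{s}_j))^{2}}{\delta G_{ij}}\right],
\tag{3.23}
\]

where $\delta G_{ij} = \langle (\delta\mathbf{r}_i - \delta\mathbf{r}_j)^{2} \rangle_0 / a^{2} = G_{ii} + G_{jj} - 2G_{ij}$. These densities depend on the constraint parameters $[\{C\}, \{\alpha\}]$ through $G_{ij}$ and $\{\mathbf{s}_i\}$. Averages over $H_0$ can be calculated through $n_i(\mathbf{r})$ and $n_{ij}(\mathbf{r})$. For example,

\[
\langle u(|\mathbf{r}_i - \mathbf{r}_j|) \rangle_0 = \int d\mathbf{r}\; n_{ij}(\mathbf{r})\,u(r).
\tag{3.24}
\]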

In this way, the variational free energy can be viewed as a density functional with a particular approximation to the density that simultaneously incorporates the polymeric correlations and the monomeric fluctuations about the average positions. Therefore, with some manipulation we can calculate the energy and entropy in terms of the monomer correlations and mean positions.

The entropy in Eq. 3.21 can be written more explicitly as

\[
S[\{C\}, \{\alpha\}]/k_B = \frac{3}{2}\log\det G - \frac{3}{2a^{2}}\sum_{ij}\mathbf{s}_i \cdot \Gamma^{(\mathrm{ch})}_{ij} \cdot \mathbf{s}_j + \frac{3}{2}\sum_i C_i\,G_{ii}.
\tag{3.25}
\]

The terms of the entropy in the above equation can be interpreted as follows. The first term is the entropy of the backbone polypeptide chain due to fluctuations, the second term is the entropy loss due to localizing each monomer around its average position, and the last term is the entropy of the vibrations about the mean position ($= (3/2a^{2})\sum_i C_i \langle \delta\mathbf{r}_i^{2} \rangle_0$). Similarly, the pair potential can be averaged over $H_0$ to give the energy

\[
E[\{C\}, \{\alpha\}] = \sum_{[ij]}\epsilon_{ij}\,u_{ij},
\tag{3.26}
\]

where

\[
u_{ij} = \langle u(|\mathbf{r}_{ij}|) \rangle_0
= \sum_{k=(s,i,l)}\frac{\gamma_k}{(1 + \alpha_k\,\delta G_{ij})^{3/2}} \exp\left[-\frac{3}{2a^{2}}\frac{\alpha_k\,(\mathbf{s}_i - \mathbf{s}_j)^{2}}{1 + \alpha_k\,\delta G_{ij}}\right].
\tag{3.27}
\]

Finally, we choose to measure the free energy relative to the unconstrained chain,

\[
\Delta F[\{C\}, \{\alpha\}] = \Delta E[\{C\}, \{\alpha\}] - T\,\Delta S[\{C\}, \{\alpha\}],
\tag{3.28}
\]

where, for example, $\Delta F[\{C\}, \{\alpha\}] = F[\{C\}, \{\alpha\}] - F[\{C = 0\}, \{\alpha\}]$.
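The averaged pair potential of Eq. 3.27 is straightforward to evaluate once the mean separation and $\delta G_{ij}$ of a contact are known. The sketch below is illustrative only: the function and argument names are hypothetical, and the values of $\gamma_k$ and $\alpha_k$ must be supplied by the caller because the actual potential parameters are not listed in this section.

```python
import math


def averaged_pair_potential(sep_sq, delta_g, gammas, alphas, a=1.0):
    """Average of the three-Gaussian pair potential over H_0 (Eq. 3.27).

    sep_sq: squared distance between mean positions, (s_i - s_j)^2.
    delta_g: delta G_ij = G_ii + G_jj - 2 G_ij (dimensionless).
    gammas, alphas: strengths and range parameters of the Gaussian terms
        (placeholders; real values come from the model parametrization).
    a: microscopic length scale.
    """
    total = 0.0
    for gamma, alpha in zip(gammas, alphas):
        width = 1.0 + alpha * delta_g  # broadening caused by fluctuations
        total += gamma * width ** -1.5 * math.exp(
            -1.5 * alpha * sep_sq / (a * a * width)
        )
    return total
```

Setting `delta_g` to zero recovers the bare potential of Eq. 3.4 evaluated at the mean separation, so the effect of fluctuations on each contact enters only through the factor $1 + \alpha_k\,\delta G_{ij}$.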

3.1.4 Modeling the energy of conformational transition

During the conformational transition of a protein from one single basin (conformation

1) to another (conformation 2), the energies of these two basins need to be connected smoothly. I have explored two different approaches. In the first approach, contacts present exclusively in one of the metastable states are switched on or off depending on the energy of the individual contact. In the second approach, the global energy determines the coupling of the two basins.

Switching of contact energy

In this approach, I consider a two-state model in which the contacts are separated into three sets: (i) contacts that occur in reference structure 1 only, (ii) contacts that occur in reference structure 2 only, and (iii) contacts in common from both reference structures 1 and 2. Then, I consider that each contact involved exclusively with only one structure is in equilibrium with energy from the other state (which is zero). That is, I replace the pair energy for contacts in sets (i) and (ii) according to

$$ \epsilon_{ij} u_{ij} = -k_B T \log\left[1 + \exp\left(-\epsilon_{ij} \langle u(r_{ij}) \rangle_0 / k_B T\right)\right]. \tag{3.29} $$

This form is analogous to the coupling between contact energies of conformational basins used in simulations of conformational change [97,111] (see Eq. 2.10). Contacts described by Eq. 3.29 independently switch on or off depending on the conformational density characterized by the set of constraints {C, α}.
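The switching behavior of Eq. 3.29 can be illustrated with a minimal Python sketch. The function name `switched_contact_energy` and its arguments are hypothetical; only the functional form is taken from the text. The logarithm is evaluated in a numerically stable way, which is an implementation choice of this sketch rather than part of the model.

```python
import math

def switched_contact_energy(eps_ij, u_avg, kT):
    """Effective contact energy of Eq. 3.29.

    eps_ij : contact strength (hypothetical argument name)
    u_avg  : averaged pair potential <u(r_ij)>_0
    kT     : thermal energy k_B T
    """
    x = -eps_ij * u_avg / kT
    # Evaluate log(1 + exp(x)) without overflow for large |x|.
    if x > 0:
        return -kT * (x + math.log1p(math.exp(-x)))
    return -kT * math.log1p(math.exp(x))

# Strongly favorable contact: the effective energy approaches eps_ij * u_avg (contact "on").
on_limit = switched_contact_energy(1.0, -50.0, 1.0)
# Strongly unfavorable contact: the effective energy approaches zero (contact "off").
off_limit = switched_contact_energy(1.0, 50.0, 1.0)
```

In the two limits the contact either contributes its full averaged energy or nothing, which is the sense in which each contact is in equilibrium with the zero-energy state.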

One difficulty with this approach arises in adjusting the relative stability of the two energy basins. Since I separate the contacts into different sets based on the overlap of the two contact maps, it is not convenient to adjust the relative stability with a simple offset parameter. This motivated me to introduce a second approach to couple the two basins together.

Multiple basin energy model

In this model, the energy of the system $E[\{C\},\{\alpha\}]$ is composed of two single-basin energies, $E_1[\{C\},\{\alpha\}]$ and $E_2[\{C\},\{\alpha\}]$, for conformations 1 and 2 of the protein, respectively. Following the multiple-basin energy model defined in Ref. [11], I define the energy using Eq. 2.9 as

$$ E[\{C\},\{\alpha\}] = \frac{E_1[\{C\},\{\alpha\}] + E_2[\{C\},\{\alpha\}] + E_0}{2} - \sqrt{\left(\frac{E_1[\{C\},\{\alpha\}] - E_2[\{C\},\{\alpha\}] - E_0}{2}\right)^2 + \Delta^2}, \tag{3.30} $$

where $E_0$ is the energy offset that adjusts the relative stability of the two basins and $\Delta$ is the coupling constant that modifies the energy barrier directly by connecting the two basins smoothly. The individual energy $E_1[\{C\},\{\alpha\}]$ ($E_2[\{C\},\{\alpha\}]$) is derived from two-body interactions between native contacts $[ij]_1$ ($[ij]_2$) in the single basin of conformation 1 (conformation 2), expressed as $E_1[\{C\},\{\alpha\}] = \sum_{[ij]_1} \epsilon_{ij} u_{ij}$ ($E_2[\{C\},\{\alpha\}] = \sum_{[ij]_2} \epsilon_{ij} u_{ij}$).

3.1.5 Analysis of conformational transition route

Analysis of the free energy surface $F[\{C\},\{\alpha\}]$, described by the set of variational parameters $\{C_i, \alpha_i\}$, follows the program developed to study folding [27]. The mechanism controlling the kinetics of the transitions is determined by the ensemble of structures characterized by the monomer density at the saddle points of the free energy. At this point, to simplify the present model, I restrict the interpolation parameters $\{\alpha_i\}$ to be the same for all residues, $\alpha_i = \alpha_0$, following Ref. [82]. With this modification, the numerical problem of finding saddle points with respect to the $2N$ variational parameters $\{C_i, \alpha_i\}$ simplifies to minimizing the free energy $F[\{C\},\alpha_0]$ with respect to the $N$ variational parameters $\{C_i\}$ for a given $\alpha_0$. For this, one needs to be able to differentiate the free energy with respect to $\{C_i\}$, $\partial_\alpha F = \partial F/\partial C_\alpha$ (here the subscript $\alpha$ labels a residue, not the interpolation parameter). These derivatives can be computed by the chain rule using the elementary derivatives $\partial_\alpha G_{ij} = -G_{i\alpha} G_{\alpha j}$ and $\partial_\alpha (\log\det G) = -G_{\alpha\alpha}$. The transition route connecting the two single energy basins for conformations 1 and 2 is found as follows.

First, I find the set of $\{C_i\}$ for the native states of the two known protein structures separately, using the folding program (discussed in the next section) at their corresponding folding temperatures $T_f$. Then, I use these $\{C_i\}$ as an initial guess to find the conformational transition route for a given transition temperature $T$, by minimizing $F[\{C\},\alpha_0]$ with respect to $\{C\}$ while changing $\alpha_0$ in small steps from 1 to 0.
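The two elementary derivatives quoted in this subsection can be checked numerically. The sketch below assumes $G^{-1} = \Gamma + \mathrm{diag}(C)$, which is consistent with the quoted derivatives but is stated here as an assumption, since Eq. 3.11 is not reproduced in this section. The 5-bead connectivity matrix `gamma` and the values of `C` are hypothetical toy inputs, not parameters of the model.

```python
import numpy as np

def correlations(gamma, C):
    """G = (Gamma + diag(C))^{-1}; this form of G^{-1} is an assumption of the sketch."""
    return np.linalg.inv(gamma + np.diag(C))

# Toy connectivity matrix for a 5-bead chain (hypothetical stand-in for Gamma^(ch)).
N = 5
gamma = 2.0 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
C = np.array([0.5, 1.0, 0.2, 0.8, 0.3])

G = correlations(gamma, C)
alpha, h = 2, 1e-6            # residue index of the derivative and finite-difference step
dC = np.zeros(N)
dC[alpha] = h

# Central finite differences with respect to C_alpha.
G_plus = correlations(gamma, C + dC)
G_minus = correlations(gamma, C - dC)
dG_numeric = (G_plus - G_minus) / (2.0 * h)
dlogdet_numeric = (np.linalg.slogdet(G_plus)[1] - np.linalg.slogdet(G_minus)[1]) / (2.0 * h)

# Analytic forms quoted in the text.
dG_analytic = -np.outer(G[:, alpha], G[alpha, :])   # -G_{i alpha} G_{alpha j}
dlogdet_analytic = -G[alpha, alpha]                 # -G_{alpha alpha}
```

Agreement between the numeric and analytic arrays is a quick sanity check before assembling the full gradient of the free energy by the chain rule.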

3.1.6 Order parameters

The order parameters described below give a residue-level structural characterization of the transition state ensemble (TSE). The first order parameter describes the change in main-chain conformational flexibility of the protein backbone along the transition route, while the second measures the per-residue structural similarity of the protein to each of its native state conformations.

Conformational flexibility

In the variational free energy surface, F [{C}, α0], the N variational parameters, {Ci}, are conjugate to the Debye-Waller factors {Bi}. Each Ci indicates how localized a given residue i is around the native state positions. Once the constraints {Ci} are determined, the main-chain conformational flexibility can be characterized by the mean-square fluctuations

$$ B_i[\{C\},\alpha_0] = \langle \delta r_i^2 \rangle_0 = a^2 G_{ii} \tag{3.31} $$

of each α-carbon of the polypeptide chain about its average position along the transition route as $\alpha_0$ goes from 1 (conformation 1) to 0 (conformation 2). Here, $\delta r_i$ is the displacement from the average position and the correlations $G_{ii}$ are given by Eq. 3.11. These natural order parameters of the reference Hamiltonian $H_0$, $B_i[\{C\},\alpha_0]$, are related to the B-factors (temperature factors) and contain information about the degree of structural order and flexibility of each residue.

Structural similarity during transition

For a given set of constraints, $[\{C\},\alpha_0]$, the monomer density of a partially ordered ensemble can be characterized by the Gaussian measure of similarity to conformation 1 described by $\{r_i^{N_1}\}$,

$$ \rho_i^{(1)}[\{C\},\alpha_0] = \left\langle \exp\left[-\frac{3\alpha^N}{2a^2}\left(r_i - r_i^{N_1}\right)^2\right] \right\rangle_0 = \left(1+\alpha^N G_{ii}\right)^{-3/2} \exp\left[-\frac{3\alpha^N}{2a^2}\,\frac{\left(s_i - r_i^{N_1}\right)^2}{1+\alpha^N G_{ii}}\right], \tag{3.32} $$

with $\alpha^N = 0.5$ defining the width of a Gaussian window about the metastable conformation 1. Similarly, the structural similarity to conformation 2 described by $\{r_i^{N_2}\}$ is defined as

$$ \rho_i^{(2)}[\{C\},\alpha_0] = \left(1+\alpha^N G_{ii}\right)^{-3/2} \exp\left[-\frac{3\alpha^N}{2a^2}\,\frac{\left(s_i - r_i^{N_2}\right)^2}{1+\alpha^N G_{ii}}\right]. \tag{3.33} $$

The structural similarities relative to the native structures, given by $\{\rho_i^{(1)}\}$ and $\{\rho_i^{(2)}\}$, specify local order parameters suitable for describing conformational transitions between metastable states in proteins.

To investigate the detailed main-chain dynamics controlling the structural change in CaM, we characterize the relative similarity to the closed structure along the transition route through the normalized measure

$$ \bar{\rho}_i^{(1)}(\alpha_0) = \frac{\rho_i^{(1)}(\alpha_0) - \rho_i^{(1)}(0)}{\rho_i^{(1)}(1) - \rho_i^{(1)}(0)}, \tag{3.34} $$

where $\rho_i^{(1)}(\alpha_0)$ is the monomer density of the $i$th residue of conformation 1 (Eq. 3.32). Similarly, we represent the relative structural similarity to conformation 2 as

$$ \bar{\rho}_i^{(2)}(\alpha_0) = \frac{\rho_i^{(2)}(\alpha_0) - \rho_i^{(2)}(1)}{\rho_i^{(2)}(0) - \rho_i^{(2)}(1)}, \tag{3.35} $$

where $\rho_i^{(2)}(\alpha_0)$ is the monomer density of the $i$th residue of conformation 2 (Eq. 3.33). For conformation 1 ($\alpha_0 = 1$), $\bar{\rho}_i^{(1)}(1) = 1$ and $\bar{\rho}_i^{(2)}(1) = 0$, while for conformation 2 ($\alpha_0 = 0$), $\bar{\rho}_i^{(1)}(0) = 0$ and $\bar{\rho}_i^{(2)}(0) = 1$. To represent the structural changes more clearly, it is convenient to consider the difference

$$ \Delta\bar{\rho}_i(\alpha_0) = \bar{\rho}_i^{(1)}(\alpha_0) - \bar{\rho}_i^{(2)}(\alpha_0) \tag{3.36} $$

for each residue. This difference ranges from $\Delta\bar{\rho}_i(1) = 1$ to $\Delta\bar{\rho}_i(0) = -1$, corresponding to conformations 1 and 2, respectively.

We also consider the global structural order parameters $Q_1(\alpha_0)$ and $Q_2(\alpha_0)$ to describe the conformational transition from the average of the normalized native densities $\bar{\rho}_i^{(1)}(\alpha_0)$ and $\bar{\rho}_i^{(2)}(\alpha_0)$ for conformations 1 and 2, respectively. Hence, we define

$$ Q_1(\alpha_0) = \frac{1}{N}\sum_i \bar{\rho}_i^{(1)}(\alpha_0), \tag{3.37} $$

and

$$ Q_2(\alpha_0) = \frac{1}{N}\sum_i \bar{\rho}_i^{(2)}(\alpha_0), \tag{3.38} $$

where $N$ is the total number of residues present in conformation 1 or 2. Finally, we calculate the global order parameter for the structural similarity by averaging $\Delta\bar{\rho}_i(\alpha_0)$,

$$ \Delta Q(\alpha_0) = Q_1(\alpha_0) - Q_2(\alpha_0) = \frac{1}{N}\sum_i \Delta\bar{\rho}_i(\alpha_0). \tag{3.39} $$

$\Delta Q(\alpha_0)$ describes the global structural change from conformation 1 ($\alpha_0 = 1$) to conformation 2 ($\alpha_0 = 0$).
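The order parameters of this subsection can be assembled in a few lines of Python. The sketch below is illustrative only: the three-residue reference structures `r1` and `r2`, the constant toy values of $G_{ii}$, and the linear interpolation of the mean positions are all assumptions made so that the example is self-contained. The normalization is chosen so that the endpoint values stated in the text hold.

```python
import numpy as np

ALPHA_N = 0.5  # width of the Gaussian window, as in the text

def native_density(G_diag, s, r_ref, a=1.0, alpha_n=ALPHA_N):
    """Per-residue Gaussian similarity to a reference structure (Eqs. 3.32 and 3.33)."""
    d2 = np.sum((s - r_ref) ** 2, axis=1)       # |s_i - r_i^N|^2
    w = 1.0 + alpha_n * G_diag                  # 1 + alpha^N G_ii
    return w ** -1.5 * np.exp(-1.5 * alpha_n * d2 / (a * a * w))

# Hypothetical three-residue reference structures (units of a).
r1 = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
r2 = np.array([[0.0, 1.0, 0.0], [1.0, 1.5, 0.0], [2.0, 0.5, 0.0]])
G_diag = np.full(3, 0.1)                        # toy G_ii, held fixed along the route

def densities(alpha0):
    # Assumed linear interpolation of the mean positions between the two structures.
    s = alpha0 * r1 + (1.0 - alpha0) * r2
    return native_density(G_diag, s, r1), native_density(G_diag, s, r2)

def delta_rho_bar(alpha0):
    rho1, rho2 = densities(alpha0)
    rho1_at1, rho2_at1 = densities(1.0)
    rho1_at0, rho2_at0 = densities(0.0)
    rho1_bar = (rho1 - rho1_at0) / (rho1_at1 - rho1_at0)   # similarity to conformation 1
    rho2_bar = (rho2 - rho2_at1) / (rho2_at0 - rho2_at1)   # similarity to conformation 2
    return rho1_bar - rho2_bar                             # Eq. 3.36

def delta_Q(alpha0):
    return float(np.mean(delta_rho_bar(alpha0)))           # Eq. 3.39
```

In a real calculation, `G_diag` and the mean positions would come from the minimized free energy at each $\alpha_0$ rather than from fixed toy values.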

3.2 Variational Model of Folding

The original variational model is used to characterize the protein folding mechanism from the unfolded (globule) state to the folded (native) state [Fig. 3.3]. In this model the Hamiltonian, $H$, given in Eq. 3.1, defines the potential of a protein system. The interaction potential $H_{int}$ (Eq. 3.3) for folding is calculated directly from the contact map, $[ij]$, of the single protein conformation, without any need to switch the contact energy or couple the energy basins as in the conformational transition model.

The reference Hamiltonian $H_0$ is also defined by Eq. 3.7, but for a single protein conformation $\{r_i^N\}$. Then, following a procedure similar to that of the conformational transition model, the variational free energy surface of folding, parameterized by the constraints $\{C_i\}$, is calculated as

F [{C}]= E[{C}] − TS[{C}]. (3.40)

Since $H_0$ is again quadratic for the folding model, the averages over $H_0$ can be expressed in terms of the correlations $G$ and the average positions $\{s_i\}$ of the monomers, following Eq. 3.11 and Eq. 3.15. Note that the average positions $\{s_i\}$ for folding are defined about the single native conformation, $\{r_i^N\}$, and need not be interpolated between two native structures as in the conformational transition model.

3.2.1 Analysis of folding route

The folding route of a protein is determined by the saddle points in $F[\{C_i\}]$. These saddle points are calculated numerically using an eigenvector-following algorithm [112]. This algorithm is similar to Newton's method for optimization, but involves diagonalizing the Hessian matrix, $\partial^2 F/\partial C_i \partial C_j$, at each iteration. In this routine, the point is updated by stepping in a direction that maximizes along the eigenvector with the lowest eigenvalue and minimizes along all others. To find a minimum, a step is taken to minimize along all eigenvectors of the Hessian. In order to use this algorithm, the first and second order derivatives of the free energy with respect to $\{C_i\}$ need to be calculated by the chain rule, in much the same way as discussed for the conformational transition route analysis in Section 3.1.5.

[Figure 3.3 schematic: free energy versus reaction coordinate, with the globule (G) and native (N) minima separated by the TSE barrier.]

Figure 3.3: An illustration of the free energy surface of a two-state folder at the folding temperature. The native (N) (folded) state and globule (G) (unfolded) state are separated by a barrier, corresponding to the transition state ensemble (TSE), at some value of the reaction coordinate. The main goal is to characterize the protein structure at the TSE.

The average folding route is characterized by a series of connected saddle points and local minima in the variational theory. These average pathways are found as follows:

First, the globule and native states are identified from the local minima with the largest and smallest entropy, respectively. (These are easy to identify since they are the only stable minima at high and low temperatures for the globule and native state, respectively.) These two minima are used as the initial guesses for the optimization algorithm with incremental temperature changes until both minima are found at the same temperature. Then linear combinations of these two sets of $\{C_i\}$ are used as initial guesses to find the saddle points in the free energy. Once a saddle point is found, the constraints $\{C_i\}$ are perturbed along the unstable eigenvector, and the closest minimum is found using the eigenvector-following algorithm with a small step size. This gives two local minima, one for each direction along the unstable eigenvector, connected by the saddle point. This process is repeated until the globule and native states are connected by a series of local minima and saddle points, which gives the average folding route and characterizes the transition states and local minima that are important in the folding kinetics.
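The saddle search and the subsequent descent along the unstable eigenvector can be sketched on a toy surface. The code below is a simplified variant of eigenvector following written for illustration; it is not the routine of Ref. [112]. The two-dimensional function $f(x,y) = (x^2-1)^2 + y^2$, the step cap `max_step`, and all function names are assumptions of the sketch. The toy surface has minima at $(\pm 1, 0)$ and a saddle at the origin.

```python
import numpy as np

def grad(p):
    """Gradient of the toy surface f(x, y) = (x^2 - 1)^2 + y^2."""
    x, y = p
    return np.array([4.0 * x * (x * x - 1.0), 2.0 * y])

def hess(p):
    """Hessian of the toy surface."""
    x, _ = p
    return np.array([[12.0 * x * x - 4.0, 0.0], [0.0, 2.0]])

def eigenvector_following(p0, find_saddle=True, max_step=0.2, tol=1e-10, max_iter=200):
    p = np.asarray(p0, dtype=float)
    for _ in range(max_iter):
        g = grad(p)
        if np.linalg.norm(g) < tol:
            break
        lam, vec = np.linalg.eigh(hess(p))       # eigenvalues in ascending order
        g_modes = vec.T @ g                      # gradient components along each eigenvector
        denom = np.maximum(np.abs(lam), 1e-8)
        step_modes = -g_modes / denom            # step downhill along every mode
        if find_saddle:
            step_modes[0] = g_modes[0] / denom[0]  # step uphill along the lowest mode
        step = vec @ step_modes
        norm = np.linalg.norm(step)
        if norm > max_step:                      # cap the step length
            step *= max_step / norm
        p = p + step
    return p

# Locate the saddle, then perturb along the unstable eigenvector and minimize.
saddle = eigenvector_following([0.6, 0.3], find_saddle=True)
unstable = np.linalg.eigh(hess(saddle))[1][:, 0]
minimum = eigenvector_following(saddle + 0.05 * unstable, find_saddle=False)
```

Repeating the last step with the opposite sign of the perturbation yields the second minimum connected by the same saddle point.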

3.2.2 Order parameters

The mean square fluctuations, $\{B_i\}$ (in Eq. 3.11), are a natural local order parameter that can characterize the partially ordered residues along the folding route. The $\{B_i\}$ (also related to the experimentally determined X-ray B-factor or temperature factor) are conjugate to the constraints $\{C_i\}$: weak $\{C\}$ about the positions of the average native structure correspond to large fluctuations in the unfolded (globule) state, whereas strong $\{C\}$ give much lower fluctuations in the folded (native) state.

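Alternatively, the structure at the saddle points of the variational free energy surface can be characterized by a Gaussian measure of similarity to the native structure. This is referred to as the native density,

$$ \rho_i = \left\langle \exp\left[-\frac{3}{2a^2}\,\alpha^N \left(r_i - r_i^N\right)^2\right] \right\rangle_0 = \left(1+\alpha^N G_{ii}\right)^{-3/2} \exp\left[-\frac{3}{2a^2}\,\frac{\alpha^N \left(s_i - r_i^N\right)^2}{1+\alpha^N G_{ii}}\right], \tag{3.41} $$

with $\alpha^N = 0.5$ defining the width of a Gaussian window about the native conformation.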

This measure of the monomer density relative to the native position is the order parameter used to describe folding. The degree of native structure at the transition state can be characterized by the normalized native density

$$ \bar{\rho}_i = \left(\rho_i - \rho_i^G\right) / \left(\rho_i^N - \rho_i^G\right), \tag{3.42} $$

where the superscripts G and N denote the native density evaluated at the globule and native state, respectively.

The native density and temperature factors of each residue are local order parameters that can describe local structure. However, in order to study the folding transition from the globule (unfolded) state to the native (folded) state, we also need a global order parameter that reflects the progress towards the native state. One way to define the global order parameter is as the average of the normalized native density,

$$ Q = \frac{1}{N}\sum_i \bar{\rho}_i, \tag{3.43} $$

where $N$ is the total number of residues present in the protein. This progress coordinate ranges from $Q = 0$ (globule) to $Q = 1$ (native).

3.3 Model Parameters

The parameters used in the variational model are the same for both folding and conformational transitions between two metastable states. The interaction potential is restricted to the contacts present in the native structure. The set of native contacts, $[ij]$, is defined to be pairs of residues ($i + 4 \le j$) whose C$_\beta$ (C$_\alpha$ for glycine) atoms lie within a 6.5 Å cutoff in the folded native structure. Residue pairs that are likely to have hydrogen bonds [as determined by the DSSP (Define Secondary Structure of Proteins) algorithm [113]] but fall outside the cutoff are also included in this set.
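The distance criterion for native contacts can be sketched as follows. The function `native_contacts` and the toy hairpin coordinates are hypothetical; the sketch applies only the sequence-separation and cutoff rules stated above and omits the additional DSSP hydrogen-bond pairs, which would have to be added separately.

```python
import numpy as np

def native_contacts(coords, cutoff=6.5, min_separation=4):
    """Return pairs (i, j) with j >= i + min_separation whose distance is within cutoff (Å).

    coords holds one representative atom per residue (C-beta, or C-alpha for glycine).
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Upper-triangle index pairs with j - i >= min_separation.
    i_idx, j_idx = np.triu_indices(len(coords), k=min_separation)
    mask = dist[i_idx, j_idx] <= cutoff
    return list(zip(i_idx[mask].tolist(), j_idx[mask].tolist()))

# Toy six-residue hairpin and straight chain with 3.8 Å spacing (hypothetical coordinates).
hairpin = [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [7.6, 0.0, 0.0],
           [7.6, 3.8, 0.0], [3.8, 3.8, 0.0], [0.0, 3.8, 0.0]]
straight = [[3.8 * k, 0.0, 0.0] for k in range(6)]

hairpin_contacts = native_contacts(hairpin)
straight_contacts = native_contacts(straight)
```

The hairpin brings sequence-distant residues within the cutoff, whereas the extended chain has no contacts under the same rule.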

[Figure 3.4 plot: $U(r)$ versus $r$ in units of $a$, for $0 \le r \le 4$.]

Figure 3.4: Illustration of the potential well used in the variational model. See the text for more detail.
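The shape of the potential in Fig. 3.4 can be checked against the parameter values quoted in this section. The sketch below assumes that $u(r)$ is a sum of three Gaussians, $u(r) = \sum_k \gamma_k \exp[-(3/2a^2)\alpha_k r^2]$, a form inferred from Eq. 3.27 rather than quoted from Eq. 3.4, and it works in units where the contact strength is 1. Under these assumptions, the long-range and intermediate-range terms alone give a minimum of depth $-1$ where $\exp[-(3/2)\alpha_l r^2/a^2] = 1/3$.

```python
import math

# (gamma_k, alpha_k) values quoted in the text; the Gaussian-sum form is an assumption.
LONG = (-6.0, 0.27)
INTERMEDIATE = (9.0, 0.54)
ALPHA_S = 3.0
GAMMA_S = 100.0 - (LONG[0] + INTERMEDIATE[0])   # chosen so that u(0) = 100

def u(r, a=1.0):
    """Assumed pair potential: sum of long, intermediate, and short-range Gaussians."""
    terms = [LONG, INTERMEDIATE, (GAMMA_S, ALPHA_S)]
    return sum(g * math.exp(-1.5 * al * (r / a) ** 2) for g, al in terms)

# Analytic minimum of the long + intermediate part: exp(-1.5 * 0.27 * r^2) = 1/3.
r_star = math.sqrt(math.log(3.0) / (1.5 * 0.27))
u_star = u(r_star)
```

The short-range term is negligible at the minimum, so the well depth stays close to $-1$ while the value at the origin is set entirely by $\gamma_s$.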

The parameters for the interaction potential $u(r)$ are chosen so that it has a minimum at $r^* = 1.6a$ with value $u_{ij}(r^*) = -1$, formed by the long-range attractive interaction ($\gamma_l = -6.0$, $\alpha_l = 0.27$) and the intermediate-range repulsive interaction ($\gamma_i = 9.0$, $\alpha_i = 0.54$) as in Ref. [27]. Excluded volume interactions are represented by a short-range repulsive potential with $\alpha_s = 3.0$, and $\gamma_s$ is chosen so that each contact has $u_{ij}(0)/\epsilon_0 = 100$, where $\epsilon_0 = k_B T$ is the basic energy unit of the Miyazawa-Jernigan scaled contacts [110]. The interaction potential $u(r)$ is plotted in Fig. 3.4 using Eq. 3.4 with the parameters given in this section.

CHAPTER 4

THE OPEN/CLOSED CONFORMATIONAL TRANSITION OF THE

N-TERMINAL DOMAIN OF CALMODULIN

4.1 Introduction

Many protein functions fundamentally depend on structural flexibility. Complex conformational transitions, induced by ligand binding for example, are often essential to proteins participating in regulatory networks or catalysis. More generally, a protein's ability to sample a variety of conformational sub-states implies that proteins have an intrinsic flexibility and mobility that influence their function [15, 16].

While experimental measurement can offer direct dynamical information about specific residues, the detailed mechanisms controlling conformational transitions between two metastable states often remain elusive. In this chapter,² we examine the open/closed conformational transition mechanism of the N-terminal domain of calmodulin (nCaM) to explore how calcium binding and target recognition can be understood through changes in the mobility and the degree of partial order of the protein backbone.

nCaM contains a pair of EF-hands (helix-loop-helix Ca$^{2+}$-binding motifs) made of helices A/B and C/D. These two EF-hands are connected by a flexible B/C helix-linker (see Fig. 4.1). The four helices of apo-nCaM are directed in a somewhat antiparallel fashion, giving the domain a relatively compact structure while leaving the Ca$^{2+}$-binding loops exposed. The conformational change induced by binding Ca$^{2+}$ can be described as a change in EF-hand interhelical angle (between helices A/B and C/D) from a nearly antiparallel (apo, closed conformation) to a nearly perpendicular (holo, open conformation) orientation. Further, this domain opening mechanism in nCaM indicates that binding of Ca$^{2+}$ occurs almost exclusively within EF-hands, not between them [115]. The structural rearrangement from closed to open exposes a large hydrophobic surface rich in methionine (Met) residues responsible for molecular recognition of various cellular targets such as myosin light chain kinase.

² This chapter has been adapted from Tripathi and Portman, Ref. [114].

The conformational dynamics of Ca$^{2+}$-loaded and Ca$^{2+}$-free CaM are well characterized by solution NMR [62, 116]. Site-specific internal dynamics monitored by model-free order parameters $S^2$ indicate that the helices of the apo-CaM domains are well folded on the picosecond to nanosecond timescale, while the Ca$^{2+}$-binding loops, helix-linker, and termini are more flexible [117]. On the other hand, spin-spin relaxation (or transverse auto-relaxation) rates, $R_2$, indicate that the free and bound forms of the regulatory protein exchange on the millisecond timescale [118].

Akke and coworkers have investigated the rate of conformational exchange between the open and closed conformational substates of the C-terminal CaM (cCaM) domain by NMR $^{15}$N spin relaxation experiments [119]. Comparison of exchange rates as a function of Ca$^{2+}$ concentration has established that the conformational exchange in apo-cCaM involves an equilibrium switching between the closed and open states that is independent of Ca$^{2+}$ concentration [117].

[Figure 4.1 images: ribbon structures of nCaM in panels (a) and (b) with helices A–D and Ca$^{2+}$-binding loops I and II labeled; panel (c) shows the sequence of residues 4–75, LTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADGNGTIDFPEFLTMMARK.]

Figure 4.1: The N-terminal domain of calmodulin (nCaM). (a) The Ca$^{2+}$-free (apo, closed) structure, PDB code 1cfd. (b) The Ca$^{2+}$-bound (holo, open) structure, PDB code 1cll. (c) The secondary structure of nCaM is shown with the one-letter amino acid sequence code for residues 4–75. The secondary structure of nCaM is as follows: helix A (5–19), Ca$^{2+}$-binding loop I (20–31), helix B (29–37), B/C helix-linker (38–44), helix C (45–55), Ca$^{2+}$-binding loop II (56–67), helix D (65–75). Note that the last three residues of binding loops I and II are also part of the exiting helices B and D. There are short β-sheet structures in binding loop I (residues 26–28) and loop II (residues 62–64).

X-ray crystallography temperature factors give additional insight into the conformational freedom and internal flexibility of CaM in the open and closed states. Recently, Grabarek proposed a detailed mechanism of Ca$^{2+}$-driven conformational change in EF-hand proteins based on the analysis of a trapped intermediate X-ray structure of a Ca$^{2+}$-bound CaM mutant [120]. This two-step Ca$^{2+}$-binding mechanism is based on the hypothesis that Ca$^{2+}$ binding and the resultant conformational change in all two-EF-hand domains is determined by a segment of the structure that remains fixed as the domain opens. This segment, called the EF-hand-β-scaffold, refers to the bond network that connects the two Ca$^{2+}$ ions. It includes the backbone and the two hydrogen bonds formed by the residues in the 8th position of the binding loops (Ile27 and Ile63) and the C=O groups of the residues in the 7th position of the binding loops (Thr26 and Thr62) [73]. Indeed, in the absence of Ca$^{2+}$, the N-terminal end of the binding loop is found to be poorly structured and very dynamic in NMR structures [61,119,121] and X-ray temperature factors [120]. The functional distinction between the two ends of the binding loops in the domain opening mechanism is buttressed by the great variability of the amino acid sequences of the N-terminal ends of the Ca$^{2+}$-binding loops compared with the more conserved C-terminal ends across a variety of different EF-hand Ca$^{2+}$-binding proteins [73]. In addition to extensive experimental work characterizing the inherent flexibility of CaM, our results also benefit from all-atom molecular dynamics simulations [122,123] as well as recent coarse-grained simulations inspired by models developed to characterize protein folding [124,125].

4.2 Methods

We use residues 4–75 of unbound nCaM [apo (closed), PDB:1cfd] and bound nCaM [holo (open), PDB:1cll]. These structures are shown in Fig. 4.1. The coordinates of the open and closed structures were rotated to minimize the rmsd of the α-carbons between the two structures [126]. We note that global alignment risks obscuring or averaging out some local structural differences. The temperature $T$ for the open/closed transition is taken to be the folding temperature ($T_f$) of the open structure, with $k_B T_f/\epsilon_0 = 2.0$. For comparison, the closed structure is more stable with a lower folding temperature, $k_B T_f/\epsilon_0 = 1.9$. Here, $\epsilon_0$ is the basic energy unit of the Miyazawa-Jernigan scaled contacts [110]. The folding temperatures of the two states of nCaM are obtained from the variational model of folding (Section 3.2 in Chapter 3).
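An rmsd-minimizing superposition of two α-carbon traces can be sketched with the Kabsch algorithm. This is one standard choice; whether it matches the procedure of Ref. [126] exactly is an assumption. The function name `kabsch_superpose` and the random test coordinates are hypothetical, and real input would be the α-carbon coordinates of residues 4–75 from the two PDB entries.

```python
import numpy as np

def kabsch_superpose(mobile, reference):
    """Rotate and translate `mobile` onto `reference` to minimize the rmsd (Kabsch algorithm)."""
    mobile = np.asarray(mobile, dtype=float)
    reference = np.asarray(reference, dtype=float)
    mob_c = mobile - mobile.mean(axis=0)          # center both point sets
    ref_c = reference - reference.mean(axis=0)
    # The SVD of the covariance matrix gives the optimal rotation.
    U, _, Vt = np.linalg.svd(mob_c.T @ ref_c)
    d = np.sign(np.linalg.det(U @ Vt))            # guard against an improper rotation
    R = U @ np.diag([1.0, 1.0, d]) @ Vt
    aligned = mob_c @ R + reference.mean(axis=0)
    rmsd = np.sqrt(np.mean(np.sum((aligned - reference) ** 2, axis=1)))
    return aligned, rmsd

# Toy check: a rotated and shifted copy should superpose back onto the original.
rng = np.random.default_rng(0)
reference = rng.normal(size=(10, 3))
theta = np.deg2rad(40.0)
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
mobile = reference @ Rz + np.array([1.0, -2.0, 0.5])
aligned, rmsd = kabsch_superpose(mobile, reference)
```

For two genuinely different conformations the residual rmsd is nonzero, and, as noted above, a single global fit can mask local differences between the structures.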

Here, we study the closed → open conformational transition of nCaM using our coarse-grained variational model of protein conformational transitions (see Section 3.1 in Chapter 3). In our model, we have defined closed nCaM as conformation 1 and open nCaM as conformation 2 in Eq. 3.8. Thus, the interpolation parameter $\alpha_0 = 1$ corresponds to the closed state, and $\alpha_0 = 0$ corresponds to the open state. This model accommodates the folded closed and open conformations of nCaM as minima of the calculated free energy surface. The energy of the closed to open conformational switching is modeled using the approach of Eq. 3.29, discussed in Subsection 3.1.4. In this approach the contacts of nCaM are separated as follows: (i) contacts that belong exclusively to the closed state of nCaM, (ii) contacts that belong exclusively to the open state of nCaM, and (iii) contacts common to both the open and closed states of nCaM. We then use Eq. 3.29 to switch the energy of contacts of types (i) and (ii).

The natural order parameters of this model, discussed in detail in Subsection 3.1.6, are well suited to describe the partially ordered ensembles essential to the conformational dynamics of flexible proteins. Transition routes and conformational changes of the protein are determined by constrained minimization of a variational free energy surface parameterized by the degree of localization of each residue about its mean position. The computational time to calculate the transition route for nCaM is on the order of several minutes on a typical single-processor PC.

4.3 Conformational Flexibility and Calcium Binding

The local mean square fluctuations of α-carbon positions (related to the temperature factors from X-ray crystallography) are a natural set of order parameters for the reference Hamiltonian $H_0$ (Eq. 3.7) in our model. This parameter, $B_i(\alpha_0) = \langle \delta r_i^2 \rangle_0$ (as shown in Eq. 3.31), contains information about the degree of structural order and conformational flexibility of each residue. In Fig. 4.2 we have plotted $B_i(\alpha_0)$ versus sequence number at different values of $\alpha_0$, the parameter that controls the uniform interpolation between the open structure ($\alpha_0 = 0$) and the closed structure ($\alpha_0 = 1$). Fig. 4.3 shows the corresponding 3D structures of the nCaM domain with the residues colored according to $B_i(\alpha_0)$. Aside from the very flexible ends of the two terminal helices A and D, the Ca$^{2+}$-binding loops and the helix-linker possess the highest flexibility.

The calculated fluctuations from our model exhibit very good qualitative agreement with X-ray temperature factors [120] and simulation results [124,127] of CaM.

[Figure 4.2 plot: $B_i$ in units of $a^2$ versus residue index, with curves for $\alpha_0$ from 0 (open, holo) to 1 (closed, apo); the secondary structure (helices A–D, loops I and II) is marked below the axis.]

Figure 4.2: Fluctuations $B_i(\alpha_0) = \langle \delta r_i^2 \rangle_0 = G_{ii} a^2$ vs sequence index of nCaM for selected values of the interpolation parameter $\alpha_0$ in the conformational transition route between open and closed. Here $a = 3.8$ Å is the distance between successive monomers. Different $\alpha_0$ are denoted by red ($\alpha_0 = 0$, open); green ($\alpha_0 = 0.2$); blue ($\alpha_0 = 0.4$); magenta ($\alpha_0 = 0.6$); orange ($\alpha_0 = 0.8$); and black ($\alpha_0 = 1$, closed). The secondary structure is indicated below the plot.

4.3.1 Binding loops

Each EF-hand in CaM coordinates Ca$^{2+}$ through a 12-residue loop: Asp20-Glu31 in loop I and Asp56-Glu67 in loop II. The C-terminal ends of the loops contain a short β-sheet (residues 26–28 in loop I and residues 62–64 in loop II) adjacent to the last three residues, which are part of the exiting helices B and D, respectively.

[Figure 4.3 images: three structures of nCaM with helices A–D and loops I and II labeled, for (a) $\alpha_0 = 1$, (b) $\alpha_0 = 0.4$, and (c) $\alpha_0 = 0$; color bar for $B_i(\alpha_0)$ from 0.2 to 2.0.]

Figure 4.3: Change in fluctuations $B_i(\alpha_0)$ in the nCaM domain during the closed to open conformational transition. The 3D structure in (a) corresponds to the interpolation parameter $\alpha_0 = 1$ (closed state); (b) corresponds to $\alpha_0 = 0.4$ (intermediate state); and (c) corresponds to $\alpha_0 = 0$ (open state). Red corresponds to low fluctuations and blue corresponds to high.

As shown in Fig. 4.2, the loops remain relatively flexible even in the open conformation. The highest flexibility is near the two glycines in position 4 of the Ca$^{2+}$-binding loops I (Gly23) and II (Gly59). This invariant Gly residue provides the sharp turn required for the proper geometry of the Ca$^{2+}$-binding sites [127,128]. The linker between helices B and C is also very mobile, with the highest flexibility near residue Glu45. Taken together, the mobility of the loops and B/C linker indicates that the domain opening depends entirely on a set of inherent dynamics, or "intrinsic plasticity", of CaM [62].

A closer look at the fluctuations of the Ca$^{2+}$-binding loops reveals that the N-terminal part of each loop is more flexible than the C-terminal part. This agrees with NMR data characterizing the flexibility of the N-terminal and C-terminal parts of loops III and IV of the C-terminal domain [117, 119]. Along the transition route (from closed → open), the N-terminal ends of the loops stiffen gradually. On the other hand, in the C-terminal part of the loops the short β-sheet structure (residues 26–28 in loop I and 62–64 in loop II) remains rigid (see Fig. 4.2 and 4.3). The last three residues of the loops (residues 29–31 in loop I/helix B and residues 65–67 in loop II/helix D) also remain relatively rigid, stabilized by the exiting helices B and D, respectively [128].

This immobile region, the EF-hand β-scaffold, is central to a recently proposed mechanism for CaM [120] and other EF-hand domains [73]. Fig. 4.2 shows that residues Thr26 and Ile27 (in the β-sheet of loop I) and Thr62 and Ile63 (in the β-sheet of loop II) remain very rigid during the domain opening.

It is also interesting to compare the relative flexibility of binding loops I and II. It is clear that binding loop II is more flexible than loop I in both conformations [see Fig. 4.2 and 4.3(a)]. In particular, the connection between helix A and binding loop I is much more rigid than the connection between helix C and binding loop II. This large difference in flexibility suggests that binding loop II of nCaM is more dominant in the mechanism for the structural transition. A similar mechanism in the C-terminal CaM domain was also observed in NMR studies, where the Ca$^{2+}$-dependent exchange contribution is dominated by binding loop IV, which has lower $S^2$ (higher flexibility) than loop III [117].

4.3.2 Helices B and C and the B/C linker

Fig. 4.2 and Fig. 4.3 also show that the bottom part of helix C (close to the B/C helix-linker) is very flexible in apo nCaM. Upon opening, the flexibility of helix C decreases significantly. Note the change in color from blue to white [Fig. 4.3(a)-(c)] at the bottom part (close to the B/C helix-linker) of helix C and from white to red at the middle part of helix C. In contrast, the top part of helix B (close to binding loop I; residues 29–31) becomes more flexible than the bottom part of helix B (close to the B/C helix-linker; residues 32–37) during the closed to open transition (see Fig. 4.2). We also note that residues 37–42 of the B/C helix-linker show a significant increase in flexibility during opening of the domain. This change in flexibility of the B/C helix-linker helps facilitate the concerted reorientation of helices B and C during the closed → open transition. Similar behavior was also observed in molecular dynamics simulations of CaM [122] for this six-residue segment (residues 37–42).

4.4 Conformational Transition Mechanism

The results discussed in the previous section give a picture of the closed to open transition in good overall agreement with experimental and simulation results on an isolated apo-CaM domain. Nevertheless, the analysis has focused primarily on the difference in the magnitude of fluctuations of the two metastable states. We now turn our attention to the predicted transition mechanism and the qualitative nature of structural changes along the transition route. Such a description addresses which structural changes along the route from closed to open are predicted to occur early or late, and which are predicted to happen gradually or cooperatively. While such details have yet to be revealed directly through measurement, in principle, site-directed mutagenesis experiments can be used to identify kinetically important structural regions of nCaM.

To clarify the transition route, we introduce a structural order parameter that measures the similarity to the open or closed state, $\Delta\rho_i(\alpha_0)$, given in Eq. 3.36. This order parameter is defined so that $\Delta\rho_i(\alpha_0) = 1$ corresponds to the closed conformation ($\alpha_0 = 1$) and $\Delta\rho_i(\alpha_0) = -1$ corresponds to the open conformation ($\alpha_0 = 0$) of the nCaM domain. Fig. 4.4 illustrates the conformational transition in the nCaM domain in terms of $\Delta\rho_i(\alpha_0)$ for each residue. An alternative representation of the same data is shown in Fig. 4.5; here, the value of $\Delta\rho_i(\alpha_0)$ is represented as colors ranging from red [$\Delta\rho_i(\alpha_0) = 1$] to white [$\Delta\rho_i(\alpha_0) = 0$] to blue [$\Delta\rho_i(\alpha_0) = -1$], superimposed on the interpolated structure for selected values of $\alpha_0$.

We first notice an early transition in the binding loops and in the central region of helix C, evident in Fig. 4.4. [See also the gradual change in color from red to blue in the structures of Fig. 4.5(a)-(d).] We also note the concerted structural change of parts of helices B and C and the flexible B/C helix-linker (residues 31–49). In particular, the flexible B/C helix-linker (residues 38–44) in Fig. 4.4 exhibits a cooperative transition. Residue Gln41, which is located in this linker region, is highly mobile according to NMR data [61, 121]. The change in color from red to blue in the B/C helix-linker in Fig. 4.5(a) and (b) indicates that the structural transition of the N-terminal part of this linker (close to helix B) occurs earlier than that of its C-terminal part (close to helix C).

[Figure 4.4 plot: $\Delta\rho_i$ versus residue index (4–75), ranging from 1 (closed, apo) to −1 (open, holo), with curves for $\alpha_0$ from 0 to 1; the secondary structure (helices A–D, loops I and II) is marked below the axis.]

Figure 4.4: Difference between the normalized native densities, $\Delta\rho_i$ (a measure of structural similarity), of each residue for different $\alpha_0$. The change in color from red to blue shows the closed → open conformational transition of nCaM. The measure is normalized to be −1 at the open state minimum ($\alpha_0 = 0$; blue) and 1 at the closed state minimum ($\alpha_0 = 1$; red). The secondary structure of nCaM is shown below the plot. Here, $\alpha^N$ in Eq. 3.32 and Eq. 3.33 is 0.5.

Fig. 4.4 and Fig. 4.5 also show a delayed initiation of structural change in residues 4–7 of helix A, residues 27–30 of binding loop I, and the N-terminal part of helix B. Specifically, the residues near the top part of helix B (close to binding loop I) and in binding loop I have very little structural change at the beginning of domain opening, with a sharp, cooperative transition near the end. [See the relatively slow color change (from red to blue) in this part of helix B and binding loop I in Fig. 4.5(a)-(d).] Although the middle part of helix C (residues 50–52) has some limited structural change early in the transition, it remains quite immobile after that. [See Fig. 4.4 and the early color change from red to blue in Fig. 4.5.]

4.4.1 Binding loops I and II

Because of the central importance of the interactions between the binding loops in the recently proposed two-step Ca2+-binding mechanism, this EFβ-scaffold region is highlighted in Fig. 4.6. In the first step of this binding mechanism, the Ca2+ is immobilized by the structural rigidity in the plane of the β-sheet and the ligands from the N-terminal part of the binding loops. In the second step, the backbone torsional flexibility of the EFβ-scaffold enables repositioning of the C-terminal part of the binding loop together with the exiting helix (helix B in loop I and helix D in loop II) [73]. Since the Ca2+ ions are not included in our model and we cannot characterize the backbone torsional flexibility of the EFβ-scaffold, our analysis is independent of that developed in Refs. [73, 120]. The closed to open conformational transition of each binding loop is quite different in Fig. 4.6. We predict that the structural changes in binding loop II occur before those in binding loop I upon domain opening (see the slower color change from red to blue in binding loop I than in loop II in Fig. 4.6). Since the flexibility of binding loop II is also greater, this suggests that during the Ca2+-binding process loop II dominates the overall conformational change between the closed and open states. This agrees with results based on the all-atom molecular dynamics simulations of nCaM discussed by Vigil et al. [123].

[Figure 4.5 image: 3D structures of nCaM in four panels, (a) α0 = 0.8, (b) α0 = 0.6, (c) α0 = 0.4 and (d) α0 = 0.2, with helices A–D and binding loops I and II labeled; color bar for ∆ρi from −1 to 1.]

Figure 4.5: Closed to open conformational transition of nCaM for different values of the interpolation parameter α0. The 3D structure in (a) corresponds to α0 = 0.8; (b) corresponds to α0 = 0.6; (c) corresponds to α0 = 0.4 and (d) corresponds to α0 = 0.2. The change in color from red to blue corresponds to different values of the normalized native density ∆ρi(α0) (a measure of structural similarity) of each residue for different α0. Red corresponds to ∆ρi(α0) = 1 (closed conformation, α0 = 1) and blue corresponds to ∆ρi(α0) = −1 (open conformation, α0 = 0).

[Figure 4.6 image: nine panels (a)–(i) showing binding loops I and II at α0 = 0.9, 0.8, ..., 0.1; color bar for ∆ρi from −1 to 1.]

Figure 4.6: Comparison of structural change in binding loops I (bottom) and II (top) in terms of the order parameter ∆ρi(α0). The 3D structures in (a)-(i) correspond to the interpolation parameter α0 = 0.9 to 0.1 during the closed to open transition. The change in color from red to blue corresponds to different values of ∆ρi(α0) (a measure of structural similarity) of each residue. Red corresponds to ∆ρi(α0) = 1 (closed conformation, α0 = 1) and blue corresponds to ∆ρi(α0) = −1 (open conformation, α0 = 0).

Fig. 4.6 also shows that the N-terminal ends of the loops have a relatively early transition compared to the C-terminal ends. Furthermore, the conformational change of the C-terminal end of binding loop I is more cooperative, presumably relying on the earlier structural change in binding loop II. Specifically, the closed state structure of the residue in position 9 (Thr28) of loop I is very stable, as shown in Fig. 4.7(a). This is due to a hydrogen bond between Thr28 and Glu31. Fig. 4.7(a) also suggests that the structural change of Glu31 occurs before that of Thr28 upon domain opening, and proceeds through the transition much more gradually. A similar hydrogen bond is also present between Asn64 and Glu67 in binding loop II. Compared to the corresponding residues in loop I, however, the structural change of these two residues is quite gradual [see Fig. 4.7(a)]. Nevertheless, Asn64 does seem to have a somewhat sharper transition than Glu67. Finally, residues Gly61 and Thr62 in binding loop II exhibit little structural change in Fig. 4.6 as the domain begins to open.

4.4.2 Methionine residues

The large hydrophobic binding surfaces that open in both domains of CaM are especially rich in Methionine residues, with four Methionines in each domain occupying nearly 46% of the total hydrophobic surface area [115]. These side chains, as well as other aliphatic residues such as Valine, Isoleucine and Leucine, which make up the rest of the hydrophobic binding surface, are highly dynamic in solution [129]. The flexibility of the residues composing the hydrophobic binding surface for target peptides explains CaM’s high degree of binding promiscuity. Here we consider the main-chain flexibility. The four Methionine residues in nCaM are situated in positions 36, 51, 71 and 72. The closed to open structural transitions of residues Met36 and Met71 are similar and relatively sharp compared to that of residue Met72, which is quite gradual, as shown in Fig. 4.7(b). This suggests that residues Met36 and Met71 remain relatively buried at the beginning of the domain opening. Curiously, Fig. 4.7(b) shows that residue Met51, in the middle part of helix C, has a sudden increase in ∆ρi at α0 = 0.5 during the closed to open conformational change.

[Figure 4.7 image: two panels plotting ∆ρi (−1 to 1) vs α0 (0 to 1); (a) Thr28, Glu31, Asn64 and Glu67; (b) Met36, Met51, Met71 and Met72.]

Figure 4.7: Dynamical behavior of residues during the conformational transition of nCaM. The normalized native density difference ∆ρi(α0) vs α0 is shown for two groups of four residues each. Structural transition of (a) residues in position 9 (Thr28 and Asn64) and position 12 (Glu31 and Glu67) of the two binding loops; (b) the four hydrophobic Methionine residues in positions 36, 51, 71 and 72.

4.5 Conformational Transition Rate and Order Parameters

The one-dimensional free energy profile parameterized by the interpolation parameter α0 is shown in Fig. 4.8. The minimum corresponding to the open state is very shallow and unstable compared to the closed state. Combined molecular dynamics simulations and small angle X-ray scattering studies on apo nCaM and Ca2+-bound nCaM by Vigil et al. [123] have also shown that in aqueous solution the closed state dominates the population. The equilibrium populations for the closed and open states from our model are found to be 94% and 6%, respectively. For comparison, the NMR measurements of apo cCaM indicate a minor population of 5–10% [117]. These results suggest that, on average, the residues in the hydrophobic surface of CaM are well protected from solvent.
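As a rough consistency check on the quoted populations, the free energy gap between the two states can be estimated from the population ratio. The sketch below assumes a strict two-state Boltzmann picture, which is a simplification and not necessarily how the populations were obtained from the model; the function name is hypothetical.

```python
import math


def two_state_free_energy_gap(p_major: float, p_minor: float) -> float:
    """Return (F_minor - F_major) / kT for two Boltzmann-weighted states.

    Assumes p is proportional to exp(-F / kT) for each state (two-state sketch).
    """
    if p_major <= 0.0 or p_minor <= 0.0:
        raise ValueError("populations must be positive")
    return math.log(p_major / p_minor)


# Populations quoted in the text: 94% closed, 6% open.
gap = two_state_free_energy_gap(0.94, 0.06)
print(f"F_open - F_closed is about {gap:.2f} kT")
```

Under this assumption the open state lies roughly 2.75 kT above the closed state, which is in line with the shallow open minimum described for Fig. 4.8.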

The maximum of the free energy occurs quite close to the open state at α0 ∼ 0.2, though the barrier is very broad in terms of this reaction coordinate. We also consider the free energy as a function of the global structural parameter $\Delta Q(\alpha_0) = Q_1(\alpha_0) - Q_2(\alpha_0) = \sum_i \Delta\rho_i(\alpha_0)/N$, where ∆ρi(α0) is given in Eq. 3.36 and N is the total number of residues in nCaM. Fig. 4.8 shows that ∆Q(α0) is also a reasonable reaction coordinate for the transition. The barrier broadens somewhat, with the maximum free energy occurring around ∆Q(α0) = −0.25. In terms of the global structure, this roughly corresponds to 60%–75% of nCaM being similar to the open state configuration in the transition state ensemble.

[Figure 4.8 image: F/kT (0 to 4) plotted against α0 (lower axis, 0 to 1) and ∆Q (upper axis, −1 to 1), with the open and closed minima labeled.]

Figure 4.8: Free energy along the transition route. In the lower curve the abscissa is the interpolation parameter α0. In the upper curve the abscissa is the global structural order parameter ∆Q(α0). The entropy across the transition is relatively constant, so that the free energy barrier is largely energetic.
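The global order parameter is simply the residue average of ∆ρi, so it is easy to sketch in code. The example below uses hypothetical per-residue values (not data from this work), and the linear map from ∆Q to an "open-like fraction" is an assumption introduced only to illustrate how a value such as ∆Q = −0.25 relates to the quoted 60%–75% range.

```python
from typing import Sequence


def global_order_parameter(delta_rho: Sequence[float]) -> float:
    """Delta Q = (1/N) * sum_i delta_rho_i, the residue-averaged order parameter."""
    if not delta_rho:
        raise ValueError("need at least one residue")
    return sum(delta_rho) / len(delta_rho)


def open_like_fraction(delta_q: float) -> float:
    """Map Delta Q in [-1, 1] linearly onto [0, 1] (illustrative assumption).

    Delta Q = -1 (fully open) maps to 1.0; Delta Q = +1 (fully closed) maps to 0.0.
    """
    return (1.0 - delta_q) / 2.0


# Hypothetical per-residue values chosen so that Delta Q = -0.25.
example = [-1.0, -0.5, 0.0, 0.5]
dq = global_order_parameter(example)
print(dq, open_like_fraction(dq))
```

With this linear map, ∆Q = −0.25 corresponds to an open-like fraction of 0.625, which falls inside the range quoted in the text.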

Even though the open state minimum is not well isolated, we estimate the conformational transition rate from closed to open using the Arrhenius form, $k = k_0 e^{-\Delta F^{\dagger}/k_B T}$, where $\Delta F^{\dagger}$ is the free energy difference between the closed conformation and the transition-state ensemble. Assuming the prefactor k0 = 1 µs⁻¹ gives the estimate k = 40,000 s⁻¹. This value is in reasonable agreement with the transition rate estimate of k = 20,000 s⁻¹ based on NMR exchange rate data of cCaM [117].
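The Arrhenius estimate can be checked with a few lines of arithmetic. The sketch below uses only the prefactor and rates quoted in the text; the barrier height is back-calculated from them rather than taken from the free energy profile, and the function names are hypothetical.

```python
import math


def arrhenius_rate(k0: float, barrier_kt: float) -> float:
    """k = k0 * exp(-barrier), with the barrier expressed in units of kT."""
    return k0 * math.exp(-barrier_kt)


def implied_barrier(k0: float, k: float) -> float:
    """Barrier (in kT) implied by a rate k and a prefactor k0."""
    return math.log(k0 / k)


k0 = 1.0e6  # 1 per microsecond, expressed in s^-1

barrier_model = implied_barrier(k0, 4.0e4)  # rate estimated from the model
barrier_nmr = implied_barrier(k0, 2.0e4)    # rate inferred from NMR exchange data
print(f"model: {barrier_model:.2f} kT, NMR: {barrier_nmr:.2f} kT")
print(f"difference: {barrier_nmr - barrier_model:.2f} kT")
```

With the same prefactor, the factor of two between the two rates corresponds to a barrier difference of ln 2, about 0.69 kT, which gives a sense of what "reasonable agreement" means here.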

4.6 Conclusion

In this chapter, we study the intrinsic flexibility and structural change of nCaM during its open/closed transition. The predicted transition route from our model gives a detailed picture of the interplay between structural transition, conformational flexibility and function of the N-terminal domain of calmodulin (nCaM). The results from our model are largely consistent with the important role that the immobile EFβ-scaffold region plays in the transition mechanism. Dissection of the transition route of this region further suggests that it is the early structural change of loop II that drives the cooperative completion of the interactions between the loops in the open structure.

The strong qualitative agreement with available experimental measurements of flexibility is an encouraging validation of the model. Recently, the folding dynamics of a zinc-metallated protein (azurin) was studied using a similar variational model and compared with experiments for the detailed coordination reaction coupled with the entatic state [130]. A similar future study of the detailed coordination reaction for the complete description of conformational change stabilized by ion binding in CaM seems very promising. Ultimately, we wish to extend this model to investigate the binding mechanism and kinetic paths of several peptides to Ca2+-loaded CaM. Since large conformational changes coupled to binding depend fundamentally on the fluctuations of partially folded conformations [131], this polymer-based variational formalism can accommodate coupled folding and binding very naturally.

CHAPTER 5

CONFORMATIONAL TRANSITION MECHANISMS OF THE

EF-HANDS OF CALMODULIN

5.1 Introduction

To understand protein function, it is often essential to characterize large conformational changes that occur upon ligand binding. Within the population-shift mechanism of allosteric conformational change [132], the bound complex is formed when a ligand selects and stabilizes a weakly populated conformational ensemble from the kinetically accessible states of the unbound folded protein. Consequently, the kinetics of large scale conformational transitions between two meta-stable states are determined largely by the inherent conformational dynamics within the folded state free energy basin. Such conformational dynamics imply an inherent flexibility or “intrinsic plasticity” of the folded state. In this chapter³, we focus on how this inherent flexibility of a protein molecule influences the mechanism controlling the kinetics of allosteric transitions. There are a couple of possible scenarios for the transition mechanism in terms of changes in conformational flexibility. One possibility is that the flexibility adjusts smoothly to the conformational deformation between two specific meta-stable states in the free energy surface. In this case, the inherent flexibility of the protein remains relatively constant if the meta-stable states have similar flexibilities, or else changes smoothly between the flexibilities of the meta-stable states if the flexibilities are different. Another possible mechanism, called “cracking” [10], has recently been proposed as an alternative that may be important in some conformational transitions. In terms of conformational flexibility, cracking involves non-monotonic changes in flexibility along the transition route. In particular, the flexibility of specific regions of the protein may transiently increase through local unfolding. As explained in Ref. [10], local unfolding can relieve specific areas of high stress during conformational deformations that would result in high free energy barriers if the protein remained uniformly folded throughout the transition.

³ This chapter has been adapted from Tripathi and Portman, Ref. [133].

In the model developed to explore this idea, cracking is introduced directly into the formalism as a means to incorporate nonlinear elasticity in an otherwise harmonic description of conformational fluctuations (i.e., normal modes) [10, 94]. This work clearly shows that local unfolding can dramatically lower predicted free energy barriers between local free energy minima and hence facilitate faster kinetics. In contrast, the variational model developed in the present work does not assume cracking from the outset. The evidence of cracking here arises entirely from the analysis of the local changes in flexibility along transition routes predicted by an inherently non-linear model of conformational transitions. Our results are thus an independent verification of the ideas and formalism developed in the model presented in Refs. [10] and [94].

Cracking does not occur in all conformational transitions, but even when it does, the average conformations of the local minima in the free energy may show little sign that local unfolding is involved [10]. Instead, cracking is a subtle consequence of the nature of the conformational deformation required to connect the two states. Since cracking is akin to folding of the whole protein, albeit on a constrained and local scale, it is reasonable to ask if some insight appropriated from the success of the energy landscape theory of protein folding [109] can help to anticipate when cracking is likely to be important in conformational transitions between two distinct folded conformations.

One motivation for the work in this chapter is to investigate if the cracking mechanism of conformational transitions, like the mechanism for folding of two-state proteins [134], is determined by the structural topology of the two meta-stable states. To address this question we study the conformational transitions of the open and closed conformations of the two homologous domains of calmodulin (CaM) as well as a fragment of CaM involving parts of both domains. Aside from general theoretical implications, the results presented in this chapter are interesting from the point of view of understanding the well-known and intensely studied thermodynamic differences between the two domains of CaM [135, 136].

Our approach is based on a coarse-grained variational model previously developed to characterize protein folding [26]. This model is in harmony with several recent coarse-grained simulations based on a folded-state biased interaction potential that interpolates between the contact maps of the meta-stable states [11, 95, 97, 100, 124, 137]. Interestingly, some of these recently proposed coarse-grained models for protein conformational transitions can also capture cracking or local partial unfolding during the transition [11, 100, 137]. In this chapter we compare the detailed predicted transition routes for the open/closed conformational change of the C-terminal domain of calmodulin (cCaM) with that of the N-terminal domain of CaM (nCaM). The open and closed states of each domain are shown in Fig. 5.1(a–b). Each domain of CaM contains a pair of helix-loop-helix Ca2+-binding motifs called EF-hands. In nCaM, EF-hands 1 and 2 consist of helices A/B and C/D [Fig. 5.1(a)], respectively, whereas helices E/F and G/H in cCaM [Fig. 5.1(b)] form EF-hands 3 and 4, respectively. Hence, nCaM and cCaM are known as “odd-even” paired EF-hands due to the odd-even sequential numbering of their EF-hands.

[Figure 5.1 image: paired 3D structures with helices and binding loops labeled: (a) apo-nCaM and holo-nCaM (helices A–D, loops I and II); (b) apo-cCaM and holo-cCaM (helices E–H, loops III and IV); (c) apo-CaM2/3 and holo-CaM2/3 (helices C–F, loops II and III).]

Figure 5.1: Three dimensional structures of calmodulin (CaM) domains and the EF-hands 2 and 3 fragment. The apo-CaM and holo-CaM structures shown here correspond to human CaM with PDB ID codes 1cfd and 1cll, respectively. (a) The closed, apo and open, holo conformations of the N-terminal domain of CaM (nCaM) consist of helices A/B and C/D with binding loops I and II, respectively. (b) The closed, apo and open, holo conformations of the C-terminal domain of CaM (cCaM) consist of helices E/F and G/H with binding loops III and IV, respectively. (c) The apo and holo conformations of EF-hands 2 and 3 of CaM (CaM2/3) consist of helices C/D and E/F with binding loops II and III, respectively.

Although nCaM and cCaM have similar structures, the calculated transition routes predict that cracking is essential only in cCaM. It is well established that the two domains differ in flexibility and binding affinity for Ca2+ [135, 136]. These properties are not determined by topology (because the topologies of the two domains are the same) but ultimately by more subtle structural differences encoded in the sequences of the two domains. Our results suggest that nCaM is flexible enough that cracking is not necessary to relieve stress during the conformational change, while cCaM is relatively rigid, so that cracking is involved in essentially the same conformational change. This is the central result of this chapter.

As an additional illustration of modeling conformational transitions in CaM, we investigate the conflicting Ca2+ binding mechanism in the odd-even paired EF-hands (nCaM and cCaM) and “even-odd” paired EF-hands (CaM2/3) of CaM. The fragment CaM2/3 contains EF-hands 2 and 3 with helices C/D and E/F [Fig. 5.1(c)], respectively. CaM2/3 is known as even-odd paired EF-hands due to the even-odd sequential numbering of its EF-hands. In CaM2/3, an EF-hand from each domain is connected by a flexible helical linker. The extended helical linker separating the two domains shown in the X-ray structure of holo-CaM is known to be flexible in solution [138, 139], making holo-CaM structurally more flexible overall than apo-CaM [61, 140].

In contrast to cCaM and nCaM [140], CaM2/3 has a molten globule-like character in the absence of Ca2+, and folds into a structure similar to an intact domain of CaM in the presence of Ca2+, as found from NMR [141]. Accordingly, a realistic model of Ca2+ binding to CaM2/3 would have to treat ion binding and folding on an equal footing, as was done for the folding of azurin in the presence of zinc ions in Ref. [130]. At the risk of overly simplifying the binding mechanism of CaM2/3, we keep the focus of this chapter on conformational transition routes between two specified conformations instead of binding-induced folding. In particular, we take the holo state of CaM2/3 from the X-ray structure of intact CaM, as shown in Fig. 5.1(c).

While this is a convenient way to define the conformation endpoint structures (and has a certain consistency with our studies of cCaM and nCaM), both the apo-CaM2/3 and the holo-CaM2/3 structures are expected to be low probability conformations of CaM2/3 in solution: the apo-CaM structure assumed here is unstable to a molten globule state; and the holo-CaM structure inferred from the X-ray structure of CaM is unstable to a structurally compact domain like holo-cCaM or holo-nCaM [141].

Our study of the transition between the conformations of CaM2/3 defined by the structures of the intact CaM suggests that the highly flexible central linker region would locally unfold before gaining helical structure. Even though this model of the binding dynamics of CaM2/3 is based on low probability conformations in solution [138, 139], our predicted transition route can rationalize the sequential Ca2+ binding in CaM2/3 [141] in terms of the differences in conformational flexibility of the Ca2+-binding loops in CaM2/3. Finally, we address how our comparative study of conformational transitions in odd-even and even-odd paired EF-hands reveals the interplay between intrinsic plasticity, target binding affinity, and function of CaM.

5.2 Methods

The structures of the two conformations (apo and holo) of the transition for each protein system are translated and rotated to have the same center of mass and minimum root-mean-square deviation of the Cα positions. The conformational transition in cCaM is modeled from residues 76–147 of the apo-CaM (PDB ID code 1cfd) [61] and holo-CaM (PDB ID code 1cll) [142] structures, while the conformational transition in nCaM is modeled from residues 4–75 of the same PDB structures. The apo to holo conformational transition of the even-odd paired EF-hand motifs CaM2/3 is modeled from EF-hands 2 and 3, residues 46–113 of the apo-CaM (PDB ID code 1cfd) and holo-CaM (PDB ID code 1cll) structures.
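The superposition step described above (common center of mass, minimum RMSD over Cα positions) can be illustrated with a small sketch. The text does not say which algorithm was used; the example below assumes the quaternion formulation of Horn, in which the minimum RMSD follows from the largest eigenvalue of a 4×4 matrix, found here by simple power iteration. All names and coordinates are hypothetical, and reading the Cα coordinates from the PDB files is left out.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def centered(coords: Sequence[Vec]) -> List[Vec]:
    """Translate coordinates so that their centroid is at the origin."""
    n = len(coords)
    c = [sum(p[k] for p in coords) / n for k in range(3)]
    return [(p[0] - c[0], p[1] - c[1], p[2] - c[2]) for p in coords]


def min_rmsd(a: Sequence[Vec], b: Sequence[Vec], iters: int = 2000) -> float:
    """Minimum RMSD between two equal-length point sets over rigid-body motions."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("need two non-empty coordinate sets of equal length")
    a, b = centered(a), centered(b)
    n = len(a)
    # Correlation matrix S[x][y] = sum_k a_k[x] * b_k[y].
    s = [[sum(p[x] * q[y] for p, q in zip(a, b)) for y in range(3)] for x in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    # Horn's symmetric 4x4 matrix; its largest eigenvalue gives the best overlap.
    nmat = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    # Gershgorin shift makes (N + shift * I) positive semidefinite, so power
    # iteration converges to the eigenvector of the largest eigenvalue of N.
    shift = max(sum(abs(v) for v in row) for row in nmat)
    q = [1.0, 0.5, 0.25, 0.125]
    norm = math.sqrt(sum(v * v for v in q))
    q = [v / norm for v in q]
    for _ in range(iters):
        w = [sum(nmat[i][j] * q[j] for j in range(4)) + shift * q[i] for i in range(4)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm == 0.0:  # all coordinates coincide with the centroid
            break
        q = [v / norm for v in w]
    # Rayleigh quotient of the converged unit vector.
    lam = sum(q[i] * sum(nmat[i][j] * q[j] for j in range(4)) for i in range(4))
    ga = sum(x * x + y * y + z * z for x, y, z in a)
    gb = sum(x * x + y * y + z * z for x, y, z in b)
    return math.sqrt(max(0.0, (ga + gb - 2.0 * lam) / n))


# Hypothetical coordinates: "moved" is "ref" rotated by 90 degrees about z and shifted.
ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0), (1.0, 1.0, 1.0)]
moved = [(-y + 1.0, x + 2.0, z + 3.0) for x, y, z in ref]
print(min_rmsd(ref, moved))
```

For a rigidly moved copy the result is zero up to rounding. In practice a library routine for structural superposition would normally be preferred over hand-written power iteration; the sketch only makes the two steps (centering, optimal rotation) explicit.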

Here, we follow the same approach to study the apo/holo conformational change of each protein system that we discussed in the Methods Section 4.2 in Chapter 4. The folding temperatures of nCaM and cCaM are found to be very similar; therefore, the transition temperature for both domains is chosen to be the same, kBT/ε0 = 2.0, where ε0 = 0.6 kcal/mol in terms of the Miyazawa-Jernigan energy scale [110]. For the CaM2/3 fragment, the transition temperature is chosen to be kBT/ε0 = 1.3, close to the folding temperature of apo-CaM2/3.

5.3 N-Terminal and C-Terminal Domains of CaM

In this section we analyze the conformational flexibility and the transition mechanism of the two domains of CaM. The results for the open/closed transition of nCaM presented here are taken from our earlier study in Chapter 4.

5.3.1 Conformational flexibility of the CaM domains

We first compare the inherent flexibility of the two domains of CaM by calculating the mean-square fluctuations Bi(α0) (Eq. 3.31) for each residue in nCaM and cCaM for the open/closed conformational transition of each domain. The Bi(α0) (related to the temperature factors from X-ray crystallography) contain information about the degree of structural order and conformational flexibility of each residue. Fig. 5.2(a) shows Bi(α0) along the open/closed conformational transition for each residue of nCaM from our studies in the previous chapter. The magnitude of the fluctuations for each residue of cCaM for the open/closed transition route is shown in Fig. 5.2(c). Comparison of Fig. 5.2(a) and 5.2(c) shows that although the binding loops and helix-linker in both domains are very flexible, apo (closed)-nCaM is inherently more flexible than apo (closed)-cCaM. For nCaM, the flexibility of binding loops I and II and the B/C helix-linker between EF-hands 1 and 2 decreases upon domain opening. Similarly, the flexibility of the Ca2+-binding loops III and IV decreases considerably during the opening of cCaM. The fluctuations of the F/G helix-linker between EF-hands 3 and 4 behave differently along the transition route. The flexibility of this region of the protein increases to a relatively high value in the intermediate states between the open and closed conformations before decreasing to the flexibility of the folded open conformation [Fig. 5.2(c)]. We also note that the F/G helix-linker in the open structure is more flexible than in the closed structure. The calculated conformational flexibility suggests that, unlike the B/C helix-linker in apo (closed)-nCaM, the F/G helix-linker in apo (closed)-cCaM is relatively less flexible. In contrast to the B/C helix-linker of nCaM, the increase and then decrease in flexibility of the F/G helix-linker during the domain opening of cCaM indicates cracking or local partial unfolding in this less flexible F/G helix-linker [see the change in fluctuations of the F/G helix-linker during domain opening in Fig. 5.2(c)]. The model also predicts that binding loop II has the highest flexibility of the binding loops in CaM. This result agrees with a recent molecular dynamics simulation study of the CaM D129N mutant, in which binding loop II consistently exhibited higher mobility than the other three Ca2+-binding loops [127].

[Figure 5.2 image: four panels vs residue index, nCaM (residues 4–75; no cracking) on the left and cCaM (residues 76–147; cracking) on the right. (a), (c): Bi in units of a²; (b), (d): ∆ρi from −1 (open, holo) to 1 (closed, apo). Curves are colored by α0 from 0 to 1. Secondary structure: helices A–D with loops I and II (nCaM); helices E–H with loops III and IV (cCaM).]

Figure 5.2: The fluctuations Bi(α0) vs residue index for selected values of the interpolation parameter α0 in the conformational transition route between the apo (α0 = 1) and holo (α0 = 0) structures. Here, a = 3.8 Å is the distance between successive monomers in our model. Parts (a) and (c) correspond to nCaM and cCaM, respectively. The secondary structures for nCaM and cCaM are indicated above each plot. Helices are represented by rectangular boxes, binding loops and helix-linkers by lines and small β-sheets by arrows. Parts (b) and (d) show the difference between the normalized native density ∆ρi(α0) (a measure of structural similarity) vs residue index for different α0 of nCaM and cCaM, respectively. In each plot the change in color from red to blue shows the apo to holo transition for each conformation. ∆ρi is normalized to be −1 at the holo state minimum (α0 = 0; blue) and 1 at the apo state minimum (α0 = 1; red).
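The signature of cracking used in this subsection, a flexibility that rises above both endpoint values at intermediate α0, can be expressed as a simple test on a fluctuation profile. The sketch below is illustrative only: the function name and tolerance are hypothetical, and the two profiles are made-up numbers that mimic the qualitative shapes described for a linker and a binding loop, not values read from Fig. 5.2.

```python
from typing import Sequence


def has_transient_peak(profile: Sequence[float], tol: float = 0.0) -> bool:
    """Return True if an interior value exceeds both endpoint values by more than tol.

    `profile` lists the fluctuation B_i at successive values of alpha0 along the
    transition route; the first and last entries are the two meta-stable states.
    """
    if len(profile) < 3:
        return False
    endpoint_max = max(profile[0], profile[-1])
    return max(profile[1:-1]) > endpoint_max + tol


# Hypothetical profiles sampled at alpha0 = 0 (open), 0.25, 0.5, 0.75, 1 (closed).
linker_like = [1.6, 1.9, 2.4, 2.0, 1.2]  # rises, then falls: cracking-like
loop_like = [1.0, 1.3, 1.7, 2.0, 2.3]    # monotonic: no cracking

print(has_transient_peak(linker_like))  # True
print(has_transient_peak(loop_like))    # False
```

A nonzero `tol` could be used to ignore peaks that are within numerical noise of the endpoint values.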

5.3.2 Cracking in the conformational transition of cCaM

To compare the conformational flexibility of the two CaM domains in more detail, we have plotted the change in fluctuations Bi(α0) for some specific residues in Fig. 5.3. Residue Asp118 of the F/G helix-linker from cCaM shows very different behavior relative to the rest of the residues in Fig. 5.3. Residue Asp118 shows its highest Bi(α0) near the transition state at α0 = 0.4, whereas Bi(α0) of the other residues shown decreases monotonically during the closed to open transition. The increase in flexibility of the helix-linker region in cCaM near the transition state is caused by local transient unfolding or cracking. For further analysis of cracking in cCaM we also compare the total contact pair potential energy for the ith residue

$$\bar{u}_i = \sum_{j} \epsilon_{ij} u_{ij}, \qquad (5.1)$$

using Eq. 3.29 (where εij is the Miyazawa-Jernigan energy scale [110] and uij is the pair contact energy), for residues Glu45 and Asp118 from the helix-linker of nCaM and cCaM, respectively. Fig. 5.4(a) shows that for residue Asp118, ūi increases in the transition state region, whereas ūi for residue Glu45 from the helix-linker of nCaM decreases in the transition state region. This striking difference in ūi implies that the increase in contact energy of the F/G helix-linker during opening of cCaM leads to some transient cracking in this region near the transition state, enhancing the inherent flexibility of this linker during the open/closed conformational change.

[Figure 5.3 image: Bi (in units of a², about 0.8 to 2.6) vs α0 (0 to 1) for residues Gly23, Glu45, Gly59, Gly96, Asp118 and Gly132.]

Figure 5.3: Change in fluctuations Bi(α0) of the residues from binding loops and helix-linker of the CaM domains along the open/closed conformational transition route for different α0. Here, a = 3.8 Å is the distance between successive monomers in our model. Residues Gly23, Gly59 and Glu45 are from binding loop I, loop II and the helix-linker of nCaM, respectively, while residues Gly96, Gly132 and Asp118 are from binding loop III, loop IV and the helix-linker of cCaM, respectively.

[Figure 5.4 image: ūi (kcal/mol) vs α0 (0 to 1); (a) nCaM and cCaM: Glu45 and Asp118; (b) CaM2/3: Asp80.]

Figure 5.4: (a) Average pair potentials ūi (in kcal/mol) of the residues from the helix-linker of the CaM domains along the open/closed conformational transition route for different α0. Residues Glu45 and Asp118 are from nCaM and cCaM, respectively. (b) ūi of residue Asp80 from the central linker of CaM along the apo to holo conformational transition route for different α0.
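The per-residue contact energy of Eq. 5.1 is a weighted sum over partner residues, which the following sketch makes concrete. The functional form of uij (Eq. 3.29) is not reproduced here, so the pair terms are treated as given numbers; the three-residue matrices are hypothetical, and excluding the j = i term is an assumption.

```python
from typing import Sequence


def residue_contact_energy(
    i: int,
    eps: Sequence[Sequence[float]],
    u: Sequence[Sequence[float]],
) -> float:
    """u_bar_i = sum over j != i of eps[i][j] * u[i][j] (self term excluded by assumption)."""
    n = len(eps)
    if len(u) != n or any(len(row) != n for row in eps) or any(len(row) != n for row in u):
        raise ValueError("eps and u must be square matrices of the same size")
    return sum(eps[i][j] * u[i][j] for j in range(n) if j != i)


# Hypothetical 3-residue system: eps in kcal/mol, u as dimensionless pair terms.
eps = [
    [0.0, -0.5, -0.2],
    [-0.5, 0.0, -0.3],
    [-0.2, -0.3, 0.0],
]
u = [
    [0.0, 1.0, 0.5],
    [1.0, 0.0, 0.0],
    [0.5, 0.0, 0.0],
]

for i in range(3):
    print(i, round(residue_contact_energy(i, eps, u), 3))
```

Evaluating this quantity at each α0 for a chosen residue gives a profile of the kind plotted in Fig. 5.4; a rise in ūi near the transition state indicates that the residue transiently loses favorable contacts.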

5.3.3 Open/closed transition mechanism of the CaM domains

To elucidate the predicted conformational transition mechanisms of domain opening in CaM, we consider a structural order parameter that measures the similarity to the open (holo) or closed (apo) state conformations, ∆ρi(α0) (Eq. 3.36), described in Subsection 3.1.6 of Chapter 3. This order parameter is defined such that ∆ρi(α0) = 1 corresponds to the closed (α0 = 1) conformation and ∆ρi(α0) = −1 corresponds to the open (α0 = 0) conformation of nCaM or cCaM. The transition routes illustrated by ∆ρi(α0) for each residue of nCaM and cCaM are shown in Fig. 5.2(b) and Fig. 5.2(d), respectively. As shown in Fig. 5.2(d), the model predicts that the structural change in binding loop IV occurs earlier than that in binding loop III during the domain opening of cCaM. [See the sharp change in color near residue number 130 in Fig. 5.2(d).] Also, helix G has an earlier conformational transition than helix F [Fig. 5.2(d)]. This sequence is quite different from that of nCaM, shown in Fig. 5.2(b). For nCaM, the structural change in helix B is predicted to be earlier than the structural change in helix C for the closed to open conformational transition. Finally, when we compare the structural change in the helix-linker of the two domains, our results show that the F/G helix-linker in cCaM has an abrupt transition near the open state. [See the transition near residue number 115 in Fig. 5.2(d).] This may have some implication for cracking in the F/G helix-linker region of cCaM. In contrast, the transition in the B/C helix-linker in nCaM is more gradual [see Fig. 5.2(b)].
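Statements such as "loop IV changes earlier than loop III" can be made quantitative by locating where each residue's ∆ρi(α0) crosses zero. The sketch below takes that zero crossing as the transition midpoint, which is an assumption made for illustration and not a definition from this work. Since opening proceeds from α0 = 1 to α0 = 0, a larger midpoint means an earlier transition. The profiles are hypothetical.

```python
from typing import Optional, Sequence


def transition_midpoint(
    alphas: Sequence[float], delta_rho: Sequence[float]
) -> Optional[float]:
    """Return alpha0 where delta_rho crosses zero, by linear interpolation.

    Returns None if the profile never changes sign on the sampled grid.
    """
    pairs = sorted(zip(alphas, delta_rho))
    for (a0, r0), (a1, r1) in zip(pairs, pairs[1:]):
        if r0 == 0.0:
            return a0
        if r0 * r1 < 0.0:
            return a0 + (a1 - a0) * (0.0 - r0) / (r1 - r0)
    if pairs and pairs[-1][1] == 0.0:
        return pairs[-1][0]
    return None


alphas = [0.0, 0.25, 0.5, 0.75, 1.0]
# Hypothetical profiles: -1 is open-like, +1 is closed-like.
early_residue = [-1.0, -0.9, -0.6, -0.2, 1.0]  # becomes open-like soon after alpha0 = 1
late_residue = [-1.0, 0.6, 0.8, 0.9, 1.0]      # stays closed-like until near alpha0 = 0

print(transition_midpoint(alphas, early_residue))
print(transition_midpoint(alphas, late_residue))
```

Sorting residues by this midpoint gives an ordering of structural change along the route. The sharpness of a transition could be summarized in the same spirit by the slope of ∆ρi at the crossing.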

5.4 CaM2/3 Fragment

5.4.1 Conformational flexibility and cracking of CaM2/3

The engineered even-odd paired EF-hand fragment CaM2/3 (composed of EF-hands 2 and 3) has been shown recently to have distinct transition characteristics from the odd-even paired EF-hand domains nCaM and cCaM [141]. NMR spectroscopy has shown that Ca2+-free (apo) CaM2/3 does not have a stable folded structure but rather shows characteristics of a molten globule state. Ca2+ binding induces the folding of the even-odd paired EF-hand motifs of CaM2/3, and Ca2+-bound (holo) CaM2/3 adopts a structure similar to that of holo-nCaM or holo-cCaM [see Protein Data Bank (PDB) ID code 2hf5] [141]. In this chapter, we study the transition between the two meta-stable conformations, apo-CaM2/3 and holo-CaM2/3, taken directly from the folded apo-CaM (PDB ID code 1cfd) and holo-CaM (PDB ID code 1cll), respectively, as another example of flexibility-influenced transitions in CaM. Here, we focus primarily on the conformational flexibility and transition of the central linker to an α-helix (even though this conformation has low probability in solution). The central linker of CaM was also studied by MD simulation [139]. Although our model does not accommodate the Ca2+-induced folding of CaM2/3 described in Ref. [141], the model does capture certain aspects of the apo/holo conformational transition of this CaM2/3 fragment. In particular, the model predicts an apo-CaM2/3 to holo-CaM2/3 conformational transition mechanism that agrees well with the sequential Ca2+-binding mechanism of CaM2/3 suggested by the NMR measurements.

The magnitude of the fluctuations of each residue in CaM2/3 [shown in Fig. 5.5(a)] suggests that the changes in conformational flexibility of binding loops II and III of CaM2/3 are very different. In particular, helix C and binding loop II are highly flexible in the apo conformation of CaM2/3. The high intrinsic flexibility of apo-CaM2/3 may account for the molten globule state characteristics described in Ref. [141]. Fig. 5.5(a) also shows that binding loop II is more flexible than loop III. This result is in harmony with the NMR measurements of Ca2+ binding in CaM2/3 [141] that show Ca2+ binds to loop III with higher affinity (at lower Ca2+ concentration) than loop II. Our results also indicate that the flexibility of binding loop III decreases monotonically along the transition route from the apo-CaM2/3 to the holo-CaM2/3 structure. In contrast, the flexibility of helix F increases along the apo to holo transition route of the CaM2/3 fragment. In particular, the N-terminal part (close to binding loop III) of helix F partially unfolds when adopting the holo-CaM2/3 conformation. We also notice in Fig. 5.5(a) that the domain linker between EF-hands 2 and 3 (residues 74-81) undergoes a relatively large change in conformational flexibility during the apo to holo transition. The fluctuation amplitude Bi of this linker increases in the intermediate states and then decreases abruptly as the helix forms near the holo state. The local unfolding of the domain linker enhances the flexibility of this linker region dramatically, with the structure first relaxing and then stabilizing in the holo-CaM2/3 structure. This local unfolding, signaled by an increase and then a decrease in flexibility, is similar to the cracking exhibited in cCaM.

[Figure 5.5, panel (a): fluctuations Bi(α0) in units of a^2 vs residue index (46-113) of CaM2/3 for α0 from 1 (closed, apo) to 0 (open, holo), with the cracking region marked; secondary structure C, II, D, E, III, F above the plot. Panel (b): ∆ρi(α0) vs residue index for the same α0 values.]

Figure 5.5: Conformational transition of the CaM2/3 fragment. (a) The fluctuations Bi(α0) vs residue index for selected values of the interpolation parameter α0 along the conformational transition route between apo- and holo-CaM2/3. Here, a = 3.8 Å is the distance between successive monomers in our model. The secondary structure of CaM2/3 is indicated above the plot. Helices are represented by rectangular boxes, binding loops and the helix-linker by lines, and small β-sheets by arrows. (b) The difference between the normalized native densities, ∆ρi(α0) (a measure of structural similarity), vs residue index for different α0 of CaM2/3. The change in color from red to blue shows the apo to holo transition. ∆ρi is normalized to be −1 at the holo state minimum (α0 = 0; blue) and 1 at the apo state minimum (α0 = 1; red).

Local intermediate unfolding (i.e., cracking) of the domain linker during the apo to holo transition of CaM2/3 can also be seen through the change in the pair potential energy ūi of the aspartic acid (Asp) residue at position 80 of the linker along the transition route, as shown in Fig. 5.4(b). During the apo to holo transition of CaM2/3, the pair contact energy of residue Asp80 increases from the apo-CaM2/3 structure (α0 = 1) and decreases abruptly at the transition state (α0 = 0.4) to the minimum at the holo-CaM2/3 structure (α0 = 0) as the helix forms. This increase and then sharp decrease in the contact energy of residue Asp80 signals cracking (local unfolding and refolding) of the linker in CaM2/3 on the way to a stable α-helix. We note also that the energy of Asp80 relative to the holo-CaM2/3 energy is higher than that of Asp118 from the F/G helix-linker of cCaM. This emphasizes the importance of cracking in the apo-CaM2/3 to holo-CaM2/3 transition.

5.4.2 Conformational transition mechanism of CaM2/3

For CaM2/3, the order parameter ∆ρi(α0) = 1 corresponds to the apo state and ∆ρi(α0) = −1 corresponds to the holo state of CaM2/3, using Eq. 3.36 in Chapter 3. The abrupt and early change in ∆ρi from its apo-state value of 1 for residues in the domain linker (residues 74–81), illustrated in Fig. 5.5(b), clearly shows the transition of the flexible interdomain linker to a rigid α-helix, as discussed above in terms of the fluctuations {Bi}. Fig. 5.5(b) also indicates that the structural change in binding loop III is initiated earlier than the structural change in binding loop II. Also, the conformational change in binding loop II is much more gradual than that of binding loop III. Similar to the identification of the Ca2+ binding affinity with the changes in inherent flexibility, the early structural change in binding loop III may imply the stepwise binding of Ca2+ in CaM2/3, with loop III having a higher binding affinity than loop II.

5.5 Discussion

CaM, a small (148-residue) Ca2+-binding protein with very high plasticity, may be an ideal system to demonstrate flexibility-influenced conformational transitions. The protein consists of structurally similar N- and C-terminal globular domains connected by a flexible tether, also known as the central or interdomain linker [138]. The Ca2+-induced structural rearrangements in CaM result in the solvent exposure of a large hydrophobic surface responsible for molecular recognition of various cellular targets [60].

Although similar in structure and fold, the two CaM domains are quite different in terms of their flexibility, melting temperatures, and Ca2+-binding affinities [135,136].

In the two homologous (46% sequence identity) domains of CaM, Ca2+ binding occurs sequentially, first in the binding sites of cCaM and then in the binding sites of nCaM [136]. Despite the large cooperativity of the Ca2+-binding process within each domain, the two domains have different Ca2+ and target affinities. The N-terminal pair of EF-hands binds Ca2+ ions with much lower affinity than the C-terminal pair of EF-hands [136]. NMR [62,119] and molecular dynamics (MD) simulation [123] studies have shown that the Ca2+-bound nCaM is considerably less open than the cCaM. This was not observed in the X-ray crystal structure of the protein. Experimentally, it has been shown that the more-conserved cCaM has a greater affinity for Ca2+ and some CaM targets, whereas the nCaM is less specific in its choice of target motif [143,144].

Heat denaturation studies have shown that cCaM of Ca2+-free (apo) CaM starts to denature slightly above the physiological temperature [135]. The denaturation of the apo-cCaM was observed at a lower concentration of denaturant than denaturation of the nCaM, while the order was reversed for Ca2+-CaM [145]. The instability of the cCaM was also observed in the temperature-jump fluorescence spectroscopy study of apo-CaM unfolding by Rabl et al. [102], who suggested that the cCaM is partially unfolded under native conditions. Recent NMR experiments by Lundström and Akke [103], monitoring relaxation rates involving 13Cα spins in adjacent residues of the E140Q mutant of cCaM, revealed transient partial unfolding of helix F. This was interpreted as a global exchange process that involves a partially unfolded minor state that was not detected previously [119,146]. A very recent conformational dynamics simulation of the cCaM by Chen et al. [125] has shown the presence of an unfolded apo state. These observations further suggest that the conformational exchange may be more complex than a simple two-state process.

Differences in the inherent flexibility of the two domains can account for their distinct physical characteristics and the complexity of the conformational transition mechanism [62]. Our results are consistent with an MD simulation study [51], which has shown that the nCaM is inherently more flexible, with lower binding affinity, than the cCaM. The more open conformation and lower intrinsic flexibility of the cCaM are also probably the key to understanding the initial binding between this domain and CaM’s target [51,52].

The variational model presented in this chapter predicts that the different inherent flexibilities of the two domains of CaM also lead to distinct transition mechanisms, even though the folded state topology of the two domains is the same. The mechanism controlling the open/closed transition does not involve cracking for the relatively flexible nCaM, whereas the mechanism controlling the conformational transition of the more rigid cCaM exhibits transient partial local unfolding in its helix-linker between the binding loops. This partial local unfolding, or cracking, in the cCaM shows the complexity of the open/closed conformational transition mechanism of cCaM.

A recent NMR experiment studied the effects of EF-hand association on the structure, Ca2+ affinity, and cooperativity of CaM [141]. The EF-hands in the CaM domains are odd-even paired, with EF-hands 1 and 2 in nCaM and EF-hands 3 and 4 in cCaM. This arrangement of EF-hands in CaM is thought to be a consequence of its evolution from a biologically related ancestor EF-hand by gene duplication [147]. The even-odd pairing of EF-hands 2 and 3 (CaM2/3) [Fig. 5.1(c)] has been characterized by NMR in Ref. [141]. In this fragment, EF-hands 2 and 3 are connected by the central linker between nCaM and cCaM. Although in the crystal structure of holo Ca2+-CaM this interdomain linker region is observed to be a long rigid α-helix [59], several NMR relaxation experiments have demonstrated that this central linker is flexible in solution near its midpoint and that the two domains do not interact [138,139]. In contrast to the high affinity and positive cooperativity of Ca2+ binding in the two odd-even paired EF-hand domains of CaM, CaM2/3 binds Ca2+ sequentially: first in the high-affinity EF-hand 3 and then in EF-hand 2 with much lower affinity [141]. Although not a direct focus of this chapter, we also note that peptide binding to CaM2/3 has been characterized recently by McIntosh and co-workers [148]. It was found that, with peptide bound, CaM2/3 adopts a Ca2+-bound structure very similar to those of nCaM or cCaM. These observations reflect the very high plasticity of the EF-hand association, which mediates Ca2+-dependent recognition of target proteins.

In this chapter, we study the transition between the two states of the fragment CaM2/3 inferred from the structures of intact CaM. Our results reveal that the central linker in CaM2/3 is highly flexible and dominates the mechanism of this transition. The folding of the linker to an α-helix is preceded by a further increase in its flexibility and local unfolding. This cracking leads to a very sharp transition of the helix-linker from the apo-CaM2/3 to the holo-CaM2/3 conformation. Although our model did not predict partial local unfolding or cracking of helix-F in the open/closed conformational transition of cCaM, the apo/holo conformational transition of CaM2/3 reveals cracking in some region of helix-F, as we have discussed already in comparison with the NMR study [103]. The predicted change in conformational flexibility of CaM2/3 suggests the stepwise binding of Ca2+ ions, with binding loop III having higher affinity than binding loop II. Nevertheless, the relevance of this transition to the binding dynamics of CaM2/3 is somewhat limited because the model is based on structures that have low probability in solution.

CHAPTER 6

INTERPLAY AMONG TOPOLOGY, PLASTICITY AND

ENERGETICS IN THE FUNCTIONAL TRANSITIONS OF THE

CALMODULIN DOMAINS

6.1 Introduction

The main focus of this chapter is to explore the inter-relationships between topology, conformational flexibility, and energetics of the protein calmodulin (CaM). As discussed in Chapter 5, our model predicts that the C-terminal domain of CaM (cCaM), which is inherently less flexible than the N-terminal domain (nCaM), “cracks”, or partially unfolds and refolds, during the apo to holo conformational transition. The apo (closed) and holo (open) state conformations of the two domains of CaM are shown in Fig. 6.1. The model further suggests that cracking is not involved in the similar transition in nCaM. Cracking is a mechanism that lowers free energy barriers by relieving localized regions of high stress during a conformational transition [10,85,137,149]. In the previous chapter, I rationalized the cracking in cCaM as arising from its relative rigidity compared to nCaM. The analysis in this chapter is aimed not only at understanding the distinct transition mechanisms of the two CaM domains, but also at providing a more direct and detailed connection to the ideas behind the cracking mechanism.

Since this is a topology-based model (Gō-model), differences in the transition mechanism must ultimately be reflected in differences in the contact maps of the two domains. Indeed, the contact maps reveal interesting differences in the regions where cCaM shows cracking. These differences in the contacts lead to differences in the strain that develops in the two domains during the transition. Although regions of high strain develop about specific residues in both domains with comparable magnitudes, we find that the strain patterns are distinct: the strain in cCaM is distributed more broadly across the protein than the strain in nCaM. The effect that protein flexibility has on the barrier height between the open and closed states is also addressed in this chapter. We find that while adjustments in flexibility lower the barrier in both domains, the barrier for cCaM is lowered about twice as much as the barrier of nCaM. That is, without some mechanism to adjust flexibility (like cracking), the free energy barrier for cCaM would be considerably higher than that for nCaM.

[Figure 6.1, panels (a) nCaM and (b) cCaM: apo and holo structures of each domain, with helices A-D and binding loops I and II labeled for nCaM, and helices E-H and binding loops III and IV labeled for cCaM.]

Figure 6.1: Two topologically similar domains of CaM. (a) The N-terminal domain of CaM (nCaM) and (b) the C-terminal domain of CaM (cCaM). The closed (apo)-state structures of the two domains are shown in blue and the open (holo)-state structures are shown in red. The measured Cα RMSD between the two apo-CaM domains is 4.5309 Å and between the two holo-CaM domains is 3.9384 Å. The Cα RMSD between apo- and holo-nCaM is 6.5039 Å and between apo- and holo-cCaM is 8.4157 Å. The Cα atoms are shown as beads in the structures.

6.2 Strain Energy Analysis

A protein that is deformed from its stable state structure develops strain energy. The Tirion potential [86] of the elastic network model in Eq. 2.1 of Chapter 2 can be used to calculate the strain energy of each residue of the two CaM domains for the apo → holo structural change,

$$ \varepsilon_i(\alpha_0)_{\mathrm{apo}\to\mathrm{holo}} = \frac{k_{\mathrm{apo}}}{4} \sum_{j \in [ij]_{\mathrm{apo}}} \left( |\mathbf{r}^{N}_{i,j}(\alpha_0)| - |\mathbf{r}^{N_{\mathrm{apo}}}_{i,j}| \right)^2, \tag{6.1} $$

where $k_{\mathrm{apo}}$ is the spring constant of the apo-CaM domains. The spring constant $k_{\mathrm{apo}}$, which determines the overall rigidity of the molecule, is assigned the value $k_{\mathrm{apo}} = 1$ for simplicity. $[ij]_{\mathrm{apo}}$ denotes the set of contacts of the apo-CaM domains, and the sum over $j$ is taken over these contacts. In the above expression, $|\mathbf{r}^{N}_{i,j}(\alpha_0)| = |\mathbf{r}^{N}_{i}(\alpha_0) - \mathbf{r}^{N}_{j}(\alpha_0)|$, where $\mathbf{r}^{N}_{i}(\alpha_0)$ and $\mathbf{r}^{N}_{j}(\alpha_0)$ are the linearly interpolated Cα position vectors of the ith and jth residues of the CaM domains, respectively (see Eq. 3.8 in Chapter 3). Similarly, $|\mathbf{r}^{N_{\mathrm{apo}}}_{i,j}| = |\mathbf{r}^{N_{\mathrm{apo}}}_{i} - \mathbf{r}^{N_{\mathrm{apo}}}_{j}|$, where $\mathbf{r}^{N_{\mathrm{apo}}}_{i}$ and $\mathbf{r}^{N_{\mathrm{apo}}}_{j}$ are the Cα position vectors of the ith and jth residues of the apo-CaM domains, respectively. The total strain energy of the CaM domains for the apo → holo structural change is the sum over all residues,

$$ \varepsilon(\alpha_0)_{\mathrm{apo}\to\mathrm{holo}} = \sum_{i} \varepsilon_i(\alpha_0)_{\mathrm{apo}\to\mathrm{holo}}. \tag{6.2} $$

For the structural deformation of the apo-CaM domains, the Cα position vectors are interpolated linearly from the apo-state (α0 = 1) to the holo-state (α0 = 0) conformation by varying the parameter α0 in small steps of 0.01 (see Eq. 3.8 in Chapter 3). In the same way, the holo-state structures of the CaM domains were also deformed to calculate the strain energy of the residues, εi(α0)holo→apo, and the total strain energy, ε(α0)holo→apo, along the holo → apo structural change.
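The strain energy calculation of Eqs. 6.1 and 6.2 can be illustrated with a minimal Python sketch. It is not the code used for the results in this chapter. The function names, the three-bead coordinates, and the single contact are hypothetical, and the contact list is taken as a given input rather than derived from the contact definition of Chapter 3.

```python
import math

def interpolate(apo, holo, alpha0):
    """Linearly interpolate C-alpha coordinates: alpha0 = 1 is apo, alpha0 = 0 is holo."""
    return [
        tuple(alpha0 * a + (1.0 - alpha0) * h for a, h in zip(r_apo, r_holo))
        for r_apo, r_holo in zip(apo, holo)
    ]

def residue_strain(apo, holo, apo_contacts, alpha0, k_apo=1.0):
    """Per-residue strain energy (Eq. 6.1) for the apo -> holo deformation.

    apo_contacts lists each contact pair (i, j) once; the pair energy
    k_apo/4 * stretch**2 is credited to both residues, which is equivalent
    to summing over the contact partners j of every residue i.
    """
    coords = interpolate(apo, holo, alpha0)
    strain = [0.0] * len(apo)
    for i, j in apo_contacts:
        stretch = math.dist(coords[i], coords[j]) - math.dist(apo[i], apo[j])
        energy = 0.25 * k_apo * stretch ** 2
        strain[i] += energy
        strain[j] += energy
    return strain

def total_strain(apo, holo, apo_contacts, alpha0, k_apo=1.0):
    """Total strain energy (Eq. 6.2): sum of the residue strain energies."""
    return sum(residue_strain(apo, holo, apo_contacts, alpha0, k_apo))

# Toy three-bead chain (hypothetical coordinates in angstroms, not CaM data).
toy_apo = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
toy_holo = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
toy_contacts = [(0, 2)]

# Scan alpha0 from 1 (apo) to 0 (holo) in steps of 0.01, as in the text.
profile = [total_strain(toy_apo, toy_holo, toy_contacts, 1.0 - 0.01 * n)
           for n in range(101)]
```

In this toy chain the strain vanishes at the apo end of the scan and grows as the single apo contact is stretched toward the holo geometry. The holo → apo profile would follow from the same functions by exchanging the roles of the two structures and using the holo contact list.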

The total strain energy of the CaM domains for the apo → holo deformation as well as the holo → apo deformation is plotted in Fig. 6.2. The change in total strain energy ε(α0)apo→holo is very similar for the two CaM domains. On the other hand, the total strain energy ε(α0)holo→apo of cCaM is significantly higher than that of nCaM. As a result, the strain energy barrier of cCaM, estimated from the intersection of the strain energy curves relative to each metastable state, is ∼ 1.3 times higher than that of nCaM. The distribution of residue strain energy εi(α0)apo→holo is plotted in Fig. 6.3(a) and Fig. 6.3(b) for nCaM and cCaM, respectively. These plots show that the strain energy of the CaM domains is not uniformly distributed among the residues. Rather, the strain that develops is localized to certain residues of the CaM domains. Notice that more residues are under strain in cCaM than in the topologically similar nCaM. There are two EF-hand helix-loop-helix motifs in each domain (as indicated by the secondary structure of the CaM domains shown at the top of Fig. 6.3). In nCaM, helix-A, helix-B and binding loop I form the first EF-hand, and helix-C, helix-D and binding loop II form the second EF-hand [see Fig. 6.3(a)]. Similarly, the first EF-hand of cCaM is made of helix-E, helix-F and binding loop III, and the second EF-hand consists of helix-G, helix-H and binding loop IV [see Fig. 6.3(b)]. Only a few residues from helix-A, helix-B, helix-D and the B/C helix-linker (specifically the residues close to helix-B) are under high strain in nCaM. In contrast, some residues from helix-F, helix-H and the F/G helix-linker are under high strain in cCaM. Moreover, several residues from helix-E, helix-G and binding loop IV are also under strain in cCaM. The strain energy distribution of the CaM domains at an intermediate stage of the apo → holo structural change is shown in Fig. 6.3(c) and Fig. 6.3(d) for nCaM and cCaM, respectively. These plots confirm that the binding loops of the CaM domains, apart from binding loop IV, show negligible strain. It is interesting to note that the residue strain energy of cCaM is more widely distributed among the residues than the strain energy of nCaM.

[Figure 6.2: strain energy (au) vs α0 from 0 (holo) to 1 (apo); apo to holo and holo to apo curves for nCaM and cCaM.]

Figure 6.2: Total strain energy of the CaM domains for the linearly interpolated apo/holo structural change. The solid curves show the change in total strain energy of the nCaM domain and the dotted curves represent the change in total strain energy of the cCaM domain.

[Figure 6.3, panels (a) nCaM and (b) cCaM: residue strain energy (au, color scale) as a function of residue index (4-74 for nCaM, 76-146 for cCaM) and α0 from 1 (apo) to 0 (holo); secondary structure A, I, B, C, II, D and E, III, F, G, IV, H above the plots. Panels (c) and (d): strain energy (au) vs residue index at α0 = 0.4.]

Figure 6.3: Distributions of residue strain energy for the linearly interpolated CaM domain structures. (a) and (b) show the change of the strain energy of individual residues along the apo → holo-state structural change of nCaM and cCaM, respectively. Residues in blue have no strain energy while red residues have high strain energy. (c) and (d) show the residue strain energy distributions for the deformed structure at an intermediate state, α0 = 0.4, for nCaM and cCaM, respectively. The secondary structures of nCaM and cCaM are indicated above the plots.

Some insight into the origin of the distinct spatial distributions of strain energy in the CaM domains can be gained by comparing the contact maps of the two domains. As mentioned in Chapter 3, the contacts are separated into three groups: contacts that occur only in the apo-state structure, contacts that occur only in the holo-state structure, and “common” contacts that occur in both the apo- and holo-state structures. The three different sets of contacts of each CaM domain are shown in Figs. 6.4(a) and 6.4(b). Note that the number of contacts in each of the three classes differs between the two domains. While the number of common contacts is higher in nCaM than in cCaM, the number of contacts present only in the apo state is greater in cCaM than in nCaM. [The number of contacts present only in the holo state is also much larger in cCaM than in nCaM; however, it is the native contact pairs of the apo state that enter the present analysis through Eq. 6.1.] Moreover, in both domains the residues with high strain (as shown in Fig. 6.3) are those that have contacts only in the apo state. Since there are more of these apo-state contacts under high strain in cCaM, it is perhaps less surprising that some of them break through cracking during the apo → holo structural change. [These apo-contacts are shown as blue dots inside the large colored circles in Fig. 6.4 for both domains; the number of apo-contacts in cCaM is much higher than in nCaM.]
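The three-way grouping of contacts described above amounts to simple set operations, which the following Python sketch illustrates. The function names and the toy contact lists are hypothetical, and the contact lists are assumed to be given as pairs of residue indices; how a contact is defined is specified in Chapter 3 and is not reproduced here.

```python
def classify_contacts(apo_contacts, holo_contacts):
    """Split contact pairs into common, apo-only, and holo-only sets."""
    # Sort each pair so that (i, j) and (j, i) count as the same contact.
    apo = {tuple(sorted(pair)) for pair in apo_contacts}
    holo = {tuple(sorted(pair)) for pair in holo_contacts}
    return {
        "common": apo & holo,
        "apo_only": apo - holo,
        "holo_only": holo - apo,
    }

def contacts_per_residue(pairs):
    """Count how many contacts in `pairs` involve each residue."""
    counts = {}
    for i, j in pairs:
        counts[i] = counts.get(i, 0) + 1
        counts[j] = counts.get(j, 0) + 1
    return counts

# Hypothetical toy contact lists (residue index pairs), not the CaM maps.
toy_apo = [(1, 5), (2, 6), (3, 7), (7, 3)]
toy_holo = [(5, 1), (2, 8)]

groups = classify_contacts(toy_apo, toy_holo)
apo_only_counts = contacts_per_residue(groups["apo_only"])
```

Counting the apo-only contacts per residue, as in the last line, gives a quick way to see which residues lose contacts during the apo → holo change and can be compared against a strain profile.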

[Figure 6.4, panels (a) nCaM and (b) cCaM: contact maps, residue index vs residue index, with the secondary structure along the axes. Legend counts: nCaM common (80), apo (24), holo (29); cCaM common (57), apo (34), holo (50).]

Figure 6.4: Contact maps of the two CaM domains, (a) for nCaM and (b) for cCaM. In each plot the contacts are separated into three sets: common contacts, contacts only from the apo-state, and contacts only from the holo-state of each CaM domain. The secondary structures of nCaM and cCaM are indicated in each plot. Helices are represented by rectangular boxes, binding loops and the helix-linker are indicated by lines, and small β-sheets are shown by arrows. In nCaM, the helices are denoted A-D and the binding loops I and II, while in cCaM the helices are denoted E-H and the binding loops III and IV. The number of contacts in each set is indicated inside each plot.

6.3 Analysis of Conformational Flexibility and Cracking

Inherent flexibility plays a key role in determining the conformational transition mechanism of proteins. In this section, I compare the inherent flexibility of the apo-CaM domains and explain the differences in flexibility of the two domains in terms of their specific contact maps and strain profiles. The calculated mean square fluctuations Bi({C}, α0) (using Eq. 3.31) of each residue are shown in Fig. 6.5. Besides the terminal helices, the binding loops and helix-linker of the apo-CaM domains are the most flexible. Interestingly, the first Ca2+-binding loops of the two domains (binding loop I of nCaM and loop III of cCaM) exhibit very similar flexibility. On the other hand, the second Ca2+-binding loop of apo-nCaM (binding loop II) is much more flexible than the second binding loop of apo-cCaM (binding loop IV), especially the residues of binding loop II near helix-C of nCaM. This difference in the flexibility of the second binding loop of the apo-CaM domains is mainly due to the difference in the contact maps of the apo-state conformations. The contact maps in Figs. 6.4(a) and 6.4(b) clearly show that there are no contacts for the residues of binding loop II near helix-C of apo-nCaM, whereas binding loop IV has contacts with helix-H of apo-cCaM [shown by the apo-state contact symbols in Fig. 6.4(b)]. A similar comparison also accounts for the higher flexibility of helix-C (near binding loop II of apo-nCaM) than of helix-G (near binding loop IV of apo-cCaM). Here, the greater number of contacts in helix-G of apo-cCaM compared with helix-C of apo-nCaM explains the difference in flexibility. The helix-linkers of the two apo-CaM domains also exhibit very different flexibility. Specifically, the residues in the F/G helix-linker close to helix-F of apo-cCaM have much higher flexibility than the residues in the B/C helix-linker near helix-B of apo-nCaM. In contrast, the residues in the B/C helix-linker near helix-C of apo-nCaM are more flexible than the residues in the F/G helix-linker close to helix-G of apo-cCaM. This dissimilarity in the inherent flexibility of the helix-linker residues of the two CaM domains can be understood from the presence of more contacts in the F/G helix-linker of apo-cCaM than in the B/C helix-linker of apo-nCaM. [See the contacts between the F/G helix-linker and helix-H in Fig. 6.4(b).]

[Figure 6.5: Bi in units of a^2 vs residue index for apo-nCaM (top axis, residues 4-74, secondary structure A, I, B, C, II, D) and apo-cCaM (bottom axis, residues 76-146, secondary structure E, III, F, G, IV, H).]

Figure 6.5: Mean square fluctuations, $B_i = \langle \delta r_i^2 \rangle_0$, of the apo-CaM domains. The solid and dotted curves indicate the Bi of the apo-state nCaM and cCaM, respectively. Secondary structures of the nCaM and cCaM are shown at the top and bottom of the plot, respectively.

The evolution of the conformational flexibility of the CaM domains throughout the transition can be represented by the ratio

$$ B_i^{r}(\{C\}, \alpha_0) = B_i(\{C\}, \alpha_0) / B_i^{N_{\mathrm{apo}}}, \tag{6.3} $$

evaluated along the apo → holo transition route. Here, $B_i^{N_{\mathrm{apo}}}$ denotes the mean square fluctuations of the apo-CaM domains (α0 = 1). This parameter emphasizes changes in flexibility from the apo state: the value $B_i^{r}(\{C\}, \alpha_0) = 1$ reflects that the flexibility remains unchanged throughout the transition, whereas $B_i^{r}(\{C\}, \alpha_0) < 1$ and $B_i^{r}(\{C\}, \alpha_0) > 1$ denote a decrease and an increase in the flexibility during the apo → holo transition of the CaM domains, respectively. Interestingly, as shown in Figs. 6.6(a) and 6.6(b), the conformational flexibility of some residues reaches a maximum along the transition route. The residues in helix-B of nCaM and in the F/G helix-linker and helix-G of cCaM clearly show this nonmonotonic change. Other residues in the CaM domains show a nearly monotonic change in the conformational flexibility. In particular, some residues in binding loop I, loop II and helix-C of nCaM and some residues in binding loop III, loop IV and helix-E of cCaM show a decrease in $B_i^{r}(\alpha_0)$, whereas residues in helix-A, D and the B/C helix-linker of nCaM and in helix-E and F of cCaM show a monotonic increase in $B_i^{r}(\alpha_0)$.
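The ratio in Eq. 6.3 and the distinction between monotonic and nonmonotonic residues can be made concrete with a short Python sketch. The fluctuation values are hypothetical, and the test for an interior maximum is an assumed heuristic chosen for illustration; it is not the criterion used to identify cracking in this chapter.

```python
def relative_flexibility(B, B_apo):
    """Eq. 6.3: B[k][i] is B_i at the k-th alpha0 value, B_apo[i] its apo-state value."""
    return [[b / b0 for b, b0 in zip(row, B_apo)] for row in B]

def has_interior_maximum(profile, tol=1e-9):
    """True if the profile rises above both of its end points along the route.

    Assumed heuristic for a nonmonotonic (rise then fall) change in flexibility.
    """
    peak = max(profile)
    return peak > profile[0] + tol and peak > profile[-1] + tol

# Hypothetical fluctuations for two residues at alpha0 = 1.0, 0.5, 0.0.
toy_B = [
    [1.0, 2.0],  # alpha0 = 1 (apo)
    [1.5, 1.0],  # alpha0 = 0.5
    [0.8, 0.5],  # alpha0 = 0 (holo)
]

ratios = relative_flexibility(toy_B, toy_B[0])
# Transpose so that each entry is one residue's profile along the route.
per_residue = [list(column) for column in zip(*ratios)]
nonmonotonic = [has_interior_maximum(p) for p in per_residue]
```

In this toy input the first residue becomes more flexible at the intermediate point and then rigidifies, so it is flagged, while the second residue loses flexibility monotonically and is not.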

The evolution of a residue’s flexibility during the transition reflects changes in the proximate environment, such as the formation and breaking of contacts as well as the development of local strain. A comparison of the strain profiles shown in Fig. 6.3 with the relative flexibility shown in Fig. 6.6 reveals that the residues in both CaM domains that have monotonic changes in flexibility are under high strain. The residues that exhibit nonmonotonic changes in $B_i^{r}(\alpha_0)$ are of special interest because partial unfolding and refolding (cracking) is thought to be a response to high local strain. Indeed, residues that unfold and refold along the transition path are also under high strain. Such residues include residues 33-35 in helix-B of nCaM and residues 115-123 in the F/G helix-linker and helix-G of cCaM. Interestingly, the residues under highest strain are not the ones that partially unfold.

[Figure 6.6, panels (a) nCaM and (b) cCaM: $B_i^{r}(\alpha_0)$ (color scale, about 0.4 to 2.4) as a function of residue index and α0 from 1 (apo) to 0 (holo), with secondary structure above the plots. Panels (c) nCaM and (d) cCaM: structures at α0 = 0.4 colored by $B_i^{r}$, with helices A-D, E-H and binding loops I-IV labeled.]

Figure 6.6: Ratio of fluctuations $B_i^{r}(\{C\}, \alpha_0)$ of the CaM domains for the apo → holo conformational change. (a) and (b) show $B_i^{r}(\alpha_0)$ of each residue of nCaM and cCaM along the transition route, respectively. In (c) and (d) the linearly interpolated 3D structures of nCaM and cCaM are colored according to $B_i^{r}(\{C\}, \alpha_0)$ of each residue at an intermediate state, α0 = 0.4, during the apo → holo transition. The Cα atoms of the CaM domains are represented as beads. In all panels, residues in red have higher flexibility and residues in blue have lower flexibility with respect to the apo state of the corresponding domain during the conformational change.

Recent NMR experiments on the E140Q mutant of cCaM by Lundström and Akke [103] revealed that the entire helix-F undergoes transient unfolding in the course of conformational exchange. This implies that cCaM has a minor population of an unfolded state under physiological conditions. This observation agrees with T-jump fluorescence spectroscopy measurements [102], as well as recent molecular dynamics simulation studies [125]. It is also in harmony with the strain profile shown in Fig. 6.3, which indicates that helix-F develops high strain for α0 between ∼ 0 and 0.3. The calculated $B_i^{r}(\alpha_0)$ from the present model suggests that residues 104 and 105 of helix-F show a minor signature of local unfolding and refolding at α0 ∼ 0.5. Still, the model predicts that cracking is more evident in the flexible F/G helix-linker and helix-G [see Fig. 6.6(b)]. Overall, our model suggests that although a few residues in helix-B and the B/C helix-linker of nCaM show signs of minor cracking, the cracking in nCaM is almost negligible compared to the cracking in cCaM. The difference in the predicted transition mechanisms for the two domains can ultimately be traced back to differences in the contact maps of the apo-CaM domains. In apo-nCaM the contacts are present mostly between helix-A/B and helix-B/D [the blue dotted points inside the colored areas in Fig. 6.4(a)], whereas for apo-cCaM the contacts are mainly between helix-E/F, helix-E/H, helix-F/H, and helix-G/H [the blue dotted points inside the colored areas in Fig. 6.4(b)]. Accordingly, more contacts break during the apo → holo conformational transition in the cCaM domain than in the nCaM domain.

6.4 Relationship between Free Energy Barrier and Inherent Flexibility

Since cracking has been invoked as a way to relieve high strain energy, one would expect that if cracking were inhibited cCaM would develop more elastic strain energy than nCaM. Although the difference is found to be modest, the estimated barrier due to elastic strain energy is somewhat higher in cCaM (see Fig. 6.2). Here, we are interested in exploring the interplay between inherent flexibility and energetics of the conformational transitions. In this model the N variational parameters, {Ci}, control the conformational fluctuations Bi({C}, α0) along the transition route. We consider here what happens to the free energy barrier when the flexibility is constrained to the open or closed state throughout the transition. As shown in Fig. 6.7, we first note that, the free energy barrier height of the CaM domains are comparable when {Ci} are allowed to adjust during the apo → holo-state transitions. Specifically, the free

† energy barrier height ∆F ({C}, α0) ∼ 4 kBT for nCaM (measured between the apo

† and the transition state ensemble) is rather close to the ∆F ({C}, α0) ∼ 5.5 kBT of cCaM at the same T . [See the dotted black curves in Figs. 6.7(a) and 6.7(b) for the nCaM and cCaM, respectively.]

Napo The free energy profiles F ({C }, α0) where the variational parameters are fixed

Napo to the values corresponding to the apo-state ({Ci} = {Ci }) along the apo → holo- state structural change of the CaM domains are plotted as solid lines in Fig. 6.7. 116

(a) nCaM -4 variable {C} apo fixed {C} -6 [ holo

-8

T holo

F/k -10 ~ 6kT

-12 apo

-14 0 0.2 0.4 0.6 0.8 1 α 0 (b) cCaM 15 variable {C} apo 10 fixed {C} [ holo 5 T 0 F/k holo -5 ~13kT apo -10

-15 0 0.2 0.4 0.6 0.8 1 α0

Figure 6.7: Free energy barriers of the CaM domains during the apo/holo confor- mational change. In (a) and (b) the free energy barriers F ({C}, α0), along the apo → holo-state conformational change is shown by the dotted black curves, when the variational parameters {Ci}, for the residues are allowed to vary from the apo to Napo holo-state for both domains. The free energy profiles F ({C , α0), denoted by the solid black curves in (a) and (b) indicate that {Ci} of the residues of the CaM do- Napo mains are fixed to their corresponding {Ci } of the apo-state, during their apo → holo-state conformational change. The solid gray curves in (a) and (b) indicate Nholo the free energy profiles F ({C }, α0), such that, the {Ci} for the residues of the Nholo CaM domains are fixed to their corresponding {Ci } of the holo-state, during their holo → apo-state conformational change. The horizontal and vertical dotted lines are drawn for estimating the free energy barrier height from the plots. 117

Similarly, the free energy profiles $F(\{C^{N_{\mathrm{holo}}}\}, \alpha_0)$, obtained by restraining the variational parameters to the values corresponding to the holo state ($\{C_i\} = \{C_i^{N_{\mathrm{holo}}}\}$) for the holo → apo-state conformational change, are plotted as the gray solid curves in Fig. 6.7. Not surprisingly, variable flexibility lowers the free energy barriers compared with the barriers estimated from the intersection of the two fixed-flexibility free energy profiles.

The estimated free energy barrier height relative to the apo-state for nCaM ($\sim 6\,k_BT$) is less than half the barrier for cCaM ($\sim 13\,k_BT$). Here, we see that adjustments in flexibility in cCaM are more effective in lowering the putative constant-flexibility barrier, a reasonable result since cracking is a way to avoid sufficiently high kinetic barriers [10]. This is perhaps also expected in light of the fact that more residues are under high strain during the conformational change in cCaM than in nCaM.
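The constant-flexibility barrier discussed above is read off from the point where the apo-fixed and holo-fixed free energy profiles intersect. The following is a minimal sketch of that construction, assuming both profiles are tabulated on a common grid; the function name `crossing_barrier` and the parabolic toy profiles are hypothetical and are not the data of Fig. 6.7.

```python
def crossing_barrier(alpha, f_apo, f_holo):
    """Locate the crossing of two fixed-flexibility free energy profiles.

    alpha  : increasing grid of the interpolation parameter
    f_apo  : profile with {C} frozen at apo-state values (units of kBT)
    f_holo : profile with {C} frozen at holo-state values (units of kBT)
    Returns (alpha at crossing, free energy at crossing,
             crossing free energy minus the minimum of f_apo).
    """
    diff = [fa - fh for fa, fh in zip(f_apo, f_holo)]
    for k in range(len(alpha) - 1):
        if diff[k] == 0.0:
            # The two profiles coincide exactly on a grid point.
            return alpha[k], f_apo[k], f_apo[k] - min(f_apo)
        if diff[k] * diff[k + 1] < 0.0:
            # Sign change: interpolate linearly inside this grid interval.
            t = diff[k] / (diff[k] - diff[k + 1])
            a_x = alpha[k] + t * (alpha[k + 1] - alpha[k])
            f_x = f_apo[k] + t * (f_apo[k + 1] - f_apo[k])
            return a_x, f_x, f_x - min(f_apo)
    raise ValueError("the two profiles do not cross on this grid")


# Made-up parabolic profiles on a generic coordinate x in [0, 1];
# these numbers are illustrative and are not taken from Fig. 6.7.
grid = [k / 100 for k in range(101)]
toy_apo = [10.0 * x * x for x in grid]
toy_holo = [10.0 * (x - 1.0) ** 2 + 1.0 for x in grid]
x_cross, f_cross, barrier = crossing_barrier(grid, toy_apo, toy_holo)
print(f"crossing at x = {x_cross:.3f}, barrier = {barrier:.3f} kBT")
```

Comparing this crossing estimate with the barrier of the variable-flexibility profile gives the amount by which adjusting the flexibility lowers the barrier.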

6.5 Conclusions

In this chapter, I have extended the analysis presented in Chapter 5 to characterize the apo/holo conformational change of the CaM domains. Here, the investigation reveals some key aspects of the conformational transition mechanisms of the CaM domains in connection with their topology, plasticity and the energetics of the interconversion. Our results suggest that cracking is more essential, and may be functionally important, for cCaM in comparison with its topologically similar and homologous counterpart nCaM. This analysis connects to the original ideas behind the proposal of cracking as one mechanism that can lower kinetic barriers due to high strain.

CHAPTER 7

CONFORMATIONAL TRANSITION AND FOLDING MECHANISMS

OF THE N-TERMINAL RECEIVER DOMAIN OF PROTEIN NTRC

7.1 Introduction

The N-terminal receiver domain of the nitrogen regulatory protein C (NtrCr) exhibits slow conformational dynamics on the microsecond timescale [13]. The large-scale structural change of NtrCr upon phosphorylation is a good model system that illustrates how allostery plays a key role in protein function. NtrC is a member of the response regulator protein family that contains a CheY-like receiver domain (NtrCr). These small βα-repeat proteins typically contain ≈ 125 residues with an α/β/α sandwich fold. The structure of NtrCr is shown in Fig. 7.1(a). Phosphorylation of Asp54 induces large structural rearrangements in the potential switch region α3β4α4β5 (the “3445” face) and the β3α3 loop of the NtrCr that transmits the signal to the C-terminal DNA-binding domain [68, 150]. [See Subsection 1.6.2 for a more detailed discussion of how NtrC functions.] NMR relaxation [13] as well as all-atom molecular dynamics (MD) simulations [151] indicate that the population-shift mechanism due to phosphorylation (activation) takes advantage of the high flexibility of α4 in the inactive (I)-NtrCr. In this chapter, I explore the mechanism of the inactive → active (I → A) state conformational change of NtrCr, as well as the folding to both the I- and A-NtrCr structures.

[Figure 7.1: (a) ribbon diagrams of the inactive and active structures with the secondary structure elements β1–β5 and α1–α5 and the N- and C-termini labeled; (b) secondary structure elements with residue ranges: β1 4–10, α1 15–24, β2 28–33, α2 36–44, β3 49–54, α3 66–72, β4 77–82, α4 85–95, β5 98–103, α5 108–122; (c) plot of $|\Delta \mathbf{r}_i^{N}|$ [$a$] versus residue index.]

Figure 7.1: The inactive (I) and active (A) conformations of NtrCr. (a) The I (PDB ID code 1ntr) and the A (BeF3−-activated, PDB ID code 1krw) conformations of NtrCr, with the secondary structure elements shown in different colors. (b) Cartoon of the secondary structure of NtrCr, where the α-helices and β-sheets are represented by rectangles and arrows, respectively, along with the residue numbers. (c) Deviations between the two native state structures of NtrCr (the I and the A states) measured directly from the corresponding PDB structures, where $|\Delta \mathbf{r}_i^{N}| = |\mathbf{r}_i^{N_I} - \mathbf{r}_i^{N_A}|$ and $a = 3.8$ Å is the distance between adjacent monomers in the variational model.

The folding mechanisms of CheY-like βα-repeat proteins, including NtrCr, are distinct even though they share the same topology [152, 153]. This sensitivity to the sequence of the different proteins is thought to arise from the topological frustration of these relatively complex structures. Theoretical models suggest that folding of NtrCr initiates from the N-terminal half (residues 1–62), which provides the partially structured (folding) transition state and further acts as a folding nucleus for the C-terminal half of this protein (residues 63–124) [153, 154]. [The secondary structure of NtrCr is shown in Fig. 7.1(b).] Itoh and Sasai [154] predict that the roles of the C-terminal and N-terminal halves reverse in the dominant mechanism for folding to the active (A)-NtrCr structure, with folding initiating in the C-terminal half and then proceeding to the N-terminal half.

The I → A conformational transition of this protein has been studied recently with both all-atom and coarse-grained protein models [154–159]. Two outcomes from these studies demand clarification. One is the disagreement in the description of the key mechanism of the phosphorylation-induced conformational change of NtrCr, and the other is the possibility of “cracking”, or local unfolding and refolding, of the functionally important helix-α4. Recently, Kern and coworkers performed computational [159] and experimental [160] studies of the conformational change and unfolding of NtrCr. Their simulations suggest that the conformational switch in NtrCr can be described by a sequence of conformational steps involving helix-α4 [159]. Furthermore, this helix remained stable during the whole transition [159, 160]. They also argued that the non-native hydrogen bonds that form with α4 during the inactive/active transition of NtrCr, rather than cracking, lower the barrier height [159]. Additional unfolding experiments revealed that α4 unfolds cooperatively with the protein NtrCr which, they argue, shows that α4 remains intact during the conformational transition [160].

Here, I use a variational model to investigate the folding and conformational transition mechanisms of NtrCr. In the present study, a BeF3−-activated structure is used as the active state of NtrCr (A-NtrCr). The I-NtrCr and A-NtrCr conformations are shown in Fig. 7.1(a). Several response regulators in the BeF3−-activated state exhibit functions similar to those of their phosphorylated counterparts [161]. Although NMR structures of phosphorylated NtrCr (P-NtrCr) are available [68], it has been found that the lifetime of the P-NtrCr state was short and that some key features in the NMR data were not well defined. The main structural difference between BeF3−-NtrCr and P-NtrCr is observed in the position of helix-α3. In the BeF3−-NtrCr structure, α3 seems closer to the position adopted in I-NtrCr [71]. This is also evident from Fig. 7.1(c), which shows very little deviation in α3 when the differences between the inactive and BeF3−-NtrCr structures are evaluated directly from their corresponding PDB structures.

7.2 Methods

The NMR structures of the active and inactive forms of NtrCr are used to model the transition. The structures of the inactive protein (I-NtrCr, PDB ID code 1dc7) and the active (phosphorylated) protein (P-NtrCr, PDB ID code 1dc8) are available [68]. However, following Ref. [156], we use PDB ID code 1ntr [70] as the I-NtrCr and the beryllofluoride-activated protein (BeF3−-NtrCr, PDB ID code 1krw [71]) as the active state (A-NtrCr), which is in fact structurally very similar to the P-NtrCr (1dc8), with a backbone RMSD of 0.57 Å [71]. Before the calculations, the structures of the active/inactive NtrCr pair are aligned over residues 4–9, 14–53 and 108–121, as suggested in Ref. [13], using SuperPose [162].

A variational model of folding is used to study the folding of the I- and the A-NtrCr (see Section 3.2 in Chapter 3). The folding temperatures of these two states are found to be $k_BT_f/\epsilon_0 = 2.15$ and $2.13$ for the I- and A-NtrCr, respectively. Here, $\epsilon_0 = 0.6$ kcal/mol is set by the Miyazawa-Jernigan energy scale [110].

The structural change from the I to the A state of NtrCr upon activation is studied using the variational model of conformational transitions discussed in Section 3.1 of Chapter 3. In this model, the energy basins of the I- and A-NtrCr are coupled smoothly using the multiple-basin energy approach discussed in Section 3.1.4 (see Eq. 2.9). The temperature of the conformational transition is chosen to be $T = 0.7\,T_f$, where $T_f$ is the folding temperature of the I-NtrCr.

7.3 Folding Mechanism of NtrCr

7.3.1 Folding of the inactive-NtrCr

We study the folding of the I-NtrCr at the folding temperature $T_f$. The folding route consists of a series of minima connected by saddle points (transition states). To investigate the folding mechanism of the I-NtrCr, we first analyze the mean square fluctuations $B_i = G_{ii}a^2$ evaluated at the critical points that define the folding route. Here, $G_{ii}$ is the magnitude of the fluctuations of the $i$th monomer and $a = 3.8$ Å is the distance between adjacent monomers in the variational model. Fig. 7.2(a) illustrates that folding of the I-NtrCr initiates from the core, which consists of the β-sheets of this βα-repeat protein. Note that β3 and β4 show very small fluctuations even in the partially structured folding nucleus. The helices α1, α4 and α5 are predicted to fold late. Helix-α4 in particular is among the last regions to become structured along the folding route. The late folding of helix-α4 reflects its low stability in the inactive conformation.

The structural organization of the protein along the folding route can be characterized by the normalized native density, a local order parameter that monitors the structural similarity of each residue relative to the folded conformation. The normalized native density of the I-NtrCr is expressed as (using Eq. 3.42)

$$\tilde{\rho}_i(I) = \frac{\rho_i(I) - \rho_i^{G}(I)}{\rho_i^{N}(I) - \rho_i^{G}(I)}, \tag{7.1}$$

where the superscripts $N$ and $G$ denote the native density of the folded (i.e., native) and globule states of the I-NtrCr, respectively. Progress along the folding route can be represented by the global order parameter $Q(I) = \frac{1}{N}\sum_i \tilde{\rho}_i(I)$, where $Q(I) = 0$ and $Q(I) = 1$ represent the unfolded (globule) state and the folded (native) state, respectively. Fig. 7.2(b) reveals that the segments α2β3 and α3β4 fold first, which gives a transition barrier of $\sim 9\,k_BT$ at $Q(I) = 0.2$ [see Fig. 7.2(c)]. Folding of the β3α3 loop starts early at $Q(I) = 0.2$, but the loop becomes structured gradually, probably due to its high flexibility in the native conformation. At $Q(I) = 0.4$ we find an intermediate state along the folding route where β1 and α5 fold. At $Q(I) = 0.6$ we see another transition state of $\sim 9.5\,k_BT$ where the folding of α1 initiates and the folding of α5 completes [see Fig. 7.2(c)].

[Figure 7.2 (inactive state): (a) $B_i$ [$a^2$] versus residue index at Q = 0.182, 0.347, 0.354, 0.367, 0.384, 0.397, 0.569 and 1.0; (b) color map of the normalized native density as a function of residue index and Q(I), running from G to N; (c) free energy [kT] versus Q, from G to N.]

Figure 7.2: Folding mechanism of the inactive NtrCr. (a) Mean square fluctuations vs residue index, $B_i = G_{ii}a^2$, where $a$ is the distance between successive monomers, plotted for different values of the order parameter $Q(I)$. Fluctuations of the transition states (TS, dotted), minima (Min, solid) and native (N, solid) state are shown. The globule (G) state is not shown. (b) The folding route is characterized locally by the normalized native density $\tilde{\rho}_i(I) = [\rho_i(I) - \rho_i^{G}(I)]/[\rho_i^{N}(I) - \rho_i^{G}(I)]$ and the global order parameter $Q(I) = \frac{1}{N}\sum_i \tilde{\rho}_i(I)$, where I denotes the inactive state. The degree of structural localization of each residue is reflected in the colors, linearly scaled between blue [$\tilde{\rho}_i(I) = 0$ for G] and red [$\tilde{\rho}_i(I) = 1$ for N]. (c) Free energy profile along the folding route vs the global order parameter $Q(I)$.

The most striking feature of the I-NtrCr is the late folding of the α4β5 segment at $Q(I) = 0.8$, again probably because of the high flexibility of this segment in the native state. The late folding of this region is relevant to the conformational transition mechanism of the NtrCr discussed in the next section.
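The normalized native density of Eq. 7.1 and the global order parameter Q can be evaluated from per-residue native densities as in the following minimal sketch. It assumes the native densities of the partially folded, globule and native ensembles are already available as lists; the function names and the four-residue toy values are hypothetical and are not NtrCr data.

```python
def normalized_native_density(rho, rho_globule, rho_native):
    """Rescale each residue's native density to 0 (globule) .. 1 (native)."""
    return [(r - g) / (n - g) for r, g, n in zip(rho, rho_globule, rho_native)]


def global_order_parameter(rho_norm):
    """Q = (1/N) * sum over residues of the normalized native density."""
    return sum(rho_norm) / len(rho_norm)


# Hypothetical native densities for a four-residue toy chain.
rho_G = [0.10, 0.10, 0.20, 0.10]  # globule reference
rho_N = [0.90, 0.80, 0.70, 0.90]  # native reference
rho = [0.50, 0.80, 0.20, 0.30]    # a partially folded ensemble

rho_norm = normalized_native_density(rho, rho_G, rho_N)
Q = global_order_parameter(rho_norm)
print([round(x, 3) for x in rho_norm], round(Q, 4))
```

In this toy case the second residue is fully native-like and the third is still globule-like, so Q lies between 0 and 1, as for an intermediate point along a folding route.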

Overall, these results agree qualitatively with other folding studies of NtrCr, which showed that the N-terminal half (the β1α1β2α2β3 elements of the secondary structure) of NtrCr folds earlier than its C-terminal half (the α3β4α4β5α5 elements of the secondary structure) [153, 154]. Similar folding routes have also been reported for other topologically similar βα-repeat proteins [153].

7.3.2 Folding of the active-NtrCr

The predicted folding route of NtrCr to the active structure at its folding temperature Tf is quite different from the corresponding predicted folding route to the inactive structure, I-NtrCr. The folding route consists of a series of minima connected by transition states between the globule and native states. We find that the C-terminal half becomes structured before the N-terminal half along the folding route of the A-NtrCr, as indicated by the temperature factors shown in Fig. 7.3(a). In particular, the β3α3 loop and β4 are much less flexible even in the very early stage of folding [see Fig. 7.3(a)].

Partially ordered ensembles of structures characterized by the normalized native density

$$\tilde{\rho}_i(A) = \frac{\rho_i(A) - \rho_i^{G}(A)}{\rho_i^{N}(A) - \rho_i^{G}(A)}, \tag{7.2}$$

and the global order parameter $Q(A) = \frac{1}{N}\sum_i \tilde{\rho}_i(A)$ are shown in Fig. 7.3(b). In contrast to the I-NtrCr structure, folding to the A-NtrCr structure initiates with the C-terminal half of the protein before its N-terminal half. This early folding of the β3α3 segment and β4 corresponds to a transition barrier of $\sim 11\,k_BT$ [Fig. 7.3(c)], somewhat higher than the first barrier to the inactive structure. Another interesting feature of the folding mechanism of A-NtrCr, compared to the inactive state, is that α4 folds much earlier in the folding route, at $Q(A) \sim 0.3$. This striking difference probably reflects an increase in the stability of this helix upon activation. The rest of the route shown in Fig. 7.3(c) consists of the helices α5 and α2 folding at $Q(A) = 0.5$ and $0.6$, respectively, followed by the ordering of the β1α1β2 segment at the end.

[Figure 7.3 (active state): (a) $B_i$ [$a^2$] versus residue index at Q = 0.189, 0.382, 0.46, 0.56, 0.676, 0.733, 0.744, 0.826 and 1.0; (b) color map of the normalized native density as a function of residue index and Q(A), running from G to N; (c) free energy [kT] versus Q, from G to N.]

Figure 7.3: Folding mechanism of the active NtrCr. (a) Mean square fluctuations vs residue index, $B_i = G_{ii}a^2$, where $a$ is the distance between successive monomers, plotted for different values of the order parameter $Q(A)$. (b) The folding route is characterized locally by the normalized native density $\tilde{\rho}_i(A) = [\rho_i(A) - \rho_i^{G}(A)]/[\rho_i^{N}(A) - \rho_i^{G}(A)]$ and the global order parameter $Q(A) = \frac{1}{N}\sum_i \tilde{\rho}_i(A)$, where A denotes the active state. The degree of structural localization of each residue is reflected in the colors, linearly scaled between blue [$\tilde{\rho}_i(A) = 0$ for G] and red [$\tilde{\rho}_i(A) = 1$ for N]. (c) Free energy profile along the folding route vs the global order parameter $Q(A)$.

These results are in harmony with the theoretical study by Itoh and Sasai [154], who predicted that the C-terminal half of NtrCr becomes ordered first in the dominant folding pathway to A-NtrCr.

7.4 Conformational Transition of NtrCr

In the previous section we studied the folding mechanisms of both the I- and A-NtrCr.

Here, the main focus is to investigate the I → A conformational transition mechanism of NtrCr. This study will help us to illustrate the large structural change of NtrCr that supports a dynamic population shift between its I and A conformations as the underlying mechanism of allostery in this protein.

7.4.1 Inactive to active state transition routes of NtrCr

In order to find the I → A conformational transition routes, we first need to evaluate the single-basin energies $E^{I}[\{C\}, \alpha_0]$ and $E^{A}[\{C\}, \alpha_0]$ of the inactive and active state conformations of NtrCr, respectively. These energies can be calculated using Eq. 3.26 in Chapter 3 from the sets of contact pairs $[ij]^{I}$ and $[ij]^{A}$ of the I- and A-state NtrCr, respectively. The solid curves in Fig. 7.4(a) show the energies $E^{I}$ and $E^{A}$, evaluated independently for the I state ($\alpha_0 = 1$) and the A state ($\alpha_0 = 0$), respectively. The relative stability of the two energy curves is adjusted with an offset value $E^{0} = 5.96\,k_BT$, giving an intersection of the two energy curves at $\alpha_0 \sim 0.5$. The energy for the I → A conformational transition, $E[\{C\}, \alpha_0]$, shown as the dotted curve in Fig. 7.4(a), is obtained by combining the two energies $E^{I}$ and $E^{A}$ following Eq. 3.30 in Chapter 3, with the coupling constant $\Delta = 6$ and the offset value $E^{0} = 5.96\,k_BT$. Using the expression for the entropic contribution $S[\{C\}, \alpha_0]$ (given in Eq. 3.25), we evaluate the free energy $F[\{C\}, \alpha_0] = E[\{C\}, \alpha_0] - TS[\{C\}, \alpha_0]$ of the inactive → active conformational change of NtrCr, which is shown in Fig. 7.4(b).
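To illustrate how a coupling constant and an offset can merge two single-basin energies into one smooth surface, the sketch below uses the lower eigenvalue of a 2×2 matrix, a form commonly used in multiple-basin models. This is an assumption: Eq. 3.30 is not reproduced in this section and may differ in detail, including in which basin carries the offset. The function names are hypothetical.

```python
import math


def coupled_energy(e_inactive, e_active, offset, coupling):
    """Lower eigenvalue of [[e_I, D], [D, e_A + E0]] (assumed two-basin form).

    With coupling = 0 this reduces to min(e_I, e_A + E0); a finite coupling
    rounds off the cusp where the two single-basin energies cross.
    """
    mean = 0.5 * (e_inactive + e_active + offset)
    half_gap = 0.5 * (e_inactive - e_active - offset)
    return mean - math.sqrt(half_gap ** 2 + coupling ** 2)


def free_energy(energy, entropy, temperature):
    """F = E - T*S, with all quantities in consistent (reduced) units."""
    return energy - temperature * entropy


# Toy checks with made-up numbers (not the NtrC energies of Fig. 7.4).
print(coupled_energy(1.0, 3.0, 0.0, 0.0))  # no coupling: lower basin energy
print(coupled_energy(2.0, 2.0, 0.0, 6.0))  # at a crossing: lowered by the coupling
```

In this assumed form the combined energy never exceeds the lower of the two basin energies, and the lowering is largest at the crossing point, which is why increasing the coupling smooths the transition region.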

The energy offset value $E^{0}$ in Eq. 3.30 is adjusted so that the inactive state becomes more populated than the active state [see Fig. 7.4(b)]. Following an approach similar to that of Ref. [11], the value of the coupling parameter $\Delta$ is increased from 0 in small steps until a smooth transition from the I to the A state of NtrCr is observed in the calculated free energy $F[\{C\}, \alpha_0]$. The estimated free energy barrier height for the I → A conformational transition from our model is found to be $\Delta F^{\dagger} \sim 2.7\,k_BT$, lower than the estimated I → A transition barrier of approximately 6 kcal/mol ($\sim 10\,k_BT$) from NMR relaxation experiments [163]. One reason for the small value of the activation barrier from our simple model is the relatively large entropic contribution to the transition state ensemble.
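The conversion between the experimental barrier in kcal/mol and units of kBT quoted above can be checked with a short calculation. The sketch assumes a temperature of 298 K, which is not stated in this section; the function name is hypothetical.

```python
# Molar gas constant in kcal/(mol K): 8.314462618 J/(mol K) / 4184 J/kcal.
GAS_CONSTANT_KCAL = 1.987204e-3


def kcal_per_mol_to_kbt(energy_kcal, temperature_k):
    """Express a molar energy in units of the thermal energy at temperature_k."""
    return energy_kcal / (GAS_CONSTANT_KCAL * temperature_k)


# Assumed temperature of 298 K (not given in the text).
barrier_kbt = kcal_per_mol_to_kbt(6.0, 298.0)
print(f"6 kcal/mol is about {barrier_kbt:.1f} kBT at 298 K")
```

At room temperature the thermal energy is close to 0.6 kcal/mol, so a barrier of 6 kcal/mol corresponds to roughly 10 kBT, consistent with the value quoted in the text.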

[Figure 7.4: (a) energy [kT] versus α0 for the curves $E^{I}$, $E^{A}$ and $E$, with the I and A basins labeled; (b) free energy [kT] versus α0, with the I and A states labeled.]

Figure 7.4: Energy and free energy paths for the conformational transition of NtrCr. (a) The single-basin energies $E^{I}$ and $E^{A}$ for the inactive (black curve) and active (gray curve) states, respectively. The dotted curve $E$ is the energy path of the inactive ($\alpha_0 = 1$) → active ($\alpha_0 = 0$) state transition. (b) The free energy path of the inactive ($\alpha_0 = 1$) → active ($\alpha_0 = 0$) state transition of NtrCr.

7.4.2 Conformational flexibility of NtrCr

The change in conformational flexibility of NtrCr is characterized by the mean square fluctuations $B_i(\alpha_0)$ evaluated at different values of the interpolation parameter $\alpha_0$ along the I → A transition route. Overall, NtrCr becomes significantly more rigid upon activation, as shown in Fig. 7.5(a). In particular, the β3α3, β4α4 and α4β5 loops and helix-α4 are highly flexible in the inactive state. With the exception of part of β5, the β-sheets remain rigid during the I → A state transition. Also, the two ends of the β3α3 loop remain rigid in both the inactive and active states. On the other hand, some residues of the α2β3 loop close to helix-α2 are flexible during the conformational change.

The relative flexibility of the inactive conformation agrees with the backbone dynamics inferred from NMR relaxation measurements [13], as well as with mean square fluctuations obtained from recent molecular dynamics simulations [151, 158, 159] and an analytical model [157]. Our calculated $B_i(\alpha_0)$ of α3 does not show a significant change in flexibility upon activation of I-NtrCr, as suggested by Hastings et al. [71]. The source of this discrepancy is that α3 is in contact with the β-sheet core of the BeF3−-activated NtrCr structure. This helix is not well defined in the NMR structure of P-NtrCr [71], on which these other studies are based.

[Figure 7.5: (a) $B_i$ [$a^2$] versus residue index for α0 = 1 (I), 0.8, 0.6, 0.5, 0.4, 0.2 and 0 (A); (b) color map of $B_i^{r}(\alpha_0)$ as a function of residue index and α0, from I (α0 = 1) to A (α0 = 0). The secondary structure elements β1–α5 are marked above both panels.]

Figure 7.5: Change in conformational flexibility of NtrCr along the inactive → active conformational transition route. (a) Mean square fluctuations $B_i(\alpha_0)$ vs residue index for selected values of the interpolation parameter $\alpha_0$. (b) Ratio of mean square fluctuations, $B_i^{r}(\alpha_0) = B_i(\alpha_0)/B_i^{I}$, where $B_i(\alpha_0)$ is calculated for different $\alpha_0$ and $B_i^{I}$ is the fluctuation of the inactive state at $\alpha_0 = 1$. The dotted curves in (a) and (b) enclose the regions of the β4α4 loop, helix-α4, the α4β5 loop and the β3α3 loop of NtrCr.

We also evaluate the ratio of the change in conformational flexibility along the transition route to understand how the activation mechanism stabilizes the A-NtrCr over the I-NtrCr. In Fig. 7.5(b), we plot the ratio of mean square fluctuations $B_i^{r}(\alpha_0) = B_i(\alpha_0)/B_i^{I}(\alpha_0 = 1)$, where $B_i(\alpha_0)$ is the fluctuation of residue $i$ calculated at different $\alpha_0$ for the I → A transition and $B_i^{I}(\alpha_0 = 1)$ is the fluctuation of the same residue in the I-NtrCr. This parameter is normalized so that $B_i^{r}(\alpha_0) = 1$ implies that the flexibility of a residue of NtrCr remains unchanged throughout the transition, whereas $B_i^{r}(\alpha_0) < 1$ and $B_i^{r}(\alpha_0) > 1$ denote a decrease and an increase in the flexibility of the residue during the I → A structural change of NtrCr, respectively. This normalized parameter highlights how the conformational flexibility of each residue of NtrCr changes with respect to the I-state along the transition route. As shown in Fig. 7.5(b), there is a sharp decrease in the flexibility of the residues of the β3α3 loop, α4 and the α4β5 loop at $\alpha_0 = 0.5$ (near the transition state). [These regions are highlighted by the dotted curves in Fig. 7.5(b).] On the other hand, the β4α4 loop shows a relatively small change in flexibility and remains flexible even in the active state. Figs. 7.5(a) and 7.5(b) reveal that the flexible part of β5 increases its flexibility slightly during the transition. These observations are consistent with NMR results [68, 71], which show that the β4α4 loop and β5 have persistent motions in both the inactive and active states.
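The fluctuation ratio described above is a per-residue division by the inactive-state value, followed by a comparison with 1. A minimal sketch is given below; the function names, the 5% tolerance used to call a residue "unchanged", and the three-residue toy values are all assumptions for illustration and are not part of the model.

```python
def fluctuation_ratio(b_alpha, b_inactive):
    """Ratio of each residue's fluctuation at alpha0 to its inactive-state value."""
    return [b / b_ref for b, b_ref in zip(b_alpha, b_inactive)]


def label_change(ratio, tolerance=0.05):
    """Classify a ratio; the tolerance is an arbitrary illustrative choice."""
    if ratio < 1.0 - tolerance:
        return "more rigid"
    if ratio > 1.0 + tolerance:
        return "more flexible"
    return "unchanged"


# Hypothetical fluctuations (units of a^2) for three residues.
b_inactive = [1.2, 0.4, 0.9]   # at alpha0 = 1
b_at_alpha = [0.6, 0.4, 1.08]  # at some alpha0 < 1

ratios = fluctuation_ratio(b_at_alpha, b_inactive)
labels = [label_change(r) for r in ratios]
print(list(zip([round(r, 2) for r in ratios], labels)))
```

Normalizing by the inactive-state value puts rigid and flexible residues on the same scale, so a residue that halves a small fluctuation is highlighted as strongly as one that halves a large fluctuation.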

Even though the α1β2α2 segment, the α3β4 loop and the β4α4 loop exhibit some change in flexibility during the conformational change, we restrict our attention to the β3α3 loop, the α4 helix, the β4α4 loop and the α4β5 loop because these particular segments have significant structural changes according to the NMR measurements [13].

7.4.3 Inactive to active transition mechanism of NtrCr

In order to explore the I → A conformational change mechanism of NtrCr in more detail, we plot the fluctuations Bi(α0) for selected residues of the β3α3 loop and the β4α4 segment in Fig. 7.6. Whereas the residues in the β3α3 loop were chosen based on their significant decrease in flexibility upon activation, the residues from the β4α4 segment were selected arbitrarily because this whole segment is highly flexible in the inactive conformation. As seen in Fig. 7.6(a), residues 57–59 in the N-terminal part of the β3α3 loop show a more gradual decrease in flexibility than residues 60–62 in the C-terminal part of the loop, which show an abrupt change in flexibility near the transition state (α0 ∼ 0.5–0.55). The change in flexibility of the plotted residues in the β4α4 segment is abrupt, as shown in Fig. 7.6(b). Some residues from this segment exhibit nonmonotonic changes in flexibility at α0 ∼ 0.5–0.55. Similarly, the residue Gly62 of the β3α3 loop shown in Fig. 7.6(a) increases its conformational flexibility transiently along the transition route [see Fig. 7.6(b)]. While this increase in flexibility is similar to that found in our previous study of the C-terminal domain of calmodulin (cCaM) discussed in Chapters 5 and 6, the magnitude is much smaller.

The above results are particularly interesting because it has been shown that helix-α4 is functionally important and exhibits large deviations from the I-NtrCr structure during the I → A conformational change [13, 68]. Recall that this helix folds much later in the folding route to I-NtrCr than in the folding route to the A-NtrCr structure. This striking difference in the folding mechanism of α4 is consistent with a large change in the stability of α4 upon activation. Other than α4, a large structural change is also observed for the β3α3 loop in the NMR relaxation studies [13]. Our results suggest that the residues from the middle part of this loop play an important role in the conformational change of NtrCr in conjunction with α4.

[Figure 7.6: (a) β3α3 loop: Bi [a²] versus α0 for Met57, Pro58, Gly59, Met60, Asp61 and Gly62; (b) helix-α4: Bi [a²] versus α0 for Asp86, Asp88, Val91, Ala93 and Tyr94. The I (α0 = 1) and A (α0 = 0) ends are labeled in both panels.]

Figure 7.6: Change in mean square fluctuations of selected residues of NtrCr along the conformational transition route. (a) Plot of the fluctuations Bi(α0) vs the interpolation parameter α0 for residues from the β3α3 loop for the inactive (I) → active (A) transition. (b) Similar plot of Bi(α0) vs α0 for residues from α4.

Comparison of the contact maps of the inactive and active conformations of NtrCr in Fig. 7.7 also gives some insight into the transition mechanism of NtrCr. We focus here on what accounts for the loss of flexibility of the β3α3 loop and α4 helix regions of the protein (see Fig. 7.5). There are many additional contacts involving the β3α3 loop in the active state. Similarly, additional main chain i → i + 4 hydrogen bonds in the α4 helix leave this helix more rigid upon activation. Notice also the extensive contacts between the β3α3 loop and α4 that are present only in the A-NtrCr structure.

These observations underline the functional motions of both the β3α3 loop and α4 in the I → A structural change of NtrCr. In particular, our present study finds that the key mechanism of allostery in NtrCr is controlled by the large change in conformational flexibility of both the β3α3 loop and helix-α4. More specifically, we notice that the significant increase in the number of contacts for these two particular structural segments could be the reason for their large decrease in flexibility. Interestingly, we find a gradual decrease in the flexibility of the β3α3 loop [see Fig. 7.6(a)], whereas the change in flexibility of α4 is abrupt [see Fig. 7.6(b)]. This abrupt change in flexibility is probably due to the presence of four additional main chain hydrogen bonds in α4 that form in the active state conformation of NtrCr.
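The comparison of contact numbers between the two conformations can be sketched as follows. The contact definition used here (Cα atoms within 8 Å, at least three residues apart in sequence) is an assumption for illustration only; the contact definition of the variational model is given elsewhere in the dissertation and may differ. The function names and the five-residue toy coordinates are hypothetical.

```python
import math


def contact_pairs(coords, cutoff=8.0, min_separation=3):
    """Return the set of residue pairs (i, j), i < j, within the cutoff distance."""
    pairs = set()
    for i in range(len(coords)):
        for j in range(i + min_separation, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                pairs.add((i, j))
    return pairs


def contacts_per_residue(pairs, n_residues):
    """Count how many contacts each residue takes part in."""
    counts = [0] * n_residues
    for i, j in pairs:
        counts[i] += 1
        counts[j] += 1
    return counts


# Toy C-alpha traces (in angstroms): an extended chain and a hairpin-like one.
extended = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
            (11.4, 0.0, 0.0), (15.2, 0.0, 0.0)]
hairpin = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
           (7.6, 3.8, 0.0), (3.8, 3.8, 0.0)]

pairs_ext = contact_pairs(extended)
pairs_hp = contact_pairs(hairpin)
gained = sorted(pairs_hp - pairs_ext)
print(gained, contacts_per_residue(pairs_hp, len(hairpin)))
```

Taking the set difference of the two contact lists, as done for `gained`, picks out the contacts that exist in only one conformation, which is the kind of information read from the circled regions of a two-state contact map.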

7.4.4 Strain energy analysis of inactive NtrCr

[Figure 7.7: contact map of residue index versus residue index (0–120), with the secondary structure elements β1–α5 marked along both axes and the I and A halves of the map labeled.]

Figure 7.7: Contact maps (sequence networks) of the inactive and active conformations of NtrCr. The contacts represented by squares are for the inactive state and the contacts represented by triangles are for the active state. The sets of contacts inside the circles are for the β3α3 loop and helix-α4. Note that for α4 and the β3α3 loop there are more contacts in the active state than in the inactive state.

To further investigate the role that the flexibility of α4 plays in the transition mechanism, we also perform a strain energy analysis of I-NtrCr. The residue strain energy of NtrCr for the I → A structural change can be expressed following the elastic network model [86], in a similar way to Eq. 6.1:

$$\varepsilon_i(\alpha_0)_{I \to A} = \frac{k_I}{4} \sum_{j \in [ij]^{I}} \left( \left| \mathbf{r}_{i,j}^{N}(\alpha_0) \right| - \left| \mathbf{r}_{i,j}^{N_I} \right| \right)^2. \tag{7.3}$$

Here, the spring constant $k_I$ determines the rigidity of the molecule, and we assign $k_I = 1$ for all the residues of I-NtrCr for simplicity. $[ij]^{I}$ denotes the set of contacts of the inactive NtrCr, and the sum over $j$ is taken over these contacts. In the above expression, $|\mathbf{r}_{i,j}^{N}(\alpha_0)| = |\mathbf{r}_i^{N}(\alpha_0) - \mathbf{r}_j^{N}(\alpha_0)|$, where $\mathbf{r}_i^{N}(\alpha_0)$ and $\mathbf{r}_j^{N}(\alpha_0)$ are the linearly interpolated $C^{\alpha}$ position vectors of the $i$th and $j$th residues of the NtrCr conformations, respectively (see Eq. 3.8). Similarly, $|\mathbf{r}_{i,j}^{N_I}| = |\mathbf{r}_i^{N_I} - \mathbf{r}_j^{N_I}|$, where $\mathbf{r}_i^{N_I}$ and $\mathbf{r}_j^{N_I}$ are the $C^{\alpha}$ position vectors of the $i$th and $j$th residues of the inactive state, respectively. For the structural deformation of the inactive NtrCr, the $C^{\alpha}$ position vectors are interpolated linearly from the inactive ($\alpha_0 = 1$) to the active ($\alpha_0 = 0$) state conformation by varying the parameter $\alpha_0$ in small steps of 0.01 (see Eq. 3.8).
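The residue strain energy of Eq. 7.3 can be sketched in a few lines. The interpolation is assumed here to be the simple linear form r(α0) = α0 r(I) + (1 − α0) r(A), chosen to match the stated end points (α0 = 1 inactive, α0 = 0 active); Eq. 3.8 itself is not reproduced in this section. The function names and the three-residue toy coordinates are hypothetical.

```python
import math


def interpolate(r_inactive, r_active, alpha0):
    """Linear interpolation: alpha0 = 1 gives inactive, alpha0 = 0 gives active."""
    return [tuple(alpha0 * xi + (1.0 - alpha0) * xa for xi, xa in zip(pi, pa))
            for pi, pa in zip(r_inactive, r_active)]


def residue_strain(r_inactive, r_active, contacts, alpha0, k_spring=1.0):
    """Per-residue strain: (k/4) * sum over inactive-state contacts of stretch^2.

    contacts is a collection of unordered residue pairs (i, j) from the
    inactive structure; each pair contributes to both residue i and residue j.
    """
    r_alpha = interpolate(r_inactive, r_active, alpha0)
    strain = [0.0] * len(r_inactive)
    for i, j in contacts:
        stretch = (math.dist(r_alpha[i], r_alpha[j])
                   - math.dist(r_inactive[i], r_inactive[j]))
        energy = 0.25 * k_spring * stretch ** 2
        strain[i] += energy
        strain[j] += energy
    return strain


# Toy example: one contact between residues 0 and 2 that stretches from 2 to 4.
toy_inactive = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
toy_active = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
toy_contacts = [(0, 2)]

for a0 in (1.0, 0.5, 0.0):
    print(a0, residue_strain(toy_inactive, toy_active, toy_contacts, a0))
```

Because the reference distances and the contact list both come from the inactive structure, the strain vanishes at α0 = 1 and grows as the coordinates are pushed toward the active structure.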

The strain energy distribution of each residue, $\varepsilon_i(\alpha_0)_{I \to A}$, plotted in Fig. 7.8 shows that there are many regions in NtrCr that are under relatively high strain during the inactive → active structural change. These include residues from β1, the α1β2 loop, part of the α2β3 loop and helix-α5. Although residues in the β3α3 loop and helix-α4 have significant changes in flexibility upon activation, these segments are under much lower strain compared to other residues of the protein. We also note that the magnitude of the strain for NtrCr is much lower than the magnitude of the strain calculated for CaM discussed in the previous chapter.

[Figure 7.8: (a) color map of the strain energy [a.u.] as a function of residue index and α0, from I (α0 = 1) to A (α0 = 0); (b) strain energy [a.u.] versus residue index at α0 = 0.35 (intermediate state). The secondary structure elements β1–α5 are marked above both panels.]

Figure 7.8: Residue strain energy distribution for linearly interpolated NtrCr structures. (a) Strain energy of the inactive NtrCr vs residue index during the deformation from the inactive → active state conformation for different values of the interpolation parameter α0. The strain energy of helix-α4 is enclosed by the dotted curves. (b) Residue strain energy distribution at an intermediate state, α0 = 0.35.

The possibility of cracking of α4 during the inactive → active conformational transition suggests that residues from α4 should be under high strain. Recent simulation and experimental studies by Kern and coworkers argued that, instead of cracking of α4, some non-native hydrogen bonds that form transiently during the inactive/active conformational change of NtrCr lower the free energy barrier of this transition. The predicted free energy barrier for our I → A transition is much lower than that found by other groups, even without the influence of non-native contacts or the relief of stress through cracking of helix-α4. Kern and coworkers [160] found from fluorescence unfolding experiments that α4 unfolds cooperatively with the NtrCr protein and argue that this rules out any possibility of cracking of α4 during the I → A transition. Our model predicts that α4 folds very late in the folding route to I-NtrCr. This accounts for the high flexibility of α4 in the inactive state and indicates a significant change in the stability of α4 instead of cracking.

7.4.5 Comparison with experimentally predicted transition mechanism

Volkman et al. [13] have predicted the correlation between backbone dynamics parameters and the conformational switching mechanism of NtrCr upon activation from NMR relaxation experiments. The value of the exchange parameter Rex from these experiments indicates motions that occur on the microsecond to millisecond timescale and reflects a large difference in chemical shift between the exchanging species [13]. The data reproduced in Fig. 7.9(a) show large exchange rates for the β3α3 loop and the switch region, the ‘3445’ face (α3β4α4β5 segment).

[Figure 7.9: four stacked panels plotted against residue index, with the secondary structure elements β1 α1 β2 α2 β3 α3 β4 α4 β5 α5 marked along the top. Vertical axes: (a) Rex (s^-1), (b) $\Delta B_i$ [a^2], (c) $\Delta E_i^A$ [kT], (d) $\Delta\rho_i^A$.]

Figure 7.9: Comparison of the experimentally measured backbone conformational dynamics of NtrCr with parameters calculated from the variational model. (a) Exchange parameter Rex vs residue index from the NMR relaxation study [13]. Adapted from [14]. The blue dots indicate Rex data for residues with Rex larger than a threshold in the NMR experiments [13]. (b) Difference in the mean square fluctuations $\Delta B_i$ between the inactive and active conformations of each residue. (c) Difference in the energy per residue $\Delta E_i^A$ of the active conformation for the two end values of the interpolation parameter α0. (d) Native density difference of the active conformation $\Delta\rho_i^A$ for the two end values of α0.

Since Rex is proportional to the difference in the chemical shift between the two metastable states, we first compare Rex with predicted local structural changes. In our simple model, such changes have an entropic or energetic origin. Fig. 7.9(b) shows the difference in fluctuations of each residue between the active and the inactive state, $\Delta B_i = B_i^I(\alpha_0 = 1) - B_i^A(\alpha_0 = 0)$. Large differences in flexibility upon activation of NtrCr occur primarily for residues from the β3α3 loop and the α4β5 segment. Although we see a large change in fluctuations at the C-terminal end of α5, this is mainly due to the stabilization of the flexible termini of the protein. To represent the energetic differences between the two conformations, we calculate the energy per residue of the active conformation (using Eq. 3.26)

$$ E_i^A[\{C\}, \alpha_0] = \sum_{j \in [ij]^A} \epsilon_{ij} u_{ij}, \tag{7.4} $$

where $[ij]^A$ denotes the native contacts of A-NtrCr. The difference in energy between the active and inactive conformations, $\Delta E_i^A = E_i^A(\alpha_0 = 1) - E_i^A(\alpha_0 = 0)$, is plotted for each residue in Fig. 7.9(c). The β3α3 loop and the α4β5 segment, which show significant entropic changes upon activation, also show significant energetic changes. In addition, $\Delta E_i^A$ reveals large changes for some residues from the α3β4 segment, α5, β3, α1 and β1, which is more consistent with the experimental Rex. Finally, we calculate the difference in the native density of the active state, $\Delta\rho_i^A = \rho_i^A(\alpha_0 = 0) - \rho_i^A(\alpha_0 = 1)$, where $\rho_i^A(\alpha_0)$ is calculated from Eq. 3.33. The quantity $\rho_i^A$ measures the similarity of the NtrCr structure to the native structure of the active state during the I → A conformational transition. $\Delta\rho_i^A$ in Fig. 7.9(d) clearly reveals a large conformational change of the active state, mainly for α5 [although not detected in the experiments shown in Fig. 7.9(a)], in addition to the β3α3 loop and the α4β5 segment.
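The per-residue energy of Eq. 7.4 and its difference between the two end values of α0 amount to simple bookkeeping over the native contact list, which the following sketch illustrates. It is a toy example, not the dissertation's calculation: the contact list, the contact strengths `eps`, and the pair terms `u_at_0` and `u_at_1` are hypothetical numbers chosen only to show how each contact contributes to both of its residues.

```python
import numpy as np

def residue_energy(n_res, contacts, eps, u):
    """E_i = sum over native contacts [ij] of eps_ij * u_ij (cf. Eq. 7.4)."""
    energy = np.zeros(n_res)
    for (i, j), e_ij, u_ij in zip(contacts, eps, u):
        # A contact between i and j enters the sum for both residues.
        energy[i] += e_ij * u_ij
        energy[j] += e_ij * u_ij
    return energy

contacts = [(0, 3), (1, 4), (2, 4)]   # hypothetical native contact list
eps = np.array([-1.0, -0.5, -2.0])    # hypothetical contact strengths
u_at_0 = np.array([0.9, 0.8, 0.7])    # hypothetical u_ij at alpha0 = 0
u_at_1 = np.array([0.2, 0.3, 0.1])    # hypothetical u_ij at alpha0 = 1

# Difference between the two end values of the interpolation parameter.
delta_e = (residue_energy(5, contacts, eps, u_at_1)
           - residue_energy(5, contacts, eps, u_at_0))
largest_change = int(np.argmax(np.abs(delta_e)))
```

Ranking residues by `abs(delta_e)` is one simple way to pick out the segments that would stand out in a plot such as Fig. 7.9(c).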

7.5 Discussion

Molecular dynamics and other simulation studies of NtrCr have been performed by several other groups to investigate the detailed mechanism of the conformational change upon activation. Interestingly, it has been noted that results from different simulations apparently do not agree on the key mechanistic features of the inactive/active transition of NtrCr [72].

Lätzer et al. [155] suggested that the functionally important helix α4 partially unfolds and refolds during the transition in order to lower an unreasonably high calculated barrier of ∼90 kBT. This estimate was obtained from a mixture of simple quadratic approximations of the inactive and active conformations. A new and promising pathway algorithm, the string method [164] developed by Pan et al. [165], found a barrier of about 50 kBT using a two-state Cα-atom elastic network model similar to that of Lätzer et al. [155]. Later, Vanden-Eijnden and Venturoli [166], using the finite temperature string method with the same model as in Ref. [165], obtained a barrier of ∼15 kBT for the transition of NtrCr.

More detailed descriptions of the inactive/active transition of NtrCr have also been reported from several other simulation studies. Hu and Wang [151] used the targeted molecular dynamics method and found a significant difference in flexibility between the two states of NtrCr, with A-NtrCr being more flexible, in agreement with NMR experiments [13]. They also reported a functional role for the β3α3 loop, in addition to helix-α4, in the activation mechanism of NtrCr [151]. Khalili and Wales [156] studied the transition of NtrCr with the CHARMM force field and the EEF1 implicit solvation model and reported several steps for the activation of NtrCr that involve movement of α4, followed by movement of α2 and then a flip of the β3α3 loop. Damjanovic et al. [158], using self-guided Langevin dynamics, found a partial transition of NtrCr and suggested unfolding of helix-α4. However, this unfolding of α4 is probably due to their use of unrefined NMR structures that did not have the full complement of hydrogen bonds, as pointed out in Ref. [159]. In a recent simulation study using targeted molecular dynamics pathways, Lei et al. [159] identified several steps along the conformational change pathway of NtrCr and suggested a transition mechanism with a tilt of α4, followed by rotation of α4 and then a flip of the α4β5 loop.

Recently, Itoh and Sasai [14] used the structure-based folding model of Muñoz and Eaton [28] and predicted that the allosteric transition in NtrCr proceeds through a combinatorial number of preexisting transition routes, with a large change in entropy that lowers the free energy barrier of the transition, consistent with our results presented here. The barrier height calculated in their study was in the range of ∼5–10 kBT [14].
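To give a sense of how far apart the barrier estimates quoted in this section are, the sketch below compares their Boltzmann factors. It is only a rough illustration: it assumes a simple Arrhenius-like rate, exp(−ΔF/kBT), with the same (unspecified) prefactor for every model, which none of the cited studies necessarily implies. The dictionary keys are labels for the values quoted above.

```python
import math

# Barrier heights in units of kBT, as quoted in the text above.
barriers_kbt = {
    "Latzer et al. [155]": 90.0,
    "Pan et al. [165]": 50.0,
    "Vanden-Eijnden and Venturoli [166]": 15.0,
    "Itoh and Sasai [14], upper end": 10.0,
    "Itoh and Sasai [14], lower end": 5.0,
}

def boltzmann_factor(barrier_kbt):
    """exp(-barrier), with the barrier already expressed in units of kBT."""
    return math.exp(-barrier_kbt)

# Rates relative to the 15 kBT estimate, assuming a common prefactor.
reference = barriers_kbt["Vanden-Eijnden and Venturoli [166]"]
relative_rate = {
    name: boltzmann_factor(barrier - reference)
    for name, barrier in barriers_kbt.items()
}

for name, ratio in relative_rate.items():
    print(f"{name}: {ratio:.2e}")
```

Under this assumption, a 90 kBT barrier corresponds to a rate roughly 33 orders of magnitude smaller than a 15 kBT barrier, which is why the choice of model matters so much for the predicted kinetics.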

7.6 Conclusions

In this chapter, I presented a detailed model for the allosteric transition mechanism of the protein NtrCr upon activation. Following Ref. [11], the multiple-basin energy model provides a convenient way to couple the two energy basins of NtrCr and allows the relative population (or stability) of the two metastable native states of the protein to be adjusted easily.
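The idea of coupling two basins with an adjustable relative stability can be sketched in one dimension. The example below uses the secular-equation form of the multiple-basin model of Ref. [11], in which the coupled energy is the lower eigenvalue of a 2×2 matrix built from the two single-basin energies. Everything else is an assumption made for illustration: the harmonic toy basins, the coupling strength, and the function names are hypothetical, and the variational model of this chapter need not use exactly this functional form.

```python
import numpy as np

def multiple_basin_energy(v1, v2, coupling, delta_v=0.0):
    """Lower eigenvalue of [[v1, coupling], [coupling, v2 + delta_v]]."""
    mean = 0.5 * (v1 + v2 + delta_v)
    half_gap = 0.5 * (v1 - v2 - delta_v)
    return mean - np.sqrt(half_gap ** 2 + coupling ** 2)

def basin2_population(x, v_mb, x_divide=0.5):
    """Boltzmann population on the basin-2 side (energies in units of kBT)."""
    weights = np.exp(-(v_mb - v_mb.min()))
    return weights[x > x_divide].sum() / weights.sum()

# Two toy harmonic basins centered at x = 0 and x = 1 (hypothetical).
x = np.linspace(-1.0, 2.0, 301)
v1 = 20.0 * x ** 2
v2 = 20.0 * (x - 1.0) ** 2

# delta_v shifts basin 2 up or down and so tunes the relative population.
populations = {
    dv: basin2_population(x, multiple_basin_energy(v1, v2, 2.0, dv))
    for dv in (-1.0, 0.0, 1.0)
}
```

In this toy setting, `delta_v` plays the role of the knob that adjusts the relative stability of the two states, while `coupling` smooths the crossing region between the basins and lowers the barrier.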

The results for folding and for the I → A conformational transition from this study are overall in good agreement with other studies of NtrCr and other αβ repeat proteins such as CheY. In particular, the studies by Kern and coworkers [159] concluded that it is mainly the functionally important helix-α4 that plays a significant role, through several structural rearrangements, in the activation of NtrCr. Comparison of the predicted folding routes to the inactive and active structures in this chapter suggests that α4 undergoes a significant change in its stability upon activation. In addition, the I → A conformational transition results presented here reveal that the β3α3 loop, as well as the α4 helix, plays a crucial role in the activation of NtrCr.

Finally, the use of different NMR structures may account for some of the inconsistencies reported in the literature. In particular, large fluctuations and local unfolding may in fact be due to the poorly resolved regions of the protein in some of these structures.

CHAPTER 8

CONCLUSIONS

The primary goal of the research presented in this dissertation is to provide a basic framework for understanding protein functions that involve large-scale (main-chain) dynamics and flexibility. To this end, I focus on the detailed transition mechanisms of specific proteins in which conformational flexibility is essential to function. Understanding the key mechanisms that control protein dynamics will help clarify the protein sequence-structure-function relationship and may eventually lead to treatments for protein-related diseases and to novel drug design.

Toward this goal, I have proposed novel methods for developing realistic analytical coarse-grained protein models to investigate the role of the inherent flexibility that determines the mechanisms of complex structural changes and allostery in proteins. These methods are designed within the general statistical mechanical framework called “energy landscape theory” [109, 167]. This work provides a solid foundation for exploring the mechanism of protein-protein binding, and the coupling between folding and binding, from the energy landscape perspective. In the near future, these approaches will complement other theoretical, simulation, and experimental techniques for a complete characterization of protein landscapes.

In this dissertation, a variational model of protein folding [26, 27, 104, 107] is extended to accommodate the free energy minima relevant to transitions between two known folded protein conformations. The results reported in this dissertation provide a residue level description of the transition state ensemble for an allosteric transition between two known conformations. This characterization captures the structural properties, such as the degree of order and the flexibility, that control the mechanism of protein conformational transitions.

One particular mechanism of protein conformational transitions explored in this research is called “cracking”. Cracking is a mechanism of local unfolding and refolding in specific regions of a protein that relieves high elastic stress. Although not an explicit element of our variational model of conformational transitions, cracking can emerge naturally from the anharmonic nature of this model. The work outlined in this dissertation illustrates that cracking is more likely to occur for relatively rigid protein structures. The transient increase in flexibility modifies the network of contacts, thereby lowering the free energy barrier of the transition and promoting faster transition kinetics [10]. The relationship between cracking and flexibility demonstrated in this dissertation emphasizes the functional importance of protein flexibility.

In this dissertation, I also explored the relationship between the mechanism of protein folding and conformational flexibility. Gō-models, like the variational model presented here, are based on the insight that folding mechanisms are determined by the topology of the native state. It is perhaps more obvious that the conformational flexibility of a protein in the folded basin is determined by its structure. Here, I extend this point of view to modeling conformational transitions in which the flexibility adjusts to structural changes. Some details of the transition mechanism can be expected to depend on sequence as well. In fact, the combined study of protein folding and conformational transition mechanisms in this dissertation emphasizes that both topology and sequence contribute to defining the free energy landscapes of proteins.

In conclusion, the successes in characterizing the mechanism of protein functional transitions underline the importance of simplified protein models, when accompanied by novel theoretical and experimental techniques, in giving a complete description of the functional motions of an allosteric protein. Such low-resolution protein models can provide a strong basis for investigating the complexity of biological processes by bridging the gap between all-atom protein models and wet lab experiments.

CHAPTER 9

APPENDIX

9.1 The Stiff Chain Model for Polypeptide Backbone

In the variational method, a collapsed stiff chain is modeled by the backbone Hamiltonian given in Eq. 3.2 in Chapter 3. The first term in that equation enforces the chain connectivity, and the correlations of monomer positions are given by the quadratic coefficients, $\langle \mathbf{r}_i \cdot \mathbf{r}_j \rangle / a^2 = [\Gamma]^{-1}_{ij}$, where $a$ is a microscopic length scale taken to be the root-mean-square distance between adjacent monomers, such that $\langle |\mathbf{a}_i|^2 \rangle = a^2$ for the $i$th bond vector $\mathbf{a}_i = \mathbf{r}_{i+1} - \mathbf{r}_i$. The angle between successive bond vectors is given by θ, and the stiff chain model is defined by the correlations $\langle \mathbf{a}_i \cdot \mathbf{a}_{i+l} \rangle / a^2 = g^{l}$, where $g = \cos\theta$. The chain stiffness is treated harmonically through the persistence length $l$. For simplicity, the chain stiffness is assumed here to be uniform, so that the persistence length is related to the chain stiffness by $l \approx a/(1 - g)$. A reasonable value for the chain stiffness of a protein is g = 0.8 (with a = 3.8 Å), which corresponds to the persistence length of polyalanine, l = 5a ≈ 20 Å. The inverse of the monomer position correlation matrix is [168]

$$ \Gamma = \frac{1-g}{1+g} K^R + \frac{g}{1-g^2} [K^R]^2 - \frac{g^2}{1-g^2} \Delta, \tag{9.1} $$

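where $K^R$ is the Rouse matrix for a nearest-neighbor harmonic chain

$$ K^R = \begin{pmatrix} 1 & -1 & & \cdots & 0 \\ -1 & 2 & -1 & & \vdots \\ & \ddots & \ddots & \ddots & \\ \vdots & & -1 & 2 & -1 \\ 0 & \cdots & & -1 & 1 \end{pmatrix}, \tag{9.2} $$

and $\Delta$ accounts for the “boundaries” at the end of the chain

$$ \Delta = \begin{pmatrix} 1 & -1 & & \cdots & 0 \\ -1 & 1 & & & \vdots \\ & & & & \\ \vdots & & & 1 & -1 \\ 0 & \cdots & & -1 & 1 \end{pmatrix}. \tag{9.3} $$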
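Equations 9.1 to 9.3 can be checked numerically. The sketch below is an illustration rather than code from the dissertation: the function names and the chain length n = 12 are assumptions made for the example. It builds $K^R$, $\Delta$ and $\Gamma$ with NumPy and compares the bond-vector correlations implied by $\Gamma$ with the power-law form $g^{|i-j|}$. Because $\Gamma$ as written has a zero mode (uniform translation of the chain), the example inverts it only on the non-zero modes; any other terms of the backbone Hamiltonian in Eq. 3.2 are not included.

```python
import numpy as np

def rouse_matrix(n):
    """Nearest-neighbor harmonic chain matrix K^R (Eq. 9.2)."""
    k = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    k[0, 0] = k[-1, -1] = 1.0
    return k

def boundary_matrix(n):
    """End correction Delta (Eq. 9.3): a 2x2 block at each chain end."""
    delta = np.zeros((n, n))
    block = np.array([[1.0, -1.0], [-1.0, 1.0]])
    delta[:2, :2] += block
    delta[-2:, -2:] += block
    return delta

def stiff_chain_gamma(n, g):
    """Gamma of Eq. 9.1 for a chain of n monomers with stiffness g."""
    k = rouse_matrix(n)
    return ((1.0 - g) / (1.0 + g)) * k \
        + (g / (1.0 - g ** 2)) * (k @ k) \
        - (g ** 2 / (1.0 - g ** 2)) * boundary_matrix(n)

def bond_correlations(gamma):
    """<a_i . a_j>/a^2 from Gamma, inverting only the non-zero modes."""
    n = gamma.shape[0]
    w, v = np.linalg.eigh(gamma)
    keep = w > 1e-10 * w.max()          # drop the translational zero mode
    gamma_pinv = (v[:, keep] / w[keep]) @ v[:, keep].T
    diff = np.eye(n, k=1)[:-1] - np.eye(n)[:-1]   # a_i = r_{i+1} - r_i
    return diff @ gamma_pinv @ diff.T

g, a = 0.8, 3.8                          # stiffness and bond length (Angstrom)
corr = bond_correlations(stiff_chain_gamma(12, g))
lags = np.abs(np.subtract.outer(np.arange(11), np.arange(11)))
max_error = np.max(np.abs(corr - g ** lags))
persistence_length = a / (1.0 - g)       # about 19 Angstrom
```

Here `max_error` measures how closely the correlations recovered from $\Gamma$ follow $g^{|i-j|}$, and `persistence_length` reproduces the estimate l ≈ a/(1 − g) quoted in the text for g = 0.8.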

9.2 Variational Free Energy Approximation

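The variational free energy in Eq. 3.17 resembles standard thermodynamic perturbation theory, and the partition function can be approximated with the help of the reference Hamiltonian as follows (using Eq. 3.5):

$$ \begin{aligned} Z &= \int \prod_i d\mathbf{r}_i \, \exp(-\beta H) \\ &= \int \prod_i d\mathbf{r}_i \, \exp[-\beta(H - H_0)] \exp(-\beta H_0) \\ &\approx \int \prod_i d\mathbf{r}_i \, \exp(-\beta H_0) [1 - \beta(H - H_0)] \\ &\approx Z_0 - \beta Z_0 \langle H - H_0 \rangle_0 \\ &\approx Z_0 (1 - \beta \langle H - H_0 \rangle_0) \\ &\approx Z_0 \exp(-\beta \langle H - H_0 \rangle_0). \end{aligned} \tag{9.4} $$

BIBLIOGRAPHY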

[1] Villarreal, M. Main protein structure levels. http://commons.wikimedia.org/wiki/File:Main_protein_structure_levels_en%.svg.

[2] Brooks 3rd, C., Onuchic, J., and Wales, D. Statistical thermodynamics. Taking a walk on a landscape. Science 293, 612–613 (2001).

[3] Smock, R. and Gierasch, L. Sending signals dynamically. Science 324, 198–203 (2009).

[4] Henzler-Wildman, K. and Kern, D. Dynamic personalities of proteins. Nature 450, 964–972 (2007).

[5] Gsponer, J., Christodoulou, J., Cavalli, A., Bui, J., Richter, B., Dobson, C., and Vendruscolo, M. A coupled equilibrium shift mechanism in calmodulin-mediated signal transduction. Structure 16, 736–746 (2008).

[6] Berg, J., Tymoczko, J., and Stryer, L. . 5th ed. New York: W. H. Freeman and Co, (2002).

[7] Humphrey, W., Dalke, A., and Schulten, K. VMD: Visual molecular dynamics. J. Mol. Graphics 14, 33–38 (1996).

[8] Activation of glnA transcription by NtrC. http://www.web-books.com/MoBio/Free/Ch4D5.htm.

[9] Tama, F. and Sanejouand, Y. Conformational change of proteins arising from normal mode calculations. Protein Eng. 14, 1 (2001).

[10] Miyashita, O., Onuchic, J. N., and Wolynes, P. G. Nonlinear elasticity, proteinquakes, and the energy landscape of functional transitions in proteins. Proc. Natl. Acad. Sci. USA 100, 12570–12575 (2003).

[11] Okazaki, K., Koga, N., Takada, S., Onuchic, J. N., and Wolynes, P. G. Multiple-basin energy landscapes for large amplitude conformational motions of proteins: Structure-based molecular dynamics simulations. Proc. Natl. Acad. Sci. USA 103, 11844–11849 (2006).

[12] Shen, T., Zong, C., Portman, J., and Wolynes, P. Variationally determined free energy profiles for structural models of proteins: characteristic temperatures for folding and trapping. J. Phys. Chem. B 112, 6074–6082 (2008).

[13] Volkman, B., Lipson, D., Wemmer, D., and Kern, D. Two-state allosteric behavior in a single-domain signaling protein. Science 291, 2429–2433 (2001).

[14] Itoh, K. and Sasai, M. Entropic mechanism of large fluctuation in allosteric transition. Proc. Natl. Acad. Sci. USA 107, 7775–7780 (2010).

[15] Frauenfelder, H., Sligar, S. G., and Wolynes, P. G. The energy landscapes and motions of proteins. Science 254, 1598–1603 (1991).

[16] Gerstein, M., Lesk, A. M., and Chothia, C. Structural mechanisms for domain movement in proteins. Biochemistry 33, 6739–6749 (1994).

[17] Anfinsen, C., Haber, E., Sela, M., and White, F. H. The kinetics of formation of native ribonuclease during oxidation of the reduced polypeptide chain. Proc. Natl. Acad. Sci. USA 47, 1309–1314 (1961).

[18] Anfinsen, C. A. Principles that govern the folding of protein chains. Science 181, 223–230 (1973).

[19] Levinthal, C. Are there pathways for protein folding? J. Chem. Phys. 85, 44–45 (1968).

[20] Bryngelson, J. D. and Wolynes, P. G. Spin glasses and the statistical mechanics of protein folding. Proc. Natl. Acad. Sci. USA 84, 7524–7528 (1987).

[21] Bryngelson, J. D. and Wolynes, P. G. Intermediates and barrier crossing in a random energy model (with applications to protein folding). J. Phys. Chem. 93, 6902–6915 (1989).

[22] Leopold, P. E., Montal, M., and Onuchic, J. N. Protein folding funnels: A kinetic approach to the sequence-structure relationship. Proc. Natl. Acad. Sci. USA 89, 8721–8725 (1992).

[23] Chan, H. S. and , K. A. Protein folding in the landscape perspective: chevron plots and non-Arrhenius kinetics. Proteins Struct. Funct. Genet. 30, 2–33 (1998).

[24] Onuchic, J. N., Luthey-Schulten, Z., and Wolynes, P. G. Theory of protein folding: The energy landscape perspective. Annu. Rev. Phys. Chem. 48, 545–600 (1997).

[25] Gō, N. Theoretical studies of protein folding. Annu. Rev. Biophys. Bioeng. 12, 183–210 (1983).

[26] Portman, J. J., Takada, S., and Wolynes, P. G. Variational theory for site resolved protein folding free energy surfaces. Phys. Rev. Lett. 81(23), 5237–5240 (1998).

[27] Portman, J. J., Takada, S., and Wolynes, P. G. Microscopic theory of protein folding rates. I: Fine structure of the free energy profile and folding routes from a variational approach. J. Chem. Phys. 114(11), 5069–5081 (2001).

[28] Muñoz, V. and Eaton, W. A. A simple model for calculating the kinetics of protein folding from three-dimensional structures. Proc. Natl. Acad. Sci. USA 96, 11311–11316 (1999).

[29] Alm, E. and Baker, D. Prediction of protein-folding mechanisms from free-energy landscapes derived from native structures. Proc. Natl. Acad. Sci. USA 96, 11305–11310 (1999).

[30] Galzitskaya, O. V. and Finkelstein, A. V. A theoretical search for folding/unfolding nuclei in three-dimensional protein structures. Proc. Natl. Acad. Sci. USA 96, 11299–11304 (1999).

[31] Shoemaker, B. A., Wang, J., and Wolynes, P. G. Exploring structures in protein folding funnels with free energy functionals: The transition state ensemble. J. Mol. Biol. 287(3), 675–694 (1999).

[32] Koga, N. and Takada, S. Role of native topology and chain length in protein folding. J. Mol. Biol. 313, 171–180 (2001).

[33] Tsai, C. J., Kumar, S., and Nussinov, R. Folding funnels, binding funnels, and protein function. Protein Sci. 8, 1181–1190 (1999).

[34] Ma, B., Kumar, S., Tsai, C.-J., and Nussinov, R. Folding funnels and binding mechanisms. Protein Engineering 12(9), 713–720 (1999).

[35] Tovchigrechko, A. and Vakser, I. How common is the funnel-like energy landscape in protein-protein interactions? Protein Sci. 10, 1572 (2001).

[36] Verkhivker, G. M., Bouzida, D., Gehlhaar, D. K., Rejto, P. A., Freer, S. T., and Rose, P. W. Complexity and simplicity of ligand-macromolecule interactions: the energy landscape perspective. Curr. Opin. Struct. Biol. 12, 197–203 (2002).

[37] Wang, J., Zhang, K., Lu, H. Y., and Wang, E. K. Dominant kinetic paths on biomolecular binding-folding energy landscapes. Phys. Rev. Lett. 96, 168101 (2006).

[38] Lee, A. L., Kinnear, S. A., and Wand, A. J. Redistribution and loss of side chain entropy upon formation of a calmodulin-peptide complex. Nat. Struct. Biol. 7, 72–77 (2000).

[39] Schotte, F., Soman, J., Olson, J., Wulff, M., and Anfinrud, P. Picosecond time-resolved X-ray crystallography: probing protein function in real time. J. Struct. Biol. 147, 235–246 (2004).

[40] Akke, M., Brueschweiler, R., and Palmer III, A. NMR order parameters and free energy: an analytical approach and its application to cooperative calcium (2+) binding by calbindin D9k. J. Am. Chem. Soc. 115, 9832–9833 (1993).

[41] Lee, A., Sharp, K., Kranz, J., Song, X., and Wand, A. Temperature Dependence of the Internal Dynamics of a Calmodulin-Peptide Complex. Biochemistry 41, 13814–13825 (2002).

[42] Mittermaier, A. and Kay, L. E. New tools provide new insights in NMR studies of protein dynamics. Science 14, 224–228 (2006).

[43] Stryer, L. and Haugland, R. Energy transfer: a spectroscopic ruler. Ann. Rev. Biochem 47, 819–46 (1978).

[44] Yang, H., Luo, G., Karnchanaphanurach, P., Louie, T., Rech, I., Cova, S., Xun, L., and Xie, X. Protein conformational dynamics probed by single-molecule electron transfer. Science 302, 262 (2003).

[45] Myong, S., Stevens, B., and Ha, T. Bridging conformational dynamics and function using single-molecule spectroscopy. Structure 14, 633–643 (2006).

[46] Brooks, B. and Karplus, M. Normal modes for specific motions of macromolecules: application to the hinge-bending mode of lysozyme. Proc. Natl. Acad. Sci. USA 82, 4995 (1985).

[47] Levitt, M., Sander, C., and Stern, P. Protein normal-mode dynamics: trypsin inhibitor, crambin, ribonuclease and lysozyme. J. Mol. Biol. 181, 423–447 (1985).

[48] Haliloglu, T., Bahar, I., and Erman, B. Gaussian dynamics of folded proteins. Phys. Rev. Lett. 79(16), 3090–3093 (1997).

[49] Jacobs, D., Rader, A., Kuhn, L., and Thorpe, M. Protein flexibility predictions using graph theory. Proteins Struct. Funct. Genet. 44, 150–165 (2001).

[50] Scheraga, H., Khalili, M., and Liwo, A. Protein-folding dynamics: overview of molecular simulation techniques. Annu. Rev. Phys. Chem. 58, 57–83 (2007).

[51] Barton, N. P., Verma, C. S., and Caves, L. S. D. Inherent flexibility of calmodulin domains: A normal-mode analysis study. J. Phys. Chem. B 106, 11036–11040 (2002).

[52] Yamniuk, A. P. and Vogel, H. J. flexibility allows for promiscuity in its interactions with target proteins and peptides. Mol. Biotechnol. 27, 33–57 (2004).

[53] Ikura, M. and Ames, J. Genetic polymorphism and protein conformational plasticity in the calmodulin superfamily: two ways to promote multifunctionality. Proc. Natl. Acad. Sci. USA 103, 1159 (2006).

[54] Crivici, A. and Ikura, M. Molecular and structural basis of target recognition by calmodulin. Annu. Rev. Biophys. Biomol. Struct. 24, 85–116 (1995).

[55] James, P., Vorherr, T., and Carafoli, E. Calmodulin-binding domains: just two faced or multi-faceted? Trends Biochem. Sci. 20, 38–42 (1995).

[56] Kretsinger, R. and Wasserman, R. Structure and evolution of calcium-modulated proteins. CRC Crit. Rev. in Biochem. 8, 119–174 (1980).

[57] Seamon, K. and Kretsinger, R. . Metal Ions in Biology VI, John Wiley, New York, (1983).

[58] Zhang, M. and Yuan, T. Molecular mechanisms of calmodulin’s functional versatility. Biochem. Cell Biol. 76, 313–323 (1998).

[59] Babu, Y. S., Sack, S., Greenhough, T. J., Bugg, C. E., Means, A. R., and Cook, W. J. Three-dimensional structure of calmodulin. Nature 315, 37–40 (1985).

[60] Meador, W. E., Means, A. R., and Quiocho, F. A. Modulation of calmodulin plasticity in molecular recognition on the basis of X-ray structures. Science 262, 1718–1721 (1993).

[61] Kuboniwa, H., Tjandra, N., Grzesiek, S., Ren, H., Klee, C. B., and Bax, A. Solution structure of calcium-free calmodulin. Nat. Struct. Biol. 2, 768–776 (1995).

[62] Chou, J. J., Li, S., Klee, C. B., and Bax, A. Solution structure of Ca2+-calmodulin reveals flexible hand-like properties of its domains. Nat. Struct. Biol. 8(11), 990–996 (2001).

[63] Nelson, M. and Chazin, W. Structures of EF-hand Ca2+-binding proteins: diversity in the organization, packing and response to Ca2+ binding. Biometals 11, 297–318 (1998).

[64] Cohen, P. The origins of protein phosphorylation. Nature Cell Biol. 4, 127–130 (2002).

[65] Alessi, D. and Zeqiraj, E. The governor: Protein phosphorylation. Biochemist 29, 20 (2007).

[66] Johnson, L. and Barford, D. The effects of phosphorylation on the structure and function of proteins. Annu. Rev. Biophys. Biomol. Struct. 22, 199–232 (1993).

[67] Stock, A., Robinson, V., and Goudreau, P. Two-component signal transduction. Annu. Rev. of Biochem. 69, 183–215 (2000).

[68] Kern, D., Volkman, B., Luginbühl, P., Nohaile, M., Kustu, S., and Wemmer, D. Structure of a transiently phosphorylated switch in bacterial signal transduction. Nature 402, 894–898 (1999).

[69] Porter, S., North, A., and Kustu, S. Mechanism of transcriptional activation by NtrC. In: Hoch J A, Silhavy T J, editors., Two-component signal transduction. ASM Press, Washington, DC, (1995).

[70] Volkman, B., Nohaile, M., Amy, N., Kustu, S., and Wemmer, D. Three-dimensional solution structure of the N-terminal receiver domain of NTRC. Biochemistry 34, 1413–1424 (1995).

[71] Hastings, C., Lee, S., Cho, H., Yan, D., Kustu, S., and Wemmer, D. High-Resolution Solution Structure of the Beryllofluoride-Activated NtrC Receiver Domain. Biochemistry 42, 9081–9090 (2003).

[72] Bourret, R. Receiver domain structure and function in response regulator proteins. Curr. Opin. Microbiol. 13 (2010).

[73] Grabarek, Z. Structural basis for diversity of the EF-hand calcium-binding proteins. J. Mol. Biol. 359, 509–525 (2006).

[74] McCammon, J., Gelin, B., and Karplus, M. Dynamics of folded proteins. Nature 267, 585–590 (1977).

[75] Karplus, M. and McCammon, J. Molecular dynamics simulations of biomolecules. Nat. Struct. Biol. 9, 646–652 (2002).

[76] Levitt, M. and Warshel, A. Computer simulation of protein folding. Nature 253, 694–698 (1975).

[77] Warshel, A. and Levitt, M. Theoretical studies of enzymic reactions: dielectric, electrostatic and steric stabilization of the carbonium ion in the reaction of lysozyme. J. Mol. Biol. 103, 227–249 (1976).

[78] Tozzini, V. Coarse-grained models for proteins. Curr. Opin. Struct. Biol. 15, 144–150 (2005).

[79] Cramer, C. and Truhlar, D. Implicit solvation models: equilibria, structure, spectra, and dynamics. Chem. Rev 99, 2161–2200 (1999).

[80] Gerstein, M. and Krebs, W. A database of macromolecular motions. Nucl. Acids Res. 26, 4280 (1998).

[81] Krebs, W. G. and Gerstein, M. The morph server: a standardized system for analyzing and visualizing macromolecular motions in a database framework. Nucl. Acids Res. 28, 1665–1675 (2000).

[82] Kim, M. K., Jernigan, R. L., and Chirikjian, G. S. Efficient generation of feasible pathways for protein conformational transitions. Biophys. J. 83, 1620–1630 (2002).

[83] Go, N., Noguti, T., and Nishikawa, T. Dynamics of a small globular protein in terms of low-frequency vibrational modes. Proc. Natl. Acad. Sci. USA 80, 3696–3700 (1983).

[84] Brooks, B. and Karplus, M. Harmonic dynamics of proteins: normal modes and fluctuations in bovine pancreatic trypsin inhibitor. Proc. Natl. Acad. Sci. USA 80, 6571–6575 (1983).

[85] Miyashita, O. and Tama, F. Coarse-graining of condensed phase and biomolecular systems. CRC, (2008). Voth, G. A. (Ed.), Chpt. 18.

[86] Tirion, M. M. Large amplitude elastic motions in proteins from a single-parameter, atomic analysis. Phys. Rev. Lett. 77, 1905–1908 (1996).

[87] Bahar, I., Atilgan, A. R., and Erman, B. Direct evaluation of thermal fluctuations in proteins using a single parameter harmonic potential. Folding and Design 2, 173 (1997).

[88] Flory, P. J. Statistical thermodynamics of random networks. Proc. R. Soc. London 351, 351–378 (1976).

[89] Bahar, I., Lezon, T., Yang, L., and Eyal, E. Global Dynamics of Proteins: Bridging Between Structure and Function. Annu. Rev. Biophys. 39, 23–42 (2010).

[90] Tama, F., Valle, M., Frank, J., and Brooks, C. Dynamic reorganization of the functionally active ribosome explored by normal mode analysis and cryo-electron microscopy. Proc. Natl. Acad. Sci. USA 100, 9319–9323 (2003).

[91] Tama, F. and Brooks, C. L. The mechanism and pathway of pH induced swelling in chlorotic mottle virus. J. Mol. Biol. 318, 733–747 (2002).

[92] Ming, D., Kong, Y., Wakil, S., Brink, J., and Ma, J. Domain movements in human fatty acid synthase by quantized elastic deformational model. Proc. Natl. Acad. Sci. USA 99, 7895–7899 (2002).

[93] Cui, Q. and Bahar, I. Normal mode analysis: theory and applications to biological and chemical systems. CRC Press, (2006).

[94] Miyashita, O., Wolynes, P. G., and Onuchic, J. N. Simple energy landscape model for the kinetics of functional transitions in proteins. J. Phys. Chem. B 109, 1959–1969 (2005).

[95] Maragakis, P. and Karplus, M. Large amplitude conformational change in proteins explored with a plastic network model: Adenylate kinase. J. Mol. Biol. 352, 807–822 (2005).

[96] Brooks, B., Bruccoleri, R., Olafson, B., et al. CHARMM: A program for macromolecular energy, minimization, and dynamics calculations. Journal of Computational Chemistry 4, 187–217 (1983).

[97] Chu, J. and Voth, G. Coarse-grained free energy functions for studying protein conformational changes: a double-well network model. Biophys. J. 93, 3860–3871 (2007).

[98] Clementi, C., Nymeyer, H., and Onuchic, J. N. Topological and energetic factors: What determines the structural details of the transition state ensemble and “en-route” intermediates for protein folding? An investigation for small globular proteins. J. Mol. Biol. 298, 937–953 (2000).

[99] Karanicolas, J. and Brooks III, C. The origins of asymmetry in the folding transition states of protein L and protein G. Protein Sci. 11, 2351–2361 (2002).

[100] Best, R. B., Chen, Y. G., and Hummer, G. Slow protein conformational dynamics from multiple experimental structures: The helix/sheet transition of arc repressor. Structure 13, 1755–1763 (2005).

[101] Zheng, W., Brooks, B., and Hummer, G. Protein conformational transitions explored by mixed elastic network models. Proteins Struct. Funct. Bioinformatics 69, 43–57 (2007).

[102] Rabl, C., Martin, S., Neumann, E., and Bayley, P. Temperature jump kinetic study of the stability of apo-calmodulin. Biophys. Chem. 101, 553–564 (2002).

[103] Lundstrom, P., Mulder, F. A. A., and Akke, M. Correlated dynamics of consecutive residues reveal transient and cooperative unfolding of secondary structure in proteins. Proc. Natl. Acad. Sci. USA 102, 16984–16989 (2005).

[104] Portman, J. J., Takada, S., and Wolynes, P. G. Microscopic theory of protein folding rates. II: Local reaction coordinates and chain dynamics. J. Chem. Phys. 114(11), 5082–5096 (2001).

[105] Shen, T., Hofmann, C. P., Oliveberg, M., and Wolynes, P. G. Scanning malleable transition state ensembles: Comparing theory and experiment for folding protein U1A. Biochemistry 44, 6433–6439 (2005).

[106] Zong, C., Wilson, C., Shen, T., Wolynes, P., and Wittung-Stafshede, P. Φ-Value analysis of apo-azurin folding: Comparison between Experiment and Theory. Biochemistry 45, 6458–6466 (2006).

[107] Qi, X. and Portman, J. J. Excluded volume, local structural cooperativity, and the polymer physics of protein folding rates. Proc. Natl. Acad. Sci. USA 104, 10841–10846 (2007).

[108] Qi, X. and Portman, J. Capillarity-like growth of protein folding nuclei. Proc. Natl. Acad. Sci. USA 105, 11164 (2008).

[109] Bryngelson, J. D., Onuchic, J. N., Socci, N. D., and Wolynes, P. G. Funnels, pathways and the energy landscape of protein folding: a synthesis. Proteins Struct. Funct. Genet. 21, 167–195 (1995).

[110] Miyazawa, S. and Jernigan, R. L. Residue-residue potentials with a favorable contact pair term and an unfavorable high packing density term, for simulation and threading. J. Mol. Biol. 256, 623–644 (1996).

[111] Lu, Q. and Wang, J. Single molecule conformational dynamics of adenylate kinase: energy landscape, structural correlations, and transition state ensembles. J. Am. Chem. Soc. 130, 4772–4783 (2008).

[112] Wales, D. J. Rearrangements of 55-atom Lennard-Jones and (C60)55 clusters. J. Chem. Phys. 101(5), 3750–3762 (1994).

[113] Kabsch, W. and Sander, C. xxx. Biopolymers 22, 2577–2637 (1983).

[114] Tripathi, S. and Portman, J. Inherent flexibility and protein function: The open/closed conformational transition in the N-terminal domain of calmodulin. J. Chem. Phys. 128, 205104 (2008).

[115] Nelson, M. R. and Chazin, W. J. An interaction-based analysis of calcium-induced conformational changes in Ca(2+) sensor proteins. Protein Sci. 7, 270–282 (1998).

[116] Baber, J. L., Szabo, A., and Tjandra, N. Analysis of slow interdomain motion of macromolecules using NMR relaxation data. J. Am. Chem. Soc. 123, 3953–3959 (2001).

[117] Malmendal, A., Evenas, J., Forsen, S., and Akke, M. Structural dynamics in the C-terminal domain of calmodulin at low calcium levels. J. Mol. Biol. 293(4), 883–899 (1999).

[118] Ishima, R. and Torchia, D. A. Protein dynamics from NMR. Nat. Struct. Biol. 7, 740–743 (2000).

[119] Evenas, J., Forsen, S., Malmendal, A., and Akke, M. Backbone dynamics and energetics of a calmodulin domain mutant exchanging between closed and open conformations. J. Mol. Biol. 289, 603–617 (1999).

[120] Grabarek, Z. Structure of a trapped intermediate of calmodulin: Calcium regulation of EF-hand proteins from a new perspective. J. Mol. Biol. 346, 1351–1366 (2005).

[121] Zhang, M., Tanaka, T., and Ikura, M. Calcium-induced conformational transition revealed by the solution structure of apo calmodulin. Nat. Struct. Biol. 2, 758–767 (1995).

[122] Wriggers, W., Mehler, E., Pitici, F., Weinstein, H., and Schulten, K. Structure and dynamics of calmodulin in solution. Biophys. J. 74, 1622–1639 (1998).

[123] Vigil, D., Gallagher, S. C., Trewhella, J., and Garcia, A. E. Functional dynamics of the hydrophobic cleft in the N-domain of calmodulin. Biophys. J. 80, 2082–2092 (2001).

[124] Zuckerman, D. M. Simulation of an ensemble of conformational transitions in a united-residue model of calmodulin. J. Phys. Chem. B 108, 5127–5137 (2004).

[125] Chen, Y.-G. and Hummer, G. Slow conformational dynamics and unfolding of the calmodulin C-terminal domain. J. Am. Chem. Soc. 129(9), 2414–2415 (2007).

[126] Russell, R. B. and Barton, G. J. Multiple protein sequence alignment from tertiary structure comparison: Assignment of global and residue confidence levels. Proteins Struct. Funct. Genet. 14, 309–323 (1992).

[127] Likic, V. A., Strehler, E. E., and Gooley, P. R. Dynamics of Ca2+-saturated calmodulin D129N mutant studied by multiple molecular dynamics simulations. Protein Sci. 12, 2215–2229 (2003).

[128] Strynadka, N. C. J. and James, M. N. Crystal structures of the helix-loop-helix calcium-binding proteins. Annu. Rev. Biochem. 58, 951–998 (1989).

[129] Siivari, K., Zhang, M., Palmer III, A. G., and Vogel, H. J. NMR studies of the methionine methyl groups in calmodulin. FEBS Lett. 366, 104–108 (1995).

[130] Zong, C., Wilson, C. J., Shen, T., Wittung-Stafshede, P., Mayo, S. L., and Wolynes, P. G. Establishing the entatic state in folding metallated Pseudomonas aeruginosa azurin. Proc. Natl. Acad. Sci. USA 104, 3159–3164 (2007).

[131] Shoemaker, B. A., Portman, J. J., and Wolynes, P. G. Speeding molecular recognition by using the folding funnel: the fly-casting mechanism. Proc. Natl. Acad. Sci. USA 97(16), 8868–8873 (2000).

[132] Weber, G. Ligand binding and internal equilibiums in proteins. Biochemistry 11, 864–878 (1972).

[133] Tripathi, S. and Portman, J. Inherent flexibility determines the transition mechanisms of the EF-hands of calmodulin. Proc. Natl. Acad. Sci. USA 106, 2104–2109 (2009).

[134] Onuchic, J. and Wolynes, P. Theory of protein folding. Curr. Opin. Struct. Biol. 14, 70–75 (2004).

[135] Tsalkova, T. N. and Privalov, P. L. Thermodynamic study of domain organization in troponin C and calmodulin. J. Mol. Biol. 181, 533–544 (1985).

[136] Linse, S., Helmersson, A., and Forsen, S. Calcium binding to calmodulin and its globular domains. J. Biol. Chem. 266, 8050–8054 (1991).

[137] Whitford, P. C., Miyashita, O., Levy, Y., and Onuchic, J. N. Conformational transitions of adenylate kinase: Switching by cracking. J. Mol. Biol. 366, 1661–1671 (2007).

[138] Barbato, G., Ikura, M., Kay, L. E., Pastor, R. W., and Bax, A. Backbone dynamics of calmodulin studied by 15N relaxation using inverse detected two-dimensional NMR spectroscopy: The central helix is flexible. Biochemistry 31, 5269–5278 (1992).

[139] van der Spoel, D., de Groot, B. L., Hayward, S., Berendsen, H. J., and Vogel, H. J. Bending of the calmodulin central helix: A theoretical study. Protein Sci. 5, 2044–2053 (1996).

[140] Tjandra, N., Kuboniwa, H., Ren, H., and Bax, A. Rotational dynamics of calcium-free calmodulin studied by 15N NMR relaxation measurements. Eur. J. Biochem. 230, 1014–1024 (1995).

[141] Lakowski, T., Lee, G., Lelj-Garolla, B., Okon, M., Reid, R., and McIntosh, L. Peptide Binding by a Fragment of Calmodulin Composed of EF-Hands 2 and 3. Biochemistry 46, 8525–8536 (2007).

[142] Chattopadhyaya, R., Meador, W., Means, A., and Quiocho, F. Calmodulin structure refined at 1.7 Å resolution. J. Mol. Biol. 228, 1177–1192 (1992).

[143] Bayley, P. M., Findlay, W. A., and Martin, S. R. Target recognition by calmodulin: Dissecting the kinetics and affinity of interaction using short peptide sequences. Protein Sci. 5, 1215–1228 (1996).

[144] Barth, A., Martin, S. R., and Bayley, P. M. Specificity and symmetry in the interaction of calmodulin domains with the myosin light chain kinase target sequence. J. Biol. Chem. 273, 2174–2183 (1998).

[145] Masino, L., Martin, S. R., and Bayley, P. M. Ligand binding and thermodynamic stability of a multidomain protein, calmodulin. Protein Sci. 9, 1519–1529 (2000).

[146] Evenas, J., Malmendal, A., and Akke, M. Dynamics of the transition between open and closed conformations in a calmodulin C-terminal domain mutant. Structure 9, 185–195 (2001).

[147] Nakayama, S., Moncrief, N. D., and Kretsinger, R. H. Evolution of EF-hand calcium-modulated proteins. II. Domains of several subfamilies have diverse evolutionary histories. J. Mol. Evol. 34, 416–448 (1992).

[148] Lakowski, T. M., Lee, G. M., Okon, M., Reid, R. E., and McIntosh, L. P. Calcium-induced folding of a fragment of calmodulin composed of EF-hands 2 and 3. Protein Sci. 16, 1119–1132 (2007).

[149] Whitford, P., Onuchic, J., and Wolynes, P. Energy landscape along an enzymatic reaction trajectory: hinges or cracks? HFSP J. 2, 61–64 (2008).

[150] De Carlo, S., Chen, B., Hoover, T., Kondrashkina, E., Nogales, E., and Nixon, B. The structural basis for regulated assembly and function of the transcriptional activator NtrC. Genes Dev. 20, 1485–1495 (2006).

[151] Hu, X. and Wang, Y. Molecular dynamic simulations of the N-terminal receiver domain of NtrC reveal intrinsic conformational flexibility in the inactive state. J. Biomol. Struct. Dyn. 23, 509–518 (2006).

[152] Hills Jr, R. and Brooks III, C. Insights from Coarse-Grained Gō Models for Protein Folding and Dynamics. Int. J. Mol. Sci. 10, 889–905 (2009).

[153] Hills Jr, R., Kathuria, S., Wallace, L., Day, I., Brooks III, C., and Matthews, C. Topological frustration in βα-repeat proteins: Sequence diversity modulates the conserved folding mechanisms of α/β/α sandwich proteins. J. Mol. Biol. 398, 332–350 (2010).

[154] Itoh, K. and Sasai, M. Multidimensional theory of protein folding. J. Chem. Phys. 130, 145104 (2009).

[155] Latzer, J., Shen, T., and Wolynes, P. Conformational switching upon phosphorylation: a predictive framework based on energy landscape principles. Biochemistry 47, 2110–2122 (2008).

[156] Khalili, M. and Wales, D. Pathways for conformational change in nitrogen regulatory protein C from discrete path sampling. J. Phys. Chem. B 112, 2456–2465 (2008).

[157] Liu, M., Todd, B., Yao, S., Feng, Z., Norton, R., and Sadus, R. Coarse-grained dynamics of the receiver domain of NtrC: Fluctuations, correlations and implications for allosteric cooperativity. Proteins Struct. Funct. Bioinformatics 73, 218–227 (2008).

[158] Damjanović, A. et al. Self-guided Langevin dynamics study of regulatory interactions in NtrC. Proteins Struct. Funct. Bioinformatics 76, 1007–1019 (2009).

[159] Lei, M., Velos, J., Gardino, A., Kivenson, A., Karplus, M., and Kern, D. Segmented transition pathway of the signaling protein nitrogen regulatory protein C. J. Mol. Biol. 392, 823–836 (2009).

[160] Gardino, A., Villali, J., Kivenson, A., Lei, M., Liu, C., Steindel, P., Eisenmesser, E., Labeikovsky, W., Wolf-Watz, M., Clarkson, M., and Kern, D. Transient non-native hydrogen bonds promote activation of a signaling protein. Cell 139, 1109–1118 (2009).

[161] Yan, D., Cho, H., Hastings, C., Igo, M., Lee, S., Pelton, J., Stewart, V., Wemmer, D., and Kustu, S. Beryllofluoride mimics phosphorylation of NtrC and other bacterial response regulators. Proc. Natl. Acad. Sci. USA 96, 14789–14794 (1999).

[162] Maiti, R., Van Domselaar, G., Zhang, H., and Wishart, D. SuperPose: a simple server for sophisticated structural superposition. Nucl. Acids Res. 32, W590–W594 (2004).

[163] Gardino, A. and Kern, D. Functional dynamics of response regulators using NMR relaxation techniques. Methods Enzymol. 423, 149–165 (2007).

[164] E, W., Ren, W., and Vanden-Eijnden, E. Finite Temperature String Method for the Study of Rare Events. J. Phys. Chem. B 109, 6688–6693 (2005).

[165] Pan, A., Sezer, D., and Roux, B. Finding transition pathways using the string method with swarms of trajectories. J. Phys. Chem. B 112, 3432–3440 (2008).

[166] Vanden-Eijnden, E. and Venturoli, M. Markovian milestoning with Voronoi tessellations. J. Chem. Phys. 130, 194101 (2009).

[167] Dill, K. A. and Chan, H. S. From Levinthal to pathways to funnels. Nat. Struct. Biol. 4(1), 10–19 (1997).

[168] Bixon, M. and Zwanzig, R. Optimized Rouse-Zimm theory for stiff polymer chains. J. Chem. Phys. 68(4), 1896–1902 (1978).