COOPERATIVE ALLOSTERIC LIGAND BINDING IN CALMODULIN

A dissertation submitted to Kent State University in partial fulfillment of the requirements for the degree of Doctor of Philosophy

by

Prithviraj Nandigrami

December, 2017

© Copyright

All rights reserved

Except for previously published materials

Dissertation written by

Prithviraj Nandigrami

B.Sc., University of Calcutta, 2006

M.Sc., Indian Institute of Technology Bombay, 2008

M.S., Brandeis University, 2010

Ph.D., Kent State University, 2017

Approved by

Dr. John J. Portman, Chair, Doctoral Dissertation Committee

Dr. Hamza Balci, Dr. Björn Lüssem, Dr. Robin Selinger, Dr. Qi-Huo Wei, Members, Doctoral Dissertation Committee

Accepted by

Dr. James T. Gleeson, Chair, Department of Physics

Dr. James L. Blank, Dean, College of Arts and Sciences

Table of Contents

List of Figures ...... vi

List of Tables ...... xviii

Acknowledgments ...... xix

1 Introduction ...... 1

1.1 Overview ...... 1

1.2 Protein Structure ...... 2

1.3 Energy Landscape Theory ...... 4

1.4 Mechanism of Allostery in ...... 9

1.5 Calmodulin ...... 12

1.6 Organization of Dissertation ...... 15

2 Models of cooperative ligand binding ...... 18

2.1 Introduction ...... 18

2.2 The Hill Equation ...... 19

2.3 for binding two ligands ...... 21

2.4 Other models of cooperativity ...... 25

2.5 The MWC Model ...... 27

2.6 Generalized ensemble view of allostery ...... 31

2.7 Allostery in calmodulin ...... 33

3 Coarse-grained models for conformational dynamics in proteins . 35

3.1 Introduction ...... 35

3.2 Native structure-based models for proteins ...... 36

3.3 An analytic model for CaM ...... 39

3.4 Simulation model for CaM ...... 42

3.4.1 Model for conformational transition ...... 44

3.4.2 Model for Ca2+-binding ...... 45

4 Comparing allosteric transitions in the domains of calmodulin

through coarse-grained simulations ...... 49

Abstract ...... 49

4.1 Introduction ...... 50

4.2 Methods ...... 53

4.3 Conformational Transitions of Isolated Domains ...... 60

4.4 Transition Kinetics ...... 67

4.5 Discussion ...... 68

5 Coarse-grained molecular simulations of allosteric cooperativity . 71

Abstract ...... 71

5.1 Introduction ...... 72

5.2 Methods ...... 74

5.3 Simulations of Binding a Single Ligand ...... 79

5.4 Simulations of Binding Two Ligands ...... 84

5.5 Binding Cooperativity ...... 86

5.6 Molecular Description of Ligand Binding ...... 90

5.7 Concluding Remarks ...... 93

6 Thermodynamic and kinetic representations of cooperative allosteric

ligand binding in intact calmodulin ...... 95

6.1 Introduction ...... 95

6.2 Methods ...... 97

6.3 Binding Thermodynamics ...... 103

6.4 Binding Kinetics ...... 110

6.5 Concluding Remarks ...... 117

7 Outlook and future directions ...... 119

Appendices ...... 124

A Supplement for Chapter 4 ...... 125

A.1 Simulated probability of contact formation ...... 125

B Supplement for Chapter 5 ...... 127

B.1 Ligand-mediated contact pair distribution ...... 127

B.2 One-dimensional simulated free energy ...... 131

B.3 Exploring ligand contact strength and range ...... 132

List of Figures

1.2.1 Different levels of hierarchy in protein structure. (A) A protein’s pri-

mary structure consists of its amino acid sequence. (B) Secondary

structure consists of organization of helices and sheets (only helix is

shown in the figure). (C) Tertiary structure is illustrated by one of the

four polypeptide chains (subunits) of hemoglobin. Here, N-terminal

represents amino terminus and C-terminal represents carboxyl termi-

nus of the polypeptide chain. (D) Quaternary structure is shown as

the arrangement of multiple polypeptide chains to form the functional

hemoglobin molecule. Adapted from Branden and Tooze (1991)[20]. . 3

1.3.1 Funnel-shaped protein folding landscape. Folding occurs through the

progressive organization of ensembles of structures on a free energy

landscape...... 6

1.3.2 Schematic representation of protein functional motion at the bottom

of the folding funnel. At the bottom of the funnel, protein dynamics

is sensitive to modulation by binding of ligand. The resulting energy

landscape upon binding of a ligand involves redistribution of popula-

tion. Also shown is the stabilization of a low energy conformer upon

binding a ligand...... 8

1.4.1 Schematic representations of macroscopic and microscopic allosteric

binding. (A) In the macroscopic point of view, binding of a ligand sta-

bilizes the ligand-bound ensemble of conformations. (B) Microscopic

picture is more complex with rates determined by the relative stabi-

lization of conformations in the unbound and bound ensembles. . . . 11

1.5.1 Structure of calmodulin in different forms. (A) Ca2+-free structure of

calmodulin (PDB id: 1CFD), and (B) Ca2+-bound structure of calmod-

ulin (PDB id: 1CLL). Upon binding 4 Ca2+-ions to its binding loops,

calmodulin undergoes a large structural change that exposes its hydrophobic

surface, thereby enabling calmodulin to bind to target

proteins. (C) NMR structure of calmodulin bound to smooth muscle

myosin light chain kinase (PDB id: 2KOF). The calcium ions are shown

as silver circles. This visualization was made using Visual Molecular

Dynamics (VMD) software[85]...... 14

2.2.1 Schematic representation of Hill equation given by Eq. 2.1 for different

values of the Hill coefficient, nH. The sharpness of the curve increases

for higher values of nH. The x-axis represents ligand concentration,

[X], and y-axis shows fractional occupancy, Y ...... 20

2.3.1 Thermodynamic cycle showing microscopic schematic representation

for binding two ligands of heterogeneous strength. Starting from unli-

gated state ligand binding stabilizes the state with both sites occupied

by ligand. The two routes via which the binding process proceeds

consist of partially bound states where a single site is occupied by a

ligand, while the other site is empty. The final state in the cycle is the

state with both binding sites occupied by ligand...... 22

2.3.2 Dependence of bound probability on the parameter c for a protein with

two binding sites of homogeneous strength. The x-axis represents lig-

and concentration and y-axis represents probability of states with both

sites occupied by ligand (left), and probability of states where only one

site is occupied by ligand (right). Non-cooperative binding corresponds

to c = 1. The binding curve becomes increasingly cooperative and the

peak of population of singly ligated states decreases for higher values

of c...... 23

2.5.1 Schematic representation of shift in population in MWC model. In the

unbound ensemble of conformations the ligand-free closed state has

higher stability in the free energy landscape and it exists in dynamic

equilibrium with the open state. Upon ligand binding, the ligand-

bound open state is stabilized. The x-axis represents reaction coordi-

nate that is used to define the closed and open states of the protein,

and the y-axis represents free energy (in arbitrary units). Also shown

are representative protein structures in the unbound and ligand-bound

ensemble of conformations...... 28

2.5.2 States and corresponding statistical weights for a simplified description

of MWC model of allostery for a protein with two ligand binding sites.

In the unligated ensemble of conformations, the relative stability of the

open and closed states is set by the parameter . The singly-ligated

ensemble of conformations consists of a ligand bound to either the

closed state or the open state. The fully-loaded ensemble consists of

both ligands bound to either the closed state or the open state of the

protein...... 30

2.6.1 Schematic of ensemble description of models of allosteric cooperativity

for a protein with two subunits. (A) The MWC model of allostery, (B)

KNF model, (C) A more generalized ensemble allosteric model that ac-

commodates all possible microstates of a protein with two binding sites.

Green shaded regions correspond to subunit interaction energy. The

two different shapes correspond to the closed and open conformations

of the protein. Colored shapes correspond to ligand unbound and un-

colored shapes correspond to ligand bound conformations. Blue shaded

regions show the ensemble of states for each framework. Adapted from

the framework developed by Hilser and co-workers[81]...... 32

3.3.1 Distribution of strain energy of residues in CaM domain. (a) and

(b) Change in strain energy for individual residues along the apo → holo structural change of nCaM and cCaM. (c) and (d) Residue strain

energy distributions at an intermediate state for nCaM and cCaM,

respectively. Adapted from the work by Tripathi and Portman[198]. . 43

4.2.1 Aligned structures of the Ca2+-free (closed/apo) and Ca2+-bound (open/holo)

native conformations for (a) N-terminal domain and (b) C-terminal do-

main of Calmodulin. The closed state (pdb: 1cfd [103] ) is shown in

blue, and the open state (pdb: 1cll[34]) is shown in green. The closed

(apo) and open (holo) conformations of (a) nCaM (residue index 4–75)

consist of helices A, B and C, D with binding loops I and II respectively.

The closed (apo) and open (holo) conformations of (b) cCaM (residue

index 76–147) consist of helices E, F and G, H with binding loops III

and IV respectively. Secondary structure legend for nCaM and cCaM

are shown on top of the protein structures. The CaM structures were

made using visual molecular dynamics[85]...... 54

4.2.2 Simulated free energy (in units of kBT ) as a function of the global

progress coordinate $\Delta Q = Q_{\mathrm{closed}} - Q_{\mathrm{open}}$ for (a) nCaM and (b) cCaM. ...... 57

4.2.3 Heat capacity as a function of temperature for cCaM (red) and nCaM

(blue) for two relative stabilities of the open and closed basins. The

solid curves correspond to equally stable open and closed basins, and

in the dashed curves the open state occupies approximately 10% of the

total population...... 59

4.3.1 Free energy, in units of kBT , projected onto global order parameters

Qopen, Qclosed, and Q∩ for nCaM (a and c) and cCaM (b and d). The

intermediate in free energy surface of cCaM corresponds to an ensemble

of states with intact secondary structure but lacking stable tertiary

contacts...... 61

4.3.2 Local order parameter, q∩(i), plotted as a function of the global progress

coordinate, $Q_{\mathrm{closed}} - Q_{\mathrm{open}}$, for each residue of (a) nCaM and (b) cCaM. The color represents the probability of each residue forming native

contacts common to both the open and closed structures: low probability

is shown by red and high probability is shown by blue...... 63

4.3.3 Magnitude of the root mean square fluctuations for each residue for the

conformational ensembles along the transition pathway for (a) nCaM

and (b) cCaM. Each color corresponds to the value of $\Delta Q = Q_{\mathrm{closed}} - Q_{\mathrm{open}}$

indicated in the legend in (a)...... 64

4.3.4 Free energy, in units of kBT , projected onto global order parameters

Qclosed and Qopen for nCaM (a and c) and cCaM (b and d) with temperature

$T_{\mathrm{sim}} = 0.89\,T_F^{\star}$ and $T_{\mathrm{sim}} = 1.08\,T_F^{\star}$. At lower temperatures, the unfolded conformations are destabilized so that the transition mechanism

in both domains becomes more two-state. At higher temperatures, the

unfolded states are stabilized for both nCaM and cCaM...... 66

5.3.1 Simulated binding curves for the individual loops of (A) nCaM and (B)

cCaM. Lines are fits to the two state MWC model given by Eq. 5.6. (C)

Simulated mean number of bound ligands occupancy of binding sites

with two ligands for nCaM (blue) and cCaM (red) as a function of

ligand concentration. The solid lines plot $\langle n_b(\mu) \rangle = p_b^{A0}(\mu) + p_b^{0B}(\mu) + 2p_b^{AB}(\mu)$ with probabilities given from the MWC model evaluated with the binding parameters found from fits of binding to each individual

loop...... 80

5.3.2 Simulated free energy as a function of Qclosed and Qopen for binding

loops at ligand concentration c = Kd for nCaM (A,B) and cCaM (C,D).

The set of native contacts in the open and closed conformations is

separated into three groups: those that occur exclusively in either the

open or the closed native structures, and those that are common to

both states. For each of these groups, denoted by α = (open, closed, and ∩),

a local order parameter, $q_\alpha(i)$, is defined as the fraction of native

contacts formed involving the $i$th residue. Overall native similarity is

monitored by a corresponding global order parameter,

$Q_\alpha = \langle q_\alpha(i) \rangle$, where the average is taken over the

residues of the protein. The open state ensemble consists of conformations

with $0.18 \le Q_{\mathrm{closed}} \le 0.35$ and $0.55 \le Q_{\mathrm{open}} \le 0.75$. ...... 81

5.3.3 Simultaneous fits of simulation data for a single ligand to pb(µ) and

po(µ) for individual binding loops. Solid curves are plots of pb(µ) and

po(µ) with c and o determined by a simultaneous fit to the simulation

data (shown as points)...... 82

5.4.1 Populations of ligation states $p_b^{A0}(\mu)$ (blue), $p_b^{0B}(\mu)$ (green), and $p_b^{AB}(\mu)$ (red) plotted as a function of Ca2+ concentration for nCaM (top) and

cCaM (bottom). Simulation data shown as points. Solid curves plot

Eq. (5.9–5.11) from the MWC model. Dotted curves show plots of

the non-cooperative induced fit model of binding to independent sites

described by the partition function given in Eq. 5.13. Note some data

points are skipped for clarity...... 87

5.5.1 Thermodynamic cycle for binding two ligands...... 88

5.6.1 Simulated root mean square fluctuations (rmsf) for each residue for

nCaM (top) and cCaM (bottom) calculated at different ligand concen-

trations: high ligand concentration gives the fully saturated ensemble

(blue curve), low ligand concentration gives the unligated ensemble

(red curve), and at Kd (black curve). The rmsf curves are calculated for

each ensemble after aligning to the open native conformation. (Aligning

to the closed conformation gives similar curves.) Also shown are the

reference fluctuations given in Eq. 5.18 (green curve)...... 92

6.2.1 Heat capacity as a function of temperature for N-domain and C-domain

of CaM when the open (holo) state is destabilized to a relative population

of ≈ 10%. Heat capacity is calculated using the WHAM[105] method...... 99

6.2.2 Binding and unbinding rates for loop I of N-domain of CaM as a func-

tion of concentration using the non-symmetric Monte Carlo simulation

scheme. The x-axis and y-axis represent concentration and rate, re-

spectively. Loops II, III, and IV show similar behavior (not shown). . 102

6.2.3 Free energy contours for the N-domain and C-domain in intact CaM.

N-domain shows a dominant two-state transition, while C-domain pop-

ulates an intermediate state showing a three-state transition behavior. 103

6.3.1 Fractional occupancy to the loops of intact domain (red), N-domain

(blue) and C-domain (green). The solid lines represent theoretical

binding curve calculated using the MWC model using the parameters,

c and o, obtained from simulations...... 105

6.3.2 Population of simulated ligation states as a function of concentration.

The points represent simulated data and the solid lines show popu-

lations calculated using the MWC model. Colors represent different

ligation states defined by whether a loop is occupied or empty. . . . . 106

6.3.3 Population of singly, doubly, triply, and fully-loaded ligation states

as a function of concentration for (top) heterogeneous binding loops

and (bottom) homogeneous binding loops. The homogeneous binding

strength is set to be the average values of binding free energies to the

closed and open states of the four binding loops...... 108

6.3.4 Total free energy (in kcal/mol) of simulated ligation states of intact

CaM. The blue lines represent free energy contribution for ligand bind-

ing to individual loops and the red lines show free energy stabilization

due to multi-body cooperative interactions...... 109

6.4.1 Full flux network for binding transitions showing all possible transitions

at c = Kd (top) and at c > Kd (bottom). The ensemble of states are

grouped by their degree of ligation. The vertical axis represents the

total free energy of stabilization. The width of the arrows represent

the amount of flux through a pair of ligation states. Red represents a

higher probability to bind and green represents a higher probability to

unbind...... 114

6.4.2 Simulated unbinding (left) and binding (right) flux networks at c = Kd.

The ligation states are placed according to their total stabilization.

The width of the arrows represent the amount of flux through a pair

of states. For binding transitions, red represents a higher probability

to bind and green represents a higher probability to unbind. For un-

binding transitions, red represents a higher probability to unbind and

green represents a higher probability to bind...... 116

6.4.3 Simulated flux networks for unbinding transitions (left) at c < Kd

and for binding transitions (right) at c > Kd. The ligation states are

placed according to their total stabilization. The width of the arrows

represent the amount of flux through a pair of states. For binding

transitions, red represents a higher probability to bind and green rep-

resents a higher probability to unbind. For unbinding transitions, red

represents a higher probability to unbind and green represents a higher

probability to bind...... 117

A.1.1 Simulated contact map for nCaM and cCaM showing the probability

of formation of contacts between secondary structure elements of

nCaM for the ensemble of conformations in the transition state (A),

and cCaM in the ensemble of conformations in the intermediate state

(B). nCaM shows limited loss of contacts in the transition state en-

semble, while the intermediate of cCaM involves several regions of low

contact probability (highlighted in pink). Color represents the proba-

bility of contact formation. Secondary structures of nCaM and cCaM

are shown along x and y-axis...... 126

B.1.1 Ligand-mediated contact pair distribution in the ensemble of open and

closed states for loop I in nCaM. The x-axis represents contact distance

(in Å) and y-axis represents normalized count. The ligand-mediated

contact pairs are (A) 17 — 27, (B) 18 — 22, (C) 19 — 27, (D) 21 —

27, (E) 22 — 28...... 127

B.1.2 Ligand-mediated contact pair distribution in the ensemble of open

and closed states for loop II in nCaM. The x-axis represents contact

distance (in Å) and y-axis represents normalized count. The ligand-mediated

contact pairs are (A) 54 — 60, (B) 54 — 61, (C) 55 — 60,

(D) 56 — 60, (E) 56 — 61...... 128

B.1.3 Ligand-mediated contact pair distribution in the ensemble of open

and closed states for loop III in cCaM. The x-axis represents contact

distance (in Å) and y-axis represents normalized count. The ligand-mediated

contact pairs are (A) 18 — 27, (B) 18 — 28, (C) 19 — 26,

(D) 19 — 27, (E) 19 — 28, (F) 20 — 27, (G) 20 — 28, (H) 21 — 26. 129

B.1.4 Ligand-mediated contact pair distribution in the ensemble of open

and closed states for loop IV in cCaM. The x-axis represents contact

distance (in Å) and y-axis represents normalized count. The ligand-mediated

contact pairs are (A) 54 — 64, (B) 55 — 59, (C) 56 — 64,

(D) 59 — 65, (E) 60 — 64...... 130

B.2.1 Simulated free energy for nCaM (A,B,C) and cCaM (D,E,F) corresponding

to the ensemble of unligated (top), singly ligated (middle)

and fully saturated (bottom) conformations. The x-axis represents

simulated progress coordinate $\Delta Q = Q_{\mathrm{closed}} - Q_{\mathrm{open}}$ and the y-axis

represents simulated free energy in units of kBT ...... 131

B.3.1 Simulated binding curves for loop I of nCaM with varying clig (left)

and σij (right). Consistent behavior is observed for other binding loops

(data not shown)...... 132

List of Tables

5.3.1 Number of ligand-mediated contacts, dissociation constants, and bind-

ing free energies for the loops of CaM...... 79

5.5.1 Simulated microscopic and macroscopic equilibrium constants . . . . 89

6.3.1 Number of ligand-mediated contacts, and binding free energies for the

loops of CaM...... 104

Acknowledgments

I owe my gratitude to my family members for all their help and support throughout my graduate career. In particular, I would like to mention my mother, Kanika Nandigrami, and father, Debiprasad Nandigrami, for all their encouragement over the years. My sister, Rupa Nandigrami, has also been a constant source of love, support and inspiration. The fulfillment of my graduate career would not have been possible without my wife, Srijani Chatterjee. She has been a vital source of love, care, support, concern, enthusiasm, and strength during the final phase of my graduate career.

I would like to express my deepest gratitude towards my esteemed advisor, Dr. John J. Portman. I came to Kent State University with limited knowledge of the fascinating area of biological physics. Dr. Portman introduced me to this area of research and guided me through the thinking process that goes into setting up a problem. He taught me scientific writing and provided me with many opportunities to explore, critically analyze, and solve a problem. He was readily available to answer questions and provided perceptive and concrete feedback. I am indebted to Dr. Portman for providing me many opportunities to grow as a scientist.

I am indebted to Dr. Robin Selinger for offering the Computational Materials Science course, which was crucial in giving me the confidence to learn programming. I sincerely thank her for being a constant source of motivation and positive thinking. I am grateful to Dr. Selinger for providing me the opportunity to work in her group for a summer semester.

I would like to mention other faculty members with whom I have had the opportunity to interact. I am grateful to Dr. Declan Keane who motivated me to apply for graduate studies in the Physics department at Kent State. He made the whole transfer process from Brandeis University to Kent State University seamless. I am grateful to Dr. Keane for being my academic advisor and for all his helpful advice.

My special gratitude goes to Kelly Conley, Constance Reho, and other office staff in the Physics department. I would like to mention Academic Laboratory Manager, Mr. Greg Putman, for providing a great teaching experience during the semesters in which I was a teaching assistant.

I appreciate the help of my colleagues, Talant Ruzmetov and Daniel Gavazzi, for their time and valuable suggestions.

I would like to acknowledge financial support from the National Science Foundation for part of the research presented in this dissertation.

Finally, despite all the assistance provided by Dr. Portman and others, I alone remain responsible for the content of the material in this dissertation, including any errors or omissions which may unwittingly remain.

Chapter 1

Introduction

1.1 Overview

Protein molecules are involved in nearly all cellular functions that make life possible.

A typical cell contains approximately $10^6$ types of protein molecules, each with a distinct essential function. Digestive enzymes (amylase, lipase, pepsin), for example, break down nutrients in food for absorption; transport proteins, such as hemoglobin, act as carriers of oxygen in the human body; structural proteins compose the cytoskeleton; hormone signaling proteins coordinate the activity of different body systems; antibodies protect the body from foreign pathogens; and proteins, such as myosin, are responsible for muscle contraction.

Many protein molecules are able to perform specific functions through their ability to bind other molecules with high specificity and exquisite control. Protein-ligand interactions provide a ubiquitous strategy that enables nearly all biological processes. Well-designed binding surfaces have evolved to stabilize protein-ligand complexes that are tuned to the specific functional needs of the cell. Thermodynamic stability, characterized by binding equilibrium constants, is one essential property of protein-ligand related functions. Often, protein function requires the ability to switch from a ligand-unbound population to a ligand-bound population (and vice versa) with precise control in response to changes in the environment. One of the most important ways proteins enhance sensitivity is through allosteric cooperativity, which is often accomplished through a conformational change of the protein.
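To illustrate how cooperativity sharpens the response to ligand concentration, the following minimal Python sketch evaluates a Hill-type binding curve. It assumes the standard textbook form Y = [X]^nH / (K^nH + [X]^nH), which may differ in notation from the form used later in this dissertation. The function names and the half-saturation constant `k_half` are hypothetical and introduced only for illustration.

```python
def hill_occupancy(x, k_half, n_hill):
    """Fractional occupancy Y at ligand concentration x.

    Assumed form: Y = x^n / (k_half^n + x^n), with x and k_half in the
    same concentration units and n_hill the Hill coefficient.
    """
    if x < 0 or k_half <= 0 or n_hill <= 0:
        raise ValueError("x must be >= 0; k_half and n_hill must be > 0")
    if x == 0:
        return 0.0
    ratio = (x / k_half) ** n_hill
    return ratio / (1.0 + ratio)


def switching_ratio(n_hill, low=0.1, high=0.9):
    """Fold change in concentration needed to go from `low` to `high` occupancy.

    From Y / (1 - Y) = (x / k_half)^n it follows that
    x = k_half * (Y / (1 - Y))^(1 / n), so the ratio is independent of k_half.
    """
    odds_high = high / (1.0 - high)
    odds_low = low / (1.0 - low)
    return (odds_high / odds_low) ** (1.0 / n_hill)
```

Under these assumptions, moving from 10% to 90% occupancy requires an 81-fold change in ligand concentration for nH = 1 but only a 9-fold change for nH = 2, which is the sense in which cooperative binding acts as a sharper switch.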

X-ray crystallography measurements provide a picture of a protein as a unique, static three-dimensional folded structure. Although a protein’s folded structure is essential to its function, the key to understanding protein-ligand interactions often lies in its conformational dynamics[60, 65]. Conformational flexibility and dynamics can even determine a protein’s functional specificity[104, 71, 203, 18, 227, 217, 81, 137]. The dynamic nature of the ensemble of protein conformations is of particular importance in understanding protein function, activation, and sensitivity to local changes in the environment such as ligand concentration. Understanding and quantifying how a protein’s conformational dynamics determines the thermodynamic and kinetic aspects of ligand binding is a central focus of this Dissertation.

Next, I briefly describe protein structure at the different levels of its structural hierarchy. The structural classification at various levels arises from the complexity of the molecular-level interactions among the structural elements that constitute a protein molecule.

1.2 Protein Structure

Proteins are polymers composed of a sequence of amino acids. Each of the twenty amino acids consists of a central carbon atom, known as the α carbon, which is bonded to an amino group, a carboxyl group, a hydrogen atom, and a side chain that distinguishes one amino acid from another. A protein’s structure can be summarized at different levels of a hierarchy: “primary”, “secondary”, “tertiary”, and “quaternary” structures, as illustrated in Fig. 1.2.1. The primary structure of a protein, its sequence, is the simplest level of protein structure. The secondary structure of a protein refers to the local folded structure that forms within a polypeptide chain. These folded regions typically form due to interactions between monomers, for example through hydrogen bonds. The most commonly occurring secondary structure elements in a protein molecule are the α helix and β sheet, first described by Pauling in Ref. [155]. The tertiary structure of a protein is its three-dimensional structure. The tertiary structure is stabilized through non-covalent interactions, including hydrogen bonding, ionic bonding, dipole-dipole interactions, and hydrophobic forces. The next level of complexity in the hierarchy is the quaternary structure, which involves the particular spatial arrangement of, and interactions between, two or more polypeptide chains. Quaternary structure essentially describes how different polypeptide chains are assembled into complexes. Not all proteins have quaternary structure. For example, proteins that exist as single chains have no quaternary structure. High-resolution protein structures are often essential to rationalizing the detailed molecular mechanisms that control protein function.

[Figure panels: (A) Sequence of amino acids; (B) Helix; (C) Long-range interaction of subunits; (D) Arrangement of multiple polypeptide chains]

Figure 1.2.1: Different levels of hierarchy in protein structure. (A) A protein’s primary structure consists of its amino acid sequence. (B) Secondary structure consists of organization of helices and sheets (only helix is shown in the figure). (C) Tertiary structure is illustrated by one of the four polypeptide chains (subunits) of hemoglobin. Here, N-terminal represents amino terminus and C-terminal represents carboxyl terminus of the polypeptide chain. (D) Quaternary structure is shown as the arrangement of multiple polypeptide chains to form the functional hemoglobin molecule. Adapted from Branden and Tooze (1991)[20].

1.3 Energy Landscape Theory

In the early 1960s, experiments carried out by Anfinsen demonstrated that a protein’s unique three-dimensional structure is determined solely by its specific amino acid sequence[5, 4]. That is, no other cellular apparatus such as chaperones is needed to help a protein fold. This observation is often summarized as “sequence determines structure”. Because a large number of three-dimensional conformations is available to a typical polypeptide chain, it would apparently take an enormous amount of time to explore phase space to find a particular conformation. This incompatibility with biological requirements is called “Levinthal’s paradox”[109], named after the scientist who first articulated the discrepancy. Levinthal estimated that if it takes a picosecond to sample a particular conformation, the time for a protein to fold through a random search would be of the order of the age of the universe.

This suggests that proteins have evolved efficient search methods to find the folded conformation within the huge conformational phase space. One proposed resolution to this search problem is that proteins may fold through a series of specific, well-defined steps. Consequently, researchers tried to identify a single pathway (or small set of pathways) that particular proteins take on their way to folding. One problem with this effort is that it focuses attention on any conformation that lives long enough to characterize experimentally (such as proline isomerization), even if there is only weak evidence that those states are essential to the folding mechanism.
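The arithmetic behind Levinthal's estimate can be sketched in a few lines of Python. This is a rough illustration only: the chain length (100 residues) and the number of conformations per residue (3) used in the test values are assumptions chosen for illustration, not values taken from this dissertation; only the one picosecond per conformation comes from the text above. The function names are hypothetical.

```python
import math

# Assumed reference value: the age of the universe, roughly 13.8 billion years.
AGE_OF_UNIVERSE_S = 4.35e17


def log10_search_time_s(n_residues, states_per_residue, time_per_state_s=1e-12):
    """log10 of the time (seconds) needed to visit every conformation once.

    The number of conformations is states_per_residue ** n_residues;
    working with logarithms avoids overflow for long chains.
    """
    if n_residues < 1 or states_per_residue < 1 or time_per_state_s <= 0:
        raise ValueError("arguments must be positive")
    log10_conformations = n_residues * math.log10(states_per_residue)
    return log10_conformations + math.log10(time_per_state_s)


def residues_at_age_of_universe(states_per_residue, time_per_state_s=1e-12):
    """Chain length at which an exhaustive search takes the age of the universe."""
    if states_per_residue <= 1 or time_per_state_s <= 0:
        raise ValueError("states_per_residue must exceed 1 and time must be > 0")
    log10_budget = math.log10(AGE_OF_UNIVERSE_S) - math.log10(time_per_state_s)
    return log10_budget / math.log10(states_per_residue)
```

With these assumed numbers, the exhaustive search time for a 100-residue chain exceeds the age of the universe by many orders of magnitude. The precise figure depends strongly on the assumed chain length and number of states per residue; the exponential growth of the search time with chain length is the point of the sketch.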

A new view, known as the “free energy landscape theory”[25, 27, 108, 151, 30, 21], emerged in the late 1980s and became established in the subsequent decade. This alternative perspective describes the folding mechanism not as a single pathway of essential steps to the native state, but rather as a statistical search through a self-organized ensemble of conformations within the energy landscape. The single pathway proposed early on is in some sense replaced by the average properties of a large number of pathways. Perhaps somewhat surprisingly, accommodating the large number of pathways simplifies the description of folding to a small set of average statistical properties, or order parameters.

One way to understand the energy landscape of a protein is to compare it with the landscape of a polymer composed of monomers with random competing interactions. Random heteropolymers have a rugged landscape with many different low energy conformations separated by barriers. Finding a unique structure for a random heteropolymer involves searching through a landscape similar to that of glassy systems, in which finding a particular conformation on a reasonable timescale is improbable. Unlike random heteropolymers, proteins have evolved to fold into unique, three-dimensional ordered structures determined by their amino acid sequence. The energy landscape theory proposes that naturally occurring proteins have landscapes shaped by an energetic bias toward the native state. This kind of landscape significantly reduces the conformational search, enabling a protein to fold on biologically relevant timescales.

Like random heteropolymers, proteins have some degree of ruggedness due to transition barriers and misfolded states that tend to slow down folding kinetics. The energetic bias to the native state gives rise to an energy landscape in the shape of a rugged funnel (Fig. 1.3.1). This bias is known as the “principle of minimum frustration”[25, 27], which states that, on average, conformations that are “native-like” have lower energy.

Much theoretical, computational, and experimental evidence over the last two decades supports the energy landscape theory of protein folding. On the theoretical side, much insight has come from unfrustrated Hamiltonians called Gō models with an explicit energetic bias to the native state[192, 67]. Gō models represent the limit in which ruggedness plays a minimal role and the driving force that arises from the strong energetic bias to the native state dominates the folding landscape. Analytic models[160, 159, 139, 2, 179, 163, 177, 178, 61, 62, 59], as well as simulations[173, 42, 99, 93, 152, 94, 113, 114, 58, 168, 35, 70, 229, 162, 78] based on Gō-model Hamiltonians have been developed to predict protein folding mechanisms that compare well with experimental measurements. Native structure based models were first applied to study fast folding proteins, with sub-millisecond folding times[216, 151], and two-state folding proteins[26, 12]. The success of these models in predicting the folding mechanism of proteins encouraged researchers to apply them to other problems such as protein-ligand binding allostery[149, 115], coupled folding and binding[113, 205, 63], and folding cooperativity[229, 78].

Figure 1.3.1: Funnel-shaped protein folding landscape. Folding occurs through the progressive organization of ensembles of structures on a free energy landscape. (Labels in the figure: Unfolded, Entropy, Q, |E(Q)|, Native folded.)

Energy landscapes provide a useful framework to understand the intricacies of protein function and activation upon ligand binding[202, 122, 181]. Folded proteins sample a broad spectrum of low free energy native conformations at the bottom of the funnel. These low energy conformations are in dynamic equilibrium within the folded ensemble with transition rates controlled by free energy barriers[73]. Conformational flexibility and dynamics among the thermodynamically accessible states enable a protein to statistically respond to changing environmental conditions. For example, the distribution of states may shift in response to interactions with other proteins, or changes in environmental conditions.

Fig. 1.3.2 shows a schematic representation of the modulation of the landscape upon binding a ligand. The landscape at the bottom of the funnel consists of a diverse ensemble of conformations. A ligand may favor a particular conformation due to structural compatibility, thereby stabilizing a previously metastable state within the unbound ensemble. The population of conformations at the bottom of the funnel thereby redistributes in response to ligand binding. That is, a conformational switch occurs from an unligated to a ligated state upon ligand binding. The redistribution of low energy conformations due to stabilization of a particular metastable state is a powerful strategy for allosteric control in proteins[72, 45, 92]. In fact, it has been proposed that allosteric conformational changes are a ubiquitous protein property essential to their role as the primary molecular machines in biology[72].

Figure 1.3.2: Schematic representation of protein functional motion at the bottom of the folding funnel. At the bottom of the funnel, protein dynamics is sensitive to modulation by binding of ligand. The resulting energy landscape upon binding of a ligand involves redistribution of population. Also shown is the stabilization of a low energy conformer upon binding a ligand. (Labels in the figure: conformational coordinate; unbound conformations; + ligand; bottom of the folding funnel; ligand-bound conformation is stabilized.)
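The redistribution of population described above can be illustrated with a minimal two-state Boltzmann calculation. This is only a sketch: the free energies, the stabilization `dF_bind`, and the function name `populations` are hypothetical choices made for illustration, not quantities taken from the simulations in this dissertation.

```python
import math

def populations(free_energies, kT=1.0):
    """Boltzmann populations for a list of state free energies (same units as kT)."""
    weights = [math.exp(-f / kT) for f in free_energies]
    z = sum(weights)
    return [w / z for w in weights]

# Hypothetical unbound landscape: the closed state is 2 kT more stable than the open state.
F_closed, F_open = 0.0, 2.0
unbound = populations([F_closed, F_open])

# Assumption for illustration: the ligand stabilizes only the open state, by 4 kT.
dF_bind = 4.0
bound = populations([F_closed, F_open - dF_bind])

print("unbound (closed, open):", unbound)
print("bound   (closed, open):", bound)
```

With these illustrative numbers the open state is a minor, metastable population in the unbound ensemble and becomes the dominant population once the ligand stabilizes it, which is the population shift sketched in Fig. 1.3.2.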

The dynamic equilibrium between low energy conformations is an essential part of the description of this “population shift” (or “conformational selection”) mechanism of allostery. NMR experiments that establish the existence of a dynamic equilibrium between metastable conformations of several proteins, even in the absence of ligand, support this picture[124, 54]. From a computational point of view, a minimal molecular model to study ligand binding must be able to accommodate the protein conformational dynamics and sample the appropriate conformational ensembles. As a first approximation, a model must accommodate at least two metastable states connected by a transition barrier. Furthermore, the transition between the two metastable states should not be discrete. That is, the protein should be able to sample a broad range of conformations within the two folded basins. One primary focus of the work in this dissertation is to elucidate the allosteric ligand binding mechanism through simulations of a model in which explicit conformational change of the protein is coupled with ligand binding.

1.4 Mechanism of Allostery in Proteins

The term allostery refers to any mechanism by which proteins communicate the effect of binding an external molecule (such as a ligand) at one site to a distal region of the molecule. This process is prevalent in biology and allows for the regulation of protein activity[185]. This regulation is often achieved through control of the dynamic equilibrium of the conformational ensemble[44] by a shift in population due to binding of small molecules, changes in temperature or concentration, or mutation[117]. Allostery can also enhance the binding specificity of a biological macromolecule at the target site[147].

Several mechanisms exhibit allosteric control. One of the most common is through a protein’s conformational change upon binding of a ligand. The kinetics of allostery can be seen against the background of two conceptually distinct mechanisms. In the population shift mechanism, a ligand preferentially stabilizes a particular conformation among the ensemble of unligated conformations. An alternative viewpoint was introduced by Koshland and coworkers, who postulated that a ligand binds to the equilibrium ligand-free form with a subsequent change in structure to the ligand-bound form[101]. This binding mechanism is known as “induced fit” binding.

As shown in Fig. 1.4.1, the binding mechanism of a ligand to a protein can be described at different levels of resolution. For example, Fig. 1.4.1 (A) shows a “macroscopic” description of the ligand-induced conformational change in a protein. This is a simplified version of the detailed kinetics since it does not describe the conformational change of the protein. The equilibrium between the unbound and bound conformational ensembles is balanced by the on and off rates with equilibrium constant ($K_{eq}$)

\[ K_{eq} = \frac{k_{on}[L]}{k_{off}}. \tag{1.1} \]

Here, $[L]$ denotes the concentration of ligand, $k_{on}$ is the bimolecular binding rate, and $k_{off}$ is the unbinding rate. A more detailed kinetic scheme includes transition rates for the conformational change. As shown in Fig. 1.4.1 (B), for a protein with open and closed states, conformational transitions occur with rates $k_{uo}$ and $k_{uc}$, respectively. The transition can also occur when ligand is bound, with rates $k_{bo}$ and $k_{bc}$, respectively. The overall binding mechanism can proceed via two distinct routes as shown in Fig. 1.4.1 (B). For example, ligand can bind weakly to the closed state with forward rate $k^{c}_{on}[L]$, and can come off with rate $k^{c}_{off}$. In induced fit binding, the weakly-bound intermediate conformation is stabilized by a conformational change to the ligand-bound open state. In the other scenario, known as the conformational selection (or population shift) mechanism, the conformational change to the unbound open state occurs before ligand binding, followed by a subsequent transition to the ligand-bound open state. Which route dominates for a particular protein and ligand is a kinetic issue. It is challenging for experimental measurements to resolve the binding kinetics of intermediate states[47]. Furthermore, focusing on only two basins may not capture a particular protein’s conformational dynamics, which may be considerably more complex.

Figure 1.4.1: Schematic representations of macroscopic and microscopic allosteric binding. (A) In the macroscopic point of view, binding of a ligand stabilizes the ligand-bound ensemble of conformations. (B) Microscopic picture is more complex with rates determined by the relative stabilization of conformations in the unbound and bound ensembles. (Labels in the figure: (A) Unbound and Bound states connected by $k_{on}$, $k_{off}$ with ligand $[L]$; (B) Unbound-Closed, Unbound-Open, Bound-Closed, and Bound-Open states connected by $k_{uo}$, $k_{uc}$, $k_{bo}$, $k_{bc}$, $k^{c}_{on}$, $k^{c}_{off}$, $k^{o}_{on}$, $k^{o}_{off}$.)
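The four-state scheme of Fig. 1.4.1 (B) can be made concrete with a small equilibrium calculation. The sketch below assumes that $k_{uo}$ and $k_{uc}$ are the closed-to-open and open-to-closed rates of the unbound protein (and likewise $k_{bo}$, $k_{bc}$ for the bound protein); the numerical rates and the function names are hypothetical and chosen only to illustrate weak binding to the closed state and tight binding to the open state.

```python
def four_state_populations(L, k_uo, k_uc, kc_on, kc_off, ko_on, ko_off):
    """Equilibrium populations of the four states of the binding cycle.

    Statistical weights are built relative to the unbound-closed state by
    applying detailed balance along the edges of the cycle.
    """
    w_uc = 1.0                         # unbound-closed (reference state)
    w_uo = k_uo / k_uc                 # unbound-open
    w_bc = kc_on * L / kc_off          # bound-closed
    w_bo = w_uo * ko_on * L / ko_off   # bound-open
    z = w_uc + w_uo + w_bc + w_bo
    return {"UC": w_uc / z, "UO": w_uo / z, "BC": w_bc / z, "BO": w_bo / z}

def bound_conformational_ratio(k_uo, k_uc, kc_on, kc_off, ko_on, ko_off):
    """Ratio k_bo / k_bc fixed by detailed balance around the closed cycle."""
    return (k_uo / k_uc) * (ko_on / ko_off) / (kc_on / kc_off)

# Hypothetical rates (arbitrary units), for illustration only.
rates = dict(k_uo=1.0, k_uc=10.0, kc_on=1.0, kc_off=10.0, ko_on=1.0, ko_off=0.01)

for L in (0.0, 0.1, 1.0):
    print(L, four_state_populations(L, **rates))
print("k_bo / k_bc =", bound_conformational_ratio(**rates))
```

The second function reflects a general constraint rather than a modeling choice: because the four states form a closed cycle, only seven of the eight rates are independent at equilibrium. The equilibrium populations are the same whichever route dominates kinetically, which is why distinguishing induced fit from conformational selection requires kinetic information.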

Extending this simple kinetic scheme to proteins with multiple binding sites is straightforward, though the microscopic binding mechanism can be rather complex. In this case, each conformational state of the protein can exist in either closed or open conformation with the possibility of ligand binding to any of the conformational states sampled by the protein. Hence, the ensembles would involve many more microstates. Capturing all of the kinetic details within a simplified molecular model is challenging. My research investigates the interplay between a protein’s conformational dynamics and its binding properties. In particular, I focus on characterizing the allosteric binding thermodynamics and kinetics of calmodulin, a flexible protein with multiple heterogeneous sites.

1.5 Calmodulin

The Ca2+-binding protein calmodulin is a good model system to gain mechanistic insights into allosteric ligand binding properties. I report on binding properties of the individual N-terminal and C-terminal domains of calmodulin, as well as intact calmodulin. In this section, I briefly describe the biological functional properties and the importance of calmodulin as a regulatory protein.

Calmodulin (CaM) is a ubiquitous Ca2+-binding protein consisting of two structurally similar, globular domains. The two domains of CaM, the N-terminal domain (nCaM) and the C-terminal domain (cCaM), have similar secondary and tertiary structures. Each domain consists of two helix-loop-helix motifs (the EF-hands)[102] connected by a flexible linker as shown in Fig. 1.5.1. The Ca2+-free structure of CaM has been resolved by solution NMR experiments[103] and the Ca2+-bound structure has been determined by X-ray crystallography[34].

Although topologically similar, the two domains of CaM have distinct flexibilities, melting temperatures, and thermodynamic Ca2+-binding properties[204, 116, 184, 127]. In the absence of Ca2+, the C-terminal domain is particularly dynamic[196], and is less stable than the N-terminal domain in the intact protein as well as in the separate isolated domains[23, 184, 127]. The C-terminal domain, which has a very low denaturation temperature, is reported to be considerably unfolded at physiological temperature[164]. Furthermore, NMR experiments monitoring the open/closed transition of isolated cCaM have revealed local transient unfolding of helix F during domain opening[121]. NMR experiments also reveal that the linker region connecting the two globular domains of CaM is highly flexible even when CaM is not bound to its target proteins[39].

The Ca2+-free “apo” and Ca2+-bound “holo” forms of calmodulin, as well as calmodulin bound to a target peptide, are shown in Fig. 1.5.1. On binding Ca2+ ions, CaM undergoes a conformational change exposing hydrophobic residues in each domain. The overall structure of calmodulin has a dumbbell shape which enables it to bind to target proteins with both domains like two hands holding a stick. The high flexibility of the domains allows CaM to adapt to bind different targets that play crucial roles in cell signaling, ion transport and cell death[189]. A key reason behind CaM’s ability to bind to a wide variety of target proteins is non-polar interactions through the abundant methionine residues of CaM. The non-polar surface of CaM is exposed by the Ca2+ binding process, which allows CaM to bind to the non-polar regions of the target proteins. It is interesting to note that since the non-polar grooves do not have a specific shape, CaM acts as a versatile regulatory protein and its targets are not required to possess any specific amino acid sequence or structural binding motifs[208].

Figure 1.5.1: Structure of calmodulin in different forms. (A) Ca2+-free structure of calmodulin (PDB id: 1CFD), and (B) Ca2+-bound structure of calmodulin (PDB id: 1CLL). Upon binding 4 Ca2+ ions to its binding loops, calmodulin undergoes a large structural change that exposes its hydrophobic surface, thereby enabling calmodulin to bind to target proteins. (C) NMR structure of calmodulin bound to smooth muscle myosin light chain kinase (PDB id: 2KOF). The calcium ions are shown as silver circles. This visualization was made using Visual Molecular Dynamics (VMD) software[85].

CaM binds to a diverse range of target proteins, and modulates various cellular processes, such as muscle contraction, metabolism, nerve growth, and immune response. One key reason behind CaM’s functional diversity is the flexibility of the central helix region upon binding Ca2+ ions, as well as the flexibility of the domains. Additionally, binding of CaM to its target proteins involves a greater degree of flexibility of the interdomain linker region[143].

CaM’s ability to regulate a wide spectrum of crucial cellular processes has made it a model system for studies spanning the last few decades. The Ca2+-induced structural change is a key step in CaM’s activation process. In this dissertation, I aim to give a molecular description of the activation of CaM by Ca2+ ions. Coarse-grained simulations of CaM binding to target peptides have been reported recently by Cheung et al.[201, 222]. In future work, it would be interesting to combine Ca2+ activation and target peptide binding.

1.6 Organization of Dissertation

The dissertation is organized as follows:

In Chapter 2, I discuss thermodynamic binding cooperativity in proteins. Several approaches to analyze experimental data are also described. Next, I introduce the viewpoint of allostery as a shift in population in the energy landscape perspective and describe the framework of the Monod-Wyman-Changeux (MWC) model of allostery. I also discuss the key assumption of the MWC model and why it fits our investigation of allostery in Ca2+-CaM binding. Finally, a more recent viewpoint in terms of a generalized ensemble view of allostery is briefly discussed.

In Chapter 3, I first give an overview of the different simulation and analytic models that have been developed to study conformational dynamics of proteins. Then, I briefly describe the coarse-grained model used in our simulations.

The next three chapters report our work on allosteric transitions in CaM. In Chapter 4, I describe the open/closed transitions of the isolated N-terminal and C-terminal domains of CaM. Here, I explore how stability influences the transition mechanism of domain opening. In Chapter 5, I describe results from a model that incorporates conformational dynamics and ligand binding. Here, I investigate the binding cooperativity of Ca2+ binding to the isolated domains of CaM. The approach used in our study is a molecular realization of the classic Monod-Wyman-Changeux (MWC) model of allosteric binding cooperativity. The simulations predict that the binding strengths of the loops of calmodulin are heterogeneous. For binding two Ca2+ ions, the simulations predict that the two domains of CaM have distinct binding affinities and cooperativities. I also provide a structural rationalization of the binding free energies of the loops in the simulated ensemble of unbound and bound conformations. Chapter 6 focuses on extending our work on isolated CaM domains to investigating the binding of four Ca2+ ions to intact CaM. Thermodynamic simulations show that the heterogeneity in CaM’s binding loops is conserved in the intact protein. Kinetics results predict dominant binding routes within the ensemble of ligand-bound conformations.

Chapter 2

Models of cooperative ligand binding

2.1 Introduction

Allosteric cooperative binding is an essential component of many fundamental biochemical and physiological functions such as enzymatic activity. An allosteric protein has multiple binding sites in which the binding of ligand at one site influences subsequent binding at other sites. In cooperative binding, the stability of the bound complex is greater than the sum of the stabilities of binding to the individual sites. Cooperativity plays a crucial role in cell signaling, transcriptional regulation, and many other processes in cells.

Cooperative binding was first observed in the early 1900s by Christian Bohr and coworkers, who investigated oxygen binding to hemoglobin[19]. They measured the average saturation of hemoglobin with oxygen as a function of the partial pressure of oxygen. The sigmoidal shape of the binding curve suggests that binding oxygen to hemoglobin makes it easier for additional oxygen to bind. Throughout the twentieth century, various frameworks have been developed to describe cooperative binding of a ligand to a protein with multiple binding sites[218]. Early phenomenological models were proposed by Hill, Pauling, Adair, and others. Among the models that followed, the classic Monod-Wyman-Changeux (MWC) model[136] and the Koshland-Némethy-Filmer (KNF) model[101] of allostery have had lasting influence in the interpretation of cooperativity. Nevertheless, in practice these provide a largely conceptual picture. Gaining structural insight into the fundamental mechanism governing allostery requires molecular understanding of the protein’s conformational dynamics[45]. Elucidating the mechanism of allostery and associated binding cooperativity has enjoyed renewed interest in recent years, in part because of its natural place in landscape theory as well as the development of more refined experimental techniques[214, 169, 209, 48, 88].

Many types of proteins exhibit cooperative binding. While oxygen binding to hemoglobin is perhaps the most famous example, several other molecular assemblies that exhibit cooperative binding have been characterized in great detail, including the enzymes threonine deaminase[32] and aspartate transcarbamylase[64], as well as ligand-gated ion channels like nicotinic acetylcholine receptors[95] and inositol triphosphate (IP3) receptors[133]. These complexes served as model systems to illustrate cooperativity as a fundamental protein strategy.

The main focus of my research in this dissertation is to characterize and provide molecular insights into the allosteric binding cooperativity of a protein with heterogeneous binding affinities. In this chapter, I give a brief overview of early frameworks developed to characterize ligand binding cooperativity and to interpret experimental binding titration curves. I also discuss various alternative interpretations developed to quantify cooperative binding.

2.2 The Hill Equation

The first quantitative description of cooperative binding to a protein with multiple binding sites was developed by A. V. Hill[77] with a phenomenological equation representing the fractional occupancy

\[ Y = \frac{[X]^{n_H}}{K_d + [X]^{n_H}}, \tag{2.1} \]

where $[X]$ represents ligand concentration, $K_d$ denotes the apparent dissociation constant, and $n_H$ is known as the Hill coefficient. A common form of the Hill equation convenient for fitting measured binding data is obtained by rewriting Eq. 2.1 as

\[ \log\left(\frac{Y}{1-Y}\right) = n_H \log[X] - \log(K_d). \tag{2.2} \]

I note that, in this formalism, the cooperativity (represented by the value of $n_H$) is assumed to be fixed and is independent of the degree of saturation of the protein by ligand. As shown in Fig. 2.2.1, the average saturation exhibits sigmoidal behavior as a function of ligand concentration. The concentration at which the average saturation equals 0.5 characterizes the apparent dissociation constant; in the form of Eq. 2.1, this half-saturation point satisfies $[X]^{n_H} = K_d$. The sharpness of the curves increases with increasing values of $n_H$, which characterizes the degree of cooperativity in ligand binding. Higher values of $n_H$ result in a sharper binding curve and a decrease in the corresponding values of $K_d$. For positive cooperativity, the Hill coefficient is usually bounded by $1 < n_H < n$, where $n$ is the number of ligands. A value of the Hill coefficient equal to 1 indicates that each site binds independently. That is, binding at one site does not affect the binding at another site. When $n_H \simeq n$, the model corresponds to very high cooperativity in which the only thermodynamically relevant species are the unligated and fully saturated states.

Figure 2.2.1: Schematic representation of the Hill equation given by Eq. 2.1 for different values of the Hill coefficient, $n_H$ (curves shown for $n_H$ = 1, 2, 3, 4). The sharpness of the curve increases for higher values of $n_H$. The x-axis represents ligand concentration, $[X]$ (plotted as $\ln[X]$), and the y-axis shows fractional occupancy, $Y$.

The Hill coefficient is generally used as a fitting parameter to match measured binding curves. The physical interpretation of $n_H$ can be summarized as the number of ligand molecules that must bind to a receptor to activate it and produce a functional effect. This measurement only provides an overall macroscopic description of the cooperative interaction between the binding sites and does not provide insight into the microscopic nature of binding. Site-specific microscopic information and the coupling between binding sites, for example, are not captured by this description.
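The use of Eq. 2.2 as a fitting form can be sketched in a few lines. The example below generates noise-free data from Eq. 2.1 and recovers $n_H$ and $K_d$ from a straight-line fit of $\log[Y/(1-Y)]$ against $\log[X]$. The function names, the concentration grid, and the parameter values are assumptions made for illustration; real titration data would carry noise and would generally not follow Eq. 2.1 exactly.

```python
import math

def hill(x, Kd, nH):
    """Fractional occupancy Y from the Hill equation, Eq. 2.1."""
    xn = x ** nH
    return xn / (Kd + xn)

def hill_plot_fit(xs, ys):
    """Least-squares fit of log(Y/(1-Y)) = nH*log[X] - log(Kd); returns (nH, Kd)."""
    u = [math.log(x) for x in xs]
    v = [math.log(y / (1.0 - y)) for y in ys]
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    slope = sum((a - mu) * (b - mv) for a, b in zip(u, v)) / sum((a - mu) ** 2 for a in u)
    intercept = mv - slope * mu
    return slope, math.exp(-intercept)

# Synthetic, noise-free data with hypothetical parameters Kd = 2 and nH = 3.
xs = [0.25, 0.5, 1.0, 2.0, 4.0]
ys = [hill(x, Kd=2.0, nH=3.0) for x in xs]
nH_fit, Kd_fit = hill_plot_fit(xs, ys)
print("fitted nH:", nH_fit, "fitted Kd:", Kd_fit)
```

Because the fit returns a single slope, it reports one effective coefficient for the whole curve, which is the sense in which the Hill description is macroscopic and carries no site-specific information.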

2.3 Cooperativity for binding two ligands

Cooperative binding for a protein with two binding sites can be conceptualized by the thermodynamic cycle shown in Fig. 2.3.1. To illustrate the influence of cooperativity on the binding curve, I treat the sites to be of uniform strength for simplicity. The stability of binding a single ligand is denoted by the equilibrium constant, $K$. Recruitment of a second ligand has an additional stability due to cooperativity represented by a factor $c > 1$. The overall stabilization of the bound ensemble is therefore $cK^2$. The partition function can be expressed in terms of ligand concentration and explicit coupling between the sites as

\[ Z = 1 + 2K[X] + cK^2[X]^2, \tag{2.3} \]

where $[X]$ is ligand concentration. The cooperativity parameter $c$ represents the effective coupling between the sites. Nevertheless, the underlying mechanisms need not be specified in this purely phenomenological description.

Figure 2.3.1: Thermodynamic cycle showing microscopic schematic representation for binding two ligands of heterogeneous strength. Starting from the unligated state, ligand binding stabilizes the state with both sites occupied by ligand. The two routes via which the binding process proceeds consist of partially bound states where a single site is occupied by a ligand, while the other site is empty. The final state in the cycle is the state with both binding sites occupied by ligand. (Labels in the figure: sites A and B; binding constants $K_A$, $K_B$, $c_{AB}K_A$, $c_{AB}K_B$; $c_{AB} > 1$ cooperative, $c_{AB} = 1$ uncooperative, $c_{AB} < 1$ anti-cooperative.)

The equilibrium constant $K$ and the coupling parameter $c$ can be expressed in terms of the free energy of stabilization of the ligand-bound ensemble with respect to the unbound ensemble of conformations. In this notation, the partition function given by Eq. 2.3 can be written as

\[ Z = 1 + 2e^{-\beta F}[X] + e^{-\beta(2F + \Delta F)}[X]^2, \tag{2.4} \]

with $F = -k_B T \log(K)$. The additional free energy of stabilization arising from explicit coupling between the binding sites is represented by $\Delta F = -k_B T \log(c)$. A binding transition curve (or titration curve) connects the unbound ensemble of conformations at low ligand concentration to the bound ensemble of conformations at high ligand concentration. As shown in Fig. 2.3.2, the binding curve obtained from Eq. 2.3 for coupled sites ($c > 1$) is sharper than the binding curve for uncoupled sites ($c = 1$). For uncoupled binding sites the binding is non-cooperative. For higher values of $c > 1$, the binding curve shows increasingly cooperative behavior. Cooperative binding also suppresses the population of partially ligated states (only one site occupied) in the ensemble of ligand bound conformations. This suppression stems from enhanced recruitment of a second ligand after one site is occupied. From an experimental point of view, however, binding cooperativity is commonly quantified in terms of the sharpness of the binding curve since intermediate states are generally difficult to characterize experimentally.

Figure 2.3.2: Dependence of bound probability on the parameter $c$ for a protein with two binding sites of homogeneous strength (curves shown for $c$ = 1, 10, 50, 100). The x-axis represents ligand concentration (plotted as $\ln[X]$) and the y-axis represents the probability of states with both sites occupied by ligand (left), and the probability of states where only one site is occupied by ligand (right). Non-cooperative binding corresponds to $c = 1$. The binding curve becomes increasingly cooperative and the peak of the population of singly ligated states decreases for higher values of $c$.
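The suppression of singly ligated states can be checked directly from Eq. 2.3. The sketch below evaluates the probabilities of the unligated, singly ligated, and doubly ligated states, and uses the fact that the singly ligated probability is largest where $cK^2[X]^2 = 1$, which follows from setting its derivative with respect to $[X]$ to zero. The function names are hypothetical, and $K = 1$ is an arbitrary choice that only sets the concentration scale.

```python
import math

def two_site_probabilities(x, K, c):
    """Return (P0, P1, P2) from Z = 1 + 2K[X] + cK^2[X]^2 (Eq. 2.3)."""
    w1 = 2.0 * K * x            # weight of the two singly ligated states
    w2 = c * (K * x) ** 2       # weight of the doubly ligated state
    z = 1.0 + w1 + w2
    return 1.0 / z, w1 / z, w2 / z

def singly_ligated_peak(c):
    """Maximum of P1 over concentration; it occurs at K[X] = 1/sqrt(c)."""
    _, p1, _ = two_site_probabilities(1.0 / math.sqrt(c), K=1.0, c=c)
    return p1

for c in (1, 10, 50, 100):
    print("c =", c, " peak singly ligated population =", round(singly_ligated_peak(c), 3))
```

The peak population works out to $1/(1+\sqrt{c})$, so it falls from 0.5 for uncoupled sites toward zero as the coupling grows, consistent with the trend in the right panel of Fig. 2.3.2.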

This framework can be generalized to accommodate heterogeneous binding sites with

\[ Z = 1 + (K_A + K_B)[X] + c_{AB}K_A K_B[X]^2. \tag{2.5} \]

Here, $K_A$ and $K_B$ reflect equilibrium constants for binding a ligand to the sites A and B, respectively, and $c_{AB}$ is the coupling interaction between the sites. For proteins with more than two binding sites, the cooperativity is reflected in a partition function with higher order interactions in addition to the two-body interaction term ($c_{AB}$). These multi-body interaction terms represent additional stabilization due to coupling between two or more binding sites. While this generalization is straightforward, achieving such a detailed description via available experimental measurements may be very difficult. For example, binding sites are often treated as equivalent in the analysis of experimental data, for simplicity, even when differences between binding sites are likely[116].

A major aim of my research in this dissertation is to characterize the cooperativity of binding Ca2+ ions to the isolated domains as well as intact calmodulin (CaM). In particular, I focus on the heterogeneous nature of the binding loops in calmodulin. In contrast, much early work on cooperative binding in calmodulin treats the binding sites on an equal footing[116]. Assigning a site-specific binding strength within the available experimental resolution is indeed challenging. Techniques designed to distinguish site-specific binding often alter the stability of the relevant states, making it hard to assess their relevance to the wild type protein[66]. Recent techniques that isolate the binding loops by grafting them to a scaffold present a complicated experimental framework to interpret[221]. Due to the limitations on obtaining a site-specific, microscopic description of ligand binding to proteins, much experimental work has relied on defining an “effective” uniform coupling parameter between binding sites. Analysis of my simulations does not have such limitations, and I am able to extract binding free energies for individual sites in order to calculate cooperativity parameters.

2.4 Other models of cooperativity

In 1925, G. S. Adair hypothesized that cooperativity was not a fixed quantity, but dependent on the degree of ligand saturation. Using hemoglobin as a model system[1], Adair assumed that fully saturated hemoglobin is formed in stages via successive binding of one, two, three, and four oxygen molecules. The formation of intermediate states consisting of one, two, and three oxygen bound forms of hemoglobin starting from unbound hemoglobin is described in terms of an apparent macroscopic association constant $K_i$. The fractional average occupancy in this model for a protein with $n$ ligand binding sites can be expressed as

\[ Y_n = \frac{1}{n}\,\frac{K_I[X] + 2K_{II}[X]^2 + \cdots + nK_n[X]^n}{1 + K_I[X] + K_{II}[X]^2 + \cdots + K_n[X]^n}. \tag{2.6} \]

Here, $n$ represents the number of ligand binding sites and the binding of $i$ ligand molecules is given by the macroscopic association constant $K_i$.

For hemoglobin, with 4 oxygen binding sites, $n = 4$. Hence, Eq. 2.6 takes the form

\[ Y = \frac{1}{4}\,\frac{K_I[X] + 2K_{II}[X]^2 + 3K_{III}[X]^3 + 4K_{IV}[X]^4}{1 + K_I[X] + K_{II}[X]^2 + K_{III}[X]^3 + K_{IV}[X]^4}, \tag{2.7} \]

which gives a measure of the average occupancy of binding sites in terms of macroscopic (site-agnostic) binding constants.
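Eq. 2.6 is straightforward to evaluate numerically for any number of sites. The following sketch implements it for a list of macroscopic constants and checks one limiting case that is known exactly: for $n$ independent, identical sites with microscopic constant $k$, the macroscopic constants are binomial, $K_i = \binom{n}{i}k^i$, and the occupancy reduces to the single-site form $k[X]/(1 + k[X])$. The function name and the numerical values are hypothetical.

```python
import math

def adair_occupancy(x, K):
    """Average fractional occupancy from the Adair equation, Eq. 2.6.

    K is the list [K_I, K_II, ..., K_n] of macroscopic association constants,
    so the number of binding sites n is len(K).
    """
    n = len(K)
    num = sum(i * Ki * x ** i for i, Ki in enumerate(K, start=1))
    den = 1.0 + sum(Ki * x ** i for i, Ki in enumerate(K, start=1))
    return num / (n * den)

# Limiting case: four independent, identical sites with microscopic constant k.
k, x = 2.0, 0.3
K_independent = [math.comb(4, i) * k ** i for i in range(1, 5)]
print(adair_occupancy(x, K_independent), k * x / (1.0 + k * x))
```

Deviations of the fitted $K_i$ from these binomial values are one way that cooperativity shows up in an Adair analysis, although the constants by themselves do not say which sites are responsible.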

For the thermodynamic cycle shown in Fig. 2.3.1, the average fractional occupancy for heterogeneous sites can be expressed as

\[ Y_{II} = \frac{1}{2}\,\frac{K_1[X] + 2K_2[X]^2}{1 + K_1[X] + K_2[X]^2}. \tag{2.8} \]

Here, the macroscopic equilibrium constants, $K_1$ and $K_2$, represent ligand binding to a single site and to both sites, respectively. The macroscopic binding constants are often reported through the microscopic equilibrium constants of each site, $K_A$ and $K_B$. Comparing with Eq. 2.5 gives $K_1 = K_A + K_B$ and $K_2 = c_{AB}K_A K_B$[210, 156, 174, 184, 206, 144]. The stabilizing free energy for binding ligands to both sites is given by $\Delta G_2 = -RT \ln K_2$, which includes the cooperativity between the binding sites.
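The correspondence between Eq. 2.5 and Eq. 2.8 can be made explicit with a short derivation. The sketch below uses the standard statistical-mechanics identity that the mean number of bound ligands is the logarithmic derivative of the binding partition function; the symbol $\langle N \rangle$ is introduced here only for illustration.

```latex
\begin{align*}
\langle N \rangle &= \frac{\partial \ln Z}{\partial \ln [X]}
  = \frac{(K_A + K_B)[X] + 2c_{AB}K_A K_B[X]^2}{1 + (K_A + K_B)[X] + c_{AB}K_A K_B[X]^2}, \\
Y_{II} &= \frac{\langle N \rangle}{2}
  = \frac{1}{2}\,\frac{K_1[X] + 2K_2[X]^2}{1 + K_1[X] + K_2[X]^2},
  \qquad K_1 = K_A + K_B, \quad K_2 = c_{AB}K_A K_B .
\end{align*}
```

For identical sites ($K_A = K_B = K$) these relations reduce to $K_1 = 2K$ and $K_2 = cK^2$, so the coupling can be recovered from the macroscopic constants alone as $c = 4K_2/K_1^2$. For heterogeneous sites this is no longer possible without site-specific information, since $K_1$ and $K_2$ fix only the combinations $K_A + K_B$ and $c_{AB}K_A K_B$.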

Several other models have been developed to describe cooperative binding to macromolecules. Pauling developed a framework by reinterpreting the Adair equation, assuming that the binding constants in Eq. 2.7 incorporate a contribution from binding to a site and a cooperative free energy associated with binding multiple ligands[154]. This idea was later improved by Daniel Koshland and coworkers. The Koshland-Némethy-Filmer (KNF) model is based on the assumption that for a protein with multiple subunits, each subunit can exist in either an “active” or an “inactive” conformation[101]. Ligand binding to a subunit in the inactive state induces a conformational change of that subunit to the active conformation. This so-called induced fit model is one of the first that rationalizes cooperativity through structural change in the protein.

Models that attempt to provide a comprehensive description of cooperative binding in macromolecules often fall short of a molecular level description. A typical macroscopic binding titration curve obtained from experiment does not provide this information directly. Rather, microscopic binding information is in some sense “hidden” within a typical binding titration curve. In my work, I aim to shed light on this microscopic information.

Another key aspect of cooperative ligand binding is to distinguish the binding properties of individual sites. The framework developed by Adair and other related models describe cooperativity as an effective coupling between the sites, without distinguishing differences in the binding strengths of the sites, because this information is not easily available from experiment. Furthermore, cooperativity in these models is phenomenological, without appealing to a specific structural mechanism. In this dissertation, I develop a molecular description of ligand binding for a protein with multiple sites of heterogeneous binding strengths. To achieve this, I work within the framework of an explicit molecular model of ligand binding where the binding is coupled with conformational change. The Monod-Wyman-Changeux (MWC) model of allostery provides one such framework to study ligand binding in allosteric proteins.

2.5 The MWC Model

The classic Monod-Wyman-Changeux (MWC) model of allostery accounts for concerted allosteric transitions to describe cooperative binding[136]. The MWC model was originally developed to describe oligomeric proteins which have symmetric, identical subunits, each containing one ligand binding site. This model assumes a thermal equilibrium between two interconvertible conformations of an allosteric protein: a “tensed” state T and a “relaxed” state R. The tensed state, which is more stable than the relaxed state in the absence of the ligand, binds the ligand with lower affinity than the relaxed state, as shown in Fig. 2.5.1. The equilibrium population thereby shifts from T to R upon ligand binding. A key assumption of the MWC model is that all subunits of a protein make the transition simultaneously. This is often referred to as “strong coupling”.

Since its introduction over 50 years ago, the MWC model has been extended and generalized to include proteins that exist in more than two states[53], conformational transitions and ligand binding with proteins that can bind to several types of ligands[131, 140], as well as proteins with heterogeneous binding sites[187].

[Figure 2.5.1 image: free energy versus reaction coordinate for the unbound and bound ensembles, with the Tense and Relaxed basins marked.]

Figure 2.5.1: Schematic representation of shift in population in MWC model. In the unbound ensemble of conformations the ligand-free closed state has higher stability in the free energy landscape and it exists in dynamic equilibrium with the open state. Upon ligand binding, the ligand-bound open state is stabilized. The x-axis represents reaction coordinate that is used to define the closed and open states of the protein, and the y-axis represents free energy (in arbitrary units). Also shown are representative protein structures in the unbound and ligand-bound ensemble of conformations.

The MWC picture shown in Fig. 2.5.1 is a simplified representation of the potentially broad conformational ensemble generally available to a dynamic protein. In general, the landscape may be more complex than can be represented with two native basins. Nonetheless, this picture of a shift in population between the basins upon ligand binding is a good starting point to develop a model for allosteric cooperativity. In this framework, the cooperative coupling of binding sites arises from a concerted conformational change.

In this dissertation, I develop a molecular dynamics simulation of CaM that is essentially a molecular realization of the MWC model. For CaM, Nuclear Magnetic Resonance experiments[13, 55, 54], as well as all-atom molecular dynamics simulation[207], suggest a dynamic equilibrium between the open and closed conformations in the absence of Ca2+, which supports our approach. In my research, I explore the molecular origins of allosteric cooperativity within the confines of an explicit molecular model where ligand binding is coupled with protein conformational change.

In the model described in the next chapter, the conformational dynamics between the two basins is coupled to Ca2+ binding in calmodulin. Analysis of our simulation for binding two Ca2+ ions to a domain of CaM follows the statistical weights shown in Fig. 2.5.2. In this figure, I use the notation that the ligand concentration is controlled by its chemical potential, $\mu \propto k_B T \log[X]$. Thermodynamic properties can be easily calculated through the partition function for two ligands. The free energies to bind to the open and closed states are extracted from simulated titration curves for binding to individual sites.

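[Figure 2.5.2 image: free energy versus reaction coordinate for the closed and open states, annotated with the following statistical weights.]

Ligation state: closed weight; open weight
Unligated: $1$; $e^{-\beta\epsilon}$
Site I bound: $e^{-\beta(\epsilon_c^{I} - \mu)}$; $e^{-\beta\epsilon}\, e^{-\beta(\epsilon_o^{I} - \mu)}$
Site II bound: $e^{-\beta(\epsilon_c^{II} - \mu)}$; $e^{-\beta\epsilon}\, e^{-\beta(\epsilon_o^{II} - \mu)}$
Both sites bound: $e^{-\beta(\epsilon_c^{I} + \epsilon_c^{II} - 2\mu)}$; $e^{-\beta\epsilon}\, e^{-\beta(\epsilon_o^{I} + \epsilon_o^{II} - 2\mu)}$

Figure 2.5.2: States and corresponding statistical weights for a simplified description of MWC model of allostery for a protein with two ligand binding sites. In the unligated ensemble of conformations, the relative stability of the open and closed states is set by the parameter $\epsilon$. The singly-ligated ensemble of conformations consists of a ligand bound to either the closed state or the open state. The fully-loaded ensemble comprises both ligands bound to either the closed state or the open state of the protein.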
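The statistical weights of Fig. 2.5.2 can be summed into a two-ligand partition function, from which the mean number of bound ligands and the open-state population follow directly. The following is a minimal Python sketch under stated assumptions: the function name `mwc_two_site`, its argument names, and the numerical values in the usage line are hypothetical and chosen only for illustration; they are not the parameters extracted from the simulations.

```python
import math


def mwc_two_site(mu, eps, e_closed, e_open, kT=1.0):
    """Mean number of bound ligands and open-state population.

    mu       : ligand chemical potential
    eps      : free energy of the unligated open state relative to closed
    e_closed : (eps_c_I, eps_c_II), binding free energies in the closed state
    e_open   : (eps_o_I, eps_o_II), binding free energies in the open state
    All names and values are illustrative assumptions.
    """
    beta = 1.0 / kT

    def weights(e_sites):
        e1, e2 = e_sites
        w1 = math.exp(-beta * (e1 - mu))             # site I bound
        w2 = math.exp(-beta * (e2 - mu))             # site II bound
        w12 = math.exp(-beta * (e1 + e2 - 2.0 * mu))  # both sites bound
        return 1.0 + w1 + w2 + w12, w1 + w2 + 2.0 * w12

    z_closed, n_closed = weights(e_closed)
    z_open, n_open = weights(e_open)
    w_open = math.exp(-beta * eps)                   # weight of the open basin

    z_total = z_closed + w_open * z_open
    n_avg = (n_closed + w_open * n_open) / z_total
    p_open = w_open * z_open / z_total
    return n_avg, p_open


# Example: open state disfavored when unligated, but binds more strongly.
n_avg, p_open = mwc_two_site(mu=-1.0, eps=2.0,
                             e_closed=(0.0, 0.0), e_open=(-3.0, -3.0))
```

Scanning `mu` traces out a titration curve; in this sketch the open-state population rises together with the occupancy, which is the population shift described for the MWC picture.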

2.6 Generalized ensemble view of allostery

In the sequential (KNF) model of allostery, a ligand binds weakly to the T state, followed by a conformational transition to the R state. In this picture, the R state in the unbound ensemble is missing. In the MWC model, this state is included in the unbound ensemble. These models can be placed in a general framework with more ligation states, as emphasized by Hilser and coworkers[81]. For example, the MWC model asserts that the sites are in the same conformation (either both closed or both open) when the protein undergoes a global structural change. This neglects states with binding sites that can adopt different conformations (one site closed and the other site open). Such binding sites would be “weakly coupled” and hence are expected to have a lower cooperativity. Still, additional states allow more diverse strategies for tuning ligand binding affinity.

Both the KNF and MWC models have been applied extensively to investigate allostery. Despite their success in quantitatively describing binding titration curves, the KNF and MWC models exist in a broader framework[81]. These models are a subset of an ensemble of microstates that includes all combinations of open and closed conformations for each site (Fig. 2.6.1). This expanded picture still simplifies the conformational states involved in allostery. In fact, there is evidence that allostery occurs even without a conformational change[158, 46, 157, 217]. Allostery can also involve negative cooperativity between binding pockets[165], which is difficult to explain through structural change alone. Surface mutations, for example, can give rise to allosteric interactions without reorganization of structure[171, 172]. Nevertheless, this generalized picture is able to describe allostery in some systems that are difficult to describe with models like KNF and MWC. As emphasized by Hilser and coworkers[80], allosteric properties of intrinsically disordered proteins (IDPs)[165], such as negative cooperativity, require the inclusion of weakly coupled subunits.

[Figure 2.6.1 image: panels (A), (B), and (C).]

Figure 2.6.1: Schematic of ensemble description of models of allosteric cooperativity for a protein with two subunits. (A) The MWC model of allostery, (B) KNF model, (C) A more generalized ensemble allosteric model that accommodates all possible microstates of a protein with two binding sites. Green shaded regions correspond to subunit interaction energy. The two different shapes correspond to the closed and open conformations of the protein. Colored shapes correspond to ligand unbound and uncolored shapes correspond to ligand bound conformations. Blue shaded regions show the ensemble of states for each framework. Adapted from the framework developed by Hilser and co-workers[81].

2.7 Allostery in calmodulin

In spite of the existence of allosteric systems where structural information alone does not always provide mechanistic insight, the MWC model of allostery has been applied with success to numerous systems[51, 52, 33, 24]. For CaM, the activation upon binding Ca2+ ions results in a large structural rearrangement of contacts. Structural details of both the Ca2+-free “closed” state and the Ca2+-bound “open” state of CaM are well characterized. Furthermore, the nature of the conformational change requires that the sites within a domain be strongly coupled. Hence, it seems plausible that the MWC model of allostery can be applied faithfully to gain mechanistic insight into a molecular description of allostery in CaM.

Experimentally, the nature of cooperativity for Ca2+ binding to the isolated N-terminal and C-terminal domains, as well as to intact CaM, has not yet been fully resolved. Available strategies to measure the extent of cooperative interactions between the binding loops, and between the two domains of CaM, are often sensitive to mutational effects on the stability of the domains. Typically, experiments can detect the overall population of ligand bound to the protein but are unable to detect the order in which the loops are occupied. For example, experiments provide very limited information about how Ca2+ binding to one loop influences binding to the others to produce an overall macroscopic binding curve. Available reports suggest that the coupling between the N-terminal and C-terminal domains in intact CaM is perhaps weak, or even negative[184]. For intact CaM, some studies suggest that the Ca2+-induced stabilization of the residues in the interdomain linker region mediates the coupling between the two domains of CaM[175].

Capturing all these details in a single model is challenging. As mentioned earlier, a mechanism in which one site is in the closed conformation while the other is in the open conformation is probably not physical for the open/closed transition of the domains. On the other hand, relaxing the coupling effect mediated by the interdomain linker between the two domains may be reasonable. In the absence of clear guiding principles to characterize allostery between the binding sites of CaM, I adopt a “strong” coupling approximation between the binding sites within a domain, as well as between the two domains of CaM. In fact, recent work has reanalyzed available experimental data within this strong coupling approximation between the domains[186].

Chapter 3

Coarse-grained models for conformational dynamics in proteins

3.1 Introduction

Many computational models aim for a comprehensive description of conformational dynamics with atomistic detail. Nevertheless, for some applications such as protein folding, a lower resolution description that can capture long timescales is a practical alternative. These models can predict key features of transition mechanisms of protein folding and other large scale conformational transitions to complement higher resolution models and interpret experimental results. Both theoretical and computational methods have been developed to understand protein transition mechanisms.

Coarse-grained models, introduced in the pioneering work of Levitt and Warshel[111, 110] shortly after the first protein all-atom simulations[213, 123], offer a powerful approach to identify the key organizing principles underlying the complex interactions of proteins. Coarse-grained models represent each amino acid with one or more units whose properties are defined by the specific amino acid type. For example, one site can be associated with the α-carbon of the amino acid. Later, the protein backbone can be reconstructed if necessary[129, 167]. Interactions can be parameterized by their intrinsic properties such as hydrophobicity and charge[84, 22]. Despite their simplification, these models have proven to be highly successful in elucidating the principles concerning protein folding and associated interactions[160, 139, 61, 7, 74, 57, 150, 215, 6, 225, 79, 89, 49]. Although coarse-grained models fall short of the detail provided by ab initio all-atom models, their ability to simulate relevant lengthscales and timescales makes it possible to directly compare, validate, and improve simulation models.

To give context for the approach developed in this dissertation, in this chapter I introduce several ways coarse-grained models have been used to understand protein dynamics. Theoretical and computational approaches have different advantages and weaknesses. Simulations of protein dynamics can accommodate a broader phase space and allow for a more faithful representation of ensemble-averaged properties. However, the effect of certain physical properties at the atomic level may be difficult to capture via simulations. Analytic models may be able to incorporate some physical details directly and provide insight into conformational mechanisms, but they generally require severe approximations to be tractable. All-atom models, on the other hand, often provide a detailed description but require higher computational cost. My experience in this work suggests that employing simulation models at a less-detailed level allows one to incorporate important experimental observations directly in a tractable model, making them an attractive approach to gain insight into the mechanism of allosteric cooperativity.

3.2 Native structure-based models for proteins

Many coarse-grained models make use of energy functions defined by the interactions that occur in the protein’s native structure, the so-called “native contacts”. These structure-based models are referred to by different names such as structure based, native-centric, or Gō models[192, 67]. In these models, interactions that are present in the native structure are stabilized, and other “non-native” contacts are destabilized. This strong assumption corresponds to a smooth energy landscape dominated by the driving force from the principle of minimum frustration[25, 151]. It is perhaps no surprise that initial support for this model came from work to understand the fastest folding proteins, for which landscape ruggedness from non-native contacts likely plays a relatively minor role in the folding kinetics.

Network models are among the simplest structure based models[195, 82, 7]. In this framework, the ensemble of conformations within the native free energy basin is approximated by harmonic fluctuations about the native structure. Interactions between residues are included as harmonic springs between α-carbons. Network models have been highly successful in characterizing fluctuations obtained from measured temperature factors of X-ray structures as well as Molecular Dynamics (MD) simulations[3, 106, 166].

The potential in the Elastic Network Model can be expressed as[195]

$$U_{\mathrm{ENM}}(\mathbf{R}\,|\,\mathbf{R}_0) = \frac{1}{2} \sum_{I<J} k_{IJ}\, \Delta_{IJ}(\mathbf{R}_0)\, \left| R_{IJ} - R_{IJ;0} \right|^2 , \tag{3.1}$$

where $R_{IJ}$ and $R_{IJ;0}$ are the distances between coarse-grained sites $I$ and $J$ in configuration $\mathbf{R}$ and in the folded configuration $\mathbf{R}_0$, respectively. The term $k_{IJ}$ represents the spring constant, and $\Delta_{IJ}(\mathbf{R}_0)$ denotes the matrix of native contacts, whose elements equal 1 if $R_{IJ;0} < R_c$ and 0 otherwise, where $R_c$ is a specified cutoff. Often, one is interested in characterizing the fluctuations $\langle \mathbf{R}_I \cdot \mathbf{R}_J \rangle$ and low frequency collective motions within the native basin. Network models have been extensively applied to study a broad variety of questions, including protein conformational dynamics[97, 134, 125, 224, 40, 225, 183, 118, 83], refining experimentally determined structures[194, 69], and modeling protein docking[170, 130].
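As an illustration of Eq. 3.1, the elastic network energy of a configuration can be evaluated with a few lines of Python. This is only a sketch under stated assumptions: a uniform spring constant `k` replaces the pair-dependent spring constants, each distinct pair is counted once, and the function name, cutoff, and coordinates are hypothetical.

```python
import math


def enm_energy(coords, native, cutoff, k=1.0):
    """Elastic network energy for coarse-grained sites.

    coords, native : lists of (x, y, z) tuples for the current and
                     reference (folded) configurations
    cutoff         : contact cutoff applied to the reference distances
    k              : uniform spring constant (assumption of this sketch)
    """
    energy = 0.0
    n_sites = len(native)
    for i in range(n_sites):
        for j in range(i + 1, n_sites):
            r0 = math.dist(native[i], native[j])
            if r0 < cutoff:  # native contact: contact matrix element is 1
                r = math.dist(coords[i], coords[j])
                energy += 0.5 * k * (r - r0) ** 2
    return energy


# Three sites on a line; only the first pair is within the cutoff.
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
stretched = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
energy = enm_energy(stretched, native, cutoff=2.0)
```

Because the contact matrix depends only on the reference structure, the energy vanishes at the folded configuration and grows quadratically with the deformation of each native contact.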

Simulations of native-centric or Gō models are another coarse-grained approach used to study biological macromolecules. Similar to network models, contacts stabilizing the known reference folded structure define the attractive interactions in the model. Although this approach can be applied with an all-atom representation, the simplest native-centric model represents each residue by its α-carbon atom. Non-bonded interactions between residues in proximity in the native structure typically have a Lennard-Jones type potential that takes its minimum at the separation distance in the reference folded structure. The non-bonded interactions between all other pairs, i.e. non-native interactions, are modeled by a simple repulsive potential that captures the effect of excluded volume interactions and tends to minimize frustration. Bond angles and dihedral angles are also parameterized by the angles in the native structure.

A primary motivation behind the development of these models is the energy landscape idea that many proteins in nature are “minimally frustrated”. Non-native interactions generally give rise to energetic frustration that slows the kinetics of conformational transitions. By ignoring energetic frustration, native-centric models provide an idealized folding funnel[182, 50].

Because of their simplicity and low computational cost, native-centric models have enjoyed high popularity with wide ranging applications. More refined models include native-centric models with atomistic interaction details[176, 41], and “flavored” Cα-models, in which the interaction strength depends on the chemical nature of the residues involved in making the contact[93]. Other native-centric models have been introduced that incorporate desolvation barriers[38], non-native interactions[43, 223], many-body cooperative interactions[8], and electrostatic interactions[112]. Studies of conformational transitions between multiple conformations[228, 17, 150, 87, 100], as well as validation of experimentally obtained free energies of mutation[128], have also utilized native-centric models. Other areas of application of these models include modeling the effects of confinement and crowding on folding[191, 37], studies involving nucleic acids[86], and investigations of coupled folding and binding of intrinsically disordered proteins[113, 212].

Large-scale conformational transitions in proteins often involve much longer timescales than can be reached in typical simulations. Due in part to this limitation, theoretical approaches often rely on simplified, coarse-grained models that incorporate energy basins defined by two distinct conformations: a “closed” conformation, which is stable in the absence of a ligand, and an “open” conformation, which is stable when ligated. The Elastic network and other related models describe the allosteric transition as motion along low frequency normal modes of the closed state conformational basin[9, 7, 193, 10]. In this approach, the protein dynamics describing conformational changes depends entirely on conformations about a single free energy minimum. While this provides a natural rationale for protein conformational change as collective low energy motions[219, 11], a minimal model for the transition mechanism must accommodate both open and closed states. Allosteric transitions have been modeled by coupling two meta-stable basins through an interpolation function defined by its energy[17, 150, 36, 149, 119, 220]. For example, minimal energy pathways have been computed for a potential energy surface based on the strain energies relative to each distinct minimum[125, 40, 49].

3.3 An analytic model for CaM

The work on conformational transition in CaM in this dissertation is motivated by results from an analytic variational model of allostery[198, 197]. This model describes conformational changes in terms of the evolution of each residue’s local flexibility[199, 200, 197]. An alternative approach, provided by Itoh and Sasai, employs a model for which contacts of the two meta-stable structures are treated on equal footing rather than through an interpolated energy function[89, 90]. In this section, I briefly describe the variational model developed to study folding and allosteric transitions in proteins.

In this analytic coarse-grained model, partially folded ensembles of structures can be characterized by their Gaussian fluctuations. Here, a protein conformation is represented by the $N$ position vectors of the C-α positions of the polypeptide backbone, $\mathbf{r}_i$. In this representation, partially ordered ensembles of protein configurations are described by a reference Hamiltonian

$$\beta H_0 = \beta H_{\mathrm{chain}} + \frac{3}{2a^2} \sum_i C_i \left[\mathbf{r}_i - \mathbf{r}_i^{N}\right]^2 , \tag{3.2}$$

where $\beta = 1/k_B T$ and $a$ is the mean bond length of a freely rotating chain. The first term in Eq. 3.2 represents the protein backbone as a uniformly stiff homopolymer. The second term is an external field in which the $N$ variational parameters, $\{C_i\}$, control the magnitude of the fluctuations about the native C-α positions of the polypeptide, $\mathbf{r}_i^{N}$. The free energy landscape of partially folded ensembles of structures is explored via the reference Hamiltonian. The Hamiltonian that determines the population of partially folded ensembles is described by

$$H = H_{\mathrm{chain}} + \sum_{ij \in |ij|} \epsilon_{ij}\, u(\mathbf{r}_i - \mathbf{r}_j) . \tag{3.3}$$

Here, $u(\mathbf{r}_i - \mathbf{r}_j)$ is a pair potential and $\epsilon_{ij}$ is the strength of the contact between residues $i$ and $j$. The variational free energy for a partially folded ensemble specified by the constraints, $\{C\}$, at temperature $T$ is given by

$$F[\{C\}] = -k_B T \log Z_0 + \langle H - H_0 \rangle_0 . \tag{3.4}$$

Here, $Z_0$ is the partition function corresponding to the reference Hamiltonian, $H_0$, and $\langle H - H_0 \rangle_0$ is the average computed with respect to the reference Hamiltonian. The variational free energy given in Eq. 3.4 can be expressed as

$$F[\{C\}] = E[\{C\}] - T S[\{C\}] , \tag{3.5}$$

where $E[\{C\}]$ is the energy arising from formation of native contacts, and $S[\{C\}]$ is the entropy loss of each residue.

This framework can be adapted to study conformational transitions as well. Here, the energy function couples the two metastable basins. The reference Hamiltonian becomes

$$\beta H_0 = \beta H_{\mathrm{chain}} + \frac{3}{2a^2} \sum_i C_i \left[\mathbf{r}_i - \mathbf{r}_i^{N}(\alpha_i)\right]^2 . \tag{3.6}$$

In this case the variational constraints tune the magnitude of fluctuations of each residue about an interpolated native conformation of the protein

$$\mathbf{r}_i^{N}(\alpha_i) = \alpha_i\, \mathbf{r}_i^{N_I} + (1 - \alpha_i)\, \mathbf{r}_i^{N_A} , \tag{3.7}$$

where $\{\alpha_i\}$ interpolates the conformation between the inactive (I; $\alpha_i = 1$) and active (A; $\alpha_i = 0$) states of the protein. The energies computed with respect to the two basins, $E_I[\{C\},\{\alpha\}]$ and $E_A[\{C\},\{\alpha\}]$, are coupled through the interpolation function[197]

$$E[\{C\},\{\alpha\}] = \frac{E_I[\{C\},\{\alpha\}] + E_A[\{C\},\{\alpha\}] + E_0}{2} - \sqrt{\left(\frac{E_I[\{C\},\{\alpha\}] - E_A[\{C\},\{\alpha\}] - E_0}{2}\right)^2 + \Delta^2} . \tag{3.8}$$

Here, the coupling function is based on a conformation’s total energy. An alternative coupling based on individual contacts has also been explored with the coupling function[198]

$$(\epsilon_{ij} + \epsilon_0)\, u_{ij} = -k_B T \ln\left[1 + e^{-(\epsilon_{ij} + \epsilon_0) u_{ij}/k_B T}\right] . \tag{3.9}$$

A coupling function similar to that shown in Eq. 3.8 for two single basin energies has been used in other coarse-grained models to study protein conformational change[17, 125, 150, 215, 220].

The variational model has been applied to investigate allosteric transition mechanisms in the isolated N-terminal (nCaM) and C-terminal (cCaM) domains of CaM. Notably, the two domains are predicted to have distinct transition mechanisms. In particular, the domain opening mechanism of cCaM involves local partial unfolding and refolding while nCaM remains folded throughout the transition[200, 198]. This is reminiscent of the cracking mechanism introduced in Refs. [134, 135, 198], whereby local unfolding relieves regions of local strain, thereby reducing an otherwise high free energy barrier. Recent all-atom simulation work predicts cracking in the linker region of cCaM[180]. For the domains of CaM, Tripathi and Portman suggest that whether a region of a protein will undergo cracking during its conformational transition is not solely determined by the magnitude of strain that develops during such a transition (Fig. 3.3.1), but depends on details of the contact maps of the two states as well. One motivation for the work in this dissertation is to investigate the robustness of the prediction of distinct domain opening mechanisms of nCaM and cCaM.

3.4 Simulation model for CaM

Although my primary goal is to study allosteric cooperativity of CaM, I first focus on simulating the open/closed conformational transition of nCaM and cCaM in the absence of Ca2+. This model couples two energy basins, one biased to the open (holo) conformation and the other to the closed (apo) reference conformation of CaM[150].

In the following sections, I briefly describe the general framework of the simulation models used in the studies.

[Figure 3.3.1 image: panels (a) nCaM and (b) cCaM, with axes residue index, α_0 (from holo = 0 to apo = 1), and strain energy [au]; panels (c) and (d), labeled α_0 = 0.4, show strain energy [au] versus residue index.]

Figure 3.3.1: Distribution of strain energy of residues in CaM domain. (a) and (b) Change in strain energy for individual residues along the apo → holo structural change of nCaM and cCaM. (c) and (d) Residue strain energy distributions at an intermediate state for nCaM and cCaM, respectively. Adapted from the work by Tripathi and Portman[198].

3.4.1 Model for conformational transition

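A conformation in this coarse-grained model[150] is specified by the $N$ position vectors of the C-α atoms of the protein backbone, $\mathbf{R} = \{\mathbf{r}_1, \cdots, \mathbf{r}_N\}$. For an energy basin biased to the reference conformation, $\mathbf{R}_0$, the energy of a configuration $\mathbf{R}$ can be written as

$$V_0(\mathbf{R}) = V_{\mathrm{local}}(\mathbf{R}\,|\,\mathbf{R}_0) + V_{\mathrm{n}}(\mathbf{R}\,|\,\mathbf{R}_0) + V_{\mathrm{nn}}(\mathbf{R}\,|\,\mathbf{R}_0) . \tag{3.10}$$

The first term in Eq. 3.10 defines the coarse-grained backbone

$$V_{\mathrm{local}}(\mathbf{R}\,|\,\mathbf{R}_0) = \sum_{\mathrm{bonds}} K_b (b_i - b_i^0)^2 + \sum_{\mathrm{angles}} K_\theta (\theta_i - \theta_i^0)^2 + \sum_{\mathrm{dihedrals}} \left\{ K_\phi \left[1 - \cos(\phi_i - \phi_i^0)\right] + K_\phi^{(3)} \left[1 - \cos 3(\phi_i - \phi_i^0)\right] \right\} , \tag{3.11}$$

where $b_i$, $\theta_i$, and $\phi_i$ denote bond lengths, bond angles, and dihedral angles, respectively. The corresponding values in the native structure are denoted with a superscript: $b_i^0$, $\theta_i^0$, and $\phi_i^0$. The non-bonded interactions between neighboring residues in the native structure (native contacts) have a short-ranged attraction

$$V_{\mathrm{n}}(\mathbf{R}\,|\,\mathbf{R}_0) = \sum_{i<j} \epsilon_{\mathrm{go}} \left[ 5 \left(\frac{r_{ij}^0}{r_{ij}}\right)^{12} - 6 \left(\frac{r_{ij}^0}{r_{ij}}\right)^{10} \right] , \tag{3.12}$$

where the sum runs over native contact pairs, while the remaining non-native pairs interact through the repulsive potential

$$V_{\mathrm{nn}}(\mathbf{R}\,|\,\mathbf{R}_0) = \sum_{i<j} \epsilon_{\mathrm{rep}} \left(\frac{d}{r_{ij}}\right)^{10} . \tag{3.13}$$

Here, $r_{ij}$ is the distance between C-α atoms $i$ and $j$ in a conformation, $\mathbf{R}$, and $r_{ij}^0$ is the corresponding separation distance found in the reference structure, $\mathbf{R}_0$.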
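The pair terms of Eqs. 3.12 and 3.13 are simple enough to check numerically. The Python sketch below is illustrative only: the function names and parameter values are hypothetical, and the repulsive exponent is exposed as an argument whose default is the value read from Eq. 3.13.

```python
def go_contact_energy(r, r0, eps_go=1.0):
    """12-10 native contact term of Eq. 3.12 for a single pair."""
    ratio = r0 / r
    return eps_go * (5.0 * ratio ** 12 - 6.0 * ratio ** 10)


def repulsive_energy(r, d, eps_rep=1.0, power=10):
    """Repulsive non-native term for a single pair.

    The default exponent is the one read from Eq. 3.13; it is kept as a
    parameter because this sketch does not depend on its value.
    """
    return eps_rep * (d / r) ** power


# The contact term has its minimum, of depth -eps_go, at r = r0.
r0 = 3.8  # hypothetical native separation
well_depth = go_contact_energy(r0, r0, eps_go=2.0)
```

Evaluating `go_contact_energy` slightly below and above `r0` gives higher energies in both directions, confirming that the coefficients 5 and 6 place the minimum at the native separation with depth equal to the contact strength.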

To study conformational changes between two meta-stable states, the energies of the corresponding native basins, $V_1(\mathbf{R})$ and $V_2(\mathbf{R})$, are coupled through an interpolation function[96]

$$V(\mathbf{R}) = \frac{V_1 + V_2 + \Delta V}{2} - \sqrt{\left(\frac{V_1 - V_2 - \Delta V}{2}\right)^2 + \Delta^2} . \tag{3.14}$$

Here, the interpolation parameters, $\Delta$ and $\Delta V$, control the barrier height and the relative stability of the two basins. The single basin energies $V_1(\mathbf{R})$ and $V_2(\mathbf{R})$ are computed from Eq. 3.10 with modifications to some of the reference parameters in the potential in order to minimize conflicts between the two contact maps. This model has been developed by Takada and co-workers and is available in the simulation package Cafemol[96]. The source code is also available, allowing us to modify the model for our studies.
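The behavior of the interpolation in Eq. 3.14 is easiest to see in its limits: for $\Delta \to 0$ it reduces to the lower of $V_1$ and $V_2 + \Delta V$, while a finite $\Delta$ smooths the crossing and lowers the energy where the two basins are nearly degenerate. A minimal Python sketch follows; the function name and the numerical values are hypothetical and are not the parameters used in the simulations.

```python
import math


def two_basin_energy(v1, v2, delta, delta_v):
    """Coupled two-basin energy of Eq. 3.14 for given single-basin energies."""
    mean = 0.5 * (v1 + v2 + delta_v)
    half_gap = 0.5 * (v1 - v2 - delta_v)
    return mean - math.sqrt(half_gap ** 2 + delta ** 2)


# With delta = 0 the lower of v1 and v2 + delta_v is selected.
uncoupled = two_basin_energy(v1=1.0, v2=0.0, delta=0.0, delta_v=2.0)

# At the crossing point (v1 = v2 + delta_v) the coupling lowers the energy
# by exactly delta.
at_crossing = two_basin_energy(v1=1.0, v2=1.0, delta=0.5, delta_v=0.0)
```

In this form, `delta` sets how strongly the two basins mix near the crossing, and `delta_v` shifts the second basin relative to the first, which is how the two parameters control the barrier height and the relative stability.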

3.4.2 Model for Ca2+-binding

To study the thermodynamics of calcium binding to the two EF-hand loops of each domain of CaM, the binding event is modeled implicitly by adding a potential defined from the ligand-mediated contacts in the EF-hand loops of the open (holo) conformation

$$V_{\mathrm{bind}} = -c_{\mathrm{lig}}\, \epsilon_{\mathrm{go}} \sum_{i,j} \exp\left[ -\frac{(r_{ij} - r_{ij}^0)^2}{2\sigma_{ij}^2} \right] . \tag{3.15}$$

Here, the sum is over residue pairs within a cutoff distance to a Ca2+ ion in the holo reference conformation. For simplicity, I take the binding energy parameters in Eq. 3.15 to be the same for each contact formed in the presence of Ca2+.
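A short Python sketch of the implicit binding potential in Eq. 3.15 is given below, assuming a single width `sigma` shared by all ligand-mediated contacts, in line with the simplification stated above. The function name, the pair distances, and the parameter values are hypothetical.

```python
import math


def binding_energy(pairs, c_lig, eps_go, sigma):
    """Implicit ligand binding energy of Eq. 3.15.

    pairs : iterable of (r_ij, r0_ij) for the ligand-mediated contacts,
            i.e. the current and holo-reference distances
    sigma : common Gaussian width (assumption of this sketch)
    """
    total = 0.0
    for r, r0 in pairs:
        total += math.exp(-((r - r0) ** 2) / (2.0 * sigma ** 2))
    return -c_lig * eps_go * total


# Three contacts at their holo distances give the full stabilization.
holo_like = [(5.0, 5.0), (6.2, 6.2), (4.8, 4.8)]
v_full = binding_energy(holo_like, c_lig=2.0, eps_go=0.5, sigma=0.15)

# The same contacts stretched far from the holo geometry contribute little.
apo_like = [(9.0, 5.0), (10.0, 6.2), (8.5, 4.8)]
v_weak = binding_energy(apo_like, c_lig=2.0, eps_go=0.5, sigma=0.15)
```

Because each Gaussian is close to one only near the holo contact distance, the stabilization from a bound ligand is large in the open conformation and nearly vanishes in the closed one, which is how binding is coupled to the conformational change.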

I work within the Grand Canonical Ensemble to simulate Ca2+ binding/unbinding events coupled to the conformational change of the protein. The ligation state of each binding loop (four in total) is determined stochastically through Monte Carlo steps attempted at a fixed rate during the protein’s conformational transitions. The attempt probabilities for binding and unbinding, $\alpha_{0\to1}$ and $\alpha_{1\to0}$, satisfy the detailed balance condition

$$\alpha_{0\to1}\, p_0 = \alpha_{1\to0}\, p_1 . \tag{3.16}$$

Here, $p_0$ and $p_1$ denote the equilibrium probabilities for the protein to be in the unbound and bound state, respectively.

In the Monte Carlo scheme, the acceptance probabilities are determined from the detailed balance condition

$$\frac{P_{0\to1}}{P_{1\to0}} = \frac{\alpha_{1\to0}}{\alpha_{0\to1}} \frac{p_1}{p_0} . \tag{3.17}$$

Here, $P_{0\to1}$ and $P_{1\to0}$ denote the acceptance probabilities for binding and unbinding, respectively. Eq. 3.17 can be expressed as

$$\frac{P_{0\to1}}{P_{1\to0}} = \frac{\alpha_{1\to0}}{\alpha_{0\to1}} \exp\left[-(V_{\mathrm{bind}} - \mu)/k_B T\right] . \tag{3.18}$$

For thermodynamics simulations, we attempt to change the ligation state of each loop at every Monte Carlo step. That is, $\alpha_{1\to0} = \alpha_{0\to1}$. If the loop is unligated, a ligand is introduced ($V \to V + V_{\mathrm{bind}}$) with probability

$$P_{0\to1} = \min\left[1, \exp\left[-(V_{\mathrm{bind}} - \mu)/k_B T\right]\right] . \tag{3.19}$$

For unbinding transitions, if the loop is ligated, the ligand dissociates from the binding loop ($V + V_{\mathrm{bind}} \to V$) with probability

$$P_{1\to0} = \min\left[1, \exp\left[(V_{\mathrm{bind}} - \mu)/k_B T\right]\right] . \tag{3.20}$$

Here, $\mu$ is the chemical potential of a bound ligand. At equilibrium, $\mu$ equals the chemical potential of the ligand in solution,

$$\mu = k_B T \ln\left(\frac{c}{c_0}\right) + \mu_0 , \tag{3.21}$$

where $c$ is the ligand concentration, and $c_0$ and $\mu_0$ are the reference concentration and reference chemical potential, respectively. The simulated binding curves are reported as a function of the chemical potential or, equivalently, in terms of the relative ligand concentration defined through $\mu/k_B T = \ln(c/\bar{c}_0)$, where $\bar{c}_0 = c_0 \exp(-\mu_0/k_B T)$.

For thermodynamic calculations, it is efficient to sample binding and unbinding events with the same frequency. Nevertheless, this choice is problematic for kinetics simulations. The symmetric Monte Carlo scheme does not accurately describe the concentration dependence of the binding rates. For bimolecular reactions, the binding rate is proportional to ligand concentration, $k_{\mathrm{bind}} \propto c$, while the unbinding rate, $k_{\mathrm{unbind}}$, is independent of ligand concentration. Accordingly, for kinetics I choose a non-symmetric scheme for binding and unbinding that still satisfies detailed balance. Here, unbinding is independent of concentration, which suggests the acceptance probability

$$P_{1\to0} = \min\left[1, \exp\left[V_{\mathrm{bind}}/k_B T\right]\right] . \tag{3.22}$$

To satisfy detailed balance, the attempt probabilities become

$$\alpha_{0\to1} = \alpha_{1\to0} \exp\left[\mu/k_B T\right] , \tag{3.23}$$

with a binding acceptance probability of

$$P_{0\to1} = \min\left[1, \exp\left[-(V_{\mathrm{bind}} - \mu)/k_B T\right]\right] . \tag{3.24}$$

Eq. 3.23 is satisfied by attempting unbinding events every $\tau_0$ steps, while binding is attempted with probability $\tau_0^{-1} \exp[\mu/k_B T]$. This choice of Monte Carlo acceptance probabilities gives the expected dependence of simulated on and off rates for ligand binding. Takada and co-workers provide a similar approach, although there the ligand concentration is controlled by changing the attempt rate for binding[115]. The advantage of our formalism is that the ligand concentration is controlled by its binding chemical potential, making the connection to thermodynamic models straightforward.

The simulations of open/closed transitions and Ca2+ binding thermodynamics and kinetics were parameterized based upon experimental reports of the relative stability between the two basins and the folding temperature of the domains. This is discussed in more detail in Chapters 4, 5, and 6.
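The symmetric Monte Carlo scheme of Eqs. 3.19 and 3.20 can be sketched for a single binding loop as follows. This Python example is a simplified illustration, not the simulation code: it assumes a fixed binding energy `v_bind` (the protein conformation is held static), and all function names and numerical values are hypothetical. Under that assumption the sampled occupancy can be compared with the exact two-state result, $1/(1 + e^{(V_{\mathrm{bind}} - \mu)/k_B T})$.

```python
import math
import random


def attempt_ligation_change(bound, v_bind, mu, kT, rng):
    """One symmetric Monte Carlo attempt (Eqs. 3.19 and 3.20)."""
    if not bound:
        p_accept = min(1.0, math.exp(-(v_bind - mu) / kT))  # binding
    else:
        p_accept = min(1.0, math.exp((v_bind - mu) / kT))   # unbinding
    return (not bound) if rng.random() < p_accept else bound


def mean_occupancy(v_bind, mu, kT=1.0, steps=200_000, seed=0):
    """Fraction of steps in which the loop is ligated."""
    rng = random.Random(seed)
    bound, total = False, 0
    for _ in range(steps):
        bound = attempt_ligation_change(bound, v_bind, mu, kT, rng)
        total += bound
    return total / steps


def exact_occupancy(v_bind, mu, kT=1.0):
    """Equilibrium occupancy of a two-state (unbound/bound) loop."""
    return 1.0 / (1.0 + math.exp((v_bind - mu) / kT))


def relative_concentration(mu, kT=1.0):
    """Relative ligand concentration implied by mu (see Eq. 3.21)."""
    return math.exp(mu / kT)


sampled = mean_occupancy(v_bind=-3.0, mu=-4.0)
expected = exact_occupancy(v_bind=-3.0, mu=-4.0)
```

In the full model the binding energy changes with the conformation at every step, so the occupancy of each loop reflects both the chemical potential and the open/closed equilibrium; the static case above only checks that the acceptance rules reproduce the intended equilibrium.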

Chapter 4

Comparing allosteric transitions in the domains of calmodulin

through coarse-grained simulations

Abstract

Calmodulin (CaM) is a ubiquitous Ca2+-binding protein consisting of two structurally similar domains with distinct stabilities, binding affinities, and flexibilities. We present coarse grained simulations that suggest the mechanism for each domain’s allosteric transition between the open and closed conformations depends on subtle differences in the folded state topology of the two domains. Throughout a wide temperature range, the simulated transition mechanism of the N-terminal domain (nCaM) follows a two-state transition mechanism while domain opening in the C-terminal domain (cCaM) involves unfolding and refolding of the tertiary structure. The appearance of the unfolded intermediate occurs at a higher temperature in nCaM than it does in cCaM, consistent with nCaM’s higher thermal stability. Under approximate physiological conditions, the simulated unfolded state population of cCaM accounts for 10% of the population, with nearly all of the sampled transitions (approximately 95%) unfolding and refolding during the conformational change. Transient unfolding significantly slows the domain opening and closing rates of cCaM. This potentially influences the mechanism of Ca2+ binding to each domain.¹

¹ Adapted from P. Nandigrami and J. J. Portman, The Journal of Chemical Physics 144, 105102 (2016); http://doi.org/10.1063/1.4943130

4.1 Introduction

Allostery is central to the precise molecular control necessary for protein function. Indirect coupling between distant regions of a protein is often provided through a conformational transition between a “closed” (ligand-free) and “open” (ligand-bound) structure upon ligation. NMR experiments that reveal proteins exist in dynamic equilibrium with multiple conformers[13, 138, 55, 54, 209, 76] suggest that a protein’s conformational dynamics in the absence of a ligand plays an essential role in allosteric regulation[122, 190, 75, 18]. The functional dynamics of a folded protein occurs near the bottom of the funneled energy landscape, a part of the landscape generally more susceptible to perturbations than the self-averaged kinetic bottleneck that determines the mechanism of folding[26]. This sensitivity, while important for a protein’s ability to dynamically respond to environmental conditions and interaction with ligands, also makes the prospect of general organizing principles for allostery problematic[227]. In this work, we explore the sense in which the summarizing statement that native state topology determines the folding mechanism of small single domain proteins[12] carries over to large-scale conformational transitions.

Due in part to limitations on computational timescales, much theoretical work modeling large-scale conformational transitions in proteins has focused on simplified, coarse-grained models based on the energy basins defined by the open and closed conformations. The Gaussian network and related models describe an allosteric transition as motion along low frequency normal modes of the closed state conformational basin[9, 7, 193, 10]. While the dynamics about a single free energy minimum offers a natural rationale and clear description of the collective motions involved in the conformational change[219, 11], a minimal model capable of capturing the transition mechanism must accommodate the change in dynamics as the protein moves between the two distinct meta-stable free energy basins. Allosteric transitions have been modeled by several different methods in which two meta-stable basins are coupled through an interpolation based on their energies. For example, minimal energy pathways have been computed for a potential surface based on the strain energies relative to each minimum conformation to predict the transition mechanism[125, 40, 49]. Structure-based simulations that couple two conformational basins have also been developed to understand the mechanism of allosteric transitions[17, 150, 149, 36, 119, 220]. Additionally, transition mechanisms have been described in terms of the evolution of each residue's local flexibility using a coarse-grained variational model[199, 200, 198, 197]. Itoh and Sasai present an alternative approach to predict allosteric transition mechanisms in which contacts from two meta-stable structures are treated on equal footing rather than through an interpolated energy function[89, 90].

We use coupled structure-based simulations of the opening transition in the domains of calmodulin (CaM) to explore how subtle differences in the native state topology can lead to qualitative changes in the transition mechanism. This work is motivated in part by an intriguing theoretical prediction[200] that the domain opening mechanism of the C-terminal domain (cCaM) involves local partial unfolding and refolding while the N-terminal domain (nCaM) remains folded throughout the transition. These distinct transition mechanisms are in harmony with Itoh and Sasai's model, which predicts that cCaM has larger fluctuations than nCaM during domain opening[90]. Local unfolding in cCaM is found to relieve regions of high local strain during the transition[198], in agreement with the cracking mechanism of allosteric transitions discussed by Miyashita et al.[134, 135].

CaM is a ubiquitous Ca2+-binding protein consisting of two structurally similar globular domains connected by a flexible linker. Each domain consists of two helix-loop-helix motifs (the EF-hands) connected by a flexible linker as shown in Fig. 4.2.1. Although topologically similar, the two CaM domains have distinct flexibilities, melting temperatures, and thermodynamic Ca2+-binding properties[204, 116, 184, 127]. In the absence of Ca2+, the C-terminal domain is particularly dynamic[196] and is less stable than the N-terminal domain, both in the intact protein and when separated into isolated domains[23, 184, 127]. The C-terminal domain, which has a very low denaturation temperature, is reported to be considerably unfolded at physiological temperature[164]. Furthermore, NMR experiments monitoring the open/closed transition of isolated cCaM have revealed local transient unfolding of helix F during domain opening[121].

The simulations presented in this paper suggest that over a wide range of temperatures, domain opening in cCaM involves global unfolding and refolding, while unfolded conformations are much less prominent in nCaM's primarily two-state domain opening mechanism. The appearance of an unfolded intermediate at a sufficiently high temperature is expected and has been reported for similar simulations of the conformational transition of cCaM[36] and the homologous protein S100A6[150], as well as other proteins[17, 220]. Given the structural similarity of the two domains, it is harder to anticipate that the unfolded ensemble becomes locally stable at a significantly higher temperature in nCaM than it does in cCaM. Both the analytic model and the simulations suggest that cCaM is more susceptible to unfolding during domain opening, despite employing very different approximations. Nevertheless, the simulated intermediate is globally unfolded, in contrast to the local unfolding predicted by the analytic model. In terms of the kinetics, global unfolding and refolding significantly slows the simulated domain opening rate in cCaM, which can potentially bias the partitioning of Ca2+-binding kinetics between induced fit and conformational selection for the two domains.

4.2 Methods

We use a native-centric model implemented in the Cafemol simulation package[96] to study the open/closed conformational transitions of the isolated, Ca2+-free N-terminal and C-terminal domains of CaM. The model couples two energy basins through an interpolation function: one basin is biased to the open (holo) reference conformation and the other to the closed (apo) reference conformation[150]. The open and closed conformations of the domains of CaM are shown in Fig. 4.2.1. A conformation in this coarse-grained model[150] is specified by the $N$ position vectors of the C-α atoms of the protein backbone, $R = \{\mathbf{r}_1, \cdots, \mathbf{r}_N\}$. For an energy basin biased to the reference conformation, $R_0$, the energy of a configuration $R$ can be written as

$$V_0(R) = V_{\mathrm{local}}(R \,|\, R_0) + V_{\mathrm{n}}(R \,|\, R_0) + V_{\mathrm{nn}}(R \,|\, R_0). \tag{4.1}$$

Figure 4.2.1: Aligned structures of the Ca2+-free (closed/apo) and Ca2+-bound (open/holo) native conformations for (a) the N-terminal domain and (b) the C-terminal domain of calmodulin. The closed state (pdb: 1cfd[103]) is shown in blue, and the open state (pdb: 1cll[34]) is shown in green. The closed (apo) and open (holo) conformations of (a) nCaM (residue index 4–75) consist of helices A, B and C, D with binding loops I and II, respectively. The closed (apo) and open (holo) conformations of (b) cCaM (residue index 76–147) consist of helices E, F and G, H with binding loops III and IV, respectively. Secondary structure legends for nCaM and cCaM are shown on top of the protein structures. The CaM structures were made using Visual Molecular Dynamics[85].

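The first term in Eq. 4.1 defines the coarse-grained backbone

$$V_{\mathrm{local}}(R \,|\, R_0) = \sum_{\mathrm{bonds}} K_b \left(b_i - b_i^0\right)^2 + \sum_{\mathrm{angles}} K_\theta \left(\theta_i - \theta_i^0\right)^2 + \sum_{\mathrm{dihedrals}} \left\{ K_\phi^{(1)} \left[1 - \cos\left(\phi_i - \phi_i^0\right)\right] + K_\phi^{(3)} \left[1 - \cos 3\left(\phi_i - \phi_i^0\right)\right] \right\},$$

where $b_i$, $\theta_i$, and $\phi_i$ denote bond lengths, bond angles, and dihedral angles, respectively. The corresponding values in the native structure are denoted with a superscript: $b_i^0$, $\theta_i^0$, and $\phi_i^0$. The non-bonded interactions between neighboring residues in the native structure (native contacts) have short-ranged attraction

$$V_{\mathrm{n}}(R \,|\, R_0) = \sum_{\text{native contacts}} \epsilon_{\mathrm{go}} \left[ 5 \left(\frac{r_{ij}^0}{r_{ij}}\right)^{12} - 6 \left(\frac{r_{ij}^0}{r_{ij}}\right)^{10} \right], \tag{4.2}$$

while the remaining (non-native) pairs have short-ranged repulsion

$$V_{\mathrm{nn}}(R \,|\, R_0) = \sum_{\text{non-native pairs}} \epsilon_{\mathrm{rep}} \left(\frac{d}{r_{ij}}\right)^{10}. \tag{4.3}$$

Here, $r_{ij}$ is the distance between C-α atoms $i$ and $j$ in a conformation, $R$, and $r_{ij}^0$ is the corresponding separation distance found in the reference structure, $R_0$. In the study of allosteric transitions in the two domains of Ca2+-free CaM, the coefficients defining the energy function are set to their default values in Cafemol: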

$K_b = 100.0$, $K_\theta = 20.0$, $K_\phi^{(1)} = 1.0$, $K_\phi^{(3)} = 0.5$, $\epsilon_{\mathrm{go}} = 0.3$, and $\epsilon_{\mathrm{rep}} = 0.2$ in units of kcal/mol, and $d = 4$ Å. Trajectories are simulated using Langevin dynamics with a friction coefficient of $\gamma = 0.25$ and a timestep of $\Delta t = 0.2$ (in coarse-grained units)[149]. With these parameters, the folding transition temperatures of the isolated CaM domains are estimated from equilibrium trajectories to be $T_F^{o}(\mathrm{nCaM}) = 333.6$ K and $T_F^{c}(\mathrm{nCaM}) = 328.9$ K for the open and closed state of nCaM, and $T_F^{o}(\mathrm{cCaM}) = 335.1$ K and $T_F^{c}(\mathrm{cCaM}) = 330.5$ K for the open and closed state of cCaM, respectively. Experimentally, the isolated domains have similar folding transition temperatures of approximately 323 K[127]. Although $T_F^{o}(\mathrm{cCaM})$ and $T_F^{o}(\mathrm{nCaM})$, as well as $T_F^{c}(\mathrm{cCaM})$ and $T_F^{c}(\mathrm{nCaM})$, are within 2 K (with cCaM's thermal stability slightly below nCaM's), coupling the open and closed basins significantly destabilizes cCaM with respect to nCaM (described below). Consequently, the simulations are relevant to the domains of intact CaM, for which interactions between the domains, particularly with the linker region[148], reduce the folding temperature of the C-terminal domain to roughly 315 K and increase the folding temperature of the N-terminal domain to 328 K[204, 184, 127].
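As a rough numerical check of the native contact term in Eq. 4.2, the sketch below evaluates the 12-10 potential for a single contact. The native separation of 6 Å is a hypothetical value chosen only for illustration; $\epsilon_{\mathrm{go}} = 0.3$ kcal/mol is the value quoted above. It illustrates the functional form and is not Cafemol's implementation.

```python
import numpy as np

EPS_GO = 0.3  # kcal/mol, value quoted in the text


def go_contact_energy(r, r0, eps=EPS_GO):
    """12-10 native contact energy (Eq. 4.2) for distance r and native distance r0."""
    x = np.asarray(r0, dtype=float) / np.asarray(r, dtype=float)
    return eps * (5.0 * x**12 - 6.0 * x**10)


# Hypothetical native C-alpha separation (angstroms), for illustration only.
r0 = 6.0
r = np.linspace(5.0, 12.0, 701)   # scan distances on a 0.01 angstrom grid
v = go_contact_energy(r, r0)
r_min = r[np.argmin(v)]           # location of the minimum on the grid
v_min = go_contact_energy(r0, r0) # energy exactly at the native distance
```

The minimum sits at $r_{ij} = r_{ij}^0$ with depth $-\epsilon_{\mathrm{go}}$, so each fully formed native contact contributes about $-0.3$ kcal/mol with these parameters.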

To study conformational changes between two meta-stable states, the energies of the corresponding native basins, $V_1(R)$ and $V_2(R)$, are coupled through an interpolation function[96]

$$V(R) = \frac{V_1 + V_2 + \Delta V}{2} - \sqrt{\left(\frac{V_1 - V_2 - \Delta V}{2}\right)^2 + \Delta^2}. \tag{4.4}$$

Here, the interpolation parameters, $\Delta$ and $\Delta V$, control the barrier height and the relative stability of the two basins. The single basin energies $V_1(R)$ and $V_2(R)$ are computed from Eq. 4.1 with modifications to some of the reference parameters in the potential in order to minimize conflicts between the two contact maps. (See Refs. [150, 149, 96] for details.) To compare the simulated domain opening mechanisms most clearly, it is convenient to choose the coupling parameters $\Delta$ and $\Delta V$ so that the barrier between the two states is low enough to give sufficient sampling of both states and so that the open and closed conformations are equally stable (a choice that improves sampling of the equilibrium transition kinetics). With $\Delta = 14.0$ kcal/mol and $\Delta V = 2.15$ kcal/mol for nCaM, and $\Delta = 17.5$ kcal/mol and $\Delta V = 0.25$ kcal/mol for cCaM, the open and closed states are equally probable with a free energy barrier of $\simeq 4\,k_BT$ as shown in Fig. 4.2.2. With these parameters, the folding temperature for cCaM is approximately 25 degrees below the folding temperature of nCaM, as indicated by the peaks in the heat capacity shown in Fig. 4.2.3. We report temperatures relative to the simulated folding temperature of cCaM, denoted as $T_F^\star = 275.0$ K. Although we have explored a wide range of temperatures, most of the results presented in this paper have $T_{\mathrm{sim}} = 0.96\,T_F^\star$, a temperature slightly below the folding temperature of cCaM and significantly below the folding temperature of nCaM.

Figure 4.2.2: Simulated free energy (in units of $k_BT$) as a function of the global progress coordinate $\Delta Q = Q_{\mathrm{closed}} - Q_{\mathrm{open}}$ for (a) nCaM and (b) cCaM.
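The interpolation in Eq. 4.4 can be sketched in a few lines. The function name and the sample energies below are hypothetical; only the functional form and the nCaM coupling constant $\Delta = 14.0$ kcal/mol come from the text.

```python
import math


def coupled_energy(v1, v2, delta, delta_v):
    """Two-basin energy of Eq. 4.4 from single-basin energies v1 and v2."""
    mean = 0.5 * (v1 + v2 + delta_v)
    half_gap = 0.5 * (v1 - v2 - delta_v)
    return mean - math.sqrt(half_gap**2 + delta**2)


# With no coupling (delta = 0) the surface is the lower of v1 and v2 + delta_v.
uncoupled = coupled_energy(-10.0, -20.0, 0.0, 4.0)

# Where the two basins cross (v1 == v2 + delta_v), the coupling lowers the
# energy by exactly delta, which is how delta sets the barrier height.
at_crossing = coupled_energy(-10.0, -12.15, 14.0, 2.15)
```

In this form $\Delta V$ shifts the second basin relative to the first, while $\Delta$ smooths the crossing region between them.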

NMR experiments indicate that the closed state of cCaM is more stable than the open state under physiological conditions, accounting for roughly 90% of the population[124]. Assuming nCaM is similar, we adjust the relative stability of both domains through the coupling parameter $\Delta V$ to match this stability ($\Delta V = 3.5$ kcal/mol for nCaM, and $\Delta V = 4.0$ kcal/mol for cCaM). As shown in Fig. 4.2.3, the folding temperatures of the domains are sensitive to this destabilization of the open state. The simulated folding temperatures of the two domains differ by approximately 18 K, somewhat larger than the difference in experimental folding temperatures of the domains in intact CaM, approximately 13 K[164]. To connect to the domain opening kinetics in intact CaM, we relate the simulated temperatures to the folding temperatures of its N-terminal and C-terminal domains. With this choice, the physiological temperature of 310 K corresponds to a simulation temperature of 95% of nCaM's folding temperature and 98% of cCaM's folding temperature.

Figure 4.2.3: Heat capacity as a function of temperature for cCaM (red) and nCaM (blue) for two relative stabilities of the open and closed basins. The solid curves correspond to equally stable open and closed basins, and in the dashed curves the open state occupies approximately 10% of the total population.

Simulated conformational ensembles are characterized through local and global structural order parameters based on the contacts formed in each sampled conformation. A native contact is considered to be formed if the distance between the residues is less than 1.2 times the corresponding distance in the native conformation. To characterize structural changes during the conformational transition, it is convenient to separate the set of native contacts in the open (holo) and closed (apo) conformations into three groups: those that occur exclusively in either the open or the closed native reference conformation, and those that are common to both states. For each of these groups, denoted by $\alpha$ = (open, closed, and $\cap$), we define a local order parameter, $q_\alpha(i)$, as the fraction of native contacts formed involving the $i$th residue. Overall native similarity is monitored by the corresponding global order parameters, $Q_\alpha = \langle q_\alpha(i) \rangle$, where the average is taken over the residues of the protein. The free energy parameterized by these global order parameters is used to identify locally stable conformational ensembles such as the open and closed basins.
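The order parameters defined above can be computed along the following lines. This is a minimal sketch: the function names, the toy four-bead chain, and the choice to skip residues that have no contacts in a given group are all assumptions made for illustration; only the 1.2 cutoff factor comes from the text.

```python
import numpy as np


def local_order_parameter(coords, contacts, native_dist, cutoff=1.2):
    """q(i): fraction of the listed native contacts of residue i that are formed."""
    coords = np.asarray(coords, dtype=float)
    contacts = np.asarray(contacts, dtype=int)
    native_dist = np.asarray(native_dist, dtype=float)
    dist = np.linalg.norm(coords[contacts[:, 0]] - coords[contacts[:, 1]], axis=1)
    formed = dist < cutoff * native_dist
    total = np.zeros(len(coords))
    made = np.zeros(len(coords))
    for (i, j), is_formed in zip(contacts, formed):
        total[i] += 1
        total[j] += 1
        made[i] += is_formed
        made[j] += is_formed
    # Assumption: residues without any contact in this group are left as NaN.
    q = np.full(len(coords), np.nan)
    has_contacts = total > 0
    q[has_contacts] = made[has_contacts] / total[has_contacts]
    return q


def global_order_parameter(q):
    """Q: average of q(i) over residues that have contacts in the group."""
    return float(np.nanmean(q))


# Toy example: four beads on a line; the last contact is stretched and broken.
coords = [[0, 0, 0], [5, 0, 0], [10, 0, 0], [30, 0, 0]]
contacts = [(0, 1), (1, 2), (2, 3)]
native_dist = [5.0, 5.0, 6.0]
q = local_order_parameter(coords, contacts, native_dist)
Q = global_order_parameter(q)
```

Calling the same function with the open-only, closed-only, and common contact lists would give $q_{\mathrm{open}}(i)$, $q_{\mathrm{closed}}(i)$, and $q_\cap(i)$ respectively.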

The transition rates between two coarse-grained ensembles are calculated from equilibrium simulations of length $10^8$ steps, typically involving $O(10^3)$ open/closed transitions for nCaM and $O(10^2)$ open/closed transitions for cCaM. The transition rate between two states labeled by $i$ and $j$ is estimated by[28]

$$k_{i \to j} = \frac{N_{i \to j}}{\langle \tau_i \rangle \sum_{k \neq i} N_{i \to k}}, \tag{4.5}$$

where $\langle \tau_i \rangle$ is the mean time spent in state $i$ between transitions, and $N_{i \to j}$ is the number of transitions from state $i$ to state $j$. When the allosteric transition involves only the open and closed states, Eq. 4.5 reduces to the two-state rates, $k_{o \to c} = \langle \tau_o \rangle^{-1}$ and $k_{c \to o} = \langle \tau_c \rangle^{-1}$, where $\langle \tau_o \rangle$ and $\langle \tau_c \rangle$ are the mean first passage times to leave the open and closed state, respectively.
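The estimator in Eq. 4.5 can be sketched for a trajectory that has already been assigned to discrete states. The function name, the state labels, and the short example sequence are hypothetical; the sketch also assumes the state assignment has been coarse-grained so that spurious recrossings are not counted, and it drops the final, incomplete visit.

```python
from collections import Counter


def transition_rates(states, dt=1.0):
    """Estimate k_{i->j} (Eq. 4.5) from state labels sampled every dt."""
    n_trans = Counter()      # N_{i->j}
    dwell_steps = Counter()  # steps spent in state i over completed visits
    n_visits = Counter()     # completed visits to state i, equal to sum_k N_{i->k}
    run_start = 0
    for t in range(1, len(states)):
        if states[t] != states[t - 1]:
            i, j = states[t - 1], states[t]
            n_trans[(i, j)] += 1
            dwell_steps[i] += t - run_start
            n_visits[i] += 1
            run_start = t
    rates = {}
    for (i, j), n_ij in n_trans.items():
        mean_tau = dt * dwell_steps[i] / n_visits[i]
        rates[(i, j)] = n_ij / (mean_tau * n_visits[i])
    return rates


# Two-state toy trajectory: closed visits last 4 steps, open visits last 2.
rates = transition_rates(list("ccccooccccooc"))
```

For this two-state example the result reduces to $1/\langle \tau_c \rangle$ and $1/\langle \tau_o \rangle$, as stated above.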

4.3 Conformational Transitions of Isolated Domains

The populations of simulated conformations organized in terms of global order parameters are shown in Fig. 4.3.1. The free energy as a function of $Q_{\mathrm{open}}$ and $Q_{\mathrm{closed}}$ shows that nCaM has a two-state domain opening and that its conformational transition is sequential. That is, contacts specific to the closed conformation are lost prior to the formation of contacts specific to the open conformation, which mostly form after the transition state region. Fig. 4.3.1 also shows the free energy projected onto the order parameter monitoring common contacts, $Q_\cap$, and a progress coordinate for the conformational transition, $\Delta Q = Q_{\mathrm{closed}} - Q_{\mathrm{open}}$. The global order parameter $Q_\cap$ monitors the overall structural integrity of the secondary structure as well as tertiary contacts within parts of the protein that do not have large conformational changes during the transition. As shown in Fig. 4.3.1, the common contacts in nCaM's transition state ensemble remain largely intact. In contrast, the simulated open/closed free energy for cCaM has a locally stable intermediate state. Simultaneously low values of $Q_{\mathrm{open}}$ and $Q_{\mathrm{closed}}$ (both less than 0.3) and $Q_\cap$ (less than 0.7) indicate that the intermediate has significantly reduced tertiary structure. The probability to form individual contacts in the intermediate (data not shown) verifies that the secondary structure remains intact, though nearly all the tertiary interactions are lost. Since the barrier for the transition closed $\to$ I ($\Delta F^\dagger \simeq 4\,k_BT$) is higher than the barrier for I $\to$ open ($\Delta F^\dagger \simeq 1.5\,k_BT$), the intermediate can be considered to be part of cCaM's extended open basin.

Figure 4.3.1: Free energy, in units of $k_BT$, projected onto global order parameters $Q_{\mathrm{open}}$, $Q_{\mathrm{closed}}$, and $Q_\cap$ for nCaM (a and c) and cCaM (b and d). Panels (a) and (b) show $Q_{\mathrm{open}}$ versus $Q_{\mathrm{closed}}$; panels (c) and (d) show $Q_\cap$ versus $Q_{\mathrm{closed}} - Q_{\mathrm{open}}$. The intermediate in the free energy surface of cCaM corresponds to an ensemble of states with intact secondary structure but lacking stable tertiary contacts.

To describe the transition mechanism at the residue level, we consider the local order parameter $q_\cap(i)$ of each residue as a function of the global progress coordinate $\Delta Q$. As shown in Fig. 4.3.2, cCaM's residues lose the majority of their common contacts upon opening (moving upward in the plot) and regain them later in the transition. Although the unfolding and refolding of residues in helices E and H is more gradual than that of other residues, nearly every residue (except the residues in the linker region between helices F and G) loses native tertiary structure. In contrast, the common contacts in nCaM remain intact throughout the transition, though the contacts involving specific residues in helices A and D and the β-sheets in the loops are strained. The limited loss of long-range common contacts in nCaM reflects an increased flexibility of the folded transition state ensemble.

The pairwise contact probability shown in Fig. A.1.1 in Appendix A provides an alternative description of the transition state ensemble in nCaM and the intermediate ensemble in cCaM during domain opening. The secondary structure of both nCaM and cCaM is intact during the transition. However, nCaM's contacts show a signature of limited loss of long-range tertiary contacts, particularly between helices A and D and between helix A and the C-D linker. In contrast, cCaM shows global loss of almost all the long-range tertiary contacts in the intermediate ensemble. This description complements the picture of the transition process given by Fig. 4.3.2.

A coarse-grained analytic model also predicts distinct transition mechanisms for each domain, in which cCaM is susceptible to local unfolding during the open/closed transition while nCaM remains folded[200, 198]. The conformational transition in the analytic model is described as the evolution of local flexibility along the transition route. Fig. 4.3.3 shows the simulated local flexibility for four discrete values of the progress coordinate, $\Delta Q$. Although the fluctuations of the residues in both domains increase and then decrease during the transition, the magnitude of the largest fluctuations is much greater in cCaM. In contrast to the global unfolding observed in the simulations, the unfolding and refolding of cCaM predicted by the analytic model is localized to particular residues (primarily in the linker between helices F and G).

Figure 4.3.2: Local order parameter, $q_\cap(i)$, plotted as a function of the global progress coordinate, $Q_{\mathrm{closed}} - Q_{\mathrm{open}}$, for each residue of (a) nCaM and (b) cCaM. The color represents the probability of each residue forming native contacts common to both the open and closed structures: low probability is shown by red and high probability is shown by blue.

Figure 4.3.3: Magnitude of the root mean square fluctuations (in Å) for each residue for the conformational ensembles along the transition pathway for (a) nCaM and (b) cCaM. Each color corresponds to the value of $\Delta Q = Q_{\mathrm{closed}} - Q_{\mathrm{open}}$ indicated in the legend in (a).

Exploring a range of temperatures reveals that both domains can exhibit either a two-state transition mechanism or a transition mechanism that involves unfolding and refolding, depending on the temperature (see Fig. 4.3.4). The transition mechanism at low temperatures is two-state, involving primarily well folded conformational ensembles throughout the transition. Increasing the temperature progressively stabilizes the unfolded ensemble until it becomes locally stable at a spinodal temperature, $T_s$. Above the spinodal temperature, the transition between the open and closed state involves unfolding and refolding of the domain. At high enough temperatures, the unfolded conformation becomes the most stable state. Although both domains follow similar transition scenarios as a function of temperature, the domains can have different transition mechanisms from each other because their spinodal temperatures are different. Comparing the two domains, cCaM has a lower spinodal temperature ($T_s^c \approx 0.93\,T_F^\star$) than nCaM ($T_s^n \approx 1.005\,T_F^\star$). For low temperatures ($T < T_s^c$), both domains have two-state transitions. For intermediate temperatures ($T_s^c < T < T_s^n$), the domain opening transition of nCaM is two-state, while the transition of cCaM involves unfolding and refolding. For higher temperatures ($T_s^n < T$), the unfolded ensemble of nCaM is locally stable, but at these temperatures the unfolded ensemble of cCaM is stabilized enough to become the global minimum. Focusing on the scenario in which the open state is 10% of the total population and the simulation temperature corresponds to $T = 310$ K (to model intact CaM at physiological conditions), the simulated unfolded population is less than 1% for nCaM and approximately 9% for cCaM. These equilibrium unfolded populations can be compared to reports of 2% for the N-terminal domain and 24% for the C-terminal domain in intact CaM based on thermodynamic stability measurements[127].
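The three temperature regimes described above amount to comparing the temperature with each domain's spinodal temperature. The sketch below encodes that comparison using the spinodal values quoted in the text (in units of the simulated cCaM folding temperature); the function and constant names are hypothetical, and it applies only to the equal-stability parameter set.

```python
# Spinodal temperatures quoted in the text, relative to the cCaM folding temperature.
TS_CCAM = 0.93
TS_NCAM = 1.005


def mechanism(t_rel, t_spinodal):
    """Classify the opening mechanism at relative temperature t_rel."""
    return "two-state" if t_rel < t_spinodal else "unfolding/refolding"


def regimes(t_rel):
    """Return the expected mechanism of each domain at relative temperature t_rel."""
    return {
        "nCaM": mechanism(t_rel, TS_NCAM),
        "cCaM": mechanism(t_rel, TS_CCAM),
    }


at_tsim = regimes(0.96)   # the main simulation temperature
at_low = regimes(0.89)    # lower temperature of Fig. 4.3.4
at_high = regimes(1.08)   # higher temperature of Fig. 4.3.4
```

At the main simulation temperature this places nCaM in the two-state regime and cCaM in the unfolding and refolding regime, matching the contrast drawn in the text.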

Figure 4.3.4: Free energy, in units of $k_BT$, projected onto global order parameters $Q_{\mathrm{closed}}$ and $Q_{\mathrm{open}}$ for nCaM (a and c) and cCaM (b and d) with temperature $T_{\mathrm{sim}} = 0.89\,T_F^\star$ (a and b) and $T_{\mathrm{sim}} = 1.08\,T_F^\star$ (c and d). At lower temperatures, the unfolded conformations are destabilized so that the transition mechanism in both domains becomes more two-state. At higher temperatures, the unfolded states are stabilized for both nCaM and cCaM.

4.4 Transition Kinetics

Using Eq. 4.5 to calculate opening rates for each domain at $T_{\mathrm{sim}}$, we find that unfolding and refolding along the transition route significantly slows cCaM's domain opening rate compared to nCaM. Quantitatively, the domain opening and closing rates of nCaM, $k_{o \to c} = k_{c \to o} = 2 \times 10^{-3}\,\Delta t^{-1}$, are 50 times larger than the effective opening and closing rates of cCaM, $k_{o \to c} = k_{c \to o} = 4 \times 10^{-5}\,\Delta t^{-1}$.

A closer look at cCaM's kinetic transitions reveals that only $\approx 5\%$ of its transition paths proceed through direct transitions from the closed to open state without significant unfolding along the way. The rest of the transitions occur according to the kinetic equation

$$\mathrm{closed}\,(c) \;\overset{\mathrm{slow}}{\rightleftharpoons}\; I \;\overset{\mathrm{fast}}{\rightleftharpoons}\; \mathrm{open}\,(o), \tag{4.6}$$

where $k_{c \to I} = 4 \times 10^{-5}\,\Delta t^{-1}$, $k_{I \to c} = 2 \times 10^{-4}\,\Delta t^{-1}$, $k_{I \to o} = 8 \times 10^{-3}\,\Delta t^{-1}$, and $k_{o \to I} = 2 \times 10^{-3}\,\Delta t^{-1}$ are the corresponding simulated rates between the open, closed, and intermediate states.

Equilibrium between the open state and the unfolded intermediate is established quickly on the timescale of the conformational transition, so that the unfolded intermediate establishes a steady-state population

$$P_I = \frac{k_{c \to I} P_c + k_{o \to I} P_o}{k_{I \to c} + k_{I \to o}}, \tag{4.7}$$

where $P_c$ and $P_o$ are the equilibrium populations of the closed and open state, respectively. The effective two-state kinetics for the open/closed transition can be written as

$$k_{c \to o}^{\mathrm{eff}} = \frac{k_{c \to I}\, k_{I \to o}}{k_{I \to c} + k_{I \to o}} \tag{4.8}$$

and

$$k_{o \to c}^{\mathrm{eff}} = \frac{k_{o \to I}\, k_{I \to c}}{k_{I \to c} + k_{I \to o}}. \tag{4.9}$$

Since $k_{I \to c} \ll k_{I \to o}$, these expressions for the two-state rates can be simplified. The effective domain opening rate is determined by the unfolding of the closed state

$$k_{c \to o}^{\mathrm{eff}} \approx k_{c \to I}, \tag{4.10}$$

and the closing rate can be understood through the equilibration of the intermediate and open state

$$k_{o \to c}^{\mathrm{eff}} \approx k_{I \to c} \left( \frac{P_I}{P_o} \right), \tag{4.11}$$

where $P_I / P_o = 0.2$ is the population of the unfolded intermediate relative to the open state. The simulated effective two-state rates for cCaM are consistent with this steady-state description of the kinetics.
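The consistency claimed above can be checked by plugging the simulated cCaM rates quoted after Eq. 4.6 into Eqs. 4.8 to 4.11. The variable and function names in this sketch are hypothetical; the numerical inputs are the values given in the text, in units of $\Delta t^{-1}$.

```python
# Simulated cCaM rates quoted in the text (units of 1/dt).
K_CI = 4e-5   # closed -> intermediate
K_IC = 2e-4   # intermediate -> closed
K_IO = 8e-3   # intermediate -> open
K_OI = 2e-3   # open -> intermediate
P_I_OVER_P_O = 0.2  # relative population of intermediate and open state


def effective_rates(k_ci, k_ic, k_io, k_oi):
    """Steady-state effective two-state rates, Eqs. 4.8 and 4.9."""
    denom = k_ic + k_io
    k_open = k_ci * k_io / denom    # Eq. 4.8
    k_close = k_oi * k_ic / denom   # Eq. 4.9
    return k_open, k_close


k_open_eff, k_close_eff = effective_rates(K_CI, K_IC, K_IO, K_OI)

# Simplified forms valid when k_ic << k_io, Eqs. 4.10 and 4.11.
k_open_approx = K_CI
k_close_approx = K_IC * P_I_OVER_P_O
```

Both the full and the simplified expressions give opening and closing rates of roughly $4 \times 10^{-5}\,\Delta t^{-1}$ to $5 \times 10^{-5}\,\Delta t^{-1}$, in line with the effective cCaM rates reported above.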

The slowing influence of the folding and unfolding transition persists when the open state is destabilized to 10% of the total population, with domain opening approximately 45 times faster in nCaM than in cCaM at simulated temperatures that correspond to $T = 310$ K.

4.5 Discussion

Although the isolated domains of CaM are topologically similar, the simulated open/closed transition mechanisms are distinct due to the presence of an unfolded intermediate that appears in the free energy landscape at a different temperature for each domain. Two-state transition kinetics persist at higher temperatures in nCaM, whereas the unfolded ensemble is more readily stabilized in cCaM. Above the spinodal temperature, transient unfolding and refolding of the domain occurs through the locally stable unfolded intermediate (exemplified by cCaM at $T_{\mathrm{sim}}$). Below the spinodal temperature, the transition is two-state-like, albeit with conformational dynamics that anticipate the unfolded intermediate through high flexibility and stressed tertiary interactions (as in nCaM at $T_{\mathrm{sim}}$).

The unfolding and refolding along the open and closed transition is reminiscent of the cracking mechanism[134, 135, 215] in which regions of high local strain are relieved through unfolding and refolding in the transition region. Since the unfolded conformations involved in cracking are typically locally unstable, the domain opening of CaM most closely follows this canonical description at temperatures near the spinodal for the unfolded conformations.

High temperature unfolded intermediates have been reported previously in simulations of the open/closed transition in cCaM[36] and the homologous protein S100A6[150]. Chen and Hummer found that the population of the open ensemble is comparable to that of a marginally stable unfolded ensemble within a narrow temperature range. They argue that the sensitive balance between unstable folded and unfolded populations explains why some experiments report an open/closed transition[55, 124, 207, 120, 121], and others report a folding/unfolding transition for cCaM under similar conditions[164].

Our simulations suggest that subtle differences in the topology and stability of the two domains can result in distinct transition mechanisms. In particular, we find that the unfolded population is stabilized more readily in cCaM, a result consistent with the prediction that cCaM (and not nCaM) exhibits local unfolding and refolding during opening[199, 198]. The C-terminal domain's lower spinodal temperature may reflect its decreased overall relative thermodynamic stability. Indeed, nCaM is measured to be more stable than cCaM in the absence of Ca2+[184], with cCaM being significantly unfolded at room temperature (20–25 °C)[127].

The transient unfolding and refolding observed in the simulations significantly slows the transition kinetics of cCaM. Several key observations of CaM dynamics have been reported, but how the dynamics of the individual domains compare is not clear from the literature. NMR studies of intact CaM in the absence of Ca2+ report that cCaM is more dynamic than nCaM, with an exchange time of 350 µs for cCaM[196]. This timescale is comparable to the folding and unfolding equilibration time of 200 µs for cCaM under similar conditions[164]. The dynamics of Ca2+-loaded cCaM with a mutation E140Q that stabilizes the open state and prevents binding to loop IV exhibits exchange on the faster timescale of 25 µs[54] and undergoes local transient unfolding[121]. The dynamics of both domains under similar conditions has been reported by Price and co-workers, who used fluorescence correlation spectroscopy coupled to Förster Resonance Energy Transfer (FRET) to monitor the intramolecular dynamics of both nCaM and cCaM on the microsecond timescale[161]. They report that both domains have fluctuations on the 30–40 µs timescale in the absence of Ca2+. The Ca2+-dependence of the fluctuation amplitude, however, indicates that the observed fluctuations couple to the occupancy of the binding sites (and hence to domain opening) only in nCaM. Taken together, the evidence that the two domains have a different conformational timescale and/or mechanism is intriguing in light of the predictions from the coarse-grained simulations. Nevertheless, understanding how flexibility and transient unfolding influence the domain opening dynamics of CaM requires further experimental clarification.

Chapter 5

Coarse-grained molecular simulations of allosteric cooperativity

Abstract

Interactions between a protein and a ligand are often accompanied by a redistribution of the population of thermally accessible conformations. This dynamic response of the protein's functional energy landscape enables a protein to modulate binding affinities and control binding sensitivity to ligand concentration. In this paper, we investigate the structural origins of binding affinity and allosteric cooperativity of binding two Ca2+-ions to each domain of calmodulin (CaM) through simulations of a simple coarse-grained model. In this model, the protein's conformational transitions between open and closed conformational ensembles are simulated explicitly, and ligand binding and unbinding is treated implicitly within the Grand Canonical Ensemble. Ligand binding is cooperative because the binding sites are coupled through a shift in the dominant conformational ensemble upon binding. The classic Monod-Wyman-Changeux model of allostery with appropriate binding free energy to the open and closed ensembles accurately describes the simulated binding thermodynamics. The simulations predict that the two domains of CaM have distinct binding affinity and cooperativity. In particular, the C-terminal domain binds Ca2+ with higher affinity and greater cooperativity than the N-terminal domain. From a structural point of view, the affinity of an individual binding loop depends sensitively on the loop's structural compatibility with the ligand in the bound ensemble, as well as the conformational flexibility of the binding site in the unbound ensemble.¹

5.1 Introduction

Conformational dynamics is essential for a protein’s ability to exhibit allostery. The coupling between two distant binding sites is frequently accomplished by a confor- mational change between a “closed” (apo) to an “open” (holo) conformation upon ligand binding[68]. Although the end point conformations often give valuable in- sight into protein function, a detailed description of the allosteric mechanism for a particular protein requires one to consider a broader conformational ensemble. The landscape theory of binding[122, 190, 18] acknowledges that a folded protein is in- herently dynamic and explores the thermally accessible conformational states in its native basin[75]. This conformational ensemble comprises the protein’s “functional landscape”[227]. While only a small subset of the states comprising the folding en- ergy landscape[26], the functional landscape determines how a protein responds to the changes in its local environment such as ligand interactions. Due to the het- erogeneous nature of the conformational ensemble, a ligand preferentially stabilizes some conformations more than others, causing the protein’s thermal population to redistribute to a ligated ensemble which in general has distinct equilibrium proper- ties[104, 181]. The ensemble nature of allostery accommodates a rich and diverse set of regulatory strategies and provides a general framework to understand binding thermodynamics and kinetics of specific proteins[81, 137]. Even simple landscapes with a small number of well defined basins separated by kinetic barriers can have subtle binding mechanisms because they depend on ligand interactions to short-lived

¹ Adapted from P. Nandigrami and J. J. Portman, The Journal of Chemical Physics 144, 105101 (2016); http://doi.org/10.1063/1.4943043

transient states. Experimental progress on this challenging kinetics problem has appeared only very recently[47]. In principle, affinities of metastable states can also be obtained from thermodynamic binding measurements, although such analysis may not always be practical. In this paper, we focus on the cooperative binding of two Ca2+ ions to the binding loops of the domains of Calmodulin (CaM) through equilibrium coarse-grained simulations.

In this minimal model, the conformational transition between the open and closed ensembles is simulated explicitly and the dynamic shift in population due to ligand binding and unbinding is approximated by discrete jumps between ligated and unligated free energy surfaces[96]. The protein dynamics are governed by a native-centric potential that couples the open and closed conformational basins while ligation is represented implicitly through ligand-mediated protein contacts. This model, developed by Takada and co-workers, has been used to investigate the kinetic partitioning of induced fit and conformational selection binding pathways[149] as well as mechanical unfolding of Calmodulin in the presence of Ca2+[115]. Here, we assume that the ligands bound to the protein are in equilibrium with a dilute solution and calculate binding thermodynamics as a function of ligand concentration. The model is parameterized so that the closed basin is more stable than the open basin in the unligated ensemble. The two binding sites in the open and closed basins can be in the following ensemble of states: both sites unoccupied (unligated ensemble), either binding site occupied (partially saturated ensembles), and both sites occupied (fully saturated ensemble), thereby giving rise to a “four-state” description (see Fig. 5.5.1).

Ligands interact with all conformations in the ensemble, but the affinity is largest for conformations within the open basin due to their high structural compatibility with the ligand. Thus, the population shifts towards the open ensemble with increasing ligand concentration (see Fig. B.1.1 in Appendix B). The simulated ensembles have significant molecular fluctuations which modulate ligand affinities and affect the coupling between the binding sites. When binding thermodynamics are dominated by the open and closed ensembles, this model provides a molecular realization of the celebrated Monod-Wyman-Changeux (MWC) model of allostery[136, 31]. For binding a single ligand, the MWC model has four states: unligated-open, unligated-closed, ligated-open, and ligated-closed. Appealing to this simple four-state model allows us to extract binding free energies of the isolated sites in the simulated open and closed ensembles and to calculate the free energy associated with the cooperative coupling between the sites. The simulations connect the conformational ensemble underlying the protein’s dynamics with the MWC phenomenological binding parameters[45].

Early work on binding thermodynamics of CaM has revealed that the affinities and cooperativities of the N-terminal domain (nCaM) and the C-terminal domain (cCaM) are distinct despite their structural similarity[98, 211, 116, 14, 127]. Although some experimental data have been reanalyzed recently[186, 187, 107], the traditional analysis of thermodynamic binding data has not used a dynamic landscape (or MWC) framework[116, 184, 144]. Nuclear Magnetic Resonance experiments[13, 55, 54] and all-atom molecular dynamics simulations[207] that show a dynamic equilibrium between the open and closed conformations of CaM’s domains in the absence of Ca2+ support our approach.

5.2 Methods

Calmodulin (CaM) is a small, 148 amino acid long protein consisting of two topologically similar domains. Each domain consists of four α-helices and a pair of EF-hand Ca2+-binding loops. The N-terminal domain (nCaM) has helices labeled A – D with binding loops I and II, and the C-terminal domain (cCaM) has helices labeled E – H with binding loops III and IV. We simulate open/closed allosteric transitions of the isolated domains of CaM using a native-centric model implemented in the Cafemol simulation package developed by Takada and co-workers[96].

This model couples two energy basins, one biased to the open (pdb: 1cll[34]) reference structure and the other biased to the closed (pdb: 1cfd[103]) reference structure. The energy of a conformation, specified by the $N$ position vectors of the C-α atoms of the protein backbone, $R = \{\mathbf{r}_1, \cdots, \mathbf{r}_N\}$, is given by

$$V(R) = \frac{V_o(R) + V_c(R) + \Delta V}{2} - \sqrt{\frac{\left(V_o(R) - V_c(R) - \Delta V\right)^2}{4} + \Delta^2}, \tag{5.1}$$

where $V_o(R)$ is the single basin potential defined by the open structure and $V_c(R)$ is the single basin potential defined by the closed structure. The interpolation parameters, $\Delta$ and $\Delta V$, control the barrier height and the relative stability of the two basins. Parameters defining the single energy basins are set to their default values with uniform contact strength.
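As a rough numerical illustration of Eq. 5.1, the sketch below evaluates the coupled energy from two given single-basin energies. Algebraically, Eq. 5.1 is the lower eigenvalue of a 2×2 matrix with diagonal entries $V_o$ and $V_c + \Delta V$ and off-diagonal entry $\Delta$. The function name and the single-basin energies are hypothetical and chosen only for illustration; this is not the Cafemol implementation.

```python
import math


def two_basin_energy(v_open, v_closed, delta, delta_v):
    """Evaluate Eq. 5.1 for given single-basin energies (same units throughout)."""
    mean = 0.5 * (v_open + v_closed + delta_v)
    half_gap = 0.5 * (v_open - v_closed - delta_v)
    return mean - math.sqrt(half_gap ** 2 + delta ** 2)


# Coupling parameters quoted in the text for nCaM (kcal/mol).
DELTA, DELTA_V = 14.0, 5.0

# Hypothetical single-basin energies (kcal/mol), for illustration only.
# Deep inside the open basin the coupled energy stays close to V_o.
e_open_like = two_basin_energy(-100.0, -20.0, DELTA, DELTA_V)

# Where V_o = V_c + Delta V the two surfaces cross and the energy is lowered by Delta.
e_crossing = two_basin_energy(-50.0, -55.0, DELTA, DELTA_V)
```

With $\Delta = 0$ the expression reduces to the lower of $V_o$ and $V_c + \Delta V$, so $\Delta$ only smooths the surface near the crossing.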

The simulation temperature is set below the folding transition temperature of each of the four conformations. Specifically, the simulation temperature is set to $T_{\mathrm{sim}} = 0.8\,T_F^{\star}$, where $T_F^{\star} = 329$ K is the folding transition temperature corresponding to the closed (apo) state of nCaM, the lowest transition temperature among the open and closed states of nCaM and cCaM. Equilibrium trajectories of length $10^8$ steps are simulated using Langevin dynamics with a friction coefficient of $\gamma = 0.25$ and a timestep of $\Delta t = 0.2$ (in coarse-grained units)[149].

Calcium binding to the two EF-hand loops of each domain of CaM is modeled implicitly by adding a potential defined from the ligand-mediated contacts in the EF-hand loops of the open (holo) conformation

$$V_{\mathrm{bind}} = -c_{\mathrm{lig}}\,\epsilon_{\mathrm{go}} \sum_{i,j} \exp\left[-\frac{\left(r_{ij} - r_{ij}^{0}\right)^2}{2\sigma_{ij}^2}\right]. \tag{5.2}$$

Here, the sum is over pairs of residues that are each within 4.5 Å of a Ca2+ ion and closer than 10.0 Å in the open (holo) conformation. The binding energy parameters $c_{\mathrm{lig}}$, $\epsilon_{\mathrm{go}}$, and $\sigma$ are taken to be the same for each ligand-mediated contact for simplicity.

The distributions of distances between implicit ligand-mediated contact pairs in the simulated ensembles of open and closed states for the binding loops in nCaM and cCaM are shown in Figs. B.1.1, B.1.2, B.1.3, and B.1.4 in Appendix B.
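The implicit-ligand potential of Eq. 5.2 is a sum of Gaussian wells centered on the holo-state contact distances. A minimal sketch follows, using the parameter values quoted in this chapter ($\epsilon_{\mathrm{go}} = 0.3$, $c_{\mathrm{lig}} = 2.5$, $\sigma_{ij} = 0.1\,r_{ij}^0$). The function name and the contact distances are hypothetical, and the code is not taken from Cafemol.

```python
import math


def binding_energy(distances, native_distances, c_lig=2.5, eps_go=0.3, rel_width=0.1):
    """Eq. 5.2: sum of Gaussian wells over ligand-mediated contact pairs."""
    total = 0.0
    for r, r0 in zip(distances, native_distances):
        sigma = rel_width * r0  # sigma_ij = 0.1 * r0_ij, as quoted in the text
        total += math.exp(-((r - r0) ** 2) / (2.0 * sigma ** 2))
    return -c_lig * eps_go * total


# Hypothetical holo-state separations (in Angstrom) for a loop with five contacts.
native = [5.0, 6.0, 7.5, 8.0, 9.5]

v_native = binding_energy(native, native)                        # every pair at r0
v_stretched = binding_energy([1.5 * r for r in native], native)  # loop pulled apart
```

When every contact sits at its holo distance the energy is $-c_{\mathrm{lig}}\,\epsilon_{\mathrm{go}}\,N_{\mathrm{con}}$, and it decays quickly once the pairs deviate by more than a few $\sigma_{ij}$.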

Binding cooperativity is influenced by the relative stability of the unligated open and closed states determined by $\Delta V$ and the binding free energy determined by $V_{\mathrm{bind}}$. In principle, these parameters can be adjusted to match measured binding properties. In the absence of clear measured constraints, we choose parameters so that the relative stability between the open and closed states is the same for each domain. The transition barrier height is determined by $\Delta$, which is set to 14.0 kcal/mol for nCaM and 17.5 kcal/mol for cCaM. Adjusting $\Delta V = 5.0$ kcal/mol for nCaM and $\Delta V = 4.75$ kcal/mol for cCaM while keeping other parameters fixed gives an energy difference between the unligated open and closed states of $\epsilon = 4\,k_BT$ for both domains.

Experimentally, the folding temperatures of the N-terminal and C-terminal domains in intact CaM are approximately 328 K and 315 K, respectively[164]. Connecting to the domain opening kinetics in the intact protein, our simulation temperature corresponds to approximately 310 K which is 95% of nCaM’s simulated folding temperature, and 98% of cCaM’s simulated folding temperature.

For the results reported in this work, the binding energy parameters are set to $\epsilon_{\mathrm{go}} = 0.3$ (default value in Cafemol), $c_{\mathrm{lig}} = 2.5$ and $\sigma_{ij} = (0.1)\,r_{ij}^{0}$, where $r_{ij}^{0}$ is the corresponding separation distance in the open (holo) reference conformation. We have performed additional simulations to explore the dependence of binding thermodynamics on the ligand-mediated contact strength and interaction range. At higher values of $c_{\mathrm{lig}}$ and $\sigma_{ij}$, the affinities of ligand binding to individual loops increase. Nevertheless, the slope of the titration curve at the midpoint of the transition (a measure of binding cooperativity) remains the same (Fig. B.2.1 in Appendix B).

The simulated conformational ensembles are characterized structurally in terms of local and global order parameters based on the contacts formed in each sampled conformation. The set of native contacts in the open and closed conformations is separated into three groups: those that occur exclusively in either the open or the closed native structures, and those that are common to both states. A native contact in a given conformation is considered to be formed provided the distance between the two residues is less than 1.2 times the corresponding distance in the native conformation. Local order parameters $q_{\mathrm{open}}(i)$ and $q_{\mathrm{closed}}(i)$ are defined as the fraction of native contacts involving the $i$th residue that occur exclusively in the open and closed native structures, respectively. Overall native similarity is monitored by corresponding global order parameters, $Q_{\mathrm{open}} = \langle q_{\mathrm{open}}(i) \rangle$ and $Q_{\mathrm{closed}} = \langle q_{\mathrm{closed}}(i) \rangle$, where the average is taken over the residues of the protein. We identify metastable conformational basins from minima in the free energy computed through the population histogram parameterized by $Q_{\mathrm{open}}$ and $Q_{\mathrm{closed}}$.
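The order parameters described above can be sketched as follows. The data layout (a dictionary from residue index to a list of current and native distances), the function names, and the choice to skip residues with no contacts in a group are assumptions made for this illustration. The 1.2 cutoff and the open-state window ($0.18 \le Q_{\mathrm{closed}} \le 0.35$, $0.55 \le Q_{\mathrm{open}} \le 0.75$, quoted in the caption of Fig. 5.3.2) come from the text.

```python
def fraction_formed(pairs, cutoff=1.2):
    """Fraction of (distance, native_distance) pairs counted as formed contacts."""
    formed = sum(1 for r, r0 in pairs if r < cutoff * r0)
    return formed / len(pairs)


def global_order(contacts_by_residue, cutoff=1.2):
    """Average the local order parameter q(i) over residues that have contacts."""
    local = [fraction_formed(p, cutoff) for p in contacts_by_residue.values() if p]
    return sum(local) / len(local)


def in_open_ensemble(q_open, q_closed):
    """Open-state window quoted in the caption of Fig. 5.3.2."""
    return 0.55 <= q_open <= 0.75 and 0.18 <= q_closed <= 0.35


# Hypothetical open-only contacts: residue index -> [(current r, native r0), ...]
open_only = {
    10: [(6.0, 5.5), (9.0, 6.0)],    # one of two contacts formed, q = 0.5
    11: [(7.0, 6.8)],                # formed, q = 1.0
    12: [(12.0, 7.0), (11.0, 8.0)],  # none formed, q = 0.0
}
q_open = global_order(open_only)
```

A two-dimensional histogram of such ($Q_{\mathrm{closed}}$, $Q_{\mathrm{open}}$) pairs over a trajectory gives the population from which the free energy surface is estimated.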

Ligand binding/unbinding events coupled with a conformational change of the protein are modeled within the Grand Canonical Ensemble. Throughout the protein’s conformational transitions, the ligation state of each loop is determined stochastically through a Monte Carlo step attempted every 1000 steps in the Langevin trajectory.

If the loop is unligated, a ligand is introduced to the binding loop ($V \to V + V_{\mathrm{bind}}$) with probability

$$P_{0 \to 1} = \min\left[1, \exp\left[-(V_{\mathrm{bind}} - \mu)/k_BT\right]\right]. \tag{5.3}$$

If the loop is ligated, the ligand dissociates from the binding loop ($V + V_{\mathrm{bind}} \to V$) with probability

$$P_{1 \to 0} = \min\left[1, \exp\left[(V_{\mathrm{bind}} - \mu)/k_BT\right]\right]. \tag{5.4}$$

Here, $\mu$ is the chemical potential of a bound ligand. At equilibrium, $\mu$ equals the chemical potential of the ligand in solution,

$$\mu = k_BT \ln\left(\frac{c}{c_0}\right) + \mu_0, \tag{5.5}$$

where $c$ is the ligand concentration, and $c_0$ and $\mu_0$ are the reference concentration and reference chemical potential, respectively. To compute binding curves, a series of simulations are performed, each at a different value of the ligand chemical potential.

These simulated titration curves are reported as a function of the chemical potential, or equivalently, in terms of the relative ligand concentration defined through $\mu/k_BT = \ln(c/\bar{c}_0)$ where $\bar{c}_0 = c_0 \exp(-\mu_0/k_BT)$.

This approach with Monte Carlo acceptance rates given in Eq. 5.3 and Eq. 5.4 is oriented towards binding thermodynamics from the outset. Takada and co-workers present a different choice motivated by ligand binding kinetics[149]. Instead of introducing a chemical potential, ligand concentration enters their model through a variable binding attempt rate, while the attempt rate of unbinding is fixed. Binding titration curves can also be calculated in this model, but as a function of the binding attempt rate rather than the concentration directly[115].
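The acceptance rules of Eq. 5.3 and Eq. 5.4 can be sketched as a single Monte Carlo attempt on the ligation state of one loop. This is a minimal illustration rather than the code used for the simulations: the function names are hypothetical, and the toy check freezes the protein conformation so that $V_{\mathrm{bind}}$ is constant, whereas in the simulations it fluctuates with the conformation between attempts.

```python
import math
import random


def acceptance(delta, kT):
    """Return min[1, exp(-delta / kT)] without overflowing for large negative delta."""
    return 1.0 if delta <= 0.0 else math.exp(-delta / kT)


def attempt_ligation_move(ligated, v_bind, mu, kT, rng):
    """One Monte Carlo attempt: Eq. 5.3 if the loop is empty, Eq. 5.4 if occupied."""
    delta = v_bind - mu
    p = acceptance(-delta, kT) if ligated else acceptance(delta, kT)
    return (not ligated) if rng.random() < p else ligated


# Toy check with a frozen conformation (hypothetical values, kT = 1).
kT = 1.0
v_bind, mu = -math.log(3.0), 0.0  # Boltzmann ratio bound:unbound = 3:1
rng = random.Random(0)
state, bound_steps, n_steps = False, 0, 200_000
for _ in range(n_steps):
    state = attempt_ligation_move(state, v_bind, mu, kT, rng)
    bound_steps += state
bound_fraction = bound_steps / n_steps
```

The two acceptance probabilities satisfy $P_{0\to1}/P_{1\to0} = \exp[-(V_{\mathrm{bind}} - \mu)/k_BT]$, so for fixed $V_{\mathrm{bind}}$ the bound fraction converges to $1/(1 + \exp[(V_{\mathrm{bind}} - \mu)/k_BT])$, which is 3/4 for the values chosen here.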

Table 5.3.1: Number of ligand-mediated contacts, dissociation constants, and binding free energies for the loops of CaM.

            Ncon    Kd/c̄0    εc (a)    εo (a)
loop I       5      0.054     -1.3      -3.1
loop II      5      0.062     -1.3      -2.9
loop III     8      0.018     -1.3      -4.1
loop IV      5      0.13      -0.5      -3.0
(a) in kcal/mol

5.3 Simulations of Binding a Single Ligand

We first consider Ca2+ binding exclusively to each individual loop by simulating the conformational change of the entire domain while permitting binding only to a single site. As shown in Fig. 5.3.1 (A) and Fig. 5.3.1 (B), the bound population as a function of ligand concentration, $p_b(c)$, follows a typical sigmoidal profile connecting a fully unbound population at low concentration and a fully bound population at high concentration. The overall binding strength of the individual loops is reflected in the dissociation constant, $K_d$, shown in Table 5.3.1. Binding affinities of nCaM’s loops are nearly the same, whereas the affinities of cCaM’s loops are significantly different, with $K_d^{(\mathrm{IV})} \approx 7\,K_d^{(\mathrm{III})}$. Comparing the binding strength of CaM’s loops, our simulations predict that $K_d^{(\mathrm{III})} < K_d^{(\mathrm{I})} \approx K_d^{(\mathrm{II})} < K_d^{(\mathrm{IV})}$.

It is reasonable to expect that binding affinities from a uniform native-centric model correlate with the number of ligand-mediated contacts, $N_{\mathrm{con}}$. While loop III does indeed have the most contacts and the greatest binding affinity, accounting for the reduced affinity of loop IV compared to the loops of nCaM, each with the same number of contacts, requires a more careful explanation. Such subtlety is not surprising because binding strength is sensitive to a protein’s conformational flexibility that modulates ligand interactions in both the open and closed ensembles.

Figure 5.3.1: Simulated binding curves (bound probability versus $\ln(c/\bar{c}_0)$) for the individual loops of (A) nCaM and (B) cCaM. Lines are fits to the two state MWC model given by Eq. 5.6. (C) Simulated mean number of bound ligands, $\langle n_b \rangle$, for binding two ligands to nCaM (blue) and cCaM (red) as a function of ligand concentration. The solid lines plot $\langle n_b(\mu) \rangle = p_b^{A0}(\mu) + p_b^{0B}(\mu) + 2p_b^{AB}(\mu)$ with probabilities given from the MWC model evaluated with the binding parameters found from fits of binding to each individual loop.

The MWC model provides insight into the affinities for the individual binding loops. Using notations in Ref. [126], the bound population in the MWC model can be expressed as

$$p_b(\mu) = \left[e^{-\beta(\epsilon_c - \mu)} + e^{-\beta(\epsilon + \epsilon_o - \mu)}\right] / Z_1 \tag{5.6}$$

where

$$Z_1 = 1 + e^{-\beta(\epsilon_c - \mu)} + e^{-\beta\epsilon}\left[1 + e^{-\beta(\epsilon_o - \mu)}\right] \tag{5.7}$$

is the single ligand partition function. Here, $\epsilon_c$ and $\epsilon_o$ denote the binding free energies of the ligand to the closed and open ensemble, $\epsilon$ is the difference in stability between the unbound closed and open ensemble, and $\mu$ is the ligand chemical potential.
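As a consistency check on Eq. 5.6 and Eq. 5.7, the sketch below evaluates the single-ligand bound population and the concentration at which it equals 1/2, using the fitted binding free energies listed in Table 5.3.1. It assumes $k_BT$ in kcal/mol evaluated at $T_{\mathrm{sim}} = 0.8 \times 329$ K with $k_B = 0.0019872$ kcal/(mol K); the function names are hypothetical. Because the tabulated energies are rounded, the computed values should only approximate the tabulated $K_d/\bar{c}_0$.

```python
import math

KT = 0.0019872 * 0.8 * 329.0  # kcal/mol; assumes T_sim = 0.8 x 329 K
EPS = 4.0 * KT                # closed/open stability gap fixed in the simulations

# Fitted binding free energies (eps_c, eps_o) in kcal/mol from Table 5.3.1.
LOOPS = {"I": (-1.3, -3.1), "II": (-1.3, -2.9), "III": (-1.3, -4.1), "IV": (-0.5, -3.0)}


def bound_probability(x, eps_c, eps_o):
    """Eq. 5.6 with x = c / c0_bar = exp(mu / kT)."""
    w_closed = x * math.exp(-eps_c / KT)                 # ligated, closed
    w_open = x * math.exp(-(EPS + eps_o) / KT)           # ligated, open
    z1 = 1.0 + w_closed + math.exp(-EPS / KT) + w_open   # Eq. 5.7
    return (w_closed + w_open) / z1


def dissociation_constant(eps_c, eps_o):
    """Relative concentration at which bound_probability equals 1/2."""
    unbound = 1.0 + math.exp(-EPS / KT)
    bound_per_x = math.exp(-eps_c / KT) + math.exp(-(EPS + eps_o) / KT)
    return unbound / bound_per_x


KD = {name: dissociation_constant(*eps) for name, eps in LOOPS.items()}
```

Under these assumptions the computed values fall within about 10% of the tabulated dissociation constants and reproduce the ordering loop III < loop I < loop II < loop IV.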

The coupling parameters of the Hamiltonian fix $\epsilon = 4\,k_BT$ in the simulation, leaving the binding parameters, $\epsilon_c$ and $\epsilon_o$, to be determined from the simulated titration curves. These two parameters are under-determined by a fit to the bound state probability alone. The population of the open ensemble (regardless of ligation state) provides an additional constraint for parameters in the model. The open state configurations are identified in the simulations by global order parameters that measure the similarity to the open and closed native structures (see Fig. 5.3.2). At high ligand concentration, the bound population saturates to unity. Since ligands bind to both the open and closed conformations, the limiting value of the open population, $p_o(\mu)$, is the fraction of ligated protein in an open conformation. As shown in Fig. 5.3.3, the simulated open population of each loop saturates to a different limiting value determined by the relative stability of binding to the open and closed ensembles.

Figure 5.3.2: Simulated free energy as a function of $Q_{\mathrm{closed}}$ and $Q_{\mathrm{open}}$ for binding loops at ligand concentration $c = K_d$ for nCaM (A: Loop-I, B: Loop-II) and cCaM (C: Loop-III, D: Loop-IV). The set of native contacts in the open and closed conformations is separated into three groups: those that occur exclusively in either the open or the closed native structures, and those that are common to both states. For each of these groups, denoted by $\alpha = (\mathrm{open}, \mathrm{closed}, \cap)$, a local order parameter, $q_\alpha(i)$, is defined as the fraction of native contacts formed involving the $i$th residue. Overall native similarity is monitored by a corresponding global order parameter, $Q_\alpha = \langle q_\alpha(i) \rangle$, where the average is taken over the residues of the protein. The open state ensemble consists of conformations with $0.18 \le Q_{\mathrm{closed}} \le 0.35$ and $0.55 \le Q_{\mathrm{open}} \le 0.75$.

Figure 5.3.3: Simultaneous fits of simulation data for a single ligand to $p_b(\mu)$ and $p_o(\mu)$ for individual binding loops (panels: Loop-I, Loop-II, Loop-III, Loop-IV; bound and open probabilities plotted against $\ln(c/\bar{c}_0)$). Solid curves are plots of $p_b(\mu)$ and $p_o(\mu)$ with $\epsilon_c$ and $\epsilon_o$ determined by a simultaneous fit to the simulation data (shown as points).

In the MWC model, the open state population

$$p_o(\mu) = e^{-\beta\epsilon}\left[1 + e^{-\beta(\epsilon_o - \mu)}\right] / Z_1 \tag{5.8}$$

has a limiting value, $p_o(\beta\mu \gg 1) \sim \left[1 + e^{\beta(\epsilon + \Delta\epsilon)}\right]^{-1}$, that depends on $\Delta\epsilon = \epsilon_o - \epsilon_c$. Thus, $p_o(\mu)$ and $p_b(\mu)$ are independent constraints that can be used to determine reliable model parameters for the open and closed binding free energies, $\epsilon_o$ and $\epsilon_c$.

The binding free energies determined by a simultaneous fit to Eq. 5.6 and Eq. 5.8 are shown in Table 5.3.1. As expected, the dissociation constants depend on both $\epsilon_c$ and $\epsilon_o$. The values of $\epsilon_o$ track the number of ligand-mediated contacts in each loop, with loop III being the most stable, while the other loops have similar stability. The values of $\epsilon_c$ are more subtle. Although loop III has more contacts than loop I and loop II, they all have the same $\epsilon_c$. Thus, the relatively high affinity of loop III can be attributed to its greater stabilization upon binding to the open state. The relatively low affinity of loop IV, in contrast, is explained by the smaller binding stabilization to the closed state.
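The limiting open population below Eq. 5.8 can be checked numerically against the full expression. The sketch below uses the fitted energies of Table 5.3.1 and the same assumed $k_BT$ as before ($T_{\mathrm{sim}} = 0.8 \times 329$ K, $k_B = 0.0019872$ kcal/(mol K)); function names are hypothetical.

```python
import math

KT = 0.0019872 * 0.8 * 329.0  # kcal/mol; assumes T_sim = 0.8 x 329 K
EPS = 4.0 * KT

# Fitted binding free energies (eps_c, eps_o) in kcal/mol from Table 5.3.1.
LOOPS = {"I": (-1.3, -3.1), "II": (-1.3, -2.9), "III": (-1.3, -4.1), "IV": (-0.5, -3.0)}


def open_probability(x, eps_c, eps_o):
    """Eq. 5.8 with x = c / c0_bar = exp(mu / kT)."""
    gap = math.exp(-EPS / KT)
    w_closed = x * math.exp(-eps_c / KT)
    w_open = gap * x * math.exp(-eps_o / KT)
    z1 = 1.0 + w_closed + gap + w_open
    return (gap + w_open) / z1


def open_limit(eps_c, eps_o):
    """High-concentration limit 1 / (1 + exp[(eps + eps_o - eps_c) / kT])."""
    return 1.0 / (1.0 + math.exp((EPS + eps_o - eps_c) / KT))


LIMITS = {name: open_limit(*eps) for name, eps in LOOPS.items()}
```

With these rounded parameters, the saturating open population is largest for loop III and smallest for loop II, consistent with each loop approaching a different limiting value.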

Although experiments suggest that CaM’s binding loops have heterogeneous affinities, assigning a binding strength to each loop is challenging because techniques to distinguish site-specific binding tend to alter the stability of the open and closed states[66]. Early studies which isolate the binding properties of loops III and IV of cCaM through site-directed mutagenesis indicate that loop IV has a higher propensity for Ca2+ binding than loop III[56, 124]. A similar approach indicates that the affinities of nCaM’s loops are comparable, with loop I reported to have only 1.5 times higher affinity than loop II[15]. On the other hand, isolating the loops by grafting them to a scaffold suggests a different order, with loop I having the highest affinity and loop III binding Ca2+ more tightly than loop IV[221]. Validation of our simulation results would benefit from experimental clarification of the relative binding affinities of the loops of CaM.

The effective binding free energies represent average properties over an ensemble that may include a broader range of conformations than those near the open and closed state minima. As shown in Fig. 5.3.2, the conformational ensemble for binding to nCaM’s loops is two-state, while binding to cCaM’s loops includes a contribution from a partially unfolded basin as well. The appearance of an unfolded intermediate in the domain opening transition of Ca2+-free cCaM was first reported by Chen and Hummer[36]. Distinct ensembles for nCaM and cCaM are consistent with the simulated transition mechanisms of the domains in the absence of Ca2+[142]. Although the four-state description of MWC is an approximation (especially for cCaM), its use is validated by the accurate description of the populations of different ligation states for simulations of two ligand binding discussed in the next section.

5.4 Simulations of Binding Two Ligands

We turn now to simulations in which both binding sites are accessible to the ligands. The mean number of bound ligands as a function of concentration, shown in Fig. 5.3.1 (C), indicates that the effective dissociation constant of nCaM, $K_d(\mathrm{nCaM})/\bar{c}_0 = 2.2 \times 10^{-2}$, is roughly three times larger than the dissociation constant of cCaM, $K_d(\mathrm{cCaM})/\bar{c}_0 = 8.8 \times 10^{-3}$. For both domains, the mid-point concentration for binding two ligands is smaller than the dissociation constants for the individual binding sites in the domain. The finding that cCaM has greater overall binding affinity than nCaM agrees qualitatively with experiments[14, 188]. Additionally, the estimated value of $K_d(\mathrm{nCaM})$ is within the experimentally reported range of approximately 6 – 10 times $K_d(\mathrm{cCaM})$[98, 211, 116].

Binding curves calculated within the MWC model are also shown in Fig. 5.3.1 (C). Denoting the two binding sites in the domains as site A (loop I or loop III) and site B (loop II or loop IV), we calculate $\langle n_b(\mu) \rangle = p_b^{A0}(\mu) + p_b^{0B}(\mu) + 2p_b^{AB}(\mu)$ where

$$p_b^{A0}(\mu) = \left(e^{-\beta(\epsilon_c^A - \mu)} + e^{-\beta(\epsilon + \epsilon_o^A - \mu)}\right) / Z_2 \tag{5.9}$$

$$p_b^{0B}(\mu) = \left(e^{-\beta(\epsilon_c^B - \mu)} + e^{-\beta(\epsilon + \epsilon_o^B - \mu)}\right) / Z_2 \tag{5.10}$$

$$p_b^{AB}(\mu) = \left(e^{-\beta(\epsilon_c^A + \epsilon_c^B - 2\mu)} + e^{-\beta(\epsilon + \epsilon_o^A + \epsilon_o^B - 2\mu)}\right) / Z_2 \tag{5.11}$$

are the probabilities for the conformational ensemble with site A occupied and site B empty, site B occupied and site A empty, and both binding sites simultaneously occupied, respectively. Here, $Z_2 = Z_c + e^{-\beta\epsilon} Z_o$ denotes the two ligand partition function with

$$Z_i = 1 + e^{-\beta(\epsilon_i^A - \mu)} + e^{-\beta(\epsilon_i^B - \mu)} + e^{-\beta(\epsilon_i^A + \epsilon_i^B - 2\mu)}, \quad i = (o, c), \tag{5.12}$$

where, for example, $\epsilon_c^A$ and $\epsilon_o^A$ denote the binding free energies to loop A (described in the previous section). The agreement between the MWC model and simulated binding curves is excellent, indicating that the binding cooperativity of the simulations is well characterized by the MWC model.
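The two-ligand populations of Eqs. 5.9 to 5.11 can be evaluated directly from the single-loop parameters of Table 5.3.1. The sketch below makes the same assumption for $k_BT$ as before ($T_{\mathrm{sim}} = 0.8 \times 329$ K, $k_B = 0.0019872$ kcal/(mol K)) and uses hypothetical function names. The partition function sums over all four ligation states in each conformational basin, which is required for the populations to be normalized.

```python
import math

KT = 0.0019872 * 0.8 * 329.0  # kcal/mol; assumes T_sim = 0.8 x 329 K
EPS = 4.0 * KT

# (eps_c, eps_o) in kcal/mol for site A and site B, taken from Table 5.3.1.
DOMAINS = {
    "nCaM": ((-1.3, -3.1), (-1.3, -2.9)),  # loops I and II
    "cCaM": ((-1.3, -4.1), (-0.5, -3.0)),  # loops III and IV
}


def ligation_populations(x, site_a, site_b):
    """Return (p_ub, p_A0, p_0B, p_AB) at relative concentration x = exp(mu / kT)."""
    gap = math.exp(-EPS / KT)
    a_c, a_o = (x * math.exp(-e / KT) for e in site_a)
    b_c, b_o = (x * math.exp(-e / KT) for e in site_b)
    z_c = 1.0 + a_c + b_c + a_c * b_c
    z_o = 1.0 + a_o + b_o + a_o * b_o
    z2 = z_c + gap * z_o
    return ((1.0 + gap) / z2, (a_c + gap * a_o) / z2,
            (b_c + gap * b_o) / z2, (a_c * b_c + gap * a_o * b_o) / z2)


def mean_bound(x, site_a, site_b):
    """Mean number of bound ligands, p_A0 + p_0B + 2 p_AB."""
    _, p_a0, p_0b, p_ab = ligation_populations(x, site_a, site_b)
    return p_a0 + p_0b + 2.0 * p_ab


def midpoint(site_a, site_b):
    """Relative concentration at which the mean number of bound ligands equals 1."""
    gap = math.exp(-EPS / KT)
    (ea_c, ea_o), (eb_c, eb_o) = site_a, site_b
    doubly = math.exp(-(ea_c + eb_c) / KT) + gap * math.exp(-(ea_o + eb_o) / KT)
    return math.sqrt((1.0 + gap) / doubly)


MID = {name: midpoint(*sites) for name, sites in DOMAINS.items()}
```

Under these assumptions the midpoints come out near $2.3 \times 10^{-2}$ for nCaM and $8.4 \times 10^{-3}$ for cCaM, close to the effective dissociation constants quoted above, and the singly ligated populations at the midpoint are roughly 15% for each loop of nCaM and about 19% for loop III of cCaM.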

The MWC model also quantitatively captures the simulated populations of individual ligation states as shown in Fig. 5.4.1. Starting at low concentration, the growth of the singly ligated and the fully ligated states are concomitant. The fully loaded protein becomes increasingly stable, thereby reducing the singly ligated populations beyond a certain threshold. The probability of exclusive binding to either loop of nCaM is equal, attaining a maximum population of 15% each. In contrast, virtually all binding of the first ligand in cCaM occurs in loop III, reaching a maximum population of 20%. The near complete suppression of Ca2+ ligation exclusively to loop IV is due to its small relative binding affinity.

5.5 Binding Cooperativity

In cooperative binding, enhanced recruitment of a second ligand suppresses the population of singly ligated proteins thereby sharpening the binding curve. Within the assumptions of the MWC model, the shape of the binding curve is determined by two mechanisms. First, the greater stabilization of the open conformation over the closed conformation upon binding makes even binding a single ligand more sensitive to changes in concentration. The second source of cooperativity is the allosteric coupling provided by the assertion that the binding sites are either both open or both closed depending on the conformational state of the entire protein.

Figure 5.4.1: Populations of ligation states $p_b^{A0}(\mu)$ (blue), $p_b^{0B}(\mu)$ (green), and $p_b^{AB}(\mu)$ (red) plotted as a function of Ca2+ concentration, $\ln(c/\bar{c}_0)$, for nCaM (top; loops I and II) and cCaM (bottom; loops III and IV). Simulation data are shown as points. Solid curves plot Eqs. (5.9–5.11) from the MWC model. Dotted curves show plots of the non-cooperative induced fit model of binding to independent sites described by the partition function given in Eq. 5.13. Note that some data points are skipped for clarity.

Comparing the simulated binding curves to those produced from a model that neglects both of these cooperative assumptions gives a qualitative sense of the simulated binding cooperativity. We consider induced fit binding to independent binding sites. The corresponding binding probabilities, shown in Fig. 5.4.1, are calculated according to the partition function $Z_{\mathrm{IF}} = Z_A Z_B$ where

$$Z_\alpha = 1 + e^{-\beta(\epsilon_o^\alpha - \mu)} + e^{-\beta(\epsilon_c^\alpha - \mu)}, \quad \alpha = (A, B). \tag{5.13}$$

Compared to the simulations, the populations $p_b^{A0}(\mu)$ and $p_b^{0B}(\mu)$ for independent binding to loops I – III initiate growth at smaller concentrations relative to the midpoint of $p_b^{AB}(\mu)$, and achieve a greater maximum. Exclusive binding to loop IV does not develop significant population even when the loops are independent. Comparing the two domains, the singly ligated states are suppressed more in cCaM’s loop III than either of nCaM’s loops. Furthermore, the binding curve sharpens more in cCaM than nCaM. These comparisons show that the simulated binding is indeed cooperative with cCaM having greater binding cooperativity than nCaM in qualitative agreement with experiments[116, 127].
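The independent-site reference model of Eq. 5.13 can be sketched in the same way. The code assumes the fitted energies of Table 5.3.1 for loops I and II and $k_BT$ at $T_{\mathrm{sim}} = 0.8 \times 329$ K with $k_B = 0.0019872$ kcal/(mol K); the function name is hypothetical. Because $Z_{\mathrm{IF}}$ factorizes, the populations satisfy $p_b^{AB}\,p_{ub} = p_b^{A0}\,p_b^{0B}$ at every concentration, so this reference carries no coupling between the sites.

```python
import math

KT = 0.0019872 * 0.8 * 329.0  # kcal/mol; assumes T_sim = 0.8 x 329 K

# (eps_c, eps_o) in kcal/mol for loops I and II, taken from Table 5.3.1.
SITE_A, SITE_B = (-1.3, -3.1), (-1.3, -2.9)


def independent_populations(x, site_a, site_b):
    """Return (p_ub, p_A0, p_0B, p_AB) for Z_IF = Z_A * Z_B (Eq. 5.13)."""
    a = x * (math.exp(-site_a[0] / KT) + math.exp(-site_a[1] / KT))
    b = x * (math.exp(-site_b[0] / KT) + math.exp(-site_b[1] / KT))
    z_if = (1.0 + a) * (1.0 + b)
    return 1.0 / z_if, a / z_if, b / z_if, a * b / z_if


p_ub, p_a0, p_0b, p_ab = independent_populations(0.0031, SITE_A, SITE_B)
coupling = p_ab * p_ub / (p_a0 * p_0b)  # equals 1 for uncoupled sites
```

At the relative concentration used here, which is near the peak of the singly ligated curve for this parameter set, the population with only loop I occupied is close to 30%, roughly twice the MWC maximum of about 15% quoted for nCaM.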

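Figure 5.5.1: Thermodynamic cycle for binding two ligands. Binding to site A or site B of the unligated protein has equilibrium constant $K_A$ or $K_B$, and binding of the second ligand has equilibrium constant $c_{AB}K_B$ or $c_{AB}K_A$. The coupling is cooperative for $c_{AB} > 1$, uncooperative for $c_{AB} = 1$, and anti-cooperative for $c_{AB} < 1$.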

The strength of the binding cooperativity for each domain can be determined quantitatively by considering the thermodynamic cycle shown in Fig. 5.5.1. A Ca2+ ion can bind to either loop A or loop B of the unligated protein with equilibrium constant $K_A$ and $K_B$, respectively. The change in stability upon a Ca2+ ion binding to site B when site A is occupied, for example, can be expressed as $c_{AB}K_B$ where $c_{AB}$ represents the additional stability associated with the presence of a previously bound ligand to site A. A similar argument gives the equilibrium constant $c_{AB}K_A$ representing the change in stability when a Ca2+ ion binds to site A if site B is already occupied. The overall equilibrium constant of the fully ligated protein relative to the unligated protein is given by $K_2 = c_{AB}K_AK_B$ which corresponds to the binding free energy, $\Delta F = -k_BT \ln K_2$. The free energy associated with allosteric interactions between the ligands is therefore given by $\Delta F_{AB} = -k_BT \ln c_{AB}$.

Table 5.5.1: Simulated microscopic and macroscopic equilibrium constants

         KA (a)   KB (a)   cAB     K1 (a)   K2 (b)
nCaM     15.2     14.1     6.8     29.1     1.5 × 10^3
cCaM     40.0     6.7      29.6    46.7     8.0 × 10^3
(a) in units of c̄0^-1
(b) in units of c̄0^-2

In order to calculate $c_{AB}$ for the simulated transitions, we express the equilibrium constants of the thermodynamic cycle in terms of populations of ligation states

$$K_A = p_b^{A0}(\mu) / [p_{ub}(\mu)\,c], \qquad K_B = p_b^{0B}(\mu) / [p_{ub}(\mu)\,c] \tag{5.14}$$

and

$$c_{AB}K_AK_B = p_b^{AB}(\mu) / [p_{ub}(\mu)\,c^2], \tag{5.15}$$

where $p_{ub}(\mu)$ denotes the unbound population and $c$ stands for the ligand concentration. Solving for $c_{AB}$ gives

$$c_{AB} = \frac{p_b^{AB}(\mu)\,p_{ub}(\mu)}{p_b^{A0}(\mu)\,p_b^{0B}(\mu)} \tag{5.16}$$

in terms of the population of ligated states. The value of $c_{AB} = \exp(-\beta\Delta F_{AB})$ reflects the degree of cooperativity of the transition. When the sites are independent, $p_b^{AB}(\mu) = p_b^{A0}(\mu)\,p_b^{0B}(\mu)/p_{ub}(\mu)$, so that $c_{AB} = 1$ as expected for uncoupled sites. Although the right hand side of Eq. 5.16 can be evaluated directly from simulated populations, it is convenient to take advantage of the parameterization provided by the MWC model since it accurately describes the simulated equilibrium populations.

Using Eq. (5.9–5.11) with $p_{ub}(\mu) = (1 + e^{-\beta\epsilon})/Z_2$ leads to

$$c_{AB} = \frac{\left[1 + \exp(-\beta\epsilon)\right]\left[1 + \exp\left(-\beta(\epsilon + \Delta\epsilon^A + \Delta\epsilon^B)\right)\right]}{\left[1 + \exp\left(-\beta(\epsilon + \Delta\epsilon^A)\right)\right]\left[1 + \exp\left(-\beta(\epsilon + \Delta\epsilon^B)\right)\right]}, \tag{5.17}$$

with $\Delta\epsilon^A = \epsilon_o^A - \epsilon_c^A$ and $\Delta\epsilon^B = \epsilon_o^B - \epsilon_c^B$. The computed equilibrium constants, shown in Table 5.5.1, indicate that Ca2+ binding to cCaM (with $c_{AB} \approx 29.6$) is more cooperative than Ca2+ binding to nCaM (with $c_{AB} \approx 6.8$) in qualitative agreement with experiment[116, 127]. The cooperative free energy is estimated to be $\Delta F_{AB} \approx -3.4\,k_BT$ for cCaM and $\Delta F_{AB} \approx -1.9\,k_BT$ for nCaM. The cooperative free energy for cCaM is 1.8 times that of nCaM, in agreement with the experimentally measured range of relative free energies of 1.2 – 3 reported in Ref. [116] and Ref. [210].
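Eq. 5.17 depends only on $\epsilon$ and the differences $\Delta\epsilon^A$ and $\Delta\epsilon^B$, so it can be evaluated from the rounded entries of Table 5.3.1. The sketch below assumes $k_BT$ at $T_{\mathrm{sim}} = 0.8 \times 329$ K with $k_B = 0.0019872$ kcal/(mol K), and its function names are hypothetical. Because the tabulated energies are rounded to 0.1 kcal/mol, the results are expected to be close to, but not identical with, the values of $c_{AB}$ in Table 5.5.1.

```python
import math

KT = 0.0019872 * 0.8 * 329.0  # kcal/mol; assumes T_sim = 0.8 x 329 K
EPS = 4.0 * KT

# (eps_c, eps_o) in kcal/mol for site A and site B, taken from Table 5.3.1.
DOMAINS = {
    "nCaM": ((-1.3, -3.1), (-1.3, -2.9)),  # loops I and II
    "cCaM": ((-1.3, -4.1), (-0.5, -3.0)),  # loops III and IV
}


def boltzmann(energy):
    """Boltzmann factor exp(-energy / kT)."""
    return math.exp(-energy / KT)


def coupling_constant(site_a, site_b):
    """Eq. 5.17 with d_a = eps_o^A - eps_c^A and d_b = eps_o^B - eps_c^B."""
    d_a = site_a[1] - site_a[0]
    d_b = site_b[1] - site_b[0]
    numerator = (1.0 + boltzmann(EPS)) * (1.0 + boltzmann(EPS + d_a + d_b))
    denominator = (1.0 + boltzmann(EPS + d_a)) * (1.0 + boltzmann(EPS + d_b))
    return numerator / denominator


C_AB = {name: coupling_constant(*sites) for name, sites in DOMAINS.items()}
DF_AB = {name: -math.log(c) for name, c in C_AB.items()}  # in units of kT
```

With these inputs the sketch gives $c_{AB}$ of roughly 6 for nCaM and 30 for cCaM, and a ratio of cooperative free energies near 1.9.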

Binding thermodynamics determined from experiments that cannot distinguish between binding to individual sites are often reported through the macroscopic equilibrium constants $K_1 = K_A + K_B$ and $K_2 = c_{AB}K_AK_B$[1, 210, 184, 144]. The macroscopic equilibrium constants describing the simulated binding thermodynamics are shown in Table 5.5.1. The value of $K_1$ for cCaM is greater than $K_1$ for nCaM by a factor of 1.5 in agreement with the experimentally reported range of 1.2 – 2.2[116, 210]. The free energy of binding two Ca2+ ions can be estimated from the macroscopic binding constants summarized in Table 5.5.1, $\Delta G_{\mathrm{tot}} = -k_BT \log(K_1K_2)$. The simulated value of $\Delta G_{\mathrm{tot}}$ for cCaM is approximately 1.5 times the value of $\Delta G_{\mathrm{tot}}$ for nCaM, which is in agreement with the experimentally reported value of approximately 1.1 – 1.3[116, 91]. Taken together, the simulated values of the macroscopic binding constants for CaM are in qualitative agreement with those reported from experiments.

5.6 Molecular Description of Ligand Binding

The simulations offer a detailed molecular description of Ca2+ binding as well as insight into the conformational ensembles underlying the binding free energies, $\epsilon_c$ and $\epsilon_o$. Fig. 5.6.1 shows the root mean square fluctuations (rmsf) of each residue for the unligated (closed) ensemble at low ligand concentration and the fully saturated (open) ensemble at high ligand concentration. Focusing on nCaM, we see that helix A, the N-terminal end of helix B, and the B-C linker become more flexible upon Ca2+ binding, while helix C and helix D show little change in flexibility. The temperature factors of the corresponding regions in cCaM show qualitatively similar behavior.

All four binding loops, on the other hand, become more rigid upon Ca2+ coordination. The difference in flexibility upon binding is largest for loop IV due to its large fluctuations in the unligated ensemble. Greater entropic stabilization of loop IV in the unligated state explains its relatively small binding affinity[66]. Furthermore, accounting for differences in loop entropy completes the rationalization of the binding free energies to the loops of CaM: while the value of $\epsilon_o$ is dominated by the energetic stabilization of binding to the open state, the value of $\epsilon_c$ reflects the degree of conformational entropy of the loop in the unligated ensemble.

The flexibility of individual residues are local order parameters that characterizes

residue-specific conformational changes upon Ca2+- binding[200]. To qualitatively

understand CaM’s structural changes along the binding curve, we compare the fluc-

th tuations of the i residue to a two state reference rmsf, β0(i), given by average

(5.18) β (i) = f β (i) + (1 f )β (i), 0 h bi o − h bi c

where the rmsf of the open ensemble, βo(i), and the closed ensemble, βc(i), are

weighted by the fractional occupancy of the binding sites f = n /2. The struc- h bi h bi tural ordering of a residue at any concentration can be characterized as early or late

compared to mean flexibility β (i) evaluated at the corresponding value of f . For 0 h bi example, Fig. 5.6.1 shows the simulated rmsf of each residue at Kd, as well as the reference fluctuations evaluated at f = 1/2. Although the Ca2+-occupancy of the h bi binding loops is only 50%, the local environment of helix A and the B-C linker of

[Figure 5.6.1: two panels of rmsf (Å) versus sequence index. Top: nCaM (helices A-D, binding loops I and II). Bottom: cCaM (helices E-H, binding loops III and IV). Legend: Unligated, Mean, Fully-loaded, $\langle f_b \rangle = 0.5$.]

Figure 5.6.1: Simulated root mean square fluctuations (rmsf) for each residue for nCaM (top) and cCaM (bottom) calculated at different ligand concentrations: high ligand concentration gives the fully saturated ensemble (blue curve), low ligand concentration gives the unligated ensemble (red curve), and the ensemble at $K_d$ (black curve). The rmsf curves are calculated for each ensemble after aligning to the open native conformation. (Aligning to the closed conformation gives similar curves.) Also shown are the reference fluctuations given in Eq. 5.18 (green curve).

nCaM as well as corresponding helix E and F-G linker of cCaM is already similar to that of the open state ensemble. This “early” transition to the open ensemble is a reflection of the allosteric cooperativity. In contrast, the average structural order of the binding loops is similar to the weighted average of the open and closed state

flexibility. The exception is the β-sheet in the C-terminal end of loop IV which takes on the open state structure at higher ligand concentrations. This “late” transition is in harmony with its lower binding affinity.
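The early/late comparison against the two-state reference of Eq. 5.18 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the tolerance `tol`, and the rule used to decide "early" versus "late" (distance to the open-state rmsf compared with that of the reference) are hypothetical choices, not taken from the simulations described here.

```python
def reference_rmsf(beta_open, beta_closed, f_b):
    """Two-state reference rmsf of Eq. 5.18, evaluated residue by residue."""
    if not 0.0 <= f_b <= 1.0:
        raise ValueError("f_b must lie in [0, 1]")
    return [f_b * bo + (1.0 - f_b) * bc
            for bo, bc in zip(beta_open, beta_closed)]


def classify_ordering(beta_sim, beta_open, beta_closed, f_b, tol=0.1):
    """Label residues 'early', 'late', or 'average' relative to Eq. 5.18.

    tol is a hypothetical threshold (same units as the rmsf); it is not a
    value used in the text.
    """
    reference = reference_rmsf(beta_open, beta_closed, f_b)
    labels = []
    for sim, ref, bo in zip(beta_sim, reference, beta_open):
        if abs(sim - ref) <= tol:
            labels.append("average")
        elif abs(sim - bo) < abs(ref - bo):
            # Closer to the open ensemble than the weighted average.
            labels.append("early")
        else:
            # Further from the open ensemble than the weighted average.
            labels.append("late")
    return labels
```

With made-up numbers, `classify_ordering([1.9, 1.9], [2.0, 1.0], [1.0, 2.0], 0.5)` labels the first residue as early and the second as late.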

5.7 Concluding Remarks

In this paper, we introduce a method to simulate binding curves involving a protein that undergoes a conformational change upon binding. This approach allows us to identify the structural origins of binding affinity and to quantify allosteric cooperativity within a simple coarse-grained description of the protein dynamics. In this implicit ligand model, the protein conformation modulates the protein-ligand interactions through effective ligand-mediated contacts among residues in the binding site.

The influence of ligand concentration on the effective binding strength is described through its uniform chemical potential.

Applying this approach to CaM, we find that this model can distinguish the binding properties of the two domains of CaM: binding loops I and II of nCaM have similar affinities, while in cCaM, binding loop III has significantly greater affinity than loop IV. The broader range of binding affinities in cCaM accounts for its greater cooperativity. Simulated populations of the ligation states as a function of concentration are accurately described by the MWC model with appropriate binding free energies for the individual loops. These binding free energies are average properties of the simulated ensemble and are not obvious solely from the open and closed structures. While the simulated binding thermodynamics is well-described by the MWC model, this simple analysis can obscure complexities in the free energy landscape. In a separate publication, we describe how subtle differences in the topology and stability of the two domains lead to distinct simulated mechanisms for Ca2+-free domain opening for nCaM and cCaM[142]. In particular, we find that cCaM unfolds more readily than nCaM during the domain opening transition under similar conditions, a result consistent with the lower thermal stability of the C-terminal domain in the intact protein[127, 164]. Although the unfolded conformations play a minor role in the binding thermodynamics described in this paper (aside from modifying the binding free energies to the open and closed states), global folding and unfolding in the domain opening transition likely has a significant qualitative influence on the binding kinetics.

Chapter 6

Thermodynamic and kinetic representations of cooperative allosteric

ligand binding in intact calmodulin

6.1 Introduction

Conformational dynamics plays a key role in the ligand binding process and subsequent activation of several proteins. A classic example is the Ca2+ binding protein Calmodulin (CaM), which binds two Ca2+ ions to each of its topologically similar globular domains. In the intact protein, the two domains are coupled by a flexible linker region. This coupling gives rise to an interaction energy between the simulated ensemble of ligation states. The bound ensemble of conformations consists of unligated, partially ligated (one, two or three sites occupied), and fully loaded states. Ignoring the conformational state of the protein and only considering the ligation state, the ensemble consists of 16 distinct states. During the conformational change, the binding sites change their conformation in a concerted manner. This gives rise to multi-body interaction energies between the ligated states. Characterizing such microscopic interactions between these ligation states and predicting free energy stabilization due to cooperativity is challenging from direct experimental measurements. In this work, I provide a scheme to rationalize the microscopic interactions from two distinct, albeit complementary, approaches to describe the ligand binding mechanism.

The work presented here is motivated in part by predictions from our previous thermodynamic results on Ca2+ binding cooperativity in individual domains of CaM[141]. In particular, I predict heterogeneous binding strengths of the loops, with loop III having the highest affinity and loop IV being the weakest. Here, I test

whether intact CaM retains the same order of binding strength among its constituent

loops. In previous work, I also predict higher cooperativity for isolated C-terminal do-

main compared to the N-terminal domain. However, it remains unclear how the two

domains interact and how cooperative interaction (if any) results in overall stabilization of fully loaded CaM. Here, I focus on providing complementary thermodynamic and kinetic descriptions of the binding process. The thermodynamic picture is useful because it provides insight into the nature of cooperative interactions between ligation states, which can be compared to experiments directly focused on resolving site-specific binding affinities in intact CaM[221]. On the other hand, a kinetic description of the binding process helps in understanding the microscopic complexity, albeit at a more detailed level than current experimental resolution.

From an experimental point of view, resolving binding strengths of individual sites

as well as binding strengths of each domain in the intact protein is challenging. Studies

designed to resolve domain-specific properties are often limited by subsequent confor-

mational change of the protein[196, 39]. Available experimental data pertaining to the

intact domain suggest a wide range of interpretations for the inter-domain coupling energy[184, 127, 206]. A few experiments suggest weak coupling energy between the two domains, while other reports even suggest negative coupling[184]. Deciphering all

this information within the framework of a single molecular model is challenging. I

aim to shed light on the microscopic picture of the binding mechanism by simplifying

the complex binding network in terms of dominant pathways that contribute to the

overall Ca2+binding process in the protein. While this approximation does not take into account all aspects of complex coupling between the domains in the protein, it

provides insight into the binding mechanism which, in principle, can be validated by experimental measurements. It also predicts that while domain-specific properties are conserved in the intact protein, there is complexity that arises due to inter-domain

coupling. These predictions, in principle, can provide a rationale behind varying

observations from experimental reports addressing this question[196, 124, 164, 153,

161].

6.2 Methods

Calmodulin (CaM) is a small, 148 amino acid long protein consisting of two topologically similar domains. Each domain consists of four α-helices and a pair of EF-hand

Ca2+-binding loops. The N-terminal domain has helices labeled A – D with binding

loops I and II, and the C-terminal domain has helices labeled E – H with binding

loops III and IV. I simulate open/closed allosteric transitions of intact domain of

CaM using a native-centric model implemented in the Cafemol simulation package

developed by Takada and co-workers[96].

This model couples two energy basins, one biased to the open (pdb: 1cll[34])

reference structure and the other biased to the closed (pdb: 1cfd[103]) reference

structure. The energy of a conformation, specified by the N position vectors of the

C-α atoms of the protein backbone, $R = \{r_1, \cdots, r_N\}$, is given by

\[ V(R) = \frac{V_o(R) + V_c(R) + \Delta V}{2} - \sqrt{\frac{\left(V_o(R) - V_c(R) - \Delta V\right)^2}{4} + \Delta^2}, \tag{6.1} \]

where Vo(R) is the single basin potential defined by the open structure and Vc(R) is

the single basin potential defined by the closed structure of the intact domain of CaM

respectively. The interpolation parameters, ∆ and ∆V , control the barrier height and

the relative stability of the two basins. Parameters defining the single energy basins

are set to their default values with uniform contact strength.
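The interpolation in Eq. 6.1 can be checked numerically with a short Python sketch. The function and argument names below are illustrative and are not taken from the Cafemol package; the single-basin energies are passed in as plain numbers.

```python
import math


def multi_basin_energy(v_open, v_closed, delta, delta_v):
    """Energy of Eq. 6.1 given the two single-basin energies.

    delta couples the basins (and controls the barrier height); delta_v shifts
    the closed basin relative to the open one.
    """
    mean = 0.5 * (v_open + v_closed + delta_v)
    half_gap = 0.5 * (v_open - v_closed - delta_v)
    return mean - math.sqrt(half_gap ** 2 + delta ** 2)
```

In the limit of zero coupling the expression reduces to the lower of $V_o$ and $V_c + \Delta V$, and any non-zero coupling lowers the energy further, which smooths the crossing between the two basins.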

From experimental measurements, the relative populations of the closed (apo) and open (holo) states of the C-domain of intact CaM are approximately 90% and 10%, while the corresponding measurements for the N-domain are less evident[115]. To draw a correspondence between the simulation parameters and physiological measurements, the

coefficients of the interaction terms in the model are tuned to match experimentally

available data. As shown in Fig. 6.2.1, under the condition that the open state is

destabilized with respect to the closed state, the difference in temperatures that correspond to the peaks of the heat capacity curves of the two domains is approximately 18 K. This is in good agreement with experimental measurements of the folding temperatures of the N-terminal and C-terminal domains in intact CaM, 328 K and 315 K, respectively[164]. The simulation temperature is set to $T_{sim} = 300$ K. Equilibrium trajectories of length $10^8$ steps are simulated using Langevin dynamics with a friction coefficient of γ = 0.25 and a timestep of ∆t = 0.2 (in coarse-grained units)[149].

Calcium binding to the four EF-hand loops of intact CaM is modeled implicitly by

adding a potential defined from the ligand-mediated contacts in the EF-hand loops

of the open conformation

\[ V_{bind} = -c_{lig}\,\epsilon_{go} \sum_{i,j} \exp\left[-\frac{\left(r_{ij} - r_{ij}^{0}\right)^2}{2\sigma_{ij}^2}\right]. \tag{6.2} \]

Here, the sum is over pairs of residues that are each within 4.5 Å of a Ca2+-ion and closer than 10.0 Å in the open conformation. The binding energy parameters $c_{lig}$, $\epsilon_{go}$, and $\sigma$ are taken to be the same for each ligand-mediated contact for simplicity.

The transition barrier height is determined by ∆ which is set to 35.0 kcal/mol.

Adjusting ∆V = 7.5 kcal/mol while keeping other parameters fixed sets the relative

[Figure 6.2.1: heat capacity versus temperature (K), 300 K to 400 K, for nCaM and cCaM; panel label "Holo destabilized".]

Figure 6.2.1: Heat capacity as a function of temperature for N-domain and C-domain of CaM when the open (holo) state is destabilized to a relative population of ≈ 10%. Heat capacity is calculated using the WHAM[105] method.

stability between the closed and open states at approximately 90% and 10%. For the results reported in this paper, the binding energy parameters are set to $\epsilon_{go} = 0.35$, $c_{lig} = 2.5$ and $\sigma_{ij} = (0.1)r_{ij}^0$, where $r_{ij}^0$ is the corresponding separation distance in the open (holo) reference conformation. Binding and unbinding of a ligand are modeled through Monte Carlo steps, with acceptance probabilities obtained from the binding potential evaluated along the molecular dynamics simulations.
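As a minimal sketch of the implicit-ligand potential in Eq. 6.2, the following Python function evaluates $V_{bind}$ for one loop from a list of ligand-mediated contacts. The default parameter values are those quoted above; the function name and the `(r, r0)` pair representation of a contact are assumptions made for this illustration.

```python
import math


def binding_energy(contacts, c_lig=2.5, eps_go=0.35, sigma_fraction=0.1):
    """Evaluate Eq. 6.2 for one binding loop.

    contacts is an iterable of (r, r0) pairs: the current separation of a
    ligand-mediated contact and its separation in the open reference structure.
    """
    total = 0.0
    for r, r0 in contacts:
        sigma = sigma_fraction * r0  # sigma_ij = 0.1 * r_ij^0
        total += math.exp(-((r - r0) ** 2) / (2.0 * sigma ** 2))
    return -c_lig * eps_go * total
```

When every contact sits at its open-state distance, each Gaussian equals one and the binding energy is $-c_{lig}\,\epsilon_{go}$ times the number of contacts; it decays toward zero as the loop moves away from the open geometry.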

The simulated conformational ensembles are characterized structurally in terms of local and global order parameters based on the contacts formed in each sampled conformation. The set of native contacts in the open and closed conformations is separated into three groups: those that occur exclusively in the open native structure, those that occur exclusively in the closed native structure, and those that are common to both states. A native contact in a given conformation is considered to be formed provided the distance between the two residues is less than 1.2 times the corresponding distance in the native conformation. Local order parameters $q_{open}(i)$ and $q_{closed}(i)$ are defined as the fraction of native contacts involving the $i^{th}$ residue that occur exclusively in the open and closed native structures, respectively. Overall native similarity is monitored by corresponding global order parameters, $Q_{open} = \langle q_{open}(i) \rangle$ and $Q_{closed} = \langle q_{closed}(i) \rangle$, where the average is taken over the residues of the protein. I identify metastable conformational

basins from minima in the free energy computed through the population histogram

parameterized by Qopen and Qclosed.
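The contact-based order parameters described above can be sketched as follows. The data layout (dictionaries keyed by residue pairs) and the function names are hypothetical. Averaging only over residues that have at least one contact in the chosen set is an assumption of this sketch, since the text does not say how residues without such contacts are treated.

```python
def local_order_parameters(native_contacts, distances, n_residues, factor=1.2):
    """Return q(i) for each residue, or None where a residue has no contacts.

    native_contacts maps a residue pair (i, j) to its native distance;
    distances maps the same pairs to the distance in the sampled conformation.
    """
    total = [0] * n_residues
    formed = [0] * n_residues
    for (i, j), r_native in native_contacts.items():
        # A contact counts as formed below 1.2 times its native distance.
        is_formed = distances[(i, j)] < factor * r_native
        for residue in (i, j):
            total[residue] += 1
            formed[residue] += int(is_formed)
    return [f / t if t else None for f, t in zip(formed, total)]


def global_order_parameter(q_local):
    """Average q(i) over residues with at least one contact (assumption)."""
    values = [q for q in q_local if q is not None]
    return sum(values) / len(values)
```

Calling the first function once with the contacts exclusive to the open structure and once with those exclusive to the closed structure would give $q_{open}(i)$ and $q_{closed}(i)$, from which $Q_{open}$ and $Q_{closed}$ follow.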

In Ref. [141], I chose to treat binding and unbinding events on equal footing. While this choice is reasonable to calculate thermodynamic quantities, it is problematic for kinetics simulations as I explain here. A general scheme satisfying detailed balance can be written as

\[ \alpha_{0\to1} P_{0\to1}\, p_0 = \alpha_{1\to0} P_{1\to0}\, p_1. \tag{6.3} \]

Here, $p_0$ and $p_1$ denote the equilibrium probabilities for the protein to be in the unbound and bound state, respectively, while $\alpha$ and $P$ denote the corresponding attempt and acceptance probabilities.

In the Monte Carlo scheme, the acceptance probability is determined based on the detailed balance condition

\[ \frac{P_{0\to1}}{P_{1\to0}} = \frac{\alpha_{1\to0}\, p_1}{\alpha_{0\to1}\, p_0}. \tag{6.4} \]

Here, $P_{0\to1}$ and $P_{1\to0}$ denote acceptance probabilities for binding and unbinding, respectively. Eq. 6.4 can be expressed as

\[ \frac{P_{0\to1}}{P_{1\to0}} = \frac{\alpha_{1\to0}}{\alpha_{0\to1}} \exp\left[-(V_{bind} - \mu)/k_BT\right]. \tag{6.5} \]

For thermodynamics simulations, we attempt to choose the ligation state of each loop per Monte Carlo step. That is, $\alpha_{1\to0} = \alpha_{0\to1}$. If the loop is unligated, a ligand is introduced ($V \to V + V_{bind}$) with probability

\[ P_{0\to1} = \min\left[1, \exp\left[-(V_{bind} - \mu)/k_BT\right]\right]. \tag{6.6} \]

For unbinding transitions, if the loop is ligated, the ligand dissociates from the binding loop ($V + V_{bind} \to V$) with probability

\[ P_{1\to0} = \min\left[1, \exp\left[(V_{bind} - \mu)/k_BT\right]\right]. \tag{6.7} \]

Here, µ is the chemical potential of a bound ligand. At equilibrium, µ equals the chemical potential of the ligand in solution,

\[ \mu = k_BT \ln\left(\frac{c}{c_0}\right) + \mu_0, \tag{6.8} \]

where c is the ligand concentration, and $c_0$ and $\mu_0$ are the reference concentration and reference chemical potential, respectively. While this choice satisfying detailed balance gives the correct description of binding thermodynamics, it does not describe the correct concentration dependence of binding kinetics. Since binding is a bimolecular reaction and unbinding is unimolecular, $k_{bind} \propto c$, and $k_{unbind}$ is independent of concentration. The acceptance probabilities for binding and unbinding in Eq. 6.6 and Eq. 6.7 give an incorrect concentration dependence of both $k_{bind}$ and $k_{unbind}$. Accordingly, for kinetics simulations I choose a non-symmetric scheme for binding and unbinding that still satisfies detailed balance and gives reasonable binding and unbinding rates. Here, unbinding is independent of concentration, which suggests the acceptance probability

(6.9) P1→0 = min[1, exp [Vbind/kBT ]].

To satisfy detailed balance, the attempt probabilities become

(6.10) α0→1 = α1→0 exp [µ/kBT ] ,

with a binding acceptance probability that follows from Eqs. 6.5, 6.9, and 6.10,

\[ P_{0\to1} = \min\left[1, \exp\left[-V_{bind}/k_BT\right]\right]. \tag{6.11} \]

Eq. 6.10 is satisfied by attempting unbinding events every $\tau_0$ steps, while binding is attempted with probability $\tau_0^{-1} \exp[\mu/k_BT]$.
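The difference between the two Monte Carlo schemes can be made concrete with a short Python sketch that compares their effective binding and unbinding propensities (attempt probability times acceptance probability). The function names are illustrative. The binding acceptance used in the non-symmetric scheme, $\min[1, \exp(-V_{bind}/k_BT)]$, is the form obtained by combining Eqs. 6.5, 6.9, and 6.10, and it is stated here as an assumption of the sketch rather than as the implementation used in the simulations.

```python
import math


def chemical_potential(c, c0, mu0, kT):
    """Eq. 6.8: chemical potential of the ligand at concentration c."""
    return kT * math.log(c / c0) + mu0


def symmetric_scheme(v_bind, mu, kT):
    """Acceptance probabilities of Eqs. 6.6 and 6.7 (equal attempt rates)."""
    p_bind = min(1.0, math.exp(-(v_bind - mu) / kT))
    p_unbind = min(1.0, math.exp((v_bind - mu) / kT))
    return p_bind, p_unbind


def nonsymmetric_scheme(v_bind, mu, kT, tau0=1.0):
    """Effective (attempt x acceptance) propensities for binding and unbinding.

    Unbinding is attempted every tau0 steps and accepted per Eq. 6.9; binding
    is attempted with probability exp(mu/kT)/tau0 (Eq. 6.10).
    """
    attempt_unbind = 1.0 / tau0
    attempt_bind = math.exp(mu / kT) / tau0
    accept_unbind = min(1.0, math.exp(v_bind / kT))
    # Assumed form: the concentration enters only through the attempt step.
    accept_bind = min(1.0, math.exp(-v_bind / kT))
    return attempt_bind * accept_bind, attempt_unbind * accept_unbind
```

In this sketch both schemes give the same ratio of binding to unbinding propensities, $\exp[-(V_{bind} - \mu)/k_BT]$, so they share the same equilibrium, but only the non-symmetric scheme has a binding propensity proportional to $c$ and an unbinding propensity independent of $c$.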

[Figure 6.2.2: ln(Rate) versus $\ln(c/\bar{c}_0)$ for the binding rate and the unbinding rate.]

Figure 6.2.2: Binding and unbinding rates for loop I of N-domain of CaM as a function of concentration using the non-symmetric Monte Carlo simulation scheme. The x-axis and y-axis represent concentration and rate, respectively. Loops II, III, and IV show similar behavior (not shown).

This non-symmetric Monte Carlo scheme provides a reasonable description for the on and off rates for ligand binding. As shown in Fig. 6.2.2, the simulated on and off rates for a ligand have the expected concentration dependence in this non-symmetric scheme. I have checked that this scheme also gives the same thermodynamic behavior as calculated from the symmetric choice of probabilities. In previous work, Takada and co-workers provide a similar approach to simulate ligand binding to a protein with a single binding site in which the ligand concentration is controlled by changing the attempt rate for binding, $k_{on}$, while the off rate is independent of concentration[149, 115]. The non-symmetric Monte Carlo scheme directly incorporates the ligand concentration into the binding kinetics through the chemical potential.

[Figure 6.2.3: free energy contour plots of $Q_{open}$ versus $Q_{closed}$ for (A) N-domain and (B) C-domain.]

Figure 6.2.3: Free energy contours for the N-domain and C-domain in intact CaM. N-domain shows a dominant two-state transition, while C-domain populates an intermediate state showing a three-state transition behavior.

I characterize the population of the open and closed ensembles of N-domain and C-domain in intact CaM in terms of structural order parameters, $Q_{closed}$ and $Q_{open}$. As shown in Fig. 6.2.3, the simulated populations of ensembles in the N-domain are: closed ≈ 86%, open ≈ 11%, and a minor intermediate population, and those of the C-domain are: closed ≈ 88%, open ≈ 3%, and intermediate ≈ 8%. I checked the simulated populations against available experimental data that characterize the population of states under physiological conditions. The simulated population of unfolded intermediate is in good agreement with values reported by Bailey and coworkers[127].

6.3 Binding Thermodynamics

I first consider Ca2+ binding exclusively to each individual loop by simulating the conformational change of the entire domain while permitting binding only to a single site.

Table 6.3.1: Number of ligand-mediated contacts, and binding free energies for the loops of CaM.

Loop        N_con   K_d/c̄_0   K_d (M)        ε_c (kcal/mol)   ε_o (kcal/mol)
loop I      5       0.008     5 × 10^-7      -2.6             -3.7
loop II     5       0.0108    6.5 × 10^-7    -2.4             -3.5
loop III    8       0.004     2.4 × 10^-7    -2.6             -5.3
loop IV     5       0.155     9.3 × 10^-6    -0.95            -2.3

The MWC model provides insight into the affinities for the individual binding loops. The partition function for binding a single ligand within the MWC framework can be written as

\[ Z_1 = 1 + e^{-\beta(\epsilon_c - \mu)} + e^{-\beta\epsilon}\left(1 + e^{-\beta(\epsilon_o - \mu)}\right). \tag{6.12} \]

Here, $\epsilon_c$ and $\epsilon_o$ denote the binding free energies of the ligand to the closed and open ensemble, $\epsilon$ is the difference in stability between the unbound closed and open ensemble, and µ is the ligand chemical potential. The free energy parameters to the closed and open ensembles, $\epsilon_c$ and $\epsilon_o$, are uniquely determined by a simultaneous fit to the simulated bound and open populations (described in Ref. [141]). As shown in Table 6.3.1, the binding loops have heterogeneous strengths. Although there are discrepancies in reports of the order of binding strengths, their heterogeneous nature agrees with experiments[221, 15]. It is indeed intriguing that a uniform native-centric model (in terms of the inter and intramolecular interactions) is able to capture this heterogeneity of the loops.
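The single-site partition function of Eq. 6.12 translates directly into a bound fraction and a half-saturation chemical potential, as in the Python sketch below. The function names are illustrative. Any numerical value supplied for $\epsilon$ is an input chosen by the user; for example, taking $e^{-\beta\epsilon} = 1/9$ to mimic a 90%/10% closed/open split would be an assumption of the illustration, not a fitted parameter from the text.

```python
import math


def bound_fraction(mu, eps_c, eps_o, eps, kT):
    """Probability that the single site is occupied, from Eq. 6.12."""
    beta = 1.0 / kT
    w_closed = math.exp(-beta * (eps_c - mu))                      # closed, bound
    w_open = math.exp(-beta * eps) * math.exp(-beta * (eps_o - mu))  # open, bound
    z1 = 1.0 + w_closed + math.exp(-beta * eps) + w_open
    return (w_closed + w_open) / z1


def half_saturation_mu(eps_c, eps_o, eps, kT):
    """Chemical potential at which the bound fraction equals 1/2."""
    beta = 1.0 / kT
    unbound = 1.0 + math.exp(-beta * eps)
    bound_at_mu0 = math.exp(-beta * eps_c) + math.exp(-beta * (eps + eps_o))
    return kT * math.log(unbound / bound_at_mu0)
```

Using the Table 6.3.1 energies for loops III and IV with a common, assumed value of $\epsilon$, loop III reaches half saturation at a lower chemical potential than loop IV, in line with its smaller $K_d$.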

As shown in Fig. 6.3.1, the overall affinity of the intact domain is greater than

individual affinities for binding to the N-domain and C-domain. In particular, the

equilibrium simulations predict that $K_d^{N-domain} \approx 7 K_d^{intact}$ and $K_d^{C-domain} \approx 3 K_d^{intact}$. The overall higher affinity of intact CaM is probably a signature of positive inter-domain coupling assumed in this model, which enhances the free energy of stabilization of the bound ensemble in the intact protein.

[Figure 6.3.1: $\langle n_b \rangle$ (0 to 4) versus $\ln(c/\bar{c}_0)$ for intact CaM, N-domain, and C-domain.]

Figure 6.3.1: Fractional occupancy to the loops of intact domain (red), N-domain (blue) and C-domain (green). The solid lines represent the theoretical binding curves calculated using the MWC model with the parameters, $\epsilon_c$ and $\epsilon_o$, obtained from simulations.

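The simulations populate various singly, doubly, and triply ligated (partially saturated) ensembles. These simulated populations of ligation states are well characterized by the MWC model described in the previous chapter. The partition function for binding four ligands to the loops of CaM can be expressed as

\[ Z_{tot} = Z_c + e^{-\beta\epsilon} Z_o, \tag{6.13} \]

where $Z_c$ and $Z_o$ represent partition functions for the protein in the closed and open states, respectively,

\[ Z_i = 1 + \sum_{\alpha} e^{-\beta(\epsilon_i^{\alpha} - \mu)} + \sum_{\alpha<\beta} e^{-\beta(\epsilon_i^{\alpha} + \epsilon_i^{\beta} - 2\mu)} + \sum_{\alpha<\beta<\gamma} e^{-\beta(\epsilon_i^{\alpha} + \epsilon_i^{\beta} + \epsilon_i^{\gamma} - 3\mu)} + e^{-\beta(\epsilon_i^{\alpha} + \epsilon_i^{\beta} + \epsilon_i^{\gamma} + \epsilon_i^{\delta} - 4\mu)}, \tag{6.14} \]

where i = (o, c) labels open and closed states, and α, β, γ, δ = (I, II, III, IV) labels the loop identity. (The prefactor β in each exponent is the inverse temperature, $1/k_BT$, and is distinct from the loop index β.)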

Simulated ligation states are defined by whether a loop is occupied or empty. For example, state 1110 represents a triply ligated state where loops I, II, III are occupied and loop IV is vacant. Similarly, 0010 represents a singly ligated state where only loop III is occupied and other loops are empty. As shown in Fig. 6.3.2, the agreement between simulated populations and those obtained from the MWC model is excellent.

This close agreement suggests that the MWC model parameterized by the binding free energies, $\epsilon_c$ and $\epsilon_o$, provides a good description for the simulated populations.
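The populations of the 16 ligation states implied by Eqs. 6.13 and 6.14 can be enumerated with a short Python sketch. The binding free energies are the values in Table 6.3.1. The thermal energy `K_T` and the conformational free energy difference `EPS` (chosen so that the unligated protein is 90% closed and 10% open) are assumptions made for this illustration, as are all function and variable names.

```python
import math
from itertools import product

K_T = 0.596  # kcal/mol, assumed value of k_B T near 300 K

# Binding free energies (kcal/mol) for loops I-IV, from Table 6.3.1.
EPS_C = (-2.6, -2.4, -2.6, -0.95)
EPS_O = (-3.7, -3.5, -5.3, -2.3)

# Assumption: exp(-EPS / K_T) = 1/9, i.e. a 90%/10% closed/open split.
EPS = K_T * math.log(9.0)


def ligation_populations(mu, eps_c=EPS_C, eps_o=EPS_O, eps=EPS, kT=K_T):
    """Equilibrium probability of each ligation state, e.g. '1110'."""
    beta = 1.0 / kT
    weights = {}
    for state in product((0, 1), repeat=len(eps_c)):
        n_bound = sum(state)
        e_closed = sum(e for occupied, e in zip(state, eps_c) if occupied)
        e_open = sum(e for occupied, e in zip(state, eps_o) if occupied)
        # Closed-state term of Z_c plus the open-state term of exp(-beta*eps) Z_o.
        w_closed = math.exp(-beta * (e_closed - n_bound * mu))
        w_open = math.exp(-beta * eps) * math.exp(-beta * (e_open - n_bound * mu))
        weights["".join(str(s) for s in state)] = w_closed + w_open
    z_tot = sum(weights.values())
    return {label: w / z_tot for label, w in weights.items()}
```

Scanning `mu` over a range of values traces out population curves of the kind compared with the simulation data; at any fixed `mu` the singly ligated state 0010 (loop III) is more populated than 0001 (loop IV).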

Previous work described in Chapter 5 on isolated CaM domains predicted that loop

[Figure 6.3.2: bound probability versus $\ln(c/\bar{c}_0)$ for ligation states 0010, 1010, 0110, 0011, 1110, 0111, 1011, and 1111.]

Figure 6.3.2: Population of simulated ligation states as a function of concentration. The points represent simulated data and the solid lines show populations calculated using the MWC model. Colors represent different ligation states defined by whether a loop is occupied or empty.

III in cCaM has the highest affinity, while loop IV has the lowest binding strength.

The overall higher affinity of cCaM was rationalized by a greater difference in binding affinity of its constituent loops[141]. The simulations presented in this paper predict

that even when the two domains are coupled, loop III still retains its highest binding affinity among the constituent loops while loop IV has the lowest strength.

To illustrate the effect of heterogeneous binding strength on the overall cooperativity of the intact domain, I compare the populations of ligated ensembles to the scenario when the binding loops are of uniform affinity (the average of the binding free energies of the loops). As shown in Fig. 6.3.3, the peaks of partially ligated states for heterogeneous binding sites are lower compared to the corresponding peaks for sites with uniform binding strength. Binding cooperativity can be qualitatively described by suppressed population of partially ligated states. Hence, the heterogeneous nature of the binding loops accounts for overall binding cooperativity for intact CaM.

The suppressed population of partially loaded ligation states provides a macroscopic rationale for overall binding cooperativity of intact CaM. However, the simulations provide us with a more detailed description of the microscopic cooperative interaction summarized in terms of the MWC parameters, $\epsilon_c$ and $\epsilon_o$. I represent the partition function for binding four ligands as a function of explicit cooperative interactions. The multi-body cooperative interactions can then be estimated by identifying the corresponding quantities in the two descriptions. I write an alternative form of the partition function with multi-body cooperative interaction energies, $c_{\alpha\beta}$, $c_{\alpha\beta\gamma}$, and $c_{\alpha\beta\gamma\delta}$, where α, β, γ, δ denote loop identity,

\[ Z_{tot} = 1 + \sum_{\alpha} K_{\alpha} + \sum_{\alpha<\beta} c_{\alpha\beta} K_{\alpha} K_{\beta} + \sum_{\alpha<\beta<\gamma} c_{\alpha\beta\gamma} K_{\alpha} K_{\beta} K_{\gamma} + \sum_{\alpha<\beta<\gamma<\delta} c_{\alpha\beta\gamma\delta} K_{\alpha} K_{\beta} K_{\gamma} K_{\delta}. \tag{6.15} \]

Here, Kα, Kβ, Kγ, Kδ represent the equilibrium constants for binding to loops I, II,

III, and IV, respectively. Comparing Eq. 6.13 and Eq. 6.15, I identify the cooperativity

[Figure 6.3.3: two panels of bound probability versus $\ln(c/\bar{c}_0)$ with curves for singly occupied, doubly occupied, triply occupied, and fully loaded states.]

Figure 6.3.3: Population of singly, doubly, triply, and fully-loaded ligation states as a function of concentration for (top) heterogeneous binding loops and (bottom) homogeneous binding loops. The homogeneous binding strength is set to be the average values of binding free energies to the closed and open states of the four binding loops.

[Figure 6.3.4: free energy level diagram, ∆G (in kcal/mol), for the 16 ligation states from 0000 to 1111.]

Figure 6.3.4: Total free energy (in kcal/mol) of simulated ligation states of intact CaM. The blue lines represent free energy contribution for ligand binding to individual loops and the red lines show free energy stabilization due to multi-body cooperative interactions.

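parameters $c_{\alpha\beta}$, $c_{\alpha\beta\gamma}$, and $c_{\alpha\beta\gamma\delta}$. For example, the two-body cooperativity parameter, $c_{\alpha\beta}$, and the three-body cooperativity parameter, $c_{\alpha\beta\gamma}$, can be expressed as

\[ c_{\alpha\beta} = \frac{e^{-\beta(\epsilon_c^{\alpha} + \epsilon_c^{\beta})} + e^{-\beta\epsilon} e^{-\beta(\epsilon_o^{\alpha} + \epsilon_o^{\beta})}}{(1 + e^{-\beta\epsilon}) K_{\alpha} K_{\beta}}; \quad (\alpha \neq \beta) \tag{6.16} \]

\[ c_{\alpha\beta\gamma} = \frac{e^{-\beta(\epsilon_c^{\alpha} + \epsilon_c^{\beta} + \epsilon_c^{\gamma})} + e^{-\beta\epsilon} e^{-\beta(\epsilon_o^{\alpha} + \epsilon_o^{\beta} + \epsilon_o^{\gamma})}}{(1 + e^{-\beta\epsilon}) K_{\alpha} K_{\beta} K_{\gamma}}; \quad (\alpha \neq \beta \neq \gamma) \tag{6.17} \]

As shown in Fig. 6.3.4, the cooperative free energy is greatest for the transition from the unligated to the fully loaded state. It is interesting to note that although the fully loaded N-domain (state: 1100) and fully loaded C-domain (state: 0011) have similar overall stability, the contributions from binding to individual loops and cooperative interactions are distinct. Specifically, the cooperative free energy stabilization is approximately five times greater for the C-domain.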
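The two-body cooperativity parameter of Eq. 6.16 can be evaluated with the Python sketch below. The single-site constant $K_\alpha$ is not written out explicitly in the text; the form used here is the one obtained by matching the singly ligated terms of Eq. 6.13 and Eq. 6.15 at µ = 0, and it should be read as an assumption of this sketch. Function names are illustrative.

```python
import math


def site_constant(eps_c, eps_o, eps, kT):
    """Assumed single-site constant K_alpha (singly ligated weight at mu = 0)."""
    beta = 1.0 / kT
    bound = math.exp(-beta * eps_c) + math.exp(-beta * eps) * math.exp(-beta * eps_o)
    return bound / (1.0 + math.exp(-beta * eps))


def pair_cooperativity(site_a, site_b, eps, kT):
    """Two-body cooperativity parameter of Eq. 6.16.

    site_a and site_b are (eps_c, eps_o) tuples for two different loops.
    """
    beta = 1.0 / kT
    (ec_a, eo_a), (ec_b, eo_b) = site_a, site_b
    numerator = (math.exp(-beta * (ec_a + ec_b))
                 + math.exp(-beta * eps) * math.exp(-beta * (eo_a + eo_b)))
    denominator = ((1.0 + math.exp(-beta * eps))
                   * site_constant(ec_a, eo_a, eps, kT)
                   * site_constant(ec_b, eo_b, eps, kT))
    return numerator / denominator


def cooperative_free_energy(c, kT):
    """Free energy associated with a cooperativity parameter, -kT ln c."""
    return -kT * math.log(c)
```

With this definition $c_{\alpha\beta} = 1$ when neither loop distinguishes between the open and closed states, and $c_{\alpha\beta} > 1$ (a negative cooperative free energy) when both loops bind more strongly to the same conformational state.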

The total free energy stabilization for the binding transition to fully saturate the N-domain from the unligated state (while the C-domain is unoccupied) and to fully saturate the C-domain (while the N-domain is unoccupied) is ≈ −5.4 kcal/mol. On the other hand, the overall free energy stabilization to fully saturate intact CaM with four ligands bound to its four binding sites starting from the unligated state is approximately −13 kcal/mol. Hence, $|\Delta G^{intact}_{tot}| > |\Delta G^{N-domain}| + |\Delta G^{C-domain}|$. This extra free energy contribution can be attributed to inter-domain cooperative interaction.

Thus, the two domains are not independent in the intact protein and their interaction accounts for an additional free energy stabilization in the binding transition.

6.4 Binding Kinetics

Ligand binding simulations to intact CaM populate an ensemble of 16 ligation states

(I consider only the ligation state of the protein, irrespective of the conformational

state). The thermodynamic results predict that the binding loops of intact CaM are heterogeneous and there is additional free energy stabilization of the Ca2+-loaded intact protein due to interaction between N-domain and C-domain. My aim is to investigate the effect of these predictions on the kinetic binding routes.

Network models have been developed to predict dominant pathways for a protein’s folding transition[145, 16, 146]. Typically, in these models, transitions between sampled states are described in terms of a directed transition flux between a set of sampled ensembles of conformations. The overall folding rate is calculated in terms of an effective flux between the initial (unfolded) state and final (folded) state. I analyze kinetic binding results within this framework by describing the binding and unbinding transitions in terms of a directed flux between the unligated and fully loaded, and between fully loaded and unligated states, respectively. The starting point of this analysis is the rate matrix $K_{ij}$ which I calculate from simulations. Here, i and j represent different ligation states. The rate of transition from state i to state j, $k_{ij}$, can be expressed as[28]

\[ k_{i\to j} = \frac{N_{i\to j}}{\langle \tau_i \rangle \sum_{k \neq i} N_{i\to k}}, \tag{6.18} \]

where $\langle \tau_i \rangle$ is the mean time spent in state i between transitions, and $N_{i\to j}$ is the number of transitions from state i to state j. Directed transition flux between two ligation states, i and j, is given by

\[ f_{i,j} = \begin{cases} p_{eq}(i)\, k_{ij}\, (q_j^{+} - q_i^{+}) & \text{for } q_j^{+} > q_i^{+} \\ 0 & \text{for } q_i^{+} \geq q_j^{+} \end{cases}, \tag{6.19} \]

where $p_{eq}(i)$ is the equilibrium probability of state i, $k_{ij}$ is the rate of transition from state i to state j, and $q_j^{+}$ is the committor probability of state j. For binding transitions, the committor probability ($q_i^{+}$) of ligation state i is a measure of how likely the protein is to bind a ligand before it loses an already bound ligand. For unbinding transitions, the committor probability measures the probability of unbinding an already bound ligand before it can bind an additional ligand.
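A rate estimate in the spirit of Eq. 6.18 can be sketched from a sequence of visited ligation states and their dwell times. The input format (a list of `(state, dwell_time)` visits in time order) and the function name are assumptions of this illustration. The last visit is ignored because its exit transition is not observed.

```python
from collections import defaultdict


def estimate_rates(visits):
    """Estimate k_{i->j} from Eq. 6.18.

    visits is a time-ordered list of (state, dwell_time) pairs, where each
    entry is one uninterrupted stay in a ligation state.
    """
    counts = defaultdict(int)    # N_{i->j}
    dwell = defaultdict(list)    # dwell times of completed stays in state i
    for (state, time_in_state), (next_state, _) in zip(visits, visits[1:]):
        counts[(state, next_state)] += 1
        dwell[state].append(time_in_state)

    rates = {}
    for (i, j), n_ij in counts.items():
        n_out = sum(n for (a, _), n in counts.items() if a == i)
        mean_tau = sum(dwell[i]) / len(dwell[i])
        rates[(i, j)] = n_ij / (mean_tau * n_out)
    return rates
```

By construction the estimated rates out of a state sum to the inverse of its mean dwell time, so each $k_{i\to j}$ is the total escape rate weighted by the observed branching fraction.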

Quantitatively, the committor probability is given by the following equation

\[ \sum_{j \in I} k_{ij}\, q_j^{+} = -\sum_{j \in B} k_{ij}. \tag{6.20} \]

Here, I and B denote the ensemble of intermediate (i.e. partially ligated) states and the fully loaded state, respectively. By setting the committor probability value, $q_i^{+}$, to 0 for the unligated state and to 1 for the fully loaded state, the flux network is biased towards binding. To look at the symmetry between binding and unbinding mechanisms, I use a committor probability for unbinding that is similar to that of binding. Here, the committor probability is related to the committor probability for binding by $q_i^{-} = 1 - q_i^{+}$ for any ligation state i. The values of $q_i^{-}$ for the unligated and fully loaded states are set to 1 and 0, respectively. Thus, by replacing $q_i^{+}$ with $q_i^{-}$ in our analysis I bias the flux towards unbinding.
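The committor and the directed flux of Eq. 6.19 can be illustrated on a small network with the Python sketch below. It assumes that the diagonal of the rate matrix in Eq. 6.20 is $k_{ii} = -\sum_{j \neq i} k_{ij}$, so that each intermediate committor is the rate-weighted average of its neighbors, and it solves that condition by simple fixed-point iteration. The iteration, like the function names and the dictionary layout, is a choice made for this illustration and is not the solver used in the analysis.

```python
def committor(states, rates, source, sink, tol=1e-12, max_iter=100000):
    """Binding committor q+ with q+(source) = 0 and q+(sink) = 1.

    rates maps (i, j) to the transition rate k_ij for i != j.
    """
    q = {s: 0.5 for s in states}
    q[source], q[sink] = 0.0, 1.0
    for _ in range(max_iter):
        largest_change = 0.0
        for i in states:
            if i in (source, sink):
                continue
            out = [(j, k) for (a, j), k in rates.items() if a == i]
            total = sum(k for _, k in out)
            # Rate-weighted average of the neighboring committor values.
            new_value = sum(k * q[j] for j, k in out) / total
            largest_change = max(largest_change, abs(new_value - q[i]))
            q[i] = new_value
        if largest_change < tol:
            break
    return q


def directed_flux(p_eq, rates, q):
    """Directed flux of Eq. 6.19 for every transition in the network."""
    flux = {}
    for (i, j), k in rates.items():
        dq = q[j] - q[i]
        flux[(i, j)] = p_eq[i] * k * dq if dq > 0 else 0.0
    return flux
```

The unbinding committor follows as `1 - q[i]`, and passing it to `directed_flux` with source and sink exchanged gives the unbinding flux network.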

The binding/unbinding transitions involve two end-point states (unligated and fully loaded, and vice versa) and an ensemble of singly, doubly and triply occupied ligation states. A typical flux path (A) can be represented as $\{i_U \to i_1 \to i_2 \to \cdots \to i_L\}$, where $i_U$ represents the ensemble of the unligated state; $i_n$ represent intermediate ensembles of singly, doubly, or triply ligated states; and $i_L$ represents the ensemble of the fully loaded state. The overall flux of a transition path is calculated by

\[ F_{i_U \to i_B} = f_{i_U,i_1} \prod_{k=1}^{L-1} \frac{f_{k,k+1}}{f_k}. \tag{6.21} \]

Here, $F_{i_U \to i_B}$ is the total flux of path A for a binding transition, and $f_{k,k+1}$ is the flux for going from state k to state k+1. The flux entering and exiting a state i is $\sum_k f_{k,i}$ and $\sum_j f_{i,j}$, respectively. As a result of conservation of probability these two quantities are equal. The quantity $f_i$ for a state i is given by $f_i = \sum_k f_{k,i} = \sum_j f_{i,j}$.

The complete simulated flux network for binding/unbinding transitions is complex with up to $O(10^3)$ unique connecting routes from unligated to fully loaded states.
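The path flux of Eq. 6.21 can be evaluated from a dictionary of directed fluxes, as in the Python sketch below. The function name and the data layout are illustrative; $f_k$ is computed as the total flux leaving state k, which equals the flux entering it when probability is conserved.

```python
def path_flux(path, flux):
    """Flux carried by one path through the network, following Eq. 6.21.

    path is a sequence of states from the unligated to the fully loaded state;
    flux maps (i, j) to the directed flux f_{i,j}.
    """
    total = flux[(path[0], path[1])]
    for state, next_state in zip(path[1:], path[2:]):
        # f_k: total flux leaving the intermediate state.
        outgoing = sum(f for (i, _), f in flux.items() if i == state)
        total *= flux[(state, next_state)] / outgoing
    return total
```

In a small example where all flux leaves the unligated state through one intermediate that then splits 70/30 between a direct and an indirect route, the two path fluxes are 0.7 and 0.3 of the total and add up to the flux leaving the unligated state.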

To illustrate the concentration dependence of the binding pathways, I choose three concentration values along the binding transition curve such that the $\langle N_b \rangle$ values correspond to 25%, 50%, and 75% of the limiting value of average saturation. The simulated flux network showing all possible transitions at relatively high ligand concentration is shown in Fig. 6.4.1. The arrows represent directed flux from the unligated (state: 0000) to the fully ligated (state: 1111) state. The width of the arrows represents the amount of directed flux through a pair of states.

To decipher the relevant binding transitions at different concentration values and

to compare the dominant binding/unbinding transitions and paths, I look at the flux network for unbinding transitions at low ligand concentration, and the flux network for binding transitions at high ligand concentration. It is reasonable to expect symmetric binding and unbinding flux networks from this uniform model. I compare the binding

and unbinding flux networks at c = Kd to analyze whether the flux networks are

symmetric and if the dominant transition pathway remains the same.

Since the full flux network is complex with lots of transitions, I focus on capturing

approximately 90% of the total flux for a given network. In order to identify the

transition paths that constitute the majority of flux, I utilize the formalism of pathway

decomposition[145, 146, 132]. With this approach, a set of all possible flux pathways

(P ) connecting the unligated state (U) to the fully loaded state (B) is reduced using

a simple algorithm. First, I choose a single pathway (A) from the set P, where $A = \{i_U, i_1, i_2, \cdots, i_L\}$. Next, I find the transition containing the least flux, $f_{\min}$. If the flux capacity of pathway A is $f_{\min}$, then I subtract that flux from every transition in A, $f_{i_n,i_{n+1}} = f_{i_n,i_{n+1}} - f_{\min}$, where $i_n$ and $i_{n+1}$ are connected through a transition. I then repeat this algorithm for every pathway in the set P, removing pathways with $f_{\min} \le 0$, until there is no remaining flux. Hence, the final set of pathways is a representative group that conserves the total flux of the initial system. Since the set produced is not unique, I also use a selection rule: at each step, I choose the pathway whose $f_{\min}$ is greater than that of every other pathway.

Figure 6.4.1: Full flux network for binding transitions showing all possible transitions at c = Kd (top) and at c > Kd (bottom). The ensemble of states is grouped by degree of ligation (unbound, singly, doubly, triply, and fully ligated). The vertical axis represents the total free energy of stabilization (in kcal/mol). The width of the arrows represents the amount of flux through a pair of ligation states. Red represents a higher probability to bind and green represents a higher probability to unbind.
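The pathway decomposition described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used for the results in this chapter: the fluxes are stored in a dictionary keyed by directed state pairs, all simple paths are enumerated by brute force (adequate only for small networks), and the names `all_paths`, `bottleneck`, and `decompose` as well as the toy network are hypothetical.

```python
def all_paths(flux, source, target, eps=1e-12):
    """Enumerate simple paths source -> target over edges with flux > eps."""
    successors = {}
    for (i, j), f in flux.items():
        if f > eps:
            successors.setdefault(i, []).append(j)
    paths, stack = [], [[source]]
    while stack:
        path = stack.pop()
        if path[-1] == target:
            paths.append(path)
            continue
        for nxt in successors.get(path[-1], []):
            if nxt not in path:          # keep paths simple (no revisits)
                stack.append(path + [nxt])
    return paths


def bottleneck(flux, path):
    """Smallest flux f_min along the transitions of `path`."""
    return min(flux[(a, b)] for a, b in zip(path, path[1:]))


def decompose(flux, source, target, eps=1e-12):
    """Greedy decomposition: repeatedly remove the path with the largest f_min."""
    residual = dict(flux)                # leave the caller's network untouched
    result = []
    while True:
        paths = all_paths(residual, source, target, eps)
        if not paths:                    # no remaining flux connects the end states
            break
        best = max(paths, key=lambda p: bottleneck(residual, p))
        f_min = bottleneck(residual, best)
        for a, b in zip(best, best[1:]):
            residual[(a, b)] -= f_min    # subtract f_min from every transition
        result.append((best, f_min))
    return result


# Hypothetical toy network with two binding sites (arbitrary flux units).
toy_flux = {
    ("00", "10"): 3.0, ("00", "01"): 1.0,
    ("10", "11"): 3.0, ("01", "11"): 1.0,
}
pathways = decompose(toy_flux, "00", "11")
```

Each iteration removes at least one transition from the residual network, so the loop terminates, and the returned capacities sum to the flux that the decomposition assigns between the two end states.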

In the representations of flux networks shown in this chapter, the pathways connecting
the end point ligation states (unligated and fully loaded) are organized in terms of
their corresponding flux capacity. I note that while this framework may not provide
a “unique” result for the flux capacity of different paths for a given binding/unbinding

transition, it provides a convenient way to rank the flux paths and allows us to de-

termine the relevant connected paths for a given transition. As shown in Fig. 6.4.2,

the binding and unbinding networks under the same ligand concentration are symmetric except for a few low-probability ligation states that are involved in the binding but not in the unbinding transition. The dominant paths of the binding and unbinding networks are similar. At low Ca2+ concentration, there is a higher propensity for sequential binding; that is, the majority of pathways follow the pattern unligated → singly ligated → doubly ligated → triply ligated → fully loaded. At low, intermediate, and high Ca2+ concentration, the dominant binding route remains the same. However, the relative probabilities of the pathways differ with concentration. Interestingly, at high Ca2+ concentration, ligation states with loop IV occupied by a ligand have a higher probability of carrying non-zero flux. In contrast, at low Ca2+ concentration, these states carry only minor flux.

The individual transitions that constitute a transition path are consistent with the binding strengths of the loops. For example, for the unbinding transition from the fully loaded starting ligation state, loop IV comes off in the very first step while loop III comes off in the very last step of a typical route. This observation is consistent with the greatest affinity of loop III and the lowest affinity of loop IV found in the thermodynamic calculations presented earlier. Similarly, for the binding transition from the unligated starting state, loop III is occupied by a ligand in the very first step and a ligand binds to loop IV in the very last step. These results provide a complementary prediction of the binding strengths of the binding loops in the C-terminal domain of CaM. Unbinding transitions at low ligand concentration and binding transitions at high ligand concentration show a quite symmetric pattern in which the dominant transition path carrying the greatest amount of flux remains the same. As shown in Fig. 6.4.3, there is enhanced competition between pathways at higher ligand concentration, and the total flux capacity of a network is distributed among its constituent binding transition paths. At even higher ligand concentration, the competition among the constituent paths of a binding flux network increases. At such high concentration, the binding strengths of loops I, II, and III become comparable, and hence ligation states comprising these loops have similar contributions to the overall binding flux.

Figure 6.4.2: Simulated unbinding (left) and binding (right) flux networks at c = Kd. The ligation states are placed according to their total stabilization. The width of the arrows represents the amount of flux through a pair of states. For binding transitions, red represents a higher probability to bind and green represents a higher probability to unbind. For unbinding transitions, red represents a higher probability to unbind and green represents a higher probability to bind.

Figure 6.4.3: Simulated flux networks for unbinding transitions (left) at c < Kd and for binding transitions (right) at c > Kd. The ligation states are placed according to their total stabilization. The width of the arrows represents the amount of flux through a pair of states. For binding transitions, red represents a higher probability to bind and green represents a higher probability to unbind. For unbinding transitions, red represents a higher probability to unbind and green represents a higher probability to bind.

6.5 Concluding Remarks

In this chapter, I extended previous work on Ca2+-binding simulations of the isolated domains of CaM to include the entire protein. In earlier work, I predicted heterogeneous binding strengths of the binding loops of CaM, and rationalized the overall higher binding cooperativity of the C-terminal domain as a consequence of the greater difference in affinity of the domain’s constituent loops[141]. In this work, I offer a detailed analysis of the binding of four Ca2+ ions to intact CaM. The binding thermodynamics provides an account of the multi-body cooperative interactions of the simulated ensemble of unligated, partially ligated (singly, doubly, triply occupied), and fully loaded ligation states. In agreement with predictions for the isolated domains, the binding loops in intact CaM have heterogeneous binding strengths. The simulations exhibit a positive interdomain interaction between the N-domain and the C-domain, which results in an additional free energy of stabilization of the bound ensemble. Kinetic analysis provides a complementary viewpoint of the overall binding process. In particular, kinetic binding simulations provide insight into the possible binding routes and the dominant path involved in terms of a flux description of the overall binding mechanism.

At higher ligand concentration, there is enhanced competition between the binding routes. However, the dominant route remains the same at low, intermediate, and high ligand concentration. Ligation states that have loop III occupied by a ligand make a higher contribution to the overall flux. In harmony with the thermodynamic results, ligation states with loop III occupied by Ca2+ occur early in a typical binding route, while states with loop IV occupied by Ca2+ occur late.

Although the thermodynamic and kinetic accounts of Ca2+ binding shed light on the microscopic nature of the mechanism, the experimental resolution currently available to investigate these properties is rather limited. The predictions from the thermodynamic and kinetic analyses presented here would benefit from more direct experimental validation.

Chapter 7

Outlook and future directions

The primary goal of the research presented in my dissertation is to elucidate how a protein’s dynamics and flexibility affect its ligand binding mechanism using an explicit molecular model. Using the well-characterized Ca2+-binding allosteric protein calmodulin (CaM), I address several aspects of ligand binding to a protein coupled with conformational change.

I first focus on characterizing conformational dynamics in the isolated domains of CaM. This study examines the robustness of the idea that “a protein’s folding mechanism is determined by its native state topology” when applied to conformational transitions. The simulations suggest that two proteins with the same topology can have distinct transition mechanisms. Additionally, the results are consistent with those from an analytic model (with very different approximations) that predicts that the C-terminal domain of CaM (cCaM) undergoes local partial unfolding (or cracking) along the transition route. In contrast, the N-terminal domain (nCaM) remains folded along the transition route. The simulations place the cracking mechanism into the context of a broad temperature range, which influences the stability of unfolded conformations. These results are also important for understanding the function of CaM, which is a rich and well-characterized system where folded state dynamics and flexibility are essential to its function.

The coarse-grained simulations of Ca2+ binding to CaM help reconcile simple phenomenological models of binding thermodynamics and cooperativity with the underlying conformational ensemble at the molecular level. The simulations capture the population shift of the protein conformational ensemble as a function of ligand concentration. To my knowledge, cooperative interactions associated with conformational change have never been computed directly from simulated binding titration data. To this end, I provide a reliable fitting strategy to extract the MWC model binding parameters of individual binding loops of CaM. Remarkably, the classic MWC model fits the binding probabilities well in spite of large fluctuations in the open and closed ensembles. This approach and analysis can inform future studies of the binding of multiple ligands coupled with conformational change of a protein.

I extend the analysis scheme developed for Ca2+ binding to isolated CaM domains to investigate Ca2+ binding to intact CaM. Thermodynamic simulations predict heterogeneous strengths of CaM’s binding loops. I characterize the contribution of multi-body interaction energies of ligation states to the overall stability and cooperativity of the fully bound ensemble. From an experimental point of view, these questions are unsettled and there are conflicting reports in the literature. Hence, the predictions from the thermodynamic simulations presented here can be tested experimentally, which will provide deeper insight into the mechanistic details of CaM’s activation upon Ca2+ binding. The kinetic simulations provide a complementary description of the binding process in terms of the stability of ligation states as a function of concentration. The predictions from the kinetic studies presented here would benefit from further experimental studies aimed at characterizing binding routes.

Looking towards the future, gaining insight into the fundamental mechanisms governing allostery may eventually lead to the development of treatments for protein-related disease and to novel drug design. The work presented in this dissertation can also be extended to address some key questions about allostery. These coarse-grained simulations suggest that subtle changes in the topology of the domains of CaM lead to distinct conformational dynamics and thermodynamic binding properties. An important issue to address in the future is the effect of the domain opening rate on the binding kinetics of the domains. The binding mechanism (conformational selection or induced fit) and the overall rate to bind Ca2+ to the domains can be sensitive to the timescales of the open/closed conformational transition. This premise is particularly interesting to address because, as described in Chapter 4, the effective domain opening rate of cCaM is ≈ 50 times slower than the domain opening rate of nCaM due to transient unfolding and refolding along the transition route of cCaM. The binding rate may depend on the relative stability of the unfolded intermediate as well. From an experimental point of view, the kinetics of the individual domains of CaM has been addressed rather rarely, with notable exceptions[196, 164, 153, 161]. Given the wide range of experimental reports on the binding kinetics of nCaM and cCaM, it will be interesting to propose experimentally testable predictions for the Ca2+ binding kinetics of nCaM and cCaM. This model can also be extended to study the overall binding rate as well as the flux through conformational selection and induced fit pathways for binding. One important question to address in this context is the extent to which global domain opening affects Ca2+ binding to the domains. Kinetic aspects of binding a single ligand to a protein have been addressed by Takada and co-workers[149]. However, providing a comprehensive account of the detailed binding kinetics and binding rate for binding two ligands to each domain is considerably more challenging. I aim to shed light on this rich kinetic question.

Addressing the binding pathways that nCaM and cCaM take while binding Ca2+ is
another important kinetic question. An analytic model has been developed to investigate
the kinetics of ligand binding to a protein that undergoes a conformational

change[29]. This model describes allosteric kinetics as a stochastically gated chemical

reaction where the binding rate depends on the relative timescale of gating and ligand

encounter time[226]. In such a scenario, when the rate of conformational dynamics is

slow compared to the ligand encounter time the binding proceeds via conformational

selection, whereas fast conformational dynamics results in a binding rate that is an

average of binding to the closed and open states. Here, an important issue is to in-

vestigate how the simulated binding mechanism depends on the timescale of ligand

encounter. The free energy barrier height can be directly varied in the simulations.

An alternative choice is to fix the rate of conformational transition and vary the lig-

and encounter rate via the frequency of Monte Carlo binding events. This approach

can be used to provide a complementary dynamical explanation of the kinetic bind-

ing mechanism in terms of the range and strength of the protein-ligand contacts[149].

The effect of unfolded intermediate population on the binding mechanism of cCaM

can be investigated. In particular, ligands may bind to the unfolded state and sub-

sequently stabilize the open state or it may be unaffected by the presence of the

unfolded state. In any case, the binding mechanism of nCaM and cCaM may include

competing and/or mixed contributions from conformational selection and induced fit

pathways.

For the work on Ca2+ binding to intact CaM, I made the assumption that the two domains of CaM are strongly coupled. That is, both domains undergo the conformational transition in a concerted manner. Experimentally, this question has not yet been fully resolved. Park and co-workers find that in the presence of Ca2+, the timescale of the conformational transition in cCaM is more than two orders of magnitude faster than that of nCaM[153]. Since the timescale for Ca2+ binding to cCaM is similar to the timescale of the conformational transition in the absence of Ca2+[124], they suggest that Ca2+ binding to cCaM proceeds via conformational selection. On the other hand, binding to nCaM occurs on a much longer timescale. Negative coupling between the domains may be responsible for this kinetic difference. In this context, it will be interesting to relax the coupling between the domains, or even consider negative coupling, to probe its influence on the Ca2+ binding mechanism of intact CaM.

In the kinetic analysis described in Chapter 6, the conformational states (open or closed) of the simulated ligation states of CaM are ignored for simplicity. However, this only provides an approximate account of the binding kinetics. Including a structural order parameter that monitors the conformational state of each ligation state poses a detailed and complex kinetic problem. For example, one can characterize the kinetic flux of pathways through conformational selection and induced fit for each individual ligand. In light of the timescales of conformational transitions reported by Park and co-workers[153], it will be interesting to probe the detailed mechanism of Ca2+ binding to intact CaM and characterize how the extent of coupling between the domains affects the binding mechanism and the overall binding rate. Capturing all aspects of such details within a single topology-based model would be challenging, but it would be a first step towards developing a comprehensive coarse-grained molecular model of Ca2+ binding to CaM.

Appendices

Appendix A

Supplement for Chapter 4

A.1 Simulated probability of contact formation

To better understand contact formation in the ensemble of conformations sampled during the conformational transitions of nCaM and cCaM, I look at the pairwise contact formation probability between the secondary structure elements of the protein. As shown in Fig. A.1.1, the transition state ensemble of nCaM contains a few low-probability pairwise contacts that are present in the native state, while for cCaM the intermediate state ensemble contains several regions of the secondary structure with low contact probability. Since contacts common to the native open and closed states represent the structural integrity of the protein during the transition process, I look at the probability of the common contacts of nCaM in the transition state ensemble and of cCaM in the intermediate state ensemble.

In particular, some tertiary contacts between helices A/D and between helix A and the C-D linker region of nCaM have decreased probability in the transition state ensemble. In contrast, almost all the long-range tertiary contacts in cCaM have a low probability of formation in the intermediate state ensemble. Both nCaM and cCaM retain their secondary structure elements during the transition. The loss of long-range tertiary interactions for nCaM is localized and limited, while cCaM shows a global loss of almost all of its tertiary contacts.

Figure A.1.1: Simulated contact map for nCaM and cCaM showing the probability of formation of contacts between secondary structure elements of nCaM for the ensemble of conformations in the transition state (A), and of cCaM in the ensemble of conformations in the intermediate state (B). nCaM shows limited loss of contacts in the transition state ensemble, while the intermediate of cCaM involves several regions of low contact probability (highlighted in pink). Color represents the probability of contact formation (from 0 to 1). Secondary structures of nCaM and cCaM are shown along the x- and y-axes.

Appendix B

Supplement for Chapter 5

B.1 Ligand-mediated contact pair distribution

Figure B.1.1: Ligand-mediated contact pair distribution in the ensemble of open and closed states for loop I in nCaM. The x-axis represents contact distance (in Å) and the y-axis represents normalized count. The ligand-mediated contact pairs are (A) 17–27, (B) 18–22, (C) 19–27, (D) 21–27, (E) 22–28.

Distributions of the distance between ligand-mediated contact pairs in the binding loops of nCaM and cCaM, shown in Figs. B.1.1, B.1.2, B.1.3, and B.1.4, have distinct peaks in the open and closed states. In particular, the peaks of the distribution in the open state correspond to a shorter distance between the contact pairs than the peaks in the closed state. The distribution is calculated by computing the distance between a ligand-mediated contact pair for each frame along a trajectory file, and then calculating the distribution of distances separately in the open and closed states. The distinct peaks illustrate that the distance between contact pairs is sensitive to the binding of ligand.
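The distance distribution calculation described above can be sketched as follows. This is only an illustrative outline under several assumptions that are not specified in the text: the trajectory is already loaded as a list of frames with one (x, y, z) coordinate per residue, each frame has already been labeled as open or closed, residue indices are 0-based, and the histogram is scaled so that its tallest bin equals 1. All function names are hypothetical.

```python
import math


def pair_distances(frames, i, j):
    """Distance between residues i and j (0-based) in every frame.

    frames: list of frames, each a list of (x, y, z) coordinates.
    """
    return [math.dist(frame[i], frame[j]) for frame in frames]


def normalized_histogram(values, lo, hi, n_bins):
    """Histogram of `values` on [lo, hi), scaled so the tallest bin is 1."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        if lo <= v < hi:
            # min() guards against floating-point round-up at the upper edge.
            counts[min(int((v - lo) / width), n_bins - 1)] += 1
    peak = max(counts)
    return [c / peak for c in counts] if peak else [0.0] * n_bins


def open_closed_distributions(frames, is_open, i, j, lo=5.0, hi=20.0, n_bins=3):
    """Separate distance histograms for frames labeled open and closed."""
    d = pair_distances(frames, i, j)
    d_open = [x for x, flag in zip(d, is_open) if flag]
    d_closed = [x for x, flag in zip(d, is_open) if not flag]
    return (normalized_histogram(d_open, lo, hi, n_bins),
            normalized_histogram(d_closed, lo, hi, n_bins))


# Synthetic example: two "residues" separated along x by the listed distances.
separations = [6.0, 6.5, 12.0, 12.2, 12.4]
frames = [[(0.0, 0.0, 0.0), (s, 0.0, 0.0)] for s in separations]
labels = [True, True, False, False, False]   # hypothetical open/closed labels
hist_open, hist_closed = open_closed_distributions(frames, labels, 0, 1)
```

In the synthetic example the open frames populate the shortest-distance bin and the closed frames the next one, mirroring the qualitative separation of peaks described above.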

Figure B.1.2: Ligand-mediated contact pair distribution in the ensemble of open and closed states for loop II in nCaM. The x-axis represents contact distance (in Å) and the y-axis represents normalized count. The ligand-mediated contact pairs are (A) 54–60, (B) 54–61, (C) 55–60, (D) 56–60, (E) 56–61.

Figure B.1.3: Ligand-mediated contact pair distribution in the ensemble of open and closed states for loop III in cCaM. The x-axis represents contact distance (in Å) and the y-axis represents normalized count. The ligand-mediated contact pairs are (A) 18–27, (B) 18–28, (C) 19–26, (D) 19–27, (E) 19–28, (F) 20–27, (G) 20–28, (H) 21–26.

Figure B.1.4: Ligand-mediated contact pair distribution in the ensemble of open and closed states for loop IV in cCaM. The x-axis represents contact distance (in Å) and the y-axis represents normalized count. The ligand-mediated contact pairs are (A) 54–64, (B) 55–59, (C) 56–64, (D) 59–65, (E) 60–64.

B.2 One-dimensional simulated free energy

The simulated free energy in terms of the one-dimensional progress coordinate, $\Delta Q = Q_{\mathrm{closed}} - Q_{\mathrm{open}}$, shown in Fig. B.2.1, illustrates that for both domains the closed state is more stable in the unligated ensemble. Binding of the first ligand stabilizes both the closed and open states, but the high-affinity open state is stabilized to a greater extent due to its structural compatibility with the ligand. In the fully saturated ensemble, the open state has greater stability.

Figure B.2.1: Simulated free energy for nCaM (A,B,C) and cCaM (D,E,F) corresponding to the ensemble of unligated (top), singly ligated (middle) and fully saturated (bottom) conformations. The x-axis represents the simulated progress coordinate $\Delta Q = Q_{\mathrm{closed}} - Q_{\mathrm{open}}$ and the y-axis represents the simulated free energy in units of $k_B T$.

B.3 Exploring ligand contact strength and range

For the results presented in Chapter 5, I made specific choices for the ligand-mediated contact strength, $c_{\mathrm{lig}} = 2.5$, and interaction range, $\sigma_{ij} = (0.1)\, r_{ij}^{0}$. As shown in Fig. B.3.1, as $c_{\mathrm{lig}}$ and $\sigma_{ij}$ increase, the value of $K_d$ for individual loops decreases. However, the slope of the binding transition curve at the concentration for which $p_{\mathrm{bound}} = 0.5$ remains the same. Here, I only show results for binding loop I as an illustration.
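One way to see why a change in $K_d$ alone need not change the slope at half saturation is to consider an idealized Hill-type curve plotted against $\ln c$. The sketch below is purely illustrative: the functional form, the Hill coefficient `n`, and the function names are assumptions introduced here, and the simulated binding curves are not claimed to follow this form exactly.

```python
import math


def p_bound(ln_c, ln_kd, n=1.0):
    """Hill-type bound probability c^n / (c^n + Kd^n), written in terms of ln(c)."""
    return 1.0 / (1.0 + math.exp(-n * (ln_c - ln_kd)))


def slope_at_midpoint(ln_kd, n=1.0, h=1e-5):
    """Central-difference estimate of dp/d(ln c) at p = 0.5, i.e. at c = Kd."""
    upper = p_bound(ln_kd + h, ln_kd, n)
    lower = p_bound(ln_kd - h, ln_kd, n)
    return (upper - lower) / (2.0 * h)


# Two hypothetical dissociation constants that differ by many orders of magnitude.
slope_weak = slope_at_midpoint(ln_kd=-2.0)
slope_strong = slope_at_midpoint(ln_kd=-10.0)
```

For this assumed form, changing $K_d$ only translates the curve along the $\ln c$ axis, so the slope at the midpoint stays at $n/4$ regardless of $K_d$.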

Figure B.3.1: Simulated binding curves for loop I of nCaM with varying $c_{\mathrm{lig}}$ (left) and $\sigma_{ij}$ (right). The x-axis represents $\ln(c/\bar{c}_0)$ and the y-axis represents the bound probability. Consistent behavior is observed for other binding loops (data not shown).

Bibliography

[1] G. S. Adair, with the collaboration of A. V. Bock and H. Field Jr. “THE

HEMOGLOBIN SYSTEM: VI. THE OXYGEN DISSOCIATION CURVE OF

HEMOGLOBIN”. In: Journal of Biological Chemistry 63.2 (1925), pp. 529–

545. eprint: http://www.jbc.org/content/63/2/529.full.pdf+html. url:

http://www.jbc.org/content/63/2/529.short.

[2] E. Alm and D. Baker. “Prediction of protein-folding mechanisms from free-

energy landscapes derived from native structures”. In: Proceedings of the Na-

tional Academy of Sciences of the United States of America 96.20 (1999),

pp. 11305–11310. eprint: http://www.pnas.org/content/96/20/11305.

full.pdf+html. url: http://www.pnas.org/content/96/20/11305.

abstract.

[3] Andrea Amadei, Antonius Linssen, and Herman JC Berendsen. “Essential dy-

namics of proteins”. In: Proteins: Structure, Function, and Bioinformatics 17.4

(1993), pp. 412–425.

[4] Christian B Anfinsen. “Principles that govern the folding of protein chains”.

In: Science 181.4096 (1973), pp. 223–230.

[5] Christian B Anfinsen et al. “The kinetics of formation of native ribonuclease

during oxidation of the reduced polypeptide chain”. In: Proceedings of the

National Academy of Sciences 47.9 (1961), pp. 1309–1314.

[6] Karunesh Arora and Charles L Brooks. “Large-scale allosteric conformational

transitions of adenylate kinase appear to involve a population-shift mecha-

nism”. In: Proceedings of the National Academy of Sciences 104.47 (2007),

pp. 18496–18501.

[7] AR Atilgan et al. “Anisotropy of fluctuation dynamics of proteins with an

elastic network model”. In: Biophysical Journal 80.1 (2001), pp. 505–515.

[8] Artem Badasyan, Zhirong Liu, and Hue Sun Chan. “Probing possible downhill

folding: native contact topology likely places a significant constraint on the

folding cooperativity of proteins with 40 residues”. In: Journal of molecular

biology 384.2 (2008), pp. 512–530.

[9] Ivet Bahar, Ali Rana Atilgan, and Burak Erman. “Direct evaluation of ther-

mal fluctuations in proteins using a single-parameter harmonic potential”. In:

Folding and Design 2.3 (1997), pp. 173–181.

[10] Ivet Bahar and AJ Rader. “Coarse-grained normal mode analysis in structural

biology”. In: Current Opinion in Structural Biology 15.5 (2005), pp. 586–592.

[11] Ivet Bahar et al. “Global dynamics of proteins: bridging between structure

and function”. In: Annual Review of Biophysics 39 (2010), p. 23.

[12] D. Baker. “A surprising simplicity to protein folding”. In: Nature 405.6782

(2000), pp. 39–42.

[13] G Barbato et al. “Backbone dynamics of calmodulin studied by 15N relaxation

using inverse detected two-dimensional NMR spectroscopy: the central helix

is flexible”. In: Biochemistry 31.23 (1992), pp. 5269–78.

[14] P M Bayley, W A Findlay, and S R Martin. “Target recognition by calmod-

ulin: dissecting the kinetics and affinity of interaction using short peptide se-

quences”. In: Protein Sci. 5.7 (1996), pp. 1215–1228.

[15] Maria Rosa Beccia et al. “Thermodynamics of Calcium binding to the Calmod-

ulin N-terminal domain to evaluate site-specific affinity constants and cooper-

ativity”. In: Journal of Biological Inorganic Chemistry 20.5 (2015), pp. 905–

919.

[16] Alexander Berezhkovskii, Gerhard Hummer, and Attila Szabo. “Reactive flux

and folding pathways in network models of coarse-grained protein dynamics”.

In: The Journal of chemical physics 130.20 (2009), 05B614.

[17] Robert B Best, Yng-Gwei Chen, and Gerhard Hummer. “Slow protein con-

formational dynamics from multiple experimental structures: the helix/sheet

transition of arc repressor”. In: Structure 13.12 (2005), pp. 1755–63. doi:

10.1016/j.str.2005.08.009.

[18] David D Boehr, Ruth Nussinov, and Peter E Wright. “The role of dynamic

conformational ensembles in biomolecular recognition”. In: Nat. Chem. Biol.

5.11 (2009), pp. 789–96. doi: 10.1038/nchembio.232.

[19] C Bohr, K Hasselbalch, and August Krogh. “Über einen in biologischer Beziehung
wichtigen Einfluss, den die Kohlensäurespannung des Blutes auf dessen
Sauerstoffbindung übt”. In: Acta Physiologica 16.2 (1904), pp. 402–412.

[20] C Branden and J Tooze. “Introduction to protein science”. In: Garland, New

York (1991).

[21] Charles L Brooks, José N Onuchic, and David J Wales. “Taking a walk on a

landscape”. In: Science 293.5530 (2001), pp. 612–613.

[22] Scott Brown, Nicolas J Fawzi, and Teresa Head-Gordon. “Coarse-grained se-

quences for protein folding and design”. In: Proceedings of the National Academy

of Sciences 100.19 (2003), pp. 10712–10717.

[23] J Peter Browne et al. “The role of β-sheet interactions in domain stability,

folding, and target recognition reactions of calmodulin”. In: Biochemistry 36.31

(1997), pp. 9550–9561.

[24] M Brunori and G Careri. “Allosteric proteins: 40 Years with Monod-Wyman-

Changeux”. In: Proceedings of the Accademia Nazionale dei Lincei. Rome,

Italy (2006).

[25] J D Bryngelson and P G Wolynes. “Spin glasses and the statistical mechanics

of protein folding”. In: Proc. Natl. Acad. Sci. U. S. A. 84.21 (1987), pp. 7524–

8.

[26] J D Bryngelson et al. “Funnels, pathways, and the energy landscape of protein

folding: a synthesis”. In: Proteins 21.3 (1995), pp. 167–95. doi: 10.1002/prot.

340210302.

[27] JD BRYNGELSON and PG WOLYNES. “INTERMEDIATES AND BAR-

RIER CROSSING IN A RANDOM ENERGY-MODEL (WITH APPLICA-

TIONS TO PROTEIN FOLDING)”. In: J. Phys. Chem. 93.19 (Sept. 1989),

pp. 6902–6915.

[28] N.V. Buchete and G. Hummer. “Coarse master equations for peptide folding

dynamics”. In: The Journal of Physical Chemistry B 112.19 (2008), pp. 6057–

6069.

[29] Lu Cai and Huan Xiang Zhou. “Theory and simulation on the kinetics of

protein-ligand binding coupled to conformational change”. In: J. Chem. Phys.

134.10 (2011), p. 105101. doi: 10.1063/1.3561694.

[30] Hue Sun Chan and Ken A Dill. “Protein folding in the landscape perspective:

Chevron plots and non-Arrhenius kinetics”. In: Proteins: Structure, Function,

and Bioinformatics 30.1 (1998), pp. 2–33.

[31] Jean Pierre Changeux. “Allostery and the Monod-Wyman-Changeux model

after 50 years”. In: Annual Review of Biophysics 41 (2012), pp. 103–133.

[32] Jean-Pierre Changeux. “The feedback control mechanism of biosynthetic L-

threonine deaminase by L-isoleucine”. In: Cold Spring Harbor symposia on

quantitative biology. Vol. 26. Cold Spring Harbor Laboratory Press. 1961,

pp. 313–318.

[33] Jean-Pierre Changeux and Stuart J Edelstein. “Allosteric mechanisms of signal

transduction”. In: Science 308.5727 (2005), pp. 1424–1428.

[34] R Chattopadhyaya et al. “Calmodulin structure refined at 1.7 Å resolution”.

In: J. Mol. Biol. 228.4 (1992), pp. 1177–92.

[35] Leslie L Chavez et al. “Multiple routes lead to the native state in the energy

landscape of the β-trefoil family”. In: Proceedings of the National Academy of

Sciences 103.27 (2006), pp. 10254–10258.

[36] Yng Gwei Chen and Gerhard Hummer. “Slow conformational dynamics and

unfolding of the calmodulin C-terminal domain”. In: J. Am. Chem. Soc. 129.9

(2007), pp. 2414–5. doi: 10.1021/ja067791a.

[37] Margaret S Cheung, Dmitri Klimov, and D Thirumalai. “Molecular crowd-

ing enhances native state stability and refolding rates of globular proteins”.

In: Proceedings of the National Academy of Sciences of the United States of

America 102.13 (2005), pp. 4753–4758.

[38] M.S. Cheung, A.E. García, and J.N. Onuchic. “Protein folding mediated by

solvation: water expulsion and formation of the hydrophobic core occur after

the structural collapse”. In: Proceedings of the National Academy of Sciences

of the United States of America 99.2 (2002), p. 685.

[39] J J Chou et al. “Solution structure of Ca(2+)-calmodulin reveals flexible hand-

like properties of its domains”. In: Nat. Struct. Biol. 8.11 (2001), pp. 990–7.

doi: 10.1038/nsb1101-990.

[40] J.W. Chu and G.A. Voth. “Coarse-grained free energy functions for studying

protein conformational changes: a double-well network model”. In: Biophys.

J. 93.11 (2007), pp. 3860–3871.

[41] Cecilia Clementi, Angel E García, and José N Onuchic. “Interplay among

tertiary contacts, secondary structure formation and side-chain packing in the

protein folding mechanism: all-atom representation study of protein L”. In:

Journal of molecular biology 326.3 (2003), pp. 933–954.

[42] Cecilia Clementi, Hugh Nymeyer, and José Nelson Onuchic. “Topological and

energetic factors: what determines the structural details of the transition state

ensemble and “en-route” intermediates for protein folding? An investigation

for small globular proteins”. In: Journal of molecular biology 298.5 (2000),

pp. 937–953.

[43] Cecilia Clementi and Steven S Plotkin. “The effects of nonnative interactions

on protein folding rates: theory and simulation”. In: Protein Science 13.7

(2004), pp. 1750–1766.

[44] Marcio F Colombo, Donald C Rau, and V Adrian Parsegian. “Protein solvation

in allosteric regulation: a water effect on hemoglobin”. In: Science 256.5057

(1992), p. 655.

[45] Qiang Cui and Martin Karplus. “Allostery and cooperativity revisited”. In:

Protein Sci. 17.8 (2008), pp. 1295–307. doi: 10.1110/ps.03259908.

[46] Michael D Daily, Tarak J Upadhyaya, and Jeffrey J Gray. “Contact rear-

rangements form coupled networks from local motions in allosteric proteins”.

In: Proteins: Structure, Function, and Bioinformatics 71.1 (2008), pp. 455–

466.

[47] Kyle G Daniels, Yang Suo, and Terrence G Oas. “Conformational kinetics

reveals affinities of protein conformational states”. In: Proc. Natl. Acad. Sci.

U. S. A. 112.30 (2015), pp. 9352–9357.

[48] Oded Danziger et al. “Conversion of the allosteric transition of GroEL from

concerted to sequential by the single mutation Asp-155 → Ala”. In: Proceedings of the National Academy of Sciences 100.24 (2003), pp. 13797–13802.

[49] Avisek Das et al. “Exploring the conformational transitions of biomolecular

systems using a simple two-state anisotropic network model”. In: PLoS Comput.

Biol. 10.4 (2014), e1003521. doi: 10.1371/journal.pcbi.1003521.

[50] K A Dill and H S Chan. “From Levinthal to pathways to funnels”. In: Nat.

Struct. Biol. 4.1 (1997), pp. 10–9.

[51] TAJ Duke and Dennis Bray. “Heightened sensitivity of a lattice of membrane

receptors”. In: Proceedings of the National Academy of Sciences 96.18 (1999),

pp. 10104–10108.

[52] William A Eaton et al. “Is cooperative oxygen binding by hemoglobin really

understood?” In: Nature Structural & Molecular Biology 6.4 (1999), pp. 351–

358.

[53] Stuart J Edelstein et al. “A kinetic mechanism for nicotinic acetylcholine re-

ceptors based on multiple allosteric transitions”. In: Biological cybernetics 75.5

(1996), pp. 361–379.

[54] J Evenäs, A Malmendal, and M Akke. “Dynamics of the transition between

open and closed conformations in a calmodulin C-terminal domain mutant”.

In: Structure 9.3 (2001), pp. 185–95.

[55] J Evenäs et al. “Backbone dynamics and energetics of a calmodulin domain

mutant exchanging between closed and open conformations”. In: J. Mol. Biol.

289.3 (1999), pp. 603–17. doi: 10.1006/jmbi.1999.2770.

[56] Johan Evenäs et al. “Ca2+ binding and conformational changes in a calmodulin

domain”. In: Biochemistry 37.39 (1998), pp. 13744–13754.

[57] Eran Eyal and Ivet Bahar. “Toward a molecular understanding of the anisotropic

response of proteins to external forces: insights from elastic network models”.

In: Biophys J 94.9 (2008), pp. 3424–35. doi: 10.1529/biophysj.107.120733.

[58] Diego U Ferreiro et al. “The energy landscape of modular repeat proteins:

topology determines folding mechanism in the ankyrin family”. In: Journal of

molecular biology 354.3 (2005), pp. 679–692.

[59] A.V. Finkelstein et al. “Understanding the folding rates and folding nuclei of

globular proteins”. In: Current Protein and Peptide Science 8.6 (2007),

pp. 521–536. url: http://www.scopus.com/inward/

record.url?eid=2-s2.0-39349111249&partnerID=40.

[60] H Frauenfelder, S G Sligar, and P G Wolynes. “The energy landscapes and

motions of proteins”. In: Science 254.5038 (1991), pp. 1598–603.

[61] O.V. Galzitskaya and A.V. Finkelstein. “A theoretical search for folding/unfolding

nuclei in three-dimensional protein structures”. In: Proceedings of the National

Academy of Sciences 96.20 (1999), pp. 11299–11304.

[62] O.V. Galzitskaya, S.O. Garbuzynskiy, and A.V. Finkelstein. “Theoretical study

of protein folding: Outlining folding nuclei and estimation of protein fold-

ing rates”. In: Journal of Physics Condensed Matter 17.18 (2005), S1539–

S1551. url: http://www.scopus.com/inward/record.url?eid=2-s2.0-

24144492709&partnerID=40.

[63] Debabani Ganguly and Jianhan Chen. “Atomistic details of the disordered

states of KID and pKID. Implications in coupled binding and folding”. In: J.

Am. Chem. Soc. 131.14 (2009), pp. 5214–23. doi: 10.1021/ja808999m.

[64] John C Gerhart and Arthur B Pardee. “The enzymology of control by feedback

inhibition”. In: J Biol Chem 237 (1962), pp. 891–896.

[65] Mark Gerstein and Nathaniel Echols. “Exploring the range of protein flexibil-

ity, from a structural proteomics perspective”. In: Curr. Opin. Chem. Biol. 8.1

(2004), pp. 14–9. doi: 10.1016/j.cbpa.2003.12.006.

[66] J Gifford, M Walsh, and H Vogel. “Structures and metal-ion-binding properties

of the Ca2+-binding helix-loop-helix EF-hand motifs”. In: Biochem. J. 405

(2007), pp. 199–221.

[67] N Go. “Theoretical studies of protein folding”. In: Annu. Rev. Biophys. Bioeng.

12 (1983), pp. 183–210. doi: 10.1146/annurev.bb.12.060183.001151.

[68] Chern Sing Goh, Duncan Milburn, and Mark Gerstein. “Conformational changes

associated with protein-protein interactions”. In: Curr. Opin. Struct. Biol. 14.1

(2004), pp. 104–9. doi: 10.1016/j.sbi.2004.01.005.

[69] Christian Gorba, Osamu Miyashita, and Florence Tama. “Normal-mode flex-

ible fitting of high-resolution structure of biological molecules toward one-

dimensional low-resolution data”. In: Biophysical journal 94.5 (2008), pp. 1589–

1599.

[70] Shachi Gosavi et al. “Topological frustration and the folding of interleukin-1β”.

In: Journal of molecular biology 357.3 (2006), pp. 986–996.

[71] K Gunasekaran, Buyong Ma, and Ruth Nussinov. “Is allostery an intrinsic

property of all dynamic proteins?” In: Proteins 57.3 (2004), pp. 433–43. doi:

10.1002/prot.20232.

[72] Kannan Gunasekaran, Chung-Jung Tsai, and Ruth Nussinov. “Analysis of

ordered and disordered protein complexes reveals structural features discrimi-

nating between stable and unstable monomers”. In: J. Mol. Biol. 341.5 (2004),

pp. 1327–41. doi: 10.1016/j.jmb.2004.07.002.

[73] Joseph A Hegler, Patrick Weinkam, and Peter G Wolynes. “The spectrum of

biomolecular states and motions”. In: HFSP journal 2.6 (2008), pp. 307–313.

[74] E.R. Henry and W.A. Eaton. “Combinatorial modeling of protein folding ki-

netics: Free energy profiles and rates”. In: Chemical Physics 307.2-3 SPEC.ISS.

(2004), pp. 163–185. url: http://www.scopus.com/inward/record.url?

eid=2-s2.0-8644232696&partnerID=40.

[75] Katherine Henzler-Wildman and Dorothee Kern. “Dynamic personalities of

proteins”. In: Nature 450.7172 (2007), pp. 964–72. doi: 10.1038/nature06522.

[76] Katherine A Henzler-Wildman et al. “Intrinsic motions along an enzymatic

reaction trajectory”. In: Nature 450.7171 (2007), pp. 838–844.

[77] Archibald Vivian Hill. “The possible effects of the aggregation of the molecules

of haemoglobin on its dissociation curves”. In: J Physiol (Lond) 40 (1910),

pp. 4–7.

[78] Ronald D Hills and Charles L Brooks. “Subdomain competition, cooperativity,

and topological frustration in the folding of CheY”. In: Journal of molecular

biology 382.2 (2008), pp. 485–495.

[79] R.D. Hills Jr. and C.L. Brooks III. “Insights from coarse-grained go models for

protein folding and dynamics”. In: International Journal of Molecular Sciences

10.3 (2009), pp. 889–905. url: http://www.scopus.com/inward/record.

url?eid=2-s2.0-63449129633&partnerID=40.

[80] Vincent J Hilser and E Brad Thompson. “Intrinsic disorder as a mechanism

to optimize allosteric coupling in proteins”. In: Proc. Natl. Acad. Sci. U. S. A.

104.20 (2007), pp. 8311–5. doi: 10.1073/pnas.0700329104.

[81] Vincent J Hilser, James O Wrabl, and Hesam N Motlagh. “Structural and en-

ergetic basis of allostery”. In: Annual Review of Biophysics 41 (2012), pp. 585–

609.

[82] Konrad Hinsen. “Analysis of domain motions by approximate normal mode

calculations”. In: Proteins Structure Function and Genetics 33.3 (1998), pp. 417–

429.

[83] Konrad Hinsen and Gerald R Kneller. “Solvent effects in the slow dynamics of

proteins”. In: Proteins: Structure, Function, and Bioinformatics 70.4 (2008),

pp. 1235–1242.

[84] JD Honeycutt and D Thirumalai. “Metastability of the folded states of globular

proteins”. In: Proceedings of the National Academy of Sciences 87.9 (1990),

pp. 3526–3529.

[85] William Humphrey, Andrew Dalke, and Klaus Schulten. “VMD: visual molec-

ular dynamics”. In: Journal of Molecular Graphics 14.1 (1996), pp. 33–38.

[86] Changbong Hyeon, Ruxandra I Dima, and D Thirumalai. “Pathways and ki-

netic barriers in mechanical unfolding and refolding of RNA and proteins”. In:

Structure 14.11 (2006), pp. 1633–1645.

[87] Changbong Hyeon, George H Lorimer, and D Thirumalai. “Dynamics of allosteric

transitions in GroEL”. In: Proceedings of the National Academy of

Sciences 103.50 (2006), pp. 18939–18944.

[88] Tatyana I Igumenova, Andrew L Lee, and A Joshua Wand. “Backbone and side

chain dynamics of mutant calmodulin-peptide complexes”. In: Biochemistry

44.38 (2005), pp. 12627–39. doi: 10.1021/bi050832f.

[89] Kazuhito Itoh and Masaki Sasai. “Entropic mechanism of large fluctuation

in allosteric transition”. In: Proc. Natl. Acad. Sci. U. S. A. 107.17 (2010),

pp. 7775–80. doi: 10.1073/pnas.0912978107.

[90] Kazuhito Itoh and Masaki Sasai. “Statistical mechanics of protein allostery:

roles of backbone and side-chain structural fluctuations”. In: J. Chem. Phys.

134.12 (2011), p. 125102.

[91] Jie Jiang et al. “Site-specific modification of calmodulin Ca2+ affinity tunes

the skeletal muscle ryanodine receptor activation profile”. In: Biochem. J. 432

(2010), pp. 89–99.

[92] Gozde Kar et al. “Allostery and population shift in drug discovery”. In: Curr

Opin Pharmacol 10.6 (2010), pp. 715–22. doi: 10.1016/j.coph.2010.09.002.

[93] John Karanicolas and Charles L Brooks. “The origins of asymmetry in the

folding transition states of protein L and protein G”. In: Protein Science 11.10

(2002), pp. 2351–2361.

[94] John Karanicolas and Charles L. Brooks III. “Improved Go-like Models Demon-

strate the Robustness of Protein Folding Mechanisms Towards Non-native

Interactions”. In: Journal of Molecular Biology 334.2 (2003), pp. 309–325.

issn: 0022-2836. doi: 10.1016/j.jmb.2003.09.047. url: http://www.sciencedirect.com/science/article/B6WK7-49XP57D-G/2/aee0124e894188bc3a43256be90e2d46.

[95] Arthur Karlin. “On the application of “a plausible model” of allosteric proteins

to the receptor for acetylcholine”. In: Journal of theoretical biology 16.2 (1967),

pp. 306–320.

[96] Hiroo Kenzaki et al. “Cafemol: A coarse-grained biomolecular simulator for

simulating proteins at work”. In: Journal of Chemical Theory and Computation

7.6 (2011), pp. 1979–1989.

[97] Moon K Kim, Gregory S Chirikjian, and Robert L Jernigan. “Elastic models

of conformational transitions in macromolecules”. In: Journal of Molecular

Graphics and Modelling 21.2 (2002), pp. 151–160.

[98] Rachel E Klevit et al. “1H-NMR studies of calmodulin”. In: European

Journal of Biochemistry 139.1 (1984), pp. 109–114.

[99] N Koga and S Takada. “Roles of native topology and chain-length scaling in

protein folding: a simulation study with a Go-like model”. In: J Mol Biol 313.1

(2001), pp. 171–80. doi: 10.1006/jmbi.2001.5037.

[100] Nobuyasu Koga et al. “Paddling mechanism for the substrate translocation by

AAA+ motor revealed by multiscale molecular simulations”. In: Proceedings

of the National Academy of Sciences 106.43 (2009), pp. 18237–18242.

[101] DE Koshland Jr, G Nemethy, and D Filmer. “Comparison of Experimental

Binding Data and Theoretical Models in Proteins Containing Subunits*”. In:

Biochemistry 5.1 (1966), pp. 365–385.

[102] Robert H Kretsinger and Robert H Wasserman. “Structure and evolution of

calcium-modulated protein”. In: Critical Reviews in Biochemistry and Molec-

ular Biology 8.2 (1980), pp. 119–174.

[103] H Kuboniwa et al. “Solution structure of calcium-free calmodulin”. In: Nat.

Struct. Biol. 2.9 (1995), pp. 768–76.

[104] S Kumar et al. “Folding and binding cascades: dynamic landscapes and popu-

lation shifts”. In: Protein Sci. 9.1 (2000), pp. 10–9. doi: 10.1110/ps.9.1.10.

[105] Shankar Kumar et al. “The weighted histogram analysis method for free-energy

calculations on biomolecules. I. The method”. In: Journal of computational

chemistry 13.8 (1992), pp. 1011–1021.

[106] Sibsankar Kundu et al. “Dynamics of proteins in crystals: comparison of ex-

periment with simple models”. In: Biophysical journal 83.2 (2002), pp. 723–

732.

[107] Massimo Lai et al. “Modulation of calmodulin lobes by different targets: an al-

losteric model with hemiconcerted conformational transitions”. In: PLoS Com-

put. Biol. 11.1 (2015), e1004063.

[108] Peter E Leopold, Mauricio Montal, and José N Onuchic. “Protein folding funnels:

a kinetic approach to the sequence-structure relationship.” In: Proceedings

of the National Academy of Sciences 89.18 (1992), pp. 8721–8725.

[109] Cyrus Levinthal. “Are there pathways for protein folding?” In: Journal de

chimie physique 65 (1968), pp. 44–45.

[110] Michael Levitt. “A simplified representation of protein conformations for rapid

simulation of protein folding”. In: Journal of molecular biology 104.1 (1976),

pp. 59–107.

[111] Michael Levitt and Arieh Warshel. “Computer simulation of protein folding”.

In: Nature 253.5494 (1975), pp. 694–698.

[112] Yaakov Levy, José N Onuchic, and Peter G Wolynes. “Fly-casting in protein-DNA

binding: frustration between protein folding and electrostatics facilitates

target recognition”. In: J. Am. Chem. Soc. 129.4 (2007), pp. 738–9.

doi: 10.1021/ja065531n.

[113] Yaakov Levy, Peter G Wolynes, and José N Onuchic. “Protein topology determines

binding mechanism”. In: Proc. Natl. Acad. Sci. U. S. A. 101.2 (2004),

pp. 511–6. doi: 10.1073/pnas.2534828100.

[114] Yaakov Levy et al. “A survey of flexible protein binding mechanisms and their

transition states using native topology based energy landscapes”. In: J. Mol.

Biol. 346.4 (2005), pp. 1121–45. doi: 10.1016/j.jmb.2004.12.021.

[115] Wenfei Li, Wei Wang, and Shoji Takada. “Energy landscape views for inter-

plays among folding, binding, and allostery of calmodulin domains”. In: Proc.

Natl. Acad. Sci. U. S. A. 111.29 (2014), pp. 10550–10555.

[116] Sara Linse, Anna Helmersson, and Sture Forsen. “Calcium binding to calmod-

ulin and its globular domains.” In: Journal of Biological Chemistry 266.13

(1991), pp. 8050–8054.

[117] Christian Löw et al. “Conformational switch upon phosphorylation: human

CDK inhibitor p19INK4d between the native and partially folded state”. In:

ACS chemical biology 4.1 (2008), pp. 53–63.

[118] Mingyang Lu and Jianpeng Ma. “A minimalist network model for coarse-

grained normal mode analysis and its application to biomolecular x-ray crystal-

lography”. In: Proceedings of the National Academy of Sciences 105.40 (2008),

pp. 15358–15363.

[119] Qiang Lu and Jin Wang. “Single molecule conformational dynamics of adeny-

late kinase: energy landscape, structural correlations, and transition state en-

sembles”. In: J. Am. Chem. Soc. 130.14 (2008), pp. 4772–4783.

[120] Patrik Lundström and Mikael Akke. “Quantitative analysis of conformational

exchange contributions to 1H-15N multiple-quantum relaxation using field-

dependent measurements. Time scale and structural characterization of ex-

change in a calmodulin C-terminal domain mutant”. In: J. Am. Chem. Soc.

126.3 (2004), pp. 928–35. doi: 10.1021/ja037529r.

[121] Patrik Lundström, Frans A A Mulder, and Mikael Akke. “Correlated dynamics

of consecutive residues reveal transient and cooperative unfolding of secondary

structure in proteins”. In: Proc. Natl. Acad. Sci. U. S. A. 102.47 (2005),

pp. 16984–9. doi: 10.1073/pnas.0504361102.

[122] B Ma et al. “Folding funnels and binding mechanisms”. In: Protein Eng. 12.9

(1999), pp. 713–20.

[123] Alex D MacKerell Jr et al. “All-atom empirical potential for molecular modeling

and dynamics studies of proteins”. In: The journal of physical chemistry

B 102.18 (1998), pp. 3586–3616.

[124] A Malmendal et al. “Structural dynamics in the C-terminal domain of calmod-

ulin at low calcium levels”. In: J. Mol. Biol. 293.4 (1999), pp. 883–99. doi:

10.1006/jmbi.1999.3188.

[125] Paul Maragakis and Martin Karplus. “Large amplitude conformational change

in proteins explored with a plastic network model: adenylate kinase”. In: J.

Mol. Biol. 352.4 (2005), pp. 807–22. doi: 10.1016/j.jmb.2005.07.031.

[126] Sarah Marzen, Hernan G Garcia, and Rob Phillips. “Statistical mechanics of

Monod-Wyman-Changeux (MWC) models”. In: J. Mol. Biol. 425.9 (2013),

pp. 1433–60. doi: 10.1016/j.jmb.2013.03.013.

[127] L Masino, S R Martin, and P M Bayley. “Ligand binding and thermodynamic

stability of a multidomain protein, calmodulin”. In: Protein Sci. 9.8 (2000),

pp. 1519–29. doi: 10.1110/ps.9.8.1519.

[128] Silvina Matysiak and Cecilia Clementi. “Optimal combination of theory and

experiment for the characterization of the protein folding landscape of S6: how

far can a minimalist model go?” In: Journal of molecular biology 343.1 (2004),

pp. 235–248.

[129] Julien Maupetit, R Gautier, and Pierre Tufféry. “SABBAC: online Structural

Alphabet-based protein BackBone reconstruction from Alpha-Carbon trace”.

In: Nucleic acids research 34.suppl 2 (2006), W147–W151.

[130] Andreas May and Martin Zacharias. “Energy minimization in low-frequency

normal modes to efficiently allow for global flexibility during systematic protein–

protein docking”. In: Proteins: Structure, Function, and Bioinformatics 70.3

(2008), pp. 794–809.

[131] Bernardo A Mello and Yuhai Tu. “An allosteric model for heterogeneous re-

ceptor complexes: understanding bacterial chemotaxis responses to multiple

stimuli”. In: Proceedings of the National Academy of Sciences of the United

States of America 102.48 (2005), pp. 17354–17359.

[132] Philipp Metzner, Christof Schütte, and Eric Vanden-Eijnden. “Transition path

theory for Markov jump processes”. In: Multiscale Modeling & Simulation 7.3

(2009), pp. 1192–1219.

[133] Tobias Meyer, David Holowka, and Lubert Stryer. “Highly cooperative opening

of calcium channels by inositol 1, 4, 5-trisphosphate”. In: Science 240.4852

(1988), p. 653.

[134] O Miyashita, J N Onuchic, and P G Wolynes. “Nonlinear elasticity, protein-

quakes, and the energy landscapes of functional transitions in proteins”. In:

Proc. Natl. Acad. Sci. U. S. A. 100.22 (2003), pp. 12570–5. doi: 10.1073/

pnas.2135471100.

[135] Osamu Miyashita, Peter G Wolynes, and José N Onuchic. “Simple energy

landscape model for the kinetics of functional transitions in proteins”. In: J.

Phys. Chem. B 109.5 (2005), pp. 1959–69. doi: 10.1021/jp046736q.

[136] J Monod, J Wyman, and J P Changeux. “On the nature of allosteric transitions: a

plausible model”. In: J. Mol. Biol. 12 (1965), pp. 88–118.

[137] Hesam N Motlagh et al. “The ensemble nature of allostery”. In: Nature 508.7496

(2014), pp. 331–339.

[138] Franklin J Moy et al. “Assignments, secondary structure, global fold, and

dynamics of chemotaxis Y protein using three-and four-dimensional heteronu-

clear (13C, 15N) NMR spectroscopy”. In: Biochemistry 33.35 (1994), pp. 10731–

10742.

[139] V Muñoz and W A Eaton. “A simple model for calculating the kinetics of

protein folding from three-dimensional structures”. In: Proc. Natl. Acad. Sci.

U. S. A. 96.20 (1999), pp. 11311–6.

[140] Tarek S Najdi et al. “Application of a generalized MWC model for the mathe-

matical simulation of metabolic pathways regulated by allosteric enzymes”. In:

Journal of bioinformatics and computational biology 4.02 (2006), pp. 335–355.

[141] Prithviraj Nandigrami and John J Portman. “Coarse-grained molecular simu-

lations of allosteric cooperativity”. In: The Journal of chemical physics 144.10

(2016), p. 105101.

[142] Prithviraj Nandigrami and John J Portman. “Comparing allosteric transitions

in the domains of calmodulin through coarse-grained simulations”. In: The

Journal of chemical physics 144.10 (2016), p. 105102.

[143] Melanie R Nelson and Walter J Chazin. “Structures of EF-hand Ca2+-binding

proteins: diversity in the organization, packing and response to Ca2+ binding”.

In: Biometals 11.4 (1998), pp. 297–318.

[144] Rhonda A Newman et al. “Interdomain cooperativity of calmodulin bound to

melittin preferentially increases calcium affinity of sites I and II”. In: Proteins:

Structure, Function, and Bioinformatics 71.4 (2008), pp. 1792–1812.

[145] Frank Noé and Stefan Fischer. “Transition networks for modeling the kinetics

of conformational change in macromolecules”. In: Current opinion in structural

biology 18.2 (2008), pp. 154–162.

[146] Frank Noé et al. “Constructing the equilibrium ensemble of folding pathways

from short off-equilibrium simulations”. In: Proceedings of the National

Academy of Sciences 106.45 (2009), pp. 19011–19016.

[147] Ruth Nussinov and Chung-Jung Tsai. “The different ways through which

specificity works in orthosteric and allosteric drugs”. In: Current pharmaceu-

tical design 18.9 (2012), pp. 1311–1316.

[148] Susan E O’Donnell et al. “Thermodynamics and conformational change gov-

erning domain–domain interactions of calmodulin”. In: Methods in Enzymology

466 (2009), pp. 503–526.

[149] K Okazaki and Shoji Takada. “Dynamic energy landscape view of coupled

binding and protein conformational change: induced-fit versus population-shift

mechanisms”. In: Proc. Natl. Acad. Sci. U. S. A. 105.32 (2008), pp. 11182–7.

doi: 10.1073/pnas.0802524105.

[150] K Okazaki et al. “Multiple-basin energy landscapes for large-amplitude con-

formational motions of proteins: Structure-based molecular dynamics simula-

tions”. In: Proc. Natl. Acad. Sci. U. S. A. 103.32 (2006), pp. 11844–9. doi:

10.1073/pnas.0604375103.

[151] José Nelson Onuchic, Zaida Luthey-Schulten, and Peter G Wolynes. “Theory

of protein folding: the energy landscape perspective”. In: Annual review of

physical chemistry 48.1 (1997), pp. 545–600.

[152] Emanuele Paci, Michele Vendruscolo, and Martin Karplus. “Validity of Gō

models: comparison with a solvent-shielded empirical energy decomposition”.

In: Biophysical journal 83.6 (2002), pp. 3032–3038.

[153] H.Y. Park et al. “Conformational changes of calmodulin upon Ca2+ binding

studied with a microfluidic mixer”. In: Proceedings of the National Academy

of Sciences 105.2 (2008), p. 542.

[154] Linus Pauling. “The oxygen equilibrium of hemoglobin and its structural inter-

pretation”. In: Proceedings of the National Academy of Sciences 21.4 (1935),

pp. 186–191.

[155] Linus Pauling, Robert B Corey, and Roger Hayward. The structure of protein

molecules. 1954.

[156] Susan Pedigo and Madeline A Shea. “Discontinuous equilibrium titrations

of cooperative calcium binding to calmodulin monitored by 1-D 1H-nuclear

magnetic resonance spectroscopy”. In: Biochemistry 34.33 (1995), pp. 10676–

10689.

[157] Chad M Petit et al. “Hidden dynamic allostery in a PDZ domain”. In: Pro-

ceedings of the National Academy of Sciences 106.43 (2009), pp. 18249–18254.

[158] Nataliya Popovych et al. “Dynamically driven protein allostery”. In: Nature

structural & molecular biology 13.9 (2006), pp. 831–838.

[159] JJ Portman, S Takada, and PG Wolynes. “Microscopic theory of protein folding

rates. II. Local reaction coordinates and chain dynamics”. In: J. Chem.

Phys. 114.11 (Mar. 2001), pp. 5082–5096.

[160] JJ Portman, S. Takada, and PG Wolynes. “Variational theory for site re-

solved protein folding free energy surfaces”. In: Phys. Rev. Lett. 81.23 (1998),

pp. 5237–5240.

[161] E.S. Price, M. Aleksiejew, and C.K. Johnson. “FRET-FCS Detection of Intra-

Lobe Dynamics in Calmodulin”. In: The Journal of Physical Chemistry B

(2011).

[162] Lidia Prieto and Antonio Rey. “Influence of the native topology on the folding

barrier for small proteins”. In: The Journal of chemical physics 127.17 (2007),

11B601.

[163] Xianghong Qi and John J Portman. “Capillarity-like growth of protein folding

nuclei”. In: Proc. Natl. Acad. Sci. U. S. A. 105.32 (2008), pp. 11164–9. doi:

10.1073/pnas.0711527105.

[164] Carl Roland Rabl et al. “Temperature jump kinetic study of the stability of

apo-calmodulin”. In: Biophys. Chem. 101-102 (2002), pp. 553–64.

[165] Sean E Reichheld, Zhou Yu, and Alan R Davidson. “The induction of folding

cooperativity by ligand binding drives the allosteric response of tetracycline

repressor”. In: Proceedings of the National Academy of Sciences 106.52 (2009),

pp. 22263–22268.

[166] Demian Riccardi, Qiang Cui, and George N Phillips. “Application of elastic

network models to proteins in the crystalline state”. In: Biophysical journal

96.2 (2009), pp. 464–475.

[167] Piotr Rotkiewicz and Jeffrey Skolnick. “Fast procedure for reconstruction of

full-atom protein models from reduced representations”. In: Journal of com-

putational chemistry 29.9 (2008), pp. 1460–1465.

[168] Melinda Roy et al. “The native energy landscape for interleukin-1β. Modula-

tion of the population ensemble through native-state topology”. In: Journal of

molecular biology 348.2 (2005), pp. 335–347.

[169] Hays S Rye et al. “GroEL-GroES cycling: ATP and nonnative polypeptide

direct alternation of folding-active rings”. In: Cell 97.3 (1999), pp. 325–338.

[170] Dina Schneidman-Duhovny, Ruth Nussinov, and Haim J Wolfson. “Automatic

prediction of protein interactions with large scale motion”. In: Proteins: Struc-

ture, Function, and Bioinformatics 69.4 (2007), pp. 764–773.

[171] Travis P Schrank, D Wayne Bolen, and Vincent J Hilser. “Rational modulation

of conformational fluctuations in adenylate kinase reveals a local unfolding

mechanism for allostery and functional adaptation in proteins”. In: Proceedings

of the National Academy of Sciences 106.40 (2009), pp. 16984–16989.

[172] Travis P Schrank et al. “Strategies for the thermodynamic characterization

of linked binding/local folding reactions within the native state application

to the LID domain of adenylate kinase from Escherichia coli.” In: Methods in

enzymology 492 (2010), pp. 253–282.

[173] Joan-Emma Shea, José N Onuchic, and Charles L Brooks. “Exploring the

origins of topological frustration: design of a minimally frustrated model of

fragment B of protein A”. In: Proceedings of the National Academy of Sciences

96.22 (1999), pp. 12512–12517.

[174] Madeline A Shea, Amy S Verhoeven, and Susan Pedigo. “Calcium-induced

interactions of calmodulin domains revealed by quantitative thrombin foot-

printing of Arg37 and Arg106”. In: Biochemistry 35.9 (1996), pp. 2943–2957.

[175] Julia M Shifman and Stephen L Mayo. “Exploring the origins of binding speci-

ficity through the computational redesign of calmodulin”. In: Proc. Natl. Acad.

Sci. U. S. A. 100.23 (2003), pp. 13274–9. doi: 10.1073/pnas.2234277100.

[176] Jun Shimada and Eugene I Shakhnovich. “The ensemble folding kinetics of

protein G from an all-atom Monte Carlo simulation”. In: Proceedings of the

National Academy of Sciences 99.17 (2002), pp. 11175–11180.

[177] B A Shoemaker, J Wang, and P G Wolynes. “Structural correlations in protein

folding funnels”. In: Proc. Natl. Acad. Sci. U. S. A. 94.3 (1997), pp. 777–82.

[178] B.A. Shoemaker, J. Wang, and P.G. Wolynes. “Exploring structures in protein

folding funnels with free energy functionals: the transition state ensemble”. In:

Journal of molecular biology 287.3 (1999), pp. 675–694.

[179] Benjamin A. Shoemaker and Peter G. Wolynes. “Exploring structures in pro-

tein folding funnels with free energy functionals: the denatured ensemble”.

In: Journal of Molecular Biology 287.3 (1999), pp. 657–674. issn: 0022-2836.

doi: 10.1006/jmbi.1999.2612. url: http://www.sciencedirect.com/

science/article/B6WK7-45R883G-R4/2/32345d0c68746b329bdf897a8440d952.

[180] Diwakar Shukla, Ariana Peck, and Vijay S Pande. “Conformational heterogeneity

of the calmodulin binding interface”. In: Nature communications 7

(2016).

[181] Robert G Smock and Lila M Gierasch. “Sending signals dynamically”. In:

Science 324.5924 (2009), pp. 198–203. doi: 10.1126/science.1169377.

[182] ND Socci, José N Onuchic, and Peter G Wolynes. “Diffusive dynamics of the

reaction coordinate for protein folding funnels”. In: The Journal of chemical

physics 104.15 (1996), pp. 5860–5868.

[183] Guang Song and Robert L Jernigan. “vGNM: a better model for understanding

the dynamics of proteins in crystals”. In: Journal of molecular biology 369.3

(2007), pp. 880–893.

[184] Brenda R Sorensen and Madeline A Shea. “Interactions between domains of

apo calmodulin alter calcium binding and stability”. In: Biochemistry 37.12

(1998), pp. 4244–4253.

[185] ER Stadtman. “Allosteric regulation of enzyme activity”. In: Advances in

Enzymology and Related Areas of Molecular Biology, Volume 28 (1966), pp. 41–154.

[186] Melanie I Stefan, Stuart J Edelstein, and Nicolas Le Novère. “An allosteric

model of calmodulin explains differential activation of PP2B and CaMKII”.

In: Proc. Natl. Acad. Sci. U. S. A. 105.31 (2008), pp. 10768–73. doi: 10.1073/

pnas.0804672105.

[187] Melanie I Stefan, Stuart J Edelstein, and Nicolas Le Novère. “Computing

phenomenologic Adair-Klotz constants from microscopic MWC parameters”.

In: BMC Systems Biology 3.1 (2009), p. 68.

[188] Johannes Stigler and Matthias Rief. “Calcium-dependent folding of single

calmodulin molecules”. In: Proc. Natl. Acad. Sci. U. S. A. 109.44 (2012),

pp. 17814–17819.

[189] James T Stull. “Ca2+-dependent cell signaling through calmodulin-activated

protein phosphatase and protein kinases minireview series”. In: Journal of

Biological Chemistry 276.4 (2001), pp. 2311–2312.

[190] Joanna F Swain and Lila M Gierasch. “The changing landscape of protein

allostery”. In: Current Opinion in Structural Biology 16.1 (2006), pp. 102–

108.

[191] Fumiko Takagi, Nobuyasu Koga, and Shoji Takada. “How protein thermody-

namics and folding mechanisms are altered by the chaperonin cage: molecu-

lar simulations”. In: Proceedings of the National Academy of Sciences 100.20

(2003), pp. 11367–11372.

[192] Hiroshi Taketomi, Yuzo Ueda, and Nobuhiro Gō. “Studies on protein folding,

unfolding and fluctuations by computer simulation”. In: International journal

of peptide and protein research 7.6 (1975), pp. 445–459.

[193] F. Tama and Y.H. Sanejouand. “Conformational change of proteins arising

from normal mode calculations”. In: Protein Engineering Design and Selection

14.1 (2001), pp. 1–6.

[194] Florence Tama, Osamu Miyashita, and Charles L Brooks. “Flexible multi-scale

fitting of atomic structures into low-resolution electron density maps

with elastic network normal mode analysis”. In: Journal of molecular biology

337.4 (2004), pp. 985–999.

[195] Monique M Tirion. “Large Amplitude Elastic Motions in Proteins from a Single-Parameter,

Atomic Analysis”. In: Phys. Rev. Lett. 77.9 (1996), pp. 1905–1908.

[196] Nico Tjandra et al. “Rotational Dynamics of Calcium-Free Calmodulin Studied

by 15N-NMR Relaxation Measurements”. In: European Journal of Biochem-

istry 230.3 (1995), pp. 1014–1024.

[197] Swarnendu Tripathi and John J Portman. “Allostery and Folding of the N-

terminal Receiver Domain of Protein NtrC”. In: The Journal of Physical

Chemistry B 117.42 (2013), pp. 13182–13193.

[198] Swarnendu Tripathi and John J Portman. “Conformational flexibility and the

mechanisms of allosteric transitions in topologically similar proteins”. In: J.

Chem. Phys. 135.7 (2011), p. 075104. doi: 10.1063/1.3625636.

[199] Swarnendu Tripathi and John J Portman. “Inherent flexibility and protein

function: The open/closed conformational transition in the N-terminal domain

of calmodulin”. In: J. Chem. Phys. 128.20 (2008), p. 205104. doi: 10.1063/

1.2928634.

[200] Swarnendu Tripathi and John J Portman. “Inherent flexibility determines the

transition mechanisms of the EF-hands of calmodulin”. In: Proc. Natl. Acad.

Sci. U. S. A. 106.7 (2009), pp. 2104–9. doi: 10.1073/pnas.0806872106.

[201] Swarnendu Tripathi et al. “Conformational frustration in calmodulin–target

recognition”. In: Journal of Molecular Recognition 28.2 (2015), pp. 74–86.

[202] C J Tsai et al. “Folding funnels, binding funnels, and protein function”. In:

Protein Sci. 8.6 (1999), pp. 1181–90. doi: 10.1110/ps.8.6.1181.

[203] Chung-Jung Tsai, Antonio Del Sol, and Ruth Nussinov. “Protein allostery,

signal transmission and dynamics: a classification scheme of allosteric mecha-

nisms”. In: Mol. Biosyst. 5.3 (2009), pp. 207–16. doi: 10.1039/b819720b.

[204] T N Tsalkova and P L Privalov. “Thermodynamic study of domain organiza-

tion in troponin C and calmodulin”. In: J. Mol. Biol. 181.4 (1985), pp. 533–

44.

[205] Adrian Gustavo Turjanski et al. “Binding-induced folding of a natively un-

structured transcription factor”. In: PLoS Comput. Biol. 4.4 (2008), e1000060.

doi: 10.1371/journal.pcbi.1000060.

[206] Wendy S VanScyoc et al. “Calcium binding to calmodulin mutants monitored

by domain-specific intrinsic phenylalanine and tyrosine fluorescence”. In: Bio-

phys. J. 83.5 (2002), pp. 2767–80. doi: 10.1016/S0006-3495(02)75286-7.

[207] D Vigil et al. “Functional dynamics of the hydrophobic cleft in the N-domain of

calmodulin”. In: Biophys. J. 80.5 (2001), pp. 2082–92. doi: 10.1016/S0006-

3495(01)76182-6.

[208] Hans J Vogel. “Calmodulin: a versatile calcium mediator protein”. In: Bio-

chemistry and cell biology 72.9-10 (1994), pp. 357–376.

[209] B F Volkman et al. “Two-state allosteric behavior in a single-domain signaling

protein”. In: Science 291.5512 (2001), pp. 2429–33. doi: 10.1126/science.

291.5512.2429.

[210] Yvonne Waltersson et al. “Mutational effects on the cooperativity of calcium

binding in calmodulin”. In: Biochemistry 32.31 (1993), pp. 7866–7871.

[211] Chih Lueh A Wang. “A note on Ca2+ binding to calmodulin”. In: Biochemical

and Biophysical Research Communications 130.1 (1985), pp. 426–430.

[212] Jin Wang, Qiang Lu, and H Peter Lu. “Single-molecule dynamics reveals co-

operative binding-folding in protein recognition”. In: PLoS Comput Biol 2.7

(2006), e78.

[213] Scott J Weiner et al. “An all atom force field for simulations of proteins and

nucleic acids”. In: Journal of computational chemistry 7.2 (1986), pp. 230–252.

[214] Jonathan S Weissman et al. “Characterization of the active intermediate of a

GroEL–GroES-mediated protein folding reaction”. In: Cell 84.3 (1996), pp. 481–

490.

[215] P.C. Whitford et al. “Conformational transitions of adenylate kinase: switching

by cracking”. In: J. Mol. Biol. 366.5 (2007), pp. 1661–1671.

[216] PG Wolynes, Z Luthey-Schulten, and JN Onuchic. “Fast-folding experiments and

the topography of protein folding energy landscapes”. In: Chemistry & biology

3.6 (1996), pp. 425–432.

[217] James O Wrabl et al. “The role of protein conformational fluctuations in

allostery, function, and evolution”. In: Biophysical chemistry 159.1 (2011),

pp. 129–141.

[218] Jeffries Wyman and Stanley J Gill. Binding and linkage: functional chemistry

of biological macromolecules. University Science Books, 1990.

[219] Lei Yang, Guang Song, and Robert L Jernigan. “How well can we understand

large-scale protein motions using normal modes of elastic network models?”

In: Biophys. J. 93.3 (2007), pp. 920–9. doi: 10.1529/biophysj.106.095927.

[220] Sichun Yang and Benoît Roux. “Src kinase conformational activation: thermodynamics,

pathways, and mechanisms”. In: PLoS Comput. Biol. 4.3 (2008),

e1000047. doi: 10.1371/journal.pcbi.1000047.

[221] Yiming Ye et al. “Probing site-specific calmodulin calcium and lanthanide

affinity by grafting”. In: J. Am. Chem. Soc. 127.11 (2005), pp. 3743–3750.

[222] Pengzhi Zhang et al. “Opposing Intermolecular Tuning of Ca2+ Affinity for

Calmodulin by Neurogranin and CaMKII Peptides”. In: Biophysical Journal

112.6 (2017), pp. 1105–1119.

[223] Zhuqing Zhang and Hue Sun Chan. “Competition between native topology

and nonnative interactions in simple and complex folding kinetics of natural

and designed proteins”. In: Proceedings of the National Academy of Sciences

107.7 (2010), pp. 2920–2925.

[224] W. Zheng and B.R. Brooks. “Normal-modes-based prediction of protein con-

formational changes guided by distance constraints”. In: Biophys. J. 88.5

(2005), pp. 3109–3117.

[225] Wenjun Zheng, Bernard R Brooks, and Gerhard Hummer. “Protein confor-

mational transitions explored by mixed elastic network models”. In: Proteins

69.1 (2007), pp. 43–57. doi: 10.1002/prot.21465.

[226] Huan-Xiang Zhou and Attila Szabo. “Theory and simulation of stochastically-gated

diffusion-influenced reactions”. In: The Journal of Physical Chemistry

100.7 (1996), pp. 2597–2604.

[227] Pavel I Zhuravlev and Garegin A Papoian. “Protein functional landscapes, dy-

namics, allostery: a tortuous path towards a universal theoretical framework”.

In: Quarterly Reviews of Biophysics 43.03 (2010), pp. 295–332.

[228] D.M. Zuckerman. “Simulation of an ensemble of conformational transitions in

a united-residue model of calmodulin”. In: J. Phys. Chem. B 108.16 (2004),

pp. 5127–5137.

[229] Guanghong Zuo, Jun Wang, and Wei Wang. “Folding with downhill behavior

and low cooperativity of proteins”. In: PROTEINS: Structure, Function, and

Bioinformatics 63.1 (2006), pp. 165–173.
