COOPERATIVE ALLOSTERIC LIGAND BINDING IN CALMODULIN
A dissertation submitted to Kent State University in partial fulfillment of the requirements for the degree of Doctor of Philosophy
by
Prithviraj Nandigrami
December, 2017
© Copyright
All rights reserved
Except for previously published materials
Dissertation written by
Prithviraj Nandigrami
B.Sc., University of Calcutta, 2006
M.Sc., Indian Institute of Technology Bombay, 2008
M.S., Brandeis University, 2010
Ph.D., Kent State University, 2017
Approved by
Dr. John J. Portman, Chair, Doctoral Dissertation Committee
Dr. Hamza Balci, Members, Doctoral Dissertation Committee
Dr. Björn Lüssem
Dr. Robin Selinger
Dr. Qi-Huo Wei
Accepted by
Dr. James T. Gleeson, Chair, Department of Physics
Dr. James L. Blank, Dean, College of Arts and Sciences
Table of Contents
List of Figures
List of Tables
Acknowledgments
1 Introduction
1.1 Overview
1.2 Protein Structure
1.3 Energy Landscape Theory
1.4 Mechanism of Allostery in Proteins
1.5 Calmodulin
1.6 Organization of Dissertation
2 Models of cooperative ligand binding
2.1 Introduction
2.2 The Hill Equation
2.3 Cooperativity for binding two ligands
2.4 Other models of cooperativity
2.5 The MWC Model
2.6 Generalized ensemble view of allostery
2.7 Allostery in calmodulin
3 Coarse-grained models for conformational dynamics in proteins
3.1 Introduction
3.2 Native structure-based models for proteins
3.3 An analytic model for CaM
3.4 Simulation model for CaM
3.4.1 Model for conformational transition
3.4.2 Model for Ca2+-binding
4 Comparing allosteric transitions in the domains of calmodulin through coarse-grained simulations
Abstract
4.1 Introduction
4.2 Methods
4.3 Conformational Transitions of Isolated Domains
4.4 Transition Kinetics
4.5 Discussion
5 Coarse-grained molecular simulations of allosteric cooperativity
Abstract
5.1 Introduction
5.2 Methods
5.3 Simulations of Binding a Single Ligand
5.4 Simulations of Binding Two Ligands
5.5 Binding Cooperativity
5.6 Molecular Description of Ligand Binding
5.7 Concluding Remarks
6 Thermodynamic and kinetic representations of cooperative allosteric ligand binding in intact calmodulin
6.1 Introduction
6.2 Methods
6.3 Binding Thermodynamics
6.4 Binding Kinetics
6.5 Concluding Remarks
7 Outlook and future directions
Appendices
A Supplement for Chapter 4
A.1 Simulated probability of contact formation
B Supplement for Chapter 5
B.1 Ligand-mediated contact pair distribution
B.2 One-dimensional simulated free energy
B.3 Exploring ligand contact strength and range
List of Figures
1.2.1 Different levels of hierarchy in protein structure. (A) A protein’s primary structure consists of its amino acid sequence. (B) Secondary structure consists of the organization of helices and sheets (only a helix is shown in the figure). (C) Tertiary structure is illustrated by one of the four polypeptide chains (subunits) of hemoglobin. Here, N-terminal represents the amino terminus and C-terminal represents the carboxyl terminus of the polypeptide chain. (D) Quaternary structure is shown as the arrangement of multiple polypeptide chains to form the functional hemoglobin molecule. Adapted from Branden and Tooze (1991)[20].
1.3.1 Funnel-shaped protein folding landscape. Folding occurs through the progressive organization of ensembles of structures on a free energy landscape.
1.3.2 Schematic representation of protein functional motion at the bottom of the folding funnel. At the bottom of the funnel, protein dynamics is sensitive to modulation by ligand binding. Ligand binding reshapes the energy landscape and redistributes the population. Also shown is the stabilization of a low energy conformer upon binding a ligand.
1.4.1 Schematic representations of macroscopic and microscopic allosteric binding. (A) In the macroscopic point of view, binding of a ligand stabilizes the ligand-bound ensemble of conformations. (B) The microscopic picture is more complex, with rates determined by the relative stabilization of conformations in the unbound and bound ensembles.
1.5.1 Structure of calmodulin in different forms. (A) Ca2+-free structure of calmodulin (PDB id: 1CFD), and (B) Ca2+-bound structure of calmodulin (PDB id: 1CLL). Upon binding four Ca2+ ions to its binding loops, calmodulin undergoes a large structural change that exposes its hydrophobic surface, thereby enabling calmodulin to bind to target proteins. (C) NMR structure of calmodulin bound to smooth muscle myosin light chain kinase (PDB id: 2KOF). The calcium ions are shown as silver circles. This visualization was made using Visual Molecular Dynamics (VMD) software[85].
2.2.1 Schematic representation of the Hill equation given by Eq. 2.1 for different values of the Hill coefficient, nH. The curve becomes sharper for higher values of nH. The x-axis represents ligand concentration, [X], and the y-axis shows fractional occupancy, Y.
2.3.1 Thermodynamic cycle showing a microscopic schematic representation of binding two ligands of heterogeneous strength. Starting from the unligated state, ligand binding stabilizes the state with both sites occupied by ligand. The two routes by which the binding process proceeds consist of partially bound states in which a single site is occupied by a ligand while the other site is empty. The final state in the cycle is the state with both binding sites occupied by ligand.
2.3.2 Dependence of bound probability on the parameter c for a protein with two binding sites of homogeneous strength. The x-axis represents ligand concentration and the y-axis represents the probability of states with both sites occupied by ligand (left) and the probability of states where only one site is occupied by ligand (right). Non-cooperative binding corresponds to c = 1. For higher values of c, the binding curve becomes increasingly cooperative and the peak population of singly ligated states decreases.
2.5.1 Schematic representation of the shift in population in the MWC model. In the unbound ensemble of conformations, the ligand-free closed state has higher stability in the free energy landscape and exists in dynamic equilibrium with the open state. Upon ligand binding, the ligand-bound open state is stabilized. The x-axis represents the reaction coordinate used to define the closed and open states of the protein, and the y-axis represents free energy (in arbitrary units). Also shown are representative protein structures in the unbound and ligand-bound ensembles of conformations.
2.5.2 States and corresponding statistical weights for a simplified description of the MWC model of allostery for a protein with two ligand binding sites. In the unligated ensemble of conformations, the relative stability of the open and closed states is set by the parameter . The singly-ligated ensemble of conformations consists of a ligand bound to either the closed state or the open state. The fully-loaded ensemble consists of both ligands bound to either the closed state or the open state of the protein.
2.6.1 Schematic of the ensemble description of models of allosteric cooperativity for a protein with two subunits. (A) The MWC model of allostery, (B) the KNF model, and (C) a more generalized ensemble allosteric model that accommodates all possible microstates of a protein with two binding sites. Green shaded regions correspond to subunit interaction energy. The two different shapes correspond to the closed and open conformations of the protein. Colored shapes correspond to ligand-unbound conformations and uncolored shapes correspond to ligand-bound conformations. Blue shaded regions show the ensemble of states for each framework. Adapted from the framework developed by Hilser and co-workers[81].
3.3.1 Distribution of strain energy of residues in CaM domains. (a) and (b) Change in strain energy for individual residues along the apo → holo structural change of nCaM and cCaM. (c) and (d) Residue strain energy distributions at an intermediate state for nCaM and cCaM, respectively. Adapted from the work by Tripathi and Portman[198].
4.2.1 Aligned Ca2+-free (closed/apo) and Ca2+-bound (open/holo) native structures for (a) the N-terminal domain and (b) the C-terminal domain of calmodulin. The closed state (pdb: 1cfd [103]) is shown in blue, and the open state (pdb: 1cll [34]) is shown in green. The closed (apo) and open (holo) conformations of (a) nCaM (residue index 4–75) consist of helices A, B and C, D with binding loops I and II, respectively. The closed (apo) and open (holo) conformations of (b) cCaM (residue index 76–147) consist of helices E, F and G, H with binding loops III and IV, respectively. Secondary structure legends for nCaM and cCaM are shown on top of the protein structures. The CaM structures were made using Visual Molecular Dynamics[85].
4.2.2 Simulated free energy (in units of kBT) as a function of the global progress coordinate $\Delta Q = Q_{\rm closed} - Q_{\rm open}$ for (a) nCaM and (b) cCaM.
4.2.3 Heat capacity as a function of temperature for cCaM (red) and nCaM (blue) for two relative stabilities of the open and closed basins. The solid curves correspond to equally stable open and closed basins; in the dashed curves, the open state occupies approximately 10% of the total population.
4.3.1 Free energy, in units of kBT, projected onto the global order parameters Qopen, Qclosed, and Q∩ for nCaM (a and c) and cCaM (b and d). The intermediate in the free energy surface of cCaM corresponds to an ensemble of states with intact secondary structure but lacking stable tertiary contacts.
4.3.2 Local order parameter, q∩(i), plotted as a function of the global progress coordinate, $\Delta Q = Q_{\rm closed} - Q_{\rm open}$, for each residue of (a) nCaM and (b) cCaM. The color represents the probability of each residue forming native contacts common to both the open and closed structures: low probability is shown in red and high probability in blue.
4.3.3 Magnitude of the root mean square fluctuations of each residue for the conformational ensembles along the transition pathway for (a) nCaM and (b) cCaM. Each color corresponds to the value of $\Delta Q = Q_{\rm closed} - Q_{\rm open}$ indicated in the legend in (a).
4.3.4 Free energy, in units of kBT, projected onto the global order parameters Qclosed and Qopen for nCaM (a and c) and cCaM (b and d) at temperatures Tsim = 0.89 TF and Tsim = 1.08 TF. At lower temperatures, the unfolded conformations are destabilized so that the transition mechanism in both domains becomes more two-state. At higher temperatures, the unfolded states are stabilized for both nCaM and cCaM.
5.3.1 Simulated binding curves for the individual loops of (A) nCaM and (B) cCaM. Lines are fits to the two-state MWC model given by Eq. 5.6. (C) Simulated mean number of bound ligands for the two binding sites of nCaM (blue) and cCaM (red) as a function of ligand concentration. The solid lines plot $\langle n_b \rangle(\mu) = p_b^{A0}(\mu) + p_b^{0B}(\mu) + 2p_b^{AB}(\mu)$, with probabilities given by the MWC model evaluated with the binding parameters found from fits of binding to each individual loop.
5.3.2 Simulated free energy as a function of Qclosed and Qopen for the binding loops at ligand concentration c = Kd for nCaM (A,B) and cCaM (C,D). The set of native contacts in the open and closed conformations is separated into three groups: those that occur exclusively in the open native structure, those that occur exclusively in the closed native structure, and those that are common to both states. For each of these groups, denoted by α = open, closed, and ∩, a local order parameter, qα(i), is defined as the fraction of native contacts formed involving the ith residue. Thus qopen(i) and qclosed(i) are the fractions of native contacts involving the ith residue that occur exclusively in the open and closed native structures, respectively. Overall native similarity is monitored by the corresponding global order parameter, $Q_\alpha = \langle q_\alpha(i) \rangle$, where the average is taken over the residues of the protein. The open state ensemble consists of conformations with $0.18 \le Q_{\rm closed} \le 0.35$ and $0.55 \le Q_{\rm open} \le 0.75$.
5.3.3 Simultaneous fits of simulation data for a single ligand to pb(µ) and po(µ) for individual binding loops. Solid curves are plots of pb(µ) and po(µ) with c and o determined by a simultaneous fit to the simulation data (shown as points).
5.4.1 Populations of ligation states $p_b^{A0}(\mu)$ (blue), $p_b^{0B}(\mu)$ (green), and $p_b^{AB}(\mu)$ (red) plotted as a function of Ca2+ concentration for nCaM (top) and cCaM (bottom). Simulation data are shown as points. Solid curves plot Eqs. (5.9)–(5.11) from the MWC model. Dotted curves show the non-cooperative induced fit model of binding to independent sites described by the partition function given in Eq. 5.13. Note that some data points are skipped for clarity.
5.5.1 Thermodynamic cycle for binding two ligands.
5.6.1 Simulated root mean square fluctuations (rmsf) of each residue for nCaM (top) and cCaM (bottom) calculated at different ligand concentrations: high ligand concentration gives the fully saturated ensemble (blue curve), low ligand concentration gives the unligated ensemble (red curve), and c = Kd (black curve). The rmsf curves are calculated for each ensemble after aligning to the open native conformation. (Aligning to the closed conformation gives similar curves.) Also shown are the reference fluctuations given in Eq. 5.18 (green curve).
6.2.1 Heat capacity as a function of temperature for the N-domain and C-domain of CaM when the open (holo) state is destabilized to a relative population of ≈10%. Heat capacity is calculated using the WHAM[105] method.
6.2.2 Binding and unbinding rates for loop I of the N-domain of CaM as a function of concentration using the non-symmetric Monte Carlo simulation scheme. The x-axis and y-axis represent concentration and rate, respectively. Loops II, III, and IV show similar behavior (not shown).
6.2.3 Free energy contours for the N-domain and C-domain in intact CaM. The N-domain shows a dominant two-state transition, while the C-domain populates an intermediate state, showing three-state transition behavior.
6.3.1 Fractional occupancy of the loops of intact CaM (red), the N-domain (blue), and the C-domain (green). The solid lines represent theoretical binding curves calculated from the MWC model using the parameters c and o obtained from simulations.
6.3.2 Population of simulated ligation states as a function of concentration. The points represent simulated data and the solid lines show populations calculated using the MWC model. Colors represent different ligation states, defined by whether each loop is occupied or empty.
6.3.3 Population of singly, doubly, triply, and fully-loaded ligation states as a function of concentration for (top) heterogeneous binding loops and (bottom) homogeneous binding loops. The homogeneous binding strength is set to the average of the binding free energies to the closed and open states of the four binding loops.
6.3.4 Total free energy (in kcal/mol) of simulated ligation states of intact CaM. The blue lines represent the free energy contribution of ligand binding to individual loops, and the red lines show the free energy stabilization due to multi-body cooperative interactions.
6.4.1 Full flux network for binding transitions showing all possible transitions at c = Kd (top) and at c > Kd (bottom). The states are grouped by their degree of ligation. The vertical axis represents the total free energy of stabilization. The width of the arrows represents the amount of flux between a pair of ligation states. Red represents a higher probability to bind and green represents a higher probability to unbind.
6.4.2 Simulated unbinding (left) and binding (right) flux networks at c = Kd. The ligation states are placed according to their total stabilization. The width of the arrows represents the amount of flux between a pair of states. For binding transitions, red represents a higher probability to bind and green represents a higher probability to unbind. For unbinding transitions, red represents a higher probability to unbind and green represents a higher probability to bind.
6.4.3 Simulated flux networks for unbinding transitions (left) at c < Kd and for binding transitions (right) at c > Kd. The ligation states are placed according to their total stabilization. The width of the arrows represents the amount of flux between a pair of states. For binding transitions, red represents a higher probability to bind and green represents a higher probability to unbind. For unbinding transitions, red represents a higher probability to unbind and green represents a higher probability to bind.
A.1.1 Simulated contact maps for nCaM and cCaM showing the probability of formation of contacts between secondary structure elements of nCaM for the ensemble of conformations in the transition state (A), and of cCaM for the ensemble of conformations in the intermediate state (B). nCaM shows limited loss of contacts in the transition state ensemble, while the intermediate of cCaM involves several regions of low contact probability (highlighted in pink). Color represents the probability of contact formation. Secondary structures of nCaM and cCaM are shown along the x- and y-axes.
B.1.1 Ligand-mediated contact pair distribution in the ensemble of open and closed states for loop I in nCaM. The x-axis represents contact distance (in Å) and the y-axis represents normalized count. The ligand-mediated contact pairs are (A) 17–27, (B) 18–22, (C) 19–27, (D) 21–27, (E) 22–28.
B.1.2 Ligand-mediated contact pair distribution in the ensemble of open and closed states for loop II in nCaM. The x-axis represents contact distance (in Å) and the y-axis represents normalized count. The ligand-mediated contact pairs are (A) 54–60, (B) 54–61, (C) 55–60, (D) 56–60, (E) 56–61.
B.1.3 Ligand-mediated contact pair distribution in the ensemble of open and closed states for loop III in cCaM. The x-axis represents contact distance (in Å) and the y-axis represents normalized count. The ligand-mediated contact pairs are (A) 18–27, (B) 18–28, (C) 19–26, (D) 19–27, (E) 19–28, (F) 20–27, (G) 20–28, (H) 21–26.
B.1.4 Ligand-mediated contact pair distribution in the ensemble of open and closed states for loop IV in cCaM. The x-axis represents contact distance (in Å) and the y-axis represents normalized count. The ligand-mediated contact pairs are (A) 54–64, (B) 55–59, (C) 56–64, (D) 59–65, (E) 60–64.
B.2.1 Simulated free energy for nCaM (A,B,C) and cCaM (D,E,F) corresponding to the ensembles of unligated (top), singly ligated (middle), and fully saturated (bottom) conformations. The x-axis represents the simulated progress coordinate $\Delta Q = Q_{\rm closed} - Q_{\rm open}$ and the y-axis represents simulated free energy in units of kBT.
B.3.1 Simulated binding curves for loop I of nCaM with varying clig (left) and σij (right). Consistent behavior is observed for other binding loops (data not shown).
List of Tables
5.3.1 Number of ligand-mediated contacts, dissociation constants, and binding free energies for the loops of CaM
5.5.1 Simulated microscopic and macroscopic equilibrium constants
6.3.1 Number of ligand-mediated contacts and binding free energies for the loops of CaM
Acknowledgments
I owe my gratitude to my family members for all their help and support throughout my graduate career. In particular, I would like to mention my mother, Kanika Nandigrami, and father, Debiprasad Nandigrami, for all their encouragement over the years. My sister, Rupa Nandigrami, has also been a constant source of love, support and inspiration. The fulfillment of my graduate career would not have been possible without my wife, Srijani Chatterjee. She has been a vital source of love, care, support, concern, enthusiasm, and strength during the final phase of my graduate career.
I would like to express my deepest gratitude to my esteemed advisor, Dr. John J. Portman. I came to Kent State University with limited knowledge of the fascinating area of biological physics. Dr. Portman introduced me to this area of research and guided me through the thinking process behind setting up a problem. He taught me scientific writing and provided me with many opportunities to explore, critically analyze, and solve problems. He was readily available to answer questions and provided perceptive and concrete feedback. I am indebted to Dr. Portman for providing me many opportunities to grow as a scientist.
I am indebted to Dr. Robin Selinger for offering the Computational Materials Science course, which was crucial in giving me the confidence to learn programming. I sincerely thank her for being a constant source of motivation and positive thinking. I am grateful to Dr. Selinger for providing me the opportunity to work in her group for a summer.
I would like to mention other faculty members with whom I have had the opportunity to interact. I am grateful to Dr. Declan Keane, who motivated me to apply for graduate studies in the Physics department at Kent State. He made the whole transfer process from Brandeis University to Kent State University seamless. I am grateful to Dr. Keane for being my academic advisor and for all his helpful advice.
My special gratitude goes to Kelly Conley, Constance Reho, and the other office staff in the Physics department. I would like to mention the Academic Laboratory Manager, Mr. Greg Putman, for providing a great teaching experience during the semesters in which I was a teaching assistant.
I appreciate the help of my colleagues, Talant Ruzmetov and Daniel Gavazzi, for their time and valuable suggestions.
I would like to acknowledge financial support from the National Science Foundation for part of the research presented in this dissertation.
Finally, despite all the assistance provided by Dr. Portman and others, I alone remain responsible for the content of the material in this dissertation, including any errors or omissions which may unwittingly remain.
Chapter 1
Introduction
1.1 Overview
Protein molecules are involved in nearly all cellular functions that make life possible.
A typical cell contains approximately 10^6 types of protein molecules, each with a distinct essential function. Digestive enzymes (amylase, lipase, pepsin), for example, break down nutrients in food for absorption; transport proteins, such as hemoglobin, act as carriers of oxygen in the human body; structural proteins make up the cytoskeleton; hormone signaling proteins coordinate the activity of different body systems; antibodies protect the body from foreign pathogens; and motor proteins, such as myosin, are responsible for muscle contraction.
Many protein molecules perform specific functions through their ability to bind other molecules with high specificity and exquisite control. Protein-ligand interactions are a ubiquitous strategy underlying nearly all biological processes.
Well-designed binding surfaces have evolved to stabilize protein-ligand complexes that are tuned to the specific functional needs of the cell. Thermodynamic stability, characterized by binding equilibrium constants, is one essential property of protein-ligand function. Protein function often requires the ability to switch between ligand-unbound and ligand-bound populations with precise control in response to changes in the environment. One of the most important ways in which proteins enhance this sensitivity is allosteric cooperativity, which is often accomplished through a conformational change of the protein.
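The Hill equation, Y = [X]^nH / (K^nH + [X]^nH), is taken up in Chapter 2. It gives a quick way to see how cooperativity sharpens the response. The sketch below computes the fold change in ligand concentration needed to go from 10% to 90% occupancy, which works out to 81^(1/nH). The function names and the choice K = 1 are illustrative assumptions, not part of the models used in this dissertation.

```python
import math


def hill_occupancy(x: float, K: float = 1.0, n_h: float = 1.0) -> float:
    """Fractional occupancy Y at ligand concentration x (Hill equation)."""
    return x**n_h / (K**n_h + x**n_h)


def concentration_at(y: float, K: float = 1.0, n_h: float = 1.0) -> float:
    """Invert the Hill equation: the ligand concentration giving occupancy y."""
    return K * (y / (1.0 - y)) ** (1.0 / n_h)


for n_h in (1.0, 2.0, 4.0):
    # Fold change in concentration needed to switch from 10% to 90% occupancy.
    fold = concentration_at(0.9, n_h=n_h) / concentration_at(0.1, n_h=n_h)
    print(f"nH = {n_h:.0f}: 10% -> 90% occupancy needs a {fold:.1f}-fold change")
```

For non-cooperative binding (nH = 1), the switch spans an 81-fold change in concentration. A Hill coefficient of 4 compresses it into a 3-fold change, which is the sense in which cooperativity enhances sensitivity.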
X-ray crystallography measurements provide a picture of a protein as a unique, static three-dimensional folded structure. Although a protein’s folded structure is essential to its function, the key to understanding protein-ligand interactions often lies in its conformational dynamics[60, 65]. Conformational flexibility and dynamics can even determine a protein’s functional specificity[104, 71, 203, 18, 227, 217, 81, 137]. The dynamic nature of the ensemble of protein conformations is of particular importance for understanding protein function, activation, and sensitivity to local changes in the environment, such as ligand concentration. Understanding and quantifying how a protein’s conformational dynamics determines the thermodynamic and kinetic aspects of ligand binding is a central focus of this dissertation.
Next, I briefly describe protein structure at the different levels of its structural hierarchy. These levels reflect the complexity of the molecular interactions among the structural elements that constitute a protein molecule.
1.2 Protein Structure
Proteins are polymers composed of a sequence of amino acids. Each of the twenty standard amino acids consists of a central carbon atom, known as the α carbon, which is bonded to an amino group, a carboxyl group, a hydrogen atom, and a side chain that distinguishes one amino acid from another. A protein’s structure can be summarized at different levels of a hierarchy: “primary”, “secondary”, “tertiary”, and “quaternary” structures, as illustrated in Fig. 1.2.1.
The primary structure of a protein, its sequence, is the simplest level of protein structure. The secondary structure of a protein refers to the local folded structure that forms within a polypeptide chain. These locally ordered regions typically form through interactions between monomers, such as hydrogen bonds between groups of the polypeptide backbone. The most commonly occurring secondary structure elements in a protein molecule are the α helix and the β sheet, first described by Pauling in Ref. [155]. The tertiary structure of a protein is its three-dimensional structure. The tertiary structure is stabilized through non-covalent interactions, including hydrogen bonding, ionic bonding, dipole-dipole interactions, and hydrophobic forces. The next level of complexity in the hierarchy is the quaternary structure, which involves the particular spatial arrangement and interactions of two or more polypeptide chains. Quaternary structure essentially describes how different polypeptide chains are assembled into complexes. Not all proteins have quaternary structure; for example, proteins that exist as single chains have no quaternary structure. High resolution protein structures are often essential to rationalizing the detailed molecular mechanisms that control protein function.

[Figure 1.2.1 panels: (A) sequence of amino acids; (B) helix; (C) long-range interaction of subunits; (D) arrangement of multiple polypeptide chains]
Figure 1.2.1: Different levels of hierarchy in protein structure. (A) A protein’s primary structure consists of its amino acid sequence. (B) Secondary structure consists of the organization of helices and sheets (only a helix is shown in the figure). (C) Tertiary structure is illustrated by one of the four polypeptide chains (subunits) of hemoglobin. Here, N-terminal represents the amino terminus and C-terminal represents the carboxyl terminus of the polypeptide chain. (D) Quaternary structure is shown as the arrangement of multiple polypeptide chains to form the functional hemoglobin molecule. Adapted from Branden and Tooze (1991)[20].
1.3 Energy Landscape Theory
In the early 1960s, experiments carried out by Anfinsen demonstrated that a protein’s unique three-dimensional structure is determined solely by its specific amino acid sequence[5, 4]. That is, no other cellular apparatus, such as chaperones, is needed to help a protein fold. This observation is often summarized as “sequence determines structure”. A typical polypeptide chain can adopt an enormous number of three-dimensional conformations. A random exploration of conformational space to find one particular conformation would therefore apparently take an enormous amount of time. This incompatibility with biological requirements is known as “Levinthal’s paradox”[109], named after the scientist who first articulated the discrepancy. Levinthal estimated that if it takes a picosecond to sample a particular conformation, the time for a protein to fold through a random search would be longer than the age of the universe.
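Levinthal’s estimate is simple arithmetic, and the sketch below reproduces it under stated assumptions. Purely for illustration, it assumes a 100-residue chain whose residues each independently adopt one of three backbone conformations, sampled at one picosecond per conformation. These numbers and the function name are hypothetical and are not taken from Ref. [109].

```python
import math

SECONDS_PER_YEAR = 3.156e7        # approximate
AGE_OF_UNIVERSE_YEARS = 1.38e10   # approximate


def levinthal_search_time(n_residues, states_per_residue=3,
                          time_per_conformation=1e-12):
    """Seconds needed to visit every conformation once in a random search.

    Assumes each residue independently adopts one of `states_per_residue`
    backbone conformations (an illustrative assumption).
    """
    n_conformations = states_per_residue ** n_residues  # exact integer count
    return n_conformations * time_per_conformation


t = levinthal_search_time(100)
print(f"random search: roughly 10^{math.log10(t):.0f} s")
print("longer than the age of the universe:",
      t / SECONDS_PER_YEAR > AGE_OF_UNIVERSE_YEARS)
```

Even with only two conformations per residue, the search time for 100 residues is about 4 × 10^10 years, which still exceeds the age of the universe. The exponential growth with chain length is what makes a random search hopeless.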
This suggests that proteins have evolved efficient ways to find the folded conformation within the huge conformational space. One proposed resolution of this search problem is that proteins fold through a series of specific, well-defined steps. Consequently, researchers tried to identify a single pathway (or a small set of pathways) that a particular protein takes on its way to folding. One problem with this effort is that it focuses attention on any conformation that lives long enough to be characterized experimentally (such as states limited by proline isomerization), even when there is little evidence that those states are essential to the folding mechanism.
A new view, known as the “free energy landscape theory”[25, 27, 108, 151, 30, 21], emerged in the late 1980s and became established in the following decade. This alternative perspective describes the folding mechanism not as a single pathway of essential steps to the native state, but rather as a statistical search through a self-organized ensemble of conformations within the energy landscape. In some sense, the single pathway proposed early on is replaced by the average properties of a large number of pathways. Perhaps somewhat surprisingly, accommodating the large number of pathways simplifies the description of folding to a small set of average statistical properties, or order parameters.
One way to understand the energy landscape of a protein is to compare it with the landscape of a polymer composed of monomers with random competing interactions. Random heteropolymers have rugged landscapes with many different low energy conformations separated by barriers. Finding a unique structure for a random heteropolymer involves searching a landscape similar to those of glassy systems, so a particular conformation is unlikely to be found on a reasonable timescale. Unlike random heteropolymers, proteins have evolved to fold into unique, three-dimensional ordered structures determined by their amino acid sequences. The energy landscape theory proposes that naturally occurring proteins have landscapes shaped by an energetic bias toward the native state. Such a landscape greatly reduces the conformational search, enabling a protein to fold on biologically relevant timescales.
Like random heteropolymers, proteins have some degree of ruggedness due to transition barriers and misfolded states that tend to slow down folding kinetics. The energetic bias toward the native state gives rise to an energy landscape in the shape of a rugged funnel (Fig. 1.3.1). This bias is known as the “principle of minimum frustration” [25, 27], which states that, on average, conformations that are more native-like have lower energy.
Much theoretical, computational, and experimental evidence over the last two decades supports the energy landscape theory of protein folding. On the theoretical side, much insight has come from unfrustrated Hamiltonians, called Gō models, with an explicit energetic bias toward the native state[192, 67]. Gō models represent the limit in which ruggedness plays a minimal role and the driving force arising from the strong energetic bias toward the native state dominates the folding landscape. Analytic models [160, 159, 139, 2, 179, 163, 177, 178, 61, 62, 59], as well as simulations [173, 42, 99, 93, 152, 94, 113, 114, 58, 168, 35, 70, 229, 162, 78], based on Gō-model Hamiltonians have been developed to predict protein folding mechanisms that compare well with experimental measurements. Native structure-based models were first applied to study fast folding proteins, with sub-millisecond folding times[216, 151], and two-state folding proteins[26, 12]. The success of these models in predicting folding mechanisms encouraged researchers to apply them to other problems, such as allostery in protein-ligand binding[149, 115], coupled folding and binding[113, 205, 63], and folding cooperativity[229, 78].

[Figure 1.3.1 schematic labels: Unfolded, Native folded; axes: Entropy, Q, |E(Q)|]
Figure 1.3.1: Funnel-shaped protein folding landscape. Folding occurs through the progressive organization of ensembles of structures on a free energy landscape.
Energy landscapes provide a useful framework for understanding the intricacies of protein function and activation upon ligand binding[202, 122, 181]. Folded proteins sample a broad spectrum of low free energy native conformations at the bottom of the funnel. These low energy conformations are in dynamic equilibrium within the folded ensemble, with transition rates controlled by free energy barriers[73]. Conformational flexibility and dynamics among the thermodynamically accessible states allow a protein to respond statistically to changing environmental conditions. For example, the distribution of states may shift in response to interactions with other proteins or to other changes in the environment.
Fig. 1.3.2 shows a schematic representation of how binding a ligand modulates the landscape. The landscape at the bottom of the funnel consists of a diverse ensemble of conformations. A ligand may favor a particular conformation because of structural compatibility, thereby stabilizing a previously metastable state within the unbound ensemble. The population of conformations at the bottom of the funnel thereby redistributes in response to ligand binding. That is, ligand binding triggers a conformational switch from the unligated to the ligated state. Redistributing low energy conformations by stabilizing a particular metastable state is a powerful strategy for allosteric control in proteins[72, 45, 92]. In fact, it has been proposed that allosteric conformational changes are a ubiquitous protein property, essential to the role of proteins as the primary molecular machines in biology[72].
Figure 1.3.2: Schematic representation of protein functional motion at the bottom of the folding funnel (figure labels: Unbound conformations; Ligand +; Ligand-bound conformation is stabilized; Bottom of the folding funnel; x-axis: Conformational coordinate). At the bottom of the funnel, protein dynamics are sensitive to modulation by ligand binding. Binding a ligand redistributes the population over the energy landscape and stabilizes a low energy conformer.
The dynamic equilibrium between low energy conformations is essential to describing this “population shift” (or “conformational selection”) mechanism of allostery. NMR experiments support this picture: they establish a dynamic equilibrium between metastable conformations of several proteins, even in the absence of ligand[124, 54]. From a computational point of view, a minimal molecular model of ligand binding must accommodate the protein's conformational dynamics and sample the appropriate conformational ensembles. As a first approximation, such a model must include at least two metastable states connected by a transition barrier. Furthermore, the two metastable states should not be treated as discrete structures. That is, the protein should be able to sample a broad range of conformations within each of the two folded basins. One primary focus of this dissertation is to elucidate the allosteric ligand binding mechanism. I do so through simulations of a model in which explicit conformational change of the protein is coupled with ligand binding.
1.4 Mechanism of Allostery in Proteins
The term allostery refers to any mechanism by which proteins communicate the effect of binding an external molecule (such as a ligand) at one site to a distal region of the molecule. This process is prevalent in biology and allows protein activity to be regulated[185]. This regulation is often achieved by controlling the dynamic equilibrium of the conformational ensemble[44]. The population can be shifted by binding of small molecules, by changes in temperature or concentration, or by mutation[117]. Allostery can also enhance the binding specificity of a biological macromolecule at the target site[147].
Allosteric control can be achieved through several mechanisms. One of the most common is a conformational change of the protein upon ligand binding. The kinetics of allostery can be viewed against the background of two conceptually distinct mechanisms. In the population shift mechanism, a ligand preferentially stabilizes a particular conformation among the ensemble of unligated conformations. Koshland and coworkers introduced an alternative viewpoint. They postulated that a ligand binds to the equilibrium ligand-free form, and the structure subsequently changes to the ligand-bound form[101]. This binding mechanism is known as “induced fit” binding.
As shown in Fig. 1.4.1, the binding of a ligand to a protein can be described at different levels of resolution. For example, Fig. 1.4.1 (A) shows a “macroscopic” description of the ligand-induced conformational change in a protein. This description simplifies the detailed kinetics because it does not resolve the conformational change of the protein. The on and off rates balance the equilibrium between the unbound and bound conformational ensembles, with equilibrium constant

$$K_{eq} = \frac{k_{on}[L]}{k_{off}}. \qquad (1.1)$$

Here, [L] denotes the concentration of ligand, $k_{on}$ is the bimolecular binding rate, and $k_{off}$ is the unbinding rate. A more detailed kinetic scheme includes transition rates for the conformational change. Fig. 1.4.1 (B) shows such a scheme for a protein with open and closed states. The unbound protein makes transitions to the open and closed states with rates $k_{uo}$ and $k_{uc}$, respectively. When the ligand is bound, the corresponding transitions occur with rates $k_{bo}$ and $k_{bc}$, respectively.

The overall binding mechanism can proceed via two distinct routes, as shown in Fig. 1.4.1 (B). For example, a ligand can bind weakly to the closed state with forward rate $k^{c}_{on}[L]$ and come off with rate $k^{c}_{off}$. In induced fit binding, a conformational change to the ligand-bound open state then stabilizes this weakly bound intermediate. The other scenario is known as the conformational selection (or population shift) mechanism. Here the protein first changes to the unbound open state, and the ligand then binds to give the ligand-bound open state. Which route dominates for a particular protein and ligand is a kinetic issue. It is challenging for experimental measurements to resolve the binding kinetics of intermediate states[47]. Furthermore, a two-basin description may not capture a particular protein's conformational dynamics, which may be considerably more complex.

Figure 1.4.1: Schematic representations of macroscopic and microscopic allosteric binding. (A) Macroscopic point of view: Unbound ⇌ Bound with rates $k_{on}[L]$ and $k_{off}$. Binding of a ligand stabilizes the ligand-bound ensemble of conformations. (B) Microscopic point of view: a four-state cycle of Unbound-Closed, Unbound-Open, Bound-Closed and Bound-Open states. The conformational rates are $k_{uo}$, $k_{uc}$ (unbound) and $k_{bo}$, $k_{bc}$ (bound). The binding and unbinding rates are $k^{c}_{on}[L]$, $k^{c}_{off}$ for the closed state and $k^{o}_{on}[L]$, $k^{o}_{off}$ for the open state. The rates are determined by the relative stabilization of conformations in the unbound and bound ensembles.
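The four-state scheme in Fig. 1.4.1 (B) can be written as a master equation. The Python sketch below solves it for the equilibrium populations of the four states. It rests on two assumptions:

- $k_{uo}$ and $k_{bo}$ are closed-to-open rates, and $k_{uc}$ and $k_{bc}$ are open-to-closed rates.
- Binding is pseudo-first-order, with rate $k_{on}[L]$.

The function name, state ordering, and numerical rates are hypothetical. The rates are chosen to satisfy detailed balance around the cycle. Equilibrium populations alone do not identify the dominant route. That would require the time-dependent fluxes, obtained by integrating the same master equation from a chosen initial state.

```python
import numpy as np

# Hypothetical ordering of the four states in Fig. 1.4.1 (B)
STATES = ("unbound-closed", "unbound-open", "bound-closed", "bound-open")


def equilibrium_populations(k_uo, k_uc, k_bo, k_bc,
                            kon_c, koff_c, kon_o, koff_o, ligand):
    """Stationary populations of the four-state binding scheme.

    Assumes k_uo, k_bo are closed->open rates and k_uc, k_bc are
    open->closed rates; binding is pseudo-first-order (kon * [L]).
    """
    # rates[i, j] holds the rate constant for the transition j -> i
    rates = np.zeros((4, 4))
    rates[1, 0], rates[0, 1] = k_uo, k_uc              # unbound: closed <-> open
    rates[3, 2], rates[2, 3] = k_bo, k_bc              # bound: closed <-> open
    rates[2, 0], rates[0, 2] = kon_c * ligand, koff_c  # closed: bind <-> unbind
    rates[3, 1], rates[1, 3] = kon_o * ligand, koff_o  # open: bind <-> unbind
    # Master equation dp/dt = W p; every column of W sums to zero
    W = rates - np.diag(rates.sum(axis=0))
    # Stack the normalization sum(p) = 1 under W p = 0 and solve the system
    A = np.vstack([W, np.ones(4)])
    b = np.zeros(5)
    b[-1] = 1.0
    p, *_ = np.linalg.lstsq(A, b, rcond=None)
    return p


# Illustrative rates obeying detailed balance around the cycle:
# k_uo * kon_o * k_bc * koff_c == k_uc * kon_c * k_bo * koff_o
p = equilibrium_populations(k_uo=1.0, k_uc=10.0, k_bo=10.0, k_bc=1.0,
                            kon_c=1.0, koff_c=10.0, kon_o=10.0, koff_o=1.0,
                            ligand=1.0)
for name, value in zip(STATES, p):
    print(f"{name:15s} {value:.3f}")
print("fraction bound:", round(p[2] + p[3], 3))
```

In this example the ligand is assumed to bind ten times more strongly to the open state. As a result, the unbound ensemble is mostly closed while the bound ensemble is mostly open.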
Extending this simple kinetic scheme to proteins with multiple binding sites is straightforward, though the microscopic binding mechanism can be rather complex. In this case, the protein can exist in either the closed or the open conformation, and ligands can bind in any of the conformational states the protein samples. Hence, the ensembles involve many more microstates, and capturing all of the kinetic details within a simplified molecular model is challenging. My research investigates the interplay between a protein's conformational dynamics and its binding properties. In particular, I characterize the allosteric binding thermodynamics and kinetics of calmodulin, a flexible protein with multiple heterogeneous binding sites.
1.5 Calmodulin
The Ca2+-binding protein calmodulin is a good model system for gaining mechanistic insight into allosteric ligand binding. I report on the binding properties of the individual N-terminal and C-terminal domains of calmodulin, as well as of intact calmodulin. In this section, I briefly describe the biological function of calmodulin and its importance as a regulatory protein.
Calmodulin (CaM) is a ubiquitous Ca2+-binding protein consisting of two structurally similar, globular domains. The two domains of CaM, the N-terminal domain (nCaM) and the C-terminal domain (cCaM), have similar secondary and tertiary structures. Each domain consists of two helix-loop-helix motifs (the EF-hands)[102], and the two domains are connected by a flexible linker, as shown in Fig. 1.5.1. The Ca2+-free structure of CaM has been resolved by solution NMR[103], and the Ca2+-bound structure by X-ray crystallography[34].
Although topologically similar, the two domains of CaM have distinct flexibilities, melting temperatures, and thermodynamic Ca2+-binding properties[204, 116, 184, 127]. In the absence of Ca2+, the C-terminal domain is particularly dynamic[196]. It is also less stable than the N-terminal domain, both in the intact protein and as an isolated domain[23, 184, 127]. The C-terminal domain has a very low denaturation temperature and is reported to be considerably unfolded at physiological temperature[164]. Furthermore, NMR experiments monitoring the open/closed transition of isolated cCaM have revealed local transient unfolding of helix F during domain opening[121]. NMR experiments also reveal that the linker connecting the two globular domains of CaM is highly flexible, even when CaM is not bound to its target proteins[39].
Fig. 1.5.1 shows the Ca2+-free “apo” and Ca2+-bound “holo” forms of calmodulin, as well as calmodulin bound to a target peptide. On binding Ca2+ ions, CaM undergoes a conformational change that exposes hydrophobic residues in each domain. Calmodulin has an overall dumbbell shape, which lets it bind target proteins with both domains, like two hands holding a stick. The high flexibility of the domains allows CaM to adapt to many different targets, which play crucial roles in cell signaling, ion transport, and cell death[189]. A key reason for CaM's ability to bind such a wide variety of target proteins is non-polar interactions involving its abundant methionine residues. Ca2+ binding exposes the non-polar surface of CaM, which then binds to the non-polar regions of its target proteins. Because the non-polar grooves do not have a specific shape, CaM acts as a versatile regulatory protein. Its targets are not required to possess any specific amino acid sequence or structural binding motif[208].

Figure 1.5.1: Structure of calmodulin in different forms. (A) Ca2+-free structure of calmodulin (PDB id: 1CFD). (B) Ca2+-bound structure of calmodulin (PDB id: 1CLL). Upon binding 4 Ca2+ ions to its binding loops, calmodulin undergoes a large structural change that exposes its hydrophobic surface, thereby enabling it to bind to target proteins. (C) NMR structure of calmodulin bound to smooth muscle myosin light chain kinase (PDB id: 2KOF). The calcium ions are shown as silver circles. This visualization was made using the Visual Molecular Dynamics (VMD) software[85].
CaM binds to a diverse range of target proteins and modulates various cellular processes, such as muscle contraction, metabolism, nerve growth, and immune response. One key reason for CaM's functional diversity is the flexibility of the central helix region upon binding Ca2+ ions, together with the flexibility of the domains. Additionally, binding of CaM to its target proteins involves a greater degree of flexibility in the interdomain linker region[143].
CaM's ability to regulate a wide spectrum of crucial cellular processes has made it a model system for studies spanning the last few decades. The Ca2+-induced structural change is a key step in CaM's activation process. In this dissertation, I aim to give a molecular description of the activation of CaM by Ca2+ ions. Cheung et al. have recently reported coarse-grained simulations of CaM binding to target peptides[201, 222]. In future work, it would be interesting to combine Ca2+ activation and target peptide binding.
1.6 Organization of Dissertation
The dissertation is organized as follows:
In Chapter 2, I discuss thermodynamic binding cooperativity in proteins and describe several approaches to analyzing experimental data. Next, I introduce the energy landscape view of allostery as a shift in population and describe the framework of the Monod-Wyman-Changeux (MWC) model of allostery. I also discuss the assumptions of the MWC model and why they suit our investigation of allostery in Ca2+-CaM binding. Finally, I briefly discuss a more recent viewpoint, the generalized ensemble view of allostery.
In Chapter 3, I first give an overview of the different simulation and analytic
models that have been developed to study conformational dynamics of proteins. Then,
I briefly describe the coarse-grained model used in our simulations.
The next three chapters report our work on allosteric transitions in CaM. In Chapter 4, I describe the open/closed transitions of the isolated N-terminal and C-terminal domains of CaM. Here, I explore how stability influences the mechanism of domain opening. In Chapter 5, I describe results from a model that incorporates both conformational dynamics and ligand binding. Here, I investigate the cooperativity of Ca2+ binding to the isolated domains of CaM. Our approach is a molecular realization of the classic Monod-Wyman-Changeux (MWC) model of allosteric binding cooperativity. The simulations predict that the binding strengths of the loops of calmodulin are heterogeneous. For binding two Ca2+ ions, they predict that the two domains of CaM have distinct binding affinities and cooperativities. I also provide a structural rationalization of the loops' binding free energies in the simulated ensembles of unbound and bound conformations. Chapter 6 extends our work on isolated CaM domains to the binding of four Ca2+ ions to intact CaM. Thermodynamic simulations show that the heterogeneity of CaM's binding loops is conserved in the intact protein. Kinetic results predict dominant binding routes within the ensemble of ligand-bound conformations.
Chapter 2
Models of cooperative ligand binding
2.1 Introduction
Allosteric cooperative binding is an essential component of many fundamental biochemical and physiological functions, such as enzymatic activity. An allosteric protein has multiple binding sites, and binding of a ligand at one site influences subsequent binding at other sites. In cooperative binding, the stability of the bound complex is greater than the sum of the stabilities of binding to the individual sites.
Cooperativity plays a crucial role in cell signaling, transcriptional regulation, and many other cellular processes.
Cooperative binding was first observed in the early 1900s by Christian Bohr and coworkers, who investigated oxygen binding to hemoglobin[19]. They measured the average saturation of hemoglobin with oxygen as a function of the partial pressure of oxygen. The sigmoidal shape of the binding curve suggests that binding oxygen to hemoglobin makes it easier for additional oxygen to bind. Throughout the twentieth century, various frameworks were developed to describe cooperative binding of a ligand to a protein with multiple binding sites[218]. Hill, Pauling, Adair, and others proposed early phenomenological models. Of these, the classic Monod-Wyman-Changeux (MWC) model[136] and the Koshland-Némethy-Filmer (KNF) model[101] of allostery have had a lasting influence on the interpretation of cooperativity. Nevertheless, in practice these models provide a largely conceptual picture. Structural insight into the fundamental mechanism of allostery requires a molecular understanding of the protein's conformational dynamics[45]. The mechanism of allostery and the associated binding cooperativity have attracted renewed interest in recent years. This is in part because allostery has a natural place in landscape theory, and in part because experimental techniques have become more refined[214, 169, 209, 48, 88].
Many types of proteins exhibit cooperative binding. Oxygen binding to hemoglobin is perhaps the most famous example. Several other cooperatively binding molecular assemblies have been characterized in great detail, including the enzymes threonine deaminase[32] and aspartate transcarbamylase[64]. Others are ligand-gated ion channels such as nicotinic acetylcholine receptors[95] and inositol triphosphate (IP3) receptors[133]. These complexes have served as model systems illustrating cooperativity as a fundamental protein strategy.
The main focus of this dissertation is to characterize the allosteric binding cooperativity of a protein with heterogeneous binding affinities and to provide molecular insight into it. In this chapter, I give a brief overview of early frameworks for characterizing ligand binding cooperativity and interpreting experimental binding titration curves. I also discuss various alternative interpretations developed to quantify cooperative binding.
2.2 The Hill Equation
A. V. Hill[77] developed the first quantitative description of cooperative binding to a protein with multiple binding sites. His phenomenological equation for the fractional occupancy is

$$Y = \frac{[X]^{n_H}}{K_d + [X]^{n_H}}, \qquad (2.1)$$

where [X] represents ligand concentration, $K_d$ denotes the apparent dissociation constant, and $n_H$ is known as the Hill coefficient. Eq. 2.1 can be rewritten in a form convenient for fitting measured binding data:

$$\log\left(\frac{Y}{1-Y}\right) = n_H \log[X] - \log(K_d). \qquad (2.2)$$
In this formalism, the cooperativity (represented by the value of $n_H$) is assumed to be fixed and independent of the degree of saturation of the protein by ligand. As shown in Fig. 2.2.1, the average saturation is sigmoidal as a function of ligand concentration. The dissociation constant $K_d$ sets the ligand concentration at which the average saturation equals 0.5. With the form of Eq. 2.1, $Y = 0.5$ when $[X]^{n_H} = K_d$. The curves become sharper as $n_H$ increases, so $n_H$ characterizes the degree of cooperativity in ligand binding. Higher values of $n_H$ result in a sharper binding curve and a decrease in the corresponding value of $K_d$. For positive cooperativity, the Hill coefficient is usually bounded by $1 < n_H < n$, where n is the number of binding sites. A Hill coefficient equal to 1 indicates that each site binds independently; that is, binding at one site does not affect binding at another site. When $n_H \simeq n$, the model corresponds to very high cooperativity. In that limit, the only thermodynamically relevant species are the unligated and fully saturated states.

Figure 2.2.1: Schematic representation of the Hill equation (Eq. 2.1) for Hill coefficients $n_H$ = 1, 2, 3, and 4. The curve becomes sharper for higher values of $n_H$. The x-axis represents the logarithm of the ligand concentration, ln[X], and the y-axis shows the fractional occupancy, Y.
The Hill coefficient is generally used as a fitting parameter to match measured binding curves. Physically, $n_H$ can be interpreted as the number of ligand molecules that must bind to a receptor to activate it and produce a functional effect. This measurement provides only an overall macroscopic description of the cooperative interaction between the binding sites. It gives no insight into the microscopic nature of binding. For example, it does not capture site-specific microscopic information such as the coupling between binding sites.
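The following Python sketch illustrates how Eq. 2.2 is used as a fitting device. It generates a noise-free titration curve from Eq. 2.1 with assumed parameters ($n_H = 2.5$, $K_d = 3$, arbitrary units). It then recovers both parameters from a straight-line fit of log(Y/(1 − Y)) against log[X]. This is a minimal illustration under those assumptions, not an analysis of measured data. With real data, noise near Y = 0 and Y = 1 makes the endpoints unreliable, and the fitted $n_H$ is still a macroscopic quantity.

```python
import numpy as np


def hill_occupancy(x, n_h, k_d):
    """Fractional occupancy of Eq. 2.1: Y = [X]^n_H / (K_d + [X]^n_H)."""
    x = np.asarray(x, dtype=float)
    return x**n_h / (k_d + x**n_h)


def fit_hill_plot(x, y):
    """Estimate (n_H, K_d) from a straight-line fit to Eq. 2.2.

    log(Y / (1 - Y)) = n_H log[X] - log(K_d); points with Y = 0 or 1 are dropped.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = (y > 0) & (y < 1)
    slope, intercept = np.polyfit(np.log(x[keep]),
                                  np.log(y[keep] / (1 - y[keep])), 1)
    return slope, np.exp(-intercept)


# Synthetic, noise-free titration with illustrative parameters (not measured data)
x = np.logspace(-2, 2, 50)
y = hill_occupancy(x, n_h=2.5, k_d=3.0)
n_fit, kd_fit = fit_hill_plot(x, y)
print(f"n_H = {n_fit:.3f}, K_d = {kd_fit:.3f}")  # n_H = 2.500, K_d = 3.000
```

In Eq. 2.1, $K_d$ sits next to $[X]^{n_H}$. Half saturation therefore occurs at $[X]_{1/2} = K_d^{1/n_H}$, which coincides with $K_d$ only when $n_H = 1$.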
2.3 Cooperativity for binding two ligands
Cooperative binding for a protein with two binding sites can be conceptualized by the thermodynamic cycle shown in Fig. 2.3.1. To illustrate the influence of cooperativity on the binding curve, I treat the sites as having uniform strength for simplicity. The equilibrium constant K denotes the stability of binding a single ligand. Recruiting a second ligand confers additional stability due to cooperativity, represented by a factor c > 1. The overall stabilization of the fully bound ensemble is therefore $cK^2$. The partition function can be expressed in terms of ligand concentration and explicit coupling between the sites as

$$Z = 1 + 2K[X] + cK^2[X]^2, \qquad (2.3)$$

where [X] is ligand concentration. The cooperativity parameter c represents the effective coupling between the sites. The underlying mechanism, however, need not be specified in this purely phenomenological description.

Figure 2.3.1: Thermodynamic cycle showing a microscopic schematic representation of binding two ligands of heterogeneous strength to sites A and B. The first ligand binds with constant $K_A$ or $K_B$, and the second with $c_{AB}K_B$ or $c_{AB}K_A$. The coupling is cooperative for $c_{AB} > 1$, uncooperative for $c_{AB} = 1$, and anti-cooperative for $c_{AB} < 1$. Starting from the unligated state, ligand binding stabilizes the state with both sites occupied. Binding can proceed via two routes, each passing through a partially bound state in which one site is occupied and the other is empty. The final state in the cycle has both binding sites occupied by ligand.
The equilibrium constant K and the coupling parameter c can be expressed as free energies of stabilization of the ligand-bound ensemble relative to the unbound ensemble of conformations. In this notation, the partition function of Eq. 2.3 can be written as

$$Z = 1 + 2e^{-\beta F}[X] + e^{-\beta(2F + \Delta F)}[X]^2, \qquad (2.4)$$

with $F = -k_BT\log(K)$. The additional free energy of stabilization from explicit coupling between the binding sites is $\Delta F = -k_BT\log(c)$. A binding transition curve (or titration curve) connects the unbound ensemble of conformations at low ligand concentration to the bound ensemble at high ligand concentration. As shown in Fig. 2.3.2, Eq. 2.3 gives a sharper binding curve for coupled sites (c > 1) than for uncoupled sites (c = 1). For uncoupled binding sites, the binding is non-cooperative. As c increases above 1, the binding curve becomes increasingly cooperative. Cooperative binding also suppresses the population of partially ligated states (only one site occupied) in the ensemble of ligand-bound conformations. This suppression stems from enhanced recruitment of a second ligand once one binding site is occupied. Experimentally, however, binding cooperativity is commonly quantified by the sharpness of the binding curve, because intermediate states are generally difficult to characterize.

Figure 2.3.2: Dependence of bound probability on the parameter c (c = 1, 10, 50, 100) for a protein with two binding sites of homogeneous strength. The x-axis represents the logarithm of the ligand concentration, ln[X], from −4 to 4. The y-axis represents the probability of states with both sites occupied by ligand (left) and of states with only one site occupied (right). Non-cooperative binding corresponds to c = 1. For higher values of c, the binding curve becomes increasingly cooperative and the peak population of singly ligated states decreases.
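The suppression of singly ligated states can be checked numerically from Eq. 2.3. The Python sketch below uses K = 1 and the values of c from Fig. 2.3.2; the function name and grid are illustrative choices. It computes the populations of the unligated, singly ligated, and doubly ligated states. With y = K[X], the singly ligated population 2y/(1 + 2y + cy^2) peaks at y = c^{-1/2} with height 1/(1 + √c). The peak therefore drops from 0.5 for c = 1 to 1/11 for c = 100. The code compares this analytic value with the numerical maximum.

```python
import numpy as np


def two_site_populations(log_x, K=1.0, c=1.0):
    """Populations (p0, p1, p2) of the unligated, singly and doubly ligated
    states for two identical sites, from Z = 1 + 2K[X] + cK^2[X]^2 (Eq. 2.3)."""
    x = np.exp(np.asarray(log_x, dtype=float))
    weights = np.stack([np.ones_like(x), 2 * K * x, c * K**2 * x**2])
    return weights / weights.sum(axis=0)


log_x = np.linspace(-8.0, 8.0, 20001)     # ln[X] grid, with K = 1
for c in (1, 10, 50, 100):
    p0, p1, p2 = two_site_populations(log_x, K=1.0, c=c)
    delta_f = -np.log(c)                  # Delta F / k_B T, as in Eq. 2.4
    print(f"c = {c:3d}  dF/kT = {delta_f:6.2f}  "
          f"max p1 = {p1.max():.3f}  analytic = {1 / (1 + np.sqrt(c)):.3f}")
```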
This framework can be generalized to heterogeneous binding sites:

$$Z = 1 + (K_A + K_B)[X] + c_{AB}K_AK_B[X]^2. \qquad (2.5)$$

Here, $K_A$ and $K_B$ are the equilibrium constants for binding a ligand to sites A and B, respectively, and $c_{AB}$ is the coupling interaction between the sites. For proteins with more than two binding sites, the partition function also contains higher-order interaction terms in addition to the two-body term ($c_{AB}$). These multi-body terms represent additional stabilization due to coupling among two or more binding sites. Such a detailed description is straightforward in principle, but it may be very difficult to obtain from available experimental measurements. For example, binding sites are often treated as equivalent in the analysis of experimental data for simplicity, even when they likely differ[116].
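Eq. 2.5 depends on $K_A$ and $K_B$ only through $K_A + K_B$ and $c_{AB}K_AK_B$. A titration curve alone therefore cannot separate site-specific constants from coupling. The Python sketch below maps a pair of unequal, uncoupled sites onto identical sites with $K = (K_A + K_B)/2$ and $c = c_{AB}K_AK_B/K^2$; names and parameter values are illustrative assumptions. It confirms that the two occupancy curves coincide. In this example the equivalent identical-site description has $c < 1$. Treating these heterogeneous sites as equivalent would thus report apparent anti-cooperativity.

```python
import numpy as np


def occupancy_two_sites(x, k_a, k_b, c_ab):
    """Average fractional occupancy for two sites from Eq. 2.5 (equivalently Eq. 2.8)."""
    x = np.asarray(x, dtype=float)
    z = 1 + (k_a + k_b) * x + c_ab * k_a * k_b * x**2
    return ((k_a + k_b) * x + 2 * c_ab * k_a * k_b * x**2) / (2 * z)


def equivalent_identical_sites(k_a, k_b, c_ab):
    """Identical-site (K, c) giving the same partition function as (K_A, K_B, c_AB)."""
    k = (k_a + k_b) / 2
    return k, c_ab * k_a * k_b / k**2


x = np.logspace(-3, 3, 200)
k_a, k_b, c_ab = 10.0, 0.1, 1.0           # illustrative: unequal, uncoupled sites
k_eff, c_eff = equivalent_identical_sites(k_a, k_b, c_ab)
y_het = occupancy_two_sites(x, k_a, k_b, c_ab)
y_hom = occupancy_two_sites(x, k_eff, k_eff, c_eff)
print(f"K = {k_eff:.3f}, apparent c = {c_eff:.4f}, "
      f"max difference in Y = {np.abs(y_het - y_hom).max():.1e}")
```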
A major aim of this dissertation is to characterize the cooperativity of binding Ca2+ ions to the isolated domains of calmodulin (CaM) as well as to intact CaM. In particular, I focus on the heterogeneous nature of the binding loops in calmodulin.
In contrast, much early work on cooperative binding in calmodulin treats the binding sites on an equal footing[116]. Assigning a site-specific binding strength within the available experimental resolution is indeed challenging. Techniques designed to distinguish site-specific binding often alter the stability of the relevant states, making it hard to assess their relevance to the wild-type protein[66]. Recent techniques isolate the binding loops by grafting them onto a scaffold, but the results are complicated to interpret[221]. Because site-specific, microscopic descriptions of ligand binding are hard to obtain, much experimental work relies on an “effective” uniform coupling parameter between binding sites. My simulations do not have such limitations. I can extract binding free energies for individual sites and use them to calculate cooperativity parameters.
2.4 Other models of cooperativity
In 1925, G. S. Adair hypothesized that cooperativity was not a fixed quantity but depended on the degree of ligand saturation. Using hemoglobin as a model system[1], Adair assumed that fully saturated hemoglobin forms in stages, through successive binding of one, two, three, and four oxygen molecules. Apparent macroscopic association constants $K_i$ describe the formation of the intermediate states (one, two, and three oxygen molecules bound), starting from unbound hemoglobin. In this model, the average fractional occupancy for a protein with n ligand binding sites can be expressed as

$$Y_n = \frac{1}{n}\,\frac{K_I[X] + 2K_{II}[X]^2 + \cdots + nK_n[X]^n}{1 + K_I[X] + K_{II}[X]^2 + \cdots + K_n[X]^n}. \qquad (2.6)$$

Here, n is the number of ligand binding sites, and the macroscopic association constant $K_i$ describes the binding of i ligand molecules.
For hemoglobin, with 4 oxygen binding sites, n = 4. Hence, Eq. 2.6 takes the form

$$Y_4 = \frac{1}{4}\,\frac{K_I[X] + 2K_{II}[X]^2 + 3K_{III}[X]^3 + 4K_{IV}[X]^4}{1 + K_I[X] + K_{II}[X]^2 + K_{III}[X]^3 + K_{IV}[X]^4}, \qquad (2.7)$$

which gives the average occupancy of the binding sites in terms of macroscopic (site-agnostic) binding constants.
For the thermodynamic cycle shown in Fig. 2.3.1, the average fractional occupancy for heterogeneous sites can be expressed as

$$Y_{II} = \frac{1}{2}\,\frac{K_1[X] + 2K_2[X]^2}{1 + K_1[X] + K_2[X]^2}. \qquad (2.8)$$

Here, the macroscopic equilibrium constants $K_1$ and $K_2$ represent ligand binding to a single site and to both sites, respectively. The macroscopic binding constants are often expressed in terms of the microscopic equilibrium constants for each site, $K_A$ and $K_B$. Comparison with Eq. 2.5 gives $K_1 = K_A + K_B$ and $K_2 = c_{AB}K_AK_B$[210, 156, 174, 184, 206, 144]. The stabilizing free energy for binding ligands to both sites is $\Delta G_2 = -RT\ln K_2$, which includes the cooperativity between the binding sites.
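The Adair occupancy of Eq. 2.6 is straightforward to evaluate for any number of sites. The Python sketch below implements Eq. 2.6 for a list of macroscopic constants $[K_I, K_{II}, \ldots, K_n]$; the function name and parameter values are illustrative assumptions. For two sites, it builds $K_1 = K_A + K_B$ and $K_2 = c_{AB}K_AK_B$ from assumed microscopic constants, so the result coincides with Eq. 2.8. It also reports $\Delta G_2/RT = -\ln K_2$.

```python
import numpy as np


def adair_occupancy(x, macro_constants):
    """Average fractional occupancy of Eq. 2.6.

    macro_constants = [K_I, K_II, ..., K_n] are macroscopic (site-agnostic)
    association constants for the states with 1, 2, ..., n ligands bound.
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    constants = np.asarray(macro_constants, dtype=float)
    n = constants.size
    i = np.arange(1, n + 1)[:, None]              # number of bound ligands
    terms = constants[:, None] * x[None, :] ** i  # K_i [X]^i, shape (n, len(x))
    return (i * terms).sum(axis=0) / (n * (1 + terms.sum(axis=0)))


# Two sites: map assumed microscopic constants onto K_1 and K_2 (Eq. 2.8)
k_a, k_b, c_ab = 2.0, 0.5, 4.0                    # illustrative values only
K1, K2 = k_a + k_b, c_ab * k_a * k_b
x = np.logspace(-2, 2, 9)
y2 = adair_occupancy(x, [K1, K2])
print(np.round(y2, 3))
print(f"Delta G_2 / RT = {-np.log(K2):.3f}")
```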
Several other models have been developed to describe cooperative binding to macromolecules. Linus Pauling reinterpreted the Adair equation by assuming that each binding constant in Eq. 2.7 combines two contributions: binding to a site, and a cooperative free energy associated with binding multiple ligands[154]. Daniel Koshland and coworkers later refined this idea. The Koshland-Némethy-Filmer (KNF) model assumes that, in a protein with multiple subunits, each subunit can exist in either an “active” or an “inactive” conformation[101]. Ligand binding to a subunit in the inactive state induces a conformational change of that subunit to the active conformation. This so-called induced fit model was one of the first to rationalize cooperativity through structural change in the protein.
Models that attempt to provide a comprehensive description of cooperative bind- ing in macromolecules often fall short of a molecular level description. A typical macroscopic binding titration curve obtained from experiment does not provide this information directly. Rather, microscopic binding information in some sense is “hid- den” within a typical binding titration curve. In my work, I aim to shed light on this microscopic information.
Another key aspect of cooperative ligand binding is distinguishing the binding properties of individual sites. The framework developed by Adair and related models describe cooperativity as an effective coupling between the sites. They do not distinguish differences in the binding strengths of the sites, because this information is not easily available from experiment. Furthermore, cooperativity in these models is phenomenological and does not appeal to a specific structural mechanism. In this dissertation, I develop a molecular description of ligand binding for a protein with multiple sites of heterogeneous binding strength. To achieve this, I work with an explicit molecular model in which ligand binding is coupled with conformational change. The Monod-Wyman-Changeux (MWC) model of allostery provides one such framework for studying ligand binding in allosteric proteins.
2.5 The MWC Model
The classic Monod-Wyman-Changeux (MWC) model of allostery explains cooperative binding through concerted allosteric transitions[136]. The model was originally developed for oligomeric proteins composed of symmetric, identical subunits, each containing one ligand binding site. It assumes a thermal equilibrium between two interconvertible conformations of an allosteric protein: a “tense” state T and a “relaxed” state R. In the absence of ligand, the tense state is more stable than the relaxed state, but it binds the ligand with lower affinity, as shown in Fig. 2.5.1. The equilibrium population therefore shifts from T to R upon ligand binding. A key assumption of the MWC model is that all subunits of a protein make the transition simultaneously. This is often referred to as “strong coupling”.
Since its introduction over 50 years ago, the MWC model has been extended and generalized in several directions. These include proteins that exist in more than two states[53] and conformational transitions coupled to the binding of several types of ligands[131, 140]. They also include proteins with heterogeneous binding sites[187].

Figure 2.5.1: Schematic representation of the shift in population in the MWC model (curves: Unbound, Bound; basins: Tense, Relaxed; x-axis: Reaction Coordinate; y-axis: Free Energy). In the unbound ensemble, the ligand-free closed state is the more stable basin of the free energy landscape, in dynamic equilibrium with the open state. Upon ligand binding, the ligand-bound open state is stabilized. The x-axis represents the reaction coordinate used to define the closed and open states of the protein, and the y-axis represents free energy (in arbitrary units). Representative protein structures in the unbound and ligand-bound ensembles of conformations are also shown.
The MWC picture shown in Fig. 2.5.1 is a simplified representation of the potentially broad conformational ensemble generally available to a dynamic protein. In general, the landscape may be more complex than can be represented with two native basins. Nonetheless, this picture of a shift in population between the basins upon ligand binding is a good starting point for developing a model of allosteric cooperativity. In this framework, the cooperative coupling of binding sites arises from a concerted conformational change.
In this dissertation, I develop a molecular dynamics simulation of CaM that is essentially a molecular realization of the MWC model. For CaM, Nuclear Magnetic Resonance experiments[13, 55, 54], as well as all-atom molecular dynamics simulation[207], suggest a dynamic equilibrium between the open and closed conformations in the absence of Ca2+, which supports this approach. In my research, I explore the molecular origins of allosteric cooperativity within the confines of an explicit molecular model in which ligand binding is coupled with protein conformational change.
In the model described in the next chapter, the conformational dynamics between the two basins is coupled to Ca2+ binding in calmodulin. The analysis of our simulations of two Ca2+ ions binding to a domain of CaM follows the statistical weights shown in Fig. 2.5.2. In this figure, the ligand concentration is controlled by its chemical potential, $\mu \propto k_BT \log[X]$. Thermodynamic properties can then be calculated directly from the partition function for two ligands. The free energies for binding to the open and closed states are extracted from simulated titration curves for binding to the individual sites.
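Summing the eight statistical weights of Fig. 2.5.2 gives the two-ligand partition function. The worked form below assumes the notation of that figure ($\epsilon$ for the free energy of the unligated open state relative to the closed state, and $\epsilon_c^{I,II}$, $\epsilon_o^{I,II}$ for the binding free energies of sites I and II in the closed and open states); within each basin the weights factorize into independent-site terms:

$$
Z(\mu) = \left(1 + e^{-\beta(\epsilon_c^{I}-\mu)}\right)\left(1 + e^{-\beta(\epsilon_c^{II}-\mu)}\right) + e^{-\beta\epsilon}\left(1 + e^{-\beta(\epsilon_o^{I}-\mu)}\right)\left(1 + e^{-\beta(\epsilon_o^{II}-\mu)}\right),
\qquad
\langle n \rangle = \frac{\partial \ln Z}{\partial(\beta\mu)} .
$$

Because $Z$ is a sum of two products rather than a single product, binding to site I changes the relative weight of the open basin and hence the effective affinity of site II; this is the origin of cooperativity in the MWC picture. If one basin dominates (for example $\beta\epsilon \to \infty$), $Z$ factorizes and the two sites bind independently.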
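Figure 2.5.2 (schematic free energy versus reaction coordinate with closed and open basins): States and corresponding statistical weights for a simplified description of the MWC model of allostery for a protein with two ligand binding sites. The statistical weights are

$$
\begin{array}{lcc}
\text{ligation state} & \text{closed} & \text{open} \\
\text{unligated} & 1 & e^{-\beta\epsilon} \\
\text{site I bound} & e^{-\beta(\epsilon_c^{I}-\mu)} & e^{-\beta\epsilon}\,e^{-\beta(\epsilon_o^{I}-\mu)} \\
\text{site II bound} & e^{-\beta(\epsilon_c^{II}-\mu)} & e^{-\beta\epsilon}\,e^{-\beta(\epsilon_o^{II}-\mu)} \\
\text{both bound} & e^{-\beta(\epsilon_c^{I}+\epsilon_c^{II}-2\mu)} & e^{-\beta\epsilon}\,e^{-\beta(\epsilon_o^{I}+\epsilon_o^{II}-2\mu)}
\end{array}
$$

In the unligated ensemble of conformations, the relative stability of the open and closed states is set by the parameter $\epsilon$. The singly-ligated ensemble consists of a ligand bound to either the closed state or the open state. The fully-loaded ensemble consists of both ligands bound to either the closed state or the open state of the protein.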
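The same weights can be evaluated numerically to produce a titration curve. The sketch below is a minimal illustration, assuming $k_BT = 1$ by default and hypothetical parameter names (`eps`, `eps_c`, `eps_o`, `mu`) that mirror the symbols in Fig. 2.5.2; it is not the analysis code used for the simulations in this dissertation.

```python
import math

def mwc_two_site(mu, eps, eps_c, eps_o, kT=1.0):
    """Average occupancy and open-state probability for the two-site MWC scheme.

    mu    : ligand chemical potential
    eps   : free energy of the unligated open state relative to the closed state
    eps_c : (eps_c_I, eps_c_II) binding free energies of sites I, II when closed
    eps_o : (eps_o_I, eps_o_II) binding free energies of sites I, II when open
    """
    beta = 1.0 / kT
    states = []  # entries: (statistical weight, number of bound ligands, is_open)
    for is_open, (eI, eII) in ((False, eps_c), (True, eps_o)):
        base = math.exp(-beta * eps) if is_open else 1.0
        states.append((base, 0, is_open))
        states.append((base * math.exp(-beta * (eI - mu)), 1, is_open))
        states.append((base * math.exp(-beta * (eII - mu)), 1, is_open))
        states.append((base * math.exp(-beta * (eI + eII - 2 * mu)), 2, is_open))
    Z = sum(w for w, _, _ in states)
    n_avg = sum(w * n for w, n, _ in states) / Z
    p_open = sum(w for w, _, is_open in states if is_open) / Z
    return n_avg, p_open
```

Scanning `mu` and recording `n_avg` gives a titration curve, while `p_open` tracks the population shift between the two basins that accompanies binding.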
2.6 Generalized ensemble view of allostery
In the sequential (KNF) model of allostery, ligand binds weakly to the T state, followed by a conformational transition to the R state. In this picture, the R state is absent from the unbound ensemble. In the MWC model, this state is included in the unbound ensemble. These models can be placed in a general framework with more states, as emphasized by Hilser and coworkers[81]. For example, the MWC model asserts that both sites adopt the same conformation (either both closed or both open) when the protein undergoes a global structural change. This neglects states in which the binding sites adopt different conformations (one site closed and the other open). Such binding sites would be “weakly coupled” and hence are expected to have lower cooperativity. Still, additional states allow more diverse strategies for tuning ligand binding affinity.
Both the KNF and MWC models have been applied extensively to investigate allostery. In spite of their success in quantitatively describing binding titration curves, the KNF and MWC models are special cases of a broader framework[81]: they are subsets of an ensemble of microstates that includes all combinations of open and closed conformations for each site (Fig. 2.6.1). This expanded picture still simplifies the conformational states involved in allostery. In fact, there is evidence that allostery occurs even without a conformational change[158, 46, 157, 217]. Allostery can also involve negative cooperativity between binding pockets[165], which is difficult to explain through structural change alone. Surface mutations, for example, can give rise to allosteric interactions without reorganization of structure[171, 172]. Nevertheless, this generalized picture can describe allostery in some systems that are difficult to capture with the KNF and MWC models. As emphasized by Hilser and coworkers[80], allosteric properties of intrinsically disordered proteins (IDPs)[165], such as negative cooperativity, require the inclusion of weakly coupled subunits.

Figure 2.6.1: Schematic of the ensemble description of models of allosteric cooperativity for a protein with two subunits. (A) The MWC model of allostery, (B) the KNF model, (C) a more generalized ensemble allosteric model that accommodates all possible microstates of a protein with two binding sites. Green shaded regions correspond to subunit interaction energy. The two different shapes correspond to the closed and open conformations of the protein. Colored shapes correspond to ligand-unbound and uncolored shapes to ligand-bound conformations. Blue shaded regions show the ensemble of states for each framework. Adapted from the framework developed by Hilser and co-workers[81].
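To make the relationship in Fig. 2.6.1 concrete, the sketch below enumerates the microstates of a two-site protein in which each site can be closed or open and unligated or ligated, and then picks out the MWC and KNF subsets. It is a hypothetical bookkeeping illustration that assumes the simplest reading of each model: MWC keeps only states in which both sites share a conformation, and KNF (strict induced fit) keeps only states in which a site is open exactly when it is ligated. It does not include the subunit interaction energies of the ensemble allosteric model of Hilser and coworkers.

```python
from itertools import product

CONFORMATIONS = ("closed", "open")
LIGATION = (0, 1)  # 0 = unligated, 1 = ligated

def microstates():
    """All (conformation, ligation) combinations for a protein with two sites."""
    per_site = list(product(CONFORMATIONS, LIGATION))  # 4 states per site
    return list(product(per_site, repeat=2))           # 16 states for two sites

def mwc_subset(states):
    # Strong coupling: both sites share the same conformation.
    return [s for s in states if s[0][0] == s[1][0]]

def knf_subset(states):
    # Strict induced fit: a site is open if and only if it is ligated.
    return [s for s in states if all((conf == "open") == bool(lig) for conf, lig in s)]
```

The generalized ensemble (16 states) contains both subsets; the MWC subset has 8 states, the KNF subset has 4, and they share only the fully closed, unligated state and the fully open, ligated state.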
2.7 Allostery in calmodulin
In spite of the existence of allosteric systems where structural information alone does not provide mechanistic insight, the MWC model of allostery has been applied successfully to numerous systems[51, 52, 33, 24]. For CaM, activation upon binding Ca2+ ions results in a large structural rearrangement of contacts. The structures of both the Ca2+-free “closed” state and the Ca2+-bound “open” state of CaM are well characterized. Furthermore, the nature of the conformational change requires that the sites within a domain be strongly coupled. Hence, it seems plausible that the MWC model of allostery can be applied faithfully to gain mechanistic insight into allostery in CaM at the molecular level.
Experimentally, the nature of cooperativity for Ca2+ binding to the isolated N-terminal and C-terminal domains, as well as to intact CaM, has not yet been fully resolved. Available strategies to measure the extent of cooperative interactions between the binding loops, and between the two domains of CaM, are often sensitive to mutational effects on the stability of the domains. Typically, experiments can detect the overall population of ligand bound to the protein but are unable to detect the order in which the loops are occupied. Consequently, experiments provide very limited information about how Ca2+ binding to the individual loops influences binding to the others to produce the overall macroscopic binding curve. Available reports suggest that the coupling between the N-terminal and C-terminal domains in intact CaM is perhaps weak, or even negative[184]. For intact CaM, some studies suggest that the Ca2+-induced stabilization of residues in the interdomain linker region mediates the coupling between the two domains of CaM[175].
Capturing all these details in a single model is challenging. As mentioned earlier, a mechanism in which one site is in the closed conformation while the other is in the open conformation is probably unphysical for the open/closed transition of the domains. On the other hand, relaxing the coupling mediated by the interdomain linker between the two domains may be reasonable. In the absence of clear guiding principles to characterize allostery between the binding sites of CaM, I adopt a “strong” coupling approximation between the binding sites within a domain, as well as between the two domains of CaM. In fact, a recent work has reanalyzed available experimental data within this strong coupling approximation between the domains[186].
Chapter 3
Coarse-grained models for conformational dynamics in proteins
3.1 Introduction
Many computational models aim for a comprehensive description of conformational dynamics with atomistic detail. Nevertheless, for some applications, such as protein folding, a lower-resolution description that can capture long timescales is a practical alternative. These models can predict key features of the transition mechanisms of protein folding and other large-scale conformational transitions, complementing higher-resolution models and helping to interpret experimental results. Both theoretical and computational methods have been developed to understand protein transition mechanisms.
Coarse-grained models, introduced in the pioneering work of Levitt and Warshel[111, 110] shortly after the first all-atom protein simulations[213, 123], offer a powerful approach to identify the key organizing principles underlying the complex interactions in proteins. Coarse-grained models represent each amino acid with one or more sites whose properties are defined by the amino acid type. For example, one site can be associated with the α-carbon of the amino acid; the protein backbone can later be reconstructed if necessary[129, 167]. Interactions can be parameterized by intrinsic properties such as hydrophobicity and charge[84, 22]. Despite these simplifications, such models have proven highly successful in elucidating the principles of protein folding and associated interactions[160, 139, 61, 7, 74, 57, 150, 215, 6, 225, 79, 89, 49]. While they fall short of the detail provided by ab initio all-atom models, coarse-grained models can simulate the relevant lengthscales and timescales, which makes it possible to compare simulations directly with experiments and to validate and improve simulation models.
To give context for the approach developed in this dissertation, in this chapter I introduce several ways in which coarse-grained models have been used to understand protein dynamics. Theoretical and computational approaches have different strengths and weaknesses. Simulations of protein dynamics can sample a broader region of phase space and allow a more faithful representation of ensemble-averaged properties. However, the effect of certain physical properties at the atomic level may be difficult to capture in simulations. Analytic models may be able to incorporate some physical details directly and provide insight into conformational mechanisms, but they generally require severe approximations to be tractable. All-atom models, on the other hand, often provide a detailed description but at a higher computational cost. My experience in this work suggests that employing simulation models at a less detailed level allows one to incorporate important experimental observations directly in a tractable model, making them an attractive approach to gain insight into the mechanism of allosteric cooperativity.
3.2 Native structure-based models for proteins
Many coarse-grained models make use of energy functions defined by the interactions that occur in the protein’s native structure, the so-called “native contacts”. These models are referred to by different names, such as structure-based, native-centric, or Gō models[192, 67]. In these models, interactions that are present in the native structure are stabilizing, while other, “non-native” contacts are destabilizing. This strong assumption corresponds to a smooth energy landscape dominated by the driving force described by the principle of minimum frustration[25, 151]. It is perhaps no surprise that initial support for this model came from work to understand the fastest-folding proteins, for which landscape ruggedness from non-native contacts likely plays a relatively minor role in the folding kinetics.
Network models are among the simplest structure-based models[195, 82, 7]. In this framework, the ensemble of conformations within the native free energy basin is approximated by harmonic fluctuations about the native structure. Interactions between residues are included as harmonic springs between α-carbons. Network models have been highly successful in characterizing fluctuations obtained from measured temperature factors of X-ray structures as well as from Molecular Dynamics (MD) simulations[3, 106, 166].
The potential in the Elastic Network Model can be expressed as[195]
$$
U_{\rm ENM}(R\,|\,R_0) = \frac{1}{2}\sum_{I<J} k_{IJ}\,\Delta_{IJ}(R_0)\,\left|R_{IJ} - R_{IJ;0}\right|^2, \tag{3.1}
$$

where $R_{IJ}$ and $R_{IJ;0}$ are the distances between coarse-grained sites $I$ and $J$ in configuration $R$ and in the folded configuration $R_0$, respectively. The term $k_{IJ}$ represents the spring constant, and $\Delta_{IJ}(R_0)$ denotes the matrix of native contacts, which equals 1 if $R_{IJ;0} < R_c$ (and 0 otherwise), where $R_c$ is a specified cutoff. Often, one is interested in characterizing the fluctuations $\langle \mathbf{R}_I \cdot \mathbf{R}_J \rangle$ and the low-frequency collective motions within the native basin. Network models have been extensively applied to a broad variety of questions, including protein conformational dynamics[97, 134, 125, 224, 40, 225, 183, 118, 83], refining experimentally determined structures[194, 69], and modeling protein docking[170, 130].

Simulations of native-centric or Gō models are another coarse-grained approach used to study biological macromolecules. Similar to network models, contacts stabilizing the known reference folded structure define the attractive interactions in the model. Although this approach can be applied with an all-atom representation, the simplest native-centric models represent each residue by its α-carbon atom. Non-bonded interactions between residues in proximity in the native structure typically have a Lennard-Jones-type potential whose minimum lies at the separation distance in the reference folded structure. The non-bonded interactions between all other pairs, i.e., non-native interactions, are modeled by a simple repulsive potential that captures excluded volume; omitting non-native attractions minimizes frustration. Bond angles and dihedral angles are also parameterized by the angles in the native structure. A primary motivation behind the development of these models is the energy landscape idea that many proteins in nature are “minimally frustrated”. Non-native interactions generally give rise to energetic frustration that slows the kinetics of conformational transitions.
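As an illustration of Eq. 3.1, the sketch below evaluates the elastic network energy for C-α coordinates stored as NumPy arrays. It assumes a uniform spring constant $k_{IJ} = k$ and a default cutoff of 10 (in the units of the coordinates); both values are placeholders, not parameters used in this work, and the function names are hypothetical.

```python
import numpy as np

def pair_distances(X):
    """Matrix of Euclidean distances between all rows of an (N, 3) array."""
    diff = X[:, None, :] - X[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def enm_energy(X, X0, cutoff=10.0, k=1.0):
    """Elastic network energy of configuration X relative to reference X0 (Eq. 3.1)."""
    d0 = pair_distances(X0)
    contacts = (d0 < cutoff).astype(float)  # Delta_IJ(R0): native contact map
    np.fill_diagonal(contacts, 0.0)         # exclude self-pairs
    d = pair_distances(X)
    # Summing over ordered pairs counts each I<J pair twice, hence 0.25 = 0.5 * 0.5.
    return 0.25 * k * np.sum(contacts * (d - d0) ** 2)
```

Because the energy depends only on pair distances, rigid translations and rotations of the native structure cost no energy, which is why the low-frequency normal modes of this potential describe collective internal motions.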
By ignoring energetic frustration, native-centric models provide an idealized folding funnel[182, 50]. Because of their simplicity and low computational cost, native-centric models have enjoyed wide popularity, with wide-ranging applications. More refined models include native-centric models with atomistic interaction details[176, 41], and “flavored” Cα models, in which the interaction strength depends on the chemical nature of the residues involved in the contact[93]. Other native-centric models have been introduced that incorporate desolvation barriers[38], non-native interactions[43, 223], many-body cooperative interactions[8], and electrostatic interactions[112]. Conformational transitions between multiple conformations[228, 17, 150, 87, 100], as well as validation of experimentally obtained free energies of mutation[128], have also been studied with native-centric models. Other applications include modeling the effects of confinement and crowding on folding[191, 37], studies involving nucleic acids[86], and investigations of coupled folding and binding of intrinsically disordered proteins[113, 212].

Large-scale conformational transitions in proteins often involve much longer timescales than can be reached in typical simulations. Due in part to this limitation, theoretical approaches often rely on simplified, coarse-grained models that incorporate energy basins defined by two distinct conformations: a “closed” conformation, which is stable in the absence of a ligand, and an “open” conformation, which is stable when ligated. The Elastic Network and related models describe an allosteric transition as motion along low-frequency normal modes of the closed-state conformational basin[9, 7, 193, 10]. In this approach, the protein dynamics describing the conformational change depends entirely on conformations about a single free energy minimum.
While this provides a natural rationale for protein conformational change as collective low-energy motions[219, 11], a minimal model for the transition mechanism must accommodate both the open and closed states. Allosteric transitions have been modeled by coupling two meta-stable basins through an interpolation function based on their energies[17, 150, 36, 149, 119, 220]. For example, minimal energy pathways have been computed for a potential energy surface based on the strain energies relative to each minimum conformation[125, 40, 49].

3.3 An analytic model for CaM

The work on conformational transitions in CaM in this dissertation is motivated by results from an analytic variational model of allostery[198, 197]. This model describes conformational changes in terms of the evolution of each residue’s local flexibility[199, 200, 197]. An alternative approach, provided by Itoh and Sasai, employs a model in which the contacts of the two meta-stable structures are treated on equal footing rather than through an interpolated energy function[89, 90]. In this section, I briefly describe the variational model developed to study folding and allosteric transitions in proteins.

In this analytic coarse-grained model, partially folded ensembles of structures are characterized by their Gaussian fluctuations. A protein conformation is represented by the $N$ position vectors of the C-α atoms of the polypeptide backbone, $\{\mathbf{r}_i\}$. In this representation, partially ordered ensembles of protein configurations are described by a reference Hamiltonian

$$
\beta H_0 = \beta H_{\rm chain} + \frac{3}{2a^2}\sum_i C_i\left[\mathbf{r}_i - \mathbf{r}_i^{N}\right]^2, \tag{3.2}
$$

where $\beta = 1/k_BT$ and $a$ is the mean bond length of a freely rotating chain. The first term in Eq. 3.2 represents the protein backbone as a uniformly stiff homopolymer. The second term is an external field in which the $N$ variational parameters, $\{C_i\}$, control the magnitude of the fluctuations about the native C-α positions of the polypeptide, $\mathbf{r}_i^{N}$.
The free energy landscape of partially folded ensembles of structures is explored via the reference Hamiltonian. The Hamiltonian that determines the population of partially folded ensembles is

$$
H = H_{\rm chain} + \sum_{ij} \epsilon_{ij}\, u(\mathbf{r}_i - \mathbf{r}_j). \tag{3.3}
$$

Here, $u(\mathbf{r}_i - \mathbf{r}_j)$ is a pair potential and $\epsilon_{ij}$ is the strength of the contact between residues $i$ and $j$. The variational free energy for a partially folded ensemble specified by the constraints $\{C\}$ at temperature $T$ is given by

$$
F[\{C\}] = -k_BT \log Z_0 + \langle H - H_0 \rangle_0. \tag{3.4}
$$

Here, $Z_0$ is the partition function corresponding to the reference Hamiltonian $H_0$, and $\langle H - H_0 \rangle_0$ is the average computed with respect to the reference Hamiltonian. The variational free energy given in Eq. 3.4 can be expressed as

$$
F[\{C\}] = E[\{C\}] - T S[\{C\}], \tag{3.5}
$$

where $E[\{C\}]$ is the energy arising from the formation of native contacts, and $S[\{C\}]$ is the entropy loss of the residues.

This framework can be adapted to study conformational transitions as well. Here, the energy function couples the two metastable basins. The reference Hamiltonian becomes

$$
\beta H_0 = \beta H_{\rm chain} + \frac{3}{2a^2}\sum_i C_i\left[\mathbf{r}_i - \mathbf{r}_i^{N}(\alpha_i)\right]^2. \tag{3.6}
$$

In this case, the variational constraints tune the magnitude of the fluctuations of each residue about an interpolated native conformation of the protein,

$$
\mathbf{r}_i^{N}(\alpha_i) = \alpha_i\,\mathbf{r}_i^{NI} + (1 - \alpha_i)\,\mathbf{r}_i^{NA}, \tag{3.7}
$$

where $\{\alpha_i\}$ interpolates between the inactive (I; $\alpha_i = 1$) and active (A; $\alpha_i = 0$) states of the protein. The energies computed with respect to the two basins, $E_I[\{C\},\{\alpha\}]$ and $E_A[\{C\},\{\alpha\}]$, are coupled through the interpolation function[197]

$$
E[\{C\},\{\alpha\}] = \frac{E_I[\{C\},\{\alpha\}] + E_A[\{C\},\{\alpha\}] + E_0}{2} - \sqrt{\left(\frac{E_I[\{C\},\{\alpha\}] - E_A[\{C\},\{\alpha\}] - E_0}{2}\right)^2 + \Delta^2}. \tag{3.8}
$$

Here, the coupling function is based on a conformation’s total energy.
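The interpolation in Eq. 3.8 (and the analogous Eq. 3.14 used in the simulation model) acts as a smoothed minimum of the two basin energies. One way to see this, shown in the sketch below, is that it equals the lower eigenvalue of a 2×2 matrix with the two basin energies on the diagonal and $\Delta$ off the diagonal. The function and argument names are hypothetical.

```python
import math

def coupled_energy(E_I, E_A, E0=0.0, delta=1.0):
    """Two-basin coupling of the form of Eq. 3.8.

    Equal to the lower eigenvalue of [[E_I, delta], [delta, E_A + E0]]:
    it approaches min(E_I, E_A + E0) as delta -> 0 and lies below it otherwise.
    """
    mean = 0.5 * (E_I + E_A + E0)
    half_gap = 0.5 * (E_I - E_A - E0)
    return mean - math.sqrt(half_gap ** 2 + delta ** 2)
```

Far from the crossing point ($|E_I - E_A - E_0| \gg \Delta$) the coupled energy follows whichever basin is lower, while near the crossing $\Delta$ lowers the energy and hence controls the barrier between the basins; $E_0$ shifts their relative stability.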
An alternative coupling based on individual contacts has also been explored, with the coupling function[198]

$$
(\epsilon_{ij} + \epsilon_0)\,u_{ij} = -k_BT \ln\left[1 + e^{-(\epsilon_{ij}+\epsilon_0)u_{ij}/k_BT}\right]. \tag{3.9}
$$

Coupling functions similar to Eq. 3.8 for two single-basin energies have been used in other coarse-grained models to study protein conformational change[17, 125, 150, 215, 220].

The variational model has been applied to investigate allosteric transition mechanisms in the isolated N-terminal (nCaM) and C-terminal (cCaM) domains of CaM. Notably, the two domains are predicted to have distinct transition mechanisms. In particular, the domain opening mechanism of cCaM involves local partial unfolding and refolding, while nCaM remains folded throughout the transition[200, 198]. This is reminiscent of the cracking mechanism introduced in Refs. [134, 135, 198], whereby local unfolding relieves regions of local strain, thereby reducing an otherwise high free energy barrier. Recent all-atom simulations predict cracking in the linker region of cCaM[180]. For the domains of CaM, Tripathi and Portman suggest that whether a region of a protein undergoes cracking during its conformational transition is not determined solely by the magnitude of the strain that develops during the transition (Fig. 3.3.1), but also depends on the details of the contact maps of the two states. One motivation for the work in this dissertation is to investigate the robustness of the prediction of distinct domain opening mechanisms for nCaM and cCaM.

3.4 Simulation model for CaM

Although my primary goal is to study the allosteric cooperativity of CaM, I first focus on simulating the open/closed conformational transitions of nCaM and cCaM in the absence of Ca2+. This model couples two energy basins, one biased to the open (holo) conformation and the other to the closed (apo) reference conformation of CaM[150]. In the following sections, I briefly describe the general framework of the simulation models used in these studies.
Figure 3.3.1 (panels (a) nCaM, helices A to D with loops I and II, and (b) cCaM, helices E to H with loops III and IV: interpolation parameter α from 1 (apo) to 0 (holo) versus residue index, shaded by strain energy in arbitrary units; panels (c) and (d): strain energy in arbitrary units versus residue index at α = 0.4): Distribution of the strain energy of residues in the CaM domains. (a) and (b) Change in strain energy of individual residues along the apo → holo structural change of nCaM and cCaM. (c) and (d) Residue strain energy distributions at an intermediate state for nCaM and cCaM, respectively. Adapted from the work by Tripathi and Portman[198].

3.4.1 Model for conformational transition

A conformation in this coarse-grained model[150] is specified by the $N$ position vectors of the C-α atoms of the protein backbone, $R = \{\mathbf{r}_1, \ldots, \mathbf{r}_N\}$. For an energy basin biased to the reference conformation $R_0$, the energy of a configuration $R$ can be written as

$$
V_0(R) = V_{\rm local}(R\,|\,R_0) + V_{n}(R\,|\,R_0) + V_{nn}(R\,|\,R_0). \tag{3.10}
$$

The first term in Eq. 3.10 defines the coarse-grained backbone,

$$
\begin{aligned}
V_{\rm local}(R\,|\,R_0) ={}& \sum_{\rm bonds} K_b\,(b_i - b_i^0)^2 + \sum_{\rm angles} K_\theta\,(\theta_i - \theta_i^0)^2 \\
&+ \sum_{\rm dihedrals}\Big\{K_\phi\left[1 - \cos(\phi_i - \phi_i^0)\right] + K_\phi^{(3)}\left[1 - \cos 3(\phi_i - \phi_i^0)\right]\Big\},
\end{aligned}
\tag{3.11}
$$

where $b_i$, $\theta_i$, and $\phi_i$ denote bond lengths, bond angles, and dihedral angles, respectively. The corresponding values in the native structure are denoted with a superscript 0: $b_i^0$, $\theta_i^0$, and $\phi_i^0$. The non-bonded interactions between residues that neighbor each other in the native structure (native contacts) have a short-ranged attraction,

$$
V_{n}(R\,|\,R_0) = \sum_{i<j}^{\rm native} \epsilon_{\rm go}\left[5\left(\frac{r_{ij}^0}{r_{ij}}\right)^{12} - 6\left(\frac{r_{ij}^0}{r_{ij}}\right)^{10}\right], \tag{3.12}
$$

while all other (non-native) pairs interact through a repulsive potential,

$$
V_{nn}(R\,|\,R_0) = \sum_{i<j}^{\rm non\text{-}native} \epsilon_{\rm rep}\left(\frac{d}{r_{ij}}\right)^{12}. \tag{3.13}
$$

Here, $r_{ij}$ is the distance between C-α atoms $i$ and $j$ in a conformation $R$, $r_{ij}^0$ is the corresponding separation distance in the reference structure $R_0$, and $d$ sets the excluded-volume distance.

To study conformational changes between two meta-stable states, the energies of the corresponding native basins, $V_1(R)$ and $V_2(R)$, are coupled through an interpolation function[96]

$$
V(R) = \frac{V_1 + V_2 + \Delta V}{2} - \sqrt{\left(\frac{V_1 - V_2 - \Delta V}{2}\right)^2 + \Delta^2}. \tag{3.14}
$$

Here, the interpolation parameters $\Delta$ and $\Delta V$ control the barrier height and the relative stability of the two basins. The single-basin energies $V_1(R)$ and $V_2(R)$ are computed from Eq. 3.10, with modifications to some of the reference parameters in the potential in order to minimize conflicts between the two contact maps. This model was developed by Takada and co-workers and is available in the simulation package CafeMol[96]. The source code is also available, allowing us to modify the model for our studies.

3.4.2 Model for Ca2+-binding

To study the thermodynamics of calcium binding to the two EF-hand loops of each domain of CaM, the binding event is modeled implicitly by adding a potential defined from the ligand-mediated contacts in the EF-hand loops of the open (holo) conformation,

$$
V_{\rm bind} = -c_{\rm lig}\,\epsilon_{\rm go}\sum_{i,j}\exp\left[-\frac{(r_{ij} - r_{ij}^0)^2}{2\sigma_{ij}^2}\right]. \tag{3.15}
$$

Here, the sum is over residue pairs within a cutoff distance of a Ca2+ ion in the holo reference conformation. For simplicity, I take the binding energy parameters in Eq. 3.15 to be the same for each contact formed in the presence of Ca2+.

I work within the grand canonical ensemble to simulate Ca2+ binding/unbinding events coupled to the conformational change of the protein. The ligation state of each binding loop (four in total) is determined stochastically through Monte Carlo steps attempted at a fixed rate during the protein’s conformational transitions. The overall transition probabilities for binding and unbinding, written as products of the attempt probabilities ($\alpha_{0\to1}$, $\alpha_{1\to0}$) and the acceptance probabilities ($P_{0\to1}$, $P_{1\to0}$), satisfy the detailed balance condition

$$
p_0\,\alpha_{0\to1}P_{0\to1} = p_1\,\alpha_{1\to0}P_{1\to0}. \tag{3.16}
$$
Here, $p_0$ and $p_1$ denote the equilibrium probabilities for the protein to be in the unbound and bound states, respectively. In the Monte Carlo scheme, the acceptance probabilities are determined from the detailed balance condition

$$
\frac{P_{0\to1}}{P_{1\to0}} = \frac{\alpha_{1\to0}}{\alpha_{0\to1}}\,\frac{p_1}{p_0}. \tag{3.17}
$$

Here, $P_{0\to1}$ and $P_{1\to0}$ denote the acceptance probabilities for binding and unbinding, respectively. Since $p_1/p_0 = \exp[-(V_{\rm bind} - \mu)/k_BT]$, Eq. 3.17 can be expressed as

$$
\frac{P_{0\to1}}{P_{1\to0}} = \frac{\alpha_{1\to0}}{\alpha_{0\to1}}\,\exp\left[-(V_{\rm bind} - \mu)/k_BT\right]. \tag{3.18}
$$

For thermodynamic simulations, we attempt to change the ligation state of each loop at every Monte Carlo step; that is, $\alpha_{1\to0} = \alpha_{0\to1}$. If the loop is unligated, a ligand is introduced ($V \to V + V_{\rm bind}$) with probability

$$
P_{0\to1} = \min\left[1, \exp\left[-(V_{\rm bind} - \mu)/k_BT\right]\right]. \tag{3.19}
$$

For unbinding transitions, if the loop is ligated, the ligand dissociates from the binding loop ($V + V_{\rm bind} \to V$) with probability

$$
P_{1\to0} = \min\left[1, \exp\left[(V_{\rm bind} - \mu)/k_BT\right]\right]. \tag{3.20}
$$

Here, $\mu$ is the chemical potential of a bound ligand. At equilibrium, $\mu$ equals the chemical potential of the ligand in solution,

$$
\mu = k_BT \ln\frac{c}{c_0} + \mu_0, \tag{3.21}
$$

where $c$ is the ligand concentration, and $c_0$ and $\mu_0$ are the reference concentration and reference chemical potential, respectively. The simulated binding curves are reported as a function of the chemical potential or, equivalently, in terms of the relative ligand concentration defined through $\mu/k_BT = \ln(c/\bar{c})$, where $\bar{c} = c_0\exp(-\mu_0/k_BT)$.

For thermodynamic calculations, it is efficient to sample binding and unbinding events with the same frequency. This choice is problematic for kinetics simulations, however: the symmetric Monte Carlo scheme does not accurately describe the concentration dependence of the binding rates. For bimolecular reactions, the binding rate is proportional to the ligand concentration, $k_{\rm bind} \propto c$, while the unbinding rate, $k_{\rm unbind}$, is independent of ligand concentration. Accordingly, for kinetics I choose a non-symmetric scheme for binding and unbinding that still satisfies detailed balance.
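The symmetric grand canonical scheme of Eqs. 3.19 and 3.20 can be checked on a single binding loop whose binding energy is held fixed, so that the only degree of freedom is the ligation state. The sketch below makes that simplifying assumption (in the actual simulations $V_{\rm bind}$ depends on the instantaneous conformation, and the Monte Carlo moves are interleaved with the dynamics); the function names are hypothetical. For a fixed $V_{\rm bind}$, the sampled occupancy should approach $1/(1 + e^{(V_{\rm bind}-\mu)/k_BT})$.

```python
import math
import random

def metropolis_factor(x):
    """min[1, exp(x)], written to avoid overflow for large positive x."""
    return math.exp(min(0.0, x))

def sample_occupancy(V_bind, mu, kT=1.0, n_steps=200_000, seed=0):
    """Fraction of Monte Carlo steps in which a single loop is ligated.

    Every step attempts a change of ligation state (alpha_{0->1} = alpha_{1->0}).
    """
    rng = random.Random(seed)
    bound = False
    n_bound = 0
    for _ in range(n_steps):
        if not bound:
            # Eq. 3.19: accept binding with min[1, exp(-(V_bind - mu)/kT)]
            if rng.random() < metropolis_factor(-(V_bind - mu) / kT):
                bound = True
        else:
            # Eq. 3.20: accept unbinding with min[1, exp((V_bind - mu)/kT)]
            if rng.random() < metropolis_factor((V_bind - mu) / kT):
                bound = False
        n_bound += bound
    return n_bound / n_steps
```

Sweeping `mu` (equivalently, $\ln(c/\bar{c})$ through Eq. 3.21) traces out a titration curve for the isolated loop; replacing the fixed $V_{\rm bind}$ with a conformation-dependent energy is what couples binding to the conformational change.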
In this non-symmetric scheme, unbinding is independent of concentration, which suggests the acceptance probability

$$
P_{1\to0} = \min\left[1, \exp\left[V_{\rm bind}/k_BT\right]\right]. \tag{3.22}
$$

To satisfy detailed balance, the attempt probabilities become

$$
\alpha_{0\to1} = \alpha_{1\to0}\exp\left[\mu/k_BT\right], \tag{3.23}
$$

with a binding acceptance probability of

$$
P_{0\to1} = \min\left[1, \exp\left[-V_{\rm bind}/k_BT\right]\right]. \tag{3.24}
$$

Eq. 3.23 is satisfied by attempting unbinding events every $\tau_0$ steps, while binding is attempted with probability $\tau_0^{-1}\exp[\mu/k_BT]$ per step. This choice of Monte Carlo acceptance probabilities gives the expected concentration dependence of the simulated on and off rates for ligand binding. Takada and co-workers provide a similar approach, although in their formulation the ligand concentration is controlled by changing the attempt rate for binding[115]. The advantage of our formalism is that the ligand concentration is controlled by its binding chemical potential, making the connection to thermodynamic models straightforward.

The simulations of open/closed transitions and of Ca2+-binding thermodynamics and kinetics were parameterized based on experimental reports of the relative stability of the two basins and the folding temperatures of the domains. This is discussed in more detail in Chapters 4, 5, and 6.

Chapter 4

Comparing allosteric transitions in the domains of calmodulin through coarse-grained simulations

Abstract

Calmodulin (CaM) is a ubiquitous Ca2+-binding protein consisting of two structurally similar domains with distinct stabilities, binding affinities, and flexibilities. We present coarse-grained simulations that suggest the mechanism for the domains’ allosteric transitions between the open and closed conformations depends on subtle differences in the folded state topology of the two domains. Throughout a wide temperature range, the simulated transition mechanism of the N-terminal domain (nCaM) follows a two-state transition mechanism, while domain opening in the C-terminal domain (cCaM) involves unfolding and refolding of the tertiary structure. The unfolded intermediate appears at a higher temperature in nCaM than in cCaM, consistent with nCaM’s higher thermal stability. Under approximately physiological conditions, the simulated unfolded state accounts for 10% of the cCaM population, with nearly all of the sampled transitions (approximately 95%) unfolding and refolding during the conformational change. Transient unfolding significantly slows the domain opening and closing rates of cCaM. This potentially influences the mechanism of Ca2+ binding to each domain.¹

¹Adapted from P. Nandigrami and J. J. Portman, The Journal of Chemical Physics 144, 105102 (2016); http://doi.org/10.1063/1.4943130

4.1 Introduction

Allostery is central to the precise molecular control necessary for protein function. Indirect coupling between distant regions of a protein is often provided through a conformational transition between a “closed” (ligand-free) and an “open” (ligand-bound) structure upon ligation. NMR experiments revealing that proteins exist in dynamic equilibrium with multiple conformers[13, 138, 55, 54, 209, 76] suggest that a protein’s conformational dynamics in the absence of a ligand plays an essential role in allosteric regulation[122, 190, 75, 18]. The functional dynamics of a folded protein occurs near the bottom of the funneled energy landscape, a part of the landscape generally more susceptible to perturbations than the self-averaged kinetic bottleneck that determines the mechanism of folding[26]. This sensitivity, while important for a protein’s ability to respond dynamically to environmental conditions and interactions with ligands, also makes the prospect of general organizing principles for allostery problematic[227]. In this work, we explore the extent to which the summarizing statement that native state topology determines the folding mechanism of small single-domain proteins[12] carries over to large-scale conformational transitions.
Due in part to limitations on computational timescales, much theoretical work modeling large-scale conformational transitions in proteins has focused on simplified, coarse-grained models based on the energy basins defined by the open and closed conformations. The Gaussian network and related models describe an allosteric transition as motion along low-frequency normal modes of the closed-state conformational basin[9, 7, 193, 10]. While the dynamics about a single free energy minimum offers a natural rationale and a clear description of the collective motions involved in the conformational change[219, 11], a minimal model capable of capturing the transition mechanism must accommodate the change in dynamics as the protein moves between the two distinct meta-stable free energy basins. Allosteric transitions have been modeled by several different methods in which two meta-stable basins are coupled through an interpolation based on their energies. For example, minimal energy pathways have been computed for a potential surface based on the strain energies relative to each minimum conformation to predict the transition mechanism[125, 40, 49]. Structure-based simulations that couple two conformational basins have also been developed to understand the mechanism of allosteric transitions[17, 150, 149, 36, 119, 220]. Additionally, transition mechanisms have been described in terms of the evolution of each residue’s local flexibility using a coarse-grained variational model[199, 200, 198, 197]. Itoh and Sasai present an alternative approach to predict allosteric transition mechanisms in which contacts from two meta-stable structures are treated on equal footing rather than through an interpolated energy function[89, 90]. We use coupled structure-based simulations of the opening transition in the domains of calmodulin (CaM) to explore how subtle differences in the native state topology can lead to qualitative changes in the transition mechanism.
This work is motivated in part by an intriguing theoretical prediction[200] that domain opening in the C-terminal domain (cCaM) involves local partial unfolding and refolding, while the N-terminal domain (nCaM) remains folded throughout the transition. These distinct transition mechanisms are in harmony with Itoh and Sasai’s model, which predicts that cCaM has larger fluctuations than nCaM during domain opening[90]. Local unfolding in cCaM is found to relieve regions of high local strain during the transition[198], in agreement with the cracking mechanism of allosteric transitions discussed by Miyashita et al.[134, 135].

CaM is a ubiquitous Ca2+-binding protein consisting of two structurally similar globular domains connected by a flexible linker. Each domain consists of two helix-loop-helix motifs (the EF-hands) connected by a flexible linker, as shown in Fig. 4.2.1. Although topologically similar, the two CaM domains have distinct flexibilities, melting temperatures, and thermodynamic Ca2+-binding properties[204, 116, 184, 127]. In the absence of Ca2+, the C-terminal domain is particularly dynamic[196] and is less stable than the N-terminal domain, both in the intact protein and when separated into isolated domains[23, 184, 127]. The C-terminal domain, which has a very low denaturation temperature, is reported to be considerably unfolded at physiological temperature[164]. Furthermore, NMR experiments monitoring the open/closed transition of isolated cCaM have revealed local transient unfolding of helix F during domain opening[121]. The simulations presented in this chapter suggest that, over a wide range of temperatures, domain opening in cCaM involves global unfolding and refolding, while unfolded conformations are much less prominent in nCaM’s primarily two-state domain opening mechanism.
The appearance of an unfolded intermediate at a sufficiently high temperature is expected and has been reported in similar simulations of the conformational transition of cCaM[36] and of the homologous protein S100A6[150], as well as of other proteins[17, 220]. Given the structural similarity of the two domains, it is less obvious that the unfolded ensemble should become locally stable at a significantly higher temperature in nCaM than in cCaM. Although they employ very different approximations, both the analytic model and the simulations suggest that cCaM is more susceptible to unfolding during domain opening. Nevertheless, the simulated intermediate is globally unfolded, in contrast to the local unfolding predicted by the analytic model. Kinetically, global unfolding and refolding significantly slow the simulated domain opening rate of cCaM, which can potentially bias the partitioning of Ca2+-binding kinetics between induced fit and conformational selection differently for the two domains.

4.2 Methods

We use a native-centric model implemented in the Cafemol simulation package[96] to study the open/closed conformational transitions of the isolated, Ca2+-free N-terminal and C-terminal domains of CaM. The model couples two energy basins through an interpolation function: one basin is biased to the open (holo) native reference conformation and the other to the closed (apo) native reference conformation[150]. The open and closed conformations of the domains of CaM are shown in Fig. 4.2.1. A conformation in this coarse-grained model[150] is specified by the N position vectors of the C-α atoms of the protein backbone, $\mathbf{R} = \{\mathbf{r}_1, \ldots, \mathbf{r}_N\}$. For an energy basin biased to the reference conformation $\mathbf{R}_0$, the energy of a configuration $\mathbf{R}$ can be written as

$$ V_0(\mathbf{R}) = V_{\rm local}(\mathbf{R}|\mathbf{R}_0) + V_{\rm n}(\mathbf{R}|\mathbf{R}_0) + V_{\rm nn}(\mathbf{R}|\mathbf{R}_0). \tag{4.1} $$

Figure 4.2.1: Aligned Ca2+-free (closed/apo) and Ca2+-bound (open/holo) native conformations of (a) the N-terminal domain and (b) the C-terminal domain of calmodulin. The closed state (pdb: 1cfd[103]) is shown in blue, and the open state (pdb: 1cll[34]) is shown in green. The closed (apo) and open (holo) conformations of (a) nCaM (residue index 4–75) consist of helices A, B and C, D with binding loops I and II, respectively. The closed (apo) and open (holo) conformations of (b) cCaM (residue index 76–147) consist of helices E, F and G, H with binding loops III and IV, respectively. Secondary structure legends for nCaM and cCaM are shown above the protein structures. The CaM structures were rendered using Visual Molecular Dynamics[85].

The first term in Eq. 4.1 defines the coarse-grained backbone potential

$$ V_{\rm local}(\mathbf{R}|\mathbf{R}_0) = \sum_{\rm bonds} K_b (b_i - b_i^0)^2 + \sum_{\rm angles} K_\theta (\theta_i - \theta_i^0)^2 + \sum_{\rm dihedrals} \left\{ K_\phi^{(1)} \left[1 - \cos(\phi_i - \phi_i^0)\right] + K_\phi^{(3)} \left[1 - \cos 3(\phi_i - \phi_i^0)\right] \right\}, $$

where $b_i$, $\theta_i$, and $\phi_i$ denote bond lengths, bond angles, and dihedral angles, respectively. The corresponding values in the native structure are denoted with a superscript: $b_i^0$, $\theta_i^0$, and $\phi_i^0$. Residue pairs in contact in the native structure (native contacts) interact through a short-ranged attraction, and all other (non-native) pairs through a short-ranged repulsion:

$$ V_{\rm n}(\mathbf{R}|\mathbf{R}_0) = \epsilon_{\rm go} \sum_{(i,j)\,\in\,{\rm native}} \left[ 5\left(\frac{r_{ij}^0}{r_{ij}}\right)^{12} - 6\left(\frac{r_{ij}^0}{r_{ij}}\right)^{10} \right], \tag{4.2} $$

$$ V_{\rm nn}(\mathbf{R}|\mathbf{R}_0) = \epsilon_{\rm rep} \sum_{(i,j)\,\notin\,{\rm native}} \left(\frac{d}{r_{ij}}\right)^{10}. \tag{4.3} $$

Here, $r_{ij}$ is the distance between C-α atoms i and j in a conformation $\mathbf{R}$, and $r_{ij}^0$ is the corresponding separation in the reference structure $\mathbf{R}_0$.
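As an illustration of the native-contact term in Eq. 4.2, the following Python sketch evaluates $V_{\rm n}$ for a list of native pairs. It is a minimal, hypothetical example rather than the Cafemol implementation: the helper names (`pair_distances`, `go_contact_energy`) and the toy coordinates are assumptions, only the attractive native term is shown, and ε_go defaults to the 0.3 kcal/mol value quoted in the text. Each native contact contributes its minimum, −ε_go, when $r_{ij} = r_{ij}^0$.

```python
import numpy as np

def pair_distances(coords, pairs):
    """Distances |r_i - r_j| for each residue pair (i, j); coords has shape (N, 3)."""
    coords = np.asarray(coords, dtype=float)
    i, j = np.asarray(pairs, dtype=int).T
    return np.linalg.norm(coords[i] - coords[j], axis=1)

def go_contact_energy(coords, ref_coords, native_pairs, eps_go=0.3):
    """Attractive native-contact term V_n(R|R0) of Eq. 4.2, in kcal/mol.

    coords, ref_coords: C-alpha positions of the conformation R and the reference R0.
    native_pairs: hypothetical list of (i, j) index pairs treated as native contacts.
    """
    r = pair_distances(coords, native_pairs)
    r0 = pair_distances(ref_coords, native_pairs)
    x = r0 / r
    # 5x^12 - 6x^10 has its minimum value of -1 at x = 1, i.e. at r = r0.
    return eps_go * np.sum(5.0 * x**12 - 6.0 * x**10)

# Toy usage: a random "reference structure" of 10 beads with three native contacts.
rng = np.random.default_rng(0)
ref = rng.normal(scale=5.0, size=(10, 3))
contacts = [(0, 4), (1, 6), (2, 9)]
print(go_contact_energy(ref, ref, contacts))        # all contacts at r = r0: 3 * (-eps_go)
print(go_contact_energy(1.1 * ref, ref, contacts))  # uniformly stretched: higher energy
```

Because each contact's well is centred on its own native distance, any uniform stretch or compression of the structure raises the energy above −ε_go per contact.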
In studying the allosteric transitions of the two domains of Ca2+-free CaM, the coefficients defining the energy function are set to their default values in Cafemol: $K_b = 100.0$, $K_\theta = 20.0$, $K_\phi^{(1)} = 1.0$, $K_\phi^{(3)} = 0.5$, $\epsilon_{\rm go} = 0.3$, and $\epsilon_{\rm rep} = 0.2$ in units of kcal/mol, and $d = 4$ Å. Trajectories are simulated using Langevin dynamics with a friction coefficient of γ = 0.25 and a timestep of Δt = 0.2 (in coarse-grained units)[149]. With these parameters, the folding transition temperatures of the isolated CaM domains, estimated from equilibrium trajectories, are $T_F^o({\rm nCaM}) = 333.6$ K and $T_F^c({\rm nCaM}) = 328.9$ K for the open and closed states of nCaM, and $T_F^o({\rm cCaM}) = 335.1$ K and $T_F^c({\rm cCaM}) = 330.5$ K for the open and closed states of cCaM, respectively. Experimentally, the isolated domains have similar folding transition temperatures of approximately 323 K[127]. Although $T_F^o({\rm cCaM})$ and $T_F^o({\rm nCaM})$, as well as $T_F^c({\rm cCaM})$ and $T_F^c({\rm nCaM})$, lie within 2 K of each other (cCaM’s being slightly higher), coupling the open and closed basins significantly destabilizes cCaM relative to nCaM (described below). Consequently, the coupled simulations are relevant to the domains of intact CaM, in which interactions between the domains, particularly with the linker region[148], reduce the folding temperature of the C-terminal domain to roughly 315 K and increase that of the N-terminal domain to 328 K[204, 184, 127].

To study conformational changes between two metastable states, the energies of the corresponding native basins, $V_1(\mathbf{R})$ and $V_2(\mathbf{R})$, are coupled through an interpolation function[96]

$$ V(\mathbf{R}) = \frac{V_1 + V_2 + \Delta V}{2} - \sqrt{\left(\frac{V_1 - V_2 - \Delta V}{2}\right)^2 + \Delta^2}. \tag{4.4} $$

Here, the interpolation parameters Δ and ΔV control the barrier height and the relative stability of the two basins. The single-basin energies $V_1(\mathbf{R})$ and $V_2(\mathbf{R})$ are computed from Eq.
4.1 with modifications to some of the reference parameters in the potential in order to minimize conflicts between the two contact maps (see Refs. [150, 149, 96] for details). To compare the simulated domain opening mechanisms most clearly, it is convenient to choose the coupling parameters Δ and ΔV so that the open and closed conformations are equally stable and the barrier between them is low enough to give sufficient sampling of both states (a choice that improves sampling of the equilibrium transition kinetics). With Δ = 14.0 kcal/mol and ΔV = 2.15 kcal/mol for nCaM, and Δ = 17.5 kcal/mol and ΔV = 0.25 kcal/mol for cCaM, the open and closed states are equally probable, with a free energy barrier of $\simeq 4\,k_BT$, as shown in Fig. 4.2.2. With these parameters, the folding temperature of cCaM is approximately 25 K below that of nCaM, as indicated by the peaks in the heat capacity shown in Fig. 4.2.3. We report temperatures relative to the simulated folding temperature of cCaM, denoted $T_F^\star = 275.0$ K. Although we have explored a wide range of temperatures, most of the results presented in this chapter are at $T_{\rm sim} = 0.96\,T_F^\star$, a temperature slightly below the folding temperature of cCaM and significantly below that of nCaM.

Figure 4.2.2: Simulated free energy (in units of $k_BT$) as a function of the global progress coordinate $\Delta Q = Q_{\rm closed} - Q_{\rm open}$ for (a) nCaM and (b) cCaM.

NMR experiments indicate that the closed state of cCaM is more stable than the open state under physiological conditions, accounting for roughly 90% of the population[124]. Assuming nCaM behaves similarly, we adjust the relative stability of both domains through the coupling parameter ΔV to match this population (ΔV = 3.5 kcal/mol for nCaM and ΔV = 4.0 kcal/mol for cCaM). As shown in Fig.
4.2.3, the folding temperatures of the domains are sensitive to this destabilization of the open state. The simulated folding temperatures of the two domains then differ by approximately 18 K, somewhat more than the difference in the experimental folding temperatures of the domains in intact CaM, approximately 13 K[164]. To connect to the domain opening kinetics in intact CaM, we relate the simulated temperatures to the folding temperatures of the N-terminal and C-terminal domains of intact CaM (roughly 328 K and 315 K, respectively). With this choice, the physiological temperature of 310 K corresponds to a simulation temperature of 95% of nCaM’s folding temperature (310/328 ≈ 0.95) and 98% of cCaM’s folding temperature (310/315 ≈ 0.98).

Figure 4.2.3: Heat capacity as a function of temperature (K) for cCaM (red) and nCaM (blue) for two relative stabilities of the open and closed basins. The solid curves correspond to equally stable open and closed basins; in the dashed curves, the open state occupies approximately 10% of the total population.

Simulated conformational ensembles are characterized through local and global structural order parameters based on the contacts formed in each sampled conformation. A native contact is considered to be formed if the distance between the residues is less than 1.2 times the corresponding distance in the native conformation. To characterize structural changes during the conformational transition, it is convenient to separate the native contacts of the open (holo) and closed (apo) conformations into three groups: those that occur exclusively in the open native reference conformation, those that occur exclusively in the closed one, and those common to both states. For each group, denoted by α ∈ {open, closed, ∩}, we define a local order parameter, $q_\alpha(i)$, as the fraction of the native contacts in that group involving the ith residue that are formed.
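The contact-based order parameters can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the analysis code used for this chapter: the function names are hypothetical, each contact is supplied with its own reference distance $r_{ij}^0$ (which reference distance to use for a contact common to both structures is left to the caller), and residues with no contacts in a group are excluded from the residue average that defines the global order parameter $Q_\alpha = \langle q_\alpha(i) \rangle$.

```python
import numpy as np

def split_contacts(open_pairs, closed_pairs):
    """Partition native contacts into open-only, closed-only, and common groups.

    Pairs should be (i, j) tuples with i < j so that the set operations match them.
    """
    open_set, closed_set = set(open_pairs), set(closed_pairs)
    return {
        "open": sorted(open_set - closed_set),
        "closed": sorted(closed_set - open_set),
        "common": sorted(open_set & closed_set),
    }

def local_order_parameter(coords, pairs, r0, n_residues, factor=1.2):
    """q_alpha(i): fraction of the contacts in `pairs` involving residue i that are formed.

    A contact (i, j) counts as formed when r_ij < factor * r0_ij (factor = 1.2 in the text).
    Residues without any contact in this group are set to NaN.
    """
    coords = np.asarray(coords, dtype=float)
    pairs = np.asarray(pairs, dtype=int)
    i, j = pairs[:, 0], pairs[:, 1]
    r = np.linalg.norm(coords[i] - coords[j], axis=1)
    formed = (r < factor * np.asarray(r0, dtype=float)).astype(float)
    n_formed = np.zeros(n_residues)
    n_total = np.zeros(n_residues)
    for idx in (i, j):  # each contact counts toward both of its residues
        np.add.at(n_formed, idx, formed)
        np.add.at(n_total, idx, 1.0)
    q = np.full(n_residues, np.nan)
    has_contacts = n_total > 0
    q[has_contacts] = n_formed[has_contacts] / n_total[has_contacts]
    return q

def global_order_parameter(q):
    """Q_alpha = <q_alpha(i)>, averaged over residues that have contacts in the group."""
    return float(np.nanmean(q))

# Toy usage: four beads on a line; contact (0, 2) is formed, contact (1, 3) is broken.
coords = [[0, 0, 0], [1, 0, 0], [2, 0, 0], [5, 0, 0]]
q = local_order_parameter(coords, [(0, 2), (1, 3)], r0=[2.0, 2.0], n_residues=4)
print(q, global_order_parameter(q))  # q = [1, 0, 1, 0] and Q = 0.5
```

Applying the same functions to the open-only and closed-only groups gives $Q_{\rm open}$ and $Q_{\rm closed}$, from which the progress coordinate $\Delta Q = Q_{\rm closed} - Q_{\rm open}$ follows frame by frame.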
Overall native similarity is monitored by the corresponding global order parameters, $Q_\alpha = \langle q_\alpha(i) \rangle$, where the average is taken over the residues of the protein. Free energies projected onto these global order parameters are used to identify locally stable conformational ensembles such as the open and closed basins.

The transition rates between coarse-grained ensembles are calculated from equilibrium simulations of length $10^8$ steps, typically involving $O(10^3)$ open/closed transitions for nCaM and $O(10^2)$ for cCaM. The transition rate between two states labeled i and j is estimated by[28]

$$ k_{i\to j} = \frac{N_{i\to j}}{\langle \tau_i \rangle \sum_{k \neq i} N_{i\to k}}, \tag{4.5} $$

where $\langle \tau_i \rangle$ is the mean time spent in state i between transitions, and $N_{i\to j}$ is the number of transitions from state i to state j. When the allosteric transition involves only the open and closed states, Eq. 4.5 reduces to the two-state rates $k_{o\to c} = \langle \tau_o \rangle^{-1}$ and $k_{c\to o} = \langle \tau_c \rangle^{-1}$, where $\langle \tau_o \rangle$ and $\langle \tau_c \rangle$ are the mean first passage times to leave the open and closed states, respectively.

4.3 Conformational Transitions of Isolated Domains

The populations of simulated conformations organized in terms of the global order parameters are shown in Fig. 4.3.1. The free energy as a function of $Q_{\rm open}$ and $Q_{\rm closed}$ shows that domain opening in nCaM is two-state and sequential: contacts specific to the closed conformation are lost before contacts specific to the open conformation form, and the latter mostly form after the transition state region. Fig. 4.3.1 also shows the free energy projected onto the order parameter monitoring common contacts, $Q_\cap$, and onto a progress coordinate for the conformational transition, $\Delta Q = Q_{\rm closed} - Q_{\rm open}$. The global order parameter $Q_\cap$ monitors the overall integrity of the secondary structure as well as of tertiary contacts within parts of the protein that do not undergo large conformational changes during the transition.
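The free energy projections used throughout this chapter follow from the equilibrium populations, $F/k_BT = -\ln P + {\rm const}$. A minimal Python sketch, assuming the order parameters have already been computed for each saved frame of an equilibrium trajectory (the function names, bin counts, and ranges are illustrative choices, not those used for the figures):

```python
import numpy as np

def free_energy_profile(values, bins=40, value_range=(-1.0, 1.0)):
    """F(x)/k_BT = -ln P(x) from a histogram, shifted so that the minimum is zero.

    Empty bins are returned as +inf.
    """
    counts, edges = np.histogram(values, bins=bins, range=value_range)
    with np.errstate(divide="ignore"):
        f = -np.log(counts / counts.sum())
    f -= f[np.isfinite(f)].min()
    return f, edges

def free_energy_surface(x, y, bins=25, value_range=((0.0, 1.0), (0.0, 1.0))):
    """Two-dimensional analogue, e.g. for (Q_closed, Q_open) or (Delta Q, Q_common)."""
    counts, xedges, yedges = np.histogram2d(x, y, bins=bins, range=value_range)
    with np.errstate(divide="ignore"):
        f = -np.log(counts / counts.sum())
    f -= f[np.isfinite(f)].min()
    return f, xedges, yedges

# Toy usage: 90 frames near Delta Q = -0.3 and 10 frames near Delta Q = +0.5.
dq = np.array([-0.3] * 90 + [0.5] * 10)
f, edges = free_energy_profile(dq, bins=4, value_range=(-0.8, 0.8))
print(f)  # the sparsely populated bin lies ln(9), about 2.2 k_BT, above the minimum
```

Because only population ratios enter, the shift to a zero minimum is arbitrary; barriers such as the ≃ 4 $k_BT$ quoted for Fig. 4.2.2 are differences between bins of such a profile.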
As shown in Fig. 4.3.1, the common contacts in nCaM’s transition state ensemble remain largely intact. In contrast, the simulated open/closed free energy of cCaM has a locally stable intermediate state. Simultaneously low values of $Q_{\rm open}$ and $Q_{\rm closed}$ (both less than 0.3) and of $Q_\cap$ (less than 0.7) indicate that the intermediate has significantly reduced tertiary structure. The probability of forming individual contacts in the intermediate (data not shown) verifies that the secondary structure remains intact, though nearly all tertiary interactions are lost. Since the barrier for the transition closed → I ($\Delta F^\dagger \simeq 4\,k_BT$) is higher than the barrier for I → open ($\Delta F^\dagger \simeq 1.5\,k_BT$), the intermediate can be considered part of cCaM’s extended open basin.

Figure 4.3.1: Free energy, in units of $k_BT$, projected onto the global order parameters $Q_{\rm open}$, $Q_{\rm closed}$, and $Q_\cap$ for nCaM (a and c) and cCaM (b and d). Panels (a) and (b) show $Q_{\rm open}$ versus $Q_{\rm closed}$; panels (c) and (d) show $Q_\cap$ versus $Q_{\rm closed} - Q_{\rm open}$. The intermediate in the free energy surface of cCaM corresponds to an ensemble of states with intact secondary structure but lacking stable tertiary contacts.

To describe the transition mechanism at the residue level, we consider the local order parameter $q_\cap(i)$ of each residue as a function of the global progress coordinate ΔQ. As shown in Fig. 4.3.2, cCaM’s residues lose the majority of their common contacts upon opening (moving upward in the plot) and regain them later in the transition. Although the unfolding and refolding of residues in helices E and H are more gradual than for other residues, nearly every residue (except those in the linker region between helices F and G) loses native tertiary structure.
In contrast, the common contacts in nCaM remain intact throughout the transition, though the contacts involving specific residues in helices A and D and in the β-sheets in the loops are strained. The limited loss of long-range common contacts in nCaM reflects the increased flexibility of its folded transition state ensemble. The pairwise contact probabilities shown in Fig. A.1.1 in Appendix A provide an alternative description of the transition state ensemble of nCaM and the intermediate ensemble of cCaM during domain opening. The secondary structure of both nCaM and cCaM remains intact during the transition. However, nCaM’s contacts show a signature of limited loss of long-range tertiary contacts, particularly between helices A and D and between helix A and the C-D linker. In contrast, cCaM loses almost all of its long-range tertiary contacts in the intermediate ensemble. This description complements the picture of the transition process given by Fig. 4.3.2.

Figure 4.3.2: Local order parameter, $q_\cap(i)$, plotted as a function of the global progress coordinate, $Q_{\rm closed} - Q_{\rm open}$ (vertical axis), for each residue (horizontal axis: sequence index) of (a) nCaM and (b) cCaM. The color represents the probability of each residue forming native contacts common to both the open and closed structures: low probability is shown in red and high probability in blue.

A coarse-grained analytic model also predicts distinct transition mechanisms for the two domains, in which cCaM is susceptible to local unfolding during the open/closed transition while nCaM remains folded[200, 198]. In the analytic model, the conformational transition is described as the evolution of local flexibility along the transition route. Fig. 4.3.3 shows the simulated local flexibility for four discrete values of the progress coordinate ΔQ. Although the fluctuations of the residues in both domains increase and then decrease during the transition, the magnitude of the largest fluctuations is much greater in cCaM. In contrast to the global unfolding observed in the simulations, the unfolding and refolding of cCaM predicted by the analytic model is localized to particular residues (primarily in the linker between helices F and G).

Figure 4.3.3: Magnitude of the root mean square fluctuations (rmsf, in Å) of each residue for the conformational ensembles along the transition pathway of (a) nCaM and (b) cCaM, plotted against sequence index. Each color corresponds to the value of $\Delta Q = Q_{\rm closed} - Q_{\rm open}$ indicated in the legend in (a).

Exploring a range of temperatures reveals that, depending on the temperature, both domains can exhibit either a two-state transition mechanism or one that involves unfolding and refolding (see Fig. 4.3.4). At low temperatures, the transition mechanism is two-state, involving primarily well-folded conformational ensembles throughout the transition. Increasing the temperature progressively stabilizes the unfolded ensemble until it becomes locally stable at a spinodal temperature, $T_s$. Above the spinodal temperature, the transition between the open and closed states involves unfolding and refolding of the domain. At high enough temperatures, the unfolded conformation becomes the most stable state. Although both domains follow similar transition scenarios as a function of temperature, their transition mechanisms can differ at a given temperature because their spinodal temperatures are different. Comparing the two domains, cCaM has a lower spinodal temperature ($T_s^c \approx 0.93\,T_F^\star$) than nCaM ($T_s^n \approx 1.005\,T_F^\star$). For low temperatures ($T < T_s^c$), both domains have two-state transitions. For intermediate temperatures ($T_s^c < T < T_s^n$), the domain opening transition of nCaM is two-state, while that of cCaM involves unfolding and refolding. For higher temperatures ($T > T_s^n$), the unfolded ensemble of nCaM is locally stable, while the unfolded ensemble of cCaM is stabilized enough to become the global minimum.

Focusing on the scenario in which the open state is 10% of the total population, and at a simulation temperature corresponding to T = 310 K (to model intact CaM under physiological conditions), the simulated unfolded population is less than 1% for nCaM and approximately 9% for cCaM. These equilibrium unfolded populations can be compared with reports of 2% for the N-terminal domain and 24% for the C-terminal domain in intact CaM, based on thermodynamic stability measurements[127].

Figure 4.3.4: Free energy, in units of $k_BT$, projected onto the global order parameters $Q_{\rm closed}$ and $Q_{\rm open}$ for nCaM (a and c) and cCaM (b and d) at $T_{\rm sim} = 0.89\,T_F^\star$ (a and b) and $T_{\rm sim} = 1.08\,T_F^\star$ (c and d). At the lower temperature, the unfolded conformations are destabilized so that the transition mechanism in both domains becomes more two-state. At the higher temperature, the unfolded states are stabilized in both nCaM and cCaM.

4.4 Transition Kinetics

Using Eq. 4.5 to calculate opening rates for each domain at $T_{\rm sim}$, we find that unfolding and refolding along the transition route significantly slow cCaM’s domain opening relative to nCaM’s. Quantitatively, the domain opening and closing rates of nCaM, $k_{o\to c} = k_{c\to o} = 2 \times 10^{-3}\,\Delta t^{-1}$, are 50 times larger than the effective opening and closing rates of cCaM, $k_{o\to c} = k_{c\to o} = 4 \times 10^{-5}\,\Delta t^{-1}$.
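The estimator in Eq. 4.5 needs only the sequence of visited states. The Python sketch below assumes that every saved frame has already been assigned to a discrete state (for example, by thresholds on the global order parameters; the state definitions are not specified here) and treats each change of label as a transition, with no filtering of short recrossings. The function name is hypothetical, and the final, unfinished dwell is discarded because its exit is not observed.

```python
from collections import defaultdict
from itertools import groupby

def transition_rates(states, dt=1.0):
    """Estimate k_{i->j} = N_{i->j} / (<tau_i> * sum_k N_{i->k})  (Eq. 4.5).

    states: sequence of per-frame state labels, e.g. "c", "I", "o".
    dt: time between saved frames. Returns {(i, j): rate}.
    """
    # Collapse the trajectory into (label, number of consecutive frames) runs.
    runs = [(label, sum(1 for _ in group)) for label, group in groupby(states)]
    dwell_times = defaultdict(list)                    # completed dwells per state
    n_trans = defaultdict(lambda: defaultdict(int))    # N_{i->j}
    for (label, length), (next_label, _) in zip(runs, runs[1:]):
        dwell_times[label].append(length * dt)
        n_trans[label][next_label] += 1
    rates = {}
    for i, targets in n_trans.items():
        mean_tau = sum(dwell_times[i]) / len(dwell_times[i])
        n_out = sum(targets.values())
        for j, n_ij in targets.items():
            rates[(i, j)] = n_ij / (mean_tau * n_out)
    return rates

# Toy usage: closed -> open -> closed, then an excursion through the intermediate I.
print(transition_rates("ccccoooccIIoo", dt=0.2))
```

Since $\langle \tau_i \rangle \sum_k N_{i\to k}$ is simply the total time spent in state i before the observed exits, each rate is the number of i → j events per unit time in state i; with only two states this reduces to $k_{i\to j} = \langle \tau_i \rangle^{-1}$, as stated for the open/closed case.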
A closer look at cCaM’s kinetic transitions reveals that only ≈5% of its transition paths proceed directly from the closed to the open state without significant unfolding along the way. The remaining transitions follow the kinetic scheme

$$ \text{closed (c)} \;\overset{\text{slow}}{\rightleftharpoons}\; \text{I} \;\overset{\text{fast}}{\rightleftharpoons}\; \text{open (o)}, \tag{4.6} $$

where $k_{c\to I} = 4 \times 10^{-5}\,\Delta t^{-1}$, $k_{I\to c} = 2 \times 10^{-4}\,\Delta t^{-1}$, $k_{I\to o} = 8 \times 10^{-3}\,\Delta t^{-1}$, and $k_{o\to I} = 2 \times 10^{-3}\,\Delta t^{-1}$ are the corresponding simulated rates between the open, closed, and intermediate states. Equilibrium between the open state and the unfolded intermediate is established quickly on the timescale of the conformational transition, so that the unfolded intermediate attains a steady-state population

$$ P_I = \frac{k_{c\to I} P_c + k_{o\to I} P_o}{k_{I\to c} + k_{I\to o}}, \tag{4.7} $$

where $P_c$ and $P_o$ are the equilibrium populations of the closed and open states, respectively. The effective two-state kinetics for the open/closed transition can then be written as

$$ k_{c\to o}^{\rm eff} = \frac{k_{c\to I}\, k_{I\to o}}{k_{I\to c} + k_{I\to o}} \tag{4.8} $$

and

$$ k_{o\to c}^{\rm eff} = \frac{k_{o\to I}\, k_{I\to c}}{k_{I\to c} + k_{I\to o}}. \tag{4.9} $$

Since $k_{I\to c} \ll k_{I\to o}$, these expressions can be simplified. The effective domain opening rate is determined by the unfolding of the closed state,

$$ k_{c\to o}^{\rm eff} \approx k_{c\to I}, \tag{4.10} $$

and the closing rate can be understood through the equilibration of the intermediate and open states,

$$ k_{o\to c}^{\rm eff} \approx k_{I\to c}\, \frac{P_I}{P_o}, \tag{4.11} $$

where $P_I/P_o = 0.2$ is the population of the unfolded intermediate relative to the open state. The simulated effective two-state rates for cCaM are consistent with this steady-state description of the kinetics. The slowing influence of unfolding and refolding persists when the open state is destabilized to 10% of the total population, with domain opening approximately 45 times faster in nCaM than in cCaM at simulation temperatures corresponding to T = 310 K.
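The arithmetic behind Eqs. 4.8 to 4.11 can be checked directly with the simulated cCaM rates quoted above. In this sketch (function name hypothetical), $P_I/P_o$ is estimated from the fast pre-equilibrium between I and o as $k_{o\to I}/k_{I\to o}$; this is an assumption for illustration, and with the rounded rates it gives 0.25 rather than the simulated value of 0.2.

```python
def effective_two_state_rates(k_cI, k_Ic, k_Io, k_oI):
    """Steady-state effective rates for c <-> I <-> o (Eqs. 4.8 and 4.9)."""
    denom = k_Ic + k_Io
    k_co = k_cI * k_Io / denom
    k_oc = k_oI * k_Ic / denom
    return k_co, k_oc

# Simulated cCaM rates from the text, in units of 1/Delta t.
k_cI, k_Ic, k_Io, k_oI = 4e-5, 2e-4, 8e-3, 2e-3
k_co, k_oc = effective_two_state_rates(k_cI, k_Ic, k_Io, k_oI)

# Limiting forms, valid for k_Ic << k_Io (Eqs. 4.10 and 4.11).
pI_over_po = k_oI / k_Io   # fast I <-> o pre-equilibrium (assumption)
k_co_limit = k_cI
k_oc_limit = k_Ic * pI_over_po

print(f"opening: {k_co:.2e} (limit {k_co_limit:.2e}) per Delta t")
print(f"closing: {k_oc:.2e} (limit {k_oc_limit:.2e}) per Delta t")
```

Both effective rates come out between 4 and 5 × 10⁻⁵ Δt⁻¹, in line with the effective cCaM rate of 4 × 10⁻⁵ Δt⁻¹ reported in the text and far below nCaM’s 2 × 10⁻³ Δt⁻¹.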
4.5 Discussion

Although the isolated domains of CaM are topologically similar, their simulated open/closed transition mechanisms are distinct because of an unfolded intermediate that appears in the free energy landscape at a different temperature for each domain. Two-state transition kinetics persist to higher temperatures in nCaM, whereas the unfolded ensemble is more readily stabilized in cCaM. Above the spinodal temperature, transient unfolding and refolding of the domain occur through the locally stable unfolded intermediate (exemplified by cCaM at $T_{\rm sim}$). Below the spinodal temperature, the transition is two-state-like, albeit with conformational dynamics that anticipate the unfolded intermediate through high flexibility and stressed tertiary interactions (as in nCaM at $T_{\rm sim}$).

The unfolding and refolding along the open/closed transition is reminiscent of the cracking mechanism[134, 135, 215], in which regions of high local strain are relieved through unfolding and refolding in the transition region. Since the unfolded conformations involved in cracking are typically locally unstable, domain opening in CaM most closely follows this canonical description at temperatures near the spinodal for the unfolded conformations.

High-temperature unfolded intermediates have been reported previously in simulations of the open/closed transition in cCaM[36] and in the homologous protein S100A6[150]. Chen and Hummer found that the population of the open ensemble is comparable to that of a marginally stable unfolded ensemble within a narrow temperature range. They argue that this delicate balance between folded and unfolded populations explains why some experiments report an open/closed transition[55, 124, 207, 120, 121], while others report a folding/unfolding transition for cCaM under similar conditions[164].
Our simulations suggest that subtle differences in the topology and stability of the two domains can result in distinct transition mechanisms. In particular, we find that the unfolded population is stabilized more readily in cCaM, a result consistent with the prediction that cCaM (and not nCaM) undergoes local unfolding and refolding during opening[199, 198]. The C-terminal domain’s lower spinodal temperature may reflect its lower overall thermodynamic stability. Indeed, nCaM is measured to be more stable than cCaM in the absence of Ca2+[184], with cCaM being significantly unfolded at room temperature (20–25 °C)[127].

The transient unfolding and refolding observed in the simulations significantly slows the transition kinetics of cCaM. Several key observations of CaM dynamics have been reported, but how the dynamics of the individual domains compare is not clear from the literature. NMR studies of intact CaM in the absence of Ca2+ report that cCaM is more dynamic than nCaM, with an exchange time of 350 µs for cCaM[196]. This timescale is comparable to the folding/unfolding equilibration time of 200 µs for cCaM under similar conditions[164]. Ca2+-loaded cCaM carrying the mutation E140Q, which stabilizes the open state and prevents binding to loop IV, exhibits exchange on the faster timescale of 25 µs[54] and undergoes local transient unfolding[121]. The dynamics of both domains under similar conditions have been reported by Price and co-workers, who used fluorescence correlation spectroscopy coupled to Förster resonance energy transfer (FRET) to monitor the intramolecular dynamics of both nCaM and cCaM on the microsecond timescale[161]. They report that both domains fluctuate on the 30–40 µs timescale in the absence of Ca2+. The Ca2+ dependence of the fluctuation amplitude, however, indicates that the observed fluctuations couple to the occupancy of the binding sites (and hence to domain opening) only in nCaM.
Taken together, the evidence that the two domains differ in conformational timescale and/or mechanism is intriguing in light of the predictions of the coarse-grained simulations. Nevertheless, understanding how flexibility and transient unfolding influence the domain opening dynamics of CaM requires further experimental clarification.

Chapter 5
Coarse-grained molecular simulations of allosteric cooperativity

Abstract

Interactions between a protein and a ligand are often accompanied by a redistribution of the population of thermally accessible conformations. This dynamic response of the protein’s functional energy landscape enables a protein to modulate binding affinities and to control the sensitivity of binding to ligand concentration. In this chapter, we investigate the structural origins of the binding affinity and allosteric cooperativity of binding two Ca2+ ions to each domain of calmodulin (CaM) through simulations of a simple coarse-grained model. In this model, the protein’s conformational transitions between open and closed conformational ensembles are simulated explicitly, while ligand binding and unbinding are treated implicitly within the grand canonical ensemble. Ligand binding is cooperative because the binding sites are coupled through a shift in the dominant conformational ensemble upon binding. The classic Monod-Wyman-Changeux (MWC) model of allostery, with appropriate binding free energies for the open and closed ensembles, accurately describes the simulated binding thermodynamics. The simulations predict that the two domains of CaM have distinct binding affinities and cooperativities. In particular, the C-terminal domain binds Ca2+ with higher affinity and greater cooperativity than the N-terminal domain. From a structural point of view, the affinity of an individual binding loop depends sensitively on the loop’s structural compatibility with the ligand in the bound ensemble, as well as on the conformational flexibility of the binding site in the unbound ensemble.¹
¹Adapted from P. Nandigrami and J. J. Portman, The Journal of Chemical Physics 144, 105101 (2016); http://doi.org/10.1063/1.4943043

5.1 Introduction

Conformational dynamics is essential for a protein’s ability to exhibit allostery. The coupling between two distant binding sites is frequently accomplished by a conformational change from a “closed” (apo) to an “open” (holo) conformation upon ligand binding[68]. Although the end-point conformations often give valuable insight into protein function, a detailed description of the allosteric mechanism of a particular protein requires one to consider a broader conformational ensemble. The landscape theory of binding[122, 190, 18] acknowledges that a folded protein is inherently dynamic and explores the thermally accessible conformational states in its native basin[75]. This conformational ensemble comprises the protein’s “functional landscape”[227]. Although it includes only a small subset of the states of the folding energy landscape[26], the functional landscape determines how a protein responds to changes in its local environment, such as ligand interactions. Because the conformational ensemble is heterogeneous, a ligand stabilizes some conformations more than others, causing the protein’s thermal population to redistribute to a ligated ensemble that in general has distinct equilibrium properties[104, 181]. The ensemble nature of allostery accommodates a rich and diverse set of regulatory strategies and provides a general framework for understanding the binding thermodynamics and kinetics of specific proteins[81, 137]. Even simple landscapes with a small number of well-defined basins separated by kinetic barriers can have subtle binding mechanisms, because they depend on ligand interactions with short-lived transient states. Experimental progress on this challenging kinetics problem has appeared only very recently[47].
In principle, affinities of metastable states can also be obtained from thermodynamic binding measurements, although such an analysis may not always be practical. In this chapter, we focus on the cooperative binding of two Ca2+ ions to the binding loops of the domains of calmodulin (CaM) through equilibrium coarse-grained simulations. In this minimal model, the conformational transition between the open and closed ensembles is simulated explicitly, and the dynamic shift in population due to ligand binding and unbinding is approximated by discrete jumps between ligated and unligated free energy surfaces[96]. The protein dynamics are governed by a native-centric potential that couples the open and closed conformational basins, while ligation is represented implicitly through ligand-mediated protein contacts. This model, developed by Takada and co-workers, has been used to investigate the kinetic partitioning of induced-fit and conformational-selection binding pathways[149] as well as the mechanical unfolding of calmodulin in the presence of Ca2+[115]. Here, we assume that the ligands bound to the protein are in equilibrium with a dilute solution, and we calculate binding thermodynamics as a function of ligand concentration. The model is parameterized so that the closed basin is more stable than the open basin in the unligated ensemble. The two binding sites in the open and closed basins can be in the following states: both sites unoccupied (unligated ensemble), either binding site occupied (partially saturated ensembles), or both sites occupied (fully saturated ensemble), giving rise to a “four-state” description (see Fig. 5.5.1). Ligands interact with all conformations in the ensemble, but the affinity is largest for conformations within the open basin because of their high structural compatibility with the ligand. Thus, the population shifts toward the open ensemble with increasing ligand concentration (see Fig. B.1.1 in Appendix B).
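The population shift toward the open ensemble, and the cooperativity it produces, can be illustrated with the textbook two-site MWC partition function, in which each conformational ensemble binds ligands independently and the sites communicate only through the open/closed equilibrium. The Python sketch below is illustrative, not the fitting procedure of this chapter: the parameter names are hypothetical, $L$ denotes the ratio of unligated closed to unligated open populations, the site dissociation constants may differ between the two sites and between the open and closed ensembles, and the free ligand concentration $x$ is assumed fixed by a dilute reservoir.

```python
import numpy as np

def mwc_two_site(x, L, K_open, K_closed):
    """Two-site MWC model: fractional saturation Y and open-state population.

    x        : free ligand concentration (same units as the K's)
    L        : ratio of unligated closed to unligated open populations
    K_open   : (K1, K2) site dissociation constants in the open ensemble
    K_closed : (K1, K2) site dissociation constants in the closed ensemble
    """
    x = np.asarray(x, dtype=float)
    a1, a2 = x / K_open[0], x / K_open[1]
    b1, b2 = x / K_closed[0], x / K_closed[1]
    z_open = (1 + a1) * (1 + a2)          # open ensemble, all ligation states
    z_closed = L * (1 + b1) * (1 + b2)    # closed ensemble, weighted by L
    # Mean number of bound ligands, x * d(ln Z)/dx, split by ensemble.
    n_open = a1 * (1 + a2) + a2 * (1 + a1)
    n_closed = L * (b1 * (1 + b2) + b2 * (1 + b1))
    z = z_open + z_closed
    return (n_open + n_closed) / (2.0 * z), z_open / z

def hill_slope(x, y):
    """Local Hill coefficient d ln[Y / (1 - Y)] / d ln x on a grid of x values."""
    return np.gradient(np.log(y / (1 - y)), np.log(x))

# Ligands favor the open ensemble (K_open << K_closed), which is rare without ligand (L >> 1).
xs = np.logspace(-3, 3, 601)
y, p_open = mwc_two_site(xs, L=1000.0, K_open=(1.0, 1.0), K_closed=(100.0, 100.0))
print(f"max Hill slope: {hill_slope(xs, y).max():.2f}")
print(f"open population: {p_open[0]:.3f} -> {p_open[-1]:.3f}")
```

With these illustrative parameters the maximal Hill slope exceeds 1 (cooperative binding) and the open population rises from about 0.1% to nearly 90% as ligand is added. Setting the open and closed dissociation constants equal removes the coupling: Y then reduces to a single-site isotherm and the Hill slope is exactly 1.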
The simulated ensembles have significant molecular fluctuations which modulate ligand affinities and affect the coupling between the binding sites. When binding thermodynamics are dominated by the open and closed ensembles, this model provides a molecular realization of the celebrated Monod-Wyman-Changeux (MWC) model of allostery[136, 31]. For binding a single ligand, the MWC model has four states: unligated-open, unligated-closed, ligated-open, and ligated-closed. Appealing to this simple four-state model allows us to extract binding free energies of the isolated sites in the simulated open and closed ensembles and to calculate the free energy associated with the cooperative coupling between the sites. The simulations thereby connect the conformational ensemble underlying the protein’s dynamics with the phenomenological MWC binding parameters[45]. Early work on the binding thermodynamics of CaM revealed that the affinities and cooperativities of the N-terminal domain (nCaM) and the C-terminal domain (cCaM) are distinct despite their structural similarity[98, 211, 116, 14, 127]. Although some experimental data have been reanalyzed recently[186, 187, 107], the traditional analysis of thermodynamic binding data has not used a dynamic landscape (or MWC) framework[116, 184, 144]. Our approach is supported by Nuclear Magnetic Resonance experiments[13, 55, 54] and all-atom molecular dynamics simulations[207] that show a dynamic equilibrium between the open and closed conformations of CaM’s domains in the absence of Ca2+.

5.2 Methods

Calmodulin (CaM) is a small, 148-amino-acid protein consisting of two topologically similar domains. Each domain consists of four α-helices and a pair of EF-hand Ca2+-binding loops. The N-terminal domain (nCaM) has helices labeled A–D with binding loops I and II, and the C-terminal domain (cCaM) has helices labeled E–H with binding loops III and IV.
We simulate open/closed allosteric transitions of the isolated domains of CaM using a native-centric model implemented in the Cafemol simulation package developed by Takada and co-workers[96]. This model couples two energy basins, one biased to the open reference structure (pdb: 1cll[34]) and the other biased to the closed reference structure (pdb: 1cfd[103]). The energy of a conformation, specified by the N position vectors of the C-α atoms of the protein backbone, $\mathbf{R} = \{\mathbf{r}_1, \dots, \mathbf{r}_N\}$, is given by

$$V(\mathbf{R}) = \frac{V_o(\mathbf{R}) + V_c(\mathbf{R}) + \Delta V}{2} - \sqrt{\frac{\left(V_o(\mathbf{R}) - V_c(\mathbf{R}) - \Delta V\right)^2}{4} + \Delta^2}, \qquad (5.1)$$

where $V_o(\mathbf{R})$ is the single-basin potential defined by the open structure and $V_c(\mathbf{R})$ is the single-basin potential defined by the closed structure. The interpolation parameters, $\Delta$ and $\Delta V$, control the barrier height and the relative stability of the two basins. Parameters defining the single energy basins are set to their default values with uniform contact strength. The simulation temperature is set below the folding transition temperature of each of the four conformations. Specifically, the simulation temperature is set to $T_{\mathrm{sim}} = 0.8\,T_F$, where $T_F = 329$ K is the folding transition temperature corresponding to the closed (apo) state of nCaM, the lowest transition temperature among the open and closed states of nCaM and cCaM. Equilibrium trajectories of length $10^8$ steps are simulated using Langevin dynamics with a friction coefficient of $\gamma = 0.25$ and a timestep of $\Delta t = 0.2$ (in coarse-grained units)[149].

Calcium binding to the two EF-hand loops of each domain of CaM is modeled implicitly by adding a potential defined from the ligand-mediated contacts in the EF-hand loops of the open (holo) conformation,

$$V_{\mathrm{bind}} = -c_{\mathrm{lig}}\, g_o \sum_{i,j} \exp\left[-\frac{\left(r_{ij} - r_{ij}^0\right)^2}{2\sigma_{ij}^2}\right]. \qquad (5.2)$$

Here, the sum is over pairs of residues that are each within 4.5 Å of a Ca2+ ion and closer than 10.0 Å in the open (holo) conformation.
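Eq. 5.1 is the lower eigenvalue of the 2×2 matrix with diagonal entries $V_o$ and $V_c + \Delta V$ and off-diagonal coupling $\Delta$. Far from the crossing it therefore follows whichever basin is lower, while at the crossing ($V_o = V_c + \Delta V$) it lies $\Delta$ below both, which is how $\Delta$ sets the barrier. A minimal NumPy sketch of Eqs. 5.1 and 5.2 makes this explicit. The function names are hypothetical and not part of Cafemol, the single-basin energies $V_o$ and $V_c$ are taken as given numbers rather than computed from a structure, and the defaults `c_lig = 2.5`, `g_o = 0.3`, and `sigma_ij = 0.1 r0_ij` are the values quoted for this work later in this section.

```python
import numpy as np


def double_basin_energy(v_open, v_closed, delta_v, delta):
    """Eq. 5.1: smooth coupling of the open and closed single-basin energies."""
    mean = 0.5 * (v_open + v_closed + delta_v)
    gap = v_open - v_closed - delta_v
    return mean - np.sqrt(0.25 * gap**2 + delta**2)


def ligand_contact_energy(r, r0, c_lig=2.5, g_o=0.3, sigma_frac=0.1):
    """Eq. 5.2: Gaussian ligand-mediated contacts with sigma_ij = sigma_frac * r0_ij.

    r  -- current distances r_ij of the ligand-mediated contact pairs
    r0 -- reference distances r0_ij in the open (holo) structure
    """
    r = np.asarray(r, dtype=float)
    r0 = np.asarray(r0, dtype=float)
    sigma = sigma_frac * r0
    return -c_lig * g_o * np.sum(np.exp(-((r - r0) ** 2) / (2.0 * sigma**2)))


# Eq. 5.1 equals the lower eigenvalue of [[V_o, Delta], [Delta, V_c + Delta_V]].
v_o, v_c, d_v, d = 1.0, 3.0, 0.5, 2.0
lower = np.linalg.eigvalsh(np.array([[v_o, d], [d, v_c + d_v]]))[0]
print(double_basin_energy(v_o, v_c, d_v, d), lower)
```

When every ligand-mediated pair sits at its holo distance, each Gaussian equals 1 and $V_{\mathrm{bind}} = -c_{\mathrm{lig}}\,g_o N_{\mathrm{con}}$; stretching the pairs by several $\sigma_{ij}$ drives $V_{\mathrm{bind}}$ toward zero, which is why closed conformations bind more weakly.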
The binding energy parameters $c_{\mathrm{lig}}$, $g_o$, and $\sigma$ are taken to be the same for each ligand-mediated contact for simplicity. The distributions of distances between implicit ligand-mediated contact pairs in the simulated ensembles of open and closed states for the binding loops of nCaM and cCaM are shown in Figs. B.1.1, B.1.2, B.1.3, and B.1.4 in Appendix B.

Binding cooperativity is influenced by the relative stability of the unligated open and closed states, determined by $\Delta V$, and by the binding free energy, determined by $V_{\mathrm{bind}}$. In principle, these parameters can be adjusted to match measured binding properties. In the absence of clear experimental constraints, we choose parameters so that the relative stability between the open and closed states is the same for each domain. The transition barrier height is determined by $\Delta$, which is set to 14.0 kcal/mol for nCaM and 17.5 kcal/mol for cCaM. Setting $\Delta V = 5.0$ kcal/mol for nCaM and $\Delta V = 4.75$ kcal/mol for cCaM, while keeping the other parameters fixed, gives an energy difference of $4\,k_BT$ between the unligated open and closed states for both domains. Experimentally, the folding temperatures of the N-terminal and C-terminal domains in intact CaM are approximately 328 K and 315 K, respectively[164]. Connecting to the domain opening kinetics in the intact protein, our simulation temperature corresponds to approximately 310 K, which is 95% of nCaM’s simulated folding temperature and 98% of cCaM’s simulated folding temperature.

For the results reported in this work, the binding energy parameters are set to $g_o = 0.3$ (the default value in Cafemol), $c_{\mathrm{lig}} = 2.5$, and $\sigma_{ij} = 0.1\,r_{ij}^0$, where $r_{ij}^0$ is the corresponding separation distance in the open (holo) reference conformation. We have performed additional simulations to explore the dependence of binding thermodynamics on the ligand-mediated contact strength and interaction range. At higher values of $c_{\mathrm{lig}}$ and $\sigma_{ij}$, the affinities of ligand binding to individual loops increase. Nevertheless, the slope of the titration curve at the midpoint of the transition (a measure of binding cooperativity) remains the same (Fig. B.2.1 in Appendix B).

The simulated conformational ensembles are characterized structurally in terms of local and global order parameters based on the contacts formed in each sampled conformation. The set of native contacts in the open and closed conformations is separated into three groups: those that occur exclusively in the open native structure, those that occur exclusively in the closed native structure, and those that are common to both states. A native contact in a given conformation is considered formed if the distance between the two residues is less than 1.2 times the corresponding distance in the native conformation. Local order parameters $q_{\mathrm{open}}(i)$ and $q_{\mathrm{closed}}(i)$ are defined as the fraction of native contacts involving the $i$th residue that occur exclusively in the open and closed native structures, respectively. Overall native similarity is monitored by the corresponding global order parameters, $Q_{\mathrm{open}} = \langle q_{\mathrm{open}}(i) \rangle$ and $Q_{\mathrm{closed}} = \langle q_{\mathrm{closed}}(i) \rangle$, where the average is taken over the residues of the protein. We identify metastable conformational basins from minima in the free energy computed from the population histogram parameterized by $Q_{\mathrm{open}}$ and $Q_{\mathrm{closed}}$.

Ligand binding and unbinding events coupled with conformational changes of the protein are modeled within the grand canonical ensemble. Throughout the protein’s conformational transitions, the ligation state of each loop is determined stochastically through a Monte Carlo step attempted every 1000 steps of the Langevin trajectory. If the loop is unligated, a ligand is introduced to the binding loop ($V \to V + V_{\mathrm{bind}}$) with probability

$$P_{0\to 1} = \min\left[1, \exp\left(-\frac{V_{\mathrm{bind}} - \mu}{k_BT}\right)\right]. \qquad (5.3)$$

If the loop is ligated, the ligand dissociates from the binding loop ($V + V_{\mathrm{bind}} \to V$) with probability

$$P_{1\to 0} = \min\left[1, \exp\left(\frac{V_{\mathrm{bind}} - \mu}{k_BT}\right)\right]. \qquad (5.4)$$

Here, $\mu$ is the chemical potential of a bound ligand. At equilibrium, $\mu$ equals the chemical potential of the ligand in solution,

$$\mu = k_BT \ln\frac{c}{c_0} + \mu_0, \qquad (5.5)$$

where $c$ is the ligand concentration, and $c_0$ and $\mu_0$ are the reference concentration and reference chemical potential, respectively. To compute binding curves, a series of simulations is performed, each at a different value of the ligand chemical potential. These simulated titration curves are reported as a function of the chemical potential or, equivalently, in terms of the relative ligand concentration defined through $\mu/k_BT = \ln(c/\bar{c}_0)$, where $\bar{c}_0 = c_0 \exp(-\mu_0/k_BT)$.

This approach, with the Monte Carlo acceptance probabilities given in Eq. 5.3 and Eq. 5.4, is oriented towards binding thermodynamics from the outset. Takada and co-workers present a different choice motivated by ligand binding kinetics[149]. Instead of introducing a chemical potential, ligand concentration enters their model through a variable binding attempt rate, while the attempt rate of unbinding is fixed. Binding titration curves can also be calculated in this model, but as a function of the binding attempt rate rather than the concentration directly[115].

Table 5.3.1: Number of ligand-mediated contacts, dissociation constants, and binding free energies for the loops of CaM.

Loop       N_con   K_d/c̄_0   closed (a)   open (a)
loop I       5      0.054       -1.3        -3.1
loop II      5      0.062       -1.3        -2.9
loop III     8      0.018       -1.3        -4.1
loop IV      5      0.13        -0.5        -3.0
(a) Binding free energy in the closed or open ensemble, in kcal/mol.

5.3 Simulations of Binding a Single Ligand

We first consider Ca2+ binding exclusively to each individual loop by simulating the conformational change of the entire domain while permitting binding only to a single site. As shown in Fig. 5.3.1 (A) and Fig. 5.3.1 (B), the bound population as a function of ligand concentration, $p_b(c)$, follows a typical sigmoidal profile connecting a fully unbound population at low concentration and a fully bound population at high concentration.
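The sigmoidal profile can be illustrated with the Monte Carlo ligation moves of Eqs. 5.3 and 5.4 in a deliberately simplified setting. In the minimal sketch below, which is an illustrative toy rather than the Cafemol implementation, the protein conformation is frozen so that $V_{\mathrm{bind}}$ is a fixed number (in units of $k_BT$); the function names and the value $V_{\mathrm{bind}} = -3\,k_BT$ are hypothetical. With $V_{\mathrm{bind}}$ fixed, the sampled bound fraction should reproduce the two-state result $p_b = 1/\{1 + \exp[(V_{\mathrm{bind}} - \mu)/k_BT]\}$, with $\mu/k_BT = \ln(c/\bar{c}_0)$.

```python
import math
import random


def sample_bound_fraction(v_bind, mu, n_steps=100_000, seed=0):
    """Grand canonical Monte Carlo for one site with a fixed V_bind (energies in kBT).

    Toy assumption: the protein conformation is frozen, so V_bind does not change.
    """
    rng = random.Random(seed)
    bound = False
    n_bound = 0
    for _ in range(n_steps):
        # Energy change of the attempted move: binding adds V_bind - mu (Eq. 5.3),
        # unbinding removes it (Eq. 5.4).
        delta_e = (v_bind - mu) if not bound else -(v_bind - mu)
        # Metropolis acceptance min[1, exp(-delta_e)], written to avoid overflow
        if delta_e <= 0.0 or rng.random() < math.exp(-delta_e):
            bound = not bound
        n_bound += bound
    return n_bound / n_steps


def exact_bound_fraction(v_bind, mu):
    """Two-state equilibrium result: 1 / (1 + exp[(V_bind - mu)/kBT])."""
    return 1.0 / (1.0 + math.exp(v_bind - mu))


# Titration: mu / kBT = ln(c / c0_bar), with an illustrative V_bind = -3 kBT
for ln_c in (-6.0, -4.0, -3.0, -2.0, 0.0):
    print(ln_c, sample_bound_fraction(-3.0, ln_c), exact_bound_fraction(-3.0, ln_c))
```

In the actual simulations $V_{\mathrm{bind}}$ fluctuates with the sampled conformation, so the measured $p_b(c)$ averages over the open and closed ensembles instead of following a single fixed-energy sigmoid.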
The overall binding strength of the individual loops is reflected in the dissociation constant, $K_d$, shown in Table 5.3.1. Binding affinities of nCaM’s loops are nearly the same, whereas the affinities of cCaM’s loops are significantly different, with $K_d^{(\mathrm{IV})} \approx 7\,K_d^{(\mathrm{III})}$. Comparing the binding strengths of CaM’s loops, our simulations predict that $K_d^{(\mathrm{III})} < K_d^{(\mathrm{I})} \approx K_d^{(\mathrm{II})} < K_d^{(\mathrm{IV})}$.

It is reasonable to expect that binding affinities from a uniform native-centric model correlate with the number of ligand-mediated contacts, $N_{\mathrm{con}}$. While loop III does indeed have the most contacts and the greatest binding affinity, accounting for the reduced affinity of loop IV compared to the loops of nCaM, each with the same number of contacts, requires a more careful explanation. Such subtlety is not surprising because binding strength is sensitive to a protein’s conformational flexibility, which modulates ligand interactions in both the open and closed ensembles.

Figure 5.3.1: Simulated binding curves (bound probability versus $\ln(c/\bar{c}_0)$) for the individual loops of (A) nCaM (loops I and II) and (B) cCaM (loops III and IV). Lines are fits to the two-state MWC model given by Eq. 5.6. (C) Simulated mean number of bound ligands, $\langle n_b \rangle$, for nCaM (blue) and cCaM (red) with both binding sites available, as a function of ligand concentration. The solid lines plot $\langle n_b(\mu) \rangle = p_b^{A0}(\mu) + p_b^{0B}(\mu) + 2p_b^{AB}(\mu)$ with probabilities given by the MWC model evaluated with the binding parameters found from fits of binding to each individual loop.

The MWC model provides insight into the affinities of the individual binding loops. Using the notation of Ref. [126], the bound population in the MWC model can be expressed as