TEMPORAL AND STERIC ANALYSIS OF IONIC PERMEATION AND

BINDING IN NA+,K+-ATPASE VIA MOLECULAR DYNAMIC SIMULATIONS

A dissertation presented to

the faculty of

the Russ College of Engineering and Technology of Ohio University

In partial fulfillment

of the requirements for the degree

Doctor of Philosophy

James E. Fonseca

June 2008

2008

James E. Fonseca

All rights reserved

This dissertation entitled

TEMPORAL AND STERIC ANALYSIS OF IONIC PERMEATION AND

BINDING IN NA+,K+-ATPASE VIA MOLECULAR DYNAMIC SIMULATIONS

by

JAMES E. FONSECA

has been approved for

the Department of Electrical Engineering and Computer Science

and the Russ College of Engineering and Technology of Ohio University by

Savas Kaya

Associate Professor of Electrical Engineering and Computer Science

Dennis Irwin

Dean, Russ College of Engineering and Technology

Abstract

Fonseca, James, Ph.D., June 2008, Electrical Engineering

Temporal and Steric Analysis of Ionic Permeation and Binding in

Na+,K+-ATPase via Molecular Dynamic Simulations (206 pp.)

Director of Dissertation: Savas Kaya

Interdisciplinary research has become a mature approach for the development of novel, integrated solutions for many complex problems in basic science and applied technology. The convergence of biology and nanotechnology is particularly promising from an engineering perspective. This dissertation will use computer simulations to investigate the structure-function of the P-type ATPases, a class of vital biological transmembrane proteins. A detailed understanding of protein function at the atomic level and associated time scale is not only important for biomedical research but also vital for the design and development of engineering applications, such as self-assembling molecular devices. This work’s methodology will show that significant insight into the structure-function relationship of ion-motive ATPases can be derived by a combination of simulation tools and analysis techniques including molecular dynamic trajectories, steric pathway analysis, and electrostatic calculations.

Approved:

Savas Kaya

Associate Professor of Electrical Engineering and Computer Science

My dissertation is dedicated to my grandfathers, Ernest Hart and James Fonseca,

and dissertation mentor, Dr. Bob Rakowski.

Acknowledgments

I would like to thank my graduate advisor, Dr. Savas Kaya, for his guidance and commitment to my research and education over the past six years. He has nurtured my interest in semiconductor devices, numerical methods, and nanotechnology, culminating in my current research. I hope that his energy, attention to detail, and enthusiasm have been instilled in me.

Dr. Bob Rakowski was an established authority on the Na+,K+-ATPase. His interest in exploring new methods of investigation of this protein led to the interdisciplinary collaboration from which this work resulted. I am greatly indebted to him for his advice and mentorship during my Ph.D. research.

I am grateful to my dissertation committee for their helpful questions and advice as well as their time invested.

I would like to thank my parents, my girlfriend, Monica, and my friends for their support in this endeavor.

I am very appreciative of the support I have received from the EECS department, GSS, and CMSS. I would like to thank Dr. Scott Hooper for the use of his parallel computing system. I would also like to thank Tammy Jordan, EECS graduate secretary, for her help over the course of my graduate studies.

The majority of the computational time was provided by the Ohio Supercomputer Center. Supported by NIH Grant NS-022979 and NSF Grant NSF-0622158.

Table of Contents

Abstract
Dedication
Acknowledgments
List of Tables
List of Figures

1 Introduction
1.1 Scope of Work
1.2 Motivation: Biomimetic Devices
1.2.1 Beyond Silicon
1.2.2 Advantages
1.2.3 Challenges
1.2.4 Molecular Devices: ATPase Proteins
1.2.5 Structure-Function
1.2.6 Biomedical Relevance
1.2.7 Computational Angle
1.3 The Na+,K+-ATPase
1.3.1 Biophysical Background
1.3.2 Physiology
1.4 Objectives of this Research
1.5 Contributions of this Research
1.6 Chapter Summaries
1.6.1 Glossary and Appendices

2
2.1 Introduction
2.2 Molecular Dynamics Algorithm
2.3 Topology
2.4
2.4.1 Parameterization
2.5 Periodic Boundary Conditions
2.6 Thermodynamic Ensemble
2.7 Energy Minimization
2.8 Water Model
2.9 Lipid Model

3 Modeling the Na+,K+-ATPase in a Lipid Bilayer System
3.1 Overview
3.2 Na+,K+-ATPase Structure
3.3 Homology Modeling
3.4 Choice of Template: SERCA
3.5 Sequence Alignment
3.5.1 Sequence Offset
3.5.2 Na+,K+-ATPase N-Domain
3.5.3 Model Creation
3.5.4 P-loop Optimization
3.6 Lipid Bilayer Creation
3.6.1 Hole Creation
3.7 Solvation and System Equilibration
3.8 Na+,K+-ATPase X-Ray Crystal Structure
3.9 Summary

4 MD Simulation and Analysis of the Ca2+ ATPase (SERCA)
4.1 Introduction
4.2 Simulations Performed
4.3 Methods and Analysis
4.4 Results
4.4.1 Cytoplasmic Pathway as Determined by CAVER
4.4.2 Pathways to the Lumenal Space Predicted by CAVER
4.4.3 Occlusion Site Connection as Determined by CAVER
4.4.4 Summary of Cytoplasmic and Lumenal Pathways as Determined by CAVER
4.4.5 Ca2+ Ion Movement from Site II
4.4.6 Ca2+ Ion Occlusion at Site I
4.4.7 MD Simulations of Ca2+ Ions Placed Along the Putative Lumenal Pathway
4.4.8 Valence Analysis of Ca2+ Occlusion Sites
4.4.9 Electrostatic Calculations of the Transmembrane Environment
4.5 Summary

5 MD Simulation and Analysis of the Na+,K+-ATPase
5.1 Introduction
5.2 Na+ Binding Sites as Determined by E1 Simulation
5.2.1 Na+ Ion Binding without Water
5.2.2 Na+ Site I Binding Site
5.2.3 Na+ Site II Binding Site
5.2.4 Na+ Site III Binding Site
5.3 Na+ Ion Binding with Water in E1 Conformation
5.3.1 Na+ Sites I and II with Water
5.3.2 Na+ Site III Binding with Water
5.3.3 Water Involvement in Na+ Binding Sites
5.4 K+ Binding Sites Determined by Simulation of E2 Conformation
5.4.1 K+ Site II Binding
5.4.2 K+ Site I Binding
5.5 Steric Investigation of Ion Permeation Pathways
5.5.1 E1 Conformation
5.5.2 E2P Conformation
5.6 Electrostatic Analysis of E1 and E2P Models
5.6.1 Electrostatic Pathway Analysis
5.6.2 Electrostatic Binding Site Analysis
5.7 Summary

6 Conclusions and Future Work
6.1 Summary and Conclusions
6.2 Future Work
6.2.1 Na+,K+-ATPase X-ray Structure
6.2.2 SERCA X-ray Structure with Open Lumenal Pathway
6.2.3 Computational Outlook

Bibliography

A Publications
A.1 Journal Articles
A.2 Conference Presentations

B Glossary

C Methodology
C.1 Overview
C.2 Homology Modeling
C.2.1 Automated Sequence Alignment
C.2.2 Manual Alignment
C.2.3 N-Domain Superposition
C.2.4 Homology Model Building
C.2.5 Homology Model Optimization
C.3 System Preparation
C.3.1 Membrane Creation
C.3.2 Topology and Forcefield Files
C.3.3 Membrane Energy Minimization
C.3.4 Protein Orientation
C.3.5 Protein Centering
C.3.6 Protein Protonation
C.3.7 Protein Position Restraints
C.3.8 Protein Topology
C.3.9 System Concatenation
C.3.10 System Topology Creation
C.4 Membrane Hole Creation
C.4.1 System Box Adjustment
C.5 System Solvation with Water
C.5.1 Charge-Neutral System Created by Ion Addition
C.5.2 System Topology Update
C.5.3 New Index File Creation
C.6 Full System Energy Minimization
C.7 Position Restrained MD
C.8 Fully Unrestrained MD
C.8.1 MD Run Extension
C.8.2 Permeant Ion Simulations
C.9 CAVER Pathway Analysis
C.10 Simulations and Electrostatic Analysis
C.10.1 Applied Electric Field Simulation
C.10.2 Particle Mesh Ewald (PME) Calculation
C.10.3 DX File Averaging via OpenDX

List of Tables

3.1 Sequence Numbering Offset
4.1 Overview of SERCA Simulations
5.1 Overview of Na+,K+-ATPase Simulations
5.2 Residues Coordinating the Na+ Ion at Site I
5.3 Residues Coordinating the Na+ Ion at Site II
5.4 Residues Coordinating the Na+ Ion at Site III
5.5 Residues Involved in Coordinating the K+ Ion at Site I
5.6 Residues Involved in Coordinating the K+ Ion at Site II

List of Figures

1.1 Research Focus
1.2 Devices Beyond Silicon
1.3 Simulation System Size and Complexity
1.4 Post-Albers Diagram of the Forward Pumping Cycle of Na+,K+-ATPase
1.5 Main Structural Conformations of the P-type ATPases
1.6 Methodology Flow Chart
2.1 Molecular Dynamics Algorithm
2.2 Illustration of Periodic Boundary Conditions (PBC)
3.1 Sequence Alignment of Na+,K+-ATPase and SERCA
3.2 Homology Modeling Process
3.3 Lipid Membrane Bilayer Equilibration
3.4 Area per Lipid Equilibration
3.5 RMSD of Protein During Simulations
3.6 Comparison of Na+,K+-ATPase Homology Model to Structure
4.1 RMSD Showing SERCA Stability During Simulations
4.2 Ca2+ Trajectory and CAVER Analysis of Intracellular Ion Pathway in SERCA
4.3 CAVER Analysis of Lumenal Pathway in E2 Form of SERCA
4.4 CAVER Analysis of Lumenal Pathway in E2P Form of SERCA
4.5 Ca2+ Movement from Binding Site II
4.6 Ca2+ Occlusion at Site I
5.1 Na+ Binding Sites I and II
5.2 Na+ Site I Coordination
5.3 Na+ Site II Coordination
5.4 Solvated E2 Intracellular Pathway
5.5 E2 Conformation K+ Coordination Residues
5.6 E2P Extracellular Pathways
5.7 Electrostatic Landscape of E1 Intracellular Pathway
5.8 Electrostatic Cross-Sections of E1 Conformation
5.9 Electrostatic Cross-Sections of E2P Conformation

Chapter 1

Introduction

1.1 Scope of Work

This dissertation will use computer simulations to investigate the structure-function relationship of the P-type ATPases, a class of biological transmembrane proteins. These proteins perform a vital physiological role and are ubiquitous in higher-level organisms, including humans [1, 2]. Besides their obvious biomedical importance (pharmaceuticals), the study of these proteins may have important nanotechnological implications in areas such as electrochemical energy conversion and sensing [3, 4]. A wealth of experimental literature has explained much about the biochemical functions of these proteins [5, 6, 7]. However, many atomic-level details of the structure-function relationship remain unresolved [8], and without them this class of proteins cannot be fully applied in bio-nano molecular sensors, devices, and biomimetic materials.

Only in recent years have computational tools, such as the molecular dynamic simulations used here, begun to be applied to the investigation of biological proteins [9, 10, 11, 12]. Aside from eliminating the need for very demanding biophysical studies, such computational approaches bridge structural and functional studies in an efficient manner that is not possible with experimentation. Accordingly, I have developed and deployed in this work a novel and complete simulation methodology to study P-type ATPases in general and the Na+,K+-ATPase and SERCA in particular. Figure 1.1 shows the focus of this research. To meet this challenge, new structural models have been created and incorporated into a biologically accurate environment in order to perform molecular dynamic simulations. The simulation analysis is then compared with experimental results from the literature, and future research directions are discussed.

The work presented here covers a variety of fields. It is an electrical engineering approach to a biological problem which itself has key electrical properties. This interdisciplinary research aspect is central to the development of nanotechnology [13]. Researchers from many fields have turned towards the investigation of biological problems due to the challenges this field presents and the advances that a better understanding of biological processes may bring about, not only for biology but for nanotechnology as well.

1.2 Motivation: Biomimetic Devices

Biomimetic devices are those which are based on biological structures or derived from synthetic processes [14]. Interdisciplinary research has become a prerequisite for the development of novel, integrated solutions for a growing number of complex problems in pure science and applied technology. The convergence of biology and nanotechnology is particularly promising, especially from an engineering perspective. The realm of nanotechnologies inspired by biological designs or containing biological devices is vast [3, 4, 15, 16]. Although the notion of designing and creating structures and devices with atomic- and molecular-level precision was spawned years ago [17], only in recent years has this field become a reality and begun to produce feasible technologies [4].

Figure 1.1: The narrowing focus of this research, starting from the wide field of biomimetic device modeling. Molecular dynamic simulations are a tool which may be used to investigate proteins that reside in cells’ lipid bilayer membranes. The methodology in this work can be applied to a wide variety of ATPase proteins and to the Na+,K+-ATPase in particular.

1.2.1 Beyond Silicon

The proliferation of semiconductor-based devices has driven, and been driven by, the exponential increase in the processing power of electronic circuits. These advances are largely due to decreases in device size. Current lithography techniques, the backbone of present-day semiconductor device fabrication, have continued to push the limits of scale and can now produce devices with feature sizes of tens of nanometers.

However, for some time the ability of these technologies to continue to produce such gains has been in question [18]. The production of electronic devices faces barriers that are not only more numerous but also more difficult to overcome. Technological hurdles, such as the quest to replace SiO2 with a new high-k gate dielectric, are exacerbated by stringent fabrication methods. The cost of constructing a state-of-the-art semiconductor device fabrication facility is currently above five billion dollars. Furthermore, the 2007 International Technology Roadmap for Semiconductors states that manufacturable solutions for technologies required in 2008 are not yet known [19]. As many times as these limits have been overcome, this process cannot continue ad infinitum. Inherent atomic heterogeneities such as dopant fluctuation and surface roughness arise from the limits of process control. Stochastic processes involved in top-down manufacturing approaches will continue to degrade device performance and reliability as devices are scaled smaller [20]. Power densities of current microchips are approaching that of rocket nozzles; power consumption and the resultant heat dissipation present two of the most significant challenges to the current development of high performance processors [21]. Because of the stature of silicon devices, it is possible that novel materials and devices will first be incorporated into silicon technology to extend its capabilities but may ultimately replace it in some applications.

Figure 1.2: Overview of the diversity of devices being investigated to replace conventional CMOS technology, arranged by inverse device size (1/size) against year (2010 to 2030). The aforementioned difficulties in silicon CMOS transistor technology have led researchers to consider post-CMOS options which include a variety of device types. Such options include ‘evolutionary’ solid-state devices such as DG-MOSFETs, Resonant Tunneling MOSFETs (RTD), and Single-Electron Transistors (SET) in the next 10 years, followed by more ‘revolutionary’ technologies based on soft/organic media such as molecules, proteins and self-assembled nanostructures. Our work therefore relates to this ‘revolutionary’ stage, where dramatically new possibilities will be considered. Of course, the exact timing and options for this stage are still unclear. Image attribution: [22].

1.2.2 Advantages

Biomolecular devices exhibit many advantages that make their use appealing in engineered devices. Their size is generally in the range of 1-20 nm, which is as small as current critical dimensions in semiconductor processing [19]. They are self-assembled, which allows for identical reproduction while requiring only a particular starting reaction. This self-assembly occurs continually in cells as ribosomes read mRNA and create proteins from constituent amino acids. A bottom-up approach allows exceedingly small biomolecular devices to be placed precisely, in contrast to the difficulties involved in top-down fabrication of electronic device features [23]. Biomolecular devices also show manufacturing promise because they consist of benign materials and can be produced relatively cost-effectively [24].

1.2.3 Challenges

Many challenges exist before biological components could be incorporated into bio-nano devices [24]. For instance, large protein structural changes are orders of magnitude slower than electron transfer in solid state devices. How can biological devices be utilized or modified so that this type of speed disadvantage can be overcome? An acceptable infrastructure would need to organize and store devices, as well as provide a method of communication to the macroscopic realm. Will these structures be biological or artificial in nature? My work deals with yet another question: How do the structural and functional characteristics of a protein relate to each other? Detailed knowledge of the structure-function relationship of ion-motive ATPases may allow for their inclusion in the construction of nanoscale systems such as those mentioned below.

1.2.4 Molecular Devices: ATPase Proteins

Aside from their importance in biomedical research, P-type ATPases constitute tested blueprints for molecular engines. Development of these devices may have a significant impact on applications through use as molecular scale energy converters, electro-chemical gradient controllers, multi-state logic devices, and specialized sensors [4, 25, 26]. These capabilities have led researchers to begin to explore a vast array of alternative methods for inspiration and construction of a new generation of devices [24]. One research direction has been to incorporate biomolecular elements into conventional device designs, thereby producing novel applications through the combination of components seemingly standard to their respective environments. Additionally, the most novel designs employ biomolecular entities for uses other than those for which they were designed. This approach is facilitated by the fact that many types of complicated macro devices have molecular level counterparts, such as motors, sensors, and scaffolding [24]. Although many ‘blue-sky’ biomolecular devices are years away, some commercial products incorporating proteins are already available. A light-activated protein pump, bacteriorhodopsin, has been used to create a viable memory device [27]. The P-type ATPases are very appealing devices from this type of engineering standpoint. The ability to determine (read) and manipulate (write) the protein’s state, together with a signaling framework (ionic solutions) and structural basis (lipid membrane), are the essential constituents of a memory device. P-type ATPases exist in vivo in two main conformational states and could be used as memory devices. Other researchers are investigating biological molecules such as chlorophyll [26], which may be able to accommodate four or more distinct structural states, giving a distinct information density advantage over conventional binary memory elements.

The Na+,K+-ATPase is, at its core, a voltage dependent charge translocation device. Rather than being encased and immobile in a semiconductor, this device operates in an aqueous environment using a lipid bilayer membrane for structural support. The Na+,K+-ATPase undergoes large conformational changes that affect the electrostatic landscape within and around the protein. Charge movement in traditional transistors is a product of the electrostatic landscape and is controlled by applied voltage biases. In the Na+,K+-ATPase, concerted protein conformational changes facilitate charge movement in the form of cations. The Na+,K+-ATPase therefore has intriguing functionality that is based on a time-dependent architecture. This correlation between the structural and functional characteristics of the protein comprises the basis of determining how the Na+,K+-ATPase operates.

1.2.5 Structure-Function

A goal of Na+,K+-ATPase research is the detailed description of structural changes that occur during the transport cycle of ATPases and the relationship of those changes with the function of ion transport. Despite a general appreciation of the Na+,K+-ATPase’s electro-physiological response on long time scales (>1 ms), relatively little is known about its atomic structure-function relationship and how ions are selectively transported on short time scales [28]. A detailed understanding of protein function at the atomic level, therefore, is not only important for pure science and biomedical research but also imperative for engineering applications to benefit from the Na+,K+-ATPase and related proteins. Although atomic resolution crystal structures provide essential structural information, the detailed physical-chemical mechanism of the conformational changes that result in ion transport remains to be elucidated. Specifically, researchers want to determine in what ways a particular amino acid affects each step of the protein’s operation.

1.2.6 Biomedical Relevance

Transmembrane proteins play a central role in cell physiology. Jens Christian Skou received the 1997 Nobel Prize in Chemistry “for the first discovery of an ion-transporting enzyme, Na+,K+-ATPase” [29]. The Na+,K+-ATPase enables specific ionic exchanges to occur and modulate the transmembrane potential. Biochemical studies and electrophysiological techniques have provided nearly all that is known about the structure-function relationship of the Na+,K+-ATPase [5]. The study of the Na+,K+-ATPase and related proteins has far-reaching medical implications. Maintenance of the electrochemical gradients of Na+ and K+ is essential for electrical signaling. For example, minor dysfunction of the Na+,K+-ATPase has long been known to result in cardiac arrhythmia and more recently has been implicated in some migraine headaches [30]. A highly homologous P-type ATPase, the H+,K+-ATPase, is currently under study because of its role in allowing parietal cells to move protons to create hydrochloric acid in the stomach [31]. The H+,K+-ATPase research has a direct pharmaceutical application: producing drugs that allow for better control of the inhibition of this protein, leading to an antacid effect that can help treat peptic ulcers [32]. The study of the Na+,K+-ATPase is crucial for a better understanding of the protein’s operation and its role in cellular physiology.

1.2.7 Computational Angle

This research will use high performance computational methods to investigate the structure-function relationship of P-type ATPases. The core of this work, molecular dynamic simulation in silico, can provide details on atomic interactions that are difficult or impossible to study via experiment. Similar simulations have achieved significant successes with other membrane proteins [11]. Simulation of transmembrane protein systems has only been feasible in the last few years. Simulation methodologies have been under continual refinement to streamline this process and provide advanced analysis tools. Additionally, high performance computational (HPC) resources, namely processing power, have become widely available in recent years, most notably due to widespread adoption of parallel computing systems, which are relatively inexpensive compared with previous vector-based computing systems. This advance is directly responsible for increasingly large systems and longer timescale simulations [33]; the complexity of one of the simulation systems is shown in Figure 1.3. Despite this progress, significant challenges remain. Modeling methods are necessary to study proteins for which a complete structural description is not available. Even with the processing power now available, technological boundaries in the coming years will continue to require novel approaches to simulation and analysis [34]. For instance, software tools are significantly behind those available for semiconductor device studies. This work seeks to clarify the methodology behind the creation of in silico biological systems suitable for simulation and analysis.

Figure 1.3: This cutaway image displays the level of complexity of a membrane protein and lipid bilayer system comprised of 200,000 atoms. The protein’s backbone structure is shown in colored cartoon representation. Colors correspond to those described in Figure 1.5. The protein is surrounded by the lipid bilayer membrane (grey spheres) and water (blue surface). The thickness of the membrane is about 4 nm.

1.3 The Na+,K+-ATPase

1.3.1 Biophysical Background

Transmembrane proteins account for 20-30% of a typical organism’s genome, which is a manifestation of the proteins’ importance and variety. These essential proteins allow for many types of cellular regulation and can be controlled in a variety of different manners, such as by voltage, pH, or physical stress. There are two major classes of proteins that control the movement of ions across biological membranes: channels/porins and transporters. Channels and porins allow the passive, and in most cases selective, movement of ions and other molecules down the electrochemical gradient across the membrane. Ion channels, such as the KcsA K+ channel, typically have permeation rates on the order of 10^6 ions per second. In 1998, the crystallization of KcsA (the first potassium channel to be crystallized) provided an atomic-level structure yielding detailed information regarding ion permeation and gating [35]. The channel is gated by small protein changes that either block the permeation pathway or open it to allow for potassium-selective movement straight through the cell’s membrane. This allows for the very high permeation rates. Transporters, on the other hand, actively power uphill movements. The Na+,K+-ATPase must work to move Na+ and K+ ions against their electrochemical gradients. This process involves large conformational changes that are fueled by the hydrolysis of adenosine triphosphate (ATP), the standard energy currency of the cell. ATP hydrolysis and binding of cations, facilitated by large, slow conformational changes, allow the Na+,K+-ATPase to function in vivo at a rate of around 100 Hz. A simplified diagram of the Na+,K+-ATPase cycle is shown in Figure 1.4.

Figure 1.4: Post-Albers diagram of the forward pumping cycle of Na+,K+-ATPase. One main class of conformational states (E1) allows ion access from the cytoplasm, while the other main group of states (E2) allows ion access from the extracellular side. Starting from the upper left of the figure, uptake of three (denoted by the subscript) Na+ ions closes the cytoplasmic pathway, leading to an occluded state. Next, the extracellular pathway opens and the Na+ is released. K+ is taken up, which dephosphorylates the protein and subsequently occludes the K+. A new ATP molecule binds to the protein and K+ is released to the cytoplasm. This process achieves “ping-pong” translocation of ions across the lipid membrane.
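The cycle in Figure 1.4 can be read as a small state machine. The following Python sketch is illustrative only and is not part of the simulation methodology of this work: the state labels are taken from the figure, while the ordering of the steps, the counter names, and the function `run_cycles` are assumptions made for the example, based on the sequence described in the caption.

```python
from collections import Counter

# Each step: (state reached, change in the running tallies on reaching it).
# "in" = cytoplasm, "out" = extracellular fluid. The step order is an
# assumed reading of the Figure 1.4 caption, not output of any simulation.
POST_ALBERS_STEPS = [
    ("Na3-E1-ATP",  {"na_in": -3}),           # three Na+ bind from the cytoplasm
    ("(Na3)-E1-P",  {"atp_hydrolyzed": 1}),   # phosphorylation; Na+ occluded
    ("P-E2-Na3",    {}),                      # extracellular pathway opens
    ("P-E2",        {"na_out": 3}),           # Na+ released outside the cell
    ("P-E2-K2",     {"k_out": -2}),           # two K+ bind from outside
    ("E2-(K2)",     {}),                      # dephosphorylation; K+ occluded
    ("ATP-E2-(K2)", {}),                      # a new ATP molecule binds
    ("K2-E1-ATP",   {}),                      # cytoplasmic pathway opens
    ("E1-ATP",      {"k_in": 2}),             # K+ released to the cytoplasm
]


def run_cycles(n_cycles):
    """Walk the cycle n_cycles times; return net tallies and visited states."""
    totals = Counter()
    visited = ["E1-ATP"]                      # assumed starting state
    for _ in range(n_cycles):
        for state, delta in POST_ALBERS_STEPS:
            totals.update(delta)              # Counter.update adds the deltas
            visited.append(state)
    return totals, visited


totals, visited = run_cycles(1)
net_outward_charge = totals["na_out"] - totals["k_in"]  # in elementary charges
print(dict(totals), "net outward charge per cycle:", net_outward_charge)
```

Under these assumptions, one pass through the cycle returns the pump to its starting state having exported three Na+ ions, imported two K+ ions, and hydrolyzed one ATP, for a net outward movement of one positive charge.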

The Na+,K+-ATPase is classified as a type IIC P-type ATPase. The “P-type” moniker indicates that the proteins of this class are phosphorylated during operation. Other types of P-type ATPases include SERCA in the IIA class and heavy metal transporting ATPases in the IB class. The IIC class is widely distributed and is present in the surface membrane of nearly all animal cells. The Na+,K+-ATPase is essential for life as it maintains a transmembrane voltage and helps to regulate osmotic balance. The Na+ gradient it creates is used to power Na+-assisted cotransport of other molecules across the membrane. Even though electrophysiological and biochemical studies have contributed a great deal of information on the sodium pump’s function in the last century, only recently has a molecular-level understanding begun to emerge [2]. Still, many structure-function aspects, crucial to the full understanding of this pump, remain elusive [28]. These include the protein residues that create the ion binding sites, the extent of protein conformational changes, the residues involved in the creation of ion permeation pathways, and the location and nature of gating processes. This research’s focus will be on the ion binding sites and permeation pathways that allow for ion translocation.

1.3.2 Physiology

Adenosine triphosphate, or ATP, serves as the “molecular currency” of intracellular energy processes. P-type ATPases hydrolyze intracellular ATP, become phosphorylated (hence the name P-type) and change their structure at various stages of the transport cycle in order to accomplish their function. Biophysical experiments have determined that the Na+,K+-ATPase uses the Gibbs free energy derived from breaking the last phosphate bond of ATP (hydrolysis) to exchange three intracellular Na+ ions for two extracellular K+ ions, a reversible process [36]. The Na+ concentration of the cytoplasm is an order of magnitude lower than that of the extracellular fluid, while the reverse is true for K+. Cleavage of the phosphate bond provides the energy both for the movement of these ions against a strong electrochemical gradient and for the protein conformational changes that allow this ionic exchange to occur. The Na+,K+-ATPase facilitates maintenance of the Na+ and K+ concentration gradients by compensating for ion leakage across the membrane. These concentration gradients drive secondary processes, such as action potentials and the transport of other ions and metabolites related to other cellular processes.

In electrophysiological experiments, ion concentrations and the ATP/ADP ratio can be controlled to study the rate of the reaction, which takes place on a millisecond time scale, almost three orders of magnitude slower than permeation through ion channels. This difference in time scale arises because a cycle of conformational changes is required to translocate just a few ions uphill via a pump, whereas millions of ions per second can passively permeate through an open ion channel. The most likely mechanism of active transport is an “alternating access” model in which the transported ions are first bound at one face of the membrane, then “occluded” in a state in which the ions are not directly accessible to either side, and finally released to the opposite side of the membrane. This process is a consequence of changes in protein conformation and changes in the ion binding affinity of the transport sites [37]. The two major conformations, E1 and E2, as shown in Figure 1.5, allow ion access to the intracellular and extracellular sides of the protein, respectively. A detailed understanding of the mechanism of ion translocation will require knowledge of the structure of the various intermediate states of the enzyme and the kinetics of the transitions between these states.

Figure 1.5: Depiction of the large movement of the three intracellular domains and reorientation of the TM helices (light blue) between conformation E1, which allows ion access to the intracellular side of the membrane, and E2, which allows ion access to the extracellular side. Question marks indicate the unknown nature of ion binding sites (ovals) and ion permeation pathways (arrows). The transmembrane helices are shown in light blue, N domain in red, P domain in dark blue, and A domain in green. The thickness of the membrane is about 4 nm.

1.4 Objectives of this Research

X-ray crystallography has provided structures which give a depiction of a protein frozen at a particular point in its physiological cycle. On the other hand, physiological studies yield functional characteristics of the protein in question. While these and other experimental methods have produced nearly all that is known about the Na+,K+-ATPase and related proteins, one common disadvantage of these approaches is that they are generally difficult and time consuming. A particular type of experiment sequentially mutates protein amino acids to determine the functional effect on the protein. This simulation work will aid this type of study by identifying specific amino acids which may play a role in ion binding or permeation. Data from structural and functional studies provide a basis for computational work. Methods such as molecular dynamic simulations show promise for investigating similar characteristics of these proteins with a smaller investment of time, effort, and resources. Experimental data can also provide a foundation with which to substantiate claims made through the use of computational methods.

Objectives of this research fall into two main categories. First, by utilizing molecular dynamic simulations of atomic-level Na+,K+-ATPase structures in an appropriate membrane/water environment, I will show that this computational approach can corroborate experimental results. Specifically, ion interactions with specific protein amino acids can be determined and compared with results from structural studies, homology models, and mutagenesis experiments. This type of analysis will provide further detail regarding the nature of the ion binding sites and ion permeation pathways (see Figure 1.5). It is generally agreed that in the Na+,K+-ATPase the first two Na+ ion binding sites and both of the K+ ion binding sites are analogous to the two Ca2+ ion binding sites in SERCA. However, the binding site of the third Na+ ion has been a longstanding topic of discussion [38] and is a key element that this research will address. The ion permeation pathways, especially the intracellular pathway, are not well defined. The nature of the ion binding sites and ion permeation pathways constitutes the practical problem addressed in this study and can be investigated with the proposed computational methodology.

Specific research objectives are listed below:

• Create Na+,K+-ATPase homology models based on SERCA with incorporation of available functional and structural data

• Simulate stable, biologically accurate systems of Na+,K+-ATPase in a membrane and water environment

• Determine, as shown by simulation, amino acids related to Na+ and K+ ion binding by observation of molecular dynamic trajectories

• Determine, as shown by simulation, putative ion permeation pathways using a combination of electrostatic and steric analysis on molecular dynamic trajectories

• Provide details of computational methodology for future and related work

The second main objective relates to furthering the state of the available computational methodologies. Though they offer many advantages, computational tools and methodologies as they relate to biophysical engineering studies are in a period of dramatic growth and are still undergoing significant advancements. Therefore, the second goal of my research will be to advance the proposed method of membrane protein analysis to allow for a stronger relationship between the tandem processes of experimentation and simulation. A detailed description of the modeling approaches and software tools used in this work will be provided. This description will include a discussion of the commands used for the initial stages of homology model construction, followed by system creation, simulation, and analysis. An emphasis is placed on how this research on the Na+,K+-ATPase may easily be extended to the study of related proteins.

In summary, my research will show how computational work may supplement experimental research. This will be accomplished via investigation of biophysical properties of the Na+,K+-ATPase and SERCA proteins.

1.5 Contributions of this Research

There are several specific contributions that my research will make. A sequence alignment will be created based on experimental evidence from the Na+,K+-ATPase and SERCA. From this, Na+,K+-ATPase homology models will be created in multiple physiological conformations that have not been studied elsewhere. This will be the first work to perform long time-scale molecular dynamic simulations on SERCA and Na+,K+-ATPase homology models which have been incorporated into an accurate lipid bilayer membrane. This is also the first extensive use of electrostatic calculations and steric pathway analysis to investigate these P-type ATPases. Finally, I will present a detailed computational methodology which can be used for examination of a wide variety of lipid membrane proteins, especially those with ionic transport properties. An overview of this methodology is given in Figure 1.6. Appendix A contains a list of publications and professional presentations that are derived from this work.

Figure 1.6: Overview of the methodology shows that the homology modeling process is followed by membrane and protein system preparation. After system equilibration, molecular dynamic trajectories are analyzed with valence, electrostatic and molecular surface tools to investigate protein and ion characteristics.

1.6 Chapter Summaries

Chapter 1 presented background information on the need to look beyond traditional electrical engineering approaches. This field has produced a consistent string of evolutionary advances, but there is growing evidence that new, revolutionary designs will be required in the future to allow continued exponential growth. The Na+,K+-ATPase has been studied for its importance to cellular physiology, but it also shows promise in its ability to function as a component in engineered designs. However, the atomistic operations of the Na+,K+-ATPase and related proteins are not understood nearly as well as current semiconductor designs. This is a byproduct of the macromolecule’s complexity in space and time. Clarification of the protein’s structure-function relationship will be useful to both fields. Finally, the physiology of the Na+,K+-ATPase as it relates to this work was introduced.

Theory and application of molecular dynamic simulations are discussed in Chapter 2. An overview of the history, abilities, and limitations of molecular dynamics is followed by discussion of the basic algorithms in the software. Aspects of the simulations are described in detail, such as the force field that models atomic interactions and the molecular topologies which describe, for instance, bonding structures.

Chapter 3 discusses the creation of homology models of the Na+,K+-ATPase in several key physiological conformations. The premise behind homology modeling is introduced and the genetic sequence alignment with a related protein template is described. The alignment is significantly improved over previous models due to inclusion of various experimental results and Na+,K+-ATPase and related H+,K+-ATPase sequences from multiple species. The structural basis for the model is strengthened using available Na+,K+-ATPase structural data. The chapter discusses the formation of a lipid bilayer membrane and insertion of the protein into the membrane. Initial steps of bringing this system into equilibration are presented. System setup and equilibration information is given for Na+,K+-ATPase simulations. Details regarding the SERCA simulations are given in Chapter 4. Finally, the recently released Na+,K+-ATPase X-ray crystallographic structure is compared with the homology model in the same conformation.

Chapter 4 is the first of two result-oriented chapters. As a stepping stone during this research, molecular dynamic simulations and subsequent analysis are performed on SERCA, which is a Ca2+-ATPase that is very similar to the Na+,K+-ATPase in sequence, structure, and function. This approach allows significant improvements to be made in the simulation and analysis methodologies without involvement of the homology modeling process. The SERCA chapter describes published research of the author with others that investigates ion binding locations and ion permeation pathways using a variety of methods. The chapter shows that the methodology developed is able to corroborate existing experimental data as well as to suggest protein amino acids that may play a role in ion permeation.

The Na+,K+-ATPase’s operation is the focus of my research and Chapter 5 builds on the simulation and analysis methodology explored in the previous chapter. The Na+,K+-ATPase simulations refined the techniques explored during execution and analysis of SERCA simulations. Simulations study the Na+,K+-ATPase homology models in three conformational states. Protein amino acids involved in binding sites are investigated as well as pathways that may be involved in ion translocation.

Chapter 6 summarizes the work accomplished and its contributions to the field, and discusses future research directions.

1.6.1 Glossary and Appendices

The interdisciplinary nature of this work necessitates a glossary (Appendix B) of technical methods, field-specific terms, and abbreviations. Appendix A contains a list of publications and professional presentations that are derived from this work.

Last, an extensive appendix (Appendix C) has been compiled to serve as a virtual tutorial. The goal of including these technical details is to aid other researchers embarking on related research. Those with a biology background may be unfamiliar with computational approaches and those with an engineering background may have less experience in dealing with biological aspects of the problem. The appendix helps make sense of the variety of software packages, some of which have a steep learning curve. It is hoped that a complete, detailed account of the steps taken from homology modeling through analysis methods will streamline related work by future researchers.

Chapter 2

Molecular Dynamics

2.1 Introduction

This chapter discusses the theory and application of molecular dynamic (MD) simulations. An overview of MD simulation is given, followed by simulation details, such as the force field which models atomic interactions and the topologies which describe the nature of the molecules’ atoms and bonds. MD simulations use computer software to describe the interaction of atoms under known laws of physics in a finite system over time. Originally developed for the study of liquids in the 1960s [39, 40], current versions have been extended for materials science and the study of biomolecules.

Newton’s equations of motion are numerically solved with forces given by negative derivatives of a potential function. The core loop of the simulations calculates the force on each atom and updates the atom’s position and velocity for each subsequent step.

The following discussion is meant to be an introduction to general molecular dynamic simulations. This is a broad topic, so only methods and algorithms specific to the software package used in this work, GROMACS, are discussed, although the premise for other software packages is generally similar [41, 42, 43]. Parameters are discussed as independent variables whenever possible. However, if a particular value for a parameter is used throughout the work, that value is given in the text. Most of these values have widespread acceptance in the modeling literature, and values that differ from those commonly used are discussed. Details of the methodology including specific simulation steps and parameters are discussed in Appendix C.

2.2 Molecular Dynamics Algorithm

An overview of the basic MD algorithm is presented in Figure 2.1 [44]. Given a starting configuration consisting of atom positions and velocities, the simulation software alternates between calculating the forces on each atom at each time step and updating the atoms’ positions due to those forces. The algorithm repeats for the required number of steps. First, the force on each atom is calculated as the sum of the non-bonded pair interactions, bonded interactions, and restraining and/or external forces. Temperature and pressure coupling algorithms are performed, then velocities are scaled according to the temperature. Atomic positions are updated by numerically solving Newton’s equations of motion using a leap-frog algorithm. After atom coordinates are updated based on the potential function, a linear constraint solver, LINCS [45], is applied to reset bond lengths and angles to their nominal values, which allows relatively long time steps of 2 fs. Velocities are then corrected based on the new coordinates generated by the constraint algorithm. Finally, system values such as atomic positions, velocities, energies, and temperature are written to files. When appropriate, initial velocities based on a Maxwellian distribution are generated.

MD Update Algorithm

Given:
Positions r of all atoms at time t
Velocities v of all atoms at time t - Δt/2
Accelerations F/m on all atoms at time t (forces are computed disregarding any constraints)
Total kinetic energy and virial

1. Compute scaling factors λ and μ
2. Update and scale velocities: v’ = λ(v + aΔt)
3. Compute new unconstrained coordinates: r’ = r + v’Δt
4. Apply constraint algorithm to coordinates: constrain(r’ → r’’; r)
5. Correct velocities for constraints: v = (r’’ - r)/Δt
6. Scale coordinates and box: r = μr’’; b = μb

Figure 2.1: MD algorithm adapted from [44]. The algorithm is repeated for a user-defined number of steps.
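The leap-frog update at the heart of this algorithm can be illustrated with a minimal Python sketch. It is not the GROMACS implementation: it omits the coupling factors λ and μ and the constraint step, and the one-dimensional harmonic oscillator, the function name leapfrog, and all numerical values are assumptions chosen only so that the result can be compared with the exact solution r(t) = cos(t).

```python
import math

def leapfrog(r, v_half, force, mass, dt, n_steps):
    """Leap-frog integration in one dimension.

    r       position at time t
    v_half  velocity at time t - dt/2
    force   function returning the force at a given position
    """
    trajectory = [r]
    for _ in range(n_steps):
        a = force(r) / mass        # acceleration at time t
        v_half = v_half + a * dt   # v(t + dt/2) from v(t - dt/2)
        r = r + v_half * dt        # r(t + dt) from r(t)
        trajectory.append(r)
    return trajectory, v_half

# Hypothetical test system: harmonic oscillator with k = m = 1,
# released from rest at r = 1, so the exact solution is r(t) = cos(t).
k, m, dt, n_steps = 1.0, 1.0, 0.01, 628
r0, v0 = 1.0, 0.0
# Leap-frog needs v(-dt/2); estimate it by stepping v0 back half a step.
v_start = v0 - (-k * r0 / m) * (dt / 2)
trajectory, _ = leapfrog(r0, v_start, lambda r: -k * r, m, dt, n_steps)
print(trajectory[-1], math.cos(n_steps * dt))
```

Velocities are stored at half steps and positions at whole steps, which is why the starting velocity is shifted back by Δt/2 before the loop begins.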

2.3 Topology

Topology files are required for each molecule in a simulation. The topology file describes the species of atoms in each molecule and their interactions, such as bonds, angles, and dihedrals, that must be maintained throughout the simulation. Molecules are particles with defined interaction functions. Three types of molecules are defined in this study: the protein of interest, lipid molecules, and water. Though not technically molecules, ions are handled by the software in a similar manner. GROMACS was initially designed to study proteins in solution [46]. Therefore, water, ions, and protein amino acids have standard force fields within GROMACS. Incorporation of lipid membranes requires additional molecular topology files and force field interactions as described in Appendix C.

2.4 Force Field

Force fields comprise a set of potential functions and parameters for interactions [44]. Potential functions represent specific types of interactions and restraints for each type of atomic species included in the simulation system. Potential functions in GROMACS are divided into three types: bonded, nonbonded, and restraint [46, 41]. Bonded interaction types include bond lengths, bond angles, and dihedral angles. These are known as bonded because they represent interactions between atoms held together by covalent bonds. Fixed lists of these interactions are given in the topology file for each type of molecule. Bonds and angles are modeled as harmonic oscillators, while dihedrals are modeled as cosine expansions. Potentials of the nonbonded variety are described by a Lennard-Jones potential and a Coulomb interaction. The Lennard-Jones potential is used to model Van der Waals forces. The model accounts for a repulsive effect at close range due to the Pauli exclusion principle and an attractive effect at longer distances due to transitory dipoles, also known as London dispersion forces. The Lennard-Jones approximation has widespread usage due to its agreement with empirical values, ease of implementation, and computational efficiency. Interactions between charged, nonbonded atoms are computed with real and reciprocal space summations. Atoms within a given cutoff radius are handled in real space with Coulomb’s Law. The complexity of this calculation is O(N²), which means that the computational requirements increase as the square of the number of particles, so this approach is not feasible for all particle interactions. The particle-mesh-Ewald (PME) summation [47] is used for Coulombic interactions longer than the cutoff. This method is significantly faster because it has a computational complexity of order O(N log N). A uniform three-dimensional grid is created over the simulation space and each atomic charge is discretized over its 64 closest grid points (4 x 4 x 4). Discretization of the charges enables fast Fourier transforms to move charges to reciprocal space. The charges are summed quickly via convolution and converted back to real space to yield the potential caused by long range electrostatic interactions.
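As an illustration of the short-range nonbonded terms described above, the following Python sketch evaluates the 12-6 Lennard-Jones form and Coulomb’s Law for a single atom pair. The function names and the σ and ε values are hypothetical and are not taken from any force field used in this work; the conversion factor is the approximate value of 1/(4πε0) in the kJ/mol, nm, and elementary charge units that GROMACS uses.

```python
import math

# Approximate electric conversion factor 1/(4*pi*eps0) in kJ mol^-1 nm e^-2
COULOMB_CONST = 138.935

def lennard_jones(r, sigma, epsilon):
    """12-6 Lennard-Jones energy: repulsive r^-12 term plus attractive r^-6 term."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def coulomb(r, q1, q2, eps_r=1.0):
    """Real-space Coulomb energy between charges q1 and q2 (units of e) at distance r (nm)."""
    return COULOMB_CONST * q1 * q2 / (eps_r * r)

# Hypothetical parameters, for illustration only
sigma, epsilon = 0.3, 0.5            # nm, kJ/mol
r_min = 2.0 ** (1.0 / 6.0) * sigma   # location of the Lennard-Jones minimum

print(lennard_jones(sigma, sigma, epsilon))   # zero crossing at r = sigma
print(lennard_jones(r_min, sigma, epsilon))   # well depth of -epsilon at r_min
print(coulomb(1.0, 1.0, -1.0))                # unit charges of opposite sign at 1 nm
```

The sketch covers only the real-space pair terms; the reciprocal-space PME contribution is not shown.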

The final category of potentials includes user-defined restraints and external forces. Restraints of this type used in my work are limited to position restraints and applied electric fields. Position restraints use a harmonic potential function to hold atoms at a particular position or within a given plane. For example, restraints are applied to specific lipid atoms to restrict these molecules to the membrane plane during equilibration stages. Applied electric fields also have the overall effect of an applied force.

The simulations use a twin-range approach for cutoff of potential functions. The cutoff for Van der Waals interactions is 1.4 nm. At this range, the error of the cutoff effect is minimal [48]. For Coulombic interactions, atom pairs less than 0.9 nm from each other use Coulomb’s Law, while more distant pairs use the PME summation. In this way, the error resulting from the cutoff of London dispersion forces (the long range component of the Lennard-Jones interaction) is minimized, and the burdensome calculation of Coulomb’s Law is limited by passing more distant pairs to the PME summation.

Determination of nonbonded neighbors within a given distance is computationally expensive. Therefore, dynamic pair-lists, updated every 10 simulation steps, keep track of nearby nonbonded atoms. Interactions between atoms within the Coulombic cutoff are calculated at every time step, but interactions of atoms whose distance is between the Coulombic cutoff and the Van der Waals cutoff are only recalculated every 10 simulation steps, and the result is reused until the next pair-list update. The premise for this optimization method is that the movement of distant charges affects the interaction substantially less than movement of nearby charges. Interaction with a neutral molecule or moiety near the cutoff distance may cause an artifact wherein part of the dipole is within the cutoff and part outside. Therefore, molecular topologies define sets of atoms with a zero or integer net charge, called charge groups. The pair-list creation step determines the interaction state of an entire charge group to avoid the artificial creation of a large charge interaction due to a cutoff.
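The twin-range bookkeeping can be sketched in Python as follows. This is a simplified illustration only: it works on individual atoms rather than charge groups, ignores periodic images, and uses a brute-force search, which is exactly the expensive step that the pair-list amortizes over 10 steps. The function name classify_pairs and the coordinates are hypothetical; the two cutoffs are the values quoted in the text.

```python
import math

R_COULOMB = 0.9   # nm, inner cutoff: pairs evaluated at every step
R_VDW = 1.4       # nm, outer cutoff: pairs re-evaluated only at pair-list updates

def classify_pairs(coords):
    """Sort atom pairs into the inner and outer ranges of a twin-range scheme."""
    short_range, long_range = [], []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            d = math.dist(coords[i], coords[j])
            if d < R_COULOMB:
                short_range.append((i, j))
            elif d < R_VDW:
                long_range.append((i, j))
            # pairs beyond R_VDW are left out of both lists
    return short_range, long_range

# Hypothetical coordinates (nm) along a line
atoms = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (1.2, 0.0, 0.0), (3.0, 0.0, 0.0)]
short_range, long_range = classify_pairs(atoms)
print(short_range, long_range)
```

In a production code the two lists would be rebuilt only at each pair-list update and reused in between.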

2.4.1 Parameterization

Parameterization is a key element of force fields. The parameters that describe potential functions are chosen such that the simulation results accurately reproduce empirical results. For example, a C-O bond and a C=O bond are both described by a harmonic bonded potential, but with different parameters to adjust the scaling of the oscillator. Although it is important to be aware of this aspect of the force field, the typical end-user does not need to modify these parameters unless defining a new molecule.
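The role of parameterization can be made concrete with a small Python sketch of a harmonic bond potential evaluated with two parameter sets. The equilibrium lengths and force constants below are illustrative placeholders, not values from the force field used in this work, and the names harmonic_bond and BOND_PARAMS are likewise hypothetical.

```python
def harmonic_bond(r, b0, kb):
    """Harmonic bond energy V(r) = 1/2 * kb * (r - b0)^2."""
    return 0.5 * kb * (r - b0) ** 2

# Illustrative parameters only (b0 in nm, kb in kJ mol^-1 nm^-2)
BOND_PARAMS = {
    "C-O": {"b0": 0.143, "kb": 3.0e5},
    "C=O": {"b0": 0.123, "kb": 5.0e5},
}

# Stretch each bond by the same 0.01 nm and compare the energy cost.
stretch = 0.01
energies = {
    name: harmonic_bond(p["b0"] + stretch, p["b0"], p["kb"])
    for name, p in BOND_PARAMS.items()
}
print(energies)
```

The functional form is shared by both bonds; only the parameters differ, so the stiffer double bond costs more energy for the same displacement.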

2.5 Periodic Boundary Conditions

Implementation of periodic boundary conditions (PBC) is a technique to minimize edge effects of the cubic, finite system and enable the use of PME electrostatics [46, 47]. The system of interest is aware of 26 images, or translated copies, of itself such that the original system is in the center of a 3 x 3 x 3 array. This construction allows interactions between atoms to be calculated with a minimum image convention.
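A minimal Python sketch of the minimum image convention for a rectangular box is given here. The function name minimum_image, the box dimensions, and the coordinates are assumptions for illustration; non-rectangular boxes are not handled.

```python
import math

def minimum_image(ri, rj, box):
    """Shortest displacement vector from ri to rj in a rectangular periodic box."""
    displacement = []
    for a, b, length in zip(ri, rj, box):
        delta = b - a
        # Shift by whole box lengths so the component lies within half a box length.
        delta -= length * round(delta / length)
        displacement.append(delta)
    return displacement

# Hypothetical 5 nm cubic box with two atoms near opposite faces
box = (5.0, 5.0, 5.0)
d = minimum_image((0.2, 2.5, 2.5), (4.9, 2.5, 2.5), box)
distance = math.sqrt(sum(c * c for c in d))
print(d, distance)
```

Although the two atoms are 4.7 nm apart inside the box, the nearest periodic image is only about 0.3 nm away, and that image is the one used for the force calculation.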

The interaction between a given atom in the nominal system and any other interacting atom is determined using the image that provides the shortest distance between the two atoms. A two-dimensional example is shown in Figure 2.2. Thus, the system behaves as if it has no boundaries. All molecules behave as if they are in an infinite system and are free to move from one side of the box to the other. In practice this only occurs with water and lipid molecules that wrap around from one side of the box to the other during the simulation. An important consideration when simulating macromolecules that occupy a large portion of the simulation box, such as the proteins studied in this work, is that a solvent molecule must not be able to interact with both sides of the macromolecule. This unrealistic situation is avoided by ensuring that the length of each box vector is larger than the length of the macromolecule plus twice the cutoff for nonbonded interactions.

Figure 2.2: Two-dimensional example of periodic boundary conditions (PBC). The grey square represents the simulation system with eight translated copies surrounding it. The atom of interest is shown in red, while the remainder of the system is represented by three atoms shown in blue. Interactions are indicated by black lines. Only the nearest image of each particle interacts and is used for force calculation, a method called the minimum image convention.

2.6 Thermodynamic Ensemble

GROMACS supports several thermodynamic ensembles. This study uses a constant pressure and temperature ensemble (NPT, since the number of atoms is also constant). Temperature is coupled to an external 310 K bath [49]. Pressure is kept at 1 atm by adjusting the system box size. By keeping pressure, rather than volume, constant, the simulation box is allowed to deform, which in turn enables the lipid bilayer to assume an equilibrated density. The lipid density is an important factor in determining whether the system has equilibrated to a sufficient degree to allow further simulations. The temperature and pressure are kept close to their nominal values through a method that weakly couples them to an external bath. Temperature coupling is required in order to keep the temperature from shifting due to numerical inaccuracies and accumulation of effects from short range interaction cutoffs.

Pressure coupling used in this study is semi-isotropic [49]. Along the z-axis, perpendicular to the lipid bilayer, pressure coupling is kept separate from the coupling along the x-y plane, parallel to the bilayer. This approach is favorable for use with lipid bilayer simulations because it allows the bilayer to adopt a lipid density during equilibration phases that approaches empirical values. Pressure and temperature coupling are implemented using Berendsen weak coupling schemes [49]. Any deviation of the system temperature from the nominal value decays exponentially back to the bath temperature with a given time constant. Atomic velocities are scaled to achieve temperature changes. A further consideration of temperature coupling is the choice of coupling groups. Simulation approximations cause small inaccuracies in the energy exchange between different parts of the system (e.g., between water and protein). Water experiences a larger effect from interaction cutoffs and will tend to heat up while the protein cools down. Water, lipid, and protein are therefore each coupled separately. Because the simulations generally have at most a few dozen ions, they are coupled with the water so that strong interactions of one or two ions do not drastically affect the group’s energy.
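The exponential relaxation produced by Berendsen temperature coupling can be sketched in Python. The velocity scaling factor below follows the standard weak-coupling expression λ = sqrt(1 + (Δt/τ)(T0/T - 1)); the time constant of 0.1 ps, the starting temperature of 330 K, and the function names are assumptions chosen for illustration, while the 310 K bath and 2 fs time step are the values quoted in the text. Only the temperature is tracked here, not individual atomic velocities.

```python
import math

def berendsen_lambda(T, T0, dt, tau):
    """Velocity scaling factor for Berendsen weak coupling to a bath at T0."""
    return math.sqrt(1.0 + (dt / tau) * (T0 / T - 1.0))

def relax_temperature(T, T0, dt, tau, n_steps):
    """Apply the scaling repeatedly; T scales with the square of the velocities."""
    for _ in range(n_steps):
        lam = berendsen_lambda(T, T0, dt, tau)
        T *= lam ** 2
    return T

# 310 K bath and 2 fs step from the text; tau and the initial T are hypothetical.
T0, dt, tau = 310.0, 0.002, 0.1     # K, ps, ps
T_final = relax_temperature(330.0, T0, dt, tau, n_steps=50)
print(T_final)
```

After 50 steps, which equals one time constant, the initial 20 K deviation has decayed to roughly 1/e of its starting value, consistent with the exponential decay described above.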

2.7 Energy Minimization

Energy minimization methods adjust atomic locations to move the system’s energy downhill to a local minimum [46]. This study used the steepest descents algorithm due to its parallelizability and robustness. The purpose of energy minimization is not to bring the system to an equilibrated state, but to remove steric clashes and relax the system such that further equilibration procedures may be accomplished. After the forces on all atoms and the potential energy are calculated, new atomic positions are calculated based on a given maximum displacement. The displacements are normalized such that the atom with the largest force acting on it will move this maximum displacement. If the new atomic configuration produces a lower potential energy, the new positions are accepted and the maximum displacement is increased slightly. If the energy is higher, the new positions are rejected and the maximum displacement is reduced. The algorithm can be run for a given number of steps or until the maximum force component becomes smaller than a specified value.
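The accept/reject logic of this steepest descents scheme can be sketched in Python on a toy two-dimensional potential. The growth and shrink factors of 1.2 and 0.2, the initial displacement, the force tolerance, and the quadratic test potential are all assumptions for illustration and are not settings taken from this work.

```python
def steepest_descent(x, potential, gradient, h=0.01, f_tol=1e-3, max_steps=2000):
    """Steepest descents with an adaptive maximum displacement h."""
    v = potential(x)
    for _ in range(max_steps):
        force = [-g for g in gradient(x)]
        f_max = max(abs(c) for c in force)
        if f_max < f_tol:               # converged: largest force component is small
            break
        # Normalize so the coordinate with the largest force moves exactly h.
        trial = [xi + h * fi / f_max for xi, fi in zip(x, force)]
        v_trial = potential(trial)
        if v_trial < v:                 # lower energy: accept and grow the step
            x, v = trial, v_trial
            h *= 1.2
        else:                           # higher energy: reject and shrink the step
            h *= 0.2
    return x, v

# Hypothetical test potential with its minimum at (1.0, -0.5)
def potential(p):
    return (p[0] - 1.0) ** 2 + 2.0 * (p[1] + 0.5) ** 2

def gradient(p):
    return [2.0 * (p[0] - 1.0), 4.0 * (p[1] + 0.5)]

x_min, v_min = steepest_descent([0.0, 0.0], potential, gradient)
print(x_min, v_min)
```

Both stopping criteria mentioned in the text appear here: a fixed maximum number of steps and a threshold on the largest force component.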

2.8 Water Model

The simulations use the popular simple-point-charge (SPC) water model [49] because of its computational efficiency. The model comprises three centers of charge: a negative charge on the oxygen atom and positive charges at the locations of the hydrogen atoms. In an average solvated system with lipid and protein, water molecules comprise about 90% of all atoms. As an aside, continuum methods such as [50] address this problem by retaining an atomically detailed protein while characterizing the water and membrane regions with a dielectric constant. These relatively new models are promising but do not provide detailed ion-water interactions, which may be crucial to the investigation of ion binding and permeation.

2.9 Lipid Model

The lipid model is based on the phospholipid 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine (POPC), which is a standard choice for similar simulations [51, 52, 9]. Each amphipathic phospholipid consists of 52 atoms that comprise a hydrophilic phosphate head and two hydrophobic tails. The topology and accompanying force field have been added to the GROMACS gmx forcefield for lipid membrane simulations [53].

Chapter 3

Modeling the Na+,K+-ATPase in a Lipid Bilayer System

3.1 Overview

This chapter discusses the creation of homology models of the Na+,K+-ATPase in several key physiological conformations and the procedure of setting up a protein-membrane-water environment in which to study the protein. These stages of the research are indicated in Figure 1.6. The premise behind homology modeling is introduced and genetic sequence alignment with the SERCA protein template structure is discussed in detail. Various experimental results and Na+,K+-ATPase and related H+,K+-ATPase sequences from multiple species were incorporated in the alignment in order to improve these models over earlier models. The model is also augmented using available Na+,K+-ATPase structural data of the intracellular N domain. The chapter discusses the software formation of a lipid bilayer membrane and insertion of the protein into the membrane. Steps required to bring the systems into equilibration are presented. Finally, the recently released Na+,K+-ATPase X-ray crystallographic structure is compared with the homology model.

3.2 Na+,K+-ATPase Structure

Understanding of the Na+,K+-ATPase’s operation would be more complete given its structure in atomic detail. A number of structural studies in recent years have begun to paint a picture of the Na+,K+-ATPase. Cryo-electron microscopy studies have provided two structures of the entire Na+,K+-ATPase α subunit in the E2 conformation [54, 55]. The resolution of these structures is on the order of 10 Å, and structures with this level of detail are very useful for describing domain-level attributes, surface-exposed loops, and large conformational changes. However, low resolution structures consist of nebulous densities corresponding to atoms; they do not have the resolution required to investigate atomic interactions and cannot be used as a basis for MD simulation. Besides the low-resolution structures, the Na+,K+-ATPase’s cytoplasmic N domain has been solved in high resolution using X-ray crystallography [56] and nuclear magnetic resonance (NMR) [57]. These structures provide a blueprint that increases confidence in the N domain of the homology models, and therefore in the rest of the model. A recent article discussed the X-ray determination of the Na+,K+-ATPase structure in one physiological conformation [58]. This structure provided considerable support for the homology modeling and system preparation process discussed in this chapter.

3.3 Homology Modeling

Comparative protein modeling, or homology modeling, is a bioinformatics technique that creates a three-dimensional model of a protein of interest (known as the target) from a homologous protein (the template) for which a high-resolution structure is available. Homology here refers to the level of similarity between the side chains of the amino acid sequences of the target and template. For instance, two differing hydrophobic residues would have a higher homology than a hydrophobic residue and an acidic residue. This approach is derived from the generally accepted viewpoint that proteins with similar amino acid sequences have, in turn, significantly similar structures. In the current work, the protein of interest is the Na+,K+-ATPase. The two predominant steps of homology modeling are alignment selection and model creation. The alignment phase determines the congruence between the target and template sequences, while the model creation phase uses the target’s sequence and the template’s structure to generate the homology model.

3.4 Choice of Template: SERCA

The first step in homology modeling is to choose the template. From the large family of P-type ATPases, the sarco(endo)plasmic reticulum Ca2+-ATPase (SERCA) is the most appropriate choice. This protein operates in a similar manner to the Na+,K+-ATPase. The major physiological difference is that it exchanges two Ca2+ ions for two or three protons during each pumping cycle. Amino acid sequence analysis and structural comparison of Na+,K+-ATPase electron microscopy data with SERCA X-ray structures imply that SERCA and the Na+,K+-ATPase share the same tertiary structure, or overall fold, in the available SERCA conformations [55]. Other P-type ATPases are believed to share the same fold as well. The amino acid sequences of SERCA and the Na+,K+-ATPase are 27% identical [38]. The transmembrane region, involved with coordinating ion binding and transport in both proteins, has a high similarity of ∼60% [59]. The proteins also share key motifs involving, for instance, ATP binding and hydrolysis. Finally, SERCA is the only P-type ATPase with multiple high-resolution structures available in a variety of physiological conformations [60].
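As an illustration of what a percent-identity figure measures, the following minimal Python sketch counts identical residues over the gap-free columns of two aligned rows. The function name and the choice of denominator (gap-free columns) are assumptions made for this example; published identity values such as the 27% above may use a different convention. The ten-column fragment is taken from the N domain rows of the alignment in Figure 3.1.

```python
def percent_identity(row_a: str, row_b: str, gap: str = "-") -> float:
    """Percent of gap-free aligned columns holding identical residues."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    # Keep only columns where neither row contains a gap character.
    columns = [(a, b) for a, b in zip(row_a, row_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Ten-column fragment from Figure 3.1 (Na+,K+-ATPase above, SERCA below).
nka_fragment = "ELGGL--GER"
serca_fragment = "EWGTGRDTLR"
print(f"{percent_identity(nka_fragment, serca_fragment):.1f}%")  # 37.5%
```

Short fragments give noisy values, so a whole-sequence identity should be computed over the complete alignment rather than over a single motif.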

3.5 Sequence Alignment

The sequence alignment between the Na+,K+-ATPase target and the SERCA template is the most important step in any homology modeling process. Once a model has been produced with a particular alignment, neither can be changed later in the modeling or simulation process. Automated sequence alignment may be successful for proteins sharing upwards of 40% identity [61], and an initial automated alignment produced with the homology modeling software Modeller aligned most key motifs.

Motifs are sequences of residues with high homology that have similar structure and function in their respective proteins. Highly conserved motifs are generally aligned by automated procedures. For example, P-type ATPases’ P domains include a phosphorylation motif, DKTGT, which contains the aspartate (D) residue that is phosphorylated with the gamma phosphate of ATP. This motif is the hallmark of all P-type ATPases. Other conserved motifs include the KGAPE nucleotide domain sequence, which creates an ATP binding pocket, and the PEGL sequence of transmembrane helix four, which forms a cation binding location.

Neither the beta subunit nor the gamma subunit was included in the homology models. These subunits are believed to play a regulatory, rather than functional, role. Additionally, no high-resolution structural data are available, making their incorporation into a model difficult, if not impossible. Only the type IIc P-type ATPases (Na+,K+-ATPase and H+,K+-ATPase but not SERCA) are found in vivo with a beta subunit.

Automated alignment procedures align two or more sequences using an approach based on imitation of genetic mutations. Each sequence is represented by a string of characters; each character denotes an amino acid residue. Gaps are introduced to indicate residue(s) that, due to mutations, have no counterpart in the other sequence(s).

Each putative alignment is assigned a fitness score that is rewarded or penalized based on the similarity of the sequences. This score is summed over each pair of aligned residues. The contribution of each pair, which reflects the change of one particular amino acid residue (known as a point mutation), is derived from empirical data that represent the probability of mutation between particular residues. For example, the alignment of two different hydrophobic residues will produce a positive contribution to the fitness score, whereas the alignment of an acidic residue with a basic residue would have a negative effect. A residue may be aligned with a gap (represented by a dash). Gaps represent genetic indels (insertions or deletions) and may be inserted in either of the sequences to accommodate the alignment of amino acid residues before or after the gap. Introduction of a gap penalizes the score because it indicates a difference between the two sequences. The algorithm determines whether the gain made by the alignment of nearby residues overcomes the gap penalty. Larger gaps are penalized more than shorter ones. Automated procedures generally provide a strong initial alignment, but in the sequence regions with lower homology it is imperative that the alignment be manually checked and adjusted to account for experimental data.
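The scoring scheme described above can be sketched with a small global alignment routine. The Python example below uses the Needleman-Wunsch dynamic programming algorithm with a linear gap penalty; it is not the procedure implemented in Modeller. The residue classes, the score values, and the gap penalty are all hypothetical stand-ins for the empirical substitution data that real tools use.

```python
# Hypothetical residue classes, for illustration only.
HYDROPHOBIC = set("AVLIMFWC")
ACIDIC = set("DE")
BASIC = set("KRH")


def pair_score(a: str, b: str) -> int:
    """Toy substitution score; real tools use empirical matrices."""
    if a == b:
        return 3
    if a in HYDROPHOBIC and b in HYDROPHOBIC:
        return 1
    if (a in ACIDIC and b in BASIC) or (a in BASIC and b in ACIDIC):
        return -3
    return -1


def global_align(s: str, t: str, gap: int = -2) -> tuple[int, str, str]:
    """Needleman-Wunsch alignment with a linear per-residue gap penalty."""
    rows, cols = len(s) + 1, len(t) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(s[i - 1], t[j - 1]),
                score[i - 1][j] + gap,   # residue of s aligned with a gap
                score[i][j - 1] + gap,   # residue of t aligned with a gap
            )
    # Trace back from the bottom-right corner to recover one best alignment.
    out_s, out_t = [], []
    i, j = len(s), len(t)
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                score[i][j] == score[i - 1][j - 1] + pair_score(s[i - 1], t[j - 1])):
            out_s.append(s[i - 1])
            out_t.append(t[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_s.append(s[i - 1])
            out_t.append("-")
            i -= 1
        else:
            out_s.append("-")
            out_t.append(t[j - 1])
            j -= 1
    return score[len(s)][len(t)], "".join(reversed(out_s)), "".join(reversed(out_t))


total, top, bottom = global_align("PEGL", "PEL")
print(total, top, bottom)  # 7 PEGL PE-L
```

With a linear penalty, a gap of n residues costs n times the single-residue penalty, so larger gaps are penalized more than shorter ones. Production tools typically use affine penalties, which charge more to open a gap than to extend it.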

There are similar Na+,K+-ATPase homology models based on SERCA [62, 59, 38]. The alignment created for my work agrees in large part with these studies but has slight changes due to additional experimental data. The alignment, shown in Figure 3.1, is similar to that of a proposed H+,K+-ATPase model [31]. Besides the human α1 Na+,K+-ATPase sequence [63] used for model creation, three additional sequences were incorporated to facilitate comparison of my alignment with this H+,K+-ATPase alignment: rat non-gastric H+,K+-ATPase (UniProt accession P54708) [64], sheep Na+,K+-ATPase (UniProt accession P04074) [65], and rabbit gastric H+,K+-ATPase (UniProt accession P27112) [66]. Automatic alignment procedures provided gap 23 as shown in Figure 3.1. One significant change from previous models was a three-residue shift in transmembrane helix (TM) 8. This changed the traditional alignment of VTIE/VVVQ (SERCA/Na+,K+-ATPase) to VTIEMCN/VSIVVVQ [67, 31]. Inclusion of the gastric H+,K+-ATPase sequence ISIEMCQ links the SERCA and Na+,K+-ATPase alignments. This detail is important due to the region’s proximity to the ion binding cavity. Besides TM8, automated alignment procedures work well for the helices TM4 and TM6, which are involved in Ca2+ binding in the SERCA structure. The lack of gaps between this pair of helices allows the TM5 alignment to fall into place.

[Figure 3.1: sequence alignment panel with residue numbering, transmembrane helices labeled M1 to M10, and gaps numbered 1 to 29]

Figure 3.1: Sequence alignment of Na+,K+-ATPase (top row) and SERCA (second row). The third row is used for the sequence of the Na+,K+-ATPase N domain crystal structure (PDB ID: 1Q3I) in brown text, the human phosphatase used to model the large P domain loop in orange text, and the “ISIEMCQ” motif that supports the choice of alignment in TM8 in grey text. Transmembrane regions, A domain, N domain, and P domain are light blue, green, red, and dark blue, respectively. Transmembrane helices are numbered (TM1, TM2, etc.) and portions within the membrane are denoted by black boxes. The entire SERCA sequence is shown; the Na+,K+-ATPase N and C termini are truncated as discussed in Section 3.5.

Other changes have been made in regions away from the binding sites to offer an improved alignment. For example, the proposed H+,K+-ATPase sequence alignment [31] was used for the large lumenal loop between TM6 and TM7. The alignment choice for TM1 and TM2 (gap 1 in Figure 3.1) was strengthened by recent mutagenesis studies on this region [68], the conserved QE sequence at the end of TM2 [31], and the predicted hinge functionality of Na+,K+-ATPase’s L57FGGF and SERCA’s F57EDLL. The helix-coil structure of FEDLL in SERCA matches the flexibility provided by the adjacent glycine residues in the Na+,K+-ATPase [31]. Automated procedures create only a single gap in the alignment because two gaps impart a large penalty on the fitness score. Alignment of the LV805/LI motif (gaps 24-25 in Figure 3.1) also leads to an alignment of an IG motif in transmembrane helix 7. This motif is conserved among all Na+,K+- and H+,K+-ATPases [31] due to its recognized function as a pivot point between TM5 and TM7 in SERCA [69]. Gaps 2-5, 21, 22, and 24-29 in Figure 3.1 are attributed to the H+,K+-ATPase alignment [31]. The Na+,K+-ATPase N domain crystal structure [56], discussed in the next section, enabled gaps 6-20 in Figure 3.1. The N domain structure is key to predicting an accurate homology model because of the low sequence homology between the Na+,K+-ATPase and SERCA in this domain [62].

Finally, 40 residues from the N-terminus and 8 residues from the C-terminus of the Na+,K+-ATPase sequence that extend beyond SERCA’s sequence were removed.

Any homology model of Na+,K+-ATPase with SERCA as a template must use this approach. The absence of structural data for these residues creates dangling chains that severely interfere with incorporation of the homology models into a lipid bilayer.

Furthermore, neither region is well conserved among type II ATPases. Although these regions have been implicated in other roles [70], there is no evidence that they are directly related to ion permeation or ion binding. For these reasons, they are no longer considered in these models. The automated alignment for the Na+,K+-ATPase was created with Modeller’s salign (structural salign) command which uses SERCA structural data to enhance the initial alignment. Throughout the alignment process, sequences from the E2P (PDB ID: 1WPG) structure were arbitrarily selected for this purpose. Structural alignment techniques offer improved alignments and subsequent improved homology models [71].

3.5.1 Sequence Offset

An accepted numbering convention does not exist to ease discussion of homologous residues among various species and isoforms. Most research focuses on the use of the Na+,K+-ATPase sequence, while the species from which the sequence comes is of lesser interest. There is an extremely high level of homology (away from the N- and C-termini) among Na+,K+-ATPase sequences across species. However, there is an especially high variability in the length of the N-terminus (where residue numbering begins) of the various P-type ATPases. Since the numbering of each species’ sequence begins at the N-terminus, this situation causes confusion when discussing a particular homologous residue. Numbering conventions used in previous homology models may even vary within the same species. To alleviate this difficulty, the offsets of the numbering systems of several pertinent Na+,K+-ATPase homology modeling articles are given in Table 3.1. The approach taken in this work is simply to define the offset between this and other research, so that there is no ambiguity when a reference is made to a particular residue. Note that numbering issues remain for comparison with sequences of lower homology, such as SERCA, where gaps in the sequence alignment make it impossible to use a single value for the offset.

Table 3.1. Sequence Numbering Offset

Model        Offset  Example  Sequence    Reference
This work    0       GLU295   human α1    NA
Rakowski     +39     GLU334   human α1    [26]
Sweadner     +34     GLU329   rat α1      [62]
Vilsen       +32     GLU327   pig α1      [58]
Gadsby       +38     GLU333   Xenopus α1  [72]
Horisberger  +39     GLU334   Bufo α1     [73]
Taniguchi    +32     GLU327   rat α1      [67]

Note. — Reference offsets are provided for comparison of particular residues between this research and other models. The majority of the offset is due to the shortening of the N-terminus of the Na+,K+-ATPase, which has no homologous residues in the SERCA template. See Section 3.5.1 for further details.
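The offsets of Table 3.1 lend themselves to a simple lookup. The short Python sketch below converts a residue label between numbering schemes; the dictionary keys and the function name are illustrative choices, and the values are copied from the table.

```python
# Offsets from Table 3.1: the number added to this work's residue index
# to obtain the index used by each reference model.
OFFSETS = {
    "This work": 0,
    "Rakowski": 39,
    "Sweadner": 34,
    "Vilsen": 32,
    "Gadsby": 38,
    "Horisberger": 39,
    "Taniguchi": 32,
}


def convert_residue(label: str, source: str, target: str) -> str:
    """Translate a residue label such as 'GLU295' between numbering schemes."""
    name = label.rstrip("0123456789")
    number = int(label[len(name):])
    # Step back to this work's numbering, then forward to the target scheme.
    converted = number - OFFSETS[source] + OFFSETS[target]
    return f"{name}{converted}"


print(convert_residue("GLU295", "This work", "Rakowski"))  # GLU334
print(convert_residue("GLU329", "Sweadner", "Vilsen"))     # GLU327
```

A single offset per model is valid only among the highly homologous Na+,K+-ATPase sequences; it does not carry over to SERCA, where alignment gaps change the offset along the sequence.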

3.5.2 Na+,K+-ATPase N-Domain

Structural data besides that of SERCA was used to improve the Na+,K+-ATPase homology model. High-resolution structures of the N domain of the rat α1 isoform of the Na+,K+-ATPase derived from nuclear magnetic resonance (NMR) (PDB ID code: 1MO7) [57] and of the porcine α2 isoform from X-ray crystallography (PDB ID code: 1Q3I) [56] were used in two ways to improve this region in the homology models. First, sequence data from the X-ray structure was used to create the alignment in this region. A significant portion of the model was therefore based on data from the Na+,K+-ATPase, rather than SERCA. Second, the N domain structures augmented the SERCA structures during the model creation by Modeller.

The N domain adopts a Rossmann fold and, based on four of the available SERCA conformations, changes negligibly over the course of the pump cycle. The N domains of three SERCA structures representing different physiological conformations (E1, E2P, and E2; PDB ID codes 1SU4, 1WPG, and 2AGV) were compared pairwise. All pairs were found to have an α-carbon RMS deviation of less than 2 Å, which is less than the experimental resolution of the SERCA structures. Therefore the Na+,K+-ATPase N domain structure can be used regardless of the template conformation. The N domain structures were superposed onto the given SERCA structures. During the model creation step, this N domain data is used rather than SERCA’s N domain data.
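The pairwise comparison can be illustrated with a short Python sketch that computes the α-carbon RMS deviation for every pair of structures. It assumes the structures have already been superposed and that atoms are matched by index; the superposition step itself is not shown. The coordinates below are hypothetical placeholders, not values from the SERCA structures.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """RMS deviation between matched, pre-superposed atom coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


def pairwise_rmsd(structures):
    """RMSD for every unordered pair of named structures."""
    names = sorted(structures)
    return {
        (a, b): ca_rmsd(structures[a], structures[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Hypothetical two-atom "domains" in angstroms, for illustration only.
domains = {
    "E1": [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)],
    "E2": [(0.0, 0.5, 0.0), (3.8, 0.0, 0.5)],
    "E2P": [(0.0, 0.0, 0.0), (3.8, 1.0, 0.0)],
}
for pair, value in pairwise_rmsd(domains).items():
    print(pair, f"{value:.2f} A")
```

The result has the same units as the input coordinates, so coordinates in Å allow direct comparison with the 2 Å threshold quoted above.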

3.5.3 Model Creation

Modeller employs the method of satisfaction of spatial restraints to construct a Na+,K+-ATPase model from the alignment and SERCA structure. Figure 3.2 shows an example of model creation from alignment and template structure. The goal of this phase of homology modeling is to determine the most probable structure for a certain sequence given its alignment with the template structure [74]. Satisfaction of spatial restraints is an iterative procedure that is initialized with distance and angle data extracted from the template structure. Distance and angle restraints are created for the model and are expressed as empirically based probability density distributions that are optimized to produce the model. Modeller’s automodel command was used to generate five models, each of which is slightly different because of varying initial structures. The energy of each model was evaluated by Modeller’s DOPE (Discrete Optimized Protein Energy) function and assigned a score. The model with the lowest energy was selected for further optimization. Once a sequence alignment has been created for the target Na+,K+-ATPase and template SERCA, it is a straightforward process to create additional models based on other available X-ray structures. The modeling process was repeated to create models based on the SERCA structures representing the E1, E2, and E2P conformations.

3.5.4 P-loop Optimization

The 20-residue loop from T601 to D620 in the Na+,K+-ATPase alignment is extremely long for ab initio structure determination [61]. Two methods were enlisted to facilitate the creation of the most accurate loop model possible. First, a BLAST search [75] from the [60] was queried with the 20-residue sequence to search for available structural data for proteins that contain similar sequences. A human phosphoserine phosphatase protein (PDB ID code: 1NNL) [76] has a similar sequence, as shown in Figure 3.1. In the 1NNL structure, the sequence takes on a helix-loop-chain characteristic. Second, the secondary structure prediction software PROF [77, 78] gave similar results when presented with the 20-residue sequence. The TVEDIAA sequence had the highest possible probability of adopting a helical conformation. A Modeller script, shown in Appendix C, was written to optimize the P-loop with this 7-residue sequence restrained to a helical secondary structure. The three neighboring residues occurring before and after the 20-residue sequence were also optimized to allow flexibility from the SERCA structure at the locations where the loop joined the rest of the protein. Twenty models were created with different perturbations of the loop segment. The top several models agreed, indicating that the twenty perturbations were sufficient. Each model was assigned a score by DOPE and the conformation with the lowest energy was chosen as the final model for simulation.

[Figure 3.2: structural image with the aligned α5 helix and β6 sheet segments]

Na+,K+  Q N A Y L E L G G L - - G E R V L G F C H L
SERCA   L S V I K E W G T G R D T L R C L A L A T R

Figure 3.2: Sequence alignment gap coupled with structure to produce homology model. A sequence alignment of the α5 helix and β6 sheet sections of the N domain of the Na+,K+-ATPase and SERCA are shown in orange (top row) and cyan (bottom row), respectively. Conserved residues are shown with a black background. SERCA has two additional residues (R-arginine and D-aspartic acid), which create a gap in this region and are shown in stick representation. These residues, along with the remainder of the SERCA structure, are shown in cartoon representation in cyan. The final Na+,K+-ATPase model is shown in cartoon representation in orange. SERCA crystallized in the E1P conformation is shown in cartoon representation in cyan.

The loop model protrudes from the P domain, which agrees with the low resolution cryoelectron microscopy structure of the Na+,K+-ATPase [55] and the more recent

X-ray crystallographic structure of the Na+,K+-ATPase [79].

3.6 Lipid Bilayer Creation

The molecular dynamics software package GROMACS [46] was used to create a lipid bilayer of 512 POPC lipids from four smaller pre-equilibrated blocks [53]. These blocks were chosen because all necessary files relating to their topology, as well as changes to the GROMACS force field, are freely available [53]. Because crystallized structures contain little or no lipid orientation information, the actual location of the protein with respect to the bilayer is not well defined. The software TMDET [80] makes use of structural and hydrophobic data from the protein and was used to determine the transmembrane orientation. TMDET’s accuracy was later supported during the fully unrestrained simulations by both a relatively constant area per lipid and the small amount of movement of the protein with respect to the membrane.

3.6.1 Hole Creation

Two different strategies were adopted to deal with the task of creating a hole in the bilayer to accommodate the protein. This step is particularly difficult with the Na+,K+-ATPase due to an asymmetrical transmembrane region which adopts a tilted orientation in the membrane.

The first method described was used for the SERCA simulations of Chapter 4, whereas the second method was used for the Na+,K+-ATPase simulations of Chapter 5. One advantage of the second method is the elimination of the molecular surface calculation, which has the benefit of reducing the total number of software programs in the methodology. More importantly, the newer method is automated in that it removes the step where overlapping lipids are removed manually. An automated methodology is more reliable and simpler to explain and share. Both methods produced similar equilibrated systems and are discussed below.

In the first method, between 12 and 20 of the most severely overlapping lipids were removed manually from each side of the bilayer for each system. Next, a GROMACS supplement [81] was used in conjunction with protein model molecular surface data [82] to force lipid atoms inside the protein outwards along the plane of the membrane towards the protein surface. This approach has been shown to produce stable membranes without affecting the lipid density [81]. Systems with lipids whose tails are in, or move into, unnatural positions are particularly prone to crash during early simulation stages. This process involved a considerable amount of trial and error to determine which lipids to remove to achieve a stable membrane.

The second lipid equilibration method (see Section C.4) stretches the centers of lipid molecules outward along the membrane plane using the protein’s center as the origin [83]. The lipid conformations themselves are not perturbed. In my work, the initial total membrane area was increased by a factor of 4. After the transformation, lipids which overlap the protein are removed to create the hole. Then, alternating steps shrink the lipid by a small factor (5%) and perform energy minimization to remove steric conflicts. The process (shown in Figure 3.3) is repeated until the area per lipid reaches POPC’s empirical value of 64 Å² [84]. While it is important to shrink the system as close as possible to the experimental area per lipid, the system will have another opportunity to achieve a more desirable area per lipid during the position-restrained phase of simulation.
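The geometric part of this procedure can be sketched in a few lines of Python. The function below moves each lipid rigidly so that the in-plane position of its center is scaled about a chosen origin, leaving the lipid conformation unchanged. The data layout (a list of atom coordinate tuples per lipid), the function name, and the use of the unweighted atom average as the lipid center are assumptions made for this sketch; the script actually used is described in Section C.4. Overlap removal and energy minimization are not shown.

```python
def scale_lipids(lipids, origin, factor):
    """Rigidly translate each lipid so its x-y center is scaled about origin.

    lipids: list of lipids, each a list of (x, y, z) atom coordinates.
    origin: (x, y) of the protein center in the membrane plane.
    factor: linear scale; 2.0 quadruples the membrane area.
    """
    ox, oy = origin
    moved = []
    for atoms in lipids:
        # Unweighted center of the lipid in the membrane plane (assumption).
        cx = sum(x for x, _, _ in atoms) / len(atoms)
        cy = sum(y for _, y, _ in atoms) / len(atoms)
        # Same shift for every atom, so internal geometry is preserved.
        dx = (factor - 1.0) * (cx - ox)
        dy = (factor - 1.0) * (cy - oy)
        moved.append([(x + dx, y + dy, z) for x, y, z in atoms])
    return moved


# Hypothetical two-atom lipid, for illustration only.
lipids = [[(1.0, 0.0, 0.0), (1.0, 0.0, -1.0)]]
expanded = scale_lipids(lipids, origin=(0.0, 0.0), factor=2.0)
print(expanded)  # [[(2.0, 0.0, 0.0), (2.0, 0.0, -1.0)]]
```

A linear factor of 2 corresponds to the fourfold increase in membrane area. The same function with a factor below 1 (for example 0.95, if the 5% is taken to apply to the in-plane coordinates) would serve for each shrink step, followed by energy minimization.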

3.7 Solvation and System Equilibration

The following sections describe parameters for the Na+,K+-ATPase simulations. The SERCA details are very similar [85]. A three-dimensional system box size was chosen that encompassed the lipid (x-y plane) and allowed for sufficient (approximately 2 nm) space between the protein and the top and bottom system boundaries. The spacing was large enough that the opposite ends of the protein cannot interact through the periodic boundary conditions. The entire system was solvated with water and centered appropriately using GROMACS’ genbox and editconf commands. The simple point charge (SPC) model was used for water molecules [86]. An example of the constructed system is shown in Figure 1.3.

[Figure 3.3: four image panels, a) to d)]

Figure 3.3: Lipid membrane bilayer equilibration process depicted in four steps. Lipid molecules are shown in VDW representation; phosphate heads are brown (phosphorus) and red (oxygen), and tails are cyan. a) Initial membrane. b) Membrane has been expanded and lipid molecules near the protein (center) have been removed from the system. c) Membrane after 12 steps of shrinkage. d) Final system after 25 shrinkage steps. Protein surface shown in transparent blue. Note that the protein remains in the system throughout the equilibration process, but is not shown in images a-c for clarity. Scale bar is approximately 10 nm for each figure.

Random water molecules were converted to Na+ ions to neutralize the overall system charge due to unprotonated protein side chains. All simulations were performed with GROMACS version 3.3.1. The GROMOS87 force field was augmented with lipid-optimized non-bonded parameters [87]. Energy minimization (EM) was performed to a tolerance of 100 kJ mol−1 nm−1 and position-restrained (PR) MD was run for at least 250 ps (Figure 3.4). In all PR steps each lipid’s phosphorus head as well as two end-tail carbons were harmonically restrained in the lipid membrane plane. Protein heavy atoms were also restrained. Semi-isotropic coupling with τ = 1.5 ps was used to maintain a pressure of 1 bar with a compressibility of 4.5 × 10−5 bar−1. This technique was designed for membrane protein simulations and couples the x- and y-axes (the membrane plane) separately from the z-axis. The protein, lipids, and water/ions were each coupled separately to a temperature bath of 300 K using Berendsen coupling with τ equal to 0.1 ps [49]. Long-range electrostatic interactions over 14 Å were handled with the particle-mesh-Ewald (PME) approach [47]. Van der Waals interactions were cut off at 14 Å. The LINCS algorithm [45] was used to constrain bond lengths and a time step of 2 fs was used in all simulations. Root mean square deviation (RMSD) of subsequent fully unrestrained simulations indicates good protein stability (Figure 3.5) and is comparable to other work [33]. Trajectory data analysis was performed only on data taken after the RMSD of the protein α-carbon atoms stabilized, which was chosen to be 3 ns for all simulations. The PROCHECK software was run on protein structures at 3 ns and found 74.1, 77.4, and 76.5% of residues to be in the most favored Φ/Ψ regions in the E1, E2, and E2P simulations, respectively [88]. These angles remained stable throughout the data analysis time, indicating good protein stability [12].

Figure 3.4: Area per lipid plotted during position-restrained MD phase. Top and bottom layers have been averaged. The E2P simulation was run first. The plot indicates that 250 ps would provide ample equilibration time for the E1 and E2 simulations, performed at a later date. Area per lipid values remained stable throughout the unrestrained simulations.

[Figure 3.5 plot: RMS deviation of distances between C-alpha atoms; RMSD (nm) versus time (ps) for the E2P, E2, and E1 simulations]

Figure 3.5: Root Mean Square Deviation (RMSD) of α-carbon atoms during unrestrained MD phase. For all simulations, data after 3 ns was used. All systems achieved stable measurements of several parameters (area per lipid, protein RMSD, and protein Φ/Ψ angles) that indicate good protein stability in MD simulation.

[Figure 3.6: two image panels, a) and b)]

Figure 3.6: Two viewpoints of the Na+,K+-ATPase crystal structure (opaque) and homology model (transparent) used in my work. The view is along the membrane plane. Approximate extent of membrane is shown by horizontal black lines. Figure a) shows the orientation difference in the TM10 helix. The location of the crystal structure’s C-terminus, shown in yellow, forces the cytoplasmic end of TM10 outward from the helix bundle. In the homology models, these eight residues were truncated because there is no homology with the SERCA template. Figure b) depicts the differences between TM2 of the structure and model. While the complete role of TM2 in ion binding and permeation is unknown, its proximity to the binding cavity (black oval) and putative intracellular ion pathway (black rounded rectangle) indicates its possible importance in these aspects. Finally, there are some minor visible differences in the extracellular loops. Small differences in flexible protein loops are unavoidable even with current homology modeling loop optimization techniques.

3.8 Na+,K+-ATPase X-Ray Crystal Structure

After construction of these homology models and during preparation of this dissertation, an X-ray crystallographic structure of the Na+,K+-ATPase in the E2P conformation became available in December 2007 [58]. Although the resolution of 3.5 Å is moderate, this new structural data provides a basis for comparison of the homology models introduced in my work. Two key differences are depicted in Figure 3.6. A full comparison of the crystal structure and the homology models produced here is beyond the scope of my work, but the high resemblance of the E2P model to the crystallized structure supports the methodology used to produce my models.

3.9 Summary

This chapter has discussed the alignment and homology modeling construction of Na+,K+-ATPase models in three physiologically relevant conformations (E1, E2, and E2P). Experimental data was incorporated into the sequence alignment (for example Munson [31] and Sweadner [62]), which was subsequently coupled with structural data from SERCA (Toyoshima [62, 59, 38]), the N domain of the Na+,K+-ATPase [56], and a human phosphatase structure [76] to produce reliable Na+,K+-ATPase homology models. The Na+,K+-ATPase homology model in the E2 state is the first of its kind and is imperative for investigation of K+ binding sites. These models were incorporated into accurate simulation environments consisting of protein, membrane, water, and ions. All models displayed very good structural characteristics in terms of RMSD stability and Φ/Ψ angles during equilibration [12]. The lipid bilayer membrane was also stable in terms of area per lipid. The Na+,K+-ATPase homology model in the E2P form was compared with the new X-ray crystal structure of Na+,K+-ATPase [58]. This pair agreed very well given the existence of regions with no homology between Na+,K+-ATPase and SERCA. These homology simulation systems have set the foundation for the long-time-scale molecular dynamics simulations discussed in Chapter 5.

Chapter 4

MD Simulation and Analysis of the

Ca2+ ATPase (SERCA)

4.1 Introduction

One of the goals of this work is to determine the extent to which homology-modeled proteins may be simulated to investigate various structure-function properties. Long-time-scale simulations of SERCA and the Na+,K+-ATPase in an accurate lipid environment have not been attempted before. The growth of accessible high-performance computing systems, coupled with associated biomolecular computational tools, now allows for the simulation studies described here. In order to validate the simulation and analysis methodology, X-ray crystallography structures of the sarco(endo)plasmic reticulum Ca2+-ATPase (SERCA) were incorporated in several simulations.

As a stepping stone during this research, MD simulations and subsequent analysis were performed on SERCA, which is a Ca2+-ATPase that is very similar to the Na+,K+-ATPase in sequence, structure, and function. This approach allowed significant improvements to be made in the simulation and analysis methodologies without involvement of the homology modeling process. The SERCA chapter describes our published research that investigated ion binding location and ion permeation pathways using a variety of methods. The chapter shows that the methodology developed is able to corroborate existing experimental data as well as suggest protein amino acids that may play a role in ion permeation.

4.2 Simulations Performed

Four long-scale simulations were performed, the details of which are shown in Table 4.1. The first three, designated E1, E2P, and E2a, each simulated 1 ns of fully unrestrained MD as described in Section 4.3. These three simulations represent SERCA in several states of the Post-Albers cycle and did not contain Ca2+ ions. The states, E1, E2P, and E2, were based on crystal structures publicly available in the Protein Data Bank (PDB ID codes: 1SU4, 1WPG and 2AGV, respectively) [89, 90, 91]. The purpose of these simulations was to determine the pathway of ion translocation by means of the program CAVER [92] and electrostatic calculations using the Adaptive Poisson-Boltzmann Solver (APBS) [93]. E2b, the fourth simulation, was a 3.2 ns continuation of the E2a simulation with the addition of two Ca2+ ions as described in Section 4.4. The inclusion of Ca2+ ions allowed hypotheses to be made regarding the binding sites and the cytoplasmic ion permeation pathway. Figure 4.1 shows the Root Mean Square Deviation (RMSD) of α-carbon atoms, which indicates good protein stability over the course of the simulations [12]. Three additional short simulations based on the E2P system included Ca2+ ions with the intent of probing the lumenal permeation pathway.
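The α-carbon RMSD used above as a stability measure can be illustrated with a minimal sketch. It assumes that each trajectory frame has already been least-squares fitted to the reference structure and that coordinates are supplied in nm as NumPy arrays. The function name `ca_rmsd` and the toy coordinates are hypothetical and are not part of the tool chain used in this work.

```python
import numpy as np

def ca_rmsd(frame, reference):
    """RMSD between two (N, 3) coordinate arrays, in the units of the input.

    Assumes the frame has already been superimposed on the reference.
    """
    frame = np.asarray(frame, dtype=float)
    reference = np.asarray(reference, dtype=float)
    if frame.shape != reference.shape:
        raise ValueError("frame and reference must have the same shape")
    # Squared displacement of each atom, then root of the mean over atoms
    squared = np.sum((frame - reference) ** 2, axis=1)
    return float(np.sqrt(squared.mean()))

# Toy example: three hypothetical alpha-carbon positions (nm)
reference = np.array([[0.0, 0.0, 0.0], [0.38, 0.0, 0.0], [0.76, 0.0, 0.0]])
frame = reference + np.array([0.1, 0.0, 0.0])  # every atom displaced by 0.1 nm
print(ca_rmsd(frame, reference))
```

Evaluating such a function on every saved frame would give a time series of the kind plotted in Figure 4.1.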

Table 4.1. Overview of SERCA Simulations

ID     Length   Ions     Analysis Methods                       Key Sections
E1     1 ns     No       CAVER & APBS analysis                  4.4.1
E2P    1 ns     No       CAVER & APBS analysis                  4.4.1, 4.4.2, 4.4.3
E2a    1 ns     No       CAVER & APBS analysis                  4.4.1, 4.4.4
E2b    3.2 ns   2 Ca2+   Ca2+ ions placed at sites I and II     4.4.5, 4.4.6, 4.4.8
E2Pa   100 ps   1 Ca2+   Probed lumenal pathway with Ca2+       4.4.7
E2Pb   20 ps    1 Ca2+   Probed lumenal pathway with Ca2+       4.4.7
E2Pc   20 ps    1 Ca2+   Probed lumenal pathway with Ca2+       4.4.7

Note. Simulation identification, length, use of ions, and analysis are given. The last column indicates the sections in the text that discuss each simulation.

4.3 Methods and Analysis

Figure 4.1: Root Mean Square Deviation (RMSD, in nm) of α-carbon (CA) atoms during simulations, plotted against time (ps). E1, E2P, and E2a relate to the (0, 1000 ps) horizontal axis, while E2b relates to the (0, 3200 ps) axis. The increase in RMSD beginning at 800 ps of the E2P simulation is due to flexible residues of segments that are far away from the ion binding and permeation region of interest. These segments include the large TM7-8 lumenal loop, the N and C termini, and several short N domain segments. These small positional changes do not significantly affect the ion binding and permeation pathways and are not directly related to ion pumping. This conclusion is supported by the fact that the CAVER results from later times were no different from those near the beginning of the simulation. In addition, PROCHECK [88] analysis of the protein structure at 1000 ps shows that 82.4% of residues are in the most favored conformation. PROCHECK gives a very good prediction of accuracy in MD simulations and this value is well within the range of acceptable protein behavior [12].

Details of the methodology can be found elsewhere [85], but are very similar to those described in Section 3.7. The protein analysis tool, CAVER, was used to determine buried pathways from locations inside the protein, specifically the ion binding cavity, to the outside. CAVER explored the three-dimensional structure for unoccupied space having a minimum diameter of 1 Å. The space was followed along a continuous pathway, and success was declared if the program managed to find a vacant route from the binding site (starting point) to a point on either the inside or outside surface of the protein in contact with the aqueous medium. This pathway need not be large enough to unequivocally accommodate a Ca2+ ion, as the interaction between the protein and ion can facilitate ion transport through these regions, as shown in the simulations of Section 4.4.7. This might be called a “toothpaste tube” model for ion permeation through pathways that in the absence of the ion are too narrow. CAVER analysis was performed on snapshots from the fully unrestrained MD trajectories of simulations E1, E2P, and E2a at 10 ps intervals. Combination of the CAVER and APBS results gave putative pathways that may be involved in ion transport. These resulting putative pathways were used as a guide to determine where to place the ions in the permeant ion simulations of Section 4.4.7. The origins of the CAVER algorithm for each binding pocket were determined by averaging the coordinates of α-carbon atoms whose residues were shown to bind Ca2+ ions in the E1 crystal structure [89].
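The choice of CAVER starting point described above amounts to taking the centroid of a small set of α-carbon positions. A minimal sketch follows, assuming the coordinates have already been extracted from the structure as an (N, 3) array in Å. The function name `caver_origin` and the coordinates are hypothetical and serve only to illustrate the averaging step.

```python
import numpy as np

def caver_origin(ca_coords):
    """Return the arithmetic mean of (N, 3) alpha-carbon coordinates."""
    ca_coords = np.asarray(ca_coords, dtype=float)
    if ca_coords.ndim != 2 or ca_coords.shape[1] != 3:
        raise ValueError("expected an (N, 3) array")
    return ca_coords.mean(axis=0)

# Hypothetical CA positions (Å) for residues lining one binding pocket
site_ca = [[10.0, 4.0, 2.0], [12.0, 6.0, 2.0], [11.0, 5.0, 5.0]]
origin = caver_origin(site_ca)
print(origin)
```

The resulting point would then be supplied to the pathway search as its origin for that binding pocket.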

Additional numerical tools used for post-processing of MD output included VALE and APBS. VALE has been shown to yield accurate predictions of metal ion binding site locations in a variety of protein structures [94]. VALE analyzes each point on a three-dimensional rectangular grid using an empirically based algorithm to determine favorable locations for specific ion binding due to main and side chain oxygen atoms, as well as water. This software was used to calculate site valences for Ca2+ ions on a 0.25 Å grid. APBS solves the Poisson-Boltzmann equation on a 3D grid to yield an electrostatic potential map that indicates potential values throughout the system. These programs can be used in conjunction to augment the analysis toolset. In this way, we seek to show that applying these methods of analysis to MD simulations may provide computational approaches that help reinforce existing biophysical data and generate new hypotheses for experimental verification.
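To make the idea of a site valence more concrete, the sketch below evaluates one common empirical form, a bond-valence sum over nearby oxygen atoms, v = Σ (R_i / R0)^(-N). Whether VALE uses exactly this expression is an assumption here, and the Ca2+-O parameters `R0_CA` and `N_CA`, the cutoff, and the function name `site_valence` are illustrative values chosen for the example rather than settings taken from this work.

```python
import numpy as np

R0_CA = 1.909   # Å, assumed Ca2+-O reference distance (illustrative)
N_CA = 5.4      # assumed exponent (illustrative)
CUTOFF = 4.0    # Å, assumed distance beyond which oxygens are ignored

def site_valence(point, oxygen_coords, r0=R0_CA, n=N_CA, cutoff=CUTOFF):
    """Bond-valence sum at `point` from oxygen atoms within `cutoff`."""
    point = np.asarray(point, dtype=float)
    oxygen_coords = np.asarray(oxygen_coords, dtype=float)
    dists = np.linalg.norm(oxygen_coords - point, axis=1)
    near = dists[(dists > 0.0) & (dists <= cutoff)]
    # Each oxygen contributes (R / R0)^(-N); distant atoms contribute nothing
    return float(np.sum((near / r0) ** (-n)))

# Six hypothetical oxygens arranged octahedrally 2.4 Å from the test point
d = 2.4
oxygens = d * np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
                        [0, -1, 0], [0, 0, 1], [0, 0, -1]], dtype=float)
valence = site_valence([0.0, 0.0, 0.0], oxygens)
print(valence)
```

In a grid-based search, a function of this kind would be evaluated at every grid point (for example on a 0.25 Å grid), and points whose valence lies near the formal charge of the ion would be flagged as candidate sites.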


Figure 4.2: Ca2+ ion’s trajectory from site II in the E2b simulation. Spheres indicate Ca2+ ion’s position from the beginning (red) to the end (blue) of the simulation. A representative CAVER pathway from the E2a simulation is shown in black mesh to indicate the diameter of the CAVER pathway. Part of TM2 has been removed to enable the sites to be seen. Notice that the ion trajectory moves behind TM4 (on the TM5 side). However, the CAVER result indicates a preference for the front-side passage on the side facing TM1 and 2. The grey surface in the background represents the lipid bilayer and shows that the CAVER pathway moves all the way into the solvated cytoplasm.

4.4 Results

4.4.1 Cytoplasmic Pathway as Determined by CAVER

This section discusses the relationship between the trajectory of Ca2+ ions in the E2b simulation and the CAVER analysis of E1, E2a, and E2P as they pertain to the cytoplasmic pathway. Figure 4.2 shows the trajectory of the site II Ca2+ ion from the E2b simulation, as well as a group of E2a CAVER output pathways leading from site II to the cytoplasm. CAVER pathways from the E2P and E1 simulations were similar, but not as pronounced, indicative of being closer to a closed state. The CAVER pathways reach the ion’s final location by a slightly different route than the ion followed in the E2b simulation. The CAVER pathways move on the TM1-2 side of TM4, whereas the permeant ion E2b simulation shows the Ca2+ ion moving on the TM5 side of TM4. There are two reasons that the CAVER results for this part of the pathway are less predictive. First, the diameter narrows to 1.7 Å along paths between TM2 and TM4, significantly less than the Ca2+ van der Waals diameter of 2.2 Å. Second, the trajectory of the E2b simulation showed only a few individual water molecules following the CAVER pathway on the TM2 side of TM4. Water molecules on the TM5 side of TM4 create by far the most consistent connection extending from the cytoplasmic vestibule to binding site II. On the other hand, the remainder of the CAVER result is plausible for two reasons. First, the ion rejoined the group of CAVER pathways by 2600 ps of the E2b simulation. Second, the remainder of the CAVER pathways after this point agreed with another proposed cytoplasmic pathway [90]. The aforementioned discrepancy is a good indication of the usefulness of supplementing CAVER analysis with other methods, such as permeant ion MD simulations. Starting from the final simulated location of the Ca2+ ion, the CAVER diameters increased linearly to 8 Å over the next 10 Å as the paths entered the water-filled vestibule. Over the last stretch, the paths’ diameters increased gradually to 10 Å, until the cytoplasm was reached. The remainder of the CAVER pathway exited the protein through a relatively wide (∼12 Å by ∼12 Å) pore. There are many possible acidic residues (GLU55, GLU58, GLU109, GLU243, ASP254, ASP245, GLU258) in this vicinity that may contribute to the ion capture processes [89]. Although the cytoplasmic pathway is difficult to investigate experimentally, the pathway suggested by the latter portion of the CAVER results is reasonable and warrants experimental validation and clarification.
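The steric comparison made above, between the narrowest point of a pathway and the diameter of the ion, can be sketched as follows. The diameter profile is hypothetical and only mimics the values quoted in the text, and the function names are illustrative. A purely steric test of this kind is deliberately conservative: as noted in Section 4.3, a pathway narrower than the ion may still conduct once the ion interacts with the protein.

```python
def bottleneck(diameters):
    """Return (index, diameter) of the narrowest point along a pathway."""
    if not diameters:
        raise ValueError("pathway profile is empty")
    index = min(range(len(diameters)), key=diameters.__getitem__)
    return index, diameters[index]

def is_sterically_open(diameters, ion_diameter):
    """True if the narrowest point is at least as wide as the ion."""
    return bottleneck(diameters)[1] >= ion_diameter

# Hypothetical diameter profile (Å) sampled from site II towards the cytoplasm
profile = [2.6, 2.1, 1.7, 2.4, 4.0, 8.0, 10.0]
CA_DIAMETER = 2.2  # Å, Ca2+ van der Waals diameter quoted in the text

idx, narrowest = bottleneck(profile)
print(idx, narrowest, is_sterically_open(profile, CA_DIAMETER))
# prints: 2 1.7 False
```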

4.4.2 Pathways to the Lumenal Space Predicted by CAVER

CAVER results from the E2a and E2P simulations gave similar pathways as shown in Figures 4.3 and 4.4. The main stalk of the egress pathways is situated between TM1, 2, 4, 5 and 6 in both conformations. The site II paths moved towards the lumen. The site I paths split into two, the first of which joined the site II path in the transmembrane area near GLU90, while the second met closer to the lumen. In both cases the pathways joined at a central locus between TM4, 5, 6, and 7 roughly at the level of the lumenal plane of the lipid bilayer. The first site I group of paths passed on the cytoplasmic side of the TM5-6 loop, while the second passed on the lumenal side of the TM5-6 loop. The reunification of site I and II pathways supports a sequential release mechanism [95]. The lumenal site I pathway rejoined the site II pathway near the top of the transmembrane helices, which is nearly into the solvated lumenal space. The lumenal site I pathway in the E2P simulation narrowed to a diameter of 1.2 Å before it reached the lumen. Both of these observations strengthen the conjecture that the site I pathway joins the site II pathway on the cytoplasmic side of the TM5-6 loop.

Figure 4.3: CAVER results from the E2a simulation showing sterically accessible pathways to the lumenal side of the membrane. The view is along the membrane plane with the cytoplasm at the bottom. Red circles indicate approximate positions of binding sites I and II. Lumenal exit pathways originating from Ca2+ site I (II) are shown in green (blue). Residues homologous to those that affect lumenal ion transport in mutagenesis studies of Na+,K+-ATPase are shown as black sticks. The extent of water permeation during the simulation is shown by the water molecules (shown as boomerangs).

There are several key differences between the E2a and E2P results. An argument can be made that the E2P conformation should be more open to the lumen. The protein conformational change that occurs in vivo should correspond to the movement from the crystallized E2 to E2P structure [91]. The E2P state precedes E2 in the forward pump cycle and this dephosphorylation (change from E2P to E2) occurs after the protons are taken up from the lumenal medium. The E2P CAVER pathways corroborate this viewpoint because they are less confined in their trajectories, especially in the paths from site II. This could indicate larger protein sidechain fluctuations or larger spacing between the transmembrane helices. Because there are more water molecules situated farther into the transmembrane region of the E2P simulation, larger inter-helical spacing and creation of a solvated pore is a reasonable explanation for this observation. Many of the site II pathways moved towards and intertwined with the site I pathways. This indicates the relative “openness” of the transmembrane region between TM4 and 6. The E2P results, therefore, suggest a conformation with a relatively open pathway to the lumenal space.

Figure 4.4: CAVER results from the E2P simulation show less restriction in pathways than in the E2a simulation. Displayed elements are the same as in Figure 4.3, with the addition of GLU90 in cyan and VAL93 in pink, whose sidechains point inward towards TM4 and the large agglomeration of CAVER pathways from site II. Note the intermixing of pathways and the greater extent of lumenal water permeation towards site II.

Mutagenesis studies have explored the role of transmembrane residues along the putative lumenal pathway of the Na+,K+-ATPase [96, 97, 98, 72]. MODELLER [61] was used to produce a sequence alignment of SERCA with the Na+,K+-ATPase and allowed investigation of homologous residues along the SERCA pathways found by CAVER. This alignment agreed with other published work [38, 59]. SERCA residues whose homologues had a significant effect on transport in the Na+,K+-ATPase are TYR294, ILE298, ALA301, VAL304, GLU309, and GLY310 on TM4, TYR763, ILE765, SER767, GLU771, CYS774, and PHE776 on TM5, LEU781, PRO784, GLU785, ALA786, and ILE788 on the TM5-6 loop, and PRO789, VAL790, ASN796, and ASP800 on TM6. These residues are depicted in Figures 4.3 and 4.4. All of these residues were in the vicinity of the CAVER pathways, except for some lipid-facing residues, most notably ILE765, PHE776, LEU781, and VAL790. Even though these CAVER pathways correlate with the mutated residues that have an effect on transport, it is difficult to draw any further conclusions about the detailed architecture of the lumenal pathway. This investigation using CAVER has suggested that TM1 and 2 are also along the lumenal pathway. Recent mutagenesis experiments have shown that mutation of the LEU99 residue (LEU65 in SERCA) on TM1 in the Na+,K+-ATPase has a significant effect on K+ affinity [68]. This residue comes into close contact with the pathways from site II in the CAVER results of simulations E2a and E2P, but not in the site II pathways to the cytoplasm in simulation E1. This observation further supports the postulate that the mid TM1 region (which includes LEU65) is of little importance for Na+ interactions in the E1 conformation. Accessibility experiments have not been performed along TM2. Section 4.4.7 introduces Ca2+ ions to probe this pathway and comments on experiments that may further elucidate important amino acid residues in this region.

4.4.3 Occlusion Site Connection as Determined by CAVER

From the CAVER results of the E1, E2P and E2a simulations, sites I and II also appear to have a direct connection (not shown) that passes between TM4 and 5. In the E2b simulation, before the Ca2+ moved from binding site II, water molecules extended from the cytoplasmic vestibule past site II all the way to site I. This provides a putative cytoplasmic egress pathway for site I that warrants further investigation.

4.4.4 Summary of Cytoplasmic and Lumenal Pathways as Determined by CAVER

Overall, CAVER analysis showed much promise. Putative cytoplasmic and lumenal pathways were determined. The cytoplasmic path deviated slightly from the E2b simulation, but interaction with GLU309 and the remaining pathway through the cytoplasmic pore are reasonable [89]. The lumenal pathways from sites I and II may join and share the same exit locus, indicative of a single-file release mechanism. TM1 and 2, which have only been partially studied via mutagenesis experiments, may play a role in forming a water-filled pore through which cations may flow. CAVER also predicts a group of paths that pass between TM5 and TM7 and protrude into the cytoplasm just under the bilayer. However, there is a dearth of coordinating sidechain oxygen atoms along this route and electrostatic analysis shows the region is not conducive to cation permeation. CAVER analysis, which is limited to steric considerations, should be coupled with other methods such as permeant ion simulations or electrostatic calculations to investigate putative water or ion pathways.

4.4.5 Ca2+ Ion Movement from Site II

The Ca2+ ion associated with site II was inserted manually at the geometric center of carboxyl oxygen atoms of ASN796 and ASP800 and the carbonyl oxygen atom of ILE307. These atoms, along with main chain carbonyl oxygen atoms of VAL304 and ALA305, and side chain oxygen atoms of ASP800 and GLU309, were shown to coordinate the site II Ca2+ ion in the E1 structure [89]. During the E2b simulation, this ion was held in tight coordination with side chain oxygen atoms of ASN768, GLU771, and ASN796, and the main chain oxygen atom of ALA305, along with 3 water molecules for about 250 ps. GLU771 also contributed to coordination, fluctuating between 2.6 and 4.2 Å over this period. Over the next 50 ps, GLU771 coordination was lost and the ion moved nearly 2 Å normal to the membrane towards the cytoplasm. Replacing GLU771 coordination were ILE307O and ALA306 in the first coordination shell, along with two water molecules. By 400 ps, five water molecules comprised the first shell, with carbonyl oxygen atoms from ALA305, ALA306, ILE307, PRO308, and GLU309 and carboxyl oxygen atoms of ASN796 and ASN768 at distances of 3.5 to 5 Å, indicative of the second coordination shell. Starting at 600 ps, ALA306 and ILE307 coordinated the first shell for the next 200 ps. Next, the ion was coordinated by an 8th water molecule, and for the remainder of the simulation, only water filled the first shell. This solvation allowed the ion to move more freely towards the cytoplasm.

From 700 ps to 900 ps, the ion moved 7.5 Å normal to the membrane. This movement accounted for the majority of the 10 Å it moved in this direction over the course of the simulation. At 800 ps, PRO308, ASP59O, and LEU253 began to coordinate the ion loosely in the second shell. The ion moved away from PRO308 and ASP59O at around 1300 ps, but coordination with LEU253O was maintained until 2000 ps. At 2100 ps, PRO308 and both side chain oxygen atoms of GLU309 joined the second coordination shell of the ion, as shown in Figure 4.5. These oxygen atoms stayed in the 4 to 5 Å range for the next 500 ps of the simulation. This result corroborates the speculation that GLU309 interacts with Ca2+ in the water-accessible vestibule occurring in the E2 conformation as well as coordinating binding site II in the E1 conformation [89]. In this manner, the movement of the Ca2+ is shown to occur between several coordinating residues (ASP59, LEU253, PRO308, and GLU309) in this vestibule. However, during the simulation period, no protein oxygen atoms replaced a water molecule from the first coordination shell. This scenario allows the ion to move about in the water-filled cavity and interact with the numerous protein oxygen atoms in the region, while simultaneously preventing its own strong coordination with any particular site. The trajectory of the Ca2+ is displayed in Figure 4.2.

During a 600 ps extension of the site II Ca2+ simulation (not shown), the ion began to coordinate fleetingly with the sidechain oxygen atoms of ASP254. This interaction was indicative of a possible move away from the binding site, farther into the water vestibule and towards the cytoplasm. However, this ASP254 coordination was lost and the ion moved back into a position directly between ASP59 and GLU309. Thus, the ion was solvated and entered this cytoplasmic vestibule, but was not yet released into the cytoplasm during the simulation. The E2 crystal structure contains no Ca2+ ions and the transmembrane conformation is indicative of its state after Ca2+ release [91]. The inclusion of a Ca2+ ion at site II in an E2 conformation that is not conducive to Ca2+ occlusion should bias the ion to leave the site formed by these E1 binding site residues. Structural and functional evidence substantiates Ca2+ access between the cytoplasm and binding sites in the E2 conformation [99]. Thus, the E2 conformation of the cytoplasmic pore may be favorable for Ca2+ release into the cytoplasm. This experimental evidence is supported by the E2a CAVER analysis as well as the E2b trajectory of the simulated ion.
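The coordination analysis in this section rests on sorting oxygen atoms into shells by their distance from the ion. A minimal sketch of that bookkeeping for a single snapshot is given below. The 3.0 Å first-shell cutoff is an assumption made for the example, the 5 Å second-shell limit follows the 3.5 to 5 Å range quoted above (atoms between 3.0 and 3.5 Å are simply assigned to the second shell here), and the atom labels and positions are invented for illustration.

```python
import numpy as np

FIRST_SHELL_MAX = 3.0    # Å, assumed upper bound for first-shell contacts
SECOND_SHELL_MAX = 5.0   # Å, upper end of the 3.5 to 5 Å range in the text

def classify_shells(ion_xyz, oxygens):
    """Split labelled oxygen atoms into first and second coordination shells.

    `oxygens` maps an atom label to its (x, y, z) position in Å.
    Returns two lists of (label, distance) tuples sorted by distance.
    """
    ion_xyz = np.asarray(ion_xyz, dtype=float)
    first, second = [], []
    for label, xyz in oxygens.items():
        dist = float(np.linalg.norm(np.asarray(xyz, dtype=float) - ion_xyz))
        if dist <= FIRST_SHELL_MAX:
            first.append((label, dist))
        elif dist <= SECOND_SHELL_MAX:
            second.append((label, dist))
    by_distance = lambda item: item[1]
    return sorted(first, key=by_distance), sorted(second, key=by_distance)

# Hypothetical snapshot: positions are invented for illustration only
snapshot = {
    "WAT1:O": (2.4, 0.0, 0.0),
    "WAT2:O": (0.0, 2.5, 0.0),
    "GLU309:OE1": (0.0, 0.0, 4.5),
    "PRO308:O": (4.0, 0.0, 0.0),
    "LEU253:O": (7.0, 0.0, 0.0),
}
first, second = classify_shells((0.0, 0.0, 0.0), snapshot)
print([label for label, _ in first], [label for label, _ in second])
```

Repeating the classification over successive frames would yield the kind of time-resolved shell membership described in the text, for example a first shell that comes to contain only water.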

Figure 4.5: The Ca2+ ion that has moved from site II is shown with coordinating water molecules in red and white. The view is along the lipid bilayer membrane. Water is shown in surface representation (grey-blue) and indicates the wide, solvated vestibule connecting to the cytoplasm in the downward-right direction of this image. Part of TM2 has been removed for better visualization. PRO308 and GLU309, with which Ca2+ coordinates, are shown in stick representation.

4.4.6 Ca2+ Ion Occlusion at Site I

In the E2b simulation, two Ca2+ ions were introduced in locations near the residues that coordinate ions in the E1 structure. The site I Ca2+ ion was manually added at the center of carboxyl oxygen atoms of GLU908, THR799, and ASN768. This location was chosen because these atoms, along with sidechain oxygen atoms of GLU771 and ASP800, coordinate Ca2+ in both the E1 crystal structure [89] and E1 simulations [100]. This ion stayed bound tightly at this location for the entire 3.2 ns simulation. A representative frame from the trajectory is shown in Figure 4.6. ASN768, a known binding residue in the E1 configuration, was quickly replaced by SER767 for the majority of the simulation time. Other E1 binding residues that did not participate were THR799 and GLU908. Atoms that coordinated this Ca2+ ion for nearly all of the simulation were carbonyl oxygen atoms of VAL795 and SER767 and two pairs of carboxyl oxygen atoms from GLU771 and ASP800 at an average distance of 2.62 Å. Of these, only the GLU771 and ASP800 residues bind this ion in the E1 structure. Besides the first coordination shell oxygen atoms listed above, the same two water molecules coordinated the ion for nearly the entire simulation. This is one more water molecule than in the E1 crystal structure. These observations indicate that the E2 binding site has a coordination geometry that is significantly different from the E1 binding site geometry. This is most likely due to the conformation’s preference for protons over Ca2+, as discussed later in this section. Although the ion remained bound over the course of this simulation, the coordinating atoms differed significantly from those that bind the site I ion in the E1 structure. The Ca2+ binding affinity in the E2 state is lower than that in the E1 state [36]. The E2 binding configuration may be such that it facilitates lumenally directed ion release. For instance, the additional coordinating water molecule may be indicative of increasing hydration with ion movement away from the occlusion site. Further studies may be better able to investigate this predicted movement.


Figure 4.6: Binding site I with Ca2+ in blue is shown from a viewpoint parallel to the bilayer. Coordinating residues are shown in stick form, along with two water molecules in van der Waals representation. Part of TM6 has been removed for better visualization. Lumenal and cytoplasmic ends are indicated on TM6. Grey spheres in the background indicate lipid atoms.

4.4.7 MD Simulations of Ca2+ Ions Placed Along the Putative Lumenal Pathway

Three short E2P simulations (E2Pa, E2Pb and E2Pc) with Ca2+ ions placed along the lumenal pathway are shown here from among eight that have been performed. This approach has been successful in similar work with the H+,K+-ATPase [32]. The starting structure was taken from the 500 ps time frame of the E2P simulation described above. The ions were added manually in different locations of the non-solvated space along the site II lumenal pathway, as determined by CAVER. Steepest descent energy minimization was followed by 20 ps of position-restrained MD with force constants of 1,000 and 1,000,000 kJ/mol/nm^2 on the protein’s heavy atoms and the Ca2+ ion, respectively. This allowed the water molecules to equilibrate.

The first of these simulations (E2Pa) showed ion movement towards the lumenal cavity. GLU90 and VAL300 coordinated the ion with 4 or 5 water molecules for the first 25 ps. VAL300 coordination was then lost and the ion moved noticeably towards the lumen while maintaining coordination with GLU90 and 6 water molecules. In a second simulation (E2Pb) the ion moved in the other direction to site II and was coordinated by VAL304, ILE307, GLU309, GLU771, LEU792 and ASN796. This coordination is similar to that shown in the E1 crystal structure [89]. A third simulation (E2Pc) showed a slight lumenal movement, and coordination with GLU90, VAL93, and PRO789. At 10 ps, a water molecule began to coordinate with the ion on its cytoplasmic side, which led to ion movement towards site II. The PRO789 coordination lost at 15 ps was immediately replaced by GLU309 (part of the known Ca2+ binding site). The ion remained between GLU90, GLU309 and the water molecule for the remainder of the simulation. The GLU90 homolog in the human α1 Na+,K+-ATPase is ASN129 and is conserved among multiple type IIc P-type ATPases. It appears that this residue may constitute a molecular “backstop” that controls or facilitates ion transport between binding site II and the solvated space among the lumenal transmembrane loops. From these observations, we predict that mutations that replace GLU90, and possibly VAL93 and PRO789, will produce “leaky” SERCA pumps.

The simulation time scales are much shorter than biological time scales of the protein’s pumping cycle. The permeant ion simulations discussed here probe only part of the putative ion permeation pathway. The permeant ions may move quickly through some regions, while spending the great majority of their time in local regions of high affinity. For example, the first of the permeant ion simulations (E2Pa) depicts very fast movement of a Ca2+ ion through a portion of the putative pathway. It then meets a solvated space where it is coordinated strongly by GLU90 and VAL300. It is probable that the ion will remain in this location for a time on the order of microseconds or milliseconds before moving on. These simulations of ion permeation relate only to the rapid movement of ions during their access to or egress from their binding sites. They give no insight into the slow, large-scale conformational changes in the protein pumping cycle that are responsible for gating (alternating accessibility to each side).

4.4.8 Valence Analysis of Ca2+ Occlusion Sites

The timestep of the E2b simulation corresponding to 2600 ps was analyzed with VALE using a 0.25 Å grid. This output showed a small volume of points with an average valence of 2.08 within the van der Waals radius of the water-solvated Ca2+ ion. The Ca2+ at site I was held in a shell with a valence of 1.97. These values close to 2.0 indicate a strong likelihood of the locus being a binding site. This high valence can be explained by the ions’ influence on their surroundings to create a highly coordinated shell of water molecules. While VALE analysis may be used to search for putative ion binding sites, we have noted before [101] and here that it is difficult to draw any strong conclusions from valence analysis of these proteins unless complementary evidence can be obtained using other methods. The putative binding sites determined by VALE depend in part on carboxyl oxygen atoms on fluctuating amino acid sidechains. This creates a large number of false-positive sites. Whereas additional pathways in CAVER analysis help to define the putative pathways, increasing the number of VALE calculations only increases the number of false-positive sites in a non-localized fashion.

4.4.9 Electrostatic Calculations of the Transmembrane Environment

Electrostatic calculations on the E2a simulation were made using APBS. In some instances, the electrostatic data were quite helpful, such as in the elimination of a possible cytoplasmic pathway as discussed in Section 4.4.4. Overall, the electrostatic calculations provided much less definitive information than the MD ion simulations and CAVER analysis. Results (not shown) indicate that main chain and side chain fluctuations greatly influence the potential landscape at any given time step. Volumetric potentials at many time steps were averaged to counter this problem. However, this caused the potential landscape to be poorly localized. In this way, electrostatic calculations were subject to the same limitation as the VALE analysis. Studies performed at a later date on the Na+,K+-ATPase (discussed in Section 5.6) were able to show a clearer electrostatic landscape because of an updated methodology.
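The trade-off described above, in which averaging reduces frame-to-frame noise but smears out localized features, can be seen in a toy example. The sketch assumes each snapshot's potential has already been read into an equally shaped 3D NumPy array. The function name `average_potential` and the grid values are hypothetical, and no APBS file format is handled here.

```python
import numpy as np

def average_potential(grids):
    """Average a sequence of equally shaped 3D potential grids."""
    stack = np.stack([np.asarray(g, dtype=float) for g in grids])
    return stack.mean(axis=0)

# Two hypothetical 2x2x2 snapshots with a potential well that shifts position
a = np.zeros((2, 2, 2))
a[0, 0, 0] = -4.0
b = np.zeros((2, 2, 2))
b[1, 1, 1] = -4.0

mean_map = average_potential([a, b])
print(mean_map[0, 0, 0], mean_map[1, 1, 1])
```

In the averaged map, the single deep well of each snapshot appears as two shallower wells at half the depth, which is the delocalization referred to in the text.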

4.5 Summary

The cytoplasmic and lumenal ion permeation pathways of SERCA have been studied with MD simulations and a variety of computational methods. Permeant ion MD simulations showed stable binding of the site I Ca2+ ion, whereas the site II ion was released towards the cytoplasm. This movement mostly agreed with the pathway predicted by CAVER. The cytoplasmic CAVER pathways deserve further experimental investigation. The lumenal CAVER pathway in the E2 conformation showed separate paths from sites I and II, which converged before escaping from the protein. E2P analysis by CAVER showed similar, but less restricted pathways, indicating a more open lumenal pathway. Water permeation into the site II pathway suggested that this may be the main route for ion permeation, which agrees with research on the related Na+,K+-ATPase [72]. Transmembrane helices 1 and 2 may help to form a water-filled pore in which ions move from the lumenal medium to their binding sites. Placement of ions along the putative CAVER pathways showed promise for predicting residues with which the ions interact. GLU90 and VAL93 on TM2 have been predicted to affect ion transport. Valence and electrostatic tools were more limited in their usefulness due to delocalization effects. Permeant ion simulations or other related methods should allow further refinement of our knowledge concerning the access and egress pathways to and from the ion binding sites in SERCA and related P-type ATPases.

Chapter 5

MD Simulation and Analysis of the Na+,K+-ATPase

5.1 Introduction

This chapter builds on the simulation and analysis methodology developed in the previous chapter on the SERCA protein. These Na+,K+-ATPase simulations refined the techniques explored during the SERCA research. The simulations study the Na+,K+-ATPase homology models prepared in Chapter 3. Three physiologically relevant conformational states were used as structural templates: E1, E2, and E2P (see Figure 1.4).

The amino acid residues which bind cations in the simulations of the E1 and E2 states were analyzed to determine their similarity to SERCA and Na+,K+-ATPase structures as well as previous homology modeling research. The SERCA E1 crystal structure with bound Ca2+ ions presented an opportunity to study the homologous Na+ ions at sites I and II as well as the Na+,K+-ATPase’s unique third Na+ site (site III). The E2 Na+,K+-ATPase homology models were used to study the pair of K+ ion binding sites. The pathway analysis software, CAVER, was run on a second set of simulations in order to examine the lumenal ion pathway in the E2P form and the intracellular ion pathway in the E1 form. A third set of simulations was performed to conduct electrostatic analysis of the E1 and E2P forms. Table 5.1 lists the simulations discussed in this chapter.

5.2 Na+ Binding Sites as Determined by E1 Simulation

As discussed in Section 3.7, 3 ns of fully unrestrained MD (simE1a) were performed to bring the E1 Na+,K+-ATPase homology model system to an acceptable equilibration state. The atomic coordinates from the timeframe at 3 ns were used as the initial conditions for the simulation simE1b. Three Na+ ions were introduced into the E1 system to determine the binding sites’ proximity to transmembrane helices and to specific amino acids. The total system charge was kept neutral in all simulations with added ions. Na+ ions were manually positioned approximately between the following residues: ASN744, GLU747, THR775, ASP776, and VAL888 for site I; VAL290, ALA291, VAL293, GLU295, ASP772, and ASP776 for site II; TYR739, GLY774, THR775, and GLU922 for site III. The amino acid residues for sites I and II were chosen because they are the homologues of residues that bind Ca2+ ions in the E1 SERCA structure [89] and the cation binding sites in the Na+,K+-ATPase have been proposed to be homologous to SERCA’s calcium sites [102]. The residues which formed the initial location of the ion at site III were predicted via valence analysis of a Na+,K+-ATPase homology model based on SERCA [59]. However, the sidechains believed to create the binding site in that homology model were manually adjusted to ensure high affinity environments for the ions. The aim of this approach is to determine ion binding residues through simulation with a minimal amount of manual intervention.

Table 5.1. Overview of Na+,K+-ATPase Simulations

ID         State  Length   Init. Sim.  Restrained        Field   Ions    Text
simE1PR    E1     0.25 ns  EM          lipid & backbone  none    none    none
simE1a     E1     3 ns     simE1PR     none              none    none    5.2
simE1b     E1     2 ns     simE1a      none              none    3 Na+   5.2.1
simE1c     E1     2 ns     simE1a      none              none    3 Na+   5.3
simE1e     E1     5 ns     simE1a      none              none    none    5.5.1
simE1f     E1     3 ns     simE1a      backbone          200 mV  none    5.6
simE2PR    E2     0.25 ns  EM          lipid & backbone  none    none    none
simE2a     E2     3 ns     simE2PR     none              none    none    5.2
simE2b     E2     2 ns     simE2a      none              none    2 K+    5.4
simE2c     E2     1 ns     simE2a      none              none    2 K+    5.4.2
simE2PPR   E2P    0.5 ns   EM          lipid & backbone  none    none    none
simE2Pa    E2P    3 ns     simE2PR     none              none    none    5.2
simE2Pb    E2P    5 ns     simE2Pa     none              none    none    5.5.2
simE2Pc    E2P    3 ns     simE2Pa     backbone          200 mV  none    5.6

Note. ‘PR’ in simulation ID indicates position restrained simulation. ‘Init. Sim.’ means initial coordinates for given simulation; ‘EM’ refers to energy minimized structure produced during equilibration phase. Other columns indicate: atoms which were position-restrained during the simulation, applied electric field, ions placed at binding sites, and section in the text where the simulation is discussed.

Figure 5.1: Binding sites I and II. TM helices 4, 5, and 6 are shown as ribbons in red, orange, and yellow, respectively. In VDW representation, the Na+ ions (I and II) are shown at 50% scale and oxygen atoms coordinating ions are shown at 20% scale. Amino acid residues that contain coordinating oxygen atoms are labeled. Residues with two coordinating carboxyl oxygen atoms are numbered as described in the text. Distance between Na+ ions is 4.5 Å.

5.2.1 Na+ Ion Binding without Water

Na+ ions were added as described above to initialize simE1b. The system was energy minimized and then subjected to a 20 ps simulation in which all protein atoms and ions were restrained. An unrestrained simulation of length 2 ns was then started, which resulted in a small movement of the ions during the first 100 ps. During this time, the ions at sites I and II moved relatively small distances: 2.0 Å and 0.76 Å, respectively. The ions’ locations relative to the protein are shown in Figure 5.1. Coordination of the ions remained stable during this time as shown in Figures 5.2 and 5.3. In all of the ion binding discussions, a focus was maintained on main chain carbonyl oxygen atoms and side chain carboxyl oxygen atoms. This is because a bound Na+ ion should be surrounded by oxygen atoms [103]. Nitrogen and carbon atoms appeared in atom distribution plots (not shown) near bound Na+ ions but were due to supporting protein structures and did not coordinate Na+ directly

[104].
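
The ion displacements quoted above amount to the distance between an ion's position at the start of the unrestrained run and its position 100 ps later. A minimal Python sketch of that bookkeeping follows. The function name, the (time, coordinates) trajectory layout, and the sample coordinates are illustrative assumptions and are not the analysis scripts used for this work.

```python
import math

def displacement_at(trajectory, t_end_ps):
    """Distance (in Å) between the first frame and the last frame at or before t_end_ps.

    trajectory: list of (time_ps, (x, y, z)) tuples sorted by time. This layout
    is a hypothetical one chosen for the sketch.
    """
    start_xyz = trajectory[0][1]
    frames = [xyz for t, xyz in trajectory if t <= t_end_ps]
    return math.dist(start_xyz, frames[-1])

# Made-up coordinates (Å) for a single ion, sampled every 50 ps.
ion_track = [
    (0.0, (10.0, 10.0, 10.0)),
    (50.0, (11.0, 10.0, 10.0)),
    (100.0, (12.0, 10.0, 10.0)),
    (150.0, (12.1, 10.0, 10.0)),
]
print(displacement_at(ion_track, 100.0))
```

Reporting the net displacement rather than the path length keeps the number insensitive to thermal jitter between saved frames.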

5.2.2 Na+ Site I Binding Site

Coordination of the site I Na+ ion during the simulation was provided by five side chain oxygen atoms of GLU747 (residue 771 in SERCA PDB ID: 1SU4), THR775 (799), and ASP776 (800). Coordination was also provided by a carboxyl oxygen atom of ASP772 (796) and a carbonyl oxygen atom of ALA291 (305). The number of coordinating ligands agreed with empirical results of crystallographic structures of

Table 5.2. Residues Coordinating the Na+ Ion at Site I.

TM   simE1b       simE1c       SERCA E1     Ogawa HM
4    ALA291(O)    ALA291(O)    –            –
5    –            –            ASN768(X)    ASN783(X)
5    –            SER743(X)    –            –
5    GLU747(2X)   GLU747(2X)   GLU771(X)    GLU786(X)
6    ASP772(X)    –            –            –
6    THR775(X)    THR775(X)    THR799(X)    THR814(X)
6    ASP776(2X)   ASP776(2X)   ASP800(X)    ASP815(X)
8    –            –            GLU908(X)    GLN930(X)
NA   –            2 H2O        2 H2O        1 H2O

Note: Simulations with (simE1c) and without (simE1b) water are compared with the SERCA structure [89] and other homology modeling work [59]. Homologous residues are listed in the same row. Coordinating atoms are shown in parentheses: main chain oxygen (O), side chain oxygen (X), both side chain oxygen atoms (2X). The numbering convention is given in Table 3.1. The left column lists the transmembrane helix.

Figure 5.2: Plot showing distances between the site I Na+ and coordinating oxygen atoms during simE1b. The group of coordinating atoms at a distance near 2.4 Å is indicative of the Na+ ion’s first coordination shell. Atoms near the 4.3 Å distance formed the ion’s second coordination shell. The carboxyl oxygen atoms of ASP776 alternated coordination for the majority of the simulation before relaxing to a more stable orientation.

enzymes activated by Na+ binding [105]. In that study, the number of Na+ ligands ranged from four to six. This simulation indicated coordination by seven atoms, but close inspection of the distances between the site I Na+ and the coordinating oxygen atoms of GLU747 and THR775 indicates that they were effectively competing for coordination, as shown by the fluctuation of coordination distances at 1600 ps and 1800 ps in Figure 5.2. Details regarding the residues that coordinate the site I Na+ ion are listed in Table 5.2 and key differences are discussed below. The simulations agreed well with previous predictions [62] regarding the interactions of key residues, such as GLU747 and ASP776 (ASP800 in SERCA). This result indicated a reliable homology model and stable simulation system.
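
The coordination analysis behind Figure 5.2 can be pictured as sorting ion-oxygen distances into shells and then asking how often each oxygen atom sits in the first shell. The Python sketch below illustrates the idea under stated assumptions: the 3.0 Å and 5.0 Å cutoffs are hypothetical values chosen to bracket the 2.4 Å and 4.3 Å groupings, and the atom labels and coordinates are made up for the example.

```python
import math

FIRST_SHELL_MAX = 3.0   # Å, assumed cutoff bracketing the ~2.4 Å group
SECOND_SHELL_MAX = 5.0  # Å, assumed cutoff bracketing the ~4.3 Å group

def classify_shells(ion_xyz, oxygens):
    """Split oxygen atoms into first and second shell by distance to the ion.

    oxygens: dict mapping a label to (x, y, z) coordinates in Å.
    """
    first, second = [], []
    for label, xyz in oxygens.items():
        d = math.dist(ion_xyz, xyz)
        if d <= FIRST_SHELL_MAX:
            first.append(label)
        elif d <= SECOND_SHELL_MAX:
            second.append(label)
    return sorted(first), sorted(second)

def first_shell_occupancy(frames):
    """Fraction of frames in which each oxygen lies in the first shell.

    frames: list of (ion_xyz, oxygens) pairs, one per saved timeframe.
    """
    counts = {}
    for ion_xyz, oxygens in frames:
        first, _ = classify_shells(ion_xyz, oxygens)
        for label in oxygens:
            counts.setdefault(label, 0)
        for label in first:
            counts[label] += 1
    return {label: n / len(frames) for label, n in counts.items()}

# Two made-up frames in which two oxygen atoms swap shells.
frame_a = ((0.0, 0.0, 0.0), {"GLU747:OE1": (2.4, 0.0, 0.0), "THR775:OG1": (0.0, 4.3, 0.0)})
frame_b = ((0.0, 0.0, 0.0), {"GLU747:OE1": (4.3, 0.0, 0.0), "THR775:OG1": (0.0, 2.4, 0.0)})
print(first_shell_occupancy([frame_a, frame_b]))
```

Two atoms that each show an occupancy near one half, as in this toy input, would be consistent with the competition for coordination described above, whereas a stably bound ligand would show an occupancy close to one.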

Present in SERCA site I coordination, but lacking in this simulation, was site I interaction with GLN891 (GLU908 in SERCA) on transmembrane helix (TM) 8, most likely due to a shift in alignment in this transmembrane helix (Section 3.5) from a consensus alignment of Na+,K+-ATPase, H+,K+-ATPase, and SERCA. This alignment shift moved the residue in the homology model away from binding site I. The site I Ca2+ in the SERCA E1 state was located directly between TM5 and TM6. This simulation, as well as the newly released Na+,K+-ATPase structure with bound K+ congeners, indicated that the site I cation may occupy a location not directly between TM5 and TM6, but more towards site II and away from TM8. This observation would explain why the simulated Na+ ion at site I did not interact with residues located on TM8, namely GLN891. The mutation between glutamine in Na+,K+-ATPase and glutamate in SERCA, besides leaving glutamine with one fewer carboxyl oxygen atom, indicated that this residue may play a reduced role in the binding of the site I cation in the Na+,K+-ATPase as well as in the H+,K+-ATPase.

The overall difference in the Na+,K+-ATPase’s binding cavity due to the presence of a third cation may account for the change in binding coordination, since site III is believed to be located generally between site I and TM8 [106]. Glutamine 891 is essential for Na+,K+-ATPase activity and has been shown to play a role in site III coordination rather than site I as previously believed [67]. Previous homology modeling studies that lacked molecular dynamic simulations implicated GLN891 in Na+ site I coordination. However, the absence of site I Na+ ion coordination by GLN891 is supported by functional studies and corroborated by the results of the simulation approach of this research. Site I coordination by ASP772 in simE1b was not found in SERCA, but this residue coordinates site II in the simulation. As shown in Figure 5.1, the proximity of ASP772 to both ions implies that minor perturbations in the binding cavity may lead to this discrepancy. ASP772 is required for normal Na+ transport [107]. An observation of these simulations is that the ALA291 residue coordinates the site I Na+ in simE1b (as well as in simE1c, discussed below). In SERCA, the homologue of ALA291 coordinates the site II ion [108]. Figure 5.1 shows the proximity of ALA291 to both ions in the E1 homology model. While this residue’s location clearly indicates its significance to the cation binding cavity, further study is needed to better determine the exact nature of its role in the Na+,K+-ATPase.

There exists good agreement between the simulation results and previous research from structural, functional, and modeling studies. However, the proposed roles of several residues in Na+ binding, namely SER743, ASN744, and especially GLN891, deserve further refinement.

5.2.3 Na+ Site II Binding Site

The location of the site II Na+ ion binding site is generally accepted to be between TM5 and an unwound section of TM4 corresponding to the PEGL motif [62]. In simulation simE1b, the site II Na+ ion was coordinated by carbonyl oxygen atoms of VAL290 (304) and VAL293 (307) along with two pairs of carboxyl oxygen atoms contributed by ASP772 (796) and GLU295 (309) as shown in Table 5.3. The homologues of both valine residues, the glutamate residue, and the aspartate residue form the binding coordination shell for the site II Ca2+ ion in the X-ray structure of SERCA. This result indicates strong agreement between the site II binding configuration of SERCA and homologous residues in the Na+,K+-ATPase.

Figure 5.3: Plot showing distances between the site II Na+ ion and coordinating oxygen atoms during simE1b. The group of coordinating atoms at a distance near 2.4 Å is indicative of the Na+ ion’s first coordination shell.

The binding residues in the simulation were also the same as those predicted in work by both Rakowski and Ogawa [38, 59]. The binding site is slightly different because that work also predicted coordination by ALA291 (330) and ASP776 (815). However, in the simE1b simulation, ALA291 coordinates site I rather than site II. The carbonyl oxygen atom lies less than 4 Å from the ion at site II, indicating that a minor perturbation in the transmembrane backbone structure may cause the alanine residue to coordinate either ion. Similarly, the ASP776 residue (ASP800 in SERCA),

Table 5.3. Residues Coordinating the Na+ Ion at Site II.

TM   simE1b       simE1c       SERCA E1     Ogawa HM
2    –            PHE107(O)    –            –
4    VAL290(O)    VAL290(O)    VAL304(O)    VAL329(O)
4    –            –            ALA305(O)    ALA330(O)
4    VAL293(O)    VAL293(O)    ILE307(O)    ALA332(O)
4    GLU295(2X)   GLU295(X)    GLU309(X)    GLU334(X)
6    ASP772(2X)   ASP772(2X)   ASN796(X)    ASP811(X)
6    –            –            ASP800(X)    ASP815(X)

Note: SimE1b has no water molecules in the binding cavity, while simE1c had water added as discussed in the text. The rightmost columns are the SERCA E1 structure [89] and other homology modeling work [59]. Table conventions are the same as those described in Table 5.2.

which coordinates site I and site II ions in SERCA (and the site I Na+ ion in simE1b), is only 4.2 Å from the site II Na+ ion in simE1b. ASP776 is required for normal cation transport in the Na+,K+-ATPase [107] and its proximity to both ions is shown in Figure 5.1. Though ALA291 is known to affect cation transport [109], its proximity to both site I and site II means that its role is not clearly defined and needs further investigation. Does this residue always bind the same ion, or is it simply a piece of a generalized binding cavity?

Table 5.4. Residues Coordinating the Na+ Ion at Site III.

TM   simE1b       simE1c         Rakowski HM   SERCA E1     Ogawa HM
5    TYR739(X)    –              TYR778(X)     TYR763(X)    TYR778(X)
6    GLY774(O)    (GLY774(O))    GLY813(O)     VAL798(O)    GLY813(O)
6    THR775(O)    (THR775(O))    THR814(O)     THR799(X)    THR814(O)
9    GLU922(2X)   GLU922(O,2X)   GLU961(X)     SER940       GLU961(X)
9    –            THR923(X)      –             –            –
NA   –            1 H2O          2 H2O         2 H2O        2 H2O

Note: Residues involved in coordinating the Na+ at site III. Table conventions are the same as those described in Table 5.2. Residues for SERCA [89] are direct homologues of the residues predicted by the homology modeling results produced separately by Rakowski [38] and Ogawa [59].

5.2.4 Na+ Site III Binding Site

Although the binding residues of the first two Na+ binding sites are supported by multiple mutagenesis studies [6], the exact nature of the third binding site has remained elusive and has had multiple proposals [110, 59, 38, 106], one of which is located among TMs 5, 6, 8, and 9. In simE1b, the ion added to investigate site III was coordinated by the residues listed in Table 5.4. These residues matched results from other homology modeling work [38, 59]. The MD simulation was able to determine these residues without inclusion of water molecules in the binding cavity. Therefore, a strong advantage of this methodology is that it does not require a priori knowledge of the number, location, or orientation of water molecules to accurately predict cation binding residues. This result is significant because the site III Na+ is believed to be coordinated with two water molecules [38]. Two mutagenesis studies have implicated the GLU922 residue in a Na+ specific site [67, 6]. TYR739PHE, GLY774ALA, THR775ALA, and GLU922 mutants changed the affinity for intracellular and extracellular Na+ and strongly suggest that the mutations altered the third Na+ binding site [106]. The results from simE1b corroborated these experiments regarding the involvement of these residues for Na+ site III and suggested that the third Na+ ion is coordinated by TYR739, GLY774, THR775, and GLU922. The exact nature of this binding site is complicated by the strong possibility of water interaction. A more thorough analysis is called for regarding these four residues, as well as the TYR739 residue, and various scenarios of water and ion binding configurations. Replicating the exact role of water interaction would in turn help clarify protein residue interactions. Simulation analysis is a strong candidate for this because, unlike many experimental methods, explicit water molecules may be studied. Finally, this analysis calls for further investigation to determine what, if any, passage exists between the site III cavity and the cavity for sites I and II.

5.3 Na+ Ion Binding with Water in E1 Conformation

To determine the effect of water molecules on the binding configuration of the E1 Na+,K+-ATPase, three Na+ ions and three water molecules were added in the binding cavity to set the initial conditions for simE1c. The three Na+ ions had the same starting coordinates as in simE1b, but water molecules were introduced between the ions at sites I and II, between sites II and III, and near site III. This configuration was chosen to investigate the hypothesis that the site I Na+ ion is coordinated with one water molecule and the site III Na+ ion is coordinated with two water molecules [38, 111]. An equilibration phase of energy minimization and 20 ps of MD simulation with the protein backbone and Na+ ions restrained was performed to remove steric overlaps and allow water molecules to relax.

5.3.1 Na+ Sites I and II with Water

As shown in Tables 5.2 and 5.3, the protein coordination of the site I and site II ions in simE1c with water was very similar to the coordination found in simE1b without water. One water molecule coordinated with the ion closest to site III while two water molecules coordinated with the ion at site I. One of these water molecules near site I took a position near the location of the ASP772 side chain. This aspartic acid residue dually coordinated site I and II Ca2+ ions and Rb+ (a K+ congener) ions in the SERCA E1 and Na+,K+-ATPase E2P X-ray crystal structures, respectively [89, 58]. The aspartic acid residue also coordinated both Na+ ions in the E1 simulation without water molecules (simE1b). However, with water (simE1c), the side chain was pushed towards the site II ion, explaining the absence of coordination between ASP772 and the site I ion in simE1c. This movement in turn allowed only one carboxyl oxygen atom of the GLU295 residue to coordinate the site II ion. In simE1c, the site I and II ions were 4.5 Å away from each other. In the E1 SERCA crystal structure, this distance was 5.7 Å, which was due to the site I Ca2+ ion’s location directly between TM5 and TM6. As shown in Figure 5.1, the site I Na+ took a position closer to the TM4 helix. This binding configuration was very similar to that of the X-ray structure of Na+,K+-ATPase with bound K+ congeners. SimE1c predicted a site I binding location not directly between TMs 5 and 6 as in SERCA, but slightly towards TM4 and site II. This cation binding location was very similar to that found in the K+ (Rb+) binding sites of the Na+,K+-ATPase E2P crystal structure [58].

The Ca2+ ion at site I of SERCA was coordinated with a water molecule and this result was supported by related homology model work by Ogawa [59]. While the crystallographic result shows that Ca2+ in this site was coordinated with a water molecule, it is possible that the homologous site in the Na+,K+-ATPase may have a different construction. The change in position from the confined space between TM5 and TM6 also suggested that two water molecules may coordinate the site I Na+ ion, as indicated by simE1c.

5.3.2 Na+ Site III Binding with Water

During simE1c, the water molecule added near the site III Na+ took the place of the coordinating oxygen atom of THR775, which pushed this Na+ ion towards GLU922 and had the subsequent effect of introducing coordination with TYR923. THR775 then coordinated the ion via the water molecule. The site III ion began close to TYR739, which was predicted to bind Na+ in other homology modeling work [59], but moved up to 7 Å away and remained at this distance. The introduction of TYR923 appeared to come at the expense of TYR739’s involvement. Distances of all 5 oxygen atoms involved in site III Na+ coordination (not shown) settled to the 2.30 to 2.35 Å range nearly immediately during the simulation. The relatively large size of the putative site III binding cavity made determination of water molecule positions especially difficult because of the wide range of possible locations and orientations. Besides the difference in tyrosine coordination, the pair of simulations with and without water agreed very well and corroborated previous homology modeling work [38, 59]. Further discussion of open questions regarding the nature of site III was given in Section 5.2.4.

5.3.3 Water Involvement in Na+ Binding Sites

In the SERCA E1 structure, there are two water molecules which coordinate the site I Ca2+. In the E2 structure, there exist five water molecules in the binding cavity, presumably in a state after Ca2+ release [90]. No water molecules were resolved in the Na+,K+-ATPase structure [58]. There is little consensus regarding the number of water molecules involved in ion binding in the major conformational states and among the P-type ATPases in general. Yet, cation binding site affinity is highly dependent on water involvement [94] and cation selectivity may be a product of subtle differences in positions of side chains and water molecules [58]. Furthermore, carbonyl and carboxyl oxygen atoms may indirectly coordinate a cation via a water molecule. Therefore, a particular amino acid’s importance to ion binding (as revealed, for instance, by mutagenesis experiments) may come by way of an oxygen-water-cation arrangement. This situation complicates the process of accurately including water molecules in simulations. Information regarding water involvement in binding sites has generally been a product of X-ray crystallography structures because of the difficulty in determining water involvement by techniques such as electrophysiological experiments.
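
The distinction between direct coordination and an oxygen-water-cation arrangement can be expressed as a simple distance test. The following Python sketch is one hedged way to do so: a protein oxygen within a cutoff of the ion is counted as direct, and a protein oxygen within the same cutoff of an ion-bound water oxygen is counted as indirect. The 3.0 Å cutoff, the function name, and the coordinates are assumptions made for illustration, and hydrogen positions are ignored.

```python
import math

CUTOFF = 3.0  # Å, assumed contact cutoff for both ion-oxygen and water-oxygen pairs

def coordination_modes(ion_xyz, protein_oxygens, water_oxygens, cutoff=CUTOFF):
    """Return (direct, indirect) lists of protein oxygen labels.

    protein_oxygens: dict label -> (x, y, z) in Å.
    water_oxygens: list of (x, y, z) water oxygen positions in Å.
    """
    direct = {label for label, xyz in protein_oxygens.items()
              if math.dist(ion_xyz, xyz) <= cutoff}
    # Only waters that themselves coordinate the ion can mediate a contact.
    bound_waters = [w for w in water_oxygens if math.dist(ion_xyz, w) <= cutoff]
    indirect = set()
    for label, xyz in protein_oxygens.items():
        if label in direct:
            continue
        if any(math.dist(xyz, w) <= cutoff for w in bound_waters):
            indirect.add(label)
    return sorted(direct), sorted(indirect)

# Made-up geometry: one direct ligand, one water-mediated ligand, one distant atom.
ion = (0.0, 0.0, 0.0)
protein = {
    "GLU922:OE1": (2.3, 0.0, 0.0),
    "THR775:O": (0.0, 5.0, 0.0),
    "TYR739:OH": (0.0, 0.0, 7.0),
}
waters = [(0.0, 2.4, 0.0)]
print(coordination_modes(ion, protein, waters))
```

In the tables of this chapter, residues in the second list would correspond to entries shown in parentheses.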

SimE1c investigated the role of water in the Na+ binding sites in the E1 conformation. The site I ion was located between TMs 4, 5, and 6, which allowed the ion space to coordinate with two water molecules. Overall, protein ligation of the site I and site II ions was very similar to the simulation without water (simE1b). Inclusion of water molecules in some cases caused indirect coordination by amino acids which had directly coordinated ions in simE1b. With water molecules added manually in the binding cavity, the overall ion coordination may be reliant in part on the initial locations of the protein, water molecules, and ions, even with relatively long timescale simulations. In particular, the site III ion binding configuration indicated that a water molecule may change binding coordination. Further study is required to determine the reliability of water involvement via MD simulations such as those presented here. Despite this drawback, the difficulty of garnering water involvement data via experimental methods implies that computational work such as that presented here may provide a new method for establishing water involvement in cation binding sites of P-type ATPases.

5.4 K+ Binding Sites Determined by Simulation of E2 Conformation

SERCA has been crystallized in an E2 conformation (PDB ID: 2AGV) [90], which represents the protein after uptake of extracellular cations as illustrated in Figure 1.4. The counterparts of Ca2+ ions in SERCA are H+ ions, which cannot be resolved by X-ray diffraction. Additionally, SERCA may transport either 2 or 3 protons per cycle [112]. Therefore, the detailed cation binding configuration of the E2 state is not clear. In order to investigate the Na+,K+-ATPase binding sites in the E2 conformation, two K+ ions were placed at the geometric centers of residues ASN744, GLU747, THR775, ASP776, and VAL888 for site I and VAL290, ALA291, VAL293, GLU295, ASP772, and ASP776 for site II. These residues’ homologues coordinated Ca2+ in the SERCA E1 structure. Because of the lack of resolved bound cations in the E2 structure, the E1 binding residues provide a reasonable starting point because the binding residues are believed to be shared in the E1 and E2 conformations [1]. Simulations were executed in the same manner as those described in Section 5.2.1. Residues that coordinated K+ ions during the simulations are shown in Tables 5.5 and 5.6.

Table 5.5. Residues Involved in Coordinating the K+ Ion at Site I.

TM   simE2b         simE2c         Na+,K+-ATPase E2P   Ogawa HM
5    –              ILE771(O)      –                   –
5    –              THR740(O,X)    THR772(O,X)         –
5    –              –              LEU773(O)           –
5    –              SER743(O,X)    SER775(O,X)         SER782
5    –              ASN744(X)      ASN776(X)           –
6    THR775(O,X)    –              –                   –
6    –              –              GLU779(2X)          GLU786
6    –              –              –                   ASP811
6    ASP776(2X)     ASP776(2X)     ASP808(X)           ASP815
8    PHE884(X)      –              –                   –
8    –              –              –                   GLN930

Note: Residues determined by other homology modeling work are shown in the rightmost column [59]. The initial position of the ion has a significant effect on coordination, as evidenced by the results of simE2b and simE2c. Table conventions are the same as those described in Table 5.2.

Table 5.6. Residues Involved in Coordinating the K+ Ion at Site II.

TM   simE2b          simE2c          Na+,K+-ATPase E2P   Ogawa HM
4    (VAL290(O))     (VAL290(O))     VAL322(O)           VAL329
4    –               –               –                   ALA330
4    (VAL293(O))     VAL293(O)       VAL325(O)           VAL332
4    (GLU295(X))     GLU295(X)       GLU327(2X)          –
5    (ASN744(X))     (ASN744(N))     ASN776(X)           ASN783
6    –               –               GLU779(X)           GLU786
6    LEU769(O)       (LEU769(O))     –                   –
6    ASP772(2X)      ASP772(X)       ASP804(2X)          ASP811
NA   3 H2O           2 H2O           –                   1 H2O

Note: Residues in parentheses coordinate the ion indirectly. An open intracellular pathway facilitated water movement to site II. Residues determined by the Na+,K+-ATPase X-ray structure [58] and other work [59] are shown. Table conventions are the same as those described in Table 5.2.
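
The initial ion placement used in Section 5.4 relies on the geometric center of a set of residues. The Python sketch below shows one plausible reading of that step, in which the site position is taken as the center of the per-residue centers. Whether the original placement averaged residue centers or all atoms directly is not stated, so this choice, the function names, and the toy coordinates are assumptions.

```python
def geometric_center(points):
    """Unweighted mean position of a list of (x, y, z) points."""
    if not points:
        raise ValueError("at least one point is required")
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def initial_ion_position(residue_atoms, site_residues):
    """Center of the geometric centers of the residues that define a site.

    residue_atoms: dict mapping a residue label to a list of atom coordinates (Å).
    site_residues: labels of the residues that define the site.
    """
    centers = [geometric_center(residue_atoms[name]) for name in site_residues]
    return geometric_center(centers)

# Toy coordinates (Å) standing in for two residues of a binding site.
toy_residues = {
    "ASP776": [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    "GLU747": [(0.0, 4.0, 0.0)],
}
print(initial_ion_position(toy_residues, ["ASP776", "GLU747"]))
```

Averaging residue centers gives each residue equal weight regardless of its atom count, which keeps a large side chain from dominating the starting position.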

Figure 5.4: Solvated pathway extending from cytoplasm to binding site II in the 2AGV conformation. Water in blue surface representation and VDW representation for three water molecules that coordinated the K+ at site II. Transmembrane helices are shown in ribbon form and are colored as follows: TM1 green, TM2 blue, TM3 purple, TM4 red, TM5 orange, TM6 yellow. ASP772 (GLU295) is shown in yellow (red) stick representation with carboxyl oxygen atoms as red spheres. VAL293 (not shown) is behind the site II K+ ion. Lipid molecules are shown in VDW spheres with grey tails, and red and brown phosphate heads. The volume of water shown connects with the cytoplasm (not shown) to the lower right.

5.4.1 K+ Site II Binding

Coordinating residues of the site II K+ ion in the E2 conformation are shown in Table 5.6 and indicate a strong agreement between the simE2b result and the Na+,K+-ATPase E2P crystal structure [58]. Only one residue in the crystal structure did not coordinate ions in the simulation. The sidechain of residue GLU747 (779 in their numbering), which also coordinated the site I ion in the X-ray structure, was directed towards the lumen, away from the binding cavity. Water molecules initially located in the permeated region moved during the simulation to coordinate with the K+ near site II. Water on the cytoplasmic side had permeated into the protein region among TMs 1-4 during the equilibration phase. It is possible that simulation of the homology model based on the E2 state has yielded a more open intracellular pathway by relaxing towards the E1 state. For the majority of the simulation, the K+ ion was coordinated by two water molecules. A third water molecule coordinated the site II K+ ion for the last 200 ps of simE2c. The permeated region corresponded to a great degree with the solvated intracellular pathway suggested by the E1 conformation simulations of SERCA (Section 4.4.1) and the Na+,K+-ATPase (Section 5.5.1). This region has been proposed as an intracellular ion pathway in SERCA due to its structural accessibility with site II [89]. Though the ion was well-solvated, extended simulations (not included in Table 5.6) of up to 4 ns, as well as applied electric fields, were unable to break the coordination between the ion and the ASP772 residue. Overall, the simulation methodology produced results that strongly corresponded to the Na+,K+-ATPase crystal structure. Water permeation suggested that a solvated intracellular pathway among TMs 1-4 leads to binding site II.

Figure 5.5: Beginning (a) and end (b) of simE2c. Site I and II K+ ions are shown with nearby residues. Key residues are labeled. Thick residues are VAL290 and VAL293. SimE2b and simE2c indicated that these valines initially coordinated the site II K+ ion, but were most likely to be replaced in the coordination shell by water molecules from the putative intracellular pathway. Water movement shown may be representative of the first steps of intracellular K+ release (towards the lower right of the image).

5.4.2 K+ Site I Binding

During simE2b, the K+ ion at site I moved close to TM8, near the putative Na+ site III. This was most likely because the initial location of this ion was based on the SERCA site I Ca2+ binding residues (see Section 5.4). Of the six residues that bind K+ site I in the Na+,K+-ATPase structure, only 3 (ASN776, GLU779, and ASP808) have homologues which bind Ca2+ site I in the SERCA E1 conformation. This observation indicated that there may be significant differences in the specific coordination structures among P-type ATPases. The K+ ion that began among the putative site I binding residues moved over 10 Å away and was coordinated by residues similar to those of the site III Na+ ion in simE1b (Table 5.5). Therefore, a second simulation (simE2c) was run with the same initial conditions except that the position of the ion was adjusted 3.6 Å to be located more directly between TMs 5 and 6 in accordance with the Na+,K+-ATPase X-ray structure. All other initial conditions prior to EM and 20 ps of PR MD were the same. The site II K+ had identical protein binding residues, although a slight reorientation of the two coordinating water molecules caused very minor differences in the exact coordination shell as listed in the coordination table. In this case the site I K+ ion maintained its position (Figure 5.5) between TMs 5 and 6, which generally agreed with the Na+,K+-ATPase X-ray structure, although in the X-ray structure the ions are essentially touching. The 3.6 Å difference in initial location is small considering that the Van der Waals radius of K+ is 2.75 Å. The result indicated the importance of appropriate placement of ions and subsequent validation of ion binding sites in this type of investigation using MD simulation. This simulation also indicated how water molecules may penetrate site II in order to solvate this ion for intracellular release. The K+ site I results of simE2c corroborate the overall binding conformation elucidated by the Na+,K+-ATPase X-ray structure.

5.5 Steric Investigation of Ion Permeation Pathways

The protein analysis tool, CAVER, was used to determine buried pathways from locations inside the protein, specifically the ion binding cavity, to the outside of the protein. CAVER explored the three-dimensional protein structure for unoccupied space having a minimum diameter of 1 Å. The space was followed along a continuous pathway and success declared if the program managed to find a vacant route from the binding site (starting point) to a point on either the inside or outside surface of the protein in contact with the aqueous medium. Details of CAVER analysis are given in Section C.9. This pathway need not be large enough to unequivocally accommodate a Na+ or K+ ion. Interaction between the protein and ion can facilitate ion transport through narrow regions, as depicted, for instance, in the K+ channel [35]. CAVER analysis was performed separately on 5 ns of the E1 unrestrained simulation (simE1e) and 5 ns of the E2P unrestrained simulation (simE2Pb). Each showed a stabilized backbone structure according to RMSD of the backbone carbon atoms (Figure 3.5). All timeframe coordinates from the simulation were superposed onto the original timeframe using the α-carbon atoms of the transmembrane helices. This step removed pathway delocalization due to protein translation (predominantly in the membrane plane) during the simulation. From the full 5 ns trajectory, timeframes at 25 ps intervals were extracted for CAVER analysis.
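
The 1 Å minimum-diameter criterion described above can be illustrated with a simple steric test on a candidate path. The Python sketch below is not CAVER's algorithm. It only checks, for a path that is already given as a list of points, whether the narrowest point leaves room for a probe of the required diameter. The atom radii, coordinates, and function names are assumptions made for the example.

```python
import math

def clearance(point, atoms):
    """Largest probe radius (Å) that fits at a point without touching any atom.

    atoms: list of ((x, y, z), vdw_radius) pairs, all in Å.
    """
    return min(math.dist(point, xyz) - radius for xyz, radius in atoms)

def path_bottleneck(path, atoms):
    """Smallest clearance radius (Å) found along a list of path points."""
    return min(clearance(p, atoms) for p in path)

def is_open(path, atoms, min_diameter=1.0):
    """True if the narrowest point of the path admits the minimum diameter."""
    return 2.0 * path_bottleneck(path, atoms) >= min_diameter

# Toy geometry: a straight path along x passing between two atoms.
path = [(-1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
wide_gap = [((0.0, 2.5, 0.0), 1.5), ((0.0, -2.5, 0.0), 1.5)]
narrow_gap = [((0.0, 1.8, 0.0), 1.5), ((0.0, -1.8, 0.0), 1.5)]
print(is_open(path, wide_gap), is_open(path, narrow_gap))
```

Applying such a test to each extracted timeframe separately would show how often a route is sterically open over the trajectory, which is the kind of per-frame statistic discussed in the following sections.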

5.5.1 E1 Conformation

Because of the difficulty of investigating intracellular ion transport via experimental methods, significant debate remains regarding the intracellular Na+ pathway [106]. The E1 conformation was analyzed in an attempt to determine an intracellular pathway from the cytoplasm to the binding cavity. CAVER was run using approximate locations of each of the three putative binding sites as starting positions. Of the pathways originating from the area of the binding cavity associated with site I, the vast majority moved through site III, indicative of an open volume between the two locations. With a CAVER origin near site II, the majority of paths moved towards the cytoplasm between TMs 2 and 3 and under the kink of TM 1 near the intracellular side of the lipid membrane. This result supported a similar proposal for the SERCA intracellular pathway based on structural data [89] and it may be possible that other P-type ATPases share this characteristic. This result was essentially the same as the CAVER analysis of the SERCA simulation in E1 conformation (sim. E1; Figure 4.2). Site III analysis was more limited. Due in part to the location’s proximity to the membrane, CAVER was unsuccessful in finding additional pathway information, rendering it unable to support or contradict a putative pathway near the TM6-7 loop [106, 113]. Computational methodologies involving MD simulation and CAVER pathway analysis were able to add validity to previous claims regarding the intracellular ion pathway in Na+,K+-ATPase. The data support a solvated vestibule which leads from the A domain towards site II.

Figure 5.6: Two views of the transmembrane region with two pathways (pathA in blue and pathB in green) representative of all extracellular pathways found from CAVER analysis of simE2Pb. View in b) is rotated approximately 45 degrees to the right from view in a). Paths diverge around GLU747, which was shown to bind Ca2+ in the SERCA E1 structure. TM 2 has been removed for clarity. Residues known to affect ion transport and/or binding are shown in licorice representation. With corresponding colors, they are: TM2 (tan) LEU65; TM3/4 loop (green) GLU275, TYR276, TRP278; TM4 (red) GLU280, ILE283, PHE284, GLY287, VAL290, GLU295, GLY296; TM5 (orange) TYR739, LEU741, SER743, GLU747, PRO750; TM5/6 loop (cyan) ALA757, PRO760, LEU761, PRO762, GLY764; TM6 (yellow) THR765, VAL766, ASP772, ASP776; TM7/8 loop (blue) ASP852, ASP853. Phosphorus atoms from lipid heads, indicating the extent of the membrane, are shown as grey spheres.

5.5.2 E2P Conformation

The most straightforward view of ion permeation through the transmembrane region is that of a single pathway from the cytoplasm to the extracellular space with a central binding cavity. On the extracellular side, this hypothesis is supported by the observation of sequential Na+ release [95, 114]. Out of 1000 CAVER trials (50 per timeframe) that originated near site II, only one was successful in finding a pathway to the extracellular space. This may be interpreted to mean that CAVER was unable to find a pathway leading away from site II in the E2P Na+,K+-ATPase model. This result is believed to be due in part to the SERCA structure used as template for the E2P Na+,K+-ATPase homology model. Because this is a ‘pseudo’ E2P structure, the extracellular pathway is not fully open [91, 102]. The single CAVER pathway extended from site II adjacent to TMs 1 and 2, similar to SERCA results (Section 4.4.2). Starting CAVER analysis from a position midway between sites I and III yielded several dozen pathways which extended between TMs 4, 5, and 6. Near the lumen, the paths (shown in Figure 5.6) diverged into two, designated pathA and pathB. Residues in the TM7/8 loop are known to affect Na+ transport but not K+ [115], while on the other side of the lumen the TM3/4 loop was found to be a strong determinant of K+ affinity, but not Na+ affinity [116]. Both pathways were adjacent to PHE284 at the N-terminus of TM4, which has been proposed as an external gate to the extracellular pathway [96]. The pathways were strongly correlated with Na+,K+-ATPase residues and their homologues in SERCA and the H+,K+-ATPase which are known to affect ion transport [73, 115, 116, 68, 98, 96, 117, 118, 67, 119, 109].

The consolidated grouping of extracellular pathways originating near site III supported the proposal that the Na+ ion from site III is the first to be released [106]. This conclusion is suggested because release of the first Na+ ion is associated with a large charge movement and is followed by binding site reorganization [120]. The high-field access pathway to the Na+ site is modified and the remaining two Na+ ions are released with little charge translocation [120]. The consensus pathway is also in agreement with a recently released SERCA structure (PDB ID: 3B9B) in which the protein adopted an open lumenal pathway [102]. Although this structure was released too recently (December, 2007) for detailed inclusion in this work, it is discussed in the recommendations for future work (Section 6.2.2). Visual inspection of this structure indicated a wide pore between TMs 4, 5, and 6 which stretched from the binding cavity to the lumen. This experimental result strongly agreed with the CAVER pathway analysis on the simulations of the Na+,K+-ATPase.

5.6 Electrostatic Analysis of E1 and E2P Models

Electrostatic analysis of the simulation space was performed for the E1 (simE1f) and E2P (simE2Pc) conformations. For each simulation, 3 ns of restrained MD was run starting with the atomic coordinates and velocities from the last timeframe of the fully unrestrained equilibration simulations (simE1a and simE2Pa, respectively). Protein backbone atoms were restrained as described in similar work [51]. Simulations were run with an applied electric field which simulated a negative potential inside the cell. An electric field of 0.0528 V/nm over the 3.8 nm membrane provided a 200 mV potential to the entire 16.8 nm simulation axis normal to the membrane. A choice of potential slightly larger than that found in vivo allowed the electrostatic landscape under investigation to be intensified. The potential is well within the bounds of simulation methodologies [51]. A Particle Mesh Ewald (PME) approach was utilized via a modified version of the PMEpot VMD software plugin kindly provided by Marcos Sotomayor [51]. The arrangement of atoms resulting from the applied field provided the reaction potential, which was calculated from the atomic charge coordinates by solving Poisson’s equation. The modified PMEpot plugin augmented this potential by summing it with a graduated applied potential to determine the total resultant potential throughout each simulation system. Cross-sections of electrostatic data through the membrane were inspected to ensure that a stable system with applied field was obtained. The first ns of each simulation was treated as an equilibration period and was therefore excluded from the analysis.

Electrostatic potentials were calculated on a grid of 112 x 112 x 132 points, allowing a grid density of one point per 1.7 Å³. Point charges were delocalized with an Ewald smoothing factor of 0.25. Each 2 ns of simulation produced 1000 timeframes at 2 ps intervals. Further details of the analysis are provided in Section C.10.
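The figures quoted above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch and is not part of the original analysis scripts: the variable and function names are illustrative, and it assumes the applied field is uniform, so that the potential drop is simply the field multiplied by the membrane thickness.

```python
# Cross-check of the applied-field and sampling figures quoted in this section.
# All names are illustrative; the numerical values are taken from the text.

FIELD_V_PER_NM = 0.0528        # applied electric field
MEMBRANE_THICKNESS_NM = 3.8    # membrane thickness
TOTAL_RUN_PS = 3000            # 3 ns of restrained MD
EQUILIBRATION_PS = 1000        # first ns discarded as equilibration
FRAME_INTERVAL_PS = 2          # interval between saved timeframes


def potential_drop_mv(field_v_per_nm, thickness_nm):
    """Potential drop in mV across a slab, assuming a uniform field."""
    return field_v_per_nm * thickness_nm * 1000.0


def analysed_frames(total_ps, equilibration_ps, interval_ps):
    """Number of timeframes left after discarding the equilibration period."""
    return (total_ps - equilibration_ps) // interval_ps


drop_mv = potential_drop_mv(FIELD_V_PER_NM, MEMBRANE_THICKNESS_NM)
n_frames = analysed_frames(TOTAL_RUN_PS, EQUILIBRATION_PS, FRAME_INTERVAL_PS)

print(f"potential drop across membrane: {drop_mv:.1f} mV")
print(f"timeframes analysed: {n_frames}")
```

Under these assumptions the drop across the membrane comes to roughly 200 mV, and the 2 ns retained after equilibration correspond to the 1000 timeframes stated above.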

5.6.1 Electrostatic Pathway Analysis

Several pertinent observations were made from the electrostatic data. In the E1 conformation, a region of large negative potential corresponding to the cation binding cavity in the transmembrane region was evident. As shown in Figures 5.7 and 5.8, there is a region of strong negative potential that leads from the intracellular A domain to the binding cavities. This pathway between TMs 1 and 4 was postulated based on the X-ray crystal structure of SERCA [89]. The electrostatic analysis corroborated CAVER pathway data for the E1 form of Na+,K+-ATPase (Section 5.5.1) and water solvation analysis for the E2 form of Na+,K+-ATPase (Section 5.4), as well as the E2 form of SERCA (Section 4.4.1). This region is in close proximity to TM1, which has been proposed to be responsible for closing the intracellular gate [122, 123]. This research has supported the theory that this is the most likely cation access/egress pathway for at least the binding cavity comprising sites I and II in the Na+,K+-ATPase and possibly other P-type ATPases.

Figure 5.7: Three-dimensional landscape of a slice of electrostatic data which shows the negativity of the postulated intracellular pathway in the E1 conformation. The slice cuts between TM1 and TM4 and crosses the cation binding pocket. Intracellular domains (N, P, and A) are labeled; dashed line indicates putative intracellular cation pathway. A 200 mV electric field in the direction of the cytoplasm was applied during the 3 ns simulation. Color scale: -1.0 V to 0.5 V; scale bar: 1 nm. Image created with OpenDX [121].

Figure 5.8: Six cross sections, a) through f), of the electrostatic landscape are shown with their locations indicated on the protein display at right. Lack of a lumenal cation pathway in this E1 conformation is evident. Horizontal black lines indicate approximate extent of the lipid membrane. Data slice in c) corresponds to data displayed in Figure 5.7. Color scale: -1.0 V to 0.5 V.

The closed lumenal pathway is evidenced by the high potential region separating the binding cavity from the lumenal space, as shown in Figures 5.7 and 5.8. The lack of a pathway is due in part to the lack of solvated space between the binding cavity and lumenal space. In the simulation based on the SERCA structure resembling the E2P conformation, electrostatic analysis also depicted a closed state. The absence of a clear lumenal pathway (as shown in Figure 5.9) in the electrostatic analysis of the Na+,K+-ATPase in E2P form supports CAVER pathway analysis. Future work (Section 6.2.2) discusses the potential of a new SERCA structure to shed light on this situation.

Figure 5.9: Six cross sections, a) through f), of the electrostatic landscape from the simulation of Na+,K+-ATPase in the E2P form. Cross section locations indicated on the protein in cartoon representation. Cross section ‘c’ most closely resembles the cross section depicting an open intracellular pathway in Figure 5.8. Color scale: -1.0 V to 0.5 V.

5.6.2 Electrostatic Binding Site Analysis

Though the electrostatic analysis was successful in suggesting putative intracellular and lumenal pathways, the ability of this method to accurately determine the location of cation binding sites appeared to be more limited. Cation binding affinity may be strongly affected by minor changes in the orientations of amino acid side chains and involvement of water molecules [58]. This viewpoint was supported by previous cation valence calculations on SERCA using the VALE analysis software [101]. Because these methods rely heavily on binding site affinity, and therefore on accurate localization of binding sites and orientations of amino acid side chains, determination of binding site location via electrostatic analysis may prove difficult. Thermodynamic fluctuations delocalize the effects of the negatively charged binding residues involved in cation binding. Additionally, the interaction between the ions and side chains is absent; side chains will not orient themselves as they would if an ion were present. Electrostatic analysis has therefore proved of limited use in depicting the specific locations of ion binding sites, but has been shown to have substantial merit in providing details regarding ion access and egress pathways.

5.7 Summary

This chapter has presented simulation study results on the Na+,K+-ATPase. The methodologies used here were extended from previous work on SERCA (Chapter 4, [85]) and are discussed in detail in Appendix C. The Na+,K+-ATPase was studied by incorporating new Na+,K+-ATPase homology models into biologically relevant simulation systems with a lipid bilayer membrane and water. Equilibration and production MD simulations were discussed. An emphasis was placed on deriving results from the production run trajectories to investigate the nature of the cation binding sites and ion permeation pathways. Due to the physiological conditions under which these homologous SERCA structures were crystallized, E1 and E2 simulations were performed primarily to investigate Na+ and K+ binding, respectively, while the E2P simulations were run to investigate the lumenal ion pathway.

The first analysis focused on determination of the Na+ binding sites using the simulation of the E1 conformation. Specific protein amino acids that coordinated each of the three Na+ ions were discussed; overall, the results agreed in large part with experimental data (e.g. [59, 98]). The Na+ ions at sites I and II occupied positions similar to those of the bound Ca2+ ions in SERCA structural data [59]. The elusive third Na+ ion binding site was determined by this method to exist in a location that was supported by functional experimental data [106]. The involvement of water in cation coordination at the binding sites was also discussed. K+ ion binding sites were studied using the simulation of the Na+,K+-ATPase model in the E2 conformation. The amino acid residues found to coordinate these ions agreed not only with functional experimental data (e.g. [96]), but also with the recently released Na+,K+-ATPase structure in the E2P form with bound K+ congener [58]. This result strongly suggests that the methodology for ion binding site determination discussed here is especially applicable to the further study of Na+,K+-ATPase and related proteins.

Steric pathway analysis was completed on MD simulation trajectories of Na+,K+-ATPase models in the E1 and E2P conformations. This approach was designed to facilitate determination of the intracellular and lumenal ion permeation pathways, respectively. A putative intracellular pathway was found by this method which agreed with that found using the same approach with SERCA and with the prediction made via the SERCA X-ray structure [89]. The lumenal pathway investigation was also successful in suggesting a pathway, but was more limited because of the physiological state in which the E2P form of SERCA was crystallized. The steric pathway analysis of MD simulation trajectories is particularly appealing because of the difficulty of investigating the intracellular pathway experimentally. Pathway investigations were also completed using calculated electrostatic landscapes of the Na+,K+-ATPase under an applied electric field to mimic the cellular membrane potential found in vivo. Although the detail was too limited to determine binding site locations, ion permeation pathways could be deduced from the electrostatic data. These results strongly agreed with those determined via the CAVER pathway analysis and corroborated experimental findings [59, 72].

Chapter 6

Conclusions and Future Work

6.1 Summary and Conclusions

The limited technological prospects of silicon CMOS device technology for further scaling after 2015 demand that more exotic device technologies also be considered for building the information systems of the future. Given the massive economic potential for growth propelled by ubiquitous wireless devices and embedded artificial intelligence applications, new doorways to revolutionary device designs which have been infeasible in the past must be explored. One avenue that may lead to a new generation of nanoscale sensors, motors, and logic devices is that inspired by biological blueprints. Biomimetic devices of this type are based on biological structures or derived from synthetic processes. Membrane proteins found in the class of P-type ATPases are intriguing candidates for research as versatile nanodevices. Among the advantages over conventional device designs are the small size of the proteins and their ability to self-assemble. Encouraged by these prospects, I have focused this dissertation on the development of novel simulation, modeling, and analysis strategies for P-type ATPases in general.

My dissertation has focused on two charge-transporting membrane proteins in particular: the Na+,K+-ATPase and the Ca2+ transporter, SERCA. Because of their importance in human physiology, P-type ATPases have been studied with a variety of experimental techniques, and a good deal is known about both their structural and functional characteristics. Yet, due to the complexity of these biological macromolecules, many details of their atomic mechanisms remain unclear. SERCA is especially difficult to investigate using typical experimental methods, such as electrophysiology, but much is known of the Na+,K+-ATPase’s functionality. SERCA X-ray crystallography research has provided multiple high resolution structures, while Na+,K+-ATPase crystallography has only recently been forthcoming. Computational analysis such as that presented in this work therefore allows a new approach to link, via genetic homology, vital structural data of SERCA with the extensive functional data on the Na+,K+-ATPase. The advent of increasingly inexpensive computational power has enabled computer simulations of biological macromolecules, such as the P-type ATPase membrane proteins in this research, in an accurate biological environment. This field is undergoing extensive growth in the capabilities of modeling, simulation, and analysis tools. My research has a strong relevance to the electrical engineering and biophysical fields by providing a better understanding of the Na+,K+-ATPase and SERCA from both physiological and nanotechnological standpoints. Specific contributions from this research are listed below:

• Elaborate Na+,K+-ATPase homology models in the E1, E2, and E2P physiological states were developed using input from experimental data, notably from Munson [31] and Sweadner [62], to improve initial automated alignments. The model in the E2 physiological state is the first of its kind. Multiple sources of structural data (namely SERCA from Toyoshima [62, 59, 38] and the Na+,K+-ATPase N Domain [56]) were coupled with results from functional studies as well as related H+,K+-ATPase homology modeling research by Rakowski [38] and Ogawa [59]. These models, novel by themselves, provided the basis for long-time-scale MD simulations.

• Stable MD simulations greater than 5 ns were performed on three systems (E1, E2, and E2P) containing Na+,K+-ATPase homology models. These are the first simulations of the Na+,K+-ATPase in a biologically accurate environment with a lipid bilayer membrane and water. A similar set of simulations was run on SERCA. The trajectories from these simulations have provided a basis for study of cation binding sites and ion permeation pathways in these proteins.

• The binding site locations and specific protein amino acid residues coordinating the three Na+ and two K+ ions have been investigated in the Na+,K+-ATPase. Ca2+ ion binding sites in SERCA were also analyzed. Overall, the residues as determined by simulation agreed very well with experimental evidence of functional studies by, for example, Guennoun [98, 117], and structural studies by Morth [58]. However, the binding sites require further investigation to determine their exact nature including the roles of protein amino acids and water molecules.

• This work has, for the first time, utilized MD trajectories to calculate a highly resolved time-averaged (1000 frames over 2 ns trajectories) three-dimensional electrostatic landscape in Na+,K+-ATPase. Calculations were performed under an applied field (equivalent to a 200 mV membrane potential) to mimic physiological conditions. This method corroborated CAVER’s steric analysis in the investigation of lumenal and intracellular ion permeation pathways.

• My research has used the CAVER steric pathway tool to analyze time-averaged results of trajectories from MD simulations of the Na+,K+-ATPase in the E1 and E2P conformations. This analysis corroborated previous predictions for the intracellular pathway [38] and suggested characteristics of a putative lumenal pathway. Steric pathway analysis was found to agree strongly with electrostatic analysis, thereby providing two supporting methods for ion pathway determination. There is also significant potential for these methods to determine the lumenal pathway of these P-type ATPases using recent Na+,K+-ATPase structural data [58].

• A methodology for the simulation study of P-type ATPases was produced. The steps from homology modeling leading up to analysis were explained in detail regarding the Na+,K+-ATPase and SERCA. The goal was to provide other researchers undertaking related studies with a description of the significant steps required for simulation and analysis. The methodology (provided in detail in Appendix C) is applicable to future work not only on Na+,K+-ATPase and SERCA, but also on P-type ATPases and membrane proteins in general.

• Preliminary studies on a Ca2+ transporting protein, SERCA, were performed. This protein was subjected to similar analysis of cation-coordinating amino acids during long time-scale simulations, and steric pathway analysis was performed on MD simulation trajectories. Ion binding residues and the intracellular permeation pathway corroborated structural observations [59]. Because the SERCA protein is experimentally inaccessible, my work has potentially very important implications for its study.

• MD simulations of membrane proteins longer than 1 ns are computationally very difficult due to the sheer size of the system. A system that is biologically appropriate for simulation contains around 200,000 atoms in a simulation box with dimensions of 14 nm x 14 nm x 17 nm. One nanosecond of simulation on a system of this size may require several days on 16 parallel Pentium 4 processors. This work is at the forefront of employing all simulation tools in this burgeoning field and invested substantial computational resources from the Ohio Supercomputer Center (OSC), which awarded it (through an independent evaluation panel) their largest (30,000 unit: 300,000 processor-hours) service for standard research projects. No other study, except Dr. Richard Law’s recent H+,K+-ATPase work facilitated by the massive computational resources at the Lawrence Livermore National Laboratory, has attempted MD simulations on P-type ATPase proteins for similar scales of time and space [32].

• Although not included in this dissertation for the sake of unity in research theme, the homology modeling methodology, including sequence alignment and homology models, has been useful for others working with functionality of P-type ATPases, as evidenced by the published work by Saida Guennoun on the H+,K+-ATPase [70]. In that paper, my homology modeling work provided additional verification for an observed experimental anomaly. My homology modeling research is also currently involved in work with Jingping Hao, a graduate student in the Department of Biological Sciences, who is investigating the transport properties of a Na+,K+-ATPase inhibitor molecule, tetraethylammonium (TEA).
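The system size and cost figures quoted in the contributions above can be put in perspective with a rough estimate. The sketch below is illustrative only: the box dimensions, atom count, processor count, and allocation come from the text, while `ASSUMED_DAYS_PER_NS` is a hypothetical stand-in for "several days" and is not a measured value.

```python
# Rough scale estimate for the simulation systems described above.
# Names are illustrative; ASSUMED_DAYS_PER_NS is a hypothetical placeholder.

BOX_NM = (14.0, 14.0, 17.0)           # simulation box dimensions (nm)
N_ATOMS = 200_000                     # approximate atoms in the system
N_PROCESSORS = 16                     # parallel processors per run
ALLOCATION_PROCESSOR_HOURS = 300_000  # OSC allocation quoted in the text
ASSUMED_DAYS_PER_NS = 3.0             # assumption standing in for "several days"


def box_volume_nm3(box):
    """Volume of a rectangular simulation box in cubic nanometres."""
    x, y, z = box
    return x * y * z


def processor_hours_per_ns(days_per_ns, n_processors):
    """Processor-hours consumed by 1 ns of simulation."""
    return days_per_ns * 24.0 * n_processors


volume_nm3 = box_volume_nm3(BOX_NM)
density = N_ATOMS / volume_nm3
cost_per_ns = processor_hours_per_ns(ASSUMED_DAYS_PER_NS, N_PROCESSORS)
affordable_ns = ALLOCATION_PROCESSOR_HOURS / cost_per_ns

print(f"box volume: {volume_nm3:.0f} nm^3")
print(f"mean density: {density:.1f} atoms per nm^3")
print(f"cost: {cost_per_ns:.0f} processor-hours per ns (assumed rate)")
print(f"allocation covers about {affordable_ns:.0f} ns at that rate")
```

Changing `ASSUMED_DAYS_PER_NS` shows how sensitive the total simulated time is to per-nanosecond cost; the value used here is a placeholder rather than a benchmark.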

6.2 Future Work

There are many ways to extend this research in the future. Two key avenues for future work are each based on X-ray crystallographic structures. This experimental data can be coupled with the development of better software and faster computer processors to further expand this field.

6.2.1 Na+,K+-ATPase X-ray Structure

The high resolution X-ray structure of the Na+,K+-ATPase provides an exciting opportunity to greatly improve the quality of Na+,K+-ATPase homology models based on SERCA. The Na+,K+-ATPase structure is an extremely important development [112]. This structure of the E2P conformational state may be investigated directly by simulating it in a lipid bilayer. The kink in TM10, as well as other structural characteristics such as the β and γ subunits, could be incorporated into homology models based on several available SERCA structures. This unpredicted kink has a significant impact on the transmembrane structure near the Na+ site III binding cavity and may be important to binding and transport of this ion. The X-ray structure also allows a detailed comparison of P-type ATPase homology models to the X-ray crystal structure of Na+,K+-ATPase. The goal of such a study would be to determine the accuracy of various Na+,K+-ATPase-SERCA alignments as well as homology modeling techniques. The validity of each technique could be ascertained in order to provide better homology models based on SERCA. Improved modeling techniques could be applied to other P-type ATPases, such as the H+,K+-ATPase or even less homologous P-type ATPases which transport heavy metals.

6.2.2 SERCA X-ray Structure with Open Lumenal Pathway

The recent availability of a SERCA structure in a new E2P form will lend itself to more accurate simulations of the lumenal pathway [102]. This structure is a considerable improvement over previous SERCA E2P structures for ascertaining the amino acid residues involved in lumenal ion transport [90, 69]. Limitations of studying the available E2P forms were discussed in, for instance, Section 5.6. Simulation and analysis may be done directly on SERCA, or the structure may be used as a template for the Na+,K+-ATPase. This conformation should allow a much more detailed level of pathway analysis via CAVER and electrostatic analysis via VMD or APBS. One may also investigate the pathway in order to determine the order of sequential Na+ release or K+ uptake.

6.2.3 Computational Outlook

Continued development of simulation, visualization, and analysis tools should ease the technical burden on future research. NAMD and VMD, sister programs for molecular dynamic simulation and visualization, have made significant strides in the last few years. The force fields, topologies, and various VMD plugins are conducive to membrane protein simulation in a lipid environment. NAMD has been able to scale very efficiently with large numbers of processors [33]. Besides advanced visualization, VMD’s Tcl scripting ability gives a great deal of flexibility in the preparation and analysis of simulation systems. For these reasons, current and future research has a choice between the two strong software suites of GROMACS and NAMD. Despite this growth, full-atom simulations approaching biological time scales will remain out of reach in the near future [34].

A reduced-model approach attempts to minimize the computational burden of simulating complex biological systems. In these methods, details of the simulation system are reduced by generalizing regions of the system away from the area of interest. For example, the software BioMOCA uses an atomistic picture of the protein with a continuum model for the lipid and solvent [50]. Treating these adjunct regions with a simple dielectric constant can greatly reduce the computational time and resources required for a given simulation. Other models reduce even further to mimicking ion channels with only a few explicit atoms, allowing for the investigation of protein function over a wide range of biologically relevant ion concentrations [124]. These methods provide a less detailed picture of the overall atomic structure, but enable simulations under biologically relevant conditions and timescales. While a continued increase in parallel computing power will enable molecular dynamic simulations to more closely represent realistic systems, reduced models have many advantages as well. The best simulation approach is dependent on the problem and the questions being asked, but most likely lies somewhere between the full-atom molecular dynamic simulations and the reduced models.

Bibliography

[1] W. Kuhlbrandt, “Biology, structure and mechanism of p-type atpases,” Nature reviews. Molecular cell biology, vol. 5, no. 4, pp. 282–295, 2004.

[2] I. M. Glynn, “A hundred years of sodium pumping,” Annual Review of Physiology, vol. 64, no. 1, p. 1, 2002.

[3] Y. Astier, H. Bayley, and S. Howorka, “Protein components for nanodevices,” Current opinion in chemical biology, vol. 9, no. 6, pp. 576–584, 2005.

[4] C. M. Niemeyer and C. A. Mirkin, Nanobiotechnology: Concepts, Applications and Perspectives. Wiley-VCH, 2004.

[5] J. H. Kaplan, “Biochemistry of na,k-atpase,” Annu Rev Biochem, vol. 71, pp. 511–35, 2002.

[6] P. L. Jorgensen, K. O. Hakansson, and S. J. D. Karlish, “Structure and mechanism of na,k-atpase: Functional sites and their interactions,” Annual Review of Physiology, vol. 65, no. 1, pp. 817–849, 2003.

[7] J. B. Lingrel and T. Kuntzweiler, “Na+,k(+)-atpase,” J Biol Chem, vol. 269, pp. 19659–62, Aug 5 1994.

[8] J. D. Horisberger, “Recent insights into the structure and mechanism of the sodium pump,” Physiology (Bethesda), vol. 19, pp. 377–87, Dec 2004.

[9] D. P. Tieleman, H. J. C. Berendsen, and M. S. P. Sansom, “An alamethicin channel in a lipid bilayer: Molecular dynamics simulations,” Biophysical Journal, vol. 76, pp. 1757–1769, April 1 1999.

[10] S. Berneche and B. Roux, “A microscopic view of ion conduction through the k+ channel,” Proceedings of the National Academy of Sciences, vol. 100, pp. 8644–8648, July 22 2003.

[11] J. Gumbart, Y. Wang, A. Aksimentiev, E. Tajkhorshid, and K. Schulten, “Molecular dynamics simulations of proteins in lipid bilayers,” Current Opinion in Structural Biology, vol. 15, pp. 423–431, 8 2005.

[12] R. J. Law, C. Capener, M. Baaden, P. J. Bond, J. Campbell, G. Patargias, Y. Arinaminpathy, and M. S. P. Sansom, “Membrane protein structure quality in molecular dynamics simulation,” Journal of Molecular Graphics and Modelling, vol. 24, no. 2, pp. 157–165, 2005.

[13] M. C. Roco and W. S. Bainbridge, Societal Implications of Nanoscience and Nanotechnology. New York: Springer, 2001.

[14] M. Sarikaya, “Biomimetics: Materials fabrication through biology,” Proceedings of the National Academy of Sciences, vol. 96, pp. 14183–14185, December 7 1999.

[15] M. G. van den Heuvel and C. Dekker, “Motor proteins at work for nanotechnology,” Science, vol. 317, pp. 333–336, July 20 2007.

[16] R. Soong, G. Bachand, H. Neves, A. Olkhovets, H. Craighead, and C. Montemagno, “Powering an inorganic nanodevice with a biomolecular motor,” Science, vol. 290, pp. 1555–1558, November 24 2000.

[17] R. P. Feynman, “There’s plenty of room at the bottom: An invitation to enter a new field of physics,” Engineering and Science, pp. 22–26, 1960.

[18] M. Lundstrom, “Applied physics: Enhanced: Moore’s law forever?,” Science, vol. 299, pp. 210–211, January 10 2003.

[19] “The international technology roadmap for semiconductors.” www.itrs.net.

[20] J. Fonseca and S. Kaya, “Accurate treatment of interface roughness in nanoscale dg mosfets using non-equilibrium green’s functions,” Solid-State Electronics, vol. 48, no. 10, pp. 1843–1847, 2004.

[21] K. Kordas, G. Toth, P. Moilanen, M. Kumpumaki, J. Vahakangas, A. Uusimaki, R. Vajtai, and P. M. Ajayan, “Chip cooling with integrated carbon nanotube microfin architectures,” Applied Physics Letters, vol. 90, no. 12, p. 123105, 2007.

[22] C. Lau and P. Wong, “Dod solid state electronics basic research,” 2004.

[23] G. Whitesides, J. Mathias, and C. Seto, “Molecular self-assembly and nanochemistry: a chemical strategy for the synthesis of nanostructures,” Science, vol. 254, pp. 1312–1319, November 29 1991.

[24] C. R. Lowe, “Nanobiotechnology: the fabrication and applications of chemical and biological nanostructures,” Current opinion in structural biology, vol. 10, no. 4, pp. 428–434, 2000.

[25] C. Montemagno and G. Bachand, “Constructing nanomechanical devices powered by biomolecular motors,” Nanotechnology, vol. 10, no. 3, pp. 225–231, 1999.

[26] V. Iancu and S.-W. Hla, “Realization of a four-step molecular switch in scanning tunneling microscope manipulation of single chlorophyll-a molecules,” Proceedings of the National Academy of Sciences, vol. 103, pp. 13718–13721, September 12 2006.

[27] K. J. Wise, N. B. Gillespie, J. A. Stuart, M. P. Krebs, and R. R. Birge, “Optimization of bacteriorhodopsin for bioelectronic devices,” Trends in biotechnology, vol. 20, no. 9, pp. 387–394, 2002.

[28] H. J. Apell, “How do p-type atpases transport ions?,” Bioelectrochemistry, vol. 63, pp. 149–56, Jun 2004.

[29] “Nobelprize.org.” http://nobelprize.org/nobel_prizes/chemistry/laureates/1997/.

[30] L. Segall, A. Mezzetti, R. Scanzano, J. J. Gargus, E. Purisima, and R. Blostein, “Alterations in the alpha2 isoform of na,k-atpase associated with familial hemiplegic migraine type 2,” Proc Natl Acad Sci U S A, vol. 102, pp. 11106–11, Aug 2 2005.

[31] K. Munson, R. Garcia, and G. Sachs, “Inhibitor and ion binding sites on the gastric h,k-atpase,” Biochemistry, vol. 44, no. 14, pp. 5267–5284, 2005.

[32] R. J. Law, K. Munson, F. Lightstone, and G. Sachs, “Understanding the mechanism of gastric h,k-atpase and designing new antacid drugs,” in Biophysical Society 51st Annual Meeting, March 2007.

[33] P. L. Freddolino, A. S. Arkhipov, S. B. Larson, A. McPherson, and K. Schulten, “Molecular dynamics simulations of the complete satellite tobacco mosaic virus,” Structure, vol. 14, no. 3, pp. 437–449, 2006.

[34] D. Bader, Petascale Computing: Algorithms and Applications. New York: Chapman and Hall/CRC Press, Taylor and Francis Group, 2008.

[35] D. A. Doyle, J. M. Cabral, R. A. Pfuetzner, A. Kuo, J. M. Gulbis, S. L. Cohen, B. T. Chait, and R. MacKinnon, “The structure of the potassium channel: Molecular basis of k+ conduction and selectivity,” Science, vol. 280, pp. 69–77, April 3 1998.

[36] P. Lauger, Electrogenic Ion Pumps. Sunderland, MA: Sinauer Associates, 1991.

[37] P. Artigas and D. C. Gadsby, “Na+/k+-pump ligands modulate gating of

palytoxin-induced ion channels,” Proc Natl Acad Sci U S A, vol. 100, pp. 501–5,

Jan 21 2003.

[38] R. F. Rakowski and S. Sagar, “Found: Na(+) and k(+) binding sites of the

sodium pump,” News Physiol Sci, vol. 18, pp. 164–8, Aug 2003.

[39] B. J. Alder and T. E. Wainwright, “Studies in molecular dynamics. i. general

method,” The Journal of Chemical Physics, vol. 31, pp. 459–466, August 1959

1959.

[40] J. D. Bernal, “The bakerian lecture, 1962. the structure of liquids,” Proceedings

of the Royal Society of London. Series A, Mathematical and Physical Sciences

(1934-1990), vol. 280, pp. 299–322, 07/28 1964.

[41] E. Lindahl, B. Hess, and D. van der Spoel, “Gromacs 3.0: a package for molecular

simulation and trajectory analysis,” Journal of Molecular Modeling, vol. 7,

pp. 306–317, 08/17 2001.

[42] D. A. Case, T. E. Cheatham, T. Darden, H. Gohlke, R. Luo, K. M. Merz,

A. Onufriev, C. Simmerling, B. Wang, and R. J. Woods, “The amber biomolecular

simulation programs,” Journal of Computational Chemistry, vol. 26, no. 16,

pp. 1668–1688, 2005.

[43] J. C. Phillips, R. Braun, W. Wang, J. Gumbart, E. Tajkhorshid, E. Villa,

C. Chipot, R. D. Skeel, L. Kalé, and K. Schulten, “Scalable molecular dynamics

with NAMD,” Journal of Computational Chemistry, vol. 26, no. 16, pp. 1781–

1802, 2005.

[44] D. van der Spoel, E. Lindahl, B. Hess, A. R. van Buuren, E. Apol, P. J. Meulenhoff,

D. P. Tieleman, A. L. T. M. Sijbers, K. A. Feenstra, R. van Drunen,

and H. J. C. Berendsen, “Gromacs user manual version 3.3,” 2005.

[45] B. Hess, H. Bekker, H. J. C. Berendsen, and J. G. E. M. Fraaije, “Lincs: A

linear constraint solver for molecular simulations,” Journal of Computational

Chemistry, vol. 18, no. 12, pp. 1463–1472, 1997.

[46] H. J. C. Berendsen, D. van der Spoel, and R. van Drunen, “Gromacs:

A message-passing parallel molecular dynamics implementation,” Computer

Physics Communications, vol. 91, no. 1, pp. 43–56, 1995.

[47] T. Darden, D. York, and L. Pedersen, “Particle mesh ewald: An nlog(n) method

for ewald sums in large systems,” The Journal of chemical physics, vol. 98,

no. 12, pp. 10089–10092, 1993.

[48] M. Patra, M. Karttunen, M. T. Hyvonen, E. Falck, P. Lindqvist, and I. Vat-

tulainen, “Molecular dynamics simulations of lipid bilayers: Major artifacts due

to truncating electrostatic interactions,” Biophysical Journal, vol. 84, pp. 3636–

3645, June 1 2003.

[49] H. J. C. Berendsen, J. P. M. Postma, W. F. van Gunsteren, A. DiNola, and J. R.

Haak, “Molecular dynamics with coupling to an external bath,” The Journal

of chemical physics, vol. 81, no. 8, pp. 3684–3690, 1984.

[50] T. A. van der Straaten, G. Kathawala, A. Trellakis, R. S. Eisenberg, and U. Ra-

vaioli, “Biomoca–a boltzmann transport monte carlo model for ion channel sim-

ulation,” Molecular Simulation, vol. 31, no. 2, pp. 151–171, 2005.

[51] M. Sotomayor, V. Vasquez, E. Perozo, and K. Schulten, “Ion conduction

through mscs as determined by electrophysiology and simulation,” Biophysi-

cal Journal, vol. 92, pp. 886–902, February 1 2007.

[52] M. Sotomayor and K. Schulten, “Molecular dynamics study of gating in the

mechanosensitive channel of small conductance mscs,” Biophysical Journal,

vol. 87, pp. 3050–3065, November 1 2004.

[53] D. P. Tieleman, S. J. Marrink, and H. J. C. Berendsen, “A computer perspective

of membranes: molecular dynamics studies of lipid bilayer systems,” Biochimica

et Biophysica Acta (BBA)/Reviews on Biomembranes, vol. 1331, no. 3, pp. 235–

270, 1997.

[54] H. Hebert, P. Purhonen, H. Vorum, K. Thomsen, and A. B. Maunsbach, “Three-

dimensional structure of renal na,k-atpase from cryo-electron microscopy of

two-dimensional crystals,” Journal of Molecular Biology,

vol. 314, no. 3, pp. 479–494, 2001.

[55] W. Rice, H. Young, D. Martin, J. Sachs, and D. Stokes, “Structure of na+,k+-atpase

at 11-Å resolution: Comparison with ca2+-atpase in e1 and e2 states,”

Biophysical Journal, vol. 80, pp. 2187–2197, May 1 2001.

[56] K. O. Hakansson, “The crystallographic structure of na,k-atpase n-domain at

2.6 Å resolution,” Journal of Molecular Biology, vol. 332, no. 5, pp. 1175–1182,

2003.

[57] M. Hilge, G. Siegal, G. W. Vuister, P. Guntert, S. M. Gloor, and J. P. Abrahams,

“Atp-induced conformational changes of the nucleotide-binding domain of na,k-

atpase,” Nature structural biology, vol. 10, p. 468, 06 2003.

[58] J. P. Morth, B. P. Pedersen, M. S. Toustrup-Jensen, T. L. M. Sorensen, J. Pe-

tersen, J. P. Andersen, B. Vilsen, and P. Nissen, “Crystal structure of the

sodium-potassium pump,” Nature, vol. 450, pp. 1043–1049, 12/13 2007.

[59] H. Ogawa and C. Toyoshima, “Homology modeling of the cation binding sites

of na+k+-atpase,” Proc Natl Acad Sci U S A, vol. 99, pp. 15977–82, Dec 10

2002.

[60] H. Berman, J. Westbrook, Z. Feng, G. Gilliland, T. N. Bhat, H. Weissig,

I. Shindyalov, and P. Bourne, “The protein data bank,” Nucleic Acids Research,

vol. 28, pp. 235–242, January 1 2000.

[61] M. A. Marti-Renom, A. C. Stuart, A. Fiser, R. Sanchez, F. Melo, and A. Sali,

“Comparative protein structure modeling of genes and genomes,” Annual Re-

view of Biophysics and Biomolecular Structure, vol. 29, pp. 291–325, 2000.

[62] K. J. Sweadner and C. Donnet, “Structural similarities of na,k-atpase and serca,

the ca(2+)-atpase of the sarcoplasmic reticulum,” Biochem J, vol. 356, pp. 685–

704, Jun 15 2001.

[63] K. Kawakami, T. Ohta, H. Nojima, and K. Nagano, “Primary structure of the

alpha-subunit of human na, k-atpase deduced from cdna sequence,” Journal of

Biochemistry, vol. 100, pp. 389–397, January 1 1986.

[64] M. Crowson and G. Shull, “Isolation and characterization of a cdna encoding the

putative distal colon h+,k(+)-atpase. similarity of deduced amino acid sequence

to gastric h+,k(+)-atpase and na+,k(+)-atpase and mrna expression in distal

colon, kidney, and uterus,” Journal of Biological Chemistry, vol. 267, pp. 13740–

13748, July 5 1992.

[65] G. E. Shull, A. Schwartz, and J. B. Lingrel, “Amino-acid sequence of the cat-

alytic subunit of the (na+ + k+)atpase deduced from a complementary dna,”

Nature, vol. 316, pp. 691–695, 08/22 1985.

[66] K. Bamberg, F. Mercier, M. A. Reuben, Y. Kobayashi, K. B. Munson, and

G. Sachs, “cdna cloning and membrane topology of the rabbit gastric h+/k(+)-atpase

alpha-subunit,” Biochimica et Biophysica Acta (BBA), vol. 1131, pp. 69–

77, 1992.

[67] T. Imagawa, T. Yamamoto, S. Kaya, K. Sakaguchi, and K. Taniguchi, “Thr-774

(transmembrane segment m5), val-920 (m8), and glu-954 (m9) are involved in

na+ transport, and gln-923 (m8) is essential for na,k-atpase activity,” Journal

of Biological Chemistry, vol. 280, pp. 18736–18744, May 13 2005.

[68] A. P. Einholm, J. P. Andersen, and B. Vilsen, “Importance of leu99 in trans-

membrane segment m1 of the na+,k+-atpase in the binding and occlusion of

k+,” Journal of Biological Chemistry, June 6 2007.

[69] C. Toyoshima and G. Inesi, “Structural basis of ion pumping by ca2+-atpase

of the sarcoplasmic reticulum,” Annu Rev Biochem, vol. 73, pp. 269–92, 2004.

[70] S. Guennoun-Lehmann, J. Fonseca, J.-D. Horisberger, and R. Rakowski, “Pa-

lytoxin acts on na+,k+-atpase but not nongastric h+,k+-atpase,” Journal of

Membrane Biology, vol. 216, pp. 107–116, 04/24 2007.

[71] L. R. Forrest, C. L. Tang, and B. Honig, “On the accuracy of homology modeling

and sequence alignment methods applied to membrane proteins,” Biophysical

Journal, vol. 91, pp. 508–517, July 15 2006.

[72] N. Reyes and D. C. Gadsby, “Ion permeation through the na+,k+-atpase,”

Nature, vol. 443, pp. 470–474, 09/28 2006.

[73] O. Capendeguy and J.-D. Horisberger, “The role of the third extracellular loop

of the na+,k+-atpase alpha subunit in a luminal gating mechanism,” The Jour-

nal of physiology, vol. 565, no. 1, pp. 207–218, 2005.

[74] A. Sali, “Comparative protein modeling by satisfaction of spatial restraints,”

Molecular Medicine Today, vol. 1, no. 6, p. 270, 1995.

[75] S. Altschul, T. Madden, A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D. Lip-

man, “Gapped blast and psi-blast: a new generation of protein database search

programs,” Nucleic Acids Research, vol. 25, pp. 3389–3402, September 1 1997.

[76] Y. Peeraer, A. Rabijns, C. Verboven, J.-F. Collet, E. Van Schaftingen, and

C. De Ranter, “High-resolution structure of human phosphoserine phosphatase

in open conformation,” Acta Crystallographica: Section D, vol. 59, p. 971, 06

2003.

[77] B. Rost and C. Sander, “Prediction of protein secondary structure at better

than 70% accuracy,” Journal of Molecular Biology, vol. 232, no. 2, pp. 584–599,

1993.

[78] B. Rost, P. Fariselli, and R. Casadio, “Topology prediction for helical transmembrane

proteins at 86% accuracy,” Protein Science, vol. 5, pp. 1704–1718, August 1 1996.

[79] B. Vilsen, June 13th 2007. personal communication.

[80] G. Tusnady, Z. Dosztanyi, and I. Simon, “Tmdet: web server for detecting

transmembrane regions of proteins by using their 3d coordinates,” Bioinfor-

matics, vol. 21, pp. 1276–1277, April 1 2005.

[81] J. D. Faraldo-Gómez, G. R. Smith, and M. S. P. Sansom, “Setting up and optimization

of membrane protein simulations,” European Biophysics Journal,

vol. 31, no. 3, pp. 217–227, 2002.

[82] M. F. Sanner, A. J. Olson, and J.-C. Spehner, “Reduced surface: An efficient

way to compute molecular surfaces,” Biopolymers, vol. 38, no. 3, pp. 305–320,

1996.

[83] C. Kandt, W. L. Ash, and D. P. Tieleman, “Setting up and running molecular

dynamics simulations of membrane proteins,” Methods, vol. 41, no. 4, pp. 475–

488, 2007.

[84] B. König, U. Dietrich, and G. Klose, “Hydration and structural properties of

mixed lipid/surfactant model membranes,” Langmuir, vol. 13, no. 3, pp. 525–

532, 1997.

[85] J. E. Fonseca, S. Kaya, and R. F. Rakowski, “Temporal and steric analysis of ionic

permeation and binding in serca via molecular dynamic simulations,” Nanotechnology,

vol. 18, no. 42, p. 424022 (9pp), 2007.

[86] H. J. C. Berendsen, J. P. M. Postma, W. F. van Gunsteren, and J. Hermans,

“Interaction models for water in relation to protein hydration,” Nature, vol. 224,

pp. 175–177, 1969.

[87] O. Berger, O. Edholm, and F. Jahnig, “Molecular dynamics simulations of a

fluid bilayer of dipalmitoylphosphatidylcholine at full hydration, constant pres-

sure, and constant temperature.,” Biophysical Journal, vol. 72, pp. 2002–2013,

May 1 1997.

[88] R. A. Laskowski, M. W. MacArthur, D. S. Moss, and J. M. Thornton,

“Procheck: a program to check the stereochemical quality of protein struc-

tures,” Journal of Applied Crystallography, vol. 26, pp. 283–291, Apr 1993.

[89] C. Toyoshima, M. Nakasako, H. Nomura, and H. Ogawa, “Crystal structure

of the calcium pump of sarcoplasmic reticulum at 2.6 Å resolution,” Nature,

vol. 405, pp. 647–55, Jun 8 2000.

[90] K. Obara, N. Miyashita, C. Xu, I. Toyoshima, Y. Sugita, G. Inesi, and

C. Toyoshima, “Structural role of countertransport revealed in ca2+ pump crystal

structure in the absence of ca2+,” Proc Natl Acad Sci U S A, Sep 6

2005.

[91] C. Toyoshima, H. Nomura, and T. Tsuda, “Lumenal gating mechanism revealed

in calcium pump crystal structures with phosphate analogues,” Nature, vol. 432,

pp. 361–8, Nov 18 2004.

[92] M. Petřek, M. Otyepka, P. Banáš, P. Košinová, J. Koča, and J. Damborský, “Caver:

a new tool to explore routes from protein clefts, pockets and cavities,” BMC

Bioinformatics, vol. 7, no. 1, p. 316, 2006.

[93] N. Baker, D. Sept, S. Joseph, M. Holst, and J. A. McCammon, “Electrostatics

of nanosystems: Application to microtubules and the ribosome,” Proceedings of

the National Academy of Sciences, vol. 98, pp. 10037–10041, August 28 2001.

[94] M. Nayal and E. D. Cera, “Predicting ca2+-binding sites in proteins,” PNAS,

vol. 91, pp. 817–821, January 18 1994.

[95] M. Holmgren, J. Wagg, F. Bezanilla, R. F. Rakowski, P. D. Weer, and D. C.

Gadsby, “Three distinct and sequential steps in the release of sodium ions by

the na+/k+-atpase,” Nature, vol. 403, pp. 898–901, Feb 24 2000.

[96] J. D. Horisberger, S. Kharoubi-Hess, S. Guennoun, and O. Michielin, “The

fourth transmembrane segment of the na,k-atpase alpha subunit: a systematic

mutagenesis study,” J Biol Chem, vol. 279, pp. 29542–50, Jul 9 2004.

[97] J. Lanyi, “Bacteriorhodopsin as a model for proton pumps,” Nature, vol. 375,

pp. 461–463, 1995.

[98] S. Guennoun and J. D. Horisberger, “Cysteine-scanning mutagenesis study of the

sixth transmembrane segment of the na,k-atpase alpha subunit,” FEBS Lett,

vol. 513, pp. 271–281, 2002.

[99] J. V. Moller, P. Nissen, T. L.-M. Sorensen, and M. le Maire, “Transport mech-

anism of the sarcoplasmic reticulum ca2+-atpase pump,” Current opinion in

structural biology, vol. 15, no. 4, pp. 387–393, 2005.

[100] R. F. Rakowski, S. Kaya, and J. Fonseca, “Electro-chemical modeling challenges

of biological ion pumps,” Journal of Computational Electronics, vol. 4,

pp. 189–193, 04/01 2005.

[101] J. Fonseca, S. Kaya, S. Guennoun, and R. Rakowski, “Temporal analysis of

valence & electrostatics in ion-motive sodium pump,” Journal of Computational

Electronics, vol. 6, pp. 381–385, 09/08 2007.

[102] C. Olesen, M. Picard, A.-M. L. Winther, C. Gyrup, J. P. Morth, C. Oxvig,

J. V. Moller, and P. Nissen, “The structural basis of calcium transport by the

calcium pump,” Nature, vol. 450, pp. 1036–1042, 12/13 2007.

[103] I. D. Brown and A. Skowron, “Electronegativity and lewis acid strength,”

Journal of the American Chemical Society, vol. 112, no. 9, pp. 3401–3403, 1990.

[104] M. Nayal and E. D. Cera, “Valence screening of water in protein crystals reveals

potential na+ binding sites,” Journal of Molecular Biology, vol. 256, no. 2,

pp. 228–234, 1996.

[105] E. D. Cera, “A structural perspective on enzymes activated by monovalent

cations,” Journal of Biological Chemistry, vol. 281, pp. 1305–1308, January 20

2006.

[106] C. Li, O. Capendeguy, K. Geering, and J.-D. Horisberger, “A third na+-binding

site in the sodium pump,” PNAS, vol. 102, pp. 12706–12711, September

6 2005.

[107] T. A. Kuntzweiler, J. M. Arguello, and J. B. Lingrel, “Asp804 and asp808 in the

transmembrane domain of the na,k-atpase alpha subunit are cation coordinating

residues,” J Biol Chem, vol. 271, pp. 29682–7, Nov 22 1996.

[108] C. Toyoshima, H. Nomura, and Y. Sugita, “Crystal structures of ca2+-atpase

in various physiological states,” Ann N Y Acad Sci, vol. 986, pp. 1–8, Apr 2003.

[109] M. Mense, L. A. Dunbar, R. Blostein, and M. J. Caplan, “Residues of the

fourth transmembrane segments of the na,k-atpase and the gastric h,k-atpase

contribute to cation selectivity,” J Biol Chem, vol. 275, pp. 1749–56, Jan 21

2000.

[110] K. O. Hakansson and P. L. Jorgensen, “Homology modeling of nak-atpase a

putative third sodium binding site suggests a relay mechanism compatible with

the electrogenic profile of na translocation,” Ann. N.Y. Acad. Sci., vol. 986,

pp. 163–167, 2003.

[111] J. M. Kaplan, R. J. Seeley, and H. J. Grill, “Daily caloric intake in intact and

chronic decerebrate rats,” Behav Neurosci, vol. 107, pp. 876–81, Oct 1993.

[112] D. C. Gadsby, “Structural biology: Ion pumps made crystal clear,” Nature,

vol. 450, pp. 957–959, 12/13 2007.

[113] A. Shainskaya, A. Schneeberger, H. J. Apell, and S. J. Karlish, “Entrance port

for na(+) and k(+) ions on na(+),k(+)-atpase in the cytoplasmic loop between

trans-membrane segments m6 and m7 of the alpha subunit. proximity of the

cytoplasmic segment of the beta subunit,” J Biol Chem, vol. 275, pp. 2019–28,

Jan 21 2000.

[114] D. C. Gadsby, R. F. Rakowski, and P. D. Weer, “Extracellular access to the

na,k pump: pathway similar to ion channel,” Science, vol. 260, pp. 100–3, Apr

2 1993.

[115] O. Capendeguy, P. Chodanowski, O. Michielin, and J.-D. Horisberger, “Access

of extracellular cations to their binding sites in na,k-atpase: Role of the second

extracellular loop of the alpha subunit,” The Journal of General Physiology,

vol. 127, pp. 341–352, February 27 2006.

[116] H. Schneider and G. Scheiner-Bobis, “Involvement of the m7/m8 extracellular

loop of the sodium pump alpha subunit in ion transport. structural and func-

tional homology to p-loops of ion channels,” Journal of Biological Chemistry,

vol. 272, pp. 16158–16165, June 27 1997.

[117] S. Guennoun and J.-D. Horisberger, “Structure of the 5th transmembrane seg-

ment of the na,k-atpase alpha subunit: a cysteine-scanning mutagenesis study,”

09/29 2000. ID: S0014-5793(00)02050-0.

[118] G. Sanchez and G. Blanco, “Residues within transmembrane domains 4 and

6 of the na,k-atpase subunit are important for na+ selectivity,” Biochemistry,

vol. 43, no. 28, pp. 9061–9074, 2004.

[119] W. J. Rice and D. H. MacLennan, “Scanning mutagenesis reveals a similar

pattern of mutation sensitivity in transmembrane sequences m4, m5, and m6,

but not in m8, of the ca2+-atpase of sarcoplasmic reticulum (serca1a),” Journal

of Biological Chemistry, vol. 271, pp. 31412–31419, December 6 1996.

[120] D. W. Hilgemann, “Channel-like function of the na,k pump probed at microsec-

ond resolution in giant membrane patches,” Science, vol. 263, pp. 1429–32, Mar

11 1994.

[121] G. Abram, P. Kirchner, D. Thompson, and M. Tignor, “Opendx,” 2007.

[122] C. Toyoshima and T. Mizutani, “Crystal structure of the calcium pump with

a bound atp analogue,” Nature, vol. 430, pp. 529–35, Jul 29 2004.

[123] T. L.-M. Sorensen, J. V. Moller, and P. Nissen, “Phosphoryl transfer and

calcium ion occlusion in the calcium pump,” Science, vol. 304, pp. 1672–1675,

June 11 2004.

[124] D. Boda, D. Gillespie, W. Nonner, D. Henderson, and B. Eisenberg, “Com-

puting induced charges in inhomogeneous dielectric media: Application in a

monte carlo simulation of complex ionic systems,” Physical Review E. Statis-

tical Physics, Plasmas, Fluids, and Related Interdisciplinary Topics, vol. 69,

p. 046702, Apr 2004.

Appendix A

Publications

A.1 Journal Articles

J. E. Fonseca, S. Kaya and R. F. Rakowski, Temporal and Steric Analysis of Ionic

Permeation and Binding in Na+,K+-ATPase via Molecular Dynamic Simulations, in

preparation, 2008.

J. E. Fonseca, S. Mishra, S. Kaya, and R. F. Rakowski, Exploration of Na+,K+-ATPase ion permeation pathways via molecular dynamic simulation and electrostatic analysis, J. Comp. Elec., in press.

J. E. Fonseca, S. Kaya and R. F. Rakowski, Temporal and steric analysis of ionic permeation and binding in SERCA via molecular dynamic simulations, Nanotechnology, 18 (42), 424022-8, 2007.

S. Guennoun, J. E. Fonseca, J. D. Horisberger, and R. F. Rakowski, Palytoxin

Targets Na+,K+-ATPase but not non-gastric H+,K+-ATPase, J. Membrane Biology,

216 (2-3), 107-16, 2007.

J. Fonseca, S. Kaya, S. Guennoun, and R. F. Rakowski, Temporal Analysis of

Valence and Electrostatics in Ion-Motive Sodium Pump, J. Comp. Elec., 6 (1-3),

381-5, 2007.

R. F. Rakowski, S. Kaya and J. E. Fonseca, Electro-Chemical Modeling Challenges of Biological Ion Pumps, J. Comp. Elec., 4 (1-2), 189-93, 2005.

A.2 Conference Presentations

J. Fonseca, R. F. Rakowski, and S. Kaya, Models, Electrostatics and Molecular

Dynamics of the Na+,K+-ATPase, OCCBIO, Ohio University, June 2006.

J. Fonseca, S. Kaya, and R. F. Rakowski, Modeling of Binding Sites and Electrostatics in the Ion-Motive Sodium Pump, IEEE Nano, Cincinnati, OH, July 2006.

Appendix B

Glossary

alpha-carbon; α-carbon in an amino acid residue, the backbone carbon atom closest to the functional group. Denotes nominal location of residue and is used to compute RMSD.

alpha helix; α helix secondary structure motif which forms a right handed coil

amino acid residue one of twenty molecules which are the building blocks of

proteins; carbon and nitrogen atoms create the backbone linkage (main chain of the

protein) while side chains provide functional characteristics

amphipathic an amphipathic lipid is mostly hydrophobic but has a polar end

ATP adenosine triphosphate: a molecule which serves as the basic energy currency

of cells. Na+,K+-ATPase uses energy derived from the dephosphorylation of ATP.

Binding location location inside a protein formed by amino acids that provide a high affinity site for a particular ion species

APBS Adaptive Poisson-Boltzmann Solver: software package that numerically solves the Poisson-Boltzmann equation to describe electrostatic interactions between molecules

beta sheet secondary structure motif wherein several beta strands form a twisted sheet

C-terminus the end of the amino acid chain which is terminated by a free carboxyl group (-COOH); in type II P-type ATPases this end is associated with TM10; see N-terminus

CAVER computational tool to determine sterically-accessible locations in a given protein structure

comparative protein modeling see homology modeling

conformation protein structure associated with a specific physiological state

constraint method by which GROMACS maintains integrity of atomic bond lengths, etc. during simulation

domain a specific sequence of residues or physical section of a protein that is involved in a particular task

DOPE Discrete Optimized Protein Energy: Modeller function that computes a statistical

potential optimized for model assessment. Essentially, a measure of the model’s

energy.

DX File type used for volumetric data.

E1 physiological state of P-type ATPases in which the intracellular pathway is

open

E2 physiological state of P-type ATPases in which the extracellular or lumenal

pathway is open

E2P physiological state of P-type ATPases indicating a phosphorylated E2 conformation

EM Energy Minimization: Computational method performed by GROMACS which moves a system to a lower energy level on the energy landscape by slightly moving atoms. It is a preliminary equilibration step whose main function is to perturb overlapping atoms so that they may undergo MD simulation.

fs femtosecond: 10^-15 seconds

GROMACS software to perform energy minimization, molecular dynamics, and postprocessing functions.

helix see alpha helix

HPC High Performance Computing: use of many networked computers to per-

form intensive computational tasks

homology modeling Also known as comparative protein modeling, it is a tech-

nique used to create a model of a protein given that protein’s proteonomic sequence

and a known structure from another closely related protein.

human α1 Na+,K+-ATPase isoform studied in this work

isoform one of several very similar forms of a particular protein. Humans have

three (females) or four (males) Na+,K+-ATPase isoforms which have varying cation affinities. This work studies the α1 form, as it is the most common throughout the body.

lumen access/egress pathway the access/egress pathway on the opposite side of the membrane from the cytoplasm. In Na+,K+-ATPase, it is on the extracellular side, whereas in SERCA it is inside the reticulum

MD Molecular Dynamics: computational method in which interactions of atoms are simulated over time to determine various properties of a system

Modeller homology modeling software used in this work

N-terminus the end of the amino acid chain which is terminated by a free amine group (-NH2); in type II P-type ATPases this end is associated with the A domain

and TM1; see C-terminus

Na+,K+-ATPase Na+,K+ pump; sodium pump: transmembrane protein which uses energy derived from dephosphorylation of one molecule of ATP to transport 3

Na+ ions and 2 K+ ions across the cell membrane

NMR Nuclear Magnetic Resonance: method by which magnetic fields are used to determine and refine molecular structures

ns nanosecond: 10^-9 seconds

ps picosecond: 10^-12 seconds

parallel computing High Performance Computing (HPC) approach that speeds completion of a task by dividing it among dedicated networked computers

PDB Protein Data Bank: 1) an online repository of 3-D protein structures 2) the

file format used to define these structures

PME Particle Mesh Ewald: underlying method used by 1) GROMACS to handle long range electrostatic interactions and 2) VMD to perform electrostatic analysis of a given biomolecular system

POPC 1-Palmitoyl-2-Oleoyl-sn-glycero-3-PhosphoCholine: a phospholipid whose model is used in the lipid bilayer membranes in this work

PR Position Restrained: a type of molecular dynamic simulation in which the locations of specific groups of atoms are held in place using harmonic potentials. It is used in equilibration processes and applied field simulations.

residue see amino acid residue

restraint user-defined method by which GROMACS controls the locations of particular groups of atoms during simulation for equilibration and analysis purposes

RMSD Root Mean Square Deviation: measurement of the goodness of fit of one structure to another. It is typically used to determine the stability of a protein over the course of a simulation by measuring movement of the protein’s α-carbon atoms.

sequence, alignment set of two or more proteonomic sequences

sequence, genetic string of letters that represent the primary structure of DNA

sequence, proteonomic string of letters that represents the amino acids of a protein

structure, secondary general three-dimensional form of a protein or macromolecule that is characterized by α helices, beta sheets, and loops

structure, primary atomic composition and chemical bonds of a protein or macromolecule

structure, tertiary the three-dimensional structure of a protein, as defined by the atomic coordinates

SERCA sarco(endo)plasmic reticulum Ca2+-ATPase; Ca2+ pump: transmembrane protein that transports Ca2+ and protons across the membrane using the energy derived from the dephosphorylation of one molecule of ATP. It is used as the template for homology modeling of Na+,K+-ATPase in this work

TM TransMembrane (helix): used to refer to a particular transmembrane α helix; e.g. TM5. The P-type ATPases discussed in this work all have ten transmembrane helices.

TMDET TransMembrane DETermination: computational tool that predicts the orientation and location of a transmembrane protein with respect to the lipid bilayer membrane

VALE software program that determines locations of high affinity for specific ion species within and on the surface of a protein

VMD Visual Molecular Dynamics: visualization program for large biomolecular systems that supports three-dimensional graphics and scripting

Appendix C

Methodology

C.1 Overview

This work was completed on a PowerMac G5 and a MacBook Pro. Mac OS X is well suited to this type of research. Many research software programs are available precompiled and are readily installed. In addition, nearly all of the software is designed to run on Unix machines, which Macs handle without difficulty. OS X also provides an environment that removes much of the difficulty of running the various Unix flavors: one receives the versatility and stability of a Unix machine with the ease of use of a Windows machine.

Most of the tools discussed below can be run in short order on any desktop computer, with the exception of the MD simulations. In this work, the parallelized version of GROMACS was compiled on Pentium 4, Itanium, and Opteron clusters run by the

Ohio Supercomputer Center. GROMACS did not scale efficiently when more than 16 processors were used. Because each cluster is different and simulation parameters affect run times differently from one cluster to another, it is best to run a range of benchmarking tests to find the optimal configuration for each particular body of work.
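The benchmarking recommendation above can be illustrated with a minimal Python sketch that turns wall-clock timings into speedup and parallel efficiency. The timing values, the function names, and the 0.7 efficiency threshold are illustrative assumptions, not measurements or settings from this work.

```python
def parallel_efficiency(timings):
    """Return {n_procs: (speedup, efficiency)} relative to the smallest run.

    timings maps processor count -> wall-clock seconds for the same
    benchmark simulation (hypothetical values, not measurements).
    """
    base_procs = min(timings)
    base_time = timings[base_procs]
    results = {}
    for procs in sorted(timings):
        speedup = base_time / timings[procs]
        # Ideal speedup relative to the baseline run is procs / base_procs.
        efficiency = speedup / (procs / base_procs)
        results[procs] = (speedup, efficiency)
    return results


def largest_efficient_count(timings, threshold=0.7):
    """Largest processor count whose efficiency is at least `threshold`."""
    results = parallel_efficiency(timings)
    good = [p for p, (_, eff) in results.items() if eff >= threshold]
    return max(good)


# Hypothetical benchmark: seconds for a fixed number of MD steps.
example_timings = {1: 1600.0, 2: 820.0, 4: 430.0,
                   8: 240.0, 16: 140.0, 32: 125.0}

if __name__ == "__main__":
    for procs, (speedup, eff) in parallel_efficiency(example_timings).items():
        print(f"{procs:3d} procs: speedup {speedup:5.2f}, efficiency {eff:4.2f}")
    print("largest efficient count:", largest_efficient_count(example_timings))
```

With these made-up timings the efficiency drops sharply between 16 and 32 processors, which is the kind of behavior a short benchmark series is meant to expose before long production runs are submitted.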

It is hoped that the information in this appendix will give ample details of the methodology used in the work and provide other researchers with an outline to conduct similar investigations. In this appendix, homology modeling is discussed, then set-up and equilibration of the lipid-protein system, followed by MD simulation and

finally analysis methodologies. Several analysis programs and their uses are discussed.

The goal of this appendix is to serve as a guide with detailed instructions, rather than a tutorial, for other researchers seeking to run MD simulations on membrane proteins.

A sensible approach would be to become familiar with each software package by undertaking its respective tutorial and then working through this guide to navigate the membrane protein simulation process.

C.2 Homology Modeling

This section discusses the process of creating a homology model of the Na+,K+-ATPase based on SERCA. An overview of the homology modeling process and alignment details are given in Chapter 3. The current section provides details of working with the homology modeling software, Modeller. In Modeller’s parlance, the SERCA sequence is the template and the Na+,K+-ATPase sequence is the target. Atomic coordinates derived from X-ray or NMR techniques are called structures. The list of atomic coordinates produced by Modeller is called a homology model.

C.2.1 Automated Sequence Alignment

Automated sequence alignment was performed with a Modeller script such as the one shown below. The script loads the SERCA E2P crystal structure (PDB ID: 1WPG) into memory along with four sequences: human α1 Na+,K+-ATPase and three related ATPases. The script can be used as a guideline; the Modeller manual provides details of the commands used. Although only a model with a human α1 Na+,K+-ATPase target was desired, the other sequences facilitated later manual refinement of the alignment. Modeller’s salign (structural align) command was used to create an initial sequence alignment using the proteonomic sequences and the structural data from SERCA E2P. Note that lines beginning with a '#' are comments.

    ############################################
    # align.py
    # this code creates an automated alignment
    ############################################
    from modeller import *
    #log.verbose()
    env = environ()
    env.libs.topology.read(file='$(LIB)/top_heav.lib')
    #env.io.atom_files_directory = '../atom_files'
    aln = alignment(env)
    code = '1WPG1'
    chain = 'A'
    # load chain A of the SERCA E2P template structure
    mdl = model(env, file=code,
                model_segment=('FIRST:'+chain, 'LAST:'+chain))
    aln.append_model(mdl, align_codes=code+chain, atom_files=code)
    aln_block = len(aln)
    # append the target sequence and the related ATPase sequences
    aln.append(file='ha1_fasta.ali', align_codes='ha1')
    aln.append(file='rat_hk_new.ali', align_codes='rat_hk')
    aln.append(file='rabbit_hk.ali', align_codes='rabbit_hk')
    aln.append(file='sheep_nak.ali', align_codes='sheep_nak')
    aln.salign(align_block=aln_block, align_what='BLOCK',
               alignment_type='PROGRESSIVE')
    # write the alignment in three formats
    aln.write(file='alignment_2.ali', alignment_format='PIR')
    aln.write(file='alignment_2.pap', alignment_format='PAP')
    aln.write(file='alignment_2.fasta', alignment_format='fasta')
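A quick numerical check on any automated alignment is the pairwise sequence identity between template and target. The following is a minimal sketch in plain Python, independent of Modeller; the function name and the two short aligned fragments are made up for illustration and are not taken from the alignment files used in this work.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity over columns where both aligned sequences have a residue."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    aligned = 0
    identical = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        aligned += 1
        if a == b:
            identical += 1
    if aligned == 0:
        return 0.0
    return 100.0 * identical / aligned


# Hypothetical aligned fragments (one-letter amino acid codes, '-' = gap).
template_fragment = "MEAAH-SKT"
target_fragment = "MEKAHGSK-"

if __name__ == "__main__":
    identity = percent_identity(template_fragment, target_fragment)
    print(f"identity over aligned columns: {identity:.1f}%")
```

Regions where the identity is low are the ones most likely to need the manual adjustment described in the next section.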

C.2.2 Manual Alignment

The sequence alignment from an automated tool such as Modeller was manually adjusted and checked via methods such as those described in Section 3.5. Only very

highly homologous sequences could be assumed to have correct automated alignments.

Although there are many manual sequence alignment tools, VMD’s MultiSeq plugin

was used in the majority of this work. Note that the Na+,K+-ATPase N domain sequences were loaded into MultiSeq and added to the sequence alignment file so that their structural data were able to be incorporated in the model building stage.

C.2.3 N-Domain Superposition

A final step before model building was to align the Na+,K+-ATPase N domain structures with the SERCA structure. The additional Na+,K+-ATPase structural data was particularly helpful due to the low sequence homology in this region. Modeller’s superpose command was used in the following script to geometrically transform the N domain structure (PDB ID: 1Q3I) such that it was superposed onto the homologous SERCA region. The superpose script must be modified and rerun if additional structural data is to be used. The script was run three times to align the

N domains to each SERCA structure. In the model building script of Section C.2.4, an additional NMR structure of the Na+,K+-ATPase N domain was used (PDB ID:

1MO7). VMD was used to visually inspect SERCA and the Na+,K+-ATPase N domains to ensure that the new PDB files were transformed correctly.

    ############################################
    # superpose_1Q3I_1WPG.py
    # superpose NaK N domain onto SERCA's N domain
    ############################################
    from modeller import *
    env = environ()
    env.io.atom_files_directory = '../atom_files'
    mdl = model(env, file='1WPG1.PDB')
    mdl2 = model(env, file='unmoved_structs/1Q3I.pdb')
    aln = alignment(env, file='hand_aligned2b.ali',
                    align_codes=('1WPG1', '1Q3I_A'))
    atmsel = selection(mdl).only_atom_types('CA')
    r2 = atmsel.superpose(mdl2, aln)
    mdl2.write(file='1Q3I.pdb', model_format='PDB')
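Visual inspection of the superposed structures can be complemented by a numerical check: the RMSD over matched α-carbon atoms should be small after the transformation. The sketch below is plain Python and does not use Modeller; the helper names, the tiny made-up coordinate sets, and the residue pairing are assumptions for illustration only. In practice the residue pairs would come from the alignment and the coordinates from the template and the transformed PDB files.

```python
import math


def pdb_atom_line(serial, name, resname, chain, resnum, x, y, z):
    """Format one ATOM record in PDB fixed-column layout (illustration helper)."""
    return "ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f" % (
        serial, name, resname, chain, resnum, x, y, z)


def ca_coordinates(pdb_text):
    """Map residue number -> (x, y, z) for every CA atom in PDB-format text."""
    coords = {}
    for line in pdb_text.splitlines():
        # Columns 13-16 hold the atom name, 23-26 the residue number,
        # and 31-54 the x, y, z coordinates.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            coords[int(line[22:26])] = (
                float(line[30:38]), float(line[38:46]), float(line[46:54]))
    return coords


def ca_rmsd(coords_a, coords_b, residue_pairs):
    """RMSD over matched CA atoms; residue_pairs lists (res_a, res_b) tuples."""
    total = 0.0
    for res_a, res_b in residue_pairs:
        total += sum((p - q) ** 2
                     for p, q in zip(coords_a[res_a], coords_b[res_b]))
    return math.sqrt(total / len(residue_pairs))


# Made-up coordinates: the second fragment is the first shifted 1 angstrom in z.
template_pdb = "\n".join(
    pdb_atom_line(i + 1, "CA", "ALA", "A", i + 1, 3.8 * i, 0.0, 0.0)
    for i in range(3))
moved_pdb = "\n".join(
    pdb_atom_line(i + 1, "CA", "ALA", "A", i + 10, 3.8 * i, 0.0, 1.0)
    for i in range(3))
pairs = [(1, 10), (2, 11), (3, 12)]
example_rmsd = ca_rmsd(ca_coordinates(template_pdb),
                       ca_coordinates(moved_pdb), pairs)

if __name__ == "__main__":
    print(f"CA RMSD over {len(pairs)} matched residues: {example_rmsd:.3f}")
```

A large RMSD over the matched residues would suggest that the wrong alignment codes or an untransformed file was used.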

C.2.4 Homology Model Building

After the final alignment was created and checked to ensure it agreed with experimental results, the file “make_model.py” was used to create five initial homology models. The sample Modeller script used the E2P (PDB ID: 1WPG) crystal structure as a template. The automodel command instructed Modeller to use the X-ray (PDB ID: 1Q3I) and NMR (PDB ID: 1MO7) N domain structures to improve the model in this region. The Modeller log was examined because it listed the DOPE score of each model.

##############################################
# build_1WPG_models.py
# this code builds the models
##############################################
from modeller import *
from modeller.automodel import *
env = environ()
env.libs.topology.read(file='$(LIB)/top_heav.lib')
code = '1WPG1'
chain = 'A'
n_domain_crys = '1Q3I'
n_domain_nmr = '1MO7'
# the template plus the two N domain structures are used as knowns
a = automodel(env, alnfile='hand_aligned2b.ali',
              knowns=(code, n_domain_crys, n_domain_nmr),
              sequence='ha1')
a.starting_model = 1
a.ending_model = 5
a.make()
# move the output files into a directory named after the template
env.system("mkdir "+code+"dir")
env.system("mv ha1.V* "+code+"dir")
env.system("mv ha1.B* "+code+"dir")
env.system("mv ha1.D* "+code+"dir")

C.2.5 Homology Model Optimization

The final step in creating the Na+,K+-ATPase homology model was to refine the long loop in the P domain. This loop posed a special difficulty because it was twenty residues long. A new loopmodel class was defined to refine human α1 residues G598 to A623 (N645 to D650 in SERCA), which included 3 residues on either side of the loop. This allowed some flexibility in refining the loop to fit with the rest of the model. Seven residues were restrained to an alpha helical secondary structure due to the structure of a human phosphatase with which the P-loop had a high sequence homology [76]. Beginning with the best of the five models (lowest DOPE score) created in the previous step, twenty models with a refined loop were created, and the one with the best DOPE score was selected as the final homology model.

##############################################
# refine_p_loop.py
# Loop refinement of an existing model
##############################################
from modeller import *
from modeller.automodel import *
log.verbose()
env = environ()
# directories for input atom files
env.io.atom_files_directory = './:../atom_files'

# Create a new class based on 'loopmodel'
class myloop(loopmodel):
    # Pick the residues to be refined by loop modeling
    def select_loop_atoms(self):
        # we have 20 residue insertion in Nak
        # plus we want to provide some freedom for the last 3
        # residues on each. this is a HUGE loop to optimize
        # on the 3 residues on each side of the insertion
        return selection(self.residue_range('598', '623'))

    # this routine restrains certain residues to the
    # helix-loop-extended scheme used by Munson (Rost)
    def special_restraints(self, aln):
        rsr = self.restraints
        at = self.atoms
        rsr.add(secondary_structure.alpha(self.residue_range('601:', '607:')))

m = myloop(env,
           inimodel='ha1.B99990002.pdb',   # initial model
           sequence='ha1',                 # code of the target
           loop_assess_methods=assess.DOPE)
m.loop.starting_model = 1    # index of the first loop model
# this will take a while
m.loop.ending_model = 20     # index of the last loop model
m.loop.md_level = refine.slow_large
m.make()

C.3 System Preparation

There are many details that must be considered when a membrane protein system is to be created for molecular dynamic (MD) simulation. The study of membrane behavior is a field unto itself, and care must be taken to ensure that the lipid depicts the in vivo structure as accurately as possible. The membrane must stabilize the backbone structure of the protein. Membrane proteins in simulations with no membrane quickly lose any similarity to their initial tertiary structure. Membranes can cause considerable difficulties if care is not taken to ensure they are properly equilibrated. Pre-equilibrated building blocks and related topology files are available [83] and can be used to create a bilayer large enough to enclose the protein of interest. Note that systems built with “pre-equilibrated” lipid blocks must undergo considerable further equilibration once a membrane protein is introduced.

C.3.1 Membrane Creation

For the Na+,K+-ATPase, a bilayer created with a 2 by 2 array of POPC building blocks yielded an acceptable starting structure. The GROMACS command genconf was used to assemble the array. The final structure had two blocks along the x-axis and two blocks along the y-axis. The z-axis is normal to the membrane bilayer by convention.

genconf -f popc.gro -nbox 2 2 1 -o 2x2_popc_lipid.gro

This structure underwent an energy minimization (EM) phase, which moved atoms down an energy landscape, successively reducing energy until a defined tolerance was reached. This process removed minor atomic overlaps and reconciled bond angles and distances to ensure viable conditions for MD simulation. In general, EM does not ensure that the simulation will not crash due to other non-equilibration issues.

C.3.2 Topology and Forcefield Files

Before beginning the simulations, the correct lipid topology files need to be in order. Lipid topologies and force field parameters must be incorporated manually, since GROMACS does not include them. Peter Tieleman’s website (http://moose.bio.ucalgary.ca/) provides “lipid.itp”, which contains lipid parameters and the force field interaction parameters for the GROMOS87 forcefield. The website also has the lipid topology file “popc.itp” for POPC molecules.

The topology and forcefield parameters can be incorporated by including them in the “ffgmx.itp”, “ffgmx.atp”, “ffgmxnb.itp” and “ffgmxbon.itp” files included with a standard GROMACS installation. These files’ default location is the “$GMX_ROOT/share/gromacs/top” directory. Since the computer systems on which these simulations are run will typically have a maintained version of GROMACS already installed, the use of modified versions of these files requires the user to place the modified versions in the simulation’s work directory.

Copy and paste the following sections of the “lipid.itp” file into the corresponding files:

1. [ nonbond_params ] to ffgmxnb.itp
2. [ pairtypes ] to ffgmxnb.itp
3. [ atomtypes ] to ffgmx.atp
4. [ dihedraltypes ] to ffgmxbon.itp

Ensure that the following files are in the current directory before running the energy minimization step:

lipid_em.mdp     EM parameters
ffgmx.atp        force field
ffgmx.itp        force field
ffgmxbon.itp     force field
ffgmxnb.itp      force field
popc.itp         lipid topology
system.top       system topology
system1.gro      system coordinates

A sample “lipid_em.mdp” file is shown below:

; lipid_em.mdp
title = new_membrane
cpp = /usr/bin/cpp
include = -I../top
constraints = none
integrator = steep
emtol = 100
emstep = 0.01
tinit = 0.0
nstxout = 1000
nstvout = 1000
nstlog = 1000
nstenergy = 1000
nstxtcout = 1000
xtc_grps = SOL POP
energygrps = SOL POP
nstlist = 10
ns_type = grid
rlist = 0.9
coulombtype = PME
rcoulomb = 0.9
rvdw = 1.4
tcoupl = Berendsen
tc-grps = SOL POP
tau_t = 0.1 0.1
ref_t = 310 310
Pcoupl = Berendsen
pcoupltype = semiisotropic
tau_p = 1.0
compressibility = 4.5e-5 4.5e-5
ref_p = 1.0 1.0
gen_vel = no
gen_temp = 310
gen_seed = 1618

A typical system topology file is:

;lipid_system.top
;Include Position restraint file
#include "ffgmx.itp"
#include "popc.itp"
;Include Position restraint file
#ifdef LIPPOSRES
#include "lipid_posre.itp"
#endif

; Include water topology
#include "spc.itp"

#ifdef POSRES_WATER
; Position restraint for each water oxygen
[ position_restraints ]
; i funct fcx fcy fcz
1 1 1000 1000 1000
#endif

[ system ]
; Name
PR_2x2

[ molecules ]
; Compound #mols
POP 512
SOL 9840

GROMACS’ grompp command was used to create the simulation topology and the mdrun command was used to run the energy minimization. Examples of these commands are listed below. The _d suffix indicates that the double precision version of the commands was used. The $NUM_PROC variable indicates the number of processors that were used. Note that a “processor” in this case refers to a single processing unit (core). Therefore, four “quad-core” Opteron processors correspond to 16 GROMACS processors. The mpiexec command was used to submit jobs to the PBS parallel system queue. Details may differ depending on the parallel system used.

C.3.3 Membrane Energy Minimization

This example indicates that the energy minimization was run on a parallel system, but most desktop computers should be able to handle the energy minimization calculations in several hours. Of course, this is dependent on system size and other factors. All MD simulations that are longer than a few ns should probably be run on a parallel system if possible.

grompp_d -np $NUM_PROC -f lipid_em.mdp -c 2x2_popc_lipid.gro -p lipid_system.top -o lipid_em.tpr
mpiexec mdrun_mpi_d -np $NUM_PROC -s lipid_em.tpr -o after_lipid_em -c after_lipid_em -e after_lipid_em -g after_lipid_em -x after_lipid_em -v > lipid_em_out.run

After running energy minimization, a position restrained simulation was run to ensure that the lipid membrane had achieved an acceptable area per lipid value. The “em.mdp” file, with several changes, was used to build the “lipid_pr.mdp” file. Consult the GROMACS manual for a description of all parameters to ensure that the values used are appropriate for each simulation. The optimize_fft parameter was turned off because the FFT optimization algorithm uses the system time as an input parameter, rendering it impossible to recreate a simulation exactly if needed. Turning it off was not found to have a significant impact on performance. Also, note that in the .mdp file, -DLIPPOSRES is defined, but the corresponding #ifdef statement in the system topology file uses #ifdef LIPPOSRES.

;lipid_pr.mdp
define = -DLIPPOSRES
constraints = all-bonds
integrator = md
nsteps = 125000 ;250 ps
gen_vel = yes
gen_temp = 310
gen_seed = 1618
optimize_fft = no
comm_grps = POP SOL
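The simulated time implied by an nsteps value can be checked with a short calculation. The Python sketch below is illustrative only: it assumes a time step of 0.002 ps, which is not listed in the parameter fragment above but is consistent with its “250 ps” comment, and the function name is hypothetical.

```python
def simulated_time_ps(nsteps, dt_ps=0.002):
    """Return the simulated time in ps for `nsteps` MD steps.

    dt_ps = 0.002 is an assumption inferred from the "250 ps" comment;
    the value actually used must be read from the full .mdp file.
    """
    return nsteps * dt_ps


if __name__ == "__main__":
    # 125000 steps at the assumed time step correspond to about 250 ps
    print(simulated_time_ps(125000))
```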

The commands to run the position restrained simulation are similar to those used for energy minimization.

grompp_d -np $NUM_PROC -f lipid_pr.mdp -c after_lipid_em.gro -p lipid_system.top -o lipid_pr.tpr
mpiexec mdrun_mpi_d -np $NUM_PROC -s lipid_pr.tpr -o after_lipid_pr -c after_lipid_pr -e after_lipid_pr -g after_lipid_pr -x after_lipid_pr -v > lipid_pr_out.run

Water molecules were removed from the downloaded membrane blocks because a later step using the inflategro script ignores them. The number of atoms listed in line 2 of the .gro file was updated after running the grep command. Note that whenever the number of molecules is changed, the system topology must be updated as well.

grep -v SOL after_lipid_pr.gro > lipid_after_pr_no_water.gro
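The removal of water and the update of the atom count can also be done in a single pass. The following Python sketch is an alternative to the grep command, not the procedure used in this work; it assumes the standard fixed-column .gro layout (title line, atom count, atom lines, box line), and the function name is hypothetical.

```python
def strip_residue_from_gro(lines, resname="SOL"):
    """Return .gro lines without atoms of `resname`, with the count corrected.

    `lines` holds the file contents without trailing newlines:
    title, atom count, one line per atom, and the box vector line.
    """
    title, atom_lines, box = lines[0], lines[2:-1], lines[-1]
    # Columns 6-10 of a .gro atom line hold the residue name
    kept = [ln for ln in atom_lines if ln[5:10].strip() != resname]
    return [title, f"{len(kept):5d}"] + kept + [box]


if __name__ == "__main__":
    with open("after_lipid_pr.gro") as handle:
        original = handle.read().splitlines()
    with open("lipid_after_pr_no_water.gro", "w") as handle:
        handle.write("\n".join(strip_residue_from_gro(original)) + "\n")
```

Unlike grep, this filter tests only the residue name column, so a title line that happens to contain “SOL” is preserved.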

C.3.4 Protein Orientation

The lipid without water and the protein were loaded into VMD. VMD commands were entered in the VMD TkConsole. The transformation matrix from the TMDET software will be similar to that shown below [80]. The TMDET web server accepted a membrane protein in .pdb form and returned a transformation matrix that can be used to properly align the protein with the lipid membrane.

This matrix was entered into VMD in the manner shown below. A 4 by 4 matrix is required by the move command, so the bottom row {0 0 0 1} was added, as TMDET did not include it. When constructing this matrix, be sure to add spaces between the brackets, as in “} {”, not “}{”.

set mymove {{0.99934930 -.03243649 0.01577429 -81.85899740} {0.02072185 0.15835702 -0.98716444 -15.721035} {0.02952216 0.98684901 0.15892613 -6.09399223} {0 0 0 1}}

The protein was selected as VMD’s “top” molecule and the following two commands were entered in the console:

set myprot [atomselect top "all"]
$myprot move $mymove

After these steps, the protein was oriented correctly with respect to the bilayer. The TMDET transformation matrix produced protein coordinates such that the Z-axis coincided with the membrane normal and the new origin was at the midpoint of the membrane’s width. The protein-lipid interface was observed during the fully unrestrained MD simulations to make certain the protein did not move a considerable amount.
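The effect of a 4 by 4 transformation matrix on a single atom can be illustrated outside VMD. The Python sketch below assumes that the matrix acts on the homogeneous column vector (x, y, z, 1), so that the upper-left 3 by 3 block rotates and the fourth column translates; the function name is hypothetical and the matrix values are those listed above.

```python
def apply_transform(matrix, point):
    """Apply a 4 x 4 homogeneous transformation to an (x, y, z) point."""
    x, y, z = point
    homogeneous = (x, y, z, 1.0)
    # Each of the first three rows yields one transformed coordinate
    return tuple(
        sum(m * c for m, c in zip(row, homogeneous)) for row in matrix[:3]
    )


# Matrix from the text; the bottom row (0 0 0 1) was appended by hand
MYMOVE = [
    [0.99934930, -0.03243649, 0.01577429, -81.85899740],
    [0.02072185, 0.15835702, -0.98716444, -15.721035],
    [0.02952216, 0.98684901, 0.15892613, -6.09399223],
    [0.0, 0.0, 0.0, 1.0],
]

if __name__ == "__main__":
    # An atom at the old origin is carried to the translation column
    print(apply_transform(MYMOVE, (0.0, 0.0, 0.0)))
```

A quick sanity check on such a matrix is that each row of the rotation block has unit length, as is the case for the values above to within rounding.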

C.3.5 Protein Centering

The inflategro script used in a later section expands the location of each lipid molecule (though not the lipids themselves) across the x-y plane [83]. The inflategro script works best if the center of the box is at the location (x/2, y/2, z/2). This was already true of the membrane coordinates, but the oriented protein had to be translated to align correctly with the membrane. A script from the VMD manual, “geom_center.tcl”, returns the center of the membrane as a vector, and was used to center the protein. The moveby command was used to shift the protein such that it was centered in the membrane. These steps aligned the lipid and protein because the protein’s transmembrane center was set to the origin by TMDET. Proteins that are highly asymmetrical with respect to the z-axis may be translated in the x-y plane for visualization purposes.

**geom_center.tcl**
proc geom_center {selection} {
    # set the geometrical center to 0
    set gc [veczero]
    # [$selection get {x y z}] returns a list of {x y z}
    # values (one per atom) so get each term one by one
    foreach coord [$selection get {x y z}] {
        # sum up the coordinates
        set gc [vecadd $gc $coord]
    }
    # and scale by the inverse of the number of atoms
    return [vecscale [expr 1.0 /[$selection num]] $gc]
}
**geom_center.tcl**

The “geom_center.tcl” file was placed in the present working directory and the source command was used to run the .tcl file in the VMD console. The lipid molecule was selected as the “top” molecule and the following commands were performed in the console to translate the protein:

set lipid [atomselect top "all"]
geom_center $lipid
# This returns a vector (x y z)
set moveforward {x y z}
$myprot moveby $moveforward
$myprot writepdb protein_aligned.pdb
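The centering step amounts to computing the geometric center of the lipid atoms and translating the protein by that vector. A minimal pure-Python sketch of the same arithmetic is given below; the function names and the toy coordinates are hypothetical and serve only to illustrate the calculation performed by the Tcl commands.

```python
def geom_center(coords):
    """Return the unweighted geometric center of a list of (x, y, z) tuples."""
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def move_by(coords, offset):
    """Translate every coordinate by the vector `offset`."""
    return [tuple(c[i] + offset[i] for i in range(3)) for c in coords]


if __name__ == "__main__":
    # Toy data: a "membrane" centered at (5, 5, 3) and a "protein" at the origin
    lipid = [(4.0, 4.0, 2.0), (6.0, 6.0, 4.0)]
    protein = [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    shift = geom_center(lipid)
    print(move_by(protein, shift))
```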

C.3.6 Protein Protonation

This step was used to create a new structure with protons added to the protein sidechains based on an internal GROMACS algorithm.

pdb2gmx -f protein_aligned.pdb -ff gmx -o protein_al_proto.pdb

C.3.7 Protein Position Restraints

Position restraint files were created which list protein atoms that were restricted during the lipid and protein preparation phase. The force constant for the energy minimization steps of the bilayer preparation phase was 100,000 kJ mol−1 nm−2. During the preparation phase, the inflategro script scaled the lipid locations, bringing them very close to the protein. This interaction could cause the protein to deform if it were not strongly held in place. In practice, the larger restraint locked the atom(s) in place (even under strong short-range electrostatic interactions). The position restraint file created with the genpr command restrained all atom species, including hydrogen atoms. Select Group 1 (Protein) at the prompt.

In a later step, the position restrained equilibration phase will restrain only heavy atoms (Section C.7). The restraint force constant used with the inflategro script is 100 times larger than that used for position-restrained MD. The weaker restraint allows significant short-term perturbations.

genpr -f protein_al_proto.pdb -o posre_prep.itp -fc 100000

C.3.8 Protein Topology

The protein topology file, “topol.top”, was renamed to “protein.itp”. The line with “#include "ffgmx.itp"” was removed, as this file was already included in the “system.top” file and cannot be included twice. “Protein_X” under the moleculetype directive was changed to “Protein”. All lines after the dihedral list at the end of the file were removed.

The final line should resemble:

9541 9544 9543 9542 2.

C.3.9 System Concatenation

VMD wrote a .pdb file for the reoriented protein. It needed to be in .gro format to concatenate with the energy-minimized lipid structure created in Section C.3.1. This was accomplished with GROMACS’ editconf command.

editconf -f protein_al_proto.pdb -o protein_al_proto.gro
cat protein_al_proto.gro 2x2_lipid_after_pr_no_water.gro > system1.gro

From the “system1.gro” file, the Z vector of the protein box was noted; it is the third value on the first of the three lines separating the protein coordinates from the lipid coordinates. The number of lipid atoms listed on the third of these lines was also noted. The three lines between the protein and lipid data were then deleted. A trial-and-error process was needed to choose the system Z vector based on the protein Z vector. A system vector of 16.00 was sufficient for a typical protein Z vector of 12.00. The final line was similar to: 12.33526 12.21782 16.00000. The Z vector of the box needed to be adjusted such that when the system was solvated, no solvent atoms were within twice the larger of the van der Waals cutoff or the Coulombic cutoff of opposite sides of the protein.

Finally, the atom total in the second line of the file was updated by adding the number of lipid atoms to the number of protein atoms.
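The manual edits described above (dropping the three intermediate lines, summing the atom counts, and setting the box Z vector) can be sketched in Python as follows. This is an illustration rather than the procedure used in this work: the function name is hypothetical, the x and y box vectors are assumed to come from the lipid file, and atom numbers are left as they were in the input files.

```python
def merge_gro(protein_lines, lipid_lines, box_z):
    """Merge two .gro files (given as lists of lines) into one system.

    The protein title is kept, the atom counts are summed, and the final
    box line uses the lipid x and y vectors with the supplied z vector.
    """
    protein_atoms = protein_lines[2:-1]
    lipid_atoms = lipid_lines[2:-1]
    # Assumption: the lateral box size is set by the membrane
    box_x, box_y = (float(v) for v in lipid_lines[-1].split()[:2])
    box = f"{box_x:10.5f}{box_y:10.5f}{box_z:10.5f}"
    total = len(protein_atoms) + len(lipid_atoms)
    return [protein_lines[0], f"{total:5d}"] + protein_atoms + lipid_atoms + [box]


if __name__ == "__main__":
    with open("protein_al_proto.gro") as handle:
        protein = handle.read().splitlines()
    with open("2x2_lipid_after_pr_no_water.gro") as handle:
        lipid = handle.read().splitlines()
    with open("system1.gro", "w") as handle:
        handle.write("\n".join(merge_gro(protein, lipid, 16.0)) + "\n")
```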

C.3.10 System Topology Creation

The “system.top” was created with lines removed from the end of the protein topology file from the output of pdb2gmx. A sample “system.top” file is shown below:

;system.top
;Include Position restraint file
#include "ffgmx.itp"

#include "protein.itp"
#ifdef PROPOSRES
#include "posre_prep.itp"
#endif

#include "popc.itp"

[ system ]
; Name
PR_2x2

[ molecules ]
; Compound #mols
Protein 1
POP 512

C.4 Membrane Hole Creation

The following files were placed in the present working directory to carry out the membrane hole creation. The em.mdp file was reused from the previous membrane EM phase.

em.mdp           EM parameters
ffgmx.atp        force field
ffgmx.itp        force field
ffgmxbon.itp     force field
ffgmxnb.itp      force field
inflategro       inflategro script
popc.itp         lipid topology
posre_prep.itp   protein position restraints
protein.itp      protein topology
run_inf_gro      script to loop inflategro/EM
system.top       system topology
system1.gro      system coordinates

An overview of the inflategro script methodology was given in Section 3.6.1. Details of the inflategro script can be found on Peter Tieleman’s website (http://moose.bio.ucalgary.ca/). A first iteration of the inflategro script was run manually to inflate the membrane by a factor of four.

perl inflategro system1.gro 4 POP 14 inflated_bilayer0.gro 5 areaperlipid0.dat

All lipids within the cutoff distance of 14 Å (4th inflategro argument) were removed by the command. The “system.top” file was updated accordingly. The next series of steps was completed via a script, “run_inf_gro”. It took input from “inflated_bilayer0.gro”, ran EM, and then called inflategro. In this step, and the remaining steps, inflategro scaled the lipid x and y coordinates by 0.95 and then re-centered the protein in the box. This process was continued until the inflategro command returned an area per lipid less than the experimental value. A guess (detailed below) was made for the number of steps; then an appropriate output structure was chosen. For example, N steps of inflategro may be run, but the best structure (based on the area per lipid) was the output from step N-3. The experimental value of area per lipid of POPC is 64 Å² [84].

The area per lipid from the manual run of inflategro was listed in areaperlipid0.dat, and was used to estimate a reasonable loop counter. This area, 9.43 nm², was scaled by 0.95² at each step, which determined the number of steps N needed to reach an area of approximately 64 Å² (about 0.6 nm²) per lipid:

$9.43 \times (0.95)^{2N} = 0.6$

$N = \log_{0.9025}(0.0636) \approx 26.87$

where N was rounded up to 27.
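The estimate of the loop counter can be reproduced with a few lines of Python. The sketch below simply restates the logarithm above, using the values quoted in the text (9.43 nm² initial area, 0.6 nm² target, and a scaling factor of 0.95 per step); the function name is hypothetical.

```python
import math


def estimate_steps(initial_area, target_area, scale=0.95):
    """Estimate how many shrinking steps bring the area per lipid to the target.

    Each step scales x and y by `scale`, so the area shrinks by scale**2.
    Both areas must be given in the same units (nm^2 here).
    """
    return math.log(target_area / initial_area) / math.log(scale ** 2)


if __name__ == "__main__":
    steps = estimate_steps(9.43, 0.6)
    # The fractional estimate is rounded up to obtain the loop counter
    print(steps, math.ceil(steps))
```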

The “em.mdp” file was based on the “lipid_em.mdp” file with the following changes due to the addition of the protein:

xtc_grps = POP Protein
energygrps = POP Protein
tc-grps = POP Protein
tau_t = 0.1 0.1
ref_t = 310 310

The “run_inf_gro” script is shown below:

#!/bin/bash
# run_inf_gro script puts a hole in the membrane
X=0
while [ $X -le 27 ]
do
    # energy minimize the current structure
    grompp -f em.mdp -c inflated_bilayer$X.gro -p system.top -o em$X.tpr -v > grompp$X.out
    mdrun -s em$X.tpr -o after_em$X -c after_em$X -e after_em$X -g after_em$X -x after_em$X -v > mdrun$X.out

    # shrink the lipid positions by 0.95 for the next iteration
    Y=$((X+1))
    perl inflategro after_em$X.gro 0.95 POP 0 inflated_bilayer$Y.gro 5 areaperlipid$Y.dat > inflategro_output$X.out

    X=$((X+1))
done

Keep in mind that the value of N was only an estimate, and the “areaperlipid.dat” outputs were checked to determine which structure to use, or if more iterations needed to be executed. After the script was run, the appropriate step with which to build the system was determined. The previous step’s output was used (i.e. the structure with closest area per lipid larger than the experimental value) because the inflategro script overestimates the area of the protein.
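The selection rule described above (use the last structure whose area per lipid is still larger than the experimental value) can be sketched as follows. The function name is hypothetical, and the synthetic list of areas is an idealized geometric series used only for illustration; actual values must be read from the areaperlipid output files, whose exact layout is not assumed here.

```python
def choose_step(areas, experimental=0.64):
    """Return the index of the last step whose area per lipid exceeds the target.

    `areas` lists the area per lipid (nm^2) after each step, in order.
    Returns None if even the first structure is below the target.
    """
    chosen = None
    for step, area in enumerate(areas):
        if area > experimental:
            chosen = step
        else:
            break
    return chosen


if __name__ == "__main__":
    # Idealized data: 9.43 nm^2 shrinking by 0.95**2 at every step
    synthetic = [9.43 * 0.9025 ** k for k in range(30)]
    print(choose_step(synthetic))
```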

The EM outputs were concatenated using a script such as the following:

#concatenate script
cat after_em0.gro after_em1.gro after_em2.gro after_em3.gro after_em4.gro \
    after_em5.gro after_em6.gro after_em7.gro after_em8.gro after_em9.gro \
    after_em10.gro after_em11.gro after_em12.gro after_em13.gro after_em14.gro \
    after_em15.gro after_em16.gro after_em17.gro after_em18.gro after_em19.gro \
    after_em20.gro after_em21.gro after_em22.gro after_em23.gro after_em24.gro > all.gro

The GROMACS command trjconv was used to convert the .gro frames to a GROMACS .xtc trajectory. The protein group was selected for centering and the system was chosen as output.

trjconv -f all.gro -o all.xtc -s em0.tpr -center zero

The RMSD of the protein was checked by running g_rmsdist in an X11 window to ensure that it had not been significantly perturbed. The protein backbone was selected at the prompt. The RMS deviation was minimal (< 1 Å).

g_rmsdist -f all.xtc -s em0.tpr -w

C.4.1 System Box Adjustment

This step centered the system in the box. The border of water on the cytoplasmic and lumenal ends of the protein facilitated post-production run analysis. An initial solvation of the system was completed with the following command:

editconf -f after_em25.gro -o after_em25_adjusted.gro -d 0

The center of the system was given in the output:

system size : 15.953 16.353 12.604 (nm)
center      :  6.830  6.734  4.293 (nm)
box vectors : 13.687 13.556 16.000 (nm)

To adjust the system center, half the x and y values of the “box vectors” output were given as input to another editconf command. The z vector passed to the editconf command was manually adjusted. The genbox and editconf procedure was repeated until the system was properly centered. The extent of the water box indicated whether the system was centered correctly, but the steps in the next section produced the final solvated system. The following two commands were repeated with adjustment of the z vector value (6) until appropriate solvation was acquired.

editconf -f after_em25.gro -o after_em25_adjusted.gro -center 6.830 6.734 6
genbox -cp after_em25_adjusted.gro -cs -o system_h2o.gro

C.5 System Solvation with Water

The genbox algorithm is geometrically based, rather than thermodynamically based, so a simple solvation step such as that just performed will introduce water molecules in many locations where they should not exist: namely among the hydrophobic tails of the lipid bilayer molecules. Solvation methods must either prevent introduction of these water molecules or remove them afterwards. Genbox uses a parameter file to store van der Waals radii of atoms. One method to prevent undesirable water molecules is to adjust this file by artificially increasing the size of the lipid tail atoms [83]. Simply increasing the radii of the last four carbon atoms of each tail of the POPC molecules prevented the placement of water molecules in this region. These atoms are labeled C30, C31, CA1, CA2, C47, C48, C49, and C50 in the topology file. A 5 Å radius was found to remove the water molecules among the tails of the bilayer. The “vdwradii.dat” file, shown below with radii in nm, was returned to its original state after the genbox command. The default location for this file is “$GMX_ROOT/share/gromacs/top/”.

Vdwradii.dat
; Very approximate VanderWaals radii
; only used for drawing atoms as balls or
; for calculating atomic overlap.
; longest matches are used
; '???' or '*' matches any residue name
; 'AAA' matches any protein residue name
...
??? C30 0.5
??? C31 0.5
??? CA1 0.5
??? CA2 0.5
??? C47 0.5
??? C48 0.5
??? C49 0.5
??? C50 0.5

The final solvation command was run:

genbox -cp after_em25_adjusted.gro -cs -o system_h2o.gro

It was also important to remove water molecules that were close to the protein because they may have been introduced in inappropriate areas. The solvated system was loaded into VMD and the Tk Console was opened. The command below saved all atoms to the new pdb except water molecules within 5 Å of the protein.

set my_prot [atomselect top "protein or resname POP or same residue as resname SOL and not within 5 of protein"]
$my_prot writepdb system_no_ions.pdb

Other water molecules were removed by visual inspection in VMD and manual editing of the coordinate file. Finally, the “system.top” file was updated to reflect the new number of water molecules. This can be accomplished by analyzing the .pdb file, running the GROMACS make_ndx command, or using the num command on an atom selection in VMD.
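Counting the remaining water molecules directly from the .pdb file can be sketched as shown below. The sketch assumes standard fixed-column PDB records, the residue name “SOL”, and three atoms per water molecule (as in the SPC model included in the topology); the function name is hypothetical.

```python
def count_waters(pdb_lines, resname="SOL", atoms_per_molecule=3):
    """Count water molecules in PDB records by counting their atoms.

    The residue name occupies columns 18-20 of ATOM/HETATM records.
    """
    n_atoms = sum(
        1
        for line in pdb_lines
        if line.startswith(("ATOM", "HETATM")) and line[17:20] == resname
    )
    if n_atoms % atoms_per_molecule:
        raise ValueError("incomplete water molecule found")
    return n_atoms // atoms_per_molecule


if __name__ == "__main__":
    with open("system_no_ions.pdb") as handle:
        print(count_waters(handle.read().splitlines()))
```

The returned number is the value to enter for “SOL” under the molecules directive of “system.top”.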

C.5.1 Charge-Neutral System Created by Ion Addition

To use the genion command, a GROMACS .tpr file was required.

grompp -f em.mdp -c system_no_ions.pdb -p system.top -o system_no_ions.tpr

The output also contained the charge of the system, thereby indicating how many ions must be added to create a charge-neutral system. A charge of -25 was indicated by the output shown below:

NOTE: System has non-zero total charge: -2.499995e+01

The genion command was used to convert water molecules to Na+ ions.

genion -s system_no_ions.tpr -o system_with_ions.pdb -np 25 -pname Na -random

In this case, 25 water molecules were converted to Na+ ions to counteract the negative charge. For genion, the “SOL” group was selected at the “Select a continuous group of solvent molecules” prompt. Since the ions were introduced in place of randomly selected water molecules, the structure was checked to ensure that no ions were introduced in the vicinity of areas of interest, namely near the lumenal and cytoplasmic sides of the transmembrane region.
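The number of counterions follows directly from the total charge reported by grompp. The Python sketch below parses a note of the form quoted above and rounds the charge to the nearest integer; the function name and the returned labels are hypothetical, and only monovalent counterions are considered.

```python
import re


def counterions_needed(grompp_note):
    """Return (ion sign, count) of monovalent ions needed for neutrality."""
    match = re.search(
        r"total charge:\s*([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)", grompp_note
    )
    if match is None:
        raise ValueError("no total charge found")
    # The reported charge carries rounding noise, e.g. -2.499995e+01
    charge = round(float(match.group(1)))
    # A negative system needs cations, a positive one needs anions
    return ("positive", -charge) if charge < 0 else ("negative", charge)


if __name__ == "__main__":
    note = "NOTE: System has non-zero total charge: -2.499995e+01"
    print(counterions_needed(note))
```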

C.5.2 System Topology Update

The “system.top” topology file was updated to reduce the number of “SOL” molecules by the number that were replaced by Na+ ions. It is important to keep the “system.top” file updated as a system is built or edited. The molecules directive then showed the following information:

[ molecules ]
; Compound #mols
Protein 1
POP 507
SOL 68318
Na 25

C.5.3 New Index File Creation

In order to temperature couple the water (“SOL”) and Na+ ion (“Na”) groups, a new index file was created. The GROMACS utility make_ndx was called in the following manner:

make_ndx -f 1SU4_with_ions.pdb

The “SOL” and “Na” groups were selected using the “|” character. This choice created a new group, “SOL_Na”. The new index file was saved with a default name of “index.ndx” by quitting make_ndx.

13 | 14

The “em.mdp” file was updated to change the “SOL” group to “SOL_Na”. It was important to temperature couple the group of Na+ ions with the water group. Because the Na+ group contained only a few ions, a change in the velocity of a single ion could significantly impact the temperature to which the group is coupled. Coupling the ions to the solvent removed this issue.

C.6 Full System Energy Minimization

Energy minimization was run to remove steric conflicts, which would otherwise cause MD to fail immediately. A list of necessary files and examples of some are given below. The parameters for files varied greatly depending on the system and desired conditions. Changes to the “em.mdp” and “system.top” files are given below. It was helpful at this point to copy the relevant files to a new directory to separate them from the system setup files described in the previous sections.

;em.mdp
xtc_grps = Protein POP SOL_Na
energygrps = Protein POP SOL_Na
tc-grps = Protein POP SOL_Na
tau_t = 0.1 0.1 0.1
ref_t = 310 310 310

;system.top
#include "ions.itp"
[ molecules ]
; Compound #mols
Protein 1
POP 505
SOL 66818
Na 25

The grompp and mdrun commands were run in a similar manner as before:

grompp -f em.mdp -c 1SU4_with_ions.pdb -p system.top -o em.tpr -n index.ndx
mdrun -s em.tpr -o after_em -c after_em -e after_em -g after_em -x after_em -v

C.7 Position Restrained MD

In order to bring the system to a more stable state, artificial restraints were added to specific groups of atoms in a method called Position Restrained (PR) MD. The “lipid_posre.itp” and “posre.itp” restraint files were used for atoms of the lipid and protein, respectively. The genpr command was used to create a “posre.itp” file that contained position restraint force constants for the protein’s heavy atoms. Execution of genpr created restraints for only the first molecule of the input file, so it did not matter that the input file contained water molecules, lipid molecules, and ions.

genpr -f after_em -o posre.itp -fc 1000

The “lipid_posre.itp” file was created manually and is shown below. The phosphate head (atom 8) and one carbon atom on each tail (atoms 50 and 51) were harmonically restrained in the z-direction with a force constant of 1,000 kJ mol−1 nm−2.

;posre.itp
[ position_restraints ]
; atom type fx fy fz
1 1 1000 1000 1000
5 1 1000 1000 1000
6 1 1000 1000 1000

;lipid_posre.itp
[ position_restraints ]
; atom type fx fy fz
8 1 0.0 0.0 1000.0
50 1 0.0 0.0 1000.0
51 1 0.0 0.0 1000.0
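The effect of these force constants can be illustrated with the harmonic form of a position restraint, in which each Cartesian component contributes half the force constant times the squared displacement. The Python sketch below is illustrative only; the function name and the example displacement are hypothetical.

```python
def restraint_energy(position, reference, force_constants):
    """Harmonic position restraint energy in kJ/mol.

    Positions are in nm and force constants in kJ mol^-1 nm^-2,
    given per component as (fx, fy, fz).
    """
    return 0.5 * sum(
        k * (p - r) ** 2
        for p, r, k in zip(position, reference, force_constants)
    )


if __name__ == "__main__":
    reference = (0.0, 0.0, 0.0)
    displaced = (0.5, 0.5, 0.1)
    # Lipid-style restraint: only the z displacement is penalized
    print(restraint_energy(displaced, reference, (0.0, 0.0, 1000.0)))
    # Protein-style restraint: all three components are penalized
    print(restraint_energy(displaced, reference, (1000.0, 1000.0, 1000.0)))
```

With zero x and y force constants, a lipid atom is free to diffuse laterally while its height in the bilayer is held near the reference value.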

Several changes were made to the .mdp parameter file and the system topology file.

Both files began with their counterparts from the previous EM step and were updated with the changes listed below. The “pr.mdp” file was based on the “em.mdp” file and the “system.top” file reflected the change to include protein position restraints.

Position restrained trajectories were checked to ensure system stability. The average area per lipid was verified to relax to a stable value as discussed in Section 3.7.

;pr.mdp
define = -DPROPOSRES -DLIPPOSRES
constraints = all-bonds
integrator = md
nsteps = 250000 ;500 ps
gen_vel = yes
gen_temp = 310
gen_seed = 1618

;system.top
#include "protein.itp"
#ifdef PROPOSRES
#include "posre.itp"
#endif

C.8 Fully Unrestrained MD

Simulations with no restraints were begun once position restrained MD provided a suitably stable simulation system, as described in Chapter 3. The changes from the “pr.mdp” file to create a “full.mdp” file were minor. First, the line defining the position restraints was changed to a comment using a semicolon. Second, the parameter “gen_vel” was changed to “no”. This approach kept system energy continuous by using atom velocities supplied by the energies in the .edr file from the PR simulation output.

;full.mdp
;define = -DPROPOSRES -DLIPPOSRES
gen_vel = no

C.8.1 MD Run Extension

A simulation that has been run to completion can be extended with the use of the tpbconv command. Once the initial run input (.tpr) file has been created with grompp in the initial fully unrestrained run, only the .tpr, .edr (energy), .ndx (index), and .trr (trajectory) files are needed to extend a simulation. Note that the .trr file format should be used for extending runs. The .xtc format is a compressed trajectory format that is suitable for analysis, but it should not be used for simulation input because its values have been truncated to reduce file size. The sample below shows how input data for a 2000 ps simulation (“_0_2000” suffix) was used to create a run input for the next 2000 ps simulation (“_2000_4000” suffix).

tpbconv -s run_0_2000ps.tpr -f run_0_2000ps.trr -e run_0_2000ps.edr -n index.ndx -o field_run_2000_4000ps.tpr -extend 2000

C.8.2 Permeant Ion Simulations

To investigate binding sites or ion pathways, ions were added manually to a starting .pdb coordinate file. The total system charge must be kept neutral, so the additional ion(s) were actually ions added previously for the purpose of charge neutralization. The ion coordinates were manually adjusted in the .pdb to place each ion at the desired location. Energy minimization was run to remove steric conflicts. Next, a short (20 ps) PR MD simulation was run to allow water molecules to relax. Protein and lipid atoms were restrained with the same parameters as previous PR simulations. Ions were strongly restrained (1,000,000 kJ mol−1 nm−2) to prevent their movement during the short relaxation phase. The restraints are strong since the ion(s) will encounter large Coulombic forces. The “ions.itp” line in “system.top” was commented out and the following lines were added before running the PR simulation. “POSRES_NA” was defined in the .mdp file using “define = -DPOSRES_NA”.

[ moleculetype ]
; molname nrexcl
Na 1

[ atoms ]
; id at type res nr residu name at name cg nr charge
1 Na 1 Na Na 1 1

#ifdef POSRES_NA
; Position restraint for each NA
[ position_restraints ]
; i funct fcx fcy fcz
1 1 1000000 1000000 1000000
#endif

; #include "ions.itp"

C.9 CAVER Pathway Analysis

This section explains how to analyze GROMACS trajectories with CAVER. The CAVER pathway tool has the ability to analyze many frames from an MD trajectory at once. Input files in the .pdb format were created with one or more frames of the system. For instance, typical usage was to incorporate frames every 100 ps from a 1 ns file to create a trajectory with 11 frames. In practice, many CAVER searches returned an incorrect pathway that jutted out into the lipid tail area of the bilayer. Although sterically accessible, these pathways do not indicate putative ion pathways. Inclusion of lipids in the analysis greatly lengthened the time needed for a pathway search to complete because the pathway generally had to reach the edge of the simulation space.

The absence of lipids in the data analyzed by CAVER decreased the size of input and output files and significantly decreased program run time. Since lipid molecules had a negligible effect on the overall CAVER results, only protein atoms were included in the trajectories to be analyzed. Water molecules are ignored by CAVER and were not included in the input file.

Superposition of the entire protein based on the protein’s transmembrane region removed minor translational and rotational changes that occurred during the unrestrained simulation, which enabled better comparison of CAVER trajectories from one timeframe to another. This approach minimized the effect on superposition of larger movements associated with the flexible extracellular loops and intracellular domains. The make_ndx command was used to create an index file with a group that contained α-carbon atoms from only the transmembrane helices. Note that converting a compressed trajectory file (.xtc) to a .pdb file can produce a very large output depending on the size of the protein and the number of frames. The trjconv input argument “-dt” (“-skip” is also appropriate) was used to include only frames at 0 ps, 100 ps, 200 ps, etc. in the output .pdb file. The following command superposed the protein onto its transmembrane helices (centering group was selected at the prompt) and wrote out a .pdb file.

trjconv -f input_trajectory.xtc -o protein_output_superposed.pdb -s beginning_protein_structure.pdb -dt 100 -n index.ndx -fit rot+trans

CAVER accepted input in the AMBER .trj file format [42]. The following two commands were used to convert a trajectory from .pdb format to a .trj format. The output was a list of the x, y, and z coordinates of each atom over all frames.

grep ATOM protein_output_superposed.pdb > temp
awk '{print $7 " " $8 " " $9}' temp > protein_output_superposed_caver.trj
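The awk command above selects coordinates by whitespace-separated field number, which shifts if the chain identifier column is blank or if adjacent columns run together. As a hedged alternative, the sketch below reads the coordinates from the fixed PDB columns (31-38, 39-46, 47-54) instead. The function name is hypothetical, and the output simply mirrors the three-column list produced by the awk command; any header or column-width requirements of the .trj format are not addressed here.

```python
def pdb_to_xyz_lines(pdb_lines):
    """Return one 'x y z' string per ATOM record, read by fixed columns."""
    out = []
    for line in pdb_lines:
        if line.startswith("ATOM"):
            x = float(line[30:38])  # PDB columns 31-38
            y = float(line[38:46])  # PDB columns 39-46
            z = float(line[46:54])  # PDB columns 47-54
            out.append(f"{x:.3f} {y:.3f} {z:.3f}")
    return out


def convert(pdb_path, trj_path):
    """Write the coordinate list for every ATOM record in pdb_path."""
    with open(pdb_path) as src:
        rows = pdb_to_xyz_lines(src)
    with open(trj_path, "w") as dst:
        dst.write("\n".join(rows) + "\n")


if __name__ == "__main__":
    # Synthetic record with a blank chain identifier, built column by column.
    record = "ATOM  %5d %-4s%1s%3s %1s%4d%1s   %8.3f%8.3f%8.3f" % (
        1, " CA", "", "MET", "", 1, "", 27.34, 24.43, 2.614)
    print(pdb_to_xyz_lines([record]))
```

Because the columns are fixed, the same coordinates are returned whether or not the chain identifier is present.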

A types.dat file was created which was a list of the atom types in the trajectory system. CAVER used this information to determine the radius of each atom in the system. This list was created from a .pdb of one time frame using the commands below. Select “Protein” when prompted by the trjconv command. Then, the Linux command awk was used to extract the atom types from the ATOM records of the .pdb file.

trjconv -s full.tpr -f full_500_1000.xtc -o types.pdb -dump 1000
awk '/^ATOM/ {print $3}' types.pdb > types.dat

Finally, the types.dat file was edited (a template is available on the CAVER website) so that the first line contained the number of atoms. This was quickly accomplished by executing “vi types.dat” and viewing the number of lines. The remaining details of running CAVER and viewing pathways in VMD may be found on the CAVER website. An example CAVER configuration input describing files and other parameters is shown below along with a CAVER execution command.
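The extraction and the manual atom count can also be done in one step. The sketch below is an illustration only: the function name is hypothetical, and the layout it writes (atom count on the first line, then one atom name per line) is an assumption based on the description above, so it should be checked against the template on the CAVER website.

```python
def build_types(pdb_lines):
    """Return the lines of a types file: atom count first, then atom names.

    Atom names are read from PDB columns 13-16 of each ATOM record, which
    corresponds to the third whitespace-separated field in most records.
    """
    names = [line[12:16].strip() for line in pdb_lines
             if line.startswith("ATOM")]
    return [str(len(names))] + names


def write_types(pdb_path, types_path):
    """Read one-frame PDB file and write the types file next to it."""
    with open(pdb_path) as src:
        rows = build_types(src)
    with open(types_path, "w") as dst:
        dst.write("\n".join(rows) + "\n")
```

Counting the names in the script avoids a mismatch between the header value and the list if the file is regenerated for a different atom selection.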

0.7 radius.dat types.dat protein_3000_7000_s.trj 201100 3 2 7148 7409 0

caver --enable-output-vmd -d siteII -t 5 -o siteII -i config_siteII.dat

C.10 Simulations and Electrostatic Analysis

Electrostatic analysis via the PME Electrostatics plugin in VMD required a .pqr file of the system which contained the position (p), charge (q), and radius (r) of each atom. This file was created with the GROMACS editconf command shown below. The Linux sed command below was critical to fix the column spacing so that VMD could correctly read the file. Using VMD, a trajectory file (.xtc) was loaded into the .pqr molecule to find an average electrostatic potential using the PME plugin.

editconf -f field_3000_4000.tpr -mead

sed -e 's/\([.][0-9][0-9][0-9]\)\([0-9][0-9]\)/\1/g' -e 's/\([.][0-9][0-9][0-9][0-9]\) /\1 /g' -e 's/\([A-Z][A-Z][A-Z]\) [A-Z]/\1 /g' mead.pqr > mead_new.pqr

C.10.1 Applied Electric Field Simulation

The applied electric field in GROMACS was 0.0528 V/nm and was implemented by adding “E Field 1 0.0528 0” to the full.mdp file. Coupled with a membrane thickness of 3.78 nm, this field produced the desired 200 mV potential pointing into the cytoplasm along the z-axis. Care must be taken to ensure that the simulation and electric field reach a stabilized state. In this work, 1 ns was allowed as an equilibration time. This choice was largely arbitrary, but was based on comparison of electrostatic data from various time frames of the simulation.
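The relation between the applied field and the membrane potential quoted above is simply the field strength multiplied by the membrane thickness. A minimal sketch of that arithmetic follows; the function names are hypothetical, and the uniform-field approximation across the membrane is the only assumption.

```python
E_FIELD_V_PER_NM = 0.0528       # applied field from the text
MEMBRANE_THICKNESS_NM = 3.78    # membrane thickness from the text


def potential_mv(field_v_per_nm, thickness_nm):
    """Potential drop (mV) of a uniform field across the given thickness."""
    return field_v_per_nm * thickness_nm * 1000.0


def field_for_potential(target_mv, thickness_nm):
    """Uniform field (V/nm) needed to drop target_mv across the thickness."""
    return target_mv / 1000.0 / thickness_nm


if __name__ == "__main__":
    print(potential_mv(E_FIELD_V_PER_NM, MEMBRANE_THICKNESS_NM))
    print(field_for_potential(200.0, MEMBRANE_THICKNESS_NM))
```

With the values from the text, the product is within half a millivolt of the 200 mV target.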

C.10.2 Particle Mesh Ewald (PME) Calculation

The PME plugin allowed straightforward electrostatic calculations on simulation trajectories. Since these systems have PBC, the “enclose” value was set to 0. A grid size of 112 x 112 x 132 was used for these simulations. Grid resolution was dependent on video card memory. The modified version of PME accepted a value for the biasing field in units of kT/(eÅ). For a system that had a length of 167.8 Å normal to the membrane axis, a 200 mV potential corresponded to a PME value of 0.046 kT/(eÅ). An Ewald smoothing factor of 0.25 was used for all analyses [51].
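The conversion from a potential in millivolts to a field in kT/(eÅ) can be checked with a short calculation. The sketch below assumes a temperature of 300 K, which is not stated in this section but reproduces the quoted value of 0.046; the function name is hypothetical.

```python
BOLTZMANN_J_PER_K = 1.380649e-23         # Boltzmann constant
ELEMENTARY_CHARGE_C = 1.602176634e-19    # elementary charge


def field_in_kt_per_e_angstrom(potential_mv, length_angstrom,
                               temperature_k=300.0):
    """Convert a potential drop over a length into units of kT/(e*Angstrom).

    temperature_k defaults to 300 K, an assumption made for illustration.
    """
    thermal_voltage = BOLTZMANN_J_PER_K * temperature_k / ELEMENTARY_CHARGE_C
    field_v_per_angstrom = (potential_mv / 1000.0) / length_angstrom
    return field_v_per_angstrom / thermal_voltage


if __name__ == "__main__":
    # 200 mV across a 167.8 Angstrom system length, as in the text.
    print(round(field_in_kt_per_e_angstrom(200.0, 167.8), 3))
```

The result depends weakly on the temperature chosen, so the value should be recomputed if the simulation temperature differs from the assumed one.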

C.10.3 DX File Averaging via OpenDX

OpenDX is a powerful software program for visualizing and manipulating 3D data. Its use cannot be discussed in detail here, but it was helpful to this work in two ways. First, it allowed one to easily average .dx files. This was necessary because large trajectories required large amounts of memory in VMD. For a 200,000 atom system, a GROMACS .xtc trajectory with 1000 frames required a machine with at least 2 GB of RAM. To analyze longer trajectories, PME electrostatic analyses were performed on trajectories piece by piece and OpenDX was used to compile a single .dx file. Second, OpenDX allows great flexibility in data visualization.
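The piecewise averaging amounts to an element-wise mean over grids of identical size. The sketch below illustrates that operation in Python; it is not the OpenDX procedure used in this work, the function names are hypothetical, and it assumes that every non-numeric line in a .dx file belongs to the header or footer.

```python
def read_dx_values(lines):
    """Collect grid values from the purely numeric lines of a .dx file."""
    values = []
    for line in lines:
        tokens = line.split()
        try:
            row = [float(t) for t in tokens]
        except ValueError:
            continue  # header, footer, or comment line
        values.extend(row)
    return values


def average_grids(grids):
    """Element-wise mean of equally sized lists of grid values."""
    if not grids:
        raise ValueError("no grids supplied")
    size = len(grids[0])
    if any(len(g) != size for g in grids):
        raise ValueError("grids differ in size")
    return [sum(column) / len(grids) for column in zip(*grids)]
```

An unweighted mean of this kind equals the average over the full trajectory only when every piece contains the same number of frames; otherwise each grid should be weighted by its frame count.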

VMD required the .dx file in a specific format, which was slightly different from that saved by OpenDX. The line below beginning with “{print” was placed in a file, “fix_dx.awk”. Then, the second line containing the awk command was executed in a terminal window. These steps reformatted the six-column OpenDX .dx format to the three-column .dx format required by VMD. Finally, the eight-line header and five-line footer of one of the .dx files originally produced by VMD were copied into the “openDX_output_fixed.dx” file, replacing the OpenDX header and footer.

{print $1 "\t" $2 "\t" $3 "\n" $4 "\t" $5 "\t" $6}

awk -f fix_dx.awk openDX_output.dx > openDX_output_fixed.dx