Modeling of voltage-gated ion channels

Pär Bjelkmar

© Pär Bjelkmar, Stockholm 2011, pages 1-65

Cover picture: Produced by Pär Bjelkmar and Jyrki Hokkanen at CSC - IT Center for Science, Finland. Cover of PLoS Computational Biology February 2009 issue.

ISBN 978-91-7447-336-0

Printed in Sweden by US-AB, Stockholm 2011
Distributor: Department of Biochemistry and Biophysics, Stockholm University

Abstract

The recent determination of several crystal structures of voltage-gated ion channels has catalyzed computational efforts to study these remarkable molecular machines, which conduct ions across biological membranes at extremely high rates without compromising ion selectivity. Starting from the open crystal structures, we have studied the gating mechanism of these channels by molecular modeling techniques. First, by applying a membrane potential, the initial stages of channel closing were captured, manifested in a secondary-structure change in the voltage sensor. In a follow-up study, we found that the energetic cost of translocating this 3₁₀-helix was significantly lower than that of the original conformation. Third, collaborators of ours identified new molecular constraints for different states along the gating pathway. We used those to build new models that were evaluated by simulations. All these results point to a gating mechanism where the S4 helix undergoes a secondary-structure transformation during gating.

These simulations also provide information about how the protein interacts with the surrounding membrane. In particular, we found that lipid molecules close to the protein diffuse together with it, forming a large dynamic lipid-protein cluster. This has important consequences for the understanding of protein-membrane interactions and for the theories of lateral diffusion of membrane proteins.

Further, simulations of the simple ion channel antiamoebin were performed, in which different molecular models of the channel were evaluated by calculating ion conduction rates and comparing them to experimentally measured values. One of the models had a conductance consistent with the experimental data and was proposed to represent the biologically active state of the channel.

Finally, the underlying methods for simulating molecular systems were probed by implementing the CHARMM force field in the GROMACS simulation package. The implementation was verified, and GROMACS-specific features were combined with CHARMM and evaluated on long timescales. The CHARMM interaction potential was found to sample relevant protein conformations independently of the solvent model used.


List of Papers

This thesis is based on the following papers, which are referred to in the text by their Roman numerals.

I    Bjelkmar, P., Niemela, P. S., Vattulainen, I., Lindahl, E. (2009) Conformational changes and slow dynamics through microsecond polarized atomistic molecular simulation of an integral Kv1.2 ion channel. PLoS Computational Biology, 5(2) e1000289.

II   Schwaiger, C. S., Bjelkmar, P., Hess, B., Lindahl, E. (2011) 3₁₀-helix conformation facilitates the transition of a voltage sensor S4 segment toward the down state. Biophysical Journal, 100(6):1446-54.

III  Henrion, U., Renhorn, J., Börjesson, S. I., Nelson, E. M., Schwaiger, C. S., Bjelkmar, P., Wallner, B., Lindahl, E., Elinder, F. (2011) Tracking a complete voltage-sensor cycle with metal-ion bridges. Submitted.

IV   Niemela, P. S., Miettinen, M. S., Monticelli, L., Hammaren, H., Bjelkmar, P., Murtola, T., Lindahl, E., Vattulainen, I. (2010) Membrane proteins diffuse as dynamic complexes with lipids. Journal of the American Chemical Society, 132(22):7574-5.

V    Wilson, M. A., Wei, C., Bjelkmar, P., Wallace, B. A., Pohorille, A. (2011) Molecular dynamics simulation of the antiamoebin ion channel: linking structure and conductance. Biophysical Journal, 100(10):2394-402.

VI   Bjelkmar, P., Larsson, P., Cuendet, M., Hess, B., Lindahl, E. (2010) Implementation of the CHARMM force field in GROMACS: Analysis of protein stability effects from correction maps, virtual interaction sites, and water models. Journal of Chemical Theory and Computation, 6:459-66.

Reprints were made with permission from the publishers.


Contents

1 Introduction
  1.1 Membranes and transport proteins
    1.1.1 Material transport across membranes
    1.1.2 Cellular membranes
    1.1.3 The membrane potential
  1.2 Voltage-gated potassium channels
    1.2.1 Voltage-dependent gating
    1.2.2 Slow inactivation
    1.2.3 The role of 3₁₀-helices
2 Molecular modeling
  2.1 Molecular dynamics
    2.1.1 Basic molecular dynamics theory
    2.1.2 Force fields
    2.1.3 Statistical mechanics
    2.1.4 Modeling the membrane potential in simulations
  2.2 Modeling of protein structures
    2.2.1 Comparative modeling
    2.2.2 Physics-based modeling and structural refinement
3 Summary of papers
  3.1 Conformational changes in the KV1.2 ion channel through polarized microsecond molecular simulation (paper I)
  3.2 Secondary structure of the S4 helix in gating (paper II)
  3.3 Tracking the voltage-sensor gating motion with metal-ion bridges, computer modeling and simulation (paper III)
  3.4 Lateral diffusion of membrane proteins (paper IV)
  3.5 Linking structure and conductance in a simple ion channel (paper V)
  3.6 Implementation of the CHARMM force field in GROMACS (paper VI)
4 Concluding remarks and future perspectives
5 Svensk populärvetenskaplig sammanfattning
6 Acknowledgements
Bibliography

1. Introduction

The fact that life evolved out of nearly nothing, some 10 billion years after the universe evolved out of literally nothing, is a fact so staggering that I would be mad to attempt words to do it justice.

Richard Dawkins

This doctoral thesis describes the work that I did as a PhD student at the Department of Biochemistry and Biophysics at Stockholm University from 2006 through 2011. The studies have been published in peer-reviewed scientific journals, and the papers are the core material of the thesis. They are presented as appendices at the end of this book. To put this work in perspective and to provide a brief overview of the field, a preceding introductory and summarizing text has been compiled. It starts with a background chapter introducing the underlying biology and continues to describe the methodology and key concepts in greater depth. The next two chapters contain a summary of the main results and conclusions, and a technology walkthrough including some speculation about where we might head in the future. The text ends with a popular science summary in Swedish, my acknowledgements and the bibliography.

The work has been a great journey in the interdisciplinary field I prefer to call computational biology. The proteins that have been studied are incredible molecular machines incorporated in cellular membranes. They are precisely regulated by electrical changes in the local environment and capable of highly selective conduction of ions at great speeds. The opening and closing of these proteins have been the main subject of my research. In my toolbox I have had models describing interactions between atoms, and cutting-edge software utilizing hundreds of computer processing units in parallel on the most modern computer clusters available. These models, together with the software, can be thought of as a "computational microscope" (a term coined by Klaus Schulten), generating a great amount of atomic-level information on the dynamics of the molecules studied. The data has been analyzed, complemented and corroborated with experimental results derived from biological systems in the wet lab.

1.1 Membranes and transport proteins

As soon as the information-bearing molecules were encapsulated by a membrane somewhere in the early evolution of life, material for the processes of life needed to be transported across this impermeable barrier. For small, uncharged molecules, transport can occur by unassisted diffusion across the membrane because the free-energy cost is relatively low, but the barrier is in practice too high for larger molecules and for charged molecules, even small ones. Additionally, unassisted diffusion is unspecific by nature, whereas what is required is highly efficient, regulated and selective transport of a particular molecule or group of molecules.

For such regulated transport of larger and charged molecular species, dedicated molecular machines evolved. The birth of such remarkable molecules was of course not a blink-of-an-eye event; as is the case for evolution in general, it was achieved by gradual improvement of the molecules involved over millions of years. There are still traces of the gradual evolution of transport proteins in living organisms today, both as simple molecules with primitive functions and as independently evolved protein domains found in the modular organization of modern molecular machines.

1.1.1 Material transport across membranes

Cells are distinct entities separated from the surrounding environment by the plasma membrane. Within a cell, membranes also divide it into different compartments, called organelles, for localized and specialized functions (Figure 1.1). By incorporating specific proteins into the membranes, the necessary flow of material between the organelles and the surrounding environment can be achieved. There are many distinct types of membrane transport proteins, and they can be regulated in different ways.

There are two mechanisms for transport of substances (substrates) across a membrane: energy-independent facilitated diffusion and energy-driven active transport. There are six characterized classes of transporters [1]: channels and pores, electrochemical potential-driven transporters, primary active transporters, group translocators, transmembrane electron carriers, and accessory factors involved in transport. The first and second groups are proteins catalyzing facilitated diffusion. The channel group opens up an aqueous transmembrane pore for substrate passage along its concentration gradient, whereas in the second group the protein itself functions as a carrier. The other groups of transporters include proteins that, for example, transport a solute against its concentration gradient using a primary source of energy, such as hydrolysis of ATP, proteins that modify the substrate during the transport, and proteins that facilitate transmembrane electron transport.

Membrane channels typically control the flux of materials across cellular membranes through high selectivity combined with high conductivity and through gating that is sensitive to essential environmental factors. For example, they transport water, ions, and other small substrates, such as alcohols, as well as large proteins, and their opening and closing can be controlled by ligand binding, voltage, pH, or pressure changes.

Figure 1.1: The eukaryotic cell. The cell is separated from the outside by the plasma membrane. Internally, it is divided into subcellular structures, called organelles. The nuclear envelope separates the information-bearing molecules from the rest of the cell. The rough endoplasmic reticulum is a complex membrane structure holding the ribosomes, which are the molecular factories where proteins are built. In the mitochondria, the proteins of the cellular respiratory system produce most of the ATP. The Golgi apparatus is used for protein sorting and the vesicles are the particles in which the material is transported or, in the case of the specialized lysosome, degraded. Figure reproduced from the On-line Biology Book [2].

1.1.2 Cellular membranes

Membranes found in cells typically consist of phospholipids and a considerable amount of membrane proteins, cholesterol, and lipids with sugar adornments, called glycolipids. Phospholipid molecules have a polar head-group and long non-polar tails. In a water environment these non-polar chains spontaneously aggregate so that their water-exposed surface is minimized, shielded by the polar head-groups pointing toward the water. One of the ways these lipids can organize themselves is as bilayers (Figure 1.2). These membranes are efficient barriers, in practice impermeable to all substances except small uncharged molecules. This is due to their stability and the associated energy cost of making a hole large enough for a molecule of considerable size to diffuse through and, additionally, the cost of exposing charged and polar particles to the hydrophobic core of the membrane.

Under physiological conditions, phospholipids in the membrane are in the liquid crystalline phase, which can be considered a two-dimensional liquid in which the molecules can diffuse laterally in the plane of the membrane. The composition of the membranes can alter their properties significantly and can in fact also affect membrane protein function. For example, membranes with less cholesterol and/or a higher fraction of short-chained or unsaturated lipids will have a lower melting temperature, i.e. a higher fluidity. Diffusion in the bilayer is two to three orders of magnitude slower than in a water environment [3, 4].

Figure 1.2: Membrane protein in bilayer. In this snapshot from paper IV, the transmembrane part of the KV1.2 channel (green) is embedded in a membrane bilayer consisting of the phospholipid POPC (detailed in inset). The lipids have a polar end facing the surrounding water and two long apolar hydrocarbon chains. These hydrophobic tails aggregate and form the bilayer core (gray). The surrounding water and the lipids in front of the protein, in the line of sight, were omitted for clarity.

1.1.3 The membrane potential

Virtually all cells have a transmembrane potential across the plasma membrane. The potential has two basic functions: first, it provides a driving force for operating the numerous molecular machines embedded in the membrane, and second, in excitable cells such as neurons and muscle cells, it is used for transmitting signals between cells. The potential is due to an ion concentration imbalance across the membrane established by the membrane protein Na+/K+-ATPase. Each of these active transport proteins continuously expels three sodium ions from the cell and at the same time imports two potassium ions. Hence, both a charge separation, giving rise to an electrical potential, and concentration gradients are established. Membrane protein channels are typically only permeable to specific ions, and the collective interplay of their conductances establishes the resting membrane potential as well as the action potential of excitable cells.

Having open potassium channels in the membrane enables potassium ions to diffuse passively out of the cell, in the direction of the concentration gradient established by the Na+/K+-ATPase. However, as potassium ions move out of the cell, the inside becomes even more negative, and the rising transmembrane electric potential starts to counteract this diffusion. When these two opposing forces become equal, the net flow of potassium ions is zero and a steady state has been reached [5]. It is not a thermodynamic equilibrium state since ions are continuously pumped across the membrane. The resulting resting potential due to the charge separation can be calculated using the Nernst equation, see e.g. [6], given a concentration difference. If only potassium ions are considered, the membrane potential would be approximately -100 mV at 37 °C and typical ion concentrations. Performing the same calculation for the other key ion involved, sodium, gives a transmembrane potential of +55 mV. The true, measured resting potential (around -70 mV) is close to the value for potassium ions because the permeability for potassium ions is approximately an order of magnitude larger than that for sodium ions under normal circumstances [7].
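The Nernst calculation referred to above can be sketched in a few lines of Python. This is an illustration only and is not taken from the thesis: the ion concentrations are assumed values chosen to lie in a physiologically plausible range, and the function names are hypothetical. For a monovalent cation the Nernst potential is E = (RT/F) ln(c_out/c_in). The second function applies the Goldman-Hodgkin-Katz voltage equation, restricted here to K+ and Na+ (chloride and the pump current are neglected), to indicate how a roughly tenfold higher potassium permeability pulls the potential toward the potassium value.

```python
import math

R = 8.314462618   # molar gas constant, J/(mol K)
F = 96485.33212   # Faraday constant, C/mol


def nernst_mV(c_out, c_in, z=1, temp_c=37.0):
    """Nernst potential (inside relative to outside) in mV."""
    t_kelvin = temp_c + 273.15
    return 1000.0 * R * t_kelvin / (z * F) * math.log(c_out / c_in)


def ghk_mV(p_k, p_na, k_out, k_in, na_out, na_in, temp_c=37.0):
    """Goldman-Hodgkin-Katz voltage for K+ and Na+ only, in mV."""
    t_kelvin = temp_c + 273.15
    numerator = p_k * k_out + p_na * na_out
    denominator = p_k * k_in + p_na * na_in
    return 1000.0 * R * t_kelvin / F * math.log(numerator / denominator)


# ASSUMED concentrations in mM (illustrative, not from the thesis).
K_OUT, K_IN = 4.0, 150.0
NA_OUT, NA_IN = 145.0, 18.0

e_k = nernst_mV(K_OUT, K_IN)
e_na = nernst_mV(NA_OUT, NA_IN)
# ASSUMED permeability ratio P_K : P_Na = 10 : 1 ("an order of magnitude").
v_rest = ghk_mV(10.0, 1.0, K_OUT, K_IN, NA_OUT, NA_IN)

print(f"E_K  = {e_k:6.1f} mV")
print(f"E_Na = {e_na:6.1f} mV")
print(f"GHK estimate = {v_rest:6.1f} mV")
```

With these assumed inputs the potassium value comes out close to -100 mV and the sodium value close to +55 mV, in line with the figures quoted in the text. The two-ion estimate falls between them on the potassium side; its exact value depends on the assumed concentrations and permeability ratio.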
In excitable cells, the membrane potential can be perturbed from its resting state so that a pulse-like signal, called the action potential (Figure 1.3), is formed. In neurons this pulse is used for cell-to-cell communication, and in other cells its main function is to activate intracellular processes, such as muscle cell contraction and insulin secretion. The recent review article by Barnett and Larkman [8] summarizes what is known about the action potential. Its origin is a small local depolarizing response initiated in some part of the cell membrane by external stimuli, either from specialized nerve endings in sensory neurons or from upstream neuron signaling in the central nervous system. If this local shift towards a less negative potential exceeds a threshold of about -45 mV, it triggers fast voltage-gated sodium channels to open, increasing the permeation of this ion species considerably. Consequently, extracellular sodium ions start to flow into the cell along their concentration gradient, further depolarizing the membrane. Within one millisecond, the potential reverses and continues to rise towards the steady-state potential for sodium ions (+55 mV). However, before this value is reached two opposing factors start to act. First, although voltage-gated sodium channels are activated fast, they also have an inherent property of inactivation on the millisecond scale. The second factor is an increased flux of potassium ions due to the opening of voltage-gated potassium channels as the membrane potential becomes more positive. Hence, potassium ions start to flow out of the cell at the same time as sodium channels are inactivated; the depolarization ceases (peak) and the potential starts dropping again, causing the sodium channels to close even faster. Consequently, the membrane is repolarized and returns to the resting potential.

Figure 1.3: The action potential of excitable cells. [Plot of membrane voltage (mV) against time (0-5 ms); labeled features: stimulus, failed initiations, threshold (~ -45), rising phase, overshoot, peak (~ +40), falling phase, undershoot, resting potential (~ -70).] Once a local depolarization stimulus from an external source becomes larger than a threshold of approximately -45 mV, the opening of sodium channels depolarizes the membrane during the rising phase. The voltage reverses before the slower potassium ion channels have been activated in response to the depolarization. The sodium channels inactivate and the open potassium channels repolarize the membrane during the falling phase. Before the potassium channels have had time to close, the potential undershoots the normal resting potential of approximately -70 mV somewhat.
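The interplay of fast sodium activation, sodium inactivation and delayed potassium activation described above can be reproduced qualitatively with the classical Hodgkin-Huxley model of the squid giant axon. The sketch below is an illustration and not a method used in this thesis: it uses the commonly quoted textbook parameter set (for which the resting potential is near -65 mV rather than the -70 mV quoted above), a simple forward-Euler integrator, and hypothetical function names. A sustained current step of 10 µA/cm² is assumed for the suprathreshold case and 1 µA/cm² for the subthreshold ("failed initiation") case.

```python
import math

# Textbook Hodgkin-Huxley parameters (ASSUMED, squid axon convention).
C_M = 1.0                               # membrane capacitance, uF/cm^2
G_NA, G_K, G_L = 120.0, 36.0, 0.3       # maximal conductances, mS/cm^2
E_NA, E_K, E_L = 50.0, -77.0, -54.387   # reversal potentials, mV


def _ratio(x, y):
    """x / (1 - exp(-x / y)), with the x -> 0 limit (= y) handled safely."""
    if abs(x / y) < 1e-6:
        return y
    return x / (1.0 - math.exp(-x / y))


def rates(v):
    """Opening (alpha) and closing (beta) rates of the m, h, n gates, 1/ms."""
    a_m = 0.1 * _ratio(v + 40.0, 10.0)
    b_m = 4.0 * math.exp(-(v + 65.0) / 18.0)
    a_h = 0.07 * math.exp(-(v + 65.0) / 20.0)
    b_h = 1.0 / (1.0 + math.exp(-(v + 35.0) / 10.0))
    a_n = 0.01 * _ratio(v + 55.0, 10.0)
    b_n = 0.125 * math.exp(-(v + 65.0) / 80.0)
    return a_m, b_m, a_h, b_h, a_n, b_n


def simulate(i_stim, t_end=20.0, dt=0.01, v0=-65.0, t_on=1.0):
    """Forward-Euler integration; returns the membrane voltage trace in mV."""
    a_m, b_m, a_h, b_h, a_n, b_n = rates(v0)
    # Start all gates at their steady-state values for the initial voltage.
    m, h, n = a_m / (a_m + b_m), a_h / (a_h + b_h), a_n / (a_n + b_n)
    v = v0
    trace = [v]
    for step in range(int(round(t_end / dt))):
        i_ext = i_stim if step * dt >= t_on else 0.0  # current step, uA/cm^2
        a_m, b_m, a_h, b_h, a_n, b_n = rates(v)
        i_na = G_NA * m ** 3 * h * (v - E_NA)   # sodium current
        i_k = G_K * n ** 4 * (v - E_K)          # potassium current
        i_l = G_L * (v - E_L)                   # leak current
        dv = (i_ext - i_na - i_k - i_l) / C_M
        m += dt * (a_m * (1.0 - m) - b_m * m)
        h += dt * (a_h * (1.0 - h) - b_h * h)
        n += dt * (a_n * (1.0 - n) - b_n * n)
        v += dt * dv
        trace.append(v)
    return trace


spike = simulate(10.0)   # suprathreshold stimulus
quiet = simulate(1.0)    # subthreshold stimulus
print(f"peak with strong stimulus: {max(spike):.1f} mV")
print(f"peak with weak stimulus:   {max(quiet):.1f} mV")
```

In this model the sodium conductance term m³h captures fast activation (m) followed by slower inactivation (h), while n⁴ captures the delayed potassium activation that repolarizes the membrane. The weak stimulus only produces a small depolarization, corresponding to the failed initiations in Figure 1.3.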

1.2 Voltage-gated potassium channels

All potassium channels are related members of a single family of proteins and are found in all domains of life: bacteria, archaea, and eukaryotes, both plants and animals. Their amino acid sequence is easily recognized because they share a characteristic sequence of residues in the narrowest part of the ion-conducting pore, which is responsible for the molecular discrimination of ions by a remarkably efficient mechanism [9, 10, 11]. The channels are fascinating molecular machines capable of conducting K+ ions almost at the theoretical limit, at a rate of ~10^8 ions per second, while at the same time keeping the permeation of potassium ions approximately 10^4 times higher than that of sodium ions [12, 13]. While their ion-permeation characteristics are common, diversity among different subfamilies of K+ channels is mainly manifested in the various ways the channel gate is opened.

Potassium channels that are voltage-gated, abbreviated KV, are a diverse and ubiquitous group of membrane proteins belonging to the voltage-dependent cation channel family, which also includes the evolutionarily related sodium (NaV) and calcium (CaV) channels [14]. Normally, the concentration of K+ outside the cell is more than an order of magnitude lower than that in the intracellular fluid and hence, upon opening of these channels, an outward current of cations, called the alpha current, is established. Accordingly, the KV channels limit cell excitability by returning the membrane potential to its resting state, as explained in the preceding section. They are therefore key players in a range of cellular processes [15], such as dictating the duration of the action potential in excitable cells, as well as cell proliferation [16] and volume regulation [17] in non-excitable cells.
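To put the quoted conduction rate in electrical terms, the short sketch below converts ions per second into a single-channel current using the elementary charge. It is an illustration only: the function name is hypothetical, and the sodium figure simply divides the potassium rate by the quoted selectivity ratio, which assumes otherwise identical conditions for the two ions.

```python
E_CHARGE = 1.602176634e-19  # elementary charge, C


def single_channel_current_pA(ions_per_second, valence=1):
    """Current carried by a steady flux of ions through one channel, in pA."""
    return ions_per_second * valence * E_CHARGE * 1e12


k_rate = 1e8        # K+ ions per second, as quoted in the text
selectivity = 1e4   # K+ over Na+ permeation ratio, as quoted in the text

i_k = single_channel_current_pA(k_rate)
na_rate = k_rate / selectivity  # ASSUMES equal driving conditions for Na+

print(f"single-channel K+ current: {i_k:.1f} pA")
print(f"corresponding Na+ rate:    {na_rate:.0f} ions per second")
```

A rate of 10^8 monovalent ions per second thus corresponds to a current on the order of ten picoamperes through a single channel.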
KV channels are also susceptible to disease-causing mutations [18, 19] and are the targets of drugs used against, for example, pain, epilepsy, cardiac arrhythmias, hypertension and hyperglycemia. Understandably, the pharmaceutical industry has given these channels considerable attention.

The first voltage-gated K+ channel that was isolated, called Shaker, came from the fruit fly Drosophila melanogaster [20]. Somewhat amusingly, the name comes from the coupling between the gene and the uncontrolled shaking motion of the fly's legs when anesthetized with ether [21]. Much of what is known about voltage-gated K+ channels has been derived from electrophysiological studies of this channel, in addition to the pioneering work of Hodgkin, Huxley and Katz on the squid giant axon (e.g. [7, 22, 23]). Since the first isolation, hundreds of genes coding for diverse KV channels in different organisms have been found. In the human genome, some 40 KV genes have been identified and characterized, and they have been assigned to 12 subfamilies (KV1 to KV12) based on sequence similarities. The KV1 subfamily is called the Shaker-related subfamily since the Shaker-homologous human gene KCNA3 codes for the KV1.3 channel.

Structural knowledge of the ion channels has been sparse and mostly based on biochemical studies of bacterial proteins. Unfortunately, it has turned out to be very difficult to determine the three-dimensional high-resolution structure of membrane proteins, and it was not until 1998 that the first (potassium) ion channel structure, KcsA, was solved by the MacKinnon lab [12]. It is, however, not voltage-gated but rather pH-sensitive. Only a few years later the same group was able to crystallize the KVAP channel [24] from the thermophilic archaebacterium Aeropyrum pernix (for which, together with previous work, Roderick MacKinnon was awarded the Nobel Prize in Chemistry 2003). Subsequently, the partly incomplete structures of the mammalian KV1.2 channel [25, 26] were solved and, most recently, a complete structure of a chimeric KV1.2/KV2.1 channel from rat was published [27]. Although these crystal structures have been extremely important for the description of the atomic workings of these proteins, the KV structures all caught the channel in the open state and, to date, the closed state and the intermediate states along the opening pathway have not been properly characterized. Most of the work in this thesis is an attempt to bridge this knowledge gap by studying the Shaker and KV1.2 channels by computational and complementary experimental methods. An alignment of the transmembrane section of these two sequences can be found in Figure 1.4.

The KV channels are made of four subunits, each containing six transmembrane helices, called S1-S6 (Figure 1.5). S1 through S4 form the voltage-sensing domain (VSD) that responds to changes in the membrane potential. Helices S5 and S6 from the four subunits come together to form the centrally located pore domain (PD), which contains the ion-conducting pore. The structural alterations within the VSDs upon gating are transmitted to the PD by a linker helix, called the S4-S5 linker, inducing a conformational change that alters the kink angles of the pore-lining S6 helices which, in turn, control the blocking of the ion-conducting pathway at the intracellular side [28, 24]. In contrast to the distinct subunits (i.e. separate peptide chains) of the KV channels, the evolutionarily related NaV and CaV proteins consist of four covalently bonded domains, each with a structure similar to a single subunit of the KV channels. Eukaryotic KV channels have an additional intracellular domain, called the tetramerization domain, or T1, which aids the assembly of the transmembrane part and provides a binding platform for an accessory β-subunit whose function is not entirely understood but which is related to a particular class of enzymes [25].

The crystal structures of voltage-gated ion channels have also shown that the VSDs are loosely attached to the pore domain, seemingly stable on their own. Interestingly, recently gene orthologs of the VSD in voltage-

18 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!"#$%&$'()"*'+,$ !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!"#$%&$'()"*'+,$!"-./0'+,)"1%/"23"4.&"2566"66)78)95 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!"#$%&$'()"*'+,$!"-./0'+,)"1%/"23"4.&"2566"66)78)95!":%(('/0;+0%.+ !"-./0'+,)"1%/"23"4.&"2566"66)78)95!":%(('/0;+0%.+!""""='>,?.,/@,",(A%>>B*'+,$=C25665D23=667863=5686=25573EE=%FG'>,?.,/@, !":%(('/0;+0%.+!""""='>,?.,/@,",(A%>>B*'+,$=C25665D23=667863=5686=25573EE=%FG'>,?.,/@,!""""=A>,?.,/@,",(A%>>B*'+,$=C25665D23=667863=5686=25573EE=%FGA>,?.,/@, !""""='.+%!""""=>+0%.+!""""='>,?.,/@,",(A%>>B*'+,$=C25665D23=667863=5686=25573EE=%FG'>,?.,/@,!""""=A>,?.,/@,",(A%>>B*'+,$=C25665D23=667863=5686=25573EE=%FGA>,?.,/@,!""""=0'+'H<;,"IJKLMN1O2 !""""=>+0%.+!""""='>,?.,/@,",(A%>>B*'+,$=C25665D23=667863=5686=25573EE=%FG'>,?.,/@,!""""=A>,?.,/@,",(A%>>B*'+,$=C25665D23=667863=5686=25573EE=%FGA>,?.,/@,!""""=0'+'H<;,"IJKLMN1O2!""""=&'P%P,/"65G5 !""""='>,?.,/@,",(A%>>B*'+,$=C25665D23=667863=5686=25573EE=%FG'>,?.,/@,!""""=A>,?.,/@,",(A%>>B*'+,$=C25665D23=667863=5686=25573EE=%FGA>,?.,/@,!""""=0'+'H<;,"IJKLMN1O2!""""=&'P%P,/"65G5!""""=&'P,Q+,/0"5G7 !""""=A>,?.,/@,",(A%>>B*'+,$=C25665D23=667863=5686=25573EE=%FGA>,?.,/@,!""""=0'+'H<;,"IJKLMN1O2!""""=&'P%P,/"65G5!""""=&'P,Q+,/0"5G7!""""='H%$('+9"P'<$ !""""=0'+'H<;,"IJKLMN1O2!""""=&'P%P,/"65G5!""""=&'P,Q+,/0"5G7!""""='H%$('+9"P'<$!""""=>P$%+,P$%+,P$%+,P$%+,P$%+,P$%+,P$%+,+0%.+ !""""=>P$%+,P$%+,+0%.+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !""""=>P$%+,+0%.+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !"4;<&/BH%$('+)"P'<$!"-,P%$+BH<;,)">+0%.+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR !"-,P%$+BH<;,)">+0%.+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR! 
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR!!"4;<&/,0B>,?.,/@,>)"2 !RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR!!"4;<&/,0B>,?.,/@,>)"2!"6)"S:T4MBU-L1I !RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR!!"4;<&/,0B>,?.,/@,>)"2!"6)"S:T4MBU-L1I!"2)"S:T42B-4V !!"4;<&/,0B>,?.,/@,>)"2!"6)"S:T4MBU-L1I!"2)"S:T42B-4V!"1'+$,?.,/@,>)"2!"6)"S:T4MBU-L1I!"2)"S:T42B-4V!"1'+$)""""""""""2EYEO8"Z"7G6[\ !"K,/&+X)"EO8!"C0,/+<+F)"""""925YEO8"ZODG7[\!"M<(<;'$<+F)"""989YEO8"Z83G3[\!"W'P>)""""""""""2EYEO8"Z"7G6[\!"M@%$,)"6O63G5 !"C0,/+<+F)"""""925YEO8"ZODG7[\!"M<(<;'$<+F)"""989YEO8"Z83G3[\!"W'P>)""""""""""2EYEO8"Z"7G6[\!"M@%$,)"6O63G5!" !"M<(<;'$<+F)"""989YEO8"Z83G3[\!"W'P>)""""""""""2EYEO8"Z"7G6[\!"M@%$,)"6O63G5!"! !"W'P>)""""""""""2EYEO8"Z"7G6[\!"M@%$,)"6O63G5!"!!RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR !"M@%$,)"6O63G5!"!!RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR !"!!RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRS:T4MBU-L1I"""""""D6"WW#]^_I#C#^U^U_:I-``CT`MWK-_IV]K-VKT]_#UVKKWU#4--K""""695 !!RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRS:T4MBU-L1I"""""""D6"WW#]^_I#C#^U^U_:I-``CT`MWK-_IV]K-VKT]_#UVKKWU#4--K""""695"""""""""""""""""""""aGaaGGGGGGGaa)Gaaaaaaa)aaaaaaaaa)aaGaaa)aaaaaaG)a) !RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRS:T4MBU-L1I"""""""D6"WW#]^_I#C#^U^U_:I-``CT`MWK-_IV]K-VKT]_#UVKKWU#4--K""""695"""""""""""""""""""""aGaaGGGGGGGaa)Gaaaaaaa)aaaaaaaaa)aaGaaa)aaaaaaG)a)S:T42B-4V"""""""""68"W^#]UVbU#I4U^I::I-``CTCMWK-_IV]KSVK4]_#IVKKWU#SS-1"""""OO S:T4MBU-L1I"""""""D6"WW#]^_I#C#^U^U_:I-``CT`MWK-_IV]K-VKT]_#UVKKWU#4--K""""695"""""""""""""""""""""aGaaGGGGGGGaa)Gaaaaaaa)aaaaaaaaa)aaGaaa)aaaaaaG)a)S:T42B-4V"""""""""68"W^#]UVbU#I4U^I::I-``CTCMWK-_IV]KSVK4]_#IVKKWU#SS-1"""""OO S:T4MBU-L1I"""""""D6"WW#]^_I#C#^U^U_:I-``CT`MWK-_IV]K-VKT]_#UVKKWU#4--K""""695"""""""""""""""""""""aGaaGGGGGGGaa)Gaaaaaaa)aaaaaaaaa)aaGaaa)aaaaaaG)a)S:T42B-4V"""""""""68"W^#]UVbU#I4U^I::I-``CTCMWK-_IV]KSVK4]_#IVKKWU#SS-1"""""OOS:T4MBU-L1I""""""696"-b_U#K-TIb__U-M-#M_U4CKbbb]MWW-K--#`T`#KU`_MIICS_b""""6D5 
"""""""""""""""""""""aGaaGGGGGGGaa)Gaaaaaaa)aaaaaaaaa)aaGaaa)aaaaaaG)a)S:T42B-4V"""""""""68"W^#]UVbU#I4U^I::I-``CTCMWK-_IV]KSVK4]_#IVKKWU#SS-1"""""OOS:T4MBU-L1I""""""696"-b_U#K-TIb__U-M-#M_U4CKbbb]MWW-K--#`T`#KU`_MIICS_b""""6D5"""""""""""""""""""""aaaaaaaaaaaaaa)aaaaaaaaaaaaaaaaaaaaaaaaaa)aaaaa)aa S:T42B-4V"""""""""68"W^#]UVbU#I4U^I::I-``CTCMWK-_IV]KSVK4]_#IVKKWU#SS-1"""""OOS:T4MBU-L1I""""""696"-b_U#K-TIb__U-M-#M_U4CKbbb]MWW-K--#`T`#KU`_MIICS_b""""6D5"""""""""""""""""""""aaaaaaaaaaaaaa)aaaaaaaaaaaaaaaaaaaaaaaaaa)aaaaa)aaS:T42B-4V"""""""""O8"-b_U#K-TIb__U-T-#M_U4CKbbb]MWW-K--#`T`#KUC_MIIC-_b""""66O S:T4MBU-L1I""""""696"-b_U#K-TIb__U-M-#M_U4CKbbb]MWW-K--#`T`#KU`_MIICS_b""""6D5"""""""""""""""""""""aaaaaaaaaaaaaa)aaaaaaaaaaaaaaaaaaaaaaaaaa)aaaaa)aaS:T42B-4V"""""""""O8"-b_U#K-TIb__U-T-#M_U4CKbbb]MWW-K--#`T`#KUC_MIIC-_b""""66O S:T4MBU-L1I""""""696"-b_U#K-TIb__U-M-#M_U4CKbbb]MWW-K--#`T`#KU`_MIICS_b""""6D5"""""""""""""""""""""aaaaaaaaaaaaaa)aaaaaaaaaaaaaaaaaaaaaaaaaa)aaaaa)aaS:T42B-4V"""""""""O8"-b_U#K-TIb__U-T-#M_U4CKbbb]MWW-K--#`T`#KUC_MIIC-_b""""66OS:T4MBU-L1I""""""6D6"IKWU]4CTS_-IUIW_CSIII-#K#UTIS]-S`cKK_Ib#IMM]44-``4""""295 """""""""""""""""""""aaaaaaaaaaaaaa)aaaaaaaaaaaaaaaaaaaaaaaaaa)aaaaa)aaS:T42B-4V"""""""""O8"-b_U#K-TIb__U-T-#M_U4CKbbb]MWW-K--#`T`#KUC_MIIC-_b""""66OS:T4MBU-L1I""""""6D6"IKWU]4CTS_-IUIW_CSIII-#K#UTIS]-S`cKK_Ib#IMM]44-``4""""295"""""""""""""""""""""aaa))a)GGaaaaaa)aaaaaaaaa)aaGaa)aaaaaaaaaaaGGaa))a S:T42B-4V"""""""""O8"-b_U#K-TIb__U-T-#M_U4CKbbb]MWW-K--#`T`#KUC_MIIC-_b""""66OS:T4MBU-L1I""""""6D6"IKWU]4CTS_-IUIW_CSIII-#K#UTIS]-S`cKK_Ib#IMM]44-``4""""295"""""""""""""""""""""aaa))a)GGaaaaaa)aaaaaaaaa)aaGaa)aaaaaaaaaaaGGaa))aS:T42B-4V""""""""668"IKWII41I1_-IUIWbCSIII-#K#ITI_]-]`cKK_Ib#IMMW#4-CC4""""6OO S:T42B-4V""""""""668"IKWII41I1_-IUIWbCSIII-#K#ITI_]-]`cKK_Ib#IMMW#4-CC4""""6OO S:T4MBU-L1I""""""6D6"IKWU]4CTS_-IUIW_CSIII-#K#UTIS]-S`cKK_Ib#IMM]44-``4""""295"""""""""""""""""""""aaa))a)GGaaaaaa)aaaaaaaaa)aaGaa)aaaaaaaaaaaGGaa))aS1 
Figure 1.4: Alignment of Shaker and Kv1.2 sequences. Sequence alignment of the transmembrane section of the Shaker gene KCNAS from Drosophila melanogaster and the Kv1.2 gene KCNA2 from Rattus norvegicus (common rat) as given by the EMBOSS implementation of the Smith-Waterman algorithm [32] with default parameters. Key residues are highlighted (Kv1.2 enumeration) in red, blue and gray for negative, positive and neutral, respectively. The characteristic sequence signature in the selectivity filter of the potassium channels is shown in yellow. Secondary structure is indicated above the alignment in gray: the S1-S6 transmembrane helices, the interfacial S4-S5 helix linker, the re-entrant pore helix and the selectivity filter.

gated ion channels have been found, functioning as proton pores [29, 30] or regulators of phosphatase activity [31]. This is a very important and beautiful result since, from an evolutionary perspective, one could think of the VSD as a distinct functional entity that evolved by itself and was later coupled to other functional domains, such as ion channels and phosphatases.

Figure 1.5: Structure of the voltage-gated potassium channel. The topology of one single subunit of a voltage-gated potassium channel is schematically drawn (left). Extracellular side is up. The four helices in green correspond to the voltage-sensing domain (VSD) while the darker-green part forms the pore domain (PD) with its preceding interfacial S4-S5 linker helix, re-entrant pore helix (PH) and selectivity filter (SF). In the S4 helix, there are several positively charged amino acids that are affected by changes in the electric potential across the membrane. Side and top views of the transmembrane part of the crystal structure of the open Kv1.2/2.1 protein [27] (middle, right). The entire PD is shown, but for clarity only the VSD from one subunit. The S6 helices from the four subunits form the surface of a water-filled channel that is accessible to the ions once the helices are in the open conformation. At the extracellular end of the channel the selectivity filter narrows the channel for efficient ion filtering.

1.2.1 Voltage-dependent gating

The four-helix bundle of the VSD has a shape reminiscent of an hourglass, with helical ends that splay outwards at the intracellular and extracellular sides, forming two water-filled cavities and a central constricted region. The protein surfaces lining the two water-filled cavities are relatively polar whereas the environment around the domain center is hydrophobic. Most of the polarity comes from several charged and evolutionarily conserved residues, in particular arginines, in the S4 helix. These positive charges are located at every third position of the helix, making it a rather peculiar transmembrane helix. Generally, it is very unfavorable to have charges buried in the non-polar core of the membrane bilayer, and this would hence be evolutionarily selected against. However, both biophysical experiments [33] and computer simulations [34] have shown that the S4 helix can be stable in isolation in the membrane bilayer, although other studies report a considerably higher energy cost [35, 36]. In any case, this helix has been conserved throughout evolution because it has a very special purpose, namely to function as a voltage-sensor: an electromechanical converter transforming the energy of the change in the transmembrane electrical potential into a mechanical response of the protein that controls the opening and closing of the ion channel gate.

Charges are affected by electrical fields, and so are the positively charged arginine residues in the S4 helix of the VSD. When the cell is at rest and the ion channel is closed, the transmembrane potential is approximately -70 mV (negative at the intracellular side). Consequently, the voltage-sensor S4 helix is attracted and positioned towards the intracellular side of the membrane. Once depolarization occurs, the transmembrane potential reverses, causing the helix to move toward the extracellular side (Figure 1.6). This movement of arginine charges upon gating produces a gating current that is readily detected experimentally [37, 38]. At least three unit charges per VSD have to be translated through the electric field of the membrane to open or close the channel [39, 40].

Figure 1.6: Model of gating. Side view of a voltage-gated ion channel, where only one voltage-sensing domain (VSD) is shown. Extracellular side is up. In the bottom panel, a cartoon of the model of the S4 gating motion is shown. In the open activated state of the channel (left), the S4 helix (in cyan ribbon representation) is located towards the extracellular side due to the depolarization of the membrane. All four arginine residues (blue spheres) on the extracellular half of the helix are located above the central hydrophobic region of the VSD. Upon gating, the S4 helix moves towards the intracellular side, due to repolarization of the membrane. This is a step-by-step process involving several meta-stable states (here represented by the C1 and C2 states in analogy with the terminology used in paper III), where the arginine residues change their interaction partners along the way (as depicted in the model cartoon and described more in the text). At the most closed state C3 (right), the S4 helix has moved approximately 10 Å vertically towards the intracellular side, and all arginine residues except the most extracellular one have passed the hydrophobic region of the VSD. The open state is the homology model of Shaker from paper III and the closed model is based on the most closed state, C3, in paper III and the PD from [41].

The free energy difference between the open and closed states is thought to be fairly small, but the states are separated by a large barrier, which would explain the slow activation that takes a few milliseconds. Once the random thermal motion of the system overcomes the barrier, the transition itself is considerably faster, perhaps as rapid as ten microseconds [42]. The barrier has been proposed to be due to the hydrophobic region in the VSD core, centered around the evolutionarily conserved F233 in Figure 1.4, through which charged amino acids from the voltage-sensor have to pass to open and close the channel [43, 27, 44]. The movement of the gating charges is catalyzed by sequential ion pairing with negatively charged residues in two clusters on neighboring helices, one cluster located at the extracellular side of the hydrophobic region (corresponding to E183 and E226 in Figure 1.4) and the other on the intracellular side (E236 and D259) [38, 45]. Since experimental structures of the intermediate and closed states are missing, the atomic details of the gating movement of the voltage-sensor and the subsequent conformational changes leading to opening and closing of the gate are still not properly understood. By probing this gating pathway with computer simulations and modeling, we have tried to shed light on this mechanism at the atomic level.
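As a rough, order-of-magnitude illustration of the energy scale involved, the electrical work of moving z unit charges across a potential difference ΔV is W = z·e·ΔV. The sketch below evaluates this for the three unit charges per VSD quoted above. The assumptions that the charges traverse the full potential difference, that ΔV equals the 70 mV magnitude of the resting potential, and that the temperature is 310 K are illustrative simplifications, and the function names are hypothetical.

```python
# Physical constants (exact SI values)
ELEMENTARY_CHARGE = 1.602176634e-19  # coulomb
AVOGADRO = 6.02214076e23             # 1/mol
BOLTZMANN = 1.380649e-23             # J/K


def gating_work_kj_per_mol(n_charges, delta_v):
    """Electrical work in kJ/mol for moving n_charges unit charges across delta_v volts."""
    work_per_channel = n_charges * ELEMENTARY_CHARGE * delta_v  # joule
    return work_per_channel * AVOGADRO / 1000.0


def thermal_energy_kj_per_mol(temperature):
    """Thermal energy kT expressed per mole (i.e. RT), in kJ/mol."""
    return BOLTZMANN * temperature * AVOGADRO / 1000.0


# Assumed values: three unit charges, 70 mV, 310 K
work = gating_work_kj_per_mol(3, 0.070)
ratio = work / thermal_energy_kj_per_mol(310.0)
print(f"work = {work:.1f} kJ/mol, i.e. {ratio:.1f} kT")
```

Under these assumptions the work is about 20 kJ/mol, roughly eight times the thermal energy, which gives a feel for why a handful of charges is enough to couple the protein conformation to the membrane potential.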

1.2.2 Slow inactivation

A striking and somewhat unexpected feature of the voltage-gated channels is that the ion current is transient when depolarization is maintained. The current decays with a time constant of a few milliseconds. This means that the protein can adopt at least three distinct conformations: a closed state, where the channel is non-conducting, which is the predominant resting state at hyperpolarized membrane potentials; an active conducting state occupied transiently during depolarization; and a non-conducting relaxed state reached at prolonged depolarization [46]. Since crystal growth occurs under depolarized conditions, it has been assumed that the crystallized proteins are in this relaxed state even though the central pore is open [47].
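A transient current of this kind is what the simplest conceivable three-state scheme produces. The sketch below uses an irreversible chain, closed → open → relaxed, with first-order rate constants. The scheme, the rate values and the function names are hypothetical choices made only to illustrate the shape of the response; they are not taken from [46] or fitted to any data.

```python
import math


def open_probability(t_ms, k_open, k_inact):
    """Occupancy of the conducting state at time t_ms for the chain C -> O -> I.

    All channels start in the closed state C. Rates are in 1/ms and must differ.
    This is the analytic solution of dO/dt = k_open*C - k_inact*O with C = exp(-k_open*t).
    """
    prefactor = k_open / (k_inact - k_open)
    return prefactor * (math.exp(-k_open * t_ms) - math.exp(-k_inact * t_ms))


def time_of_peak(k_open, k_inact):
    """Time (ms) at which the open-state occupancy is maximal."""
    return math.log(k_open / k_inact) / (k_open - k_inact)


# Hypothetical rates: fast opening, slower relaxation (time constant of about 3 ms)
K_OPEN, K_INACT = 2.0, 0.3
t_peak = time_of_peak(K_OPEN, K_INACT)
for t in (0.0, t_peak, 5.0, 20.0):
    print(f"t = {t:5.2f} ms  P(open) = {open_probability(t, K_OPEN, K_INACT):.3f}")
```

The open-state occupancy rises, peaks and then decays even though the depolarizing stimulus is never removed, which is the qualitative signature described above.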

1.2.3 The role of 3₁₀-helices

There is a general belief in the protein structural community that 3₁₀-helices are relatively rare and short in protein structures. The idea has its origins in the classical studies of the helical conformations of polypeptides by Pauling, Corey and Branson [48]. The conformation is energetically unfavorable, and the cost of transforming a decapeptide in a protein-like environment from α- to 3₁₀-helix has been estimated at 24 kJ/mol, i.e. 2.4 kJ/mol per residue [49]. Despite this, the wealth of protein structures now available has shown that 3₁₀-helices are quite common in proteins. Although they are most often only 1-2 turns long, a significant number of longer 3₁₀-helices (>7 residues) have been found [50, 51]. However, they seem to differ from the idealized conformation (depicted in Figure 1.7) by having an average of 3.2 residues per turn instead of three, and they show a wide variability in main-chain torsion angle distribution, an irregularity that increases rapidly with helix length.

Figure 1.7: The structure of α- and 3₁₀-helices. In the idealized α-helix (left), residue i is interacting with residue i+4, forming a helical structure having 3.6 residues per turn so that consecutive residues make an angle of 100° around the helical axis. The helical pitch is 5.4 Å and the rise along the helix axis is 1.5 Å per residue (5.4/3.6). In the 3₁₀-helix however (right), residue i interacts with residue i+3, giving three (aligned) residues per turn of the helix and an angle of 120° between them. The name comes from the observation that there are ten atoms in the ring formed by making the hydrogen bond. The pitch is consequently greater at around 6 Å, and the rise per residue is 1.93-2 Å. A more intuitive description: the 3₁₀-helix is a tighter-wound α-helix, being longer (given the same number of residues) and more narrow. Figure adapted from [52].
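The geometric parameters quoted for the two idealized helices follow from two numbers each, the residues per turn and the rise per residue. The short sketch below recomputes the rotation per residue, the pitch and the approximate axial length of a ten-residue segment; the function names are hypothetical, the 2.0 Å rise for the 3₁₀-helix is taken as the upper end of the quoted range, and the axial length is approximated as residues times rise.

```python
def helix_geometry(residues_per_turn, rise_per_residue):
    """Return (rotation per residue in degrees, pitch in Å) for an ideal helix."""
    rotation = 360.0 / residues_per_turn
    pitch = residues_per_turn * rise_per_residue
    return rotation, pitch


def helix_length(n_residues, rise_per_residue):
    """Approximate axial length in Å of a helix with n_residues residues."""
    return n_residues * rise_per_residue


alpha_rotation, alpha_pitch = helix_geometry(3.6, 1.5)   # idealized α-helix
tt_rotation, tt_pitch = helix_geometry(3.0, 2.0)         # idealized 3₁₀-helix
observed_rotation, _ = helix_geometry(3.2, 2.0)          # average found in long 3₁₀-helices

# Conversion cost per residue from the 24 kJ/mol estimate for a decapeptide
cost_per_residue = 24.0 / 10

print(alpha_rotation, alpha_pitch, helix_length(10, 1.5))
print(tt_rotation, tt_pitch, helix_length(10, 2.0))
print(observed_rotation, cost_per_residue)
```

With 3.2 residues per turn the rotation per residue drops from 120° to 112.5°, so the side chains of real 3₁₀-helices are no longer perfectly aligned along the axis.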

Another important point in explaining the difference between the two helix types is the way helices are packed against each other. The periodic alignment of the residues in the 3₁₀-helix imposes more constraints on helix packing, resulting in the destabilization of the 3₁₀ conformation [52].

Long 3₁₀-helices have been found in the recent crystal structure of the Kv1.2/2.1 chimera channel, confirming the early suggestions of the presence of a 3₁₀ conformation in the S4 helix [53, 54]. Moreover, recent experimental and computational studies support the idea that a significant part of the voltage-sensor S4 helix is in a 3₁₀ conformation [46, 55, 56]. This is also what we found in papers I-III. However, it is unclear how the helix behaves during gating. Our studies suggest that it undergoes a conformational change such that there is a stationary 3₁₀ region, located around the phenylalanine residue in the VSD core, sliding along the helix during its translation.

The 3₁₀ conformation of S4 offers an apparently simple explanation for some of the issues previously raised by the properties of the voltage-sensor helix. First, it would explain the conserved sequence pattern of positively charged residues at every third position, because the 3₁₀ conformation places them on the same helical face, promoting interactions with the polar environment in the VSD core and hence shielding the charges from unfavorable exposure to the non-polar membrane interior. Second, the conformation would limit the necessary S4 rotation when translated along the gating pathway.
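The statement that charges at every third position share a helical face in the 3₁₀ conformation, but not in an α-helix, can be checked with a few lines of arithmetic. The sketch below uses the idealized rotations of 120° and 100° per residue; the function name and the choice of four charged positions are illustrative assumptions.

```python
def side_chain_angles(rotation_per_residue, positions):
    """Angular position (degrees, 0-360) of each residue around the helix axis."""
    return [(rotation_per_residue * p) % 360.0 for p in positions]


# Four charged residues spaced three positions apart along the sequence
charged_positions = [0, 3, 6, 9]

print(side_chain_angles(120.0, charged_positions))  # ideal 3₁₀-helix: [0.0, 0.0, 0.0, 0.0]
print(side_chain_angles(100.0, charged_positions))  # ideal α-helix:   [0.0, 300.0, 240.0, 180.0]
```

In the ideal 3₁₀-helix all four side chains point in the same direction, whereas in the α-helix each successive charge is rotated by 60° relative to the previous one, so the charges spiral around the helix.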

2. Molecular modeling

Simplify as much as possible, but no further.

Albert Einstein

A protein’s function is to a large extent determined by its structure. Numerous methods are available to deduce structural information about biomolecules, but in practice only two give complete high-resolution information on the relative positions of all atoms in a protein, namely nuclear magnetic resonance (NMR) spectroscopy and X-ray crystallography. Whereas the structures of water-soluble proteins are often relatively easy to determine, membrane proteins have for a long time eluded structural biologists. Their structural determination has proven very hard and expensive. Consequently, the Protein Data Bank (PDB), where all biomolecular structures are deposited, contains only a couple of hundred membrane protein structures, in contrast to tens of thousands of structures of water-soluble proteins.

Even when the structure of a particular protein is not known, there is a fair chance of successfully creating a relevant model of it based on known structures using computational molecular modeling techniques. This is because proteins are evolutionarily related, and proteins with similar function can often safely be assumed to have similar structures.

Once a protein structure has been obtained, either by experiments or modeling, one would typically like to study its function. This too can be done using molecular modeling methods. For example, molecular docking protocols try to estimate the binding free energy of a ligand to the protein by evaluating the energies of different ligand conformations in different positions relative to the protein. Another common application is the elucidation of reaction mechanisms in the active sites of enzymes by quantum mechanical methods. In both these applications, the protein structure is typically fixed or at least restricted to only a few degrees of freedom. This can be a rather severe limitation in some cases because the protein’s flexibility could be very important for the binding of the ligand.
To study the dynamics in the systems one has to turn to simulation techniques that sample the conformational flexibility of the molecules.

Methods of modeling and simulation of molecular systems can be crudely divided into two distinct classes based on how the interactions of the atoms are described. In the first class, usually called quantum chemistry or simply quantum mechanics (QM), electrons are treated explicitly and the energy of the system is calculated using the Schrödinger equation. In the other class, called empirical force field methods or molecular mechanics (MM), only the positions of the atomic nuclei are considered and they move and interact according to classical physics. Quantum mechanical methods rely on the Born-Oppenheimer approximation (as does molecular mechanics), stating that the higher velocities of the electrons relative to the nuclei allow these two motions to be treated independently, so that the underlying Schrödinger equation can take a simpler form. The explicit treatment of electrons in these methods allows for the study of chemical processes involving permanent changes in electronic structure, e.g. chemical reactions, at the price of an increase in computational cost.

Even though both computer software and hardware have developed enormously in recent decades, quantum chemistry is still in practice too expensive for studying biological macromolecules on relevant timescales. Additionally, the treatment of explicit water is troublesome in quantum chemistry methods and they are typically not able to capture the dynamics at biologically relevant temperatures. Treating the molecular system classically using molecular mechanics methods rectifies some of these shortcomings and allows simulation timescales to be increased by several orders of magnitude. Although these methods generally cannot be used to study chemical reactions, there is still a wealth of interesting systems where they can be used.

2.1 Molecular dynamics

Molecular dynamics (MD) is a computer simulation of the physical movements of atoms, or particles in general. The atoms are allowed to interact during a period of time and their positions are calculated by solving Newton’s equations of motion numerically. The method assumes a classical description of the system where the atoms are point charges that interact through a molecular mechanics interaction potential. The partial charges on the nuclei are assigned depending on their chemical environment. Hence, this method can in principle not be used to study processes where changes in the electronic structure matter, although hybrid QM/MM methods have been developed to overcome this limitation. On the other hand, the benefit of molecular dynamics is that it can simulate large molecular systems on biologically relevant timescales.

Molecular dynamics simulations generate a sequence of states, a trajectory, of the molecular system. The states are time-dependent and correspond to a possible trajectory of motion of the particles. The energies of the states in the trajectory generated by solving Newton’s equations follow the thermodynamic equilibrium distribution (the Boltzmann distribution, see section 2.1.3). Another way of exploring conformational space is by Monte Carlo (MC) sampling. Here, new conformations are generated randomly, and by evaluating the energy of the new state with the underlying MM potential the random move is either accepted or rejected in such a way that the set of accepted conformations also follows the equilibrium distribution.

To start an MD or MC simulation three things are needed:

1. A starting structure
2. An interaction potential
3. An algorithm for generating new states

The first requirement will be dealt with in section 2.2 and the second and third will be described in more detail in the following sections. The simulation package used throughout the work in this thesis is GROMACS [57]. Other commonly used packages are CHARMM [58] and NAMD [59].

Another important point regarding the simulation of molecular systems is that the simulated system consists of a finite number of particles and hence there is a boundary between the outermost particles of the simulation unit cell (often also called the simulation box) and the surroundings. This is usually handled by using periodic boundary conditions. One could think of this as cloning the entire system in all directions and letting particles on the boundary of the unit cell interact with particles from the neighboring clone(s). Also, particles moving out of the box will, in the next step, appear on the opposite side of the system. This is a neat trick to circumvent the finite size effects of the simulations, but one has to make sure the simulation box is large enough to prevent particles from interacting with their own periodic copies, because such interactions would propagate throughout the collection of systems, giving severe artifacts.
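The two operations behind periodic boundary conditions, re-inserting a particle that leaves the box and finding the nearest periodic copy of an interaction partner, can be sketched in a few lines. This is a minimal illustration that assumes a rectangular box with one corner at the origin; the function names are hypothetical and do not correspond to any GROMACS routine.

```python
import math


def wrap(coordinate, box_length):
    """Map a coordinate back into the primary cell [0, box_length)."""
    return coordinate % box_length


def minimum_image(delta, box_length):
    """Shortest separation along one axis between a particle and any periodic copy of another."""
    return delta - box_length * round(delta / box_length)


def minimum_image_distance(r1, r2, box):
    """Distance between r1 and the nearest periodic image of r2 in a rectangular box."""
    deltas = [minimum_image(a - b, length) for a, b, length in zip(r1, r2, box)]
    return math.sqrt(sum(d * d for d in deltas))


box = (10.0, 10.0, 10.0)
print(wrap(10.5, box[0]))   # particle that left through one face re-enters at the opposite face
print(minimum_image_distance((0.5, 0.5, 0.5), (9.5, 0.5, 0.5), box))
```

With this nearest-image rule, an interaction cutoff no longer than half the shortest box edge ensures that each particle sees at most one copy of every other particle, which is one practical way of expressing the box-size requirement mentioned above.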

2.1.1 Basic molecular dynamics theory

In MD simulations, the time evolution of the positions of the N atoms of the system is calculated by solving Newton’s second law of motion, stating that the acceleration of atom i times its mass equals the net force exerted on it by the remainder of the system:

$$\vec{F}_i = m_i \vec{a}_i = m_i \frac{d^2 \vec{r}_i}{dt^2}. \tag{2.1}$$

The force on atom i is, in turn, calculated by differentiating the total potential energy function V (see next section) with respect to the atom’s position:

$$\vec{F}_i = -\nabla_i V(\vec{r}_1, \ldots, \vec{r}_N). \tag{2.2}$$
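Eq. 2.2 can be illustrated in one dimension by differentiating a potential numerically and comparing with the analytic force. The sketch below uses a harmonic potential with made-up parameters and a central finite difference; both functions are hypothetical helpers, and production MD codes evaluate analytic derivatives rather than finite differences.

```python
def harmonic_potential(x, k=1000.0, x0=0.1):
    """One-dimensional harmonic potential V(x) = k (x - x0)^2 with illustrative parameters."""
    return k * (x - x0) ** 2


def numerical_force(potential, x, h=1e-5):
    """Force F = -dV/dx estimated with a central finite difference of step h."""
    return -(potential(x + h) - potential(x - h)) / (2.0 * h)


x = 0.12
analytic = -2.0 * 1000.0 * (x - 0.1)   # -dV/dx for the potential above
numeric = numerical_force(harmonic_potential, x)
print(analytic, numeric)
```

The force points back toward the minimum at x0, and the numerical estimate agrees with the analytic derivative to within rounding error because the central difference is exact for quadratic functions.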

Given the position of atom i at a specific time, $\vec{r}_i(t)$, the position after a short finite time interval, $\vec{r}_i(t+\Delta t)$, is given by a standard Taylor series expansion around $\vec{r}_i(t)$:

$$\vec{r}_i(t+\Delta t) = \vec{r}_i(t) + \frac{d\vec{r}_i(t)}{dt}\,\Delta t + \frac{d^2\vec{r}_i(t)}{dt^2}\,\frac{\Delta t^2}{2} + \ldots \tag{2.3}$$

Numerous algorithms exist for integrating Eq. 2.1 and many are finite difference methods based on the Taylor series expansion above. The simple Verlet integration algorithm [60] is deduced by adding the Taylor expansion above to the corresponding one for $\vec{r}_i(t-\Delta t)$ and truncating the expansion after the third order, yielding

$$\vec{r}_i(t+\Delta t) = 2\vec{r}_i(t) - \vec{r}_i(t-\Delta t) + \frac{\vec{F}_i(t)}{m_i}\,\Delta t^2. \tag{2.4}$$

Note that the second derivative of the position with respect to time in Eq. 2.3 is the acceleration and hence can be substituted with the force divided by the mass of the particle according to Eq. 2.1. The procedure is iterated and produces the trajectory of the motions of all atoms in the system. Other popular, and related, algorithms are for example the leapfrog (default in GROMACS) and the Velocity-Verlet integrators [61, 62]. All these methods are time reversible, meaning that reversing the velocities of all atoms would run the simulation in exactly the opposite direction.

The computational expense of any particular integration scheme is insignificant compared to the cost of calculating the forces acting within the system, but the length of the time step, $\Delta t$, is on the other hand very important. The maximum time step length depends on the integration method but it has to be significantly smaller than the period of the fastest movements. The fastest modes of motion in atomistic systems are bond vibrations involving hydrogens, limiting the time step to 0.5-2 fs (1 fs = $10^{-15}$ s). This small time step is one of the main drawbacks of MD simulations since it limits the available timescale.
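The Verlet update of Eq. 2.4 is short enough to write out in full. The sketch below integrates a single particle in a one-dimensional harmonic well in reduced units (mass and force constant equal to one); the function names, the test system and the way the first previous position is bootstrapped from a backward Taylor step are illustrative assumptions rather than a description of how GROMACS implements its integrators.

```python
import math


def harmonic_force(x, k=1.0):
    """Restoring force of a harmonic well centered at the origin."""
    return -k * x


def verlet_trajectory(x0, v0, force, mass, dt, n_steps):
    """Integrate one particle in 1D with the position Verlet scheme of Eq. 2.4."""
    # Eq. 2.4 needs the position one step back in time; estimate it
    # from a backward Taylor expansion around the starting point.
    x_prev = x0 - v0 * dt + 0.5 * force(x0) / mass * dt ** 2
    x = x0
    trajectory = [x0]
    for _ in range(n_steps):
        x_next = 2.0 * x - x_prev + force(x) / mass * dt ** 2
        x_prev, x = x, x_next
        trajectory.append(x)
    return trajectory


# Small time step: about 628 steps per oscillation period (2*pi in reduced units)
stable = verlet_trajectory(1.0, 0.0, harmonic_force, 1.0, 0.01, 628)
print(stable[-1], math.cos(6.28))

# Time step comparable to the oscillation period: the trajectory blows up
unstable = verlet_trajectory(1.0, 0.0, harmonic_force, 1.0, 2.5, 50)
print(abs(unstable[-1]))
```

With the small time step the particle returns close to its starting position after one period, whereas the large time step makes the amplitude grow without bound. This is the one-dimensional analogue of why bond vibrations involving hydrogens dictate the femtosecond time step in atomistic simulations.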

2.1.2 Force fields

The underlying potential energy function describing the interactions between atoms (or particles in general) is usually called a force field. The fundamental ideas behind it were conceived in the late sixties by Shneior Lifson, Arieh Warshel and Michael Levitt. It is a mathematical function relating the total potential energy of the system to the positions of its particles. In this sense the name is somewhat misleading because it is an expression of the energy, not the force. However, since differentiation of the energy function gives the force and it is the force that governs the particles’ motion (see previous section), the name is understandable. The potential consists of several terms of elementary mathematical functions, each describing an intra- or intermolecular interaction. There are numerous force fields used in the molecular simulation community but most of them have the same basic structure. They typically contain three or four terms describing the bonded interactions, mediated by covalent bonds between atoms, and two terms for non-bonded interactions between all pairs of atoms. Since the number of pairs in a system is quadratic in the number of particles, the way in which the non-bonded interactions are calculated is critical for the computational cost of the force calculation. Next, a walk-through of the most common terms is presented.

The potential energy of covalent bond vibration is typically modeled by a harmonic oscillator, analogous to a spring connecting the two atoms:

$$E_{\mathrm{bond}} = K_b (b - b_0)^2. \tag{2.5}$$

The energy depends on the difference between the distance between the atom pair in the bond, $b$, and the reference bond length, $b_0$. The total bond energy is then a sum over all bonded pairs. The $K_b$ parameter defines the stiffness of the bond. A more realistic model of the covalent bond is the Morse potential, an asymmetric function that gives zero force at infinite separation, whereas the harmonic restoring force grows without bound. The bond vibrations involving light atoms are the fastest modes of motion and they are therefore the limiting factor for the time step length in the integration scheme. Constraining bonds to their equilibrium bond lengths allows for time steps of ∼2 fs, and the rationale for doing this is that a constrained bond is in fact a better approximation to a quantum mechanical ground-state oscillator than the classical harmonic oscillator. Constraining bonds in this way is common in MD simulations today and has been used in all simulations in this work. Angle bending is also represented by a harmonic potential

$$E_{\mathrm{angle}} = K_\theta (\theta - \theta_0)^2 \tag{2.6}$$

where $\theta$ is the angle between covalently bonded atoms A-B-C and $\theta_0$ is its reference value. To better optimize the fit to vibrational spectra two additional terms can be added: the Urey-Bradley function for the angle bending of certain triplets and an improper dihedral potential term for preserving the planarity of planar groups. These two terms are typical for the CHARMM force field [63], for example. In analogy to the constraining of bonds to their equilibrium bond lengths, the next-fastest degrees of freedom are the angle vibrations involving hydrogens, and removing those vibrations would allow integration time steps of 4-5 fs. This can be done by introducing so-called virtual interaction sites, replacing the fastest-moving atoms. In contrast to standard atoms, the positions of virtual interaction sites are calculated in a predefined manner (based on the geometry of the groups as defined by the force field) from neighboring heavy atoms. This is a feature implemented in GROMACS and has been studied in paper V and used in all papers except papers II and VI.

The energy of torsional rotation around the middle bond of four consecutive atoms, A-B-C-D, is modeled by a periodic (proper) dihedral function

$$E_{\mathrm{torsion}} = K_\omega \left(1 + \cos(n\omega + \delta)\right) \tag{2.7}$$

where the rotational angle, $\omega$, by convention is measured with respect to the cis conformation (A and D on the same side of the bond). The energy maximum is set by $K_\omega$, the number of minima is represented by the $n$ parameter, and $\delta$ determines their location. For more complicated functional forms one could either sum several such functions for the same quadruple of atoms or express the energy as an expansion in powers of $\cos\omega$, the so-called Ryckaert-Bellemans potential.

The treatment of the torsional rotation around bonds in the protein backbone is crucial for the correct sampling of its conformations. Rather recently, an extension to the CHARMM force field was published [64], trying to improve this property by adding a correction term, called CMAP (for correction map), to the protein backbone torsions. The CMAP term is a function of two neighboring dihedral quadruples (the Ramachandran dihedral pair, φ and ψ, of each residue) and is expressed as a two-dimensional grid-based energy that is added to the dihedral energy of the pair. The authors found that the inclusion of the CMAP term gave a better conformational sampling of the protein when compared to crystallographic data. Paper V, on the implementation of CHARMM in GROMACS, studies the behavior of this correction term on longer timescales.

So far all interactions have been between particles connected by covalent bonds. Now, let us turn to the non-bonded interactions. The van der Waals interaction is a balance between two forces, the nuclei-nuclei repulsion at short distances and the attraction of atoms at moderate distances due to the dispersive force (London force). Typically this interaction is represented by the Lennard-Jones potential:

$$E_{\mathrm{vdW}} = \epsilon \left[ \left(\frac{R_{\min}}{r_{ij}}\right)^{12} - 2 \left(\frac{R_{\min}}{r_{ij}}\right)^{6} \right] \tag{2.8}$$

An exponential function would represent the repulsion better but the difference is small and the choice is motivated by the greater speed of the calculation of $r_{ij}^{-12}$ from $r_{ij}^{-6}$. The depth of the energy minimum is given by $\epsilon$ and the minimum is located at distance $R_{\min}$ between the atoms. Instead of having one parameter pair for each pair of atom types in the force field, the $\epsilon$ and $R_{\min}$ parameters for a pair are calculated by averaging the individual atoms' respective parameter values. The total van der Waals energy of the system is a sum over all pairs separated by more than two bonds, since the corresponding interaction between closer neighbors has already been accounted for by the bonded interactions. The van der Waals interactions between pairs separated by exactly three bonds are usually too large as well, because of their partial treatment in the torsion term, and have to be scaled down. Since the interaction decays fast at longer distances, in practice only the pairs within a certain cut-off, of approximately 1 nm, are considered.

The electrostatic interaction between two charged particles is given by the Coulomb equation:

$$E_{\mathrm{Coulomb}} = \frac{q_i q_j}{4\pi\epsilon_0\epsilon_r r_{ij}} \tag{2.9}$$

where $q_i$ and $q_j$ are the charges, $\epsilon_0$ is the vacuum permittivity and $\epsilon_r$ is the relative permittivity of the medium. Since the electrostatic interaction decays slowly, its calculation is pivotal for the computational load of the simulation. The simplest approach to reduce the cost is to truncate the interaction after a certain cut-off distance, as for the van der Waals interaction. Although well-motivated in that case, here the energy is still considerable at cut-offs short enough to make a difference, leading to discontinuous forces at the boundary. One could adjust the potential entirely to circumvent this but nowadays the best practice is to use more sophisticated approaches, such as the particle-mesh Ewald (PME) method [65]. Briefly, it adds a neutralizing potential to the point charges in the system so that this new sum of electrostatic interactions converges fast. It is calculated in real space up to a cut-off of around 1 nm from the atom of interest, while the contribution of a counteracting potential, opposite to the neutralizing one, is solved on a grid in reciprocal space by the fast Fourier transform. In principle, the PME algorithm accounts for all electrostatic interactions, also between particles in the periodic copies of the simulated system.

The total potential energy of the system with respect to the positions of the N atoms in it, $V(\vec{r}^N)$, is now given by summing up the individual sums of all the above-mentioned terms. However, how all the parameters in the model are set has hitherto been left out. In principle, all parameters in the expression for the potential energy ($K_b$, $b_0$, $K_\theta$, $\theta_0$, $K_\omega$, $n$, $\delta$, $\epsilon$ and $R_{\min}$) must be specified for all different atom types one would like to

be able to study. Since the elements may appear in a variety of chemical environments, giving them different partial charges, the number of atom types is quite large and, consequently, the number of parameters that need to be set is even greater. The parameterization of a force field is the process of fitting numerical values for all these parameters, typically by comparing energies to quantum chemical results and carefully measured experimental data. The functional forms together with all the parameters in the model define the force field. Even though one might get the impression that this is a huge simplification, simulations based on MM-type force fields have proven to be exceptionally good in many cases, see e.g. review articles [66, 67].
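As an illustration of the functional forms in Eqs. 2.5 to 2.9, the following minimal sketch evaluates each term for a single interaction. It is not taken from any simulation package: the function names, the unit system (kJ/mol, nm, elementary charges) and the numerical value of the Coulomb prefactor in those units are assumptions made for the example. The combination rule (geometric mean for $\epsilon$, arithmetic mean for $R_{\min}$) is likewise an assumed, commonly used choice, since the text only states that the pair parameters are obtained by averaging.

```python
import math

# Assumed unit system: energies in kJ/mol, lengths in nm, charges in e.
# 1/(4*pi*eps0) expressed in kJ mol^-1 nm e^-2 (assumption of this sketch).
COULOMB_CONST = 138.935458


def bond_energy(b, b0, kb):
    """Harmonic bond term, Eq. 2.5."""
    return kb * (b - b0) ** 2


def angle_energy(theta, theta0, k_theta):
    """Harmonic angle term, Eq. 2.6 (angles in radians)."""
    return k_theta * (theta - theta0) ** 2


def torsion_energy(omega, k_omega, n, delta):
    """Periodic proper dihedral term, Eq. 2.7 (angles in radians)."""
    return k_omega * (1.0 + math.cos(n * omega + delta))


def combine(eps_i, rmin_i, eps_j, rmin_j):
    """Pair parameters from per-atom parameters (assumed combination rule)."""
    return math.sqrt(eps_i * eps_j), 0.5 * (rmin_i + rmin_j)


def lj_energy(r, eps, rmin):
    """Lennard-Jones term, Eq. 2.8; r^-12 is obtained by squaring r^-6."""
    x6 = (rmin / r) ** 6
    return eps * (x6 * x6 - 2.0 * x6)


def coulomb_energy(r, qi, qj, eps_r=1.0):
    """Coulomb term, Eq. 2.9."""
    return COULOMB_CONST * qi * qj / (eps_r * r)


# Hypothetical parameter values, for illustration only.
eps_ij, rmin_ij = combine(0.25, 0.40, 1.00, 0.20)
e_min = lj_energy(rmin_ij, eps_ij, rmin_ij)  # equals -eps_ij at r = R_min
```

Evaluating `lj_energy` at `r = rmin_ij` returns `-eps_ij`, which is the statement in the text that the well depth is $\epsilon$ and the minimum lies at $R_{\min}$.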

2.1.3 Statistical mechanics

When trying to model and study real macroscopic systems and their properties it does not make sense, conceptually, to simply consider one single molecule of the substance of interest since, in reality, the behavior depends on an enormous number of such molecules. On the other hand, assuming the force field is able to capture the physics of the system satisfactorily, MD simulations give the complete information of the system in atomic detail for the simulated period of time, something which in general is not accessible to conventional molecular biology experiments. In order to investigate how a microscopic system in a simulation behaves on the macroscopic scale, and hence correlate the simulation results to experimentally measurable thermodynamic properties, one has to turn to statistical mechanics. The experimentally measured value of an arbitrary property, denoted A here, of a system with N particles is a time average over the time-course of the measurement:

$$A_{\mathrm{ave}} = \lim_{\tau\to\infty} \frac{1}{\tau} \int_{t=0}^{\tau} A\left(\vec{p}^N(t), \vec{r}^N(t)\right) dt \tag{2.10}$$

The integrand $A(\vec{p}^N(t), \vec{r}^N(t))$ is the instantaneous value of A at time t, depending on the momenta, $\vec{p}$, and positions, $\vec{r}$, of all atoms. To calculate such a macroscopic time average in a simulation, a system of the order of $10^{23}$ particles has to be followed for a time period comparable to the experiment. This is of course an impossible task. Boltzmann and Gibbs recognized this problem and developed the theory of statistical mechanics, in which a single system evolving in time is replaced by a large number of replications of the system, called an ensemble, that are considered simultaneously. The ergodic hypothesis is one of the axioms of statistical mechanics and it states that the time average is equal to the ensemble average:

$$\langle A \rangle = \iint A(\vec{p}^N, \vec{r}^N)\, \rho(\vec{p}^N, \vec{r}^N)\, d\vec{p}^N d\vec{r}^N \tag{2.11}$$

Note that the time-dependence has disappeared and the ensemble average is an average over all replications of the system generated by the simulation. Each member of the ensemble has an energy, and for systems of constant number of particles N, volume V and temperature T, i.e. the canonical NVT ensemble, the distribution of systems in the ensemble, $\rho(\vec{p}^N, \vec{r}^N)$, follows the famous Boltzmann distribution:

$$\rho(\vec{p}^N, \vec{r}^N) = \exp\left(\frac{-E(\vec{p}^N, \vec{r}^N)}{k_B T}\right) / Q \tag{2.12}$$

The energy E in the exponent is the total energy, i.e. both the kinetic contribution, $K(\vec{p}^N)$, due to the momenta of the particles, and the contribution from interactions between particles described by the potential energy function $V(\vec{r}^N)$. $k_B$ is the Boltzmann constant and Q is the canonical partition function:

$$Q_{NVT} = \frac{1}{N!} \frac{1}{h^{3N}} \iint \exp\left(\frac{-E(\vec{p}^N, \vec{r}^N)}{k_B T}\right) d\vec{p}^N d\vec{r}^N \tag{2.13}$$

The factor $1/N!$ comes from the indistinguishability of the particles and $1/h^{3N}$ is a normalizing factor. The partition function carries all thermodynamic information about a system, and it is hence possible to calculate all macroscopic properties either directly from it or from its derivatives. Since the simulation method samples conformations from the Boltzmann distribution, the partition function never has to be calculated explicitly. In fact, to calculate it in an MD simulation one would need to sample all possible conformations of the system and sum up the values of $\exp(-E(\vec{p}^N, \vec{r}^N)/k_B T)$. This number of conformations is in practice infinite. However, for simple systems the partition function can be calculated from quantum chemistry. This is actually quite a beautiful result, since statistical mechanics hence provides us with an awe-inspiring tool to tie together quantum chemistry, developed during the 20th century, with thermodynamics, which was formulated during the preceding century.

The molecular dynamics method and the equally famous Monte Carlo (MC) method generate system configurations from the Boltzmann distribution, and the ensemble average of the property under study can therefore be calculated by numerical integration of Eq. 2.11. Briefly, the difference between the two methods is that MC performs a random walk through conformational space and accepts new conformations in a particular way so that the acquired ensemble of structures is Boltzmann-distributed. The random nature of the sampling has the consequence that consecutive states have no true connection in time. The MC technique is in practice a more effective sampling strategy, since it requires neither the calculation of forces nor the numerical integration of the equations of motion, but the problem is that there is no general algorithm for efficient generation of new test conformations. Molecular dynamics, on the other hand, guarantees sampling from the Boltzmann distribution for arbitrarily complex systems and, additionally, generates a trajectory of conformations reflecting realistic dynamics of the system.
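The idea that an ensemble average (Eq. 2.11) can be estimated from Boltzmann-distributed samples without ever computing the partition function can be illustrated with a toy Monte Carlo calculation. The sketch below uses the Metropolis acceptance rule, one standard way of "accepting new conformations in a particular way", on a single particle in a one-dimensional harmonic well. The system, the function names, the step size and the reduced units are all assumptions chosen for the example. Because the observable depends on position only, the momenta integrate out and only the potential energy enters.

```python
import math
import random


def metropolis_average(observable, potential, kT,
                       n_steps=200_000, step=2.0, seed=0):
    """Estimate <observable> over the Boltzmann distribution exp(-V/kT)."""
    rng = random.Random(seed)
    x = 0.0
    e = potential(x)
    total = 0.0
    for _ in range(n_steps):
        # Propose a random displacement (the "random walk").
        x_new = x + rng.uniform(-step, step)
        e_new = potential(x_new)
        # Metropolis rule: always accept downhill moves, accept uphill
        # moves with probability exp(-dE/kT).
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / kT):
            x, e = x_new, e_new
        # The current state is counted whether or not the move was accepted.
        total += observable(x)
    return total / n_steps


# Toy system in reduced units: V(x) = 0.5 * k * x^2 with k = 1 and kT = 1.
k_spring = 1.0
kT = 1.0
avg_x2 = metropolis_average(lambda x: x * x,
                            lambda x: 0.5 * k_spring * x * x,
                            kT)
```

For this potential the exact ensemble average is $\langle x^2 \rangle = k_B T / k$, so `avg_x2` should come out close to 1 in these units, up to statistical noise. Only energy differences appear in the acceptance test, which is why the normalization Q is never needed.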

2.1.4 Modeling the membrane potential in simulations

As described previously in Section 1.1.3, the membrane potential in living cells is established by an unequal distribution of ions between the water solutions on each side of the membrane. In a typical atomistic simulation set-up, this situation is not as easy to accomplish as one might expect. This is because the standard method of using periodic boundary conditions (see Section 2.1.2) to overcome the finite-size effects of the simulation in fact connects the two water baths so that they are in physical contact with each other. Hence, they are not two separated water solutions but one and the same.

Rather recently, the lab of Woolf has developed a method to circumvent this problem by introducing a double-bilayer system in the simulation unit cell [68]. The two parallel membranes divide the simulation system into two truly separated water solutions. This method has been shown to reproduce transmembrane potentials in a realistic way but the computational overhead is significant. Another complication is that to establish a biological transmembrane potential on the order of a hundred millivolts, an imbalance of only one or a few ions is needed for the small systems that are practical to simulate. Approximately one excess ion per $10^4$ Å$^2$ of bilayer is enough, requiring on the order of $10^5$ atoms in the simulated system. This means that if one ion permeation event occurs (for example through an ion passing an ion channel), the membrane potential changes drastically and could even be reversed. Similarly, if the embedded protein, carrying charged residues, changes conformation significantly during the simulation, the transmembrane potential varies considerably. However, as computational power and software efficiency increase this method will become more feasible.

Another way to achieve charge imbalance is to introduce an air-water interface at the outer edges of the water baths [69]. In such a way, the water solutions on each side of a single membrane are separated from each other even though periodic boundary conditions are applied. This method has been successfully used in several studies [70, 71, 72] but it also suffers from the problem of membrane potential variability mentioned above.
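The statement that roughly one excess ion per $10^4$ Å$^2$ of bilayer gives a potential on the order of a hundred millivolts can be checked with a back-of-the-envelope estimate. The sketch below treats the bilayer as an ideal parallel-plate capacitor, so that $V = \sigma / C$. The specific capacitance of 1 μF/cm² (0.01 F/m²) is an assumed, typical textbook value for lipid membranes and is not taken from the text. The function name is hypothetical.

```python
E_CHARGE = 1.602176634e-19  # elementary charge in coulomb


def potential_from_excess_ions(n_ions, area_A2, c_spec=0.01):
    """Voltage (V) across a membrane patch modeled as a plate capacitor.

    n_ions  : net number of monovalent excess ions on one side
    area_A2 : membrane area in square angstrom
    c_spec  : specific capacitance in F/m^2 (0.01 F/m^2 = 1 uF/cm^2, assumed)
    """
    area_m2 = area_A2 * 1e-20          # 1 A^2 = 1e-20 m^2
    sigma = n_ions * E_CHARGE / area_m2  # surface charge density, C/m^2
    return sigma / c_spec


v_one_ion = potential_from_excess_ions(1, 1e4)
```

With these assumptions `v_one_ion` is about 0.16 V, consistent with the order of magnitude quoted above. It also shows why a single permeation event matters: each ion changes the potential by an amount comparable to the potential itself.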

[Figure 2.1: three stacked panels plotted against the system coordinate z (nm), from 0 to $L_z$ = 10 nm: charge density ρ (q/nm³, top), electric field E (V/nm, middle) and electrostatic potential Φ (V, bottom).]

Figure 2.1: The transmembrane potential. The charge density (top) in a simulation is calculated by summing the charges in slices along the direction of the applied electric field. Integrating it once gives the electric field (middle) and twice the electrostatic potential (bottom), see text and Eq. 2.14. Figure produced from data of paper I. The dashed line represents the simulation without an applied field and the solid line a simulation with an applied constant field of 0.052 V/nm. With a box dimension of 10 nm in the z-direction this would correspond to a potential of approximately 0.5 V ($\phi = E_z L_z$). This is also the potential given by the integration.

In contrast to the all-atom simulation techniques mentioned so far, continuum electrostatic models have previously been used to study proteins in general and ion permeation of channels in particular [73, 74, 75]. In these methods, the polar solvent (water and lipids in the case of a membrane-protein system) is represented as a structureless dielectric medium, removing the need for explicit solvent particles. In such a representation, the effect of the membrane potential can easily be incorporated by modifying the underlying Poisson-Boltzmann equation [73]. It has been shown that this representation can capture the dominant features of the energetics of ion solvation in bulk but not in confined environments such as aqueous pores (e.g. in the selectivity filter of ion channels).

The approach we took in paper I, and that has been used in numerous other studies, e.g. [76, 77, 78, 79], is to represent the membrane potential by an added constant external electric field along the z-direction, $E_z$, perpendicular to the membrane surface. The resulting electrostatic potential is given by $\phi = E_z L_z$, where $L_z$ is the simulation box length in the z-direction. In practice this is achieved by adding a force along z, $F_i = q_i E_z$, to each particle i with charge $q_i$ in the system. It is a rather unphysiological set-up, in particular because the potential drops instantaneously from one end of the simulation box to the other (i.e. a discontinuous hop over the periodic boundary), but it has been shown to be a sound representation [80]. The Poisson equation gives the relation between the equilibrium charge distribution and the electrostatic potential:

$$\frac{d^2\phi(z)}{dz^2} = -\frac{1}{\epsilon_0}\rho(z) \tag{2.14}$$

where $\phi$ is the potential, $\epsilon_0$ is the vacuum permittivity, and $\rho$ is the charge density. The membrane potential can therefore be calculated from a simulation (giving the charge density) by integrating this equation twice. The reference point of the potential is arbitrarily set to z = 0 and the boundary conditions used are $\phi(0) = 0$ and $E_z(0) = -d\phi(0)/dz = 0$. An example of the integration from paper I is shown in Figure 2.1.
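The double integration of Eq. 2.14 can be sketched numerically as follows. This is a minimal illustration using the trapezoidal rule on a binned charge density, with the boundary conditions stated above. It is not the analysis code of paper I. The function name, the uniform toy charge density and the value of $\epsilon_0$ expressed in e V⁻¹ nm⁻¹ are assumptions of the example.

```python
EPS0 = 0.0552635  # vacuum permittivity in e V^-1 nm^-1 (assumed unit system)


def integrate_poisson(rho, dz):
    """Integrate d2(phi)/dz2 = -rho/eps0 twice along z.

    rho : list of charge densities per slice, in e/nm^3
    dz  : slice thickness in nm
    Returns (field, phi) with field in V/nm and phi in V, using
    E(0) = 0 and phi(0) = 0 as boundary conditions.
    """
    field = [0.0]
    phi = [0.0]
    for i in range(1, len(rho)):
        # First integration: dE/dz = rho/eps0 (trapezoidal rule).
        field.append(field[-1] + 0.5 * (rho[i - 1] + rho[i]) * dz / EPS0)
        # Second integration: dphi/dz = -E.
        phi.append(phi[-1] - 0.5 * (field[-2] + field[-1]) * dz)
    return field, phi


# Toy input: a uniform charge density over a 1 nm thick region.
n_slices = 101
dz = 0.01
rho = [0.001] * n_slices
field, phi = integrate_poisson(rho, dz)

# Potential drop implied by a constant applied field, phi = E_z * L_z,
# using the values quoted in the caption of Figure 2.1.
applied_drop = 0.052 * 10.0  # V
```

For the uniform toy density the result can be compared with the analytic solution $\phi(z) = -\rho z^2 / (2\epsilon_0)$. In an analysis of simulation data, `rho` would instead hold the time-averaged charge per slice divided by the slice volume.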

2.2 Modeling of protein structures

Analysis of three-dimensional high-resolution protein structures has proven very fruitful for biological and medical discovery in the last couple of decades. Experimental determination of such structures is time-consuming and expensive, and solving all human protein structures using high-resolution methods, such as X-ray crystallography and NMR spectroscopy, is an immense task, and it is questionable whether it is a realistic and well-motivated goal at all given the cost. However, computational modeling of protein structures could very well bridge this gap because, as the structural information in the databases accumulates, such methods are increasingly successful in predicting biologically relevant structures for proteins with unknown structure.

Comparative modeling is the means by which models of protein structures are built based on the information in the sequence and structure databases. Its foundations lie in the discovery of strong correlative patterns between sequence similarity on one hand and an analogous function on the other. In other words, as the volume of protein sequences started to accumulate in the 1950s, the data showed that similar sequences coded for proteins with identical or similar functions. This led to the classification of proteins (based on sequence) into families and superfamilies

[Figure 2.2: plot of the number of structures (left axis, 0 to 80,000) and the number of folds (right axis, 0 to 1,500) against year, 1972 to 2011. Note in the figure: the number of searchable structures per year varies over time as some become obsolete and are removed from the database.]

Figure 2.2: Protein Data Bank statistics. Yearly growth of total structures (blue diamonds, left axis), of unique membrane protein structures (green triangles, right axis) and unique protein folds as defined by SCOP v1.75 (red squares, right axis).

[81]. The underlying idea is that members of the same protein family are all evolutionarily related, or homologous. This fact can be exploited in comparative modeling because a new protein sequence with unknown structure, the target structure, can be assigned to these families and its structure can be modeled using the structures of the proteins in that particular family, the template structures.

The first protein crystal structure was solved in 1958 [82] and as more and more structures were solved the above-mentioned observation was confirmed: similar protein sequences had similar structures and therefore similar functions as well. It was also realized that the number of unique ways a protein can fold, the structural space, is limited, while the sequence space in practice is unlimited. As of today, the Structural Classification of Proteins (SCOP) database [83] contains some 1400 distinct protein folds and, judging from the plot of accumulated folds in Figure 2.2, the true number is not much larger. Therefore, for new sequences found today, there is a high probability of finding a homologous protein with a known structure that can be used as a template for structural modeling. The situation is a bit more complicated for membrane proteins, however. The first high-resolution structure was not published until 1985 [84] and, as can be seen in the figure, only a couple of hundred structures are known today. The structural space of membrane proteins is hence sparsely explored and the number of folds for this class of proteins is unknown, although it is probably considerably smaller than the number of folds of globular proteins because of the environmental restraint of the membrane.

Besides the comparative modeling approach, protein structures can be predicted using physics-based methods, also called de novo or ab initio

modeling. These methods are in principle based on molecular simulation techniques and the underlying molecular mechanics description treated previously in this chapter. A branch of this research area is the refinement of protein structures. Here, protein structures of moderate to poor quality are refined with the objective of improving the model by local adjustments, typically achieved by structural sampling and minimization of the interaction energy function.

2.2.1 Comparative modeling

The first step in comparative modeling is to align the target protein sequence to proteins with known structure. By aligning sequences, evolutionarily conserved, and hence important, residues are identified. They are later pivotal in the model building. The search for homologous sequences is typically done using PSI-BLAST [85, 86]. It creates a multiple sequence alignment by an iterative process, where it finds new sequence hits in each step that are incorporated in the next query to the sequence database. In this fashion it is quite successful in finding distantly related sequences in a relatively short time. The alignment is extremely important for the outcome of the modeling. A poor alignment will result in a bad model. Once homologous sequences have been found, the target-template alignment and the protein structures are sent to the modeling tools.

Briefly, models are built by optimizing spatial constraints derived from the alignment. The constraints are typically distances between backbone atoms of different residues. The structural model is generated by minimizing the violations of these constraints. When several template structures are used for the model building, local sequence similarities guide the choice of template structure for each sequence section. That is, different sections of the target structure are modeled from different template structures based on local sequence similarity.

In paper III, the starting point of the modeling of different states of the VSD was to build a homology model of the Shaker VSD based on the template structure of the $K_V$1.2/2.1 chimera protein [27]. A sequence alignment of the two sequences was performed using HHalign [87] and, since the sequence identity is more than 40 %, their alignment is quite straightforward. The model was built using one of the most commonly used homology modeling tools, Modeller [88].
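The sequence identity quoted above is a simple quantity to compute once a pairwise alignment is available. The sketch below shows one way of doing so. It is illustrative only: the function name and the toy sequences are hypothetical, and the convention of counting identities over columns where neither sequence has a gap is an assumption, since several definitions of percent identity are in use.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned columns without gaps (assumed convention)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences have a residue.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


# Hypothetical toy alignment, not real Shaker or chimera sequence data.
target = "ACD-EF"
template = "ACDGEY"
identity = percent_identity(target, template)
```

In this toy case four of the five gap-free columns match, giving 80 % identity. For real data the aligned sequences would be read from the output of the alignment tool.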

2.2.2 Physics-based modeling and structural refinement

The origin of de novo structure prediction lies in the experiments on protein folding performed by Anfinsen in the middle of the 20th century (for which he was awarded the Nobel Prize in Chemistry 1972) [89]. He was able to observe folding of the denatured ribonuclease protein into its biologically active conformation, the native state, in a plain solution. This means that, at least for small globular proteins, all the information for the folding of the protein lies in the amino acid sequence. Hence, if the physical interactions are described accurately enough, the modeling method should be able to find the native state. These methods are computationally demanding and have been used mostly on smaller model proteins. It is safe to say they have not been as successful as comparative modeling but as computational power is increasing they will become more and more useful.

These methods generally have a molecular mechanics-type interaction potential under the hood. To decrease the computational cost it is not uncommon that the methods are extended in their description of the molecular system and in the way conformations are sampled. For example, coarse-graining the atomistic model by grouping several atoms in functional groups, each represented by a single particle, reduces the number of particles considerably. This approach has been used for the model of the lipid molecules in most simulations in this thesis, where the hydrogen atoms in the long non-polar chains are represented together with the carbon they are attached to as a united atom. Another trick frequently used is to model the surrounding medium by an analytic function describing its average properties without treating the atoms explicitly. These are called implicit solvent methods and were studied in paper V. Analogous efforts treating the lipid bilayer implicitly have been made and can be used in the interaction potential of the Rosetta molecular modeling tool [90] that we used for the modeling of the Shaker VSD states in paper III.

For successful structural refinement of protein models the modeler often has to extend the sampling technique with methods exploring the conformational space more efficiently. This is due to the fact that the refinement process readily gets stuck in local minima of the energy landscape. MD and MC methods typically sample conformations from low-energy regions and hence these methods are very inefficient in surmounting energy barriers. Such barriers could be due to a suboptimal packing of amino acid side chains, for example. To circumvent this problem one could use simulated annealing techniques that temporarily increase the temperature in the simulation, hence providing additional energy for passing barriers. Another way is to completely rebuild parts of the protein from a library of protein structure fragments, similar to the methodology used in homology modeling based on several template

structures, or, less dramatically, to rebuild the side chains from libraries of torsional angles typical for the different amino acids.

The modeling in paper III was done along these lines. Starting from the homology model of the Shaker VSD, we rebuilt the C-terminal part of the domain for each state using the corresponding constraints from the metal-bridges in the experiments. The S3-S4 loop and S4 helix were rebuilt completely from fragments in Rosetta, allowing this part to adopt different positions relative to the rest of the protein and the implicitly represented water and membrane. During conformational sampling the backbone and side chains of the remodeled part were allowed to move whereas the non-remodeled residues were kept fixed.
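The effect of simulated annealing on barrier crossing, mentioned above, can be illustrated with a one-dimensional toy problem. The sketch below is not the Rosetta protocol used in paper III. The double-well energy function, the cooling schedule, the step size and all names are assumptions chosen for illustration. A Metropolis Monte Carlo walker started in the shallower well stays trapped at constant low temperature, whereas a run that starts hot and cools gradually can cross the barrier and settle in the deeper well.

```python
import math
import random


def energy(x):
    """Toy double well: deeper minimum near x = -1, shallower near x = +1."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def metropolis(x, kT_schedule, step=0.5, seed=0):
    """Metropolis Monte Carlo walk with one temperature per step."""
    rng = random.Random(seed)
    e = energy(x)
    for kT in kT_schedule:
        x_new = x + rng.uniform(-step, step)
        e_new = energy(x_new)
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / kT):
            x, e = x_new, e_new
    return x


n_steps = 20_000
kT_hot, kT_cold = 1.0, 0.02

# Constant low temperature: the walker cannot pass the barrier near x = 0.
cold_schedule = [kT_cold] * n_steps
# Annealing: geometric cooling from kT_hot down to kT_cold.
anneal_schedule = [kT_hot * (kT_cold / kT_hot) ** (i / (n_steps - 1))
                   for i in range(n_steps)]

x_stuck = metropolis(1.0, cold_schedule)
# Several independent annealing runs; keep the lowest-energy end point.
x_annealed = min((metropolis(1.0, anneal_schedule, seed=s) for s in range(10)),
                 key=energy)
```

Both runs start at x = 1 in the shallower well. Keeping the best of several annealing runs reflects common practice, since a single run is not guaranteed to end in the global minimum.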

3. Summary of papers

- So Dr. Bjelkmar, in what area are you doing research? - Well Professor Knowitall, I’m answering biological questions built up by chemical entities modeled by physical principles described by mathematics using computational methods.

Pär Bjelkmar

3.1 Conformational changes in the $K_V$1.2 ion channel through polarized microsecond molecular simulation (paper I)

The opening of the voltage-gated ion channel and initiation of the alpha current have been shown to occur in the millisecond range [13] but an early component of the gating current has been reported in the microsecond range [42]. With recently developed highly efficient and parallel molecular dynamics code (e.g. GROMACS [57]) and current state-of-the-art computer clusters, it is now practically feasible to perform simulations in the microsecond regime for large membrane-protein systems like the $K_V$ channels, and early stages of gating can therefore, in principle, be studied.

The crystal structure of the open $K_V$1.2 channel [25], complemented with modeled regions where electron density was missing, was inserted into a lipid bilayer membrane and simulated with an applied electric field, mimicking the effect of the resting potential, for more than a microsecond with the goal of capturing early stages of VSD gating motion. To our knowledge, this work was the first simulation reaching this long sought-after timescale for a comparable system and in the wake of it similar large-scale simulations have been performed, e.g. [91, 92]. Since gating is a reversible process the hyperpolarization is expected to return the protein to the closed, resting state, although this full closure is slow and beyond the scope of this study. Both in the hyperpolarized and

a control simulation without an applied electric field, a small rotation of the S4 helix was found, indicating that the original crystal structure might be slightly different from the natural state of the voltage-sensor. This was also the conclusion of Lewis et al. [93]. In one subunit of the protein in the hyperpolarized simulation, a more pronounced rotation of the extracellular half of the helix converted it to a $3_{10}$-helix over the course of hundreds of nanoseconds. This is a key result supporting the original idea of $3_{10}$-helix content in voltage-gated ion channels [53, 54] and has also been confirmed by experiments [46] and recent computational studies [55, 56], papers II and III, as well as by the S4 structure in the more recent crystal structure of the $K_V$1.2/2.1 chimera [27] that was unpublished at the initiation of the project. In fact, the protein conformations in the second half of the simulation are overall more similar to the new chimera structure than to the starting $K_V$1.2 structure. Another interesting, and rather unexpected, result is the fact that lipid molecules in the simulation matched the positions of resolved lipid molecules in the chimera crystal structure, successfully predicting a bilayer thinning of the same magnitude and at the same membrane location.

3.2 Secondary structure of the S4 helix in gating (paper II)

As a continuation of the work in paper I, this study took a more quantitative approach to assessing the likelihood of S4 adopting a $3_{10}$ conformation during gating. The VSD from the $K_V$1.2/2.1 crystal structure [27], in two conformations, was simulated in a POPC membrane. The first was the conformation in the crystal structure, with a mix of α and $3_{10}$ content in the S4 helix, while the other had an all-$3_{10}$ S4 conformation. By measuring the work of translating these two helices along their axes by steered molecular dynamics simulations, the cost of moving the $3_{10}$ S4 conformation one helix turn towards the closed state was found to be significantly lower compared to the original α-helical conformation. The barrier seemed to be caused by the breaking and reforming of electrostatic interactions between the arginine residues in S4 and the negative residues on neighboring helices, together with the passing of the most intracellular arginine residue through the hydrophobic region formed by the phenylalanine residue F233 (see Figure 1.4). The $3_{10}$ conformation appeared to reduce the work by limiting the amount of S4 rotation needed to reform salt-bridge interactions, and its smaller size (see Figure 1.7) reduced the distortion in the rest of the VSD when moved.

3.3 Tracking the voltage-sensor gating motion with metal-ion bridges, computer modeling and simulation (paper III)

To study the conformations along the gating pathway of the Kv channels, a collaboration with an experimental group in Linköping was initiated. By substituting residues in the Shaker channel with Cd²⁺-binding cysteines, they were able to collect a set of 17 newly described interactions between the VSD segments in different states along the gating pathway.

Starting from a homology model of the open Shaker VSD built on the template structure of the Kv1.2/2.1 protein [27], models of three successively more closed states were constructed by incorporating these experimental constraints in the rebuilding of the voltage-sensor S4 helix and its preceding loop (using methods described in section 2.2). These models were evaluated by molecular dynamics simulations, and representative conformations were chosen based on structural clustering of the simulation conformations. The three selected models of successively more closed states suggest a gating mechanism where a stationary region of the S4 helix, opposite the central hydrophobic region of the VSD (centered at the phenylalanine residue 290 in Shaker, F233 in Kv1.2 numbering), is in a 3₁₀ conformation as the helix translates along its axis. Additionally, two interactions in the experimental set of constraints distinguish the open active and open inactive states of the channel, confirming that the Kv1.2/2.1 crystal structure is indeed in the open active state despite the long depolarization of the membrane in the crystallization experiment.

3.4 Lateral diffusion of membrane proteins (paper IV)

While diffusion in pure lipid membranes has been given much attention and is rather well understood, the lateral diffusion of membrane proteins is still a poorly understood process in membrane dynamics [94, 95]. This process is important for understanding protein sorting, membrane protein complex formation and the binding of ligands, such as toxins and pharmaceuticals, to membrane proteins.

An enlarged version of the Kv1.2 system described in paper I was used to study this phenomenon. The protein was found to diffuse together with a dynamic complex of 50-100 lipids surrounding it. Lateral diffusion coefficients were calculated for neighboring lipids and for more distant ones, and the neighbors were indeed an order of magnitude slower in their diffusion, having approximately the same value as the protein. Lipids up to a distance of 1-2 nm from the protein surface were affected. Since cellular membranes are crowded with membrane proteins occupying some 30% of the total surface area, this suggests that there are no free lipids with bulk properties in them. These results also have implications for the theories of lateral diffusion of membrane proteins. The equations typically include the protein radius and membrane viscosity as parameters. As this study shows, it is effectively the radius of the whole lipid-protein complex that matters, and the protein senses a viscosity different from that of a protein-free membrane.
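
Lateral diffusion coefficients of the kind compared above are commonly estimated from the mean-square displacement (MSD) in the membrane plane, using the two-dimensional Einstein relation MSD(t) ≈ 4Dt. The sketch below illustrates that relation on synthetic random walks; it is not the analysis used in paper IV, and the function names, the units (nm, ns) and the chosen diffusion coefficients are assumptions made for the example.

```python
import numpy as np

def lateral_msd(xy, max_lag):
    """MSD in the membrane plane, averaged over particles and time origins.

    xy: array of shape (n_frames, n_particles, 2) with unwrapped x,y positions.
    Returns (lags, msd) for lags 1..max_lag, in units of frames.
    """
    lags = np.arange(1, max_lag + 1)
    msd = np.empty(max_lag)
    for i, lag in enumerate(lags):
        disp = xy[lag:] - xy[:-lag]
        msd[i] = np.mean(np.sum(disp ** 2, axis=-1))
    return lags, msd

def lateral_diffusion_coefficient(xy, dt, max_lag):
    """Fit MSD(t) = 4*D*t + c and return D (2D Einstein relation)."""
    lags, msd = lateral_msd(xy, max_lag)
    slope = np.polyfit(lags * dt, msd, 1)[0]
    return slope / 4.0

def random_walk(n_frames, n_particles, d_true, dt, seed):
    """Synthetic 2D Brownian trajectories with diffusion coefficient d_true."""
    rng = np.random.default_rng(seed)
    sigma = np.sqrt(2.0 * d_true * dt)  # per-dimension step standard deviation
    steps = rng.normal(0.0, sigma, size=(n_frames, n_particles, 2))
    return np.cumsum(steps, axis=0)

if __name__ == "__main__":
    # Hypothetical "bulk" and "protein-associated" lipids (illustrative values only).
    bulk = random_walk(2000, 200, d_true=1.0, dt=0.1, seed=0)
    bound = random_walk(2000, 200, d_true=0.1, dt=0.1, seed=1)
    print("D bulk :", lateral_diffusion_coefficient(bulk, 0.1, 50))
    print("D bound:", lateral_diffusion_coefficient(bound, 0.1, 50))
```

For real membrane trajectories the positions must first be unwrapped across periodic boundaries and corrected for the drift of the bilayer as a whole; both steps are omitted here.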

3.5 Linking structure and conductance in a simple ion channel (paper V)

Eukaryotic membrane proteins such as the Kv channels are complexes of several protein domains, often making them too large for molecular simulations on relevant timescales, even though this shortcoming is gradually diminishing thanks to new computational capabilities. Fortunately, simpler channels are common in nature and can be used as model systems that still give insight into the workings of complex ion channels [96]. One such channel is formed by the 16-residue antiamoebin-1 peptide (AAM-1) [97], secreted by fungi as a weapon in the microbial warfare among organisms. It is a member of the nongenomic family of fungal peptides called peptaibols [96, 98]: peptides 15-20 residues long that are rich in helix-promoting nonstandard amino acids. They are not encoded by the genome but rather synthesized by a special set of enzymes. Peptaibols are able to assemble into helical bundles that form lethal pores in the membrane of the target cell.

The ion-conducting multimeric state of the AAM-1 channel is not known, but electrophysiological measurements of currents generated by AAM-1 channels have shown that it has only one conductance level, indicating a single conducting structure [99]. Measuring the current as a function of protein concentration has not unambiguously revealed the multimeric state, but suggested that the channel is a tetramer, hexamer or octamer, depending on the underlying theory of helix aggregation [100, 99]. Models of these different multimeric states were constructed from the monomeric peptide structure [101, 102, 103], and molecular dynamics simulations were used to calculate their conductance.

The tetrameric channel did not conduct ions at all because its pore was too narrow, and the octameric channel had a conductance that was an order of magnitude too large. The hexameric channel, on the other hand, had a conductance consistent with the experimentally measured value. The calculated conductances were derived both from explicit counting of ion crossings in the simulation and from traditional continuum methods using the Nernst-Planck equation [13, 104]. Moreover, the channel was large enough for Fickian, rather than single-file, diffusion, and the ions traversed the channel with their solvation shells of water molecules intact. Consequently, the free-energy barrier of ion transport was only a couple of kcal/mol, and the channel was slightly cation selective.
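
The explicit-counting route to a conductance can be written down in a few lines: the net number of ions that cross the channel during the simulation gives a current, and dividing by the applied voltage gives the conductance. The sketch below shows only this unit conversion; the function names and the numbers in the usage example are hypothetical and are not results from paper V.

```python
E_CHARGE = 1.602176634e-19  # elementary charge in coulombs

def current_from_crossings(net_crossings, sim_time_ns, valence=1):
    """Current in amperes from a net count of ions crossing the channel."""
    if sim_time_ns <= 0:
        raise ValueError("simulation time must be positive")
    return net_crossings * valence * E_CHARGE / (sim_time_ns * 1e-9)

def conductance_from_crossings(net_crossings, sim_time_ns, voltage_mv, valence=1):
    """Conductance in siemens, g = I / V, for an applied voltage in millivolts."""
    if voltage_mv == 0:
        raise ValueError("applied voltage must be non-zero")
    current = current_from_crossings(net_crossings, sim_time_ns, valence)
    return current / (voltage_mv * 1e-3)

if __name__ == "__main__":
    # Hypothetical example: 10 net monovalent crossings in 100 ns at 100 mV.
    g = conductance_from_crossings(10, 100.0, 100.0)
    print(f"conductance: {g * 1e12:.1f} pS")
```

Because only a handful of crossings are typically observed on simulation timescales, the statistical uncertainty of a counted conductance is large, which is one reason to cross-check it against a continuum estimate.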

3.6 Implementation of the CHARMM force field in GROMACS (paper VI)

If the underlying model of particle interactions in the force fields used in molecular dynamics simulations is accurate enough, it should, in principle, be able to find the native state of the protein (given enough computational time). The development of force fields is therefore a very important and perpetually active area of research. As described in chapter 2, a force field consists of a set of functions and their accompanying parameters describing the interactions between atoms. One of the most commonly used force fields is CHARMM [63, 64], and this paper describes its implementation in the GROMACS program [57]. CHARMM has a special term, called CMAP (for correction map), that modulates the torsion potential of pairs of neighboring residues (see section 2.1.2). The torsion energy of protein backbone residues is a key factor of the energy landscape determining protein structure, and it has been shown that it was not modeled satisfactorily before.

After verification of the implementation in GROMACS, it was used to study the effect of the CMAP term on long timescales together with different water models and integration time steps. The inclusion of CMAP did indeed improve the structural sampling of the protein, which explored backbone torsional angle ensembles more similar to a crystal structure. Interestingly, this was true for all water models, including the implicit ones, and for all time steps. In particular, the most computationally efficient combination of settings (CHARMM with CMAP, implicit solvent and long time steps) did not compromise the overall structural quality and is therefore promising for protein structure refinement. Using these settings in a small test case, it was possible to improve the RMSD values of decoy structures of Protein G from 3 Å down to 2 Å, and, starting from 2 Å, down to 1 Å.
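
RMSD values such as those quoted for the Protein G decoys are normally computed after optimal rigid-body superposition of the model onto the reference. A minimal sketch using the Kabsch algorithm is given below, assuming two coordinate arrays with matching atom order; the function name `kabsch_rmsd` is illustrative and this is not the tool used in paper VI.

```python
import numpy as np

def kabsch_rmsd(p, q):
    """RMSD between coordinate sets p and q after optimal superposition.

    p, q: arrays of shape (n_atoms, 3) with atoms in the same order.
    Both sets are centered, p is rotated onto q with the Kabsch algorithm,
    and the root-mean-square deviation of the fitted coordinates is returned.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("p and q must both have shape (n_atoms, 3)")
    p0 = p - p.mean(axis=0)
    q0 = q - q.mean(axis=0)
    # Covariance matrix and its singular value decomposition.
    u, _, vt = np.linalg.svd(p0.T @ q0)
    # Correct for a possible reflection so that the result is a proper rotation.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0.0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    p_fit = p0 @ rot.T
    return float(np.sqrt(np.mean(np.sum((p_fit - q0) ** 2, axis=1))))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reference = rng.normal(size=(50, 3))                       # synthetic "native" coordinates
    decoy = reference + rng.normal(scale=0.3, size=(50, 3))    # synthetic perturbed model
    print("RMSD:", kabsch_rmsd(decoy, reference))
```

The choice of atoms (for example Cα only versus all heavy atoms) changes the numerical value, so RMSD figures are comparable only when computed over the same atom selection.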


4. Concluding remarks and future perspectives

If we have learned one thing from the history of invention and discovery, it is that in the long run, and often in the short run, the most daring prophecies seem laughably conservative.

Arthur C. Clarke

The function of a protein is to a large extent determined by its structure. As the techniques for determining high-resolution structures improve and comparative modeling of protein structures becomes more accurate, we will rather soon find ourselves in a situation where most proteins will have a reliable high-resolution structure, deduced either from experiments or modeling. For globular proteins this time will soon be here, but the research on membrane proteins is a few decades behind.

To fully understand protein function, however, one or a few structures of a protein are not enough, even though they can take you quite far. Instead of being seen as a single static conformation, the native state of a protein seems better represented as an ensemble of conformations that are all important for function. This could, for example, be different side chain conformations in the active site of an enzyme, all contributing to the binding of the substrate. Additionally, large-scale conformational changes often occur within the protein as it performs its function, as we have seen for the voltage-gated ion channels in this thesis. In other words, dynamics is important. Molecular dynamics simulation is a well-established method for modeling the conformational flexibility of biomolecules, and it provides spatial and temporal information on scales that are difficult to access with experimental techniques. Historically, the computational demands of MD have been too great to reach the timescales of these functionally important motions, which typically take place in microseconds to seconds. This is starting to change.

The first all-atom MD simulation of a small protein in vacuo was performed in the 1970s and reached less than 10 ps. Thirteen years ago, the first microsecond all-atom simulation was performed. It was an immense task involving several months of supercomputer time, even though the system consisted of only some 10,000 atoms. The technology has developed considerably since then, and we are now on the verge of reaching the region where biologically interesting phenomena occur. The largest all-atom MD simulations nowadays consist of hundreds of thousands of atoms and reach the microsecond timescale, with milliseconds on the horizon.

The increase in simulation speed is due to the development of both software and hardware. Algorithms for performing the MD calculations on a single processing unit have improved, but the main difference in the last few years has been the parallelization of the problem over many computing units. New national computer centers have been established to provide these supercomputing capabilities to the scientific community, now enabling us to perform calculations on thousands of processors simultaneously. Another approach to enhancing the capabilities of MD simulations (and other techniques) is to use distributed computing. In contrast to the computer centers, the computational resources are spread around the world and are connected via the internet. The very successful Folding@home initiative is an example of this. Here, volunteers around the globe lend their computer power to the project, for example through screensavers, and the Folding@home servers send the simulation information to the node and collect the results. This infrastructure limits the way these calculations are performed. Because of the temporary availability of the compute nodes and the slow interconnection between them, only small simulation tasks are sent out to the users, but the immense number of them enables an incredible sampling of molecular conformations. Interestingly, new types of hardware have also emerged on the market, so the computing units mentioned above do not necessarily have to be traditional central processing units (CPUs).
In particular, the graphics processing units (GPUs) that are common in current desktop computers, typically for the demanding graphics of computer games, have an immense capacity for floating-point arithmetic that can readily be used for MD calculations. Implementation of MD code on GPUs is still an area under development, but the technology is promising, not least because these chips are very cost-effective. Another type of hardware used for MD simulations is the Cell processor in Sony’s PlayStation 3. The Folding@home project, for example, has quite an impressive number of PS3s in its arsenal. These are all general-purpose hardware, but there are also efforts to develop custom-built hardware specialized for MD simulations. For example, DE Shaw Research in New York has built the impressive supercomputer Anton for molecular dynamics simulations, showing unprecedented simulation speeds.

As the calculation of the underlying interactions in molecular simulations has become faster, it promotes improvements in the force fields as well. In particular, considerable effort has been devoted to representing the polarization in chemical systems in a more realistic way. The current standard is to assign fixed partial charges to all the particles in the simulation, but that, of course, prohibits any change in the electronic structure during the simulation. At the other end of the spectrum, model reduction through coarse-graining will enable yet another boost in sampling efficiency, even though the accuracy is reduced somewhat. In the long run, I believe it is likely that the development of more sophisticated coarse-graining strategies will bridge the gap between atomistic and macroscopic representations through a hierarchical structure of simulations with different coarseness.

The future of molecular dynamics simulations is not simply a matter of longer and faster simulations. The field has also seen a development in how the biomolecular conformational space is explored, exemplified by enhanced sampling techniques and new methods for free-energy calculations through nonequilibrium work methods. Additionally, new efforts using Markov-state modeling are trying to represent dynamical evolution over long timescales from short MD simulations by automatic decomposition of conformational space into disjoint but connected metastable states with Markovian transitions between them. In these examples the strategy is to make the conformational search more efficient with respect to the process of interest, rather than relying on one or a few long, brute-force, equilibrium simulations. These strategies will surely pave the way for the extended applicability of molecular simulations in the future, for example in the areas of computer-aided drug design, protein engineering, and the study of macromolecular conformational dynamics.

Does molecular modeling have a future? As you have seen above I am quite optimistic, but I think we have to interact more with experimentalists. More experimental collaborations should be initiated to compare, complement, and validate our simulations to a higher degree than is done today. The era of pure-MD papers is coming to an end, but on the other hand, as we incorporate more and more experimental data in our work, the potential of molecular modeling and simulation will be enhanced and it will become a standard tool throughout the molecular sciences. Further increases in simulation speed in combination with improvements in force fields and sampling techniques will provide us with unprecedented insight into the relationship between biomolecular structure, dynamics, and function.


5. Popular science summary (Svensk populärvetenskaplig sammanfattning)

The more you think, the more you realize that there is no simple answer.

Winnie the Pooh

A cell separates itself from its surroundings by enclosing itself in a membrane. In this way it can concentrate the molecules essential for life in a small volume and thereby carry out all the chemistry that is characteristic of life. Inside a cell there are also membranes that divide parts of it into distinct units, called organelles, with specialized functions. As soon as the information-carrying molecules that constitute the core of life surrounded themselves with a membrane at the dawn of life's evolution, the need arose to transport material across these otherwise impermeable barriers. Special membrane-bound transport molecules therefore evolved, whose chemical composition is stored, together with the blueprints for all other proteins that the cell can produce, in the information-carrying molecules that today consist of DNA.

The molecules embedded in the membranes are called membrane proteins, and they are a very important class of proteins that handle essentially all interaction between the cell and its surroundings and between the membrane-enclosed organelles inside the cell. The DNA molecules specify only the chemical composition of the proteins, but to understand how proteins work one must study their three-dimensional structure. This structure is determined by the physical interactions between the protein and its surroundings, in an astonishing process that under normal circumstances lets the protein fold into a completely unique structure, which is the biologically active one. Misfolding of proteins can have life-threatening consequences, since misfolded proteins can cause various diseases. The folding process is not understood in its entirety, but after much research the basic principles have been mapped out.

Determining the DNA sequence of a protein is relatively easy, but determining its structure has proven very difficult and expensive. This applies especially to the group of membrane proteins. Today there are only a few hundred high-resolution structures of membrane proteins, compared with tens of thousands of structures of non-membrane-bound proteins, so-called globular proteins, which move freely in the aqueous mixture in the cell. Because of the cost associated with determining protein structures, computer-based alternatives have been developed. They are far more time-efficient and less costly, and are therefore important methods in research today. Computer modeling of proteins at the atomic level can be done in many different ways, for example by using the evolutionary information in all available DNA sequences of proteins from different organisms, together with the structures that have already been determined. Another alternative is to try to describe the physical interactions between atoms and in this way simulate the behavior of proteins.

In this thesis I have, together with several other researchers, mainly studied a certain type of membrane protein that has the remarkable property of selectively transporting potassium ions with great efficiency. The opening and closing of the channel is regulated by voltage changes across the membrane. These so-called voltage-gated potassium channels have been studied using computer-based modeling, based both on evolutionary information and on physics-based methods complemented with new data from a collaboration with an experimental research group. Our studies have shown that the voltage-sensitive part of the protein changes its structure in a particular way during the opening and closing of the channel. The interactions of the protein with the surrounding membrane have also been studied, and we have shown that the protein affects the membrane to a large extent. Above all, the protein moves together with a cluster of lipids around it, and these lipids thereby have entirely different properties than the unaffected ones. This has provided new knowledge for understanding the interactions between proteins and membranes and for the theories describing molecular motion in a membrane.

Furthermore, computer modeling of a small and primitive antibiotic ion channel has been used to determine its biologically active form by calculating, from different models of the channel, its permeability to ions and comparing the results with experimentally measured values. Only one of the models agreed with what has been measured in the lab, and we also confirmed the moderate selectivity of the channel for positively charged ions.

The last work I have carried out is the transfer of a force field that describes the atomic interactions to the software we use in the research group. Specific features of the modeling program could therefore be combined with the force field, and its behavior could be studied on long timescales. We found that the force field describes the interactions in an acceptable way, and the combination with the new software showed promising properties for improving structural protein models.

6. Acknowledgements

I don’t know half of you half as well as I should like, and I like less than half of you half as well as you deserve.

Bilbo Baggins

The thesis you hold in your hand would not have been possible to write if not for a certain number of people. Some of you might not have made a direct contribution to it but you kept me happy and motivated during these years, something that should not be underestimated.

Foremost, I would like to thank you Erik, my advisor and partner in crime, for being such a generous person both when it comes to giving me the opportunity to use the most current toys to play with, as well as giving me the freedom to plan my own time. You have never looked twice when I have gone for a workout session in the middle of the day, although you’ve quite commonly commented on my sanity for so doing. Besides this, one of the greatest things you’ve done is to always believe in me, from the day you hired me after one Skype-call until today, when most of our communication still is electronic... No offense! You are a busy professor with an average altitude of at least 3 000 meters. You have a storehouse’s worth of knowledge in the most surprising subjects and I’m trying to steal as many boxes as I can. Bilbo would perhaps put it in this way: if I knew half of what you know in half the subjects, I would be half again as knowledgeable as the rest on this planet.

I have been surrounded by more very inspirational people, Arne, with all your ideas and enthusiasm, as well as Martin and Gunnar, for the same reasons. Berk, I really appreciate your ability to explain complicated stuff in an understandable way and taking the time for so doing. You have that curiosity and drive to understand the nitty-gritty details. I would like to be the same.

One could say that my scientific journey started during my master’s thesis work when you, Andrew, took me under your wings. Thanks for all the effort you underwent to grant me access to the well-guarded Moffett Field, and for your and Joanna’s hospitality. Mike and Karl, I’ll never forget our daily trips to Dana street for the great company and the best coffee I’ve ever had.

I’d like to acknowledge all my collaborators and co-authors for the great work we’ve put together. The Ames people of course, and Ilpo, for the Finnish hospitality and computer power. Fredrik for being such a ninja in the electrophysiology of the voltage-gated ion channels, and a ninja master when it comes to running! You introduced me to the ultras and hopefully we can run and compete together many more times.

The start of my PhD studentship could not have been better, with extensive discussions on the Dragon and weekly gatherings around another round of Vetligan. Thank you Jenny for those times, and Aron, my longest-lasting room-mate, for all the good times we’ve had during these years.

My past and current colleagues in the Lindahl group, you have been great to work and hang out with!

During my exile at KTH, I have missed the C4 corridor and the group of social and positive people that inhabits it. I’ll remember the dart sessions with Håkan and Andreas, supersweet Friday cake clubs, nice cold(-room) after work beers with the occasional Rock Band jamming, and great discussions over coffee, often involving Charlotta, Karin, Catrine, Dan, Daniel, Katta, and Linnea.

In fact, it’s not only the C4 corridor that’s a great place to work in, but the whole DBB. I’m impressed by the deeply rooted familiar spirit in the department. You are a handful of people that nourishes it, respect to you!

A special thanks goes to Henrik and Johan, and the others in Puppy TS, for keeping my body tired and my mind sound.

A curious thing that might be fit to mention here is that I started to log all the songs I listen to in the office in September 2006 when I began my PhD studies. Since then I have listened to 91,389 songs. Assuming each song is 4 minutes long, and that I have worked all days of the year (yeah right!), I have listened to inspirational music for 3.34 hours on average each day. Thank you for the music!

Thank you mom and dad for many, many things, but in relation to this, for teaching me to think critically and see the importance of education.

The best thing I did during these years had nothing to do with science, or wait, of course it had! My genetic information was fused with that of Lina’s and our wonderful biochemical experiment, Alma, was born. You are my treasures and I’m the world’s happiest man. I love you!

Friday, 4 November 2011


Bibliography

[1] Transporter classification database. www.tcdb.org/.

[2] On-line biology book. www2.estrellamountain.edu/faculty/farabee/biobk/biobooktoc.html.

[3] P. Mark and L. Nilsson. Structure and dynamics of the TIP3P, SPC, and SPC/E water models at 298 K. J Phys Chem B, 105(43):24A–24A, 2001.

[4] P. S. Niemela, S. Ollila, M. T. Hyvonen, M. Karttunen, and I. Vattulainen. Assessing the nature of lipid raft membranes. PLoS Comput Biol, 3(2):e34, 2007.

[5] J. Jäckle. The causal theory of the resting potential of cells. J Theor Biol, 249(3):445–63, 2007.

[6] P. M. Wood. What is the Nernst equation. Trends Biochem Sci, 10(3):106–7, 1985.

[7] A. L. Hodgkin and B. Katz. The effect of sodium ions on the electrical activity of giant axon of the squid. J Physiol, 108(1):37–77, 1949.

[8] M. W. Barnett and P. M. Larkman. The action potential. Pract Neurol, 7(3):192–7, 2007.

[9] J. Aqvist and V. Luzhkov. Ion permeation mechanism of the potassium channel. Nature, 404(6780):881–4, 2000.

[10] S. Berneche and B. Roux. Energetics of ion conduction through the K+ channel. Nature, 414(6859):73–7, 2001.

[11] A. N. Thompson, I. Kim, T. D. Panosian, T. M. Iverson, T. W. Allen, and C. M. Nimigean. Mechanism of potassium-channel selectivity revealed by Na(+) and Li(+) binding sites within the KcsA pore. Nat Struct Mol Biol, 16(12):1317–24, 2009.

[12] D. A. Doyle, J. Morais Cabral, R. A. Pfuetzner, A. Kuo, J. M. Gulbis, S. L. Cohen, B. T. Chait, and R. MacKinnon. The structure of the potassium channel: molecular basis of K+ conduction and selectivity. Science, 280(5360):69–77, 1998.

[13] B. Hille. Ion Channels of Excitable Membranes. Sinauer Associates, third edition, 2001.

[14] F. H. Yu and W. A. Catterall. The VGL-chanome: a protein superfamily specialized for electrical signaling and ionic homeostasis. Sci STKE, 2004(253):re15, 2004.

[15] C. Miller. An overview of the potassium channel family. Genome Biol, 1(4):reviews0004.1–0004.5, 2000.

[16] L. A. Pardo. Voltage-gated potassium channels in cell proliferation. Physiology (Bethesda), 19:285–92, 2004.

[17] F. Lang. Mechanisms and significance of cell volume regulation. J Am Coll Nutr, 26(5 Suppl):613S–3S, 2007.

[18] F. Ashcroft. Ion Channels and Disease. Academic press, 1999.

[19] S. Sokolov, T. Scheuer, and W. A. Catterall. Gating pore current in an inherited ion channelopathy. Nature, 446(7131):76–8, 2007.

[20] D. M. Papazian, T. L. Schwarz, B. L. Tempel, Y. N. Jan, and L. Y. Jan. Cloning of genomic and complementary DNA from Shaker, a putative potassium channel gene from Drosophila. Science, 237(4816):749–53, 1987.

[21] E. Novitsky. New mutant reports. Drosophila Information Service, 23:61–2, 1949.

[22] A. L. Hodgkin, A. F. Huxley, and B. Katz. Measurement of current-voltage relations in the membrane of the giant axon of Loligo. J Physiol, 116(4):424–48, 1952.

[23] A. L. Hodgkin and A. F. Huxley. A quantitative description of membrane current and its application to conduction and excitation in nerve. J Physiol, 117(4):500–44, 1952.

[24] Y. Jiang, A. Lee, J. Chen, V. Ruta, M. Cadene, B. T. Chait, and R. MacKinnon. X-ray structure of a voltage-dependent K+ channel. Nature, 423(6935):33–41, 2003.

[25] S. B. Long, E. B. Campbell, and R. Mackinnon. Crystal structure of a mammalian voltage-dependent Shaker family K+ channel. Science, 309(5736):897–903, 2005.

[26] S. B. Long, E. B. Campbell, and R. Mackinnon. Voltage sensor of Kv1.2: structural basis of electromechanical coupling. Science, 309(5736):903–8, 2005.

[27] S. B. Long, X. Tao, E. B. Campbell, and R. Mackinnon. Atomic structure of a voltage-dependent K(+) channel in a lipid membrane-like environment. Nature, 450(7168):376–382, 2007.

[28] Y. Jiang, A. Lee, J. Chen, M. Cadene, B. T. Chait, and R. MacKinnon. The open pore conformation of potassium channels. Nature, 417(6888):523–6, 2002.

[29] M. Sasaki, M. Takagi, and Y. Okamura. A voltage sensor-domain protein is a voltage-gated proton channel. Science, 312(5773):589–92, 2006.

[30] I. S. Ramsey, M. M. Moran, J. A. Chong, and D. E. Clapham. A voltage-gated proton-selective channel lacking the pore domain. Nature, 440(7088):1213–6, 2006.

[31] Y. Murata, H. Iwasaki, M. Sasaki, K. Inaba, and Y. Okamura. Phosphoinositide phosphatase activity coupled to an intrinsic voltage sensor. Nature, 435(7046):1239–43, 2005.

[32] T. F. Smith and M. S. Waterman. Identification of common molecular subsequences. J Mol Biol, 147(1):195–7, 1981.

[33] T. Hessa, S. H. White, and G. von Heijne. Membrane insertion of a potassium-channel voltage sensor. Science, 307(5714):1427, 2005.

[34] J. A. Freites, D. J. Tobias, G. von Heijne, and S. H. White. Interface connections of a transmembrane voltage sensor. Proc Natl Acad Sci U S A, 102(42):15059–64, 2005.

[35] S. Dorairaj and T. W. Allen. On the thermodynamic stability of a charged arginine side chain in a transmembrane helix. Proc Natl Acad Sci U S A, 104(12):4943–8, 2007.

[36] L. Li, I. Vorobyov, A. D. MacKerell Jr., and T. W. Allen. Is arginine charged in a membrane? Biophys J, 94(2):L11–3, 2008.

[37] C. M. Armstrong and F. Bezanilla. Currents related to movement of the gating particles of the sodium channels. Nature, 242(5398):459–61, 1973.

[38] F. Tombola, M. M. Pathak, and E. Y. Isacoff. Voltage-sensing arginines in a potassium channel permeate and occlude cation-selective pores. Neuron, 45(3):379–88, 2005.

[39] N. E. Schoppa, K. McCormack, M. A. Tanouye, and F. J. Sigworth. The size of gating charge in wild-type and mutant Shaker potassium channels. Science, 255(5052):1712–5, 1992.

[40] S. K. Aggarwal and R. MacKinnon. Contribution of the S4 segment to gating charge in the Shaker K+ channel. Neuron, 16(6):1169–77, 1996.

[41] M. M. Pathak, V. Yarov-Yarovoy, G. Agarwal, B. Roux, P. Barth, S. Kohout, F. Tombola, and E. Y. Isacoff. Closing in on the resting state of the Shaker K(+) channel. Neuron, 56(1):124–40, 2007.

[42] D. Sigg, F. Bezanilla, and E. Stefani. Fast gating in the Shaker K+ channel and the energy landscape of activation. Proc Natl Acad Sci U S A, 100(13):7611–5, 2003.

[43] F. V. Campos, B. Chanda, B. Roux, and F. Bezanilla. Two atomic constraints unambiguously position the S4 segment relative to S1 and S2 segments in the closed state of Shaker K channel. Proc Natl Acad Sci U S A, 104(19):7904–9, 2007.

[44] X. Tao, A. Lee, W. Limapichat, D. A. Dougherty, and R. MacKinnon. A gating charge transfer center in voltage sensors. Science, 328(5974):67–73, 2010.

[45] M. C. Lin, J. Y. Hsieh, A. F. Mock, and D. M. Papazian. R1 in the Shaker S4 occupies the gating charge transfer center in the resting state. J Gen Physiol, 138(2):155–63, 2011.

[46] C. A. Villalba-Galea, W. Sandtner, D. M. Starace, and F. Bezanilla. S4-based voltage sensors have three major conformations. Proc Natl Acad Sci U S A, 105(46):17600–7, 2008.

[47] J. F. Cordero-Morales, L. G. Cuello, Y. Zhao, V. Jogini, D. M. Cortes, B. Roux, and E. Perozo. Molecular determinants of gating at the potassium-channel selectivity filter. Nat Struct Mol Biol, 13(4):311–8, 2006.

[48] L. Pauling, R. B. Corey, and H. R. Branson. The structure of proteins; two hydrogen-bonded helical configurations of the polypeptide chain. Proc Natl Acad Sci U S A, 37(4):205–11, 1951.

[49] M. L. Smythe, S. E. Huston, and G. R. Marshall. The molten helix - effects of solvation on the alpha-helical to 3(10)-helical transition. J Am Chem Soc, 117(20):5445–52, 1995.

[50] D. J. Barlow and J. M. Thornton. Helix geometry in proteins. J Mol Biol, 201(3):601–19, 1988.

[51] P. Enkhbayar, K. Hikichi, M. Osaki, R. H. Kretsinger, and N. Matsushima. 3(10)-helices in proteins are parahelices. Proteins, 64(3):691–9, 2006.

[52] R. S. Vieira-Pires and J. H. Morais-Cabral. 3(10) helices in channels and other membrane proteins. J Gen Physiol, 136(6):585–92, 2010.

[53] M. Noda, S. Shimizu, T. Tanabe, T. Takai, T. Kayano, T. Ikeda, H. Takahashi, H. Nakayama, Y. Kanaoka, N. Minamino, et al. Primary structure of Electrophorus electricus sodium channel deduced from cDNA sequence. Nature, 312(5990):121–7, 1984.

[54] E. M. Kosower. A structural and dynamic molecular model for the sodium channel of Electrophorus electricus. FEBS Lett, 182(2):234–42, 1985.

[55] F. Khalili-Araghi, V. Jogini, V. Yarov-Yarovoy, E. Tajkhorshid, B. Roux, and K. Schulten. Calculation of the gating charge for the Kv1.2 voltage-activated potassium channel. Biophys J, 98(10):2189–98, 2010.

[56] E. V. Schow, J. A. Freites, K. Gogna, S. H. White, and D. J. Tobias. Down-state model of the voltage-sensing domain of a potassium channel. Biophys J, 98(12):2857–66, 2010.

[57] B. Hess, C. Kutzner, D. van der Spoel, and E. Lindahl. GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable molecular simulation. J Chem Theor Comp, 4(3):435–47, 2008.

[58] B. R. Brooks, C. L. Brooks III, A. D. MacKerell Jr., L. Nilsson, R. J. Petrella, B. Roux, Y. Won, G. Archontis, C. Bartels, S. Boresch, A. Caflisch, L. Caves, Q. Cui, A. R. Dinner, M. Feig, S. Fischer, J. Gao, M. Hodoscek, W. Im, K. Kuczera, T. Lazaridis, J. Ma, V. Ovchinnikov, E. Paci, R. W. Pastor, C. B. Post, J. Z. Pu, M. Schaefer, B. Tidor, R. M. Venable, H. L. Woodcock, X. Wu, W. Yang, D. M. York, and M. Karplus. CHARMM: the biomolecular simulation program. J Comput Chem, 30(10):1545–614, 2009.

[59] J. C. Phillips, R. Braun, W. Wang, J. Gumbart, E. Tajkhorshid, E. Villa, C. Chipot, R. D. Skeel, L. Kale, and K. Schulten. Scalable molecular dynamics with NAMD. J Comput Chem, 26(16):1781–802, 2005.

[60] L. Verlet. Computer "experiments" on classical fluids. I. Thermodynamical properties of Lennard-Jones molecules. Phys Rev, 159(1):98–103, 1967.

[61] R. W. Hockney, S. P. Goel, and J. W. Eastwood. Quiet high-resolution computer models of a plasma. J Comput Phys, 14(2):148–58, 1974.

[62] W. C. Swope, H. C. Andersen, P. H. Berens, and K. R. Wilson. A computer-simulation method for the calculation of equilibrium-constants for the formation of physical clusters of molecules - application to small water clusters. J Chem Phys, 76(1):637–49, 1982.

[63] A. D. MacKerell Jr., D. Bashford, M. Bellott, R. L. Dunbrack, J. D. Evanseck, M. J. Field, S. Fischer, J. Gao, H. Guo, S. Ha, D. Joseph-McCarthy, L. Kuchnir, K. Kuczera, F. T. K. Lau, C. Mattos, S. Michnick, T. Ngo, D. T. Nguyen, B. Prodhom, W. E. Reiher, B. Roux, M. Schlenkrich, J. C. Smith, R. Stote, J. Straub, M. Watanabe, J. Wiórkiewicz-Kuczera, D. Yin, and M. Karplus. All-atom empirical potential for molecular modeling and dynamics studies of proteins. J Phys Chem B, 102(18):3586–616, 1998.

[64] A. D. MacKerell, M. Feig, and C. L. Brooks III. Extending the treatment of backbone energetics in protein force fields: limitations of gas-phase quantum mechanics in reproducing protein conformational distributions in molecular dynamics simulations. J Comput Chem, 25(11):1400–15, 2004.

[65] T. Darden, D. York, and L. Pedersen. Particle Mesh Ewald - an n log(n) method for Ewald sums in large systems. J Chem Phys, 98(12):10089–92, 1993.

[66] G. G. Dodson, D. P. Lane, and C. S. Verma. Molecular simulations of protein dynamics: new windows on mechanisms in biology. EMBO Rep, 9(2):144–50, 2008.

[67] I. Vattulainen and T. Rog. Lipid simulations: a perspective on lipids in action. Cold Spring Harb Perspect Biol, 3(4), 2011.

[68] J. N. Sachs, P. S. Crozier, and T. B. Woolf. Atomistic simulations of biologically realistic transmembrane potential gradients. J Chem Phys, 121(22):10847–51, 2004.

[69] L. Delemotte, F. Dehez, W. Treptow, and M. Tarek. Modeling membranes under a transmembrane potential. J Phys Chem B, 112(18):5547–50, 2008.

[70] E. J. Denning, P. S. Crozier, J. N. Sachs, and T. B. Woolf. From the gating charge response to pore domain movement: initial motions of Kv1.2 dynamics under physiological voltage changes. Mol Membr Biol, 26(8):397–421, 2009.

[71] L. Delemotte, W. Treptow, M. L. Klein, and M. Tarek. Effect of sensor domain mutations on the properties of voltage-gated ion channels: molecular dynamics studies of the potassium channel Kv1.2. Biophys J, 99(9):L72–4, 2010.

[72] L. Delemotte, M. Tarek, M. L. Klein, C. Amaral, and W. Treptow. Intermediate states of the Kv1.2 voltage sensor from atomistic molecular dynamics simulations. Proc Natl Acad Sci U S A, 108(15):6109–14, 2011.

[73] B. Roux, S. Berneche, and W. Im. Ion channels, permeation, and electrostatics: insight into the function of KcsA. Biochemistry, 39(44):13295–306, 2000.

[74] S. Berneche and B. Roux. A microscopic view of ion conduction through the K+ channel. Proc Natl Acad Sci U S A, 100(15):8644–8, 2003.

[75] W. Im and B. Roux. Ion permeation and selectivity of OmpF porin: a theoretical study based on molecular dynamics, Brownian dynamics, and continuum electrodiffusion theory. J Mol Biol, 322(4):851–69, 2002.

[76] A. Suenaga, Y. Komeiji, M. Uebayasi, T. Meguro, M. Saito, and I. Yamato. Computational observation of an ion permeation through a channel protein. Biosci Rep, 18(1):39–48, 1998.

[77] Y. Yang, D. Henderson, and D. Busath. Applied-field molecular dynamics study of a model selectivity filter. J Chem Phys, 118(9):4213–20, 2003.

[78] W. Treptow, B. Maigret, C. Chipot, and M. Tarek. Coupled motions between pore and voltage-sensor domains: A model for Shaker B, a voltage-gated potassium channel. Biophys J, 87(4):2365–79, 2004.

[79] F. Khalili-Araghi, E. Tajkhorshid, and K. Schulten. Dynamics of K+ ion conduction through Kv1.2. Biophys J, 91(6):L72–4, 2006.

[80] B. Roux. The membrane potential and its representation by a constant electric field in computer simulations. Biophys J, 95(9):4205–16, 2008.

[81] M. O. Dayhoff. Atlas of protein sequence and structure. National Biomedical Research Foundation, Georgetown University, Washington DC, 1972.

[82] J. C. Kendrew, G. Bodo, H. M. Dintzis, R. G. Parrish, H. Wyckoff, and D. C. Phillips. A three-dimensional model of the myoglobin molecule obtained by x-ray analysis. Nature, 181(4610):662–6, 1958.

[83] A. G. Murzin, S. E. Brenner, T. Hubbard, and C. Chothia. SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol, 247(4):536–40, 1995.

[84] J. Deisenhofer, O. Epp, K. Miki, R. Huber, and H. Michel. Structure of the protein subunits in the photosynthetic reaction centre of Rhodopseudomonas viridis at 3 Å resolution. Nature, 318(6047):618–24, 1985.

[85] S. F. Altschul, T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res, 25(17):3389–402, 1997.

[86] S. F. Altschul and E. V. Koonin. Iterated profile searches with PSI-BLAST–a tool for discovery in protein databases. Trends Biochem Sci, 23(11):444–7, 1998.

[87] J. Söding. Protein homology detection by HMM-HMM comparison. Bioinformatics, 21(7):951–60, 2005.

[88] A. Sali and T. L. Blundell. Comparative protein modelling by satisfaction of spatial restraints. J Mol Biol, 234(3):779–815, 1993.

[89] C. B. Anfinsen. Principles that govern the folding of protein chains. Science, 181(4096):223–30, 1973.

[90] A. Leaver-Fay, M. Tyka, S. M. Lewis, O. F. Lange, J. Thompson, R. Jacak, K. Kaufman, P. D. Renfrew, C. A. Smith, W. Sheffler, I. W. Davis, S. Cooper, A. Treuille, D. J. Mandell, F. Richter, Y. E. Ban, S. J. Fleishman, J. E. Corn, D. E. Kim, S. Lyskov, M. Berrondo, S. Mentzer, Z. Popovic, J. J. Havranek, J. Karanicolas, R. Das, J. Meiler, T. Kortemme, J. J. Gray, B. Kuhlman, D. Baker, and P. Bradley. ROSETTA3: an object-oriented software suite for the simulation and design of macromolecules. Methods Enzymol, 487:545–74, 2011.

[91] R. O. Dror, D. H. Arlow, D. W. Borhani, M. O. Jensen, S. Piana, and D. E. Shaw. Identification of two distinct inactive conformations of the beta2-adrenergic receptor reconciles structural and biochemical observations. Proc Natl Acad Sci U S A, 106(12):4689–94, 2009.

[92] M. O. Jensen, D. W. Borhani, K. Lindorff-Larsen, P. Maragakis, V. Jogini, M. P. Eastwood, R. O. Dror, and D. E. Shaw. Principles of conduction and hydrophobic gating in K+ channels. Proc Natl Acad Sci U S A, 107(13):5833–38, 2010.

[93] A. Lewis, V. Jogini, L. Blachowicz, M. Laine, and B. Roux. Atomic constraints between the voltage sensor and the pore domain in a voltage-gated K+ channel of known structure. J Gen Physiol, 131(6):549–61, 2008.

[94] Y. Gambin, R. Lopez-Esparza, M. Reffay, E. Sierecki, N. S. Gov, M. Genest, R. S. Hodges, and W. Urbach. Lateral mobility of proteins in liquid membranes revisited. Proc Natl Acad Sci U S A, 103(7):2098–102, 2006.

[95] S. Ramadurai, A. Holt, V. Krasnikov, G. van den Bogaart, J. A. Killian, and B. Poolman. Lateral diffusion of membrane proteins. J Am Chem Soc, 131(35):12650–6, 2009.

[96] J. K. Chugh and B. A. Wallace. Peptaibols: models for ion channels. Biochem Soc Trans, 29(Pt 4):565–70, 2001.

[97] R. C. Pandey, H. Meng, J. C. Cook Jr., and K. L. Rinehart Jr. Structure of antiamoebin I from high resolution field desorption and gas chromatographic mass spectrometry studies. J Am Chem Soc, 99(15):5203–5, 1977.

[98] H. Duclohier. Peptaibiotics and peptaibols: an alternative to classical antibiotics? Chem Biodivers, 4(6):1023–6, 2007.

[99] H. Duclohier, C. F. Snook, and B. A. Wallace. Antiamoebin can function as a carrier or as a pore-forming peptaibol. Biochim Biophys Acta, 1415(1):255–60, 1998.

[100] J. E. Hall, I. Vodyanoy, T. M. Balasubramanian, and G. R. Marshall. Alamethicin. A rich model for channel behavior. Biophys J, 45(1):233–47, 1984.

[101] I. L. Karle, M. A. Perozzo, V. K. Mishra, and P. Balaram. Crystal structure of the channel-forming polypeptide antiamoebin in a membrane-mimetic environment. Proc Natl Acad Sci U S A, 95(10):5501–4, 1998.

[102] C. F. Snook, G. A. Woolley, G. Oliva, V. Pattabhi, S. F. Wood, T. L. Blundell, and B. A. Wallace. The structure and function of antiamoebin I, a proline-rich membrane-active polypeptide. Structure, 6(6):783–92, 1998.

[103] T. P. Galbraith, R. Harris, P. C. Driscoll, and B. A. Wallace. Solution NMR studies of antiamoebin, a membrane channel-forming polypeptide. Biophys J, 84(1):185–94, 2003.

[104] D. G. Levitt. Interpretation of biological ion channel flux data–reaction-rate versus continuum theory. Annu Rev Biophys Biophys Chem, 15:29–57, 1986.
