The Ins and Outs of Membrane

Topology Studies of Bacterial Membrane Proteins

Mikaela Rapp

Department of Biochemistry and Biophysics

Stockholm University, Sweden 2006 © Mikaela Rapp, Stockholm 2006

ISBN 91-7155-311-8, pages 1-58

All previously published articles are reprinted with permission from the publishers

Typesetting: Intellecta Docusys Printed in Sweden by Intellecta Docusys, Stockholm 2006 Distributor: Stockholm University Library Till min familj

Abstract

α-helical membrane proteins comprise about a quarter of all proteins in a cell and carry out a wide variety of essential cellular functions. This thesis is focused on topology analyses of bacterial membrane proteins. The topology describes the two-dimensional structural arrangement of a relative to the membrane. By combining large-scale experimental and bioinformatics techniques we have produced experimentally constrained topology models for the major part of the Escherichia coli membrane proteome. This represents a substantial increase in available topology information for bacterial membrane proteins. Many structures show signs of internal duplication and approximate two-fold in-plane symmetry. We propose a step-wise pathway to explain how proteins with such internal inverted repeats have evolved. The pathway is based on the ‘positive-inside’ rule and starts with a protein that can adopt two topologies in the membrane, i.e. a “dual” topology protein. The gene encoding the dual topology protein is duplicated and eventually, through re-distribution of positively charge residues, the two resulting homologous proteins become fixed in opposite orientations in the membrane. Finally, the two proteins may fuse into one single polypeptide with an internal inverted repeat structure. Finally, we re-create the proposed step-wise evolutionary pathway in the laboratory by showing that only a small number of mutations are required in order to transform the homo-dimeric, dual topology protein EmrE into a hetero-dimeric complex composed of two oppositely oriented proteins.

Contents

List of Publications...... 1

Abbreviations ...... 2

Introduction ...... 3 The biological membrane ...... 4 Biogenesis of MPs...... 5 An overview of the “birth and life” of MPs ...... 6 The two-stage model ...... 7

Membrane Protein Topology ...... 9 Hydrophobicity...... 9 An introduction to the thermodynamic view of insertion and α-helix formation...... 9 The α-helix formation-site...... 10 Length and hydrophobicity of TMHs ...... 10 Cracking the insertion code...... 10 The membrane-water interface region...... 12 Loops and tails ...... 12 The positive-inside rule ...... 13 The ‘snorkel’ effect ...... 16 The aromatic belt ...... 16 Topogenic determinants work in concert ...... 18 What contributes to the orientation of the signal sequence?...... 18 Helical hairpins...... 18 A joint /lipid system deciphers the unique topology of a MP...... 19 The evolution of membrane proteins ...... 20 Internal repeats ...... 20 Internal inverted repeats...... 20

Membrane Protein Structure...... 21 Intrahelical interactions ...... 21 α-Helical distortions...... 21 Interhelical interactions ...... 22 Attractive forces that may influence MP structure formation...... 22 Topological constraints...... 24 The role of interhelical loops in the folding and stability of MPs...... 24 The role of the helical backbone dipolar moment in packing of MPs ...... 24 Re-entrant loops...... 25 Helix packing models...... 26 Knobs-into-holes (left handed crossing angles)...... 26 The GxxxG motif (right handed crossing angles)...... 26 Protein-Lipid interactions ...... 27 Curvature stress...... 27 Hydrophobic mismatch...... 27 Specific protein-lipid interactions ...... 28 Lipid-dependent topogenesis of multi-spanning MPs ...... 28

Escherichia coli as a Model Organism...... 29 Cell architecture...... 29 The K-12 genome...... 29

Experimental Approaches to Reveal MP Structure...... 31 3D structural models...... 31 Topology mapping ...... 31 The C-tail and the internal loops: main sites for experimental topology mapping .. 32 Reporter fusions...... 32

Bioinformatics - a Theoretical Approach to Study MPs ...... 33 Sequence comparisons of proteins...... 33 BLAST...... 33 Pfam...... 34 Prediction methods...... 34 Topology prediction methods ...... 34

Summary of Papers I and II ...... 39 Large-scale topology analysis ...... 39 Experimentally based topology models for E. coli inner MPs (Paper I)...... 39 Global topology analyses of the E. coli membrane proteome (Paper II) ...... 40

Summary of Papers II, III and IV ...... 41 How did proteins with internal inverted repeats evolve? ...... 41 A step-wise evolutionary path for internal inverted repeats...... 41 Identification and evolution of dual-topology proteins (Paper III) ...... 42 Emulating membrane protein evolution by rational design (Paper IV) ...... 43 Discussion ...... 44

Acknowledgements ...... 45

References...... 47 List of Publications

Paper I Rapp M, Drew D, Daley DO, Nilsson J, Carvalho T, Melén K, de Gier JW, von Heijne G. “Experimentally based topology models for E. coli inner membrane proteins”. Protein Science (2004) 13:937-45

Paper II Daley DO*, Rapp M*, Granseth E, Melén K, Drew D, von Heijne G. “Global topology analysis of the Escherichia coli inner membrane proteome”. Science (2005) 308:1321-3.

Paper III Rapp M*, Granseth E*, Seppälä S, von Heijne G. “Identification and evolution of dual-topology membrane proteins”. Nature Structural & Molecular Biology (2006) 13:112-6.

Paper IV Rapp M*, Seppälä S*, Granseth E, von Heijne G. “Emulating membrane protein evolution by rational design”. Submitted.

*These authors contributed equally to the work

Additional publications Granseth E, Daley DO, Rapp M, Melén K, von Heijne G. “Experimentally constrained topology models for 51,208 bacterial inner membrane proteins”. Journal of Molecular Biology (2005) 352:489-94

Drew D, Slotboom DJ, Friso G, Reda T, Genevaux P, Rapp M, Meindl-Beinker NM, Lambert W, Lerch M, Daley DO, van Wijk KJ, Hirst J, Kunji E, de Gier JW. “A scalable, GFP-based pipeline for membrane protein overexpression screening and purification”. Protein Science (2005) 14:2011-7

1 Abbreviations

3D three dimensional BLAST basic local alignment search tool ER endoplasmatic reticulum e-value expectation value GFP green fluorescent protein MP α-helical membrane protein PE phosphatidylethanolamine PhoA alkaline phosphatase SMR small multidrug resistance SRP signal recognition particle TMH transmembrane helix/segment TMHMM transmembrane hidden markov model

Amino Acids A Alanine C Cysteine D Aspartate E Glutamate G Glycine H Histidine I Isoleucine K Lysine L Leucine M Methionine N Asparagine F Phenylalanine P Proline Q Glutamine R Arginine S Serine T Threonine V Valine W Tryptophan Y Tyrosine

2 Introduction

What is life? There is no simple answer to this question. It is however clear that one of the keystones that life relies upon, is compartmentalization (reviewed in ref [1]). This means that all cells and organelles are surrounded by at least one membrane that effectively separates the inside of a cell. Simply put, compartmentalization is required to keep essential ingredients (chemicals and proteins) at a reasonable concentration and organization inside the cell, while keeping toxic chemicals outside the cell. However, for a cell to carry out vital functions it requires contact with its surrounding environment without compromising the integrity of the membrane. This dilemma has been solved by the existence of a group of “mediating” proteins, embedded within the membrane. This class of proteins is generally referred to as integral membrane proteins and comprise about a quarter of all the proteins in a cell [2]. They contain both hydrophobic parts (anchoring the protein inside the membrane) and hydrophilic parts (in contact with the surrounding aqueous environment). Due to their key location, integral membrane proteins are involved in many processes that are vital for cell survival, including bioenergetics, , ion transmission and catalyses. Not surprisingly, they are major targets of today’s medical research. Integral membrane proteins are usually divided into two separate classes, depending on how they fold in the membrane; the β-barrel and the α-helical membrane proteins (fig 1). The β- barrel membrane proteins have so far only been found in the outer membrane of gram-negative , and mitochondria. The α-helical membrane proteins (which are the main focus of this thesis and from here on simply referred to as MPs) are far more abundant and found in a wide range of membranes such as the endoplasmatic reticulum (ER), organelles as well as the plasma membranes of eukaryotes and bacteria. This thesis is mainly focused on bacterial MPs, in particular Escherichia coli MPs. However, over the past 30 years, the available information regarding MPs has been gathered from many different organelles and organisms. The sources of information might appear diverse at a first glance, but it has become apparent that many general findings, or “rules”, are applicable to the entire class of MPs in all organisms/organelles. This is of course not a coincidence, but a remnant from their common evolutionary origin, according to the endosymbiotic theory (reviewed in ref [3]).

A) B)

Figure 1. There are two classes of integral membrane proteins. A) An example of an α-helical membrane protein B) An example of a β-barrel membrane protein.

3 The biological membrane MPs exist in a hydrophobic medium, i.e. the biological membrane. It is therefore not hard to understand that the properties of the membrane influence the properties of MPs. This influence is obvious in the integration and topology, as well as in the structure and function of MPs. Different types of lipid molecules constitute the building blocks of biological membranes. Commonly the lipid molecules have a hydrophilic phosphate head group and a hydrophobic hydrocarbon tail. If a lipid molecule has a head group that is roughly equal to the size of the tail it gives it a “cylindrical” shape (fig 2A). As a consequence of this physical property, these lipids tend to form a bilayer, i.e. two leaflets of lipid molecules arranged in long sheet-like structures (fig 2B). In each leaflet the hydrophobic tails point away from the aqueous surrounding and toward the hydrophobic tails of the other leaflet. This tendency to form a bilayer structure is the property that life relies upon (i.e. compartmentalization). However, other types of lipids are also present in the membrane (Fig 2A). They may impart important properties, such as curvature, thickness, and rigidity to the membrane. Commonly the is divided into a hydrophobic core (the hydrocarbon tails) of ∼30 Å that is largely impermeable to water and solutes, and a membrane-water interfacial region (the head groups) that occupies 10-15 Å on each side of the membrane (fig 2B). However, this is a simplified and slightly misleading picture that does not correctly reflect the complexity of the membrane. Rather, if one looks from the middle of the membrane and out toward the interface regions on each side, the membrane can be viewed as a medium with a “spectrum” of varying hydrophobic and hydrophilic properties. The core of the membrane is strictly hydrophobic, but the hydrophobicity is less prominent closer to the hydrophilic head groups on each side of the membrane. Similarly, the interface region is less hydrophilic close to the hydrocarbon tail region and becomes more and more hydrophilic closer to the surrounding aqueous solution.

Figure 2. The biological membrane. A) The shape of . The left lipid has a “cylindrical” shape and tend to form lipid bilayers. The lipid on the right-hand side is “cone” shaped. This non-bilayer lipid is also present in biological membranes and is important for several reasons (e.g. induce membrane curvature). B) A biological membrane in a three-dimensional (3D) perspective. The hydrophobic hydrocarbon tails point toward each other, while the hydrophilic phosphate head groups faces the surrounding aqueous environment.

4 Biogenesis of MPs The Sec-translocon holds a central position in the life of a MP. It enables proteins to cross, or insert into the membrane (reviewed in refs [4,5]). All secreted proteins and MPs carry an N- terminal hydrophobic signal that targets them to the membrane and initiates the export process (fig 3). Secretory proteins carry a cleavable signal sequence that is recognized and targeted to the Sec-translocon by the SecB. The targeting and export process occur in a late co- translational or post-translational fashion. A few multispanning MPs, in particular those with a large extra-cytoplasmic N-terminal domain, also carry a cleavable signal sequence. However, in most cases MPs are targeted to the membrane by its first transmembrane segment (TMH), which is not cleaved-off but remains as a part of the mature protein. The hydrophobic segment is recognized by the signal recognition particle (SRP) that guides the nascent chain/ribosome complex to the Sec-translocon. In general, MPs are inserted in a co-translational fashion, where the ribosome feeds the emerging nascent chain directly into the Sec-translocon. The hetero-trimeric Sec complex is denoted SecY/E/G in eubacteria (e.g. E. coli). It is the equivalent of Sec61α/γ/β in mammals, and Sec61p/Sss1p/Sbh1p in yeast. Since the Sec- translocon is evolutionarily conserved throughout all kingdoms of life, it points toward a conserved mechanism of protein translocation and insertion in the bacterial plasma membrane and the ER (reviewed in ref [6]). The current picture of how the translocon operates is largely based on the crystallographic structure of the archaeal Methanococcus jannaschii Sec-complex [7], in combination with a vast amount of biochemical studies (fig 4) (reviewed in refs [4,8]). It suggests that even though the Sec-complex might form higher oligomers, only one complex at the time is involved in protein export. SecY form the actual protein-gateway. Its structural arrangement creates a gated pore that allows hydrophilic segments to pass through, and hydrophobic segments to move laterally into the lipid bilayer (fig 3).

Figure 3. Biogenesis of MPs. A) A ribosome is translating the mRNA of a MP. B) SRP is a GTPase that recognizes and binds to the first hydrophobic segment that emerges from the exit tunnel of the translocon. C) The ribosome/SRP/nascent-chain complex is targeted to the membrane. SRP docks with another GTPase, the Signal Receptor (SR). As a result of their GTPase activity, SRP disengages from the SR and the ribosome, and the nascent-chain is transferred to the Sec-translocon. D) Co-translational elongation of the polypeptide and membrane insertion of hydrophobic segments. Secreted E. coli proteins (and a few MPs) are exported by a slightly different route. Instead of SRP, the SecB chaperone targets the nascent chain to the Sec-translocon. The first hydrophobic segment acts as a signal sequence and is cleaved off by Signal Peptidase II (cleavage site indicated in brackets).

5

Pore

A) B) Figure 4. The current view of how the translocon operates is largely based on the X-ray structure of the archaeal Methanococcus jannaschii Sec-complex (figure adapted from [4]) [7]. The complex has a clamshell appearance when viewed from the cytoplasm. A) SecE (grey) embraces SecY (multi-colored) and forms the back of the complex. SecY displays two-fold symmetry in plane of the membrane. The N- domain is composed of TMH1-5 and the C-domain of TMH6-10 and the domains are inverted repeats of one-another. The region between the N- and C-domain forms the front of the complex and can open toward the lipid. The plug (green) on the extra-cytoplasmic side has been proposed to block the pore of the closed channel. However, the role and importance of the plug is not entirely clear [9]. B) The proposed mechanism for TMH insertion: the nascent chain enters through the hydrophilic channel pore at the center of SecY. The translocon has been suggested to “breath”, i.e. go through small but rapid motions of opening and closing, thereby allowing a hydrophobic TMH to exit between TMH2b (light blue) and TMH7 (yellow) and “” the lipid environment surrounding the translocon. Depending on the content and position of hydrophobic side-chain within the segment it may, or may not, obtain thermodynamic equilibrium in the lipid environment. The Sec-translocon together with the lipids thus forms a system that decides whether a polypeptide segment should enter or cross the membrane.

An overview of the “birth and life” of MPs The life of a MP (or any protein for that matter) starts within the ribosome. The nascent polypeptide is fed through the ribosome exit-tunnel and into the Sec-translocon (fig 5A). As the MP enters the translocon it also, perhaps almost simultaneously, encounters lipids. Through this insertion-journey a MP assumes its unique, polypeptide encoded, topology. So, if the translation and membrane insertion can be looked upon as the birth of a MP, the topology can be viewed as its childhood (fig 5B). The topology provides the foundation, upon which the mature life as a fully folded protein relies. Once the final structure is formed, the protein (or protein complex) can dedicate the rest of its molecular life to performing its cellular function (fig 5C). The “growing up” of a MP, is thus (as in human life) shaped by many different interactions: with the translocon and the lipids, the solute, cofactors, and the protein itself. For a stably folded MP, the net energetics of all these interactions results in a protein existing at a free energy minimum (by us humans it is viewed as the road of the least resistance).

Figure 5. An overview of the “birth and life” of MPs. A) The “birth” of a MP. The ribosome feeds the MP nascent-chain into the Sec-translocon. B) During insertion the protein adopts its topology: the hydrophobic parts of the protein exit laterally into the membrane, while the hydrophilic loops and tails alternate between a cytoplasmic and extra-cytoplasmic location. C) The “mature” life of a MP. The protein folds into its native structure and can now carry out its cellular function.

6 The two-stage model Although we have some clues to the important factors that influence MPs, we are still a long way away from fully understanding the many different biophysical forces that must act on a “newborn” protein to make it fold into a structure that can perform a specialized task in the cell (reviewed in refs [10-13]). The two-stage model is a useful simplification when discussing MP folding and structure (fig 6). This model posits that most transmembrane segments insert independently into the membrane and are in themselves stable entities (stage one) [14,15]. A stably inserted TMH may then interact with other TMHs to form the final structure (stage two). Although the two-stage model does not explain how the TMHs associate to form the final structure, it provides an important conceptual framework for how to investigate and discuss the different forces that act on MPs. In this thesis I have, in accordance with this model, chosen to divide the discussion on MP structural organization into two separate sections; topology and formation of the final 3D structure.

Figure 6. The two-stage model was proposed by Popot and Engelman in 1990, and is a useful simplification for MP folding. Different forces act on each level. A) Stage one depicts that the TMHs are in thermodynamic equilibrium with the lipids and aqueous phases before helix packing, i.e. the TMHs are independently stable domains. B) The TMHs fold into their native confirmation.

7

8 Membrane Protein Topology

As a general rule, MPs follow a predictable topological organization pattern. Anti-parallel TMHs are buried within the membrane and hydrophilic loops alternate between a cytoplasmic and extra-cytoplasmic location (fig 6A). A topology model for a certain MP is thus a description of the number and positions of the TMHs and the overall orientation of the protein relative to the membrane. The code to the final topology of a MP is concealed within its amino acid sequence and deciphered during the insertion of the polypeptide chain into the membrane. There are many properties of the nascent polypeptide chain that contribute to establishing the topology (reviewed in refs [11,16]).

Hydrophobicity The most prominent topological signals within the MP polypeptides are segments of ∼20 amino acids with an overall hydrophobic character. When the polypeptide chain is fed into the translocon, these hydrophobic segments form an α–helical structure (TMHs) and partition into the membrane laterally, while the hydrophilic parts remain in the cytoplasm or are translocated to the extra-cellular space.

An introduction to the thermodynamic view of insertion and α -helix formation At first glance the hydrophobic effect operating on the hydrophobic side-chains (ΔGTMH≈40 kcal mol-1) might seem to contribute with more than enough free energy for a typical TMH to insert into the membrane (reviewed in refs [17,18]). However, all amino acids are held together by peptide bonds that are energetically costly to partition into the membrane. If the peptide bonds are involved in backbone hydrogen bonding, this cost is reduced. This is the main driving force for α-helix formation, where each individual TMH has a right-handed fold with a basic 3.6- amino-acids-per-turn motif [15,19-21]. It however requires the simultaneous dehydration of the waters that are bound to the α-helical peptide backbone. This is an energetically costly affair (ΔGTMH≈30 kcal mol-1) [20]. The hydrophobic effect (ΔGTMH≈40 kcal mol-1) minus the cost of insertion (ΔGTMH≈30 kcal mol-1) leaves enough favorable free energy (ΔGTMH≈10 kcal mol-1) for inserting a typical TMH into the membrane. This is of course a simplified and general picture. Clearly the available free energy largely relies on the content of hydrophobic side-chains versus polar and charged residues, and will fluctuate from one TMH to another. The minimal constraint for the independent insertion of a stable TMH into the membrane is however that the hydrophobic effect at least counterweighs the cost of insertion.

9 The α-helix formation-site As discussed above, α-helix formation is a general constraint for a hydrophobic segment to successfully insert into the lipid bilayer. α-helix formation may occur within the translocon channel or, at the latest, at the stage when the TMH is about to exit the translocon and meet the lipids. It also seems that some secondary structure (such as α-helix formation) can form, at least in some cases, in the ∼100 Å long exit-tunnel in the ribosome (reviewed in ref [22]). This of course concerns MPs as well as soluble proteins. It appears that there are distinct zones within the ribosomal exit-tunnel where secondary structure formation is possible. Helix propensity in combination with hydrophobicity and length of the nascent polypeptide decides whether it will occur or not [23,24].

Length and hydrophobicity of TMHs As a general rule a TMH must span the membrane completely, which requires around 20 amino acids. In reality this number will vary depending on the participating amino acids, tilt angles, surrounding TMHs and the thickness of membrane bilayer. Experimental studies imply that for an independent segment to become stably inserted and successfully form a TMH, between 12 and 40 hydrophobic amino acids are required [25,26]. From statistical studies it is found that on average, 26 amino acids participate in α-helix formation [27]. This is much longer than the average helices of 12 residues in soluble proteins [27]. The length and hydrophobicity of a TMH can in itself influence its orientation in the membrane. Long and very hydrophobic segments favor translocation of the N-terminal end (Nout-Cin), while short and less hydrophobic segments adopt an Nin-Cout topology [25,28-30].

Cracking the insertion code When a MP is threaded into the translocon, continuous decisions regarding the destination of the polypeptide have to be made. The translocon has to “decide” which segments are hydrophobic enough to enter the membrane. This might seem to be an easy task. However, if it was as easy as just counting the number of hydrophobic amino acids, prediction methods should be able to identify TMHs with close to 100% accuracy. This is not the case today. Rather, the detailed insertion code has proven a hard nut to crack. A first step toward deciphering the insertion code involves obtaining a measure of the hydrophobicity for each of the 20 amino acids. Different approaches have been employed in order to obtain the “one true” hydrophobicity scale.

Biophysical and statistical Solved MP structures have been analyzed to obtain statistical data from real proteins [31,32]. Much effort has also been put into biophysical experiments, where a set-up with one aqueous phase and one nonpolar phase is used, in which the 20 different amino acids basically has to “pick sides” [17]. Today, as a result of these efforts, we have a number of useful hydrophobicity scales that, for example, have contributed greatly to the capability of computer based topology prediction methods. Although extremely useful, it is of course desirable to verify these findings in an environment more similar to the cell, where the possible influence of the translocon on TMH insertion is taken into account.

10 A biological hydrophobicity scale Through the work of Hessa et al. we have now moved one step closer to understanding the details of the insertion code [33]. By measuring the membrane insertion efficiency for all the individual amino acids in an eukaryotic ER-based in vitro system, a “biological” hydrophobicity aa scale was created (fig 7). A value of the contribution to the free energy (ΔG app) of membrane insertion was calculated for each of the 20 amino acids. Isoleucine, Leucine, Phenylalanine and aa Valine promote membrane insertion (ΔG app < 0), while Cysteine, Methionine and Alanine have aa aa ΔG app ≈ 0. All polar and charged amino acids have ΔG app > 0, and thus oppose membrane insertion.

4.0

3.5

3.0 l

o 2.5 m / l 2.0 a c

k 1.5

p 1.0 a p a a

G 0.5 ! 0.0

-0.5 I L F V C M A W T Y G S N H P Q R E K D -1.0 amino acid Figure 7. A biological hydrophobicity scale (figure adapted from [33]). The scale provides a value of the aa contribution to the free energy (ΔG app) of membrane insertion for each of the 20 amino acids (indicated with the one letter code) when placed in the middle position of a 19 Leucine/Alanine segment.

A TMH within the translocon is in close contact with lipids The output from this biological system can be directly compared with the biophysical and statistical measurements. Except for a few outliers, the scales are surprisingly similar. This is a very important observation since it implies that the environment of the peptide in biophysical experiments (a non-polar phase) is very similar to the biological surroundings of a MP (lipids). The conclusions from the work of Hessa et al [33], and from other studies [34,35], thus support the notion that a TMH within the translocon is in direct contact with the surrounding lipids and that this TMH-lipid interaction is crucial in the recognition of TMHs by the translocon in the ER.

Positional dependence within a TMH The positional dependency of the different amino acids for TMH insertion was also investigated by Hessa et al, by systematically moving a pair of residues from the center of the segment toward its N- and C-termini [33]. In summary, it was found that charged residues placed centrally had a great effect on the insertion capability of the segment, but gradually became less unfavorable when moved apart. Not surprisingly hydrophobic residues showed no strong positional dependence. The aromatic residues Tyrosine and Tryptophan oppose membrane insertion when placed centrally, but become more favorable as they are moved apart (see also section on the aromatic belt).

An algorithm based on the biological hydrophobicity scale The results of Hessa et al. [33,36], together with solved structures of MPs (for example the voltage-gated paddle of the [37,38]) have revealed that both the type of amino acid as well as its position within a TMH have to be considered when calculating the aa ΔG app. A detailed understanding of the available free energy and the positional effect for each of the amino acids will certainly boost the reliability of prediction methods. The task of converting the findings from the study by Hessa et al, to an algorithm that can be used to calculate the membrane insertion probability of naturally occurring TMHs, is in progress (von Heijne, personal communication).

11 The membrane-water interface region The lipid bilayer is not a uniformly hydrophobic entity. Different environments exist along the membrane normal, due to the properties of the lipids that make up the lipid bilayer. The hydrophobic fatty-acid chains create the hydrophobic core of the membrane. This water impermeable seal of approximately 30 Å is the preferred environment of the hydrophobic TMHs of MPs (fig 2). However, it is also important to keep in mind that the hydrophilic head groups give rise to a distinct membrane-water interface region, at the border between the membrane core and the aqueous solvent, that encompasses about 15 Å on each side of the membrane (reviewed in refs [39,40]). The membrane–water interfacial regions, of altogether ∼30 Å, thus comprise about half of the total bilayer thickness. This is where the short loops and tails of MPs commonly reside. Outside of the interfacial regions the environment is completely aqueous, and here MPs with large extra-membranous parts are subject to the same structural forces as are globular soluble proteins.

Loops and tails All MPs have polar parts as well as hydrophobic parts. The N- and C-tail, as well as the loops that connect TMHs, are all more or less hydrophilic and commonly alternate between a cytoplasmic and an extra-cytoplasmic location.

Folding of the N-tail can influence topology It has been shown that the folding state of the N-terminus can influence the insertion process. If the N-terminus folds in the cytoplasm, it cannot be translocated [41,42]. Positively charged residues present in N-tails also prevent translocation (see below). The tail-size per se does not seem to be a limitation for proper translocation [41-45], but the chance of finding positively charged residues or tightly folded domains probably increases with an increased length of the N-tail. This might be part of the explanation to why most extra-cytoplasmically located N-tails are short (commonly < 40 residues), unless they are preceded by a signal sequence [43].

The loops connecting the TMHs In general, the loops connecting TMHs are quite short (between 1-10 residues), but there exist longer loops as well as larger soluble domains. Except for a majority of positive residues on the cytoplasmic side of the membrane (discussed separately; see section on “the positive-inside rule”), the loops show no specific conserved or recognizable traits that would help distinguishing between inside and outside loops. Rather they are equally hydrophilic with mainly charged and polar residues.

Secondary structure in the interface region Recently, through statistical analyses on solved MP structures, Granseth et al. have shed some light on the secondary structures formed by the loops and tails in the rather neglected membrane-water interface region [46]. Irregular secondary structures dominate (≈70%) and are enriched in Glycine and Proline residues (known helix breakers and turn promoters, see text below). The remaining 30% was found to be helices that run more or less parallel to the membrane plane. Interestingly, β-sheet structures were almost completely absent in the interfacial region. The interfacial helices contain a larger portion of polar amino acids than do TMHs. Longer loops (>15 amino acids) more often contain helical structure than do short loops. The average length of an interfacial helix is ∼9 residues.

12 The positive-inside rule More than thirty years ago it was discovered that secreted proteins carry a cleavable signal sequence that guides them to the ER membrane and initiates the export process [47,48]. Since then, the details of the signal sequence have been largely explored, both by statistical and experimental means (reviewed in ref [49]). One crucial discovery that emerged from these studies is that positively charged residues (Arginine and Lysine) are statistically overrepresented at the N-tail and basically absent from the C-terminal part of the signal sequence [50]. This uneven charge distribution is suggested to be a major factor in determining the orientation of signal sequences. This occurs by preventing the N-tail from being translocated across the membrane and thus forcing the signal sequence into an Nin-Cout orientation (fig 8) [51].

What determines the orientation of a MP relative to the membrane? Through the study of charge-distribution in MPs with known topology, it was later found that positively charged residues were also the major topology determinant for MPs [46,52]. This uneven amino acid distribution, generally referred to as “the positive-inside rule”, thus controls which hydrophilic segments are retained on the cytoplasmic side (high number of Lysines and Arginines), and which parts are allowed to pass across to the extra-cytoplasmic side (low number of Lysines and Arginines) (fig 8). The rule applies more or less to all MPs [2,53,54] although it seems to be a stronger topological determinant for bacterial and archaean MPs, than for eukaryotic MPs [54-56].

Experimental proof of the positive-inside rule Protein engineering experiments have provided conclusive evidence for the importance of the positive-inside rule as a topological determinant [57,58]. By repositioning just a few positively charged residues in the E. coli MP Leader peptidase, an inverted form of the molecule was produced. The wild type protein has two TMHs and an Nout-Cout orientation, where the positively charged residues that affect topology mainly are located in the cytoplasmic loop. By removing these residues, and instead placing lysine residues in the N-tail, the entire protein flipped to an Nin-Cin topology. This has later been repeated and confirmed on a number of natural and engineered proteins (Papers III and IV) [28,59,60].

Figure 8. In order to define the properties of a TMH, they can be divided into three groups; signal sequences (Nin-Cout), Nin-Cout TMHs and Nout-Cin TMHs. This nomenclature can be used for the first TMH as well as when discussing the orientation of internal TMHs in multispanning MPs. Signal sequence (Nin-Cout): Its main function is to guide the protein to the translocon. The membrane- spanning segment is usually only slightly hydrophobic and can contain as little as 7 hydrophobic amino acids. Commonly the N-tail preceding the signal-sequence is quite short and contains one or more positively charged residues. The extra-cytoplasmic C-terminal end contains a cleavage site of two adjacent small amino acids such as Glycine, which results in the cleavage of the hydrophobic segment by signal peptidase II (which is loosely associated with the Sec-translocon). All secreted soluble proteins (and a few MPs) carry a signal sequence. Nin-Cout TMH: Similar characteristics as a signal sequence. However it is usually longer, more hydrophobic and lacks the cleavage site at the C-terminal end. Nout-Cin TMH: A long and hydrophobic TMH favor an Nout-Cin orientation. The N-tail is short and contains no positively charged residues. Instead the positively charged residues are located at the C-terminal end of the TMH.

13 The characteristics of the positively charged residues Histidine is positively charged at low pH and could potentially contribute to the orientation of MPs. However, Histidine is not charged at physiological pH, and is generally not considered as an important topological determinant [61]. Lysines and Arginines, however, are both strong topological determinants [60,61]. The positive charges have an additive effect; i.e. the positive- inside rule is more dominant as a topological determinant if more positively charged residues are present on the cytoplasmic loops and/or tails [58,60,62].

Topology of multi-spanning MPs The first TMH of multi-spanning MPs targets the protein to the membrane. In addition it carries topogenic signals that determines its membrane topology. However, as a general rule the first TMH do not determine the topology of the entire multispanning MP [52,56,63-65]. Usually the following TMHs also carry strong topological signals (see for example [64]). This is apparent from reports on topological frustration [52,56,63]. This means that a multi-spanning MP with conflicting positively charged residues engineered into the loops, rather leaves one TMH out, than follow a sequential insertion pattern where the first TMH dictates the topology of the entire [56,63,66]. Although there are reports claiming that the first TMH dictates the orientation of the entire multispanning MP, this seems to be limited to a few eukaryotic MPs and artificially produced MPs [67].

The positive-inside rule holds only for short loops The positive-inside rule seems to be valid only for loops that are relatively short (less than ~60 residues) [54,68]. This is due to the different ways by which long and short loops are exported across the membrane (at least in E. coli). The translocation of long loops, as well as extra- cytoplasmic proteins, is dependent on ATPase activity of the cytoplasmic SecA (reviewed in ref [5]) [68-70]. N-terminal tails however seem not to be dependent on SecA, regardless of their lengths [71]. When translocated by the ATPase dependent mechanism, the positively charged residues are not “sensed” and thus cannot act as topological determinants. Indeed, the frequency of positively charged residues in loops exceeding 60 residues is similar to that found in secreted proteins [52,72].

Calculating the over-all charge bias In conclusion, the sidedness of a single- or multispanning MP can be determined by the net- charge difference of the protein (i.e. the (K+R) bias), counting all the positively charged residues present within the loops and tails that are shorter than 60 residues. One then simply counts all the positively charged residues (Lysines + Arginines) in the uneven loops (the N-tail is counted as number one) and subtracts the number of positively charged amino acids in the even loops. If this calculation yields a positive number the protein most likely has an Nin orientation, while a negative number indicates an Nout orientation.

Is there a “negative-outside rule”? As proof emerged regarding the importance of Lysine and Arginine residues in directing the sidedness of MPs, a complementary idea surfaced. It was suggested that the negative amino acids Aspartate and Glutamate could also have an influence on the orientation of MPs. It was proposed that positively and negatively charged amino acids might work in concert; the positively charged amino acids forcing the loop or tail to stay on the inside, while the negatively charged residues would be drawn to the outside, in a fashion comparable to electrophoresis. If then both positively and negatively charged residues are counted over the most N-terminal TMH, they would yield a net charge difference that, at least in eukaryotic MPs, has been suggested to correlate with its orientation [55,73]. Although much work has been put into proving this “charge difference rule” [74-76], there is no conclusive evidence for such a mechanism [58,60,61]. If indeed negatively charged residues had such a strong topological effect, one would expect that their distribution would be skewed toward the loops on the extra-

14 cytoplasmic side. However, no such “negative-outside” bias has yet been detected [46,54]. It therefore seem as negatively charged residues only have a very small, if any, effect on the orientation of MPs, at least in Sec-dependent translocation.

How can we explain the positive-inside rule? It is still not completely clear how positively charged residues exert their influence on MP topology. At least three separate possibilities have been suggested to be responsible, or to contribute to the phenomenon.

Anionic phospholipids One attractive theory suggests that the anionic phospholipids prevent membrane passage of positive charges [77]. Anionic phospholipids are present in all membranes and implicated in several functions, including the electrostatic binding of positively charged residues in protein domains and amphipathic helical peptides (reviewed in ref [78]) [79,80]. The positively charged amino acids of MPs has been suggested to, in a similar fashion, be retained in the cytoplasm by electrostatic binding to the negatively charged phospholipid head groups at the cytoplasmic membrane surface.

The membrane electrochemical potential

A second proposition involves the membrane electrochemical potential (ΔµH) [81]. The extra- cytoplasmic side is positive and acidic, while the cytoplasmic side is negative and basic. According to this theory, the positive and acidic extra-cytoplasmic environment is expected to oppose the translocation of positively charged residues. This however only applies to membranes that possesses a ΔµH+; i.e. the bacterial inner membrane, the mitochondrial inner membrane and the thylakoid membrane [53,82,83]. The ER membrane, in contrast, is probably devoid of a . Hence this hypothesis can provide only a part of the explanation, since ER MPs also follow the positive-inside rule. Moreover, if the ΔµH+ was the only factor involved, negatively charged residues would be expected to promote translocation, but no such role for these residues has been established (see above).

Charged residues within the Sec-translocon The final suggestion is that residues within the Sec-translocon are responsible for the positive- inside rule by either attracting or repelling positively charged residues during the translocation of MPs [84,85]. This is a very appealing theory since the Sec-translocon is one of the most conserved protein complexes throughout all three kingdoms of life. To investigate this proposal mutagenesis on Sec61p of yeast was carried out, and indeed three charged residues (two Arginines and one Glutamate) were found to affect the topology of the tested MPs [84]. However, at least two facts speak against these residues as the sole mechanism behind the positive-inside rule. First, the important residues are not, contrary to what would be expected, conserved in all Sec-. Second, the cell is still viable even though all three charged residues are simultaneously mutated [84].

Taken together it seems that all the mechanisms proposed above contribute to some extent to the positive-inside rule. There is of course also the possibility that different solutions to the mechanism operate in different organisms or that the main driving force is still to be discovered.

Exceptions to the positive-inside rule Not all mitochondrial inner MPs conform to the positive-inside rule [82]. It has been shown that some nuclear-encoded MPs do not have a high content of positively charged residues on the parts exposed to the matrix, but instead have a skewed distribution of Glutamate residues on the loops facing the inter-membrane space. This suggests that some nuclear-encoded MPs utilize a different and, as of today, unexplored mode of insertion into the mitochondrial inner-membrane.

15 The ‘snorkel’ effect It is tempting to consider all TMHs as hydrophobic entities. But this is not entirely true. Except for the obligatory hydrophobic amino acids, a TMH might also contain some polar amino acids, or even one or more charged residues. This is possible, at least in part, due to the snorkel effect [86-88]. This means that a side chain of a charged residue that is buried within the hydrophobic membrane core can adjust its conformation such that it reaches, or snorkels, toward the interfacial or aqueous regions. Snorkeling residues thus seem to experience a force that pushes them away from the membrane core. In contrast, some amino acids (in particular the hydrophobic ones) are subject to an anti-snorkeling effect, where the side-chains instead point toward the hydrophobic core. As a consequence, the precise positioning and orientation of a TMH in the membrane is influenced by snorkeling residues, which interact with different parts of the lipid bilayer.

Side chain length is important in snorkeling The length of the side chain contributes to the efficiency of a particular amino acid to snorkel. This could explain why the very long and flexible side chains of Arginine and Lysine have a stronger propensity to snorkel compared to Glutamate and Aspartate. In addition, the positive charge of the Arginine and Lysine side chains that preferably should interact with the negatively charged lipid head-group, might also contribute.

Preference for a certain end of the helix Recently, it has been demonstrated that snorkeling amino acids prefer either the N- or C- terminus of a helix, due to steric restrictions in their side-chain conformations [86]. Most snorkeling amino acids prefer the N-terminus (Arginine, Lysine, Aspartate, Glutamine), since the Cα-Cβ bond in helices point more toward this end [86]. Tyrosine on the other hand is strongly biased toward the C-terminus [86].

Snorkeling pattern is reversed in loops When Tyrosine and Tryptophan residues are present in loops located to the interfacial region, their snorkeling pattern is reversed. Here, the aromatic side-chains instead prefer to point toward the hydrophobic core, as do hydrophobic amino acids (anti-snorkeling). Polar residues however, seem to prefer to interact with the surrounding aqueous solution and hence point away from the membrane core. The positioning of different amino acids in loops most likely plays a critical role in securing interfacial helices to the membrane, in a similar way as for amphipathic helical peptides that reside in the membrane-water interface region [79,88].

The aromatic belt It is common to find aromatic residues in the interfacial regions of TMHs. This is an interesting example of uneven amino acid distribution in MPs, often referred to as the “aromatic belt” [33,46,54]. Only Tryptophan and Tyrosine (not Phenylalanine) exhibit this preference for the interfacial region [46,89]. The biological relevance of this belt is not entirely understood, but it seems to play an important role in the anchoring and positioning of individual helices, and perhaps also entire MPs, in the lipid bilayer [35,90]. Furthermore, it seems that the Tyrosine and Tryptophan residues are not just loosely, but actually rather tightly anchored to the interface [39]. This can have implications for TMH positioning, tilting and rotamer preferences.

16 Why do Tyrosine and Tryptophan align with the membrane interface? More than one explanation has been proposed for the physical and/or chemical basis for the strong preference of aromatic residues to reside in the membrane-water interface region. The Tyrosine side chain is a six-membered aromatic ring with an –OH group attached. Tryptophan has two aromatic rings that are fused into one large hydrophobic ring-structure. Phenylalanine, although aromatic, is completely hydrophobic, and is found in the transmembrane part rather than the interfacial parts of MPs. The classical explanation for the preference of Tyrosine and Tryptophan to reside in the interfacial regions is their dipolar character. The side chain must simply seek a compromise. This can be achieved by burying the aromatic ring close to, or within, the hydrophobic core, while the hydrophilic part can interact with the polar lipid head-groups at the interface. Other factors such as the aromaticity, size, rigidity and shape of Tryptophan, rather than its dipolar character, has also been suggested as the primary reasons for its interfacial preference [91]. Perhaps, as suggested by You et al., it is a balance of all these forces that explains the interfacial preference of Tyrosine and Tryptophan [91].

17 Topogenic determinants work in concert So far I have described separately the different factors that contribute to the topology of MPs. In reality however the topology of a MP is the sum of all the topological determinants present within a particular protein.

What contributes to the orientation of the signal sequence? Commonly, the first TMH of MPs acts as a targeting signal so that the ribosome and the nascent chain are brought to the Sec-translocon. Recently, Goder and Spiess demonstrated that the first TMH of MPs inserts “head-on” (N-terminus first) into the translocon [84]. It was further suggested that the hydrophobicity of a TMH has a very strong effect on the time that it spends inside the translocon. The more hydrophobic the TMH, the shorter the time it will spend inside the translocon before partitioning into the membrane, which leaves no time for N-terminal re- orientation. Thus a very hydrophobic TMH will simply move laterally into the membrane in an Nout-Cin orientation. However, positively charged residues at the N-terminus can override the “hydrophobicity signal” and vice versa. A less hydrophobic TMH is indecisive in its character, since the thermodynamics for membrane insertion are less clear. This indecision delays the membrane insertion and leaves enough time for the N-terminus to re-orient and insert with an Nin-Cout orientation. Positively charged residues at the N-terminus make the signal even more clear-cut.

Helical hairpins MPs sometimes form tight turns between two closely spaced TMHs. These are commonly referred to as helical hairpins. It has become apparent that turn propensities differ somewhat between globular proteins and MPs. Therefore a specific turn propensity scale for MPs was developed [26,92,93]. In short, hydrophobic residues do not induce a turn, while charged or highly polar residues do (to avoid the hydrophobic interior of the membrane). Furthermore, it was found that the two classical helical breakers, Proline and Glycine, both induce turns, although Proline is more effective as a turn promoter. It was also shown that it takes two or more amino acids with turn propensities to make a turn on the cytoplasmic side, compared to only one residue on the extra-cytoplasmic side [94]. Further, we have observed that helical hairpins seem to constitute one of the main building blocks in MPs (Paper II). The most common theme is that MPs have an even number of TMHs and their N- and C- terminus in the cytoplasm. This observation implies a recurring pattern of helical hairpins with extra-cytoplasmic turns.

Where is a helical hairpin formed? MP topology is determined locally and only influenced directly by the part of the nascent chain that is present within the ribosome-translocon channel when the segment integrates into the lipid bilayer. This means that helical hairpin formation most likely is closely coupled to the co- translational insertion of MPs into the Sec-translocon, and if viewed from the perspective of the Sec-translocon, looked upon as one folding unit [95]. Some marginally hydrophobic or very short TMHs might require the aid of a preceding TMH, in order to become integrated into the membrane [65,96,97]. In such cases, the loop separating the two TMHs has to be quite short. This strengthens the notion that two TMHs can be present within (or closely associated to) the translocon, so that the first one can facilitate the integration of the second TMH. It also shows that in some cases the topology and the final structure are closely coupled. In addition, recent data suggest that a pair of properly placed polar residues (Asparagine/Asparagine or Aspartate/Aspartate) within a long hydrophobic segment can, by inter-helix hydrogen bonding

18 (see below), promote helical hairpin formation [98,99]. This observation suggests that interhelical hydrogen bonds can form already in the translocon and agrees with the theory that the nascent polypeptide chain is in close contact with lipids. This protein-lipid contact is thus critical in the decision regarding membrane integration of individual TMHs and/or helical hairpins.

A joint translocon/lipid system deciphers the unique topology of a MP As discussed above, the topology of MPs is concealed within the protein sequence (fig 9). Hydrophobic stretches fold into α-helices and integrate into the bilayer, either independently or as helical hairpins. The membrane integration efficiency is influenced by the type and position of amino acids along the TMH axis. Charged and polar residues are better tolerated at the ends of a TMH, near the interface region where their side chains can snorkel and interact with the phospholipid head groups. Positively charged residues on the hydrophilic loops and tails favor the cytoplasmic side of the membrane and thereby exert the main influence on the sidedness of the protein. The joint task of the translocon and surrounding lipids, is to continuously decipher each part of the peptide until the final topology of the entire MP is obtained. Due to the precision by which the peptide-encoded topological determinants are interpreted by the lipid/translocon system, all proteins of the same type end up with the same topology (except in a few special cases, see Papers II, III and IV). Once integrated (and perhaps even during integration), aromatic residues are important to anchor and position interfacial- or transmembrane helices within the membrane. Together these residues form a belt on each side of the MP that more or less delineates the boundaries of the membrane bilayer.

N Extra-cytoplasmic side ∼20 hydrophobic residues Aromatic residues; Tryptophan, Tyrosine =Positively charged residues; Arginine, Lysine cytoplasm C (K+R) bias=+4 Figure 9. The code to the final topology of a MP is concealed within its amino acid sequence and deciphered during the insertion of the polypeptide chain into the membrane. There are many properties of the nascent polypeptide chain that contribute to establishing the unique topology of a MP. Three properties are particularly important; hydrophobic segments, the location and number of positively charged residues and aromatic residues.

19 The evolution of membrane proteins Sequence analyses has revealed that there are more than 4000 MP families [100]. However, there are only a few very large families and many very small families. For example, the mean numbers of members in the 20 largest families is 1300, while the next 20 families only have an average of 340 members. The 20 largest families include signaling proteins (e.g. GPCRs), transporter families (e.g. major facilitator family and ABC-transporters) and proteins involved in energy transduction (e.g. cytochrome b). An interesting but largely unexplored area of science concerns the structural evolution of MPs. Although much work remains before we have a complete understanding of this subject, some basic features have been established. For example, it has been found that covalent domain recombination, a mechanism often used by evolution in soluble proteins to connect evolutionary unrelated domains, is quite rare in MPs [101]. Instead MPs seem to prefer to form oligomeric complexes that are attached by non-covalent forces in order to produce evolutionary diversity. It seems as the most common way by which MPs evolve in size and complexity is through gene duplication events, where either the entire gene, or a part of the gene, becomes duplicated [102]. The newly “evolved” MP will thus be made up of partial sequence repeats that correspond to the duplication of some, or all, of the TMHs in the original MP. For example, as previously noted, a recurrent theme seems to be events where internal duplications of a part of a gene results in the addition of two TMHs, i.e. a helical hairpin. A deeper understanding of the way that MPs evolve may be of great help in the prediction of the topology, structure and function of MPs.

Internal repeats One common evolutionary process occurs when an internal gene duplication event results in two halves of the protein having the same topological and structural arrangement (fig 10A). Although it is not always possible to detect this on a sequence level, it is usually quite obvious when studying the 3D structure. This “internal repeat model” seems to fit the major facilitator superfamily of secondary transporters, as nicely illustrated by the recent crystal structure of LacY [103]. LacY has 12 TMHs that are arranged in two pseudo-symmetrical 6-helix bundles. The proteins AcrB [104], GlpT [105] and EmrD [106] are additional examples of MPs that conform to this structural arrangement.

Internal inverted repeats Another common theme noted from protein structures is the occurrence of internal inverted repeats, i.e. the protein consists of two halves with opposite topologies that are related by a two- fold symmetry axis in the plane of the membrane (fig 10B). Internal inverted repeats seem to be quite common in channels and transporters. Examples include the Na+/Ca2+ exchanger [107], SecY [7], the [108] and the glycerol and water conducting family of channels [109,110]). How did the internal inverted repeats evolve? This question is addressed in Papers II, III and IV.

Figure 10. A) Internal repeats. Due to an internal gene duplication event, this protein consists of two halves with the same topological arrangement (TMH 1-6 + TMH 1´-6´). B) Internal inverted repeats. The protein consists of two halves (TMH 1-3 and TMH 1´-3´) with opposite topologies that are related by a two-fold symmetry axis in the plane of the membrane.

20 Membrane Protein Structure

A certain MP has a specific topology in the membrane, as concluded in the section above. However, in reality the protein is not a flat entity, as commonly displayed in a topology model. Rather the topology is organized into each MP´s unique tertiary structure, which is the actual functional unit of the protein within the cell. Immediately after (and probably during) insertion it begins its search for the stable and native conformation. In addition, some topological rearrangements, such as the insertion of re-entrant loops, occur down-steam of the Sec- translocon mediated insertion. Unlike the case of soluble proteins in which the hydrophobic effect dominates the folding, it is still not possible to point to a single force that dominates in the pathway to the folded state for MPs. Instead at least three types of interactions must be considered in MP folding: interactions within the protein itself, protein-lipid interactions, and lipid-lipid interactions. Most likely there is a subtle balance between many forces (such as van der Waals interactions, hydrogen bonding, and hydrophobic matching) that make, and keep the protein in a folded state. Although we are still missing the complete picture of how these forces work in concert, we seem to be gaining a better understanding of the contributing factors (reviewed in refs [11-13,17]).

Intrahelical interactions

α-Helical distortions Many, if not most helices in MPs contain bends and irregularities [10,111]. These distortions can cause a helix to bend more than 30° and lead to deviations in side-chain placement [111]. So although α-helical distortions are an intrahelical factor, they doubtlessly have an affect on interhelical relations. Irregular helices broaden the structural diversity and flexibility of MPs, compared to if they only contained straight and rigid helices. The importance of helix distortions is manifested by the many examples where they have functional and structural implications in MPs. Depending on the type of distortion, it is convenient to divide them into four separate categories; Proline induced kinks, non-Proline distortions, π-helix turns and 310-helices. Importantly, there are now prediction tools available, where these distortions can be identified from the amino acid sequence alone [112,113].

Proline-induced kinks The TMHs of MPs contain many Prolines, as compared to α-helices in globular proteins [114,115]. The Proline residue has a rigid structure, in which the side chain (a five-membered pyrrol ring) loops back to the amide nitrogen. Within an α-helix, the structure of Proline causes a steric clash with the preceding residue and the loss of a backbone hydrogen bond [115]. When a Proline residue is present in a TMH, its inflexible nature thus pushes the TMH into structural adaptations. Either it can introduce a kink in a TMH, or alternatively, if the hydrophobic segment is sufficiently long, it can promote the formation of a helical hairpin. Although Prolines are difficult to accommodate within a TMH, their unique conformational property offers structural diversity [116]. Hence, there are many reports where Proline induced kinks are suggested to be critical for the proper structural stability and/or function of MPs

21 (reviewed in ref [117]) [118]. A recent study also suggested Proline to be important in preventing off-pathway folding of MPs [119].

Most non-Proline distortions originate from Prolines The majority of TMH distortions (≈60%) occur at Proline residues. This leaves ≈40% of non- Proline deformations whose origin is less obvious. Recently, Bowie and coworkers proposed that a majority of these non-Proline kinks in fact originate from Proline residues [113]. The study shows that the presence of a Proline residue leads to many adaptations of the residues that surround the kink. During evolution the Proline residue might be lost (at least in some homologs), but the kink is conserved within the helix by the aid of surrounding residues. By performing sequence comparisons of homologs, Bowie and coworkers showed that almost all TMHs that contain a kink originally contained a Proline residue. This finding makes Proline, by far, the most common reason for helical distortion and it is suggested that as much as 90% of the kinks are in fact derived from Proline-induced kinks.

π-helix turns A normal α-helix has a folding pattern with H-bonds between residues n ↔ n + 4. There are however examples of TMHs in MPs that deviate from the ideal α-helicity. One example that can be found in several high resolution structures is the π-helix turns (also called π-bulge, looping out or α-aneurysm) [120,121]. The π-helix turn is a local deviation produced by a n ↔ n + 5 pattern of the main chain hydrogen bonding. These “wide” turns provide, for example, an opportunity for polar interactions between TMHs by freeing a C=O group for hydrogen bonding.

310-helix

The 310-helix is another example of a deviation from the ideal α-helical structure and has a local n ↔ n + 3 H-bonding pattern. Examples of MPs that contain such “tight” turns include and [122].

Interhelical interactions The most common approach to study MP folding and stability is to divide the structure into pairs of interacting helices. These helix pairs, together with the loops and soluble domains, form the overall structure. The helix-helix packing, and the native structure, may be influenced by the way that they are inserted and handled by the translocon (e.g. helical hairpins) or by a number of other factors, of which some will be outlined below.

Attractive forces that may influence MP structure formation

Van der Waals interactions Van der Waals interactions are the only attractive force for nonpolar molecules (such as the side chains of hydrophobic amino acids) and are therefore considered a dominant force in the interaction of TMHs in the hydrocarbon core region of the bilayer. In fact, van der Waals packing has been suggested to be the dominant force that drives MP folding [17,21]. Indeed there are examples of interacting TMHs that are built only of hydrophobic amino acids and thus suggested to be stabilized solely by van der Waals interactions [123].

Hydrogen bonds Simply put, a hydrogen bond occurs when two polar residues with side-chains that contain electro-negative oxygens/nitrogens share a (electro-positive) hydrogen atom. Polar residues in soluble proteins are commonly involved in hydrogen bonding. A hydrogen bond can also be very stable in the apolar environment of the membrane. In spite of this, polar residues are only

22 rarely seen (about 1/25 residues) in TMHs, mainly due to the large energetic cost of membrane insertion. However, when present, they are often critical for the proper function of the protein, or for binding of prosthetic groups. The structural contributions of polar residues are less well understood. It has been suggested that interhelical hydrogen bonds can promote helix associations, at least in the absence of specific packing interactions [99,124-126]. Apart from its potential contribution to MP folding and stability, the ability of polar residues to form strong interhelical associations is a potential danger to a MP. Unspecific interactions may hinder the protein in forming its native structure. Indeed, mutations involving polar residues (in particular Arginine) are often associated with genetic diseases due to protein malfunction [127].

Weak hydrogen bonds In the last few years the “non-conventional” CαH…O hydrogen bond has gained an increasing amount of attention, since it has been implicated to contribute to protein interactions [128,129]. These non-conventional hydrogen bonds are quite weak (less than half the strength of a conventional hydrogen bond) but are suggested to be important in MP stability and specificity, especially when several CαH…O interactions are coordinated at a single interface. For example, the close packing that is promoted between TMHs by the GxxxG motif (see below), has been implied to promote backbone CαH…O hydrogen bonding between helix pairs [130]. Its contribution to structural stability is however still controversial [131].

Salt bridges A salt bridge is an interaction between the side chains of two amino acids with opposite charges. How common are these interactions in MPs? Charged residues are rare (but not absent) in TMHs within the apolar core of the membrane, due to the large energetic cost of membrane insertion. Charged residues are better tolerated in the interfacial region where they tend to snorkel and interact with the lipid head groups and water rather then each other. Nevertheless, salt bridges have been reported (see for example refs [132-134]), and can be critical for the function of the protein.

Disulfide bridges Two Cysteines can, in an oxidative environment, form a bond by the removal of two hydrogens. In vivo this reaction is catalyzed by enzymes and occurs only in an environment that can provide suitable redox conditions, such as the ER of eukaryotes and the periplasm of gram- negative bacteria (reviewed in refs [135,136]). Many soluble proteins that reside in these locations and some extra-cytoplasmic loops of MPs contain disulfide bridges. However, disulfide bridge formation rarely occurs inside the apolar core of the membrane.

23 Topological constraints Since a MP has already assumed a certain topology in the membrane it is, even before folding, more structurally constrained than an unfolded soluble protein. Apart from the fact that the proteins are fixed in a certain orientation, to what extent does the topology contribute to the final structural arrangement of MPs?

The role of interhelical loops in the folding and stability of MPs One could envision a critical role for the soluble loops in governing the correct helix-helix associations. However, experiments on bacteriorhodopsin [137], and a number of other MPs [138-140], show that many TMHs can be expressed individually and still assemble and form a functional protein. This indicates that the loops in many cases can be dispensable for the proper assembly of TMHs. On the other hand, loops are certainly vital for the formation of certain helix-helix associations, such as helical hairpins and when cooperation of TMHs is required for proper membrane integration. However, loops do not seem to have a general and predictable type of structural influence.

The role of the helical backbone dipolar moment in packing of MPs TMHs have a backbone dipole moment (the N-terminal being positively charged while the C- terminal end is negatively charged) that could potentially contribute to a preference for anti- parallel helix-helix packing in multi-spanning MPs. An early study by Bowie on helix-helix packing preferences, indeed revealed a pattern for TMHs to always pack in an anti-parallel fashion against sequence-neighboring TMHs [141]. However, since neighboring TMHs always are anti-parallel, it raised the question whether this preference was due to a preference for anti-parallel packing or if it simply was a consequence of the topology of MPs. A theoretical study by Ben-Tal and Honig suggested that these dipole- dipole interactions are quite weak due to the interactions with the surrounding solvent and counter ions, and thus would not contribute greatly to the packing of MPs [142]. Statistical analysis of solved MP structures seems to support these results, since they indicate that anti- parallel helix-helix interactions are only common within MP subunits and not between subunits [143]. However, the nature and importance of the dipole moment is still disputed. Recently, Killian and coworkers utilized a biophysical approach, in combination with computer modeling [144]. Their results suggest that the helix dipole moment might be a weak force, but it is strong enough to promote anti-parallel association, at least in the absence of stronger helix association forces.

24 Re-entrant loops Over the last few years it has become clear that so called re-entrant loops are present in many of the solved MP structures (>20%). A re-entrant loop is a fragment of irregular- and/or α–helical structure that penetrates the membrane, but then turns around half way through the membrane and exits on the same side of the membrane as it entered (fig 11). Recently, Viklund et al. have performed a structural classification of the re-entrant loops in order to predict these regions from sequence alone [145]. They found that the composition of amino acids in the re-entrant loops could be described as intermediate between the hydrophilic loops of the interface regions and the hydrophobic TMHs, but with a notable abundance of small residues (in particular Glycine and Alanine). Moreover, re-entrant loops seem to be most common in channel- and transporter proteins. It seems as if the re-entrant loop often has a functionally important role. Hence, it is often strongly conserved among homologs. KcsA, NHE1, SecY, the ClC chloride channel and are a few examples of MPs that contain this interesting structural feature [7,108,146,147]. It has been predicted that the fraction of MPs with re-entrant loops in a typical genome is 10%, and probably higher [145].

Figure 11. Two separate modes of insertion have been suggested for the membrane integration of re- entrant loops (figure adapted from [148]). A) The re-entrant loop is translocated through, but kept in the membrane. This insertion mechanism originates from a study on Na+/H+ exchanger isoform 1 [149]. B) Membrane integration of re-entrant loop can occur in a “post-insertional” fashion. First, the TMHs insert by the aid of the Sec-translocon. During this stage the re-entrant loop is exposed to the extra- cytoplasmic side. As a next step the MP assumes its final topology and structure, where the re-entrant loop is inserted. This insertion mode is based on a study of K+ channels [148].

25 Helix packing models Each individual TMH is always right-handed with a basic 3.6 amino-acids-per-turn pitch. Helix- helix pairs can form both right- or left-handed helix-helix interactions [150]. A left-handed TMH pair can form a coiled coil structure that allows contact between the helices every 3.5 amino acids with a crossing angle of ∼20 °. This results in a repeat every 7 amino acids called the heptad repeat. A right-handed crossing on the other hand commonly has an angle of ∼40° and promotes contacts every ∼4.0 amino acids [151]. Left- and right- handed crossings lead to different types of ways for the helices to approach and pack against one another, as discussed above. This is commonly explained by certain packing motifs/models. Two of the most common ones will be briefly outlined below.

Knobs-into-holes (left handed crossing angles) Many helix-helix pairs in MPs exhibit an anti-parallel left-handed crossing angle (in accordance with the heptad repeat). This crossing pattern is explained as a “knobs-into-holes” type of side- chain packing. Small residues (such as Glycine, Alanine and Serine) are commonly located at the helix-helix interface [150,151]. This type of packing allows close contact between the two helices and is stabilized by van der Waals forces. In a recent study by Walters and DeGrado the knobs-into-holes model is suggested to apply to a majority (∼30%) of helix pairs, and thus represents the most common packing motif in MPs [143].

The GxxxG motif (right handed crossing angles) Extensive work on the homo-dimerization of the single-spanning MP glycophorin A, has revealed the GxxxG motif as a fundamental mediator of helix-helix association (reviewed in ref [152]). When a GxxxG motif is present in a TMH, the two small Glycine residues are separated by only one turn and thus end up on the same side of the helix. This creates a groove on the helix surface, which can function as a contact surface for another helix with the same motif. The motif thus allows two anti-parallel TMHs to approach each other very closely with a right- handed crossing angle of approximately 40° that is stabilized by van der Waals interactions. There seem to be many variations to this fundamentally important motif. For example, an extension of the motif, the so-called Glycine zipper (GxxxGxxxG), and the GG7 motif (GxxxxxxG) are both statistically overrepresented in TMHs [153,154]. Moreover, it appears equally common that the GxxxG motif occurs in only one of two associated helices [143]. In support of its propensity to promote helix interaction, the GxxxG motif occurs more frequently in TMHs than the random expectation, especially in transporters, symporters and channels [155]. In fact, Walters and DeGrado have suggested that this parallel packing motif, or derivations thereof, are involved in as much as 13% of helix pairing in MPs [143]. In addition, an antiparallel version of the motif has been suggested to be even more common (16%) in MP helix packing [143].

26 Protein-Lipid interactions Above, different types of intra-and inter-TMH forces have been outlined that may contribute to folding, stability and function of MPs. It is however vital to keep in mind the possible influence of the medium that surrounds MPs in shaping MP structure (reviewed in refs [156-158]).

Curvature stress A lipid needs to have a cylindrical shape (the head group size that is roughly equal to the tail size) in order to form a lipid bilayer (fig 2). However, membranes are not just composed of cylindrically shaped lipids. Cone-shaped lipids have, as the name implies, larger tails than head groups and are present in all membranes in large amounts (fig 2). These non-bilayer lipids can only be organized into a bilayer if mixed with bilayer-forming lipids. However, this mixing leads to over-packing in the tail region and under-packing in the head region, creating a driving force within the individual bilayer leaflets to curve away from one another. This so-called curvature stress results in a higher lateral pressure in the middle of the membrane than in the interface region. A MP can relieve some of the stress by adopting an hourglass shape. Consequently, the curvature stress may affect MP structure. Indeed, analysis of a large number of MPs with solved structures, indicated a “waist” near the center of the membrane [46].

Hydrophobic mismatch Hydrophobic matching occurs when the hydrophobic core of a TMH perfectly fits the thickness of the hydrophobic core of the lipid bilayer. A thermodynamically troublesome situation occurs if the hydrophobic part of a TMH is too long or too short to be comfortably accommodated within the lipid bilayer. This situation is commonly referred to as hydrophobic mismatch, and can be further separated into a positive mismatch (when the TMH is longer than the hydrophobic core of the lipid bilayer) or a negative mismatch (when the TMH is shorter than the hydrophobic core of lipid bilayer). If mismatch does occur, the TMH and/or the hydrophobic core of the lipid bilayer must respond by making one or more adjustments, in order to minimize the energy of the system. The TMH can handle a positive mismatch by increasing the helix tilt angle, inducing a helix bend (or a combination of the two) or by promoting helix- helix interactions, thus reducing the hydrophobic area exposed outside the membrane [40,144,159]. A TMH may respond to negative mismatch by adopting a surface orientation rather than a membrane-inserted state [160], or by promoting helix-helix interactions [144,161]. The hydrophobic core of the lipid bilayer is suggested to be able to respond to mismatch by becoming thicker or thinner [162,163]. This response may be accomplished either by acyl chain ordering (in positive mismatch), or acyl chain disordering (in negative mismatch). However, there are also studies that suggest that thickness adaptation of the lipid bilayer is not likely to occur in living cells [164]. All in all, the total contribution of hydrophobic mismatch in shaping MP structure is not entirely clear [163].

27 Specific protein-lipid interactions High-resolution structures, as well as biochemical studies have revealed that specific protein- lipid interactions are critical for the proper function of several proteins. These lipids are sometimes referred to as co-factors (or non-annular lipids) and often bind so tightly that they co-purify with the protein. Examples of MPs with associated lipids include the KscA-channel [165] and cytochrome c oxidase [166].

Lipid-dependent topogenesis of multi-spanning MPs Dowhan and co-workers have published a number of papers regarding the influence of the lipid composition of membranes on MP topology [167-170]. The major finding is that the zwitterionic phospholipid phosphatidylethanolamine (PE) is required for the proper topological organization of LacY, PheP and GabP. Based on these observations PE is suggested to be an important topological determinant. PE is one of the most abundant (75%) lipids in E. coli. It is a non-bilayer prone lipid with a cone shape appearance that induces curvature stress. In short, the studies show that in the absence of PE, the first part of the protein adopts a non-native topological conformation. However, even after topology formation, these proteins seem to be able to re-adjust their topology in response to the addition of PE. It thus seems as if the proteins can sense and adjust to alterations in the composition. This suggests that topological adaptation is a more dynamic process than previously thought. The reason for the requirement of PE for proper topological assembly is not well understood.

28 Escherichia coli as a Model Organism

Cell architecture E. coli is a rod-shaped bacterium with the length of approximately 3 µm and a diameter of 1 µm (fig 12). All gram-negative bacteria are, including E. coli, enclosed by two distinct membranes: the outer membrane and the inner membrane. The α-helical class of membrane proteins are found in the inner membrane and are commonly referred to as inner membrane proteins, while the β–barrel class of membrane proteins are only found in the outer membrane and are called outer membrane proteins. The two membranes are separated by the periplasmic space, a gel-like and oxidative environment, in which soluble proteins and the peptidoglycan layer reside. The circular chromosome (and plasmids) is located in the cytoplasm.

Figure 12. Cross-section of an E. coli cell. The α-helical class of membrane proteins are found in the inner membrane, while the β–barrel class of membrane proteins are only found in the outer membrane.

The K-12 genome Most studies involving E. coli are performed using the K-12 strain. This strain was discovered early in “biotech” history (1922), which perhaps is one reason for its spread to different laboratories around the world. As is commonly known, some E. coli strains are pathogenic and can cause grave problems in the intestinal tract [171]. In addition there also exist pathogenic extra-intestinal strains. The commercially available K-12 strain is however non-pathogenic since it has been cured from the DNA that encodes the virulence factors. K-12 strain MG1655 was one of the first organisms to be fully sequenced and has become the principal laboratory E. coli strain. It contains ∼4300 predicted genes that code for proteins. Bioinformatic analysis of the MG1655 proteome suggests that about 1000 of the predicted genes encode inner membrane proteins with at least one TMH [172]. Approximately 770 proteins are predicted to contain two or more TMHs. So far, only about 60 genes have been found to encode outer membrane proteins.

29

30 Experimental Approaches to Reveal MP Structure

3D structural models

There are a few choices regarding techniques that one can utilize in order to obtain high- resolution structural information of MPs. The most common ones are X-ray crystallography and electron cryo-microscopy. Structural information in combination with classical protein biochemistry can bring the understanding of a protein to an atomic level. Unfortunately, it has proven to be an extremely difficult task to obtain detailed 3D structural information of MPs. Today, there are only a little more than 100 unique structures of MPs available (fig 13) [173]. This should be compared to more than 35,000 structures of soluble proteins. Encouragingly however, there has been a steady flow of MP structures published over the last few years, which brings hope for more stream-lined and/or “large scale” approaches to be developed in the future. Why is it so hard to obtain high-resolution structural information for MPs? The mixed hydrophobic and hydrophilic nature of MPs is perhaps the major reason for the general problems associated with MP structural determination. It causes many bottlenecks that need to be solved on an individual basis, in the over-expression, solubilization and finally in the crystallization trials.

Figure 13. A very rapid increase in the number of solved MPs is not to be expected according to a recent analysis (figure adapted from [174]). If current trends continue, only about 2,200 structures are estimated to be available in 2025.

Topology mapping

Topology mapping is an experimental approach that aims at defining the TMHs and their in/out orientation relative to the membrane (i.e. a topology model). The major advantage of this approach is that it is applicable to almost any MP. Consequently it has become a widely used starting point for further biochemical investigations and a number of different techniques are currently available. Among the most important techniques are reporter fusions [175], Cysteine- labelling [176], epitope mapping [177], mapping [178] and limited proteolysis [179].

31 The C-tail and the internal loops: main sites for experimental topology mapping As of today, experimental topology mapping is restricted to techniques that deduce the location of internal loops and the C-terminal end of MPs. For E. coli, there is no technique that can report on a membrane integrated part or a N-terminal location. The latter is largely due to the pivotal role of the N-terminal part in targeting and translocation. Any changes in the N-terminal region (e.g. adding a tag or reporter protein) may have an impact on the ability of the protein to insert and adopt its native topology.

Reporter fusions The reporter fusion technique is the most common approach to topology mapping in E. coli and utilizes the fact that some proteins are active in one compartment only (the cytoplasm or the periplasm). By fusing the reporter protein to different truncated versions of the MP of interest (for example after every predicted TMH) one can achieve a quite detailed topology model. Many different reporters have been introduced through the years. Alkaline phosphatase (PhoA) [175,180] and β-lactamase [181] are examples of two commonly used periplasmic reporter proteins. For reporting a cytoplasmic location, one can utilize for example β- galactosidase [175] or Green fluorescent protein (GFP) [182]. Below follows a more detailed description of two reporter proteins, GFP and PhoA, which have been used in the studies reported in the thesis.

PhoA PhoA is a periplasmic zinc-containing enzyme that consists of two identical subunits encoded by the phoA gene. Each subunit requires two disulphide bridges, which can only be formed in the periplasmic space (with the aid of DsbA). PhoA was the first topology reporter protein described in the literature [180]. PhoA (without its signal sequence) was fused to the C-terminus of an E. coli MP and shown to reliably report parts of the protein located to the periplasm [180]. A simple colorimetric assay revealed that fusions to positions in or near the periplasm lead to high enzymatic activity, whereas those to positions in the cytoplasmic domain gave low activity.

GFP The use of GFP has exploded within the last decade and GFP is today one of the most widely exploited proteins in biochemistry, cell biology and microbiology. The most valuable feature of GFP is its ability to emit highly visible green light following excitation (with long-wave UV light) of an internal fluorophore. Thus no substrate is needed to monitor activity. From crystal structures it is known that GFP adopts a relatively uncomplicated “β-can” structure [183,184]. The internal fluorophore is composed of a Serine-Tyrosine-Glycine sequence positioned near the N-terminus. GFP is a valuable reporter protein in E. coli for a number of reasons: (i) its ability to fold and fluoresce, also when fused to another protein, (ii) it folds only in the cytoplasm of E. coli (the reason for this is not known) [182], (iii) it folds and fluoresces only when the fusion partner is fully folded [185], (iv) it is a quantitative technique since one fluorescent GFP molecule equals one folded fusion protein. These features apply to both soluble proteins as well as MPs. As a consequence, when GFP is C-terminally fused to a MP, it can be used as a reporter for portions of the protein located to the cytoplasm [182]. Furthermore it provides a rough quantitative measurement of functional expression levels (correctly targeted and membrane inserted) of the fused MP [186,187].

32 Bioinformatics - a Theoretical Approach to Study MPs

Today, genome sequencing projects provide the scientific community with an ever-increasing number of predicted protein sequences. The majority of these sequences are experimentally uncharacterized. This, however, does not mean that we do not know anything about them. Numerous types of computer-based methods exist, that can provide whole proteomes with an initial characterization regarding the fraction of MPs, their function and predicted topologies. This initial characterization step often provides the basis for further experimental analysis.

Sequence comparisons of proteins In many instances it is possible to obtain a great deal of information about a protein regarding its topology, which protein family it belongs to and possible function, even in the complete absence of experimental information. This is due to the fact that proteins with similar functions often exhibit some degree of sequence similarity due to common evolutionary origin. This knowledge is utilized in different types of sequence comparisons, where the sequence of the protein of interest (query sequence) is compared with other protein sequences in order to obtain information regarding homology (fig 14).

BLAST BLAST (Basic Local Alignment Search Tool) is a set of widely used similarity search programs that are specialized in performing rapid, but sensitive, large-scale sequence comparisons [188]. The basic BLAST algorithm runs a pairwise comparison between the query sequence and a large number of target sequences and then identifies those alignments (hits) that display a statistically significant local similarity to the query (fig 14).

Figure 14. An illustration of a BLAST run. The BLAST algorithm is specialized in sequence comparisons and utilizes a certain procedure (algorithm) to process the data. The algorithm involves an alignment (two or more sequences that are lined up for comparison) of a query sequence to a database of other sequences. The purpose is to assess the degree of similarity and the possibility of homology between the compared sequences. It is then possible to single out the sequences in the database that display maximum levels of identity (the same amino acids) and conservation (amino acid changes with preserved properties) to the query sequence. These sequences are referred to as “hits”. The cut-off is chosen depending on the purpose of the BLAST run.

33 e-value To assess whether a given alignment constitutes evidence for homology, one needs to know how frequently this degree of sequence similarity can be expected by chance alone when the database is searched. For this reason, each hit is given a score (S) that allows it to be compared with other hits. This score is then put through a statistical analysis that tries to estimate how often alignments with a score ≥ S occur in the database search by chance. This analysis yields a so-called e-value (expectation value) (fig 14). The e-value is the number of hits with scores ≥ S expected to occur by chance in the current database. For a hit to be considered significant, the e- value should be << 1. For BLAST, the e-value is specific for a particular query sequence and a particular database.

Pfam Pfam is a manually curated database that contains a broad collection of protein domain families [189]. Today the database contains almost 8000 protein domain families and is streamlined for annotation of gene function and sequence classification [190]. Simply put, a homology search is carried out by running a query sequence against the Pfam database. The query sequence can then be annotated with the same function as the hit proteins.

Prediction methods Experimental characterization in combination with rapid development of computer capability, have led to the development of different types of computer-based prediction methods. For MPs, prediction methods allow us to predict (i) if a protein belongs to the class of α-helical MPs, (ii) topology models, (iii) re-entrant loops, (iv) presence of a signal sequence and (v) organelle localization [191]. Many prediction methods are general and classify any polypeptide chain according to the two, or sometimes three or four, first points. Other methods are more specialized, and search for signal sequences or classify which organelle a protein is targeted to. At this point it should be noted that to date, there is no reliable ab initio 3D prediction method that is publicly available [192,193]. This is largely due to the limited knowledge of the general rules that MPs follow in order to form their final structure. It is therefore currently impossible to construct a general algorithm that works from protein sequence information only.

Topology prediction methods There are a number of different topology prediction methods available today. The foundation for all of these methods is the topological uniformity of MPs (i.e. anti-parallel TMHs buried within the membrane and hydrophilic loops that alternate between a cytoplasmic and extra- cytoplasmic location). All methods rely on hydrophobicity analyses to predict the number of TMHs and most methods use the positive-inside rule to deduce the orientation of the protein relative to the membrane. The aromatic residues Tryptophan and Tyrosine are often incorporated into the algorithm in order to better define the boundaries of the TMHs. The topology prediction methods then generate a topology model that includes: (i) how many TMHs the protein has, (ii) on which side of the membrane the loops and tails are located, (iii) the boundaries of the membrane and non-membrane domains (reviewed in ref [194]).

Comparative evaluation of topology prediction methods Much effort has been put into measuring the level of performance of each individual method, as well as to estimate the difference in performance between methods. This information is important in order to guide the bioinformatician to construct better MP prediction methods, but also so that the user knows to what extent a prediction can be trusted. The estimated success- rate will fluctuate, since it is largely dependent on prediction method used, how the protein test

34 set is selected, the set of MPs that has been used to train the prediction method and which organism that is investigated.

General performance of predictors In studies aimed at comparing different topology predictors, it has become clear that most advanced prediction methods correctly distinguish between MPs and soluble proteins (success- rate 90-98%) while the older or more simple methods often falsely annotate hydrophobic regions within soluble proteins as TMHs, and thus wrongly identify them as MPs [195,196]. One drawback with current prediction methods is their tendency to interpret the hydrophobic parts of a signal sequence as a TMH, and thus incorrectly identify secreted proteins as single- spanning MPs. In addition, there are some proteins for which prediction methods have great difficulties in assigning the correct topology. This can be due to very short hydrophobic stretches, re-entrant lops or a weak over-all charge bias in the protein. Taken together, these problems influence the over-all prediction performance and result in an estimated success-rate which commonly ranges from 50% and up to 75% [172,195,197,198]. However, if the prediction accuracy of entire proteomes is considered, the prediction accuracy is more likely to be in the lower regions (50-60%) [199]. Although a success-rate of 100% might seem far away, current prediction methods provide a very good starting point for further structural characterization.

The consensus approach Recently, our lab presented a “consensus approach” to topology prediction [200]. This approach explores the possibility that consensus predictions of MP topology might provide a means to estimate the reliability of a predicted topology model without the need for an experimental input. Instead the reliability of a given topology prediction can be estimated by comparing the predictions from several prediction methods using a simple majority-vote procedure. The approach was tested on a set of 60 E. coli MPs with experimentally determined topologies. Each of the 60 proteins was subjected to five different topology prediction methods (TOPPRED, TMHMM, HMMTOP, MEMSAT and PHD). In line with the hypotheses, it was found that prediction performance varies strongly with the number of methods that agree: i.e. the fraction of correct predictions drops with increasing levels of disagreement between the different methods. When all methods agree, the topology is virtually certain to be correct. When four out of five methods agree, the majority prediction was correct in 10 out of 12 cases (∼80%). Importantly however, the correct topology was always one of the two predicted topologies. This approach can thus be used to pin down the “easy-to-predict” proteins (when at least four out of five methods agree on the prediction) in a proteome and, for these proteins, produce highly reliable topology models without any experimental input. In most organisms the consensus approach is applicable to roughly 10% of the membrane- proteome. Thus, one obvious drawback to the approach is that it does not include all MP in a typical membrane proteome, i.e., it has a rather low coverage.

Consensus predictions in combination with experimental analyses As a next step we combined the consensus approach with limited experimental analyses, in order to produce experimentally based topology models for previously uncharacterized E. coli MPs [201]. The selection of proteins was based on the following criteria: (i) the five prediction methods (listed above) must all agree on the periplasmic or cytoplasmic location of the N-terminus, (ii) one or two of the methods are allowed to disagree with the majority for one TMH, on the over- all number of TMHs. These criteria applied to about 80 MPs. Twelve of these MPs (ranging in size and number of TMHs) were picked for this pilot study. Assuming that the correct topology is one of the two predicted topologies (this is always the case when the analysis was applied to the 60 E. coli inner-membrane proteins in the consensus approach, see above [200]), the correct topology prediction can be pin-pointed once the location

35 of the C-terminus of the protein is known. The C-terminal location for each of the proteins was experimentally deduced using two C-terminal reporter protein fusions; one to PhoA (active in the periplasm) and one to GFP (active in the cytoplasm).

TMHMM TMHMM has been ranked as one of the top-performing topology predictors in several evaluations, with a success rate of 55-75% [172,195,197-199]. In the case of classifying proteins to be soluble proteins or α–helical MPs, TMHMM is also among the top candidates, with a reliability close to 97% [195,196].

The TMHMM architecture This topology predictor is based on a statistical method called a hidden Markov model [172]. The model contains a number of states: helix core, helix cap, loop and globular domain (fig 15A). The TMHMM method predicts the most probable topology for a protein in accordance with these different states. The output of the prediction is given as a probability measure. Each residue within a protein sequence is given a probability score for its likelihood to be present in three different locations: within a TMH (h), on the inside (i) or on the outside (o). This gives rise to three parallel probability measurements for each residue within the protein sequence (fig 15B). The final topology prediction is the path through the predicted probabilities that yields the highest over-all probability score; p(best topology). In addition a sum of p(all possible topologies) can also be calculated [199].

A)

B)

Figure 15. A) TMHMM architecture (figure adapted from [172]). TMHMM is constructed in such a way that the predictions must obey the common topological uniformity of MPs. The model contains a number of states: helix core, helix cap cytoplasmic, cytoplasmic loop, helix cap extra-cytoplasmic, extra- cytoplasmic loop and globular domain. The TMHMM method predicts the most probable topology for a protein in accordance with these different states. B) TMHMM prediction output. To the left: a plot where the entire sequence is labeled according to one of the three classes (h, i or o). To the right: the full prediction where all the probabilities for the different states are shown in numbers. The final topology prediction (underscored) is the path through the predicted probabilities that yields the highest over-all probability score; p(best topology).

36 Reliability score One general drawback to the current topology prediction methods is that the user does not know the degree of reliability of a certain topology prediction. This particular problem has been addressed by a modified version of TMHMM 2.0 [199]. It has been designed such that it provides a reliability score (the S3 score), which indicates how “certain” TMHMM is of the suggested topology. The S3 score is the quotient between p(best topology)/p(all p ossible topologies). A quotient close to 1.00 shows that p(best topology) is nearly as high as p(all possible topologies) and thus indicate that only one topology is likely for this particular protein and hence that the prediction is probably correct. However, if several different topologies are equally likely for a particular protein, the p(best topology) will be quite low, while p(all possible topologies) will be fairly high. In other words, the S3 score is a number describing how straightforward and easy it is to produce the topology model. For example, a MP with four very hydrophobic stretches of approximately 20 residues and a high number of positively charged residues on the N-tail and second loop will yield a prediction of Nin-(4 TMHs)-Cin that is accompanied by a high reliability score. On the other hand, a MP with four strong hydrophobic stretches of approximately 20 residues but with positively charged residues that are evenly scattered between all the loops and tails, represent a protein for which the membrane orientation is not clear-cut. So even though it may be predicted as Nin-(4 TMHs)-Cin, it will not be provided with a high reliability score. The reliability score may thus help in discriminating a correct topology prediction from an incorrect one. As shown by Melén et al. the reliability score is indeed closely related to the actual accuracy of the predicted topology [199]. If a prediction has a score close to 1.00, it means that the prediction is almost certain to be correct. A value around 0.50 on the other hand, means that only 50% of the predictions with this reliability score will be correct. By providing each prediction with a reliability score, TMHMM immediately provides the user with a hint of the correctness of the topology prediction. In addition it can provide a means to measure the reliability of the topology predictions for entire proteomes.

Experimentally constrained predictions As a further improvement, the modified version of TMHMM 2.0 also allows the user to constrain the prediction by experimentally determined reference points. TMHMM will then give a new S3 score for the “fixed” prediction. For a test set of MPs with known topologies it was shown that fixing either the proteins´ N or C terminus will raise the general prediction accuracy with approximately 10%, from an average prediction accuracy of 70% to 80%, while fixing both N- and the C-terminus raises the average prediction accuracy to 90% [199]. Thus by providing TMHMM with a very limited amount of experimental information the prediction performance increases with at least 10%. This is an approach that is highly suitable for a proteomic-scaled topology investigation. Indeed, experimentally based topology models have been obtained for a large number of bacterial and yeast MPs (Papers I and II) [202-204].

37

38 Summary of Papers I and II

Large-scale topology analysis In order to get a complete, experimentally proven topology model for any given protein, the number of fusions that need to be made and analyzed has to be equal to, or larger than the number of TMHs in the protein. Consequently, it is a very time consuming exercise to obtain a detailed topology model and most MPs remain to undergo this characterization step. The topology prediction methods can of course provide a reasonable guide to minimize the number of fusion proteins that have to be made, but each protein still requires a significant and time- consuming experimental effort. We have addressed this dilemma by developing several new approaches to topology modelling. Commonly they all involve bioinformatics prediction methods. The goal is to improve the topology models so that their accuracy becomes comparable to experimentally obtained topology models, but with considerably less experimental effort and with a minimal disruption of the MPs native topology (i.e. no internal fusions). As a consequence of the minimized experimental effort, a significant increase in the efficiency (i.e. number of characterized proteins) can be accomplished. These approaches are thus suitable for large-scale topology investigations and applicable to entire membrane proteomes. As it is extremely difficult to obtain high-resolution information of MPs, the large-scale topology analyses presented in Papers I and II contribute significantly to the structural information that is available today. In fact, a topology model is a common starting-point for further structural and functional characterization of a MP. Proteome-wide studies are also important for other reasons. We can obtain a rough idea of the distribution of MPs families in a certain proteome based on typical topological characteristics (e.g. number of TMHs). Moreover, in the context of a proteomic investigation we can compare common topological themes, as well as detect topological anomalies. We can also spot “white-patches”, i.e. certain groups of MPs for which no experimental information is available.

Experimentally based topology models for E. coli inner MPs (Paper I) This study deals with three issues that are central in large-scale topology analyses: (i) choosing the optimal fusion-point for a reporter protein, (ii) the scaling-up of experimental tools and prediction analyses to suit proteome-wide topology studies, (iii) the production of experimentally based topology models based on a combination of experimental information and topology prediction. We concluded that the C-terminus is the optimal placement for a reporter protein. Further, we fused the reporter protein GFP (only active in the cytoplasm) and PhoA (only active in the periplasm) to the C-terminus of 34 E. coli MPs. Based on the activity profiles of the reporter proteins we assigned a C-terminal location for 31 of the proteins. As a final step we constrained the experimentally deduced C-terminal locations within the framework of TMHMM. Our approach yielded 31 experimentally produced topology models of E. coli MPs.

39 Global topology analyses of the E. coli membrane proteome (Paper II) In this study we applied the same basic approach (C-terminal fusions to GFP and PhoA in combination with TMHMM) as in Paper I, to the major part of the E. coli membrane proteome. We obtain experimentally based topology models for 601 MPs (fig 16). We find that a majority (∼80%) of the proteins have a Cin topology. Interestingly, ∼60% of the proteins are predicted to have both their N- and C-terminus located to the cytoplasm, which indicates that ‘helical hairpins’ are commonly used as building blocks in MP evolution. Moreover, by tabulating the GFP fluorescence intensity, we provide an estimate of the overexpression potential for all the active GFP fusions. Finally we report on a number of evolutionary interesting MPs. This will be further dealt with in the next section.

Figure 16. Experimentally based topology models for 601 E. coli MPs (figure adapted from Paper II). A majority of the proteins have their C-termini located to the cytoplasm, an even number of TMHs and less than 7 TMHs.

40 Summary of Papers II, III and IV

How did proteins with internal inverted repeats evolve? The evolutionary model, whereby an internal gene duplication event leads to the formation of a protein composed of two halves with similar topological and structural arrangements, is quite straightforward. For example, a gene encoding a protein with 6 TMHs may undergo an internal gene duplication event that leads to a protein with 12 TMHs and two structurally related halves (fig 10A). However, the manner in which internal inverted repeats have evolved is less obvious. Based on the work in Papers II, III and IV, we have put forward a scenario for the evolution of proteins with internal inverted repeats and also present alternative structural solutions to how MPs may “solve” the requirement for oppositely oriented domains. Our proposed evolutionary scenario is summarized in the next chapter, while the findings of each of the papers are summarized thereafter.

A step-wise evolutionary path for internal inverted repeats Our proposed evolutionary scenario involves three distinct steps (fig 17). It begins with a MP without a strong charge (K+R) bias, which can adopt dual topologies in the membrane. One dual topology protein may associate with an identical protein molecule that has adopted the opposite topological arrangement. These two proteins may then form an anti-parallel homo- dimer that constitutes the functional unit in the cell. In the next step on the evolutionary path, the gene that codes for the dual topology protein undergoes gene duplication. This results in two homologous genes (i.e. two separately expressed proteins) that are closely spaced on the chromosome. Eventually, through re-distribution of the positively charged residues, the two proteins become fixed in opposite orientations in the membrane. Together, the two homologous proteins form the functional anti-parallel hetero-dimer. In a final step, two genes encoding homologues with opposite orientations in the membrane may fuse into one long polypeptide with an internal inverted repeat structure.

Figure 17. A step-wise evolutionary path for proteins with internal inverted repeats (figure adapted from [205]).

41 Global topology analysis of the E. coli inner membrane proteome (Paper II)

The large-scale topology study of the E. coli membrane proteome enabled us to scan for interesting topological events on a proteomic scale. The following three criteria were applied in the search for dual topology candidates: (i) above-background activities with both the GFP and PhoA reporters, (ii) low content of positively charged residues, (iii) a weak (K+R) bias. The topological characteristics of all five candidates (EmrE, SugE, CrcB, YnfA and YdgC) that emerged from this selection are very similar: they are all quite small (around 100 residues), contain four strongly predicted TMHs and short intervening loops. In addition we were able to detect two instances of homologous proteins that adopt opposite orientations in the membrane: YdgQ/YdgL and YdgE /YdgF. These pairs contain the same number of predicted TMHs but, based on the GFP and PhoA activities, have oppositely assigned C-terminal locations.

Identification and evolution of dual-topology proteins (Paper III) By experimental alterations of the positively charge residues, we showed that the dual topology candidates identified in Paper II are very sensitive to changes in charge distribution, compared to topologically “stable” proteins with pronounced (K+R) biases. This provides strong support for the dual topology model. Further, we searched all sequenced bacterial genomes for homologues to the E. coli dual topology candidates. We found an interesting pattern where dual topology homologues are commonly encoded by only one gene in the bacterial chromosome, while pairs of homologues with opposite topologies are encoded by two closely spaced genes. This pattern strengthens the notion of a step-wise evolutionary path that begins with a dual topology protein, which undergoes a gene duplication to form two separately expressed proteins. Finally, we identified a family (DUF606) that contains dual topology candidates, homologues with opposite topology as well as proteins with internal inverted repeats (fig 18). In accordance with the step-wise evolutionary model described above, this family seems to have members that represent all three different ways by which the requirement for an anti-parallel dimer can be fulfilled.

pairs singletons N internal inverted repeats

Figure 18. Identification of a protein family, DUF606, which contains dual topology candidates (singletons with a charge distribution around zero), homologues with opposite topology (pairs with opposite charge biases) and proteins with internal inverted repeats (example shown in insert picture) (figure adapted from Paper III). This family thus seems to have members that represent all three different ways by which the requirement for an anti-parallel dimer can be fulfilled.

42 Emulating membrane protein evolution by rational design (Paper IV) In this paper we re-created our proposed step-wise evolutionary scenario in the laboratory. Using the dual topology candidate EmrE as our model protein, we performed a small number of mutations that altered its (K+R) bias. In this way we were able to construct two homologous proteins that adopt opposite topologies in the membrane.

Background Several of the interesting proteins identified in Paper II belong to the same protein family. EmrE, as well as SugE, and the YdgE/YdgF pair, all belong to the small multidrug resistance (SMR) family of transporters. This family of proteins catalyses of substances that is toxic to the cell. Commonly the substrates are small cationic hydrophobic drugs that can be taken up by the protein either laterally from the membrane or from the cytoplasm. In exchange for two protons the drug is expelled to the extra-cellular side of the membrane. EmrE has been thoroughly examined and serves as the paradigm SMR protein (reviewed in ref [206]). It seems as if EmrE can adopt dual topologies, due to its low content and even distribution of positively charged residues (fig 20). One Nin-Cin molecule can associate with a Nout-Cout molecule and thereby form a functional anti-parallel homodimer. Three separate 3D structures are available for EmrE [207-209]. However the structures are mutually inconsistent and the authenticity of the two X-ray structures is debated [210,211].

Results We performed step-wise alterations of the positively charged residues of EmrE in order to construct one EmrE(Nin-Cin) and one EmrE(Nout-Cout) molecule (fig 20). As a next step we re- cloned these “topologically fixed” EmrE versions into a vector, which allowed us to express them alone or in pairs. By utilizing an in vivo assay, where cells are grown in the presence of ethidium bromide, we were able to show that expression of either EmrE(Nin-Cin) or EmrE(Nout- Cout) did not render cells resistant to ethidium bromide. However, when co-expressed, cell growth was restored to wild type levels. In addition, by “separating” the EmrE homo-dimer into an EmrE(Nin-Cin) and an EmrE(Nout-Cout) molecule, we were able to search for residues that are necessary in one but not both of the EmrE monomers in the dimer. We focused on Glutamate 14, a residue that has been suggested to be part of a binding domain shared by both substrates and protons, and thus essential for proper EmrE function (reviewed in ref [212]). We show that, when co-expressed, EmrE is still functional when one Glutamate is replaced by Aspartate in either EmrE(Nin-Cin)- or EmrE(Nout-Cout), but not when both Glutamates are mutated to Aspartate.

Figure 20. Two EmrE versions with fixed topologies were constructed by altering the number and placements of a small number of positively charged residues (figure adapted from Paper IV). The positively charged residues present in wild type EmrE are indicated in black, while the altered residues are shown as open circles. EmrE(Nin-Cin) has a (K+R) bias of +1, and EmrE(Nout-Cout) has a (K+R) bias of -5. The (K+R) bias of wild type EmrE is -2, which puts it right in between the (K+R) biases of the two ‘topologically fixed’ EmrE versions.

43 Discussion Our results indicate that instances of homologous proteins with opposite orientations as well as dual topology proteins are quite rare. However, proteins with internal inverted repeats are fairly common. The reason for this may be that protein folding and assembly is more efficient when the two oppositely oriented halves are part of a single polypeptide chain than when they are expressed separately. The instances of dual topology proteins and homologues with opposite orientations that we do find today might thus represent proteins for which the fusion into a single long polypeptide-chain presents a problem. Hence, it might not be a coincidence that we find a high number of oppositely oriented proteins with an even number of TMHs, since these require the de novo creation of an intervening TMH in order to produce a single long polypeptide with internal inverted repeats. Another interesting question is how dual topologies are produced. At present, there are no studies that directly investigate the mechanism by which proteins adopt dual topologies in the membrane. One can, however picture at least two different scenarios. The first one relates to the importance of the positive-inside rule on the orientation of MPs. Since the dual topology proteins have no strong charge-bias, EmrE may follow a random insertion pattern, where a fraction of the EmrE polypeptides insert in one orientation (Nin-Cin) and the remaining fraction in the opposite orientation (Nout-Cout). In the light of our findings, this scenario seems very likely. However, a second scenario that involves some kind of topology flipping mechanism cannot be ruled out at this stage. One could imagine that, initially, all EmrE molecules insert in one orientation and then a fraction of the molecules (perhaps aided by specific chaperones) flip over to the other side.

44 Acknowledgements

In the last few years when I have worked as PhD student in the “GvH group” at DBB, I have enjoyed (almost) every day at work. So even though I´m happy to have come to this part of my education, it´s not only with a feeling of happiness that I now write these last few words in my thesis. I also feel a sense of lost and a bit of sorrow, since something very good has come to an end. Gunnar, it is very hard to in just a few words sum up the years that we have worked together. But I´ll try. Not many people are as fortunate to get a supervisor that not only contains an endless amount of information and knowledge, but also is so much fun to work with. Thank you for being a good friend and a great boss. Stefan, always interested in science as well as personal issues. It´s good to know that you´re always there for us! Susanna, you are the best co-worker one could ever get! You are talented in so many different areas. I´m happy you have chosen do devote your intelligence and hard work to our lab and our projects. I appreciate all our laughs, discussions and help from you. We are a very good team and I´m looking forward to work next to you in the lab sometime soon. Marie, you should never trust your first instincts. When we first got to know each other we didn´t like one another. Today we are best friends and have done a lot together! I have enjoyed the skiing, the job talks, the baby talks, the training- and study sessions. I´m sure we´ll end up working together in the future too… Marika, always ready to listen and talk. You´re the most empathic and carrying persons that I know. The fact that you were a student in the GvH lab and always had time for a “fika,” was a major reason for me to start at DBB. Dan, although we´ve had an endless number of discussions throughout the years, the outcome of our work together has been excellent. So in some strange way, maybe we complement each other?! Also, I would like to thank you for taking the time to read through my thesis. Tara, it has been good to have you as a PhD companion! I´ve got a lot of good memories from the every day work in the lab, our discussions and the traveling we´ve done together. Louise, also a very good PhD companion, with fantastic organizing skills! What would we do without you? Good luck with your thesis. Ing-Marie, why have mutant-lists on computers when you can have direct access to the “Ingis supermind”? Thanks for all the help through the years. David D, always happy and enthusiastic. It has been great working with you. Joy, you are one of the cornerstones in the GvH lab. Good luck with your future plans. Filippa and Carolina, more clever girls! It´s good to know you´ll keep up the good spirit in the lab. Yoko and Mirjam, our new post-docs. Keep up the good work! Jan- Willem, David W and Sam, enthusiastic scientists! It has been a lot of fun working with you! To Anki, Kicki, Bogos, Ann, Inger and everyone else at DBB: Thanks for making it such a nice place to work at. En stor kram till alla mina vänner utanför labbet! En extra stor kram till Malin! Även om du bor i Australien så är du alltid nära. Du är en mycket speciell person för mig. Din livsglädje smittar! Zandra, min barndomsvän! I alla år har vi följt varandra i livet: genom skolår, pojkvänner, resor och barn. Det är skönt att alltid veta att vi finns där för varandra. Och ett stort tack till min familj…Thomas, vi har haft åtta fantastiska år tillsammans med massor av kul resor, klättring, husköp och nu också barn. Jag tycker att jag haft en otrolig tur i livet som träffat dig och ser fram emot alla roliga saker vi har kvar att göra tillsammans. Jag älskar dig! Wilma, trots alla dina ansträngningar att slita ut alla papper i mitt arkiv hemma i arbetsrummet så har jag alltså lyckats skriva denna avhandling… Du är en underbar personlighet och du berikar mitt liv. Dina kiknande skratt och roliga grimaser får mig att

45 glömma allt annat och bara vara i nuet. Aldrig trodde jag att man skulle kunna älska någon så mycket! Mamma, du har alltid haft en otrolig tro på min förmåga. Du stöttar alla min beslut och finns alltid där när jag behöver en pratstund (eller nånstans att bo). Jag hoppas jag ska lyckas vara en lika bra mamma till mina barn. Jag älskar dig. Pappa och Denise, you are always supportive and interested in my work. By always welcoming us to your home (and your car…) I feel like I have two homes. Love you! Jessica, älskade lille-syrra! Det är bara du och jag som fattar våra (roliga?) skämt, men det gör inget. Jag är så glad att du, Jörgen och Emil finns där varje dag i livet. Christopher, många kramar till brorsan! Det är alltid lika kul att ses, även om det inte blir så ofta nu när du bor i Kalmar med Jossan.

46 References

1. Koshland DE, Jr.: Special essay. The seven pillars of life. Science 2002, 295:2215- 2216. 2. Wallin E, von Heijne G: Genome-wide analysis of integral membrane proteins from eubacterial, archaean, and eukaryotic organisms. Protein Sci 1998, 7:1029-1038. 3. Vellai T, Vida G: The origin of eukaryotes: the difference between prokaryotic and eukaryotic cells. Proc Biol Sci 1999, 266:1571-1577. 4. Osborne AR, Rapoport TA, van den Berg B: Protein translocation by the Sec61/SecY channel. Annu Rev Cell Dev Biol 2005, 21:529-550. 5. Luirink J, von Heijne G, Houben E, de Gier JW: Biogenesis of inner membrane proteins in Escherichia coli. Annu Rev Microbiol 2005, 59:329-355. 6. Pohlschroder M, Hartmann E, Hand NJ, Dilks K, Haddad A: Diversity and evolution of protein translocation. Annu Rev Microbiol 2005, 59:91-111. 7. Van den Berg B, Clemons WM, Jr., Collinson I, Modis Y, Hartmann E, Harrison SC, Rapoport TA: X-ray structure of a protein-conducting channel. Nature 2004, 427:36-44. 8. White SH, von Heijne G: The machinery of membrane protein assembly. Curr Opin Struct Biol 2004, 14:397-404. 9. Junne T, Schwede T, Goder V, Spiess M: The plug domain of yeast Sec61p is important for efficient protein translocation, but is not essential for cell viability. Mol Biol Cell 2006, 17:4063-4068. 10. Chamberlain AK, Faham S, Yohannan S, Bowie JU: Construction of helix-bundle membrane proteins. Adv Protein Chem 2003, 63:19-46. 11. White SH, von Heijne G: Transmembrane helices before, during, and after insertion. Curr Opin Struct Biol 2005, 15:378-386. 12. Popot JL, Engelman DM: Helical membrane protein folding, stability, and evolution. Annu Rev Biochem 2000, 69:881-922. 13. Bowie JU: Solving the membrane protein folding problem. Nature 2005, 438:581-589. 14. Popot J-L, Gerchman S-E, Engelman DM: Refolding of bacteriorhodopsin in lipid bilayers. A thermodynamically controlled two-stage process. J Mol Biol 1987, 198:655-676. 15. Popot JL, Engelman DM: Membrane protein folding and oligomerization - The 2-stage model. Biochemistry 1990, 29:4031-4037. 16. Higy M, Junne T, Spiess M: Topogenesis of membrane proteins at the . Biochemistry 2004, 43:12716-12722. 17. White SH, Wimley WC: Membrane protein folding and stability: Physical principles. Annu Rev Biophys Biomol Struc 1999, 28:319-365. 18. White SH, Ladokhin AS, Jayasinghe S, Hristova K: How membranes shape protein structure. J Biol Chem 2001, 276:32395-32398. 19. Booth PJ: Unravelling the folding of bacteriorhodopsin. Biochim Biophys Acta 2000, 1460:4-14.

47 20. Ben-Tal N, Ben-Shaul A, Nicholls A, Honig B: Free energy determinants of alpha-helix insertion into lipid bilayers. Biophys J 1996, 70:1803-1812. 21. Faham S, Yang D, Bare E, Yohannan S, Whitelegge JP, Bowie JU: Side-chain contributions to membrane protein structure and stability. J Mol Biol 2004, 335:297-305. 22. Kramer G, Ramachandiran V, Hardesty B: Cotranslational folding--omnia mea mecum porto? Int J Biochem Cell Biol 2001, 33:541-553. 23. Lu J, Deutsch C: Folding zones inside the ribosomal exit tunnel. Nat Struct Mol Biol 2005, 12:1123-1129. 24. Lu J, Deutsch C: Secondary structure formation of a transmembrane segment in Kv channels. Biochemistry 2005, 44:8230-8243. 25. Chen HF, Kendall DA: Artificial transmembrane segments - Requirements for stop transfer and polypeptide orientation. J Biol Chem 1995, 270:14115- 14122. 26. Nilsson I, von Heijne G: Breaking the camel's back: Proline-induced turns in a model transmembrane helix. J Mol Biol 1998, 284:1185-1189. 27. Bowie JU: Helix packing angle preferences. Nat Struct Biol 1997, 4:915-917. 28. Sakaguchi M, Tomiyoshi R, Kuroiwa T, Mihara K, Omura T: Functions of signal and signal-anchor sequences are determined by the balance between the hydrophobic segment and the N-terminal charge. Proc Natl Acad Sci USA 1992, 89:16-19. 29. Wahlberg JM, Spiess M: Multiple determinants direct the orientation of signal- anchor proteins: the topogenic role of the hydrophobic signal domain. J Cell Biol 1997, 137:555-562. 30. Rosch K, Naeher D, Laird V, Goder V, Spiess M: The topogenic contribution of uncharged amino acids on signal sequence orientation in the endoplasmic reticulum. J Biol Chem 2000, 275:14916-14922. 31. Ulmschneider MB, Sansom MSP: Amino acid distributions in integral membrane protein structures. Biochim Biophys Acta 2001, 1512:1-14. 32. Beuming T, Weinstein H: A knowledge-based scale for the analysis and prediction of buried and exposed faces of proteins. Bioinformatics 2004, 20:1822-1835. 33. Hessa T, Kim H, Bihlmaier K, Lundin C, Boekel J, Andersson H, Nilsson I, White SH, von Heijne G: Recognition of transmembrane helices by the endoplasmic reticulum translocon. Nature 2005, 433:377-381. 34. Higy M, Gander S, Spiess M: Probing the environment of signal-anchor sequences during topogenesis in the endoplasmic reticulum. Biochemistry 2005, 44:2039-2047. 35. Wimley WC, White SH: Experimentally determined hydrophobicity scale for proteins at membrane interfaces. Nature Struct Biology 1996, 3:842-848. 36. Hessa T, White SH, von Heijne G: Membrane insertion of a potassium-channel voltage sensor. Science 2005, 307:1427. 37. Jiang Y, Ruta V, Chen J, Lee A, MacKinnon R: The principle of gating charge movement in a voltage-dependent K+ channel. Nature 2003, 423:42-48. 38. Lee SY, Lee A, Chen J, MacKinnon R: Structure of the KvAP voltage-dependent K+ channel and its dependence on the lipid membrane. Proc Natl Acad Sci USA 2005, 102:15441-15446. 39. Killian JA, von Heijne G: How proteins adapt to a membrane-water interface. Trends Biochem Sci 2000, 25:429-434.

48 40. Liang J, Adamian L, Jackups R, Jr.: The membrane-water interface region of membrane proteins: structural bias and the anti-snorkeling effect. Trends Biochem Sci 2005, 30:355-357. 41. Denzer AJ, Nabholz CE, Spiess M: Transmembrane orientation of signal-anchor proteins is affected by the folding state but not the size of the N-terminal domain. EMBO J 1995, 14:6311-6317. 42. Kida Y, Sakaguchi M, Fukuda M, Mikoshiba K, Mihara K: Membrane topogenesis of a type I signal-anchor protein, mouse synaptotagmin II, on the endoplasmic reticulum. J Cell Biology 2000, 150:719-729. 43. Wallin E, von Heijne G: Properties of N-terminal tails in G-protein coupled receptors - A statistical study. Protein Engineer 1995, 8:693-698. 44. Whitley P, Zander T, Ehrmann M, Haardt M, Bremer E, von Heijne G: Sec- independent translocation of a 100-residues long periplasmic N-terminal tail in the E. coli inner membrane proteins ProW. EMBO J 1994, 13:4653- 4661. 45. Cao GQ, Dalbey RE: Translocation of N-terminal tails across the plasma membrane. EMBO J 1994, 13:4662-4669. 46. Granseth E, von Heijne G, Elofsson A: A study of the membrane-water interface region of membrane proteins. J Mol Biol 2005, 346:377-385. 47. Blobel G, Dobberstein B: Transfer of proteins across membranes. I. Presence of proteolytically processed and unprocessed nascent immunoglobulin light chains on membrane-bound ribosomes of murine myeloma. J Cell Biol 1975, 67:835-851. 48. Blobel G, Dobberstein B: Transfer to proteins across membranes. II. Reconstitution of functional rough microsomes from heterologous components. J Cell Biol 1975, 67:852-862. 49. Martoglio B, Dobberstein B: Signal sequences: more than just greasy peptides. Trends Cell Biol 1998, 8:410-415. 50. von Heijne G: Analysis of the distribution of charged residues in the N-terminal region of signal sequences: implications for protein export in prokaryotic and eukaryotic cells. EMBO J 1984, 3:2315-2318. 51. von Heijne G: Net N-C charge imbalance may be important for signal sequence function in bacteria. J Mol Biol 1986, 192:287-290. 52. von Heijne G, Gavel Y: Topogenic signals in integral membrane proteins. Eur J Biochem 1988, 174:671-678. 53. Gavel Y, Steppuhn J, Herrmann R, von Heijne G: The positive-inside rule applies to thylakoid membrane proteins. FEBS Lett 1991, 282:41-46. 54. Nilsson J, Persson B, von Heijne G: Comparative analysis of amino acid distributions in integral membrane proteins from 107 genomes. Proteins 2005, 60:606-616. 55. Sipos L, von Heijne G: Predicting the topology of eukaryotic membrane proteins. Eur J Biochem 1993, 213:1333-1340. 56. Gafvelin G, Sakaguchi M, Andersson H, von Heijne G: Topological rules for membrane protein assembly in eukaryotic cells. J Biol Chem 1997, 272:6119- 6127. 57. von Heijne G: Control of topology and mode of assembly of a polytopic membrane protein by positively charged residues. Nature 1989, 341:456- 458.

49 58. Nilsson IM, von Heijne G: Fine-tuning the topology of a polytopic membrane protein. Role of positively and negatively charged residues. Cell 1990, 62:1135-1141. 59. Beltzer JP, Fiedler K, Fuhrer C, Geffen I, Handschin C, Wessels HP, Spiess M: Charged residues are major determinants of the transmembrane orientation of a signal-anchor sequence. J Biol Chem 1991, 266:973-978. 60. Kida Y, Morimoto F, Mihara K, Sakaguchi M: Function of positive charges following signal-anchor sequences during translocation of the N-terminal domain. J Biol Chem 2006, 281:1152-1158. 61. Andersson H, Bakker E, von Heijne G: Different positively charged amino acids have similar effects on the topology of a polytopic transmembrane protein in Escherichia coli. J Biol Chem 1992, 267:1491-1495. 62. Parks GD, Lamb RA: Role of NH2-terminal positively charged residues in establishing membrane protein topology. J Biol Chem 1993, 268:19101- 19109. 63. Sato M, Hresko R, Mueckler M: Testing the charge difference hypothesis for the assembly of a eucaryotic multispanning membrane protein. J Biol Chem 1998, 273:25203-25208. 64. Nilsson I, Witt S, Kiefer H, Mingarro I, von Heijne G: Distant downstream sequence determinants can control N-tail translocation during protein insertion into the endoplasmic reticulum membrane. J Biol Chem 2000, 275:6207-6213. 65. Goder V, Bieri C, Spiess M: Glycosylation can influence topogenesis of membrane proteins and reveals dynamic reorientation of nascent polypeptides within the translocon. J Cell Biol 1999, 147:257-266. 66. Gafvelin G, von Heijne G: Topological "frustration" in multi-spanning E. coli inner membrane proteins. Cell 1994, 77:401-412. 67. Wessels HP, Spiess M: Insertion of a multispanning membrane protein occurs sequentially and requires only one signal sequence. Cell 1988, 55:61-70. 68. Andersson H, von Heijne G: Sec-dependent and sec-independent assembly of E. coli inner membrane proteins - the topological rules depend on chain length. EMBO J 1993, 12:683-691. 69. Qi HY, Bernstein HD: SecA is required for the insertion of inner membrane proteins targeted by the Escherichia coli signal recognition particle. J Biol Chem 1999, 274:8993-8997. 70. Neumann-Haefelin C, Schafer U, Muller M, Koch HG: SRP-dependent co- translational targeting and SecA-dependent translocation analyzed as individual steps in the export of a bacterial protein. EMBO J 2000, 19:6419- 6426. 71. Whitley P, Gafvelin G, von Heijne G: Sec-independent translocation of the periplasmic N-terminal tail of an E. coli inner membrane protein: Position- specific effects on translocation of positively charged residues and construction of a protein with a C-terminal translocation signal. J Biol Chem 1995, 270:29831-29835. 72. von Heijne G: Sec-independent protein insertion into the inner E. coli membrane: A phenomenon in search of an explanation. FEBS Lett 1994, 346:69-72. 73. Hartmann E, Rapoport TA, Lodish HF: Predicting the orientation of eukaryotic membrane proteins. Proc Natl Acad Sci USA 1989, 86:5786-57890.

50 74. Kiefer D, Hu XT, Dalbey R, Kuhn A: Negatively charged amino acid residues play an active role in orienting the Sec-independent Pf3 coat protein in the Escherichia coli inner membrane. EMBO J 1997, 16:2197-2204. 75. Rutz C, Rosenthal W, Schulein R: A single negatively charged residue affects the orientation of a membrane protein in the inner membrane of Escherichia coli only when it is located adjacent to a transmembrane domain. J Biol Chem 1999, 274:33757-33763. 76. Harley CA, Holt JA, Turner R, Tipper DJ: Transmembrane protein insertion orientation in yeast depends on the charge difference across transmembrane segments, their total hydrophobicity, and its distribution. J Biol Chem 1998, 273:24963-24971. 77. van Klompenburg W, Nilsson IM, von Heijne G, de Kruijff B: Anionic phosholipids are determinants of membrane protein topology. EMBO J 1997, 16:4261-4266. 78. van Dalen A, de Kruijff B: The role of lipids in membrane insertion and translocation of bacterial proteins. Biochim Biophys Acta 2004, 1694:97-109. 79. Mishra VK, Palgunachari MN: Interaction of model class A(1), class A(2) and class Y amphipathic helical peptides with membranes. Biochemistry 1996, 35:11210-11220. 80. Gallusser A, Kuhn A: Initial steps in protein membrane insertion - bacteriophage-M13 procoat protein binds to the membrane surface by electrostatic interaction. EMBO J 1990, 9:2723-2729. 81. Andersson H, von Heijne G: Membrane protein topology: Effects of ΔµΗ + on the translocation of charged residues explain the 'positive inside' rule. EMBO J 1994, 13:2267-2272. 82. Gavel Y, von Heijne G: The distribution of charged amino acids in mitochondrial inner membrane proteins suggests different modes of membrane integration for nuclearly and mitochondrially encoded proteins. Eur J Biochem 1992, 205:1207-1215. 83. von Heijne G: The distribution of positively charged residues in bacterial inner membrane proteins correlates with the trans-membrane topology. EMBO J 1986, 5:3021-3027. 84. Goder V, Spiess M: Molecular mechanism of signal sequence orientation in the endoplasmic reticulum. EMBO J 2003, 22:3645-3653. 85. Puziss JW, Strobel SM, Bassford PJ: Export of maltose-binding protein species with altered charge distribution surrounding the hydrophobic core in Escherichia coli cells harboring prl suppressor mutations. J Bacteriol 1992, 174:92-101. 86. Chamberlain AK, Lee Y, Kim S, Bowie JU: Snorkeling preferences foster an amino acid composition bias in transmembrane helices. J Mol Biol 2004, 339:471-479. 87. Monné M, Nilsson I, Johansson M, Elmhed N, von Heijne G: Positively and negatively charged residues have different effects on the position in the membrane of a model transmembrane helix. J Mol Biol 1998, 284:1177- 1183. 88. Strandberg E, Killian JA: Snorkeling of lysine side chains in transmembrane helices: how easy can it get? FEBS Lett 2003, 544:69-73. 89. Braun P, von Heijne G: The aromatic residues Trp and Phe have different effects on the positioning of a transmembrane helix in the microsomal membrane. Biochemistry 1999, 38:9778-9782.

51 90. de Planque MRR, Bonev BB, Demmers JAA, Greathouse DV, Koeppe RE, Separovic F, Watts A, Killian JA: Interfacial anchor properties of tryptophan residues in transmembrane peptides can dominate over hydrophobic matching effects in peptide-lipid interactions. Biochemistry 2003, 42:5341- 5348. 91. Yau WM, Wimley WC, Gawrisch K, White SH: The preference of tryptophan for membrane interfaces. Biochemistry 1998, 37:14713-14718. 92. Monné M, Hermansson M, von Heijne G: A turn propensity scale for transmembrane helices. J Mol Biol 1999, 288:141-145. 93. Nilsson I, Sääf A, Whitley P, Gafvelin G, Waller C, von Heijne G: Proline-induced disruption of a transmembrane α-helix in its natural environment. J Mol Biol 1998, 284:1165-1175. 94. Sääf A, Hermansson M, von Heijne G: Formation of cytoplasmic turns between two closely spaced transmembrane helices during membrane protein integration into the ER membrane. J Mol Biol 2000, 301:191-197. 95. Hermansson M, Monné M, von Heijne G: Formation of helical hairpins during membrane protein integration into the endoplasmic reticulum membrane. Role of the N and C-terminal flanking regions. J Mol Biol 2001, 313:1171- 1179. 96. Ota K, Sakaguchi M, Hamasaki N, Mihara K: Membrane integration of the second transmembrane segment of band 3 requires a closely apposed preceding signal-anchor sequence. J Biol Chem 2000, 275:29743-29748. 97. Heinrich SU, Rapoport TA: Cooperation of transmembrane segments during the integration of a double-spanning protein into the ER membrane. EMBO J 2003, 22:3654-3663. 98. Hermansson M, von Heijne G: Inter-helical hydrogen bond formation during membrane protein integration into the ER membrane. J Mol Biol 2003, 334:803-809. 99. Meindl-Beinker NM, Lundin C, Nilsson I, White SH, von Heijne G: Asn- and Asp- mediated interactions between transmembrane helices during translocon- mediated membrane protein assembly. EMBO Rep 2006. 100. Oberai A, Ihm Y, Kim S, Bowie JU: A limited universe of membrane protein families and folds. Protein Sci 2006, 15:1723-1734. 101. Liu Y, Gerstein M, Engelman DM: Transmembrane protein domains rarely use covalent domain recombination as an evolutionary mechanism. Proc Natl Acad Sci USA 2004, 101:3495-3497. 102. Shimizu T, Mitsuke H, Noto K, Arai M: Internal gene duplication in the evolution of prokaryotic transmembrane proteins. J Mol Biol 2004, 339:1- 15. 103. Abramson J, Smirnova I, Kasho V, Verner G, Kaback HR, Iwata S: Structure and mechanism of the lactose permease of Escherichia coli. Science 2003, 301:610-615. 104. Murakami S, Nakashima R, Yamashita E, Yamaguchi A: Crystal structure of bacterial multidrug efflux transporter AcrB. Nature 2002, 419:587-593. 105. Huang Y, Lemieux MJ, Song J, Auer M, Wang DN: Structure and mechanism of the glycerol-3-phosphate transporter from Escherichia coli. Science 2003, 301:616-620. 106. Yin Y, He X, Szewczyk P, Nguyen T, Chang G: Structure of the multidrug transporter EmrD from Escherichia coli. Science 2006, 312:741-744.

52 107. Sääf A, Baars L, von Heijne G: The internal repeats in the Na+/Ca2+ exchanger- related Escherichia coli protein YrbG have opposite membrane topologies. J Biol Chem 2001, 276:18905–18907. 108. Dutzler R, Campbell EB, Cadene M, Chait BT, MacKinnon R: X-ray structure of a ClC chloride channel at 3.0 Å reveals the molecular basis of anion selectivity. Nature 2002, 415:287-294. 109. Harries WE, Akhavan D, Miercke LJ, Khademi S, Stroud RM: The channel architecture of aquaporin 0 at a 2.2-Å resolution. Proc Natl Acad Sci USA 2004, 101:14045-14050. 110. Fu D, Libson A, Miercke LJ, Weitzman C, Nollert P, Krucinski J, Stroud RM: Structure of a glycerol-conducting channel and the basis for its selectivity. Science 2000, 290:481-486. 111. Riek RP, Rigoutsos I, Novotny J, Graham RM: Non-alpha-helical elements modulate polytopic membrane protein architecture. J Mol Biol 2001, 306:349-362. 112. Rigoutsos I, Riek P, Graham RM, Novotny J: Structural details (kinks and non- alpha conformations) in transmembrane helices are intrahelically determined and can be predicted by sequence pattern descriptors. Nucleic Acids Res 2003, 31:4625-4631. 113. Yohannan S, Faham S, Yang D, Whitelegge JP, Bowie JU: The evolution of transmembrane helix kinks and the structural diversity of G protein- coupled receptors. Proc Natl Acad Sci USA 2004, 101:959-963. 114. von Heijne G: Proline kinks in transmembrane α-helices. J. Mol. Biol. 1991, 218:499-503. 115. Reiersen H, Rees AR: The hunchback and its neighbours: proline as an environmental modulator. Trends Biochem Sci 2001, 26:679-684. 116. Yohannan S, Yang D, Faham S, Boulting G, Whitelegge J, Bowie JU: Proline substitutions are not easily accommodated in a membrane protein. J Mol Biol 2004, 341:1-6. 117. Sansom MSP, Weinstein H: Hinges, swivels and switches: the role of prolines in signalling via transmembrane alpha-helices. Trends Pharm Sci 2000, 21:445- 451. 118. Tieleman DP, Shrivastava IH, Ulmschneider MR, Sansom MS: Proline-induced hinges in transmembrane helices: possible roles in ion channel gating. Proteins 2001, 44:63-72. 119. Wigley WC, Corboy MJ, Cutler TD, Thibodeau PH, Oldan J, Lee MG, Rizo J, Hunt JF, Thomas PJ: A protein sequence that can encode native structure by disfavoring alternate conformations. Nat Struct Biol 2002, 9:381-388. 120. Luecke H, Schobert B, Richter H-T, Cartaillr J-P, Lanyi J: Structure of bacteriorodopsin at 1.55 Å resolution. J Mol Biol 1999, 291:899-911. 121. Duneau JP, Garnier N, Genest M: Insight into signal transduction: structural alterations in transmembrane helices probed by multi-1 ns molecular dynamics simulations. J Biomol Struct Dyn 1997, 15:555-572. 122. Ostermeier C, Harrenga A, Ermler U, Michel H: Structure at 2.7 A resolution of the Paracoccus denitrificans two-subunit cytochrome c oxidase complexed with an antibody FV fragment. Proc Natl Acad Sci USA 1997, 94:10547- 10553. 123. Fleming KG, Engelman DM: Computation and mutagenesis suggest a right- handed structure for the synaptobrevin transmembrane dimer. Proteins 2001, 45:313-317.

53 124. Zhou FX, Cocco MJ, Russ WP, Brunger AT, Engelman DM: Interhelical hydrogen bonding drives strong interactions in membrane proteins. Nature Struct Biol 2000, 7:154-160. 125. Zhou FX, Merianos HJ, Brunger AT, Engelman DM: Polar residues drive association of polyleucine transmembrane helices. Proc Natl Acad Sci USA 2001, 98:2250-2255. 126. Dawson JP, Melnyk RA, Deber CM, Engelman DM: Sequence context strongly modulates association of polar residues in transmembrane helices. J Mol Biol 2003, 331:255-262. 127. Partridge AW, Therien AG, Deber CM: Missense mutations in transmembrane domains of proteins: phenotypic propensity of polar residues for human disease. Proteins 2004, 54:648-656. 128. Jiang L, Lai L: CH...O hydrogen bonds at protein-protein interfaces. J Biol Chem 2002, 277:37732-37740. 129. Scheiner S, Kar T, Gu Y: Strength of the Calpha H..O hydrogen bond of amino acid residues. J Biol Chem 2001, 276:9832-9837. 130. Senes A, Ubarretxena-Belandia I, Engelman DM: The Calpha ---H...O hydrogen bond: a determinant of stability and specificity in transmembrane helix interactions. Proc Natl Acad Sci USA 2001, 98:9056-9061. 131. Yohannan S, Faham S, Yang D, Grosfeld D, Chamberlain AK, Bowie JU: A C alpha-H...O hydrogen bond in a membrane protein is not stabilizing. J Am Chem Soc 2004, 126:2284-2285. 132. Sahin-Tóth M, Dunten RL, Gonzales A, Kaback HR: Functional interactions between putative intramembrane charged residues in the lactose permease of Escherichia coli. Proc Natl Acad Sci USA 1992, 89:10547-10551. 133. Donohue PJ, Sainz E, Akeson M, Kroog GS, Mantey SA, Battey JF, Jensen RT, Northup JK: An aspartate residue at the extracellular boundary of TMII and an arginine residue in TMVII of the gastrin-releasing peptide receptor interact to facilitate heterotrimeric G protein coupling. Biochemistry 1999, 38:9366-9372. 134. Hall JA, Fann MC, Maloney PC: Altered substrate selectivity in a mutant of an intrahelical salt bridge in UhpT, the sugar phosphate carrier of Escherichia coli. J Biol Chem 1999, 274:6148-6153. 135. Ellgaard L: Catalysis of disulphide bond formation in the endoplasmic reticulum. Biochem Soc Trans 2004, 32:663-667. 136. Kadokura H, Katzen F, Beckwith J: Protein disulfide bond formation in prokaryotes. Annu Rev Biochem 2003, 72:111-135. 137. Marti T: Refolding of bacteriorhodopsin from expressed polypeptide fragments. J Biol Chem 1998, 273:9312-9322. 138. Zen KH, Mckenna E, Bibi E, Hardy D, Kaback HR: Expression of lactose permease in contiguous fragments as a probe for membrane-spanning domains. Biochemistry 1994, 33:8198-8206. 139. Schmidt-Rose T, Jentsch TJ: Reconstitution of functional voltage-gated chloride channels from complementary fragments of CLC-1. J Biol Chem 1997, 272:20515-20521. 140. Wilkinson B, Esnault Y, Craven R, Skiba F, Fieschi J, Kepes F, Stirling C: Molecular architecture of the ER translocase probed by chemical crosslinking of Sss1p to complementary fragments of Sec61p. EMBO J 1997, 16:4549-4559. 141. Bowie JU: Helix packing in membrane proteins. J Mol Biol 1997, 272:780-789.

54 142. Ben-Tal N, Honig B: Helix-helix interactions in lipid bilayers. Biophys J 1996, 71:3046-3050. 143. Walters RF, Degrado WF: Helix-packing motifs in membrane proteins. Proc Natl Acad Sci USA 2006. 144. Sparr E, Ash WL, Nazarov PV, Rijkers DT, Hemminga MA, Tieleman DP, Killian JA: Self-association of transmembrane alpha-helices in model membranes: importance of helix orientation and role of hydrophobic mismatch. J Biol Chem 2005, 280:39324-39331. 145. Viklund H, Granseth E, Elofsson A: Structural classification and prediction of reentrant regions in alpha-helical transmembrane proteins: application to complete genomes. J Mol Biol 2006, 361:591-603. 146. Zhou Y, Morais-Cabral JH, Kaufman A, MacKinnon R: Chemistry of ion coordination and hydration revealed by a K+ channel-Fab complex at 2.0 Å resolution. Nature 2001, 414:43-48. 147. Mitsuoka K, Murata K, Walz T, Hirai T, Agre P, Heymann JB, Engel A, Fujiyoshi Y: The structure of aquaporin-1 at 4.5-A resolution reveals short alpha- helices in the center of the monomer. J Struct Biol 1999, 128:34-43. 148. Sato Y, Ariyoshi N, Mihara K, Sakaguchi M: Topogenesis of NHE1: direct insertion of the membrane loop and sequestration of cryptic glycosylation and processing sites just after TM9. Biochem Biophys Res Commun 2004, 324:281-287. 149. Umigai N, Sato Y, Mizutani A, Utsumi T, Sakaguchi M, Uozumi N: Topogenesis of two transmembrane type K+ channels, Kir 2.1 and KcsA. J Biol Chem 2003, 278:40373-40384. 150. Langosch D, Heringa J: Interaction of transmembrane helices by a knobs-into- holes geometry reminiscent of soluble coiled coils. PROTEINS: Struct Funct Genet 1998, 31:150-159. 151. Walther D, Eisenhaber F, Argos P: Principles of helix-helix packing in proteins: the helical lattice superposition model. J Mol Biol 1996, 255:536-553. 152. Senes A, Engel DE, DeGrado WF: Folding of helical membrane proteins: the role of polar, GxxxG-like and proline motifs. Curr Opin Struct Biol 2004, 14:465-479. 153. Kim S, Jeon TJ, Oberai A, Yang D, Schmidt JJ, Bowie JU: Transmembrane glycine zippers: physiological and pathological roles in membrane proteins. Proc Natl Acad Sci USA 2005, 102:14278-14283. 154. Liu Y, Engelman DM, Gerstein M: Genomic analysis of membrane protein families: abundance and conserved motifs. Genome Biol 2002, 3:research0054. 155. Russ WP, Engelman DM: The GxxxG motif: a framework for transmembrane helix-helix association. J Mol Biol 2000, 296:911-919. 156. Killian JA: Hydrophobic mismatch between proteins and lipids in membranes. Biochim Biophys Acta 1998, 1376:401-416. 157. Jensen MO, Mouritsen OG: Lipids do influence protein function-the hydrophobic matching hypothesis revisited. Biochim Biophys Acta 2004, 1666:205-226. 158. Helms V: Attraction within the membrane - Forces behind transmembrane protein folding and supramolecular complex assembly. EMBO Rep 2002, 3:1133-1138. 159. Park SH, Opella SJ: Tilt angle of a trans-membrane helix is determined by hydrophobic mismatch. J Mol Biol 2005, 350:310-318.

55 160. Ren J, Lew S, Wang Z, London E: Transmembrane orientation of hydrophobic α-helices is regulated both by the relationship of helix length to bilayer thickness and by the cholesterol concentration. Biochemistry 1997, 36:10213- 10220. 161. Yano Y, Matsuzaki K: Measurement of thermodynamic parameters for hydrophobic mismatch 1: self-association of a transmembrane helix. Biochemistry 2006, 45:3370-3378. 162. de Planque MR, Greathouse DV, Koeppe RE, 2nd, Schafer H, Marsh D, Killian JA: Influence of lipid/peptide hydrophobic mismatch on the thickness of diacylphosphatidylcholine bilayers. A 2H NMR and ESR study using designed transmembrane alpha-helical peptides and A. Biochemistry 1998, 37:9333-9345. 163. Mitra K, Ubarretxena-Belandia I, Taguchi T, Warren G, Engelman DM: Modulation of the bilayer thickness of exocytic pathway membranes by membrane proteins rather than cholesterol. Proc Natl Acad Sci USA 2004, 101:4083-4088. 164. Weiss TM, van der Wel PC, Killian JA, Koeppe RE, 2nd, Huang HW: Hydrophobic mismatch between helices and lipid bilayers. Biophys J 2003, 84:379-385. 165. Valiyaveetil FI, Zhou Y, MacKinnon R: Lipids in the structure, folding, and function of the KcsA K+ channel. Biochemistry 2002, 41:10771-10777. 166. Robinson NC: Functional binding of to cytochrome c oxidase. J Bioenerg Biomembr 1993, 25:153-163. 167. Bogdanov M, Heacock PN, Dowhan W: A polytopic membrane protein displays a reversible topology dependent on membrane lipid composition. EMBO J 2002, 21:2107-2116. 168. Wang XY, Bogdanov M, Dowhan W: Topology of polytopic membrane protein subdomains is dictated by membrane phospholipid composition. EMBO J 2002, 21:5673-5681. 169. Zhang W, Bogdanov M, Pi J, Pittard AJ, Dowhan W: Reversible topological organization within a polytopic membrane protein is governed by a change in membrane phospholipid composition. J Biol Chem 2003, 278:50128- 50135. 170. Zhang W, Campbell HA, King SC, Dowhan W: Phospholipids as determinants of membrane protein topology. Phosphatidylethanolamine is required for the proper topological organization of the gamma-aminobutyric acid permease (GabP) of Escherichia coli. J Biol Chem 2005, 280:26032-26038. 171. Dobrindt U: (Patho-)Genomics of Escherichia coli. Int J Med Microbiol 2005, 295:357-371. 172. Krogh A, Larsson B, von Heijne G, Sonnhammer E: Predicting transmembrane protein topology with a hidden Markov model. Application to complete genomes. J Mol Biol 2001, 305:567-580. 173. http://blanco.biomol.uci.edu/Membrane_Proteins_xtal.html. 174. White SH: The progress of membrane protein structure determination. Protein Sci 2004, 13:1948-1949. 175. Manoil C: Analysis of membrane protein topology using alkaline phosphatase and ß-galactosidase gene fusions. Methods in Cell Biol. 1991, 34:61-75. 176. Kimura T, Ohnuma M, Sawai T, Yamaguchi A: Membrane topology of the transposon 10-encoded metal- tetracycline/H+ antiporter as studied by site- directed chemical labeling. J Biol Chem 1997, 272:580-585.

56 177. Canfield VA, Levenson R: Transmembrane organization of the Na,K-ATPase determined by epitope addition. Biochemistry 1993, 32:13782-13786. 178. Chang XB, Hou YX, Jensen TJ, Riordan JR: Mapping of cystic fibrosis transmembrane conductance regulator membrane topology by glycosylation site insertion. J Biol Chem 1994, 269:18572-18575. 179. Wilkinson BM, Critchley AJ, Stirling CJ: Determination of the transmembrane topology of yeast Sec61p, an essential component of the endoplasmic reticulum translocation complex. J Biol Chem 1996, 271:25590-25597. 180. Manoil C, Beckwith J: A genetic approach to analyzing membrane protein topology. Science 1986, 233:1403-1408. 181. Broome-Smith JK, Spratt BG: A vector for the construction of translational fusions to TEM ß-lactamase and the analysis of protein export signals and membrane protein topology. Gene 1986, 49:341-349. 182. Feilmeier BJ, Iseminger G, Schroeder D, Webber H, Phillips GJ: Green fluorescent protein functions as a reporter for protein localization in Escherichia coli. J Bacteriol 2000, 182:4068-4076. 183. Ormo M, Cubitt AB, Kallio K, Gross LA, Tsien RY, Remington SJ: Crystal structure of the Aequorea victoria green fluorescent protein. Science 1996, 273:1392-1395. 184. Yang F, Moss LG, Phillips GN, Jr.: The molecular structure of green fluorescent protein. Nat Biotechnol 1996, 14:1246-1251. 185. Waldo GS, Standish BM, Berendzen J, Terwilliger TC: Rapid protein-folding assay using green fluorescent protein. Nat Biotechnol 1999, 17:691-695. 186. Drew DE, von Heijne G, Nordlund P, de Gier JWL: Green fluorescent protein as an indicator to monitor membrane protein overexpression in Escherichia coli. FEBS Lett 2001, 507:220-224. 187. Drew D, Slotboom DJ, Friso G, Reda T, Genevaux P, Rapp M, Meindl-Beinker NM, Lambert W, Lerch M, Daley DO, et al.: A scalable, GFP-based pipeline for membrane protein overexpression screening and purification. Protein Sci 2005, 14:2011-2017. 188. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol 1990, 215:403-410. 189. Bateman A, Coin L, Durbin R, Finn RD, Hollich V, Griffiths-Jones S, Khanna A, Marshall M, Moxon S, Sonnhammer EL, et al.: The Pfam protein families database. Nucleic Acids Res 2004, 32:D138-141. 190. Finn RD, Mistry J, Schuster-Bockler B, Griffiths-Jones S, Hollich V, Lassmann T, Moxon S, Marshall M, Khanna A, Durbin R, et al.: Pfam: clans, web tools and services. Nucleic Acids Res 2006, 34:D247-251. 191. Elofsson A, von Heijne G: Membrane protein structure: prediction vs. Reality. Ann.Rev.Biochem 2006, submitted. 192. Fleishman SJ, Ben-Tal N: Progress in structure prediction of alpha-helical membrane proteins. Curr Opin Struct Biol 2006, 16:496-504. 193. Fleishman SJ, Unger VM, Ben-Tal N: Transmembrane protein structures without X-rays. Trends Biochem Sci 2006, 31:106-113. 194. Ott CM, Lingappa VR: Integral membrane protein biosynthesis: why topology is hard to predict. J Cell Sci 2002, 115:2003-2009. 195. Möller S, Croning M, Apweiler R: Evaluations of methods for the predictive evaluation of membrane spanning regions. Bioinformatics 2001, 17:646-653. 196. Chen CP, Kernytsky A, Rost B: Transmembrane helix predictions revisited. Protein Sci 2002, 11:2774-2791.

57 197. Ikeda M, Arai M, Lao D, Shimizu T: Transmembrane topology prediction methods: A re-assessment and improvement by a consensus method using a data-set of experimentally characterized transmembrane topologies. In Silico Biol 2001, 2:1-15. 198. Cuthbertson JM, Doyle DA, Sansom MS: Transmembrane helix prediction: a comparative evaluation and analysis. Protein Eng Des Sel 2005, 18:295-308. 199. Melén K, Krogh A, von Heijne G: Reliability measures for membrane protein topology prediction algorithms. J Mol Biol 2003, 327:735-744. 200. Nilsson J, Persson B, von Heijne G: Consensus predictions of membrane protein topology. FEBS Lett 2000, 486:267-269. 201. Drew D, Sjöstrand D, Nilsson J, Urbig T, Chin CN, de Gier JW, von Heijne G: Rapid topology mapping of Escherichia coli inner-membrane proteins by prediction and PhoA/GFP fusion analysis. Proc Natl Acad Sci USA 2002, 99:2690-2695. 202. Kim H, Melén K, von Heijne G: Topology models for 37 Saccharomyces cerevisiae membrane proteins based on C-terminal reporter fusions and prediction. J Biol Chem 2003, 278:10208-10213. 203. Kim H, Melen K, Osterberg M, von Heijne G: A global topology map of the Saccharomyces cerevisiae membrane proteome. Proc Natl Acad Sci USA 2006, 103:11142-11147. 204. Granseth E, Daley DO, Rapp M, Melen K, von Heijne G: Experimentally constrained topology models for 51,208 bacterial inner membrane proteins. J Mol Biol 2005, 352:489-494. 205. Bowie JU: Flip-flopping membrane proteins. Nat Struct Mol Biol 2006, 13:94- 96. 206. Schuldiner S, Granot D, Mordoch SS, Ninio S, Rotem D, Soskin M, Tate CG, Yerushalmi H: Small is mighty: EmrE, a multidrug transporter as an experimental paradigm. News Physiol Sci 2001, 16:130-134. 207. Ubarretxena-Belandia I, Baldwin JM, Schuldiner S, Tate CG: Three-dimensional structure of the bacterial multidrug transporter EmrE shows it is an asymmetric homodimer. EMBO J 2003, 22:6175-6181. 208. Ma C, Chang G: Structure of the multidrug resistance efflux transporter EmrE from Escherichia coli. Proc Natl Acad Sci USA 2004, 101:2852-2857. 209. Pornillos O, Chen YJ, Chen AP, Chang G: X-ray structure of the EmrE multidrug transporter in complex with a substrate. Science 2005, 310:1950- 1953. 210. Tate CG: Comparison of three structures of the multidrug transporter EmrE. Curr Opin Struct Biol 2006, 16:457-464. 211. Soskine M, Mark S, Tayer N, Mizrachi R, Schuldiner S: On parallel and antiparallel topology of an homodimeric multidrug transporter. J Biol Chem 2006. 212. Yerushalmi H, Schuldiner S: A common binding site for substrates and protons in EmrE, an ion-coupled multidrug transporter. FEBS Lett 2000, 476:93-97.

58