<<

GFP as a tool to monitor membrane topology and overexpression in Escherichia coli

2005 David Eric Drew

Doctoral thesis 2005 Department of Biochemistry and Biophysics Stockholm University, S-106 91 Stockholm Sweden

ISBN 91-7155-160-3, pp. 1-65 Intellecta Docusys, Stockholm 2005

All previously published papers are reprinted with permission from the publisher.

2

Table of Contents

Abstract 5 Abbreviations 6 List of publications 8

1. Introduction 9

1.2 Membrane 11 1.2.1 α-helical architecture 12 1.2.2 biogenesis 15 1.2.3 Membrane 16 1.2.4 Membrane proteins and lipids 18

2.1 Membrane protein topology 20 2.1.1 Topology prediction algorithms 20 2.1.2 Reliability of topology prediction 21 2.1.3 Experimental topology mapping 22

2.2 High-throughput topology mapping of E. coli membrane proteins 25 2.2.1 A consensus approach for generating topology models 25 2.2.2 Using GFP as a cytoplasmic membrane protein topology reporter in E. coli 25 2.2.3 Combining C-terminal orientation analysis with a consensus- prediction approach 28 2.2.4 The reliability of topologies generated by a consensus approach 28 2.2.5 Generating topology models by constraining TMHMM 29 2.2.6 Why does GFP work as a topology reporter? 30 2.2.7 Comparing 2D maps to 3D-structures 31 2.2.8 Summary of high-throughput membrane protein topology mapping 31

3.1 Membrane protein overexpression 33 3.1.1 Limited availability of biogenesis factors and/or lipid space may hamper membrane protein overexpression 33 3.1.2 ‘Trial-and-Error’ 34 3.1.3 Choosing a membrane protein overexpression host 35 3.1.4 General strategies for membrane protein overexpression in E. coli 36 3.1.5 The BL21(DE3)pET-system 38 3.1.6 Membrane protein purification 38

3.2 High-throughput membrane protein overexpression in E. coli 40 3.2.1 Inclusion bodies of membrane protein-GFP fusions are not fluorescent 40 3.2.2 GFP tagging works only for membrane proteins with a cytoplasmic C-terminus 42 3.2.3 GFP as a membrane protein folding indicator in whole cells 42

3 3.2.4 GFP-based screen to optimize membrane protein overexpression 43 3.2.5 In-gel GFP fluorescence 44 3.2.6 GFP-based purification pipeline 46 3.2.7 Recovery of membrane proteins from GFP fusions using a site specific protease 46 3.2.8 How does this GFP-based method compare to other high-throughput approaches? 47 3.2.9 Summary of high-throughput membrane protein overexpression 48

4. Characterization of the membrane protein YedZ 49 4.1.1 A test case for the GFP-based purification pipeline: YedZ 49 4.1.2 YedZ is a novel integral membrane flavocytochrome 49 4.1.3 The possible function of YedZ 52

5. Conclusions 55

References 56 Acknowledgements 65

4 Abstract

Membrane proteins are essential for life, and roughly one-quarter of all open reading frames in sequenced genomes code for membrane proteins. Unfortunately, our understanding of membrane proteins lags behind that of soluble proteins, and is best reflected by the fact that only 0.5% of the structures deposited in the protein data-bank (PDB) are of membrane proteins. This discrepancy has arisen because their hydrophobicity - which enables them to exist in a lipid environment - has made them resistant to most traditional approaches used for procuring knowledge from their soluble counter-parts. As such, novel methods are required to facilitate our knowledge acquisition of membrane proteins. In this thesis a generic approach for rapidly obtaining information on membrane proteins from the classic bacterial encyclopedia Escherichia coli is described. We have developed a Green Fluorescent Protein C-terminal tagging approach, with which we can acquire information as to the topology and ‘expressibility’ of membrane proteins in a high-throughput manner. This technology has been applied to the whole E. coli inner membrane proteome, and stands as an important advance for further membrane protein research.

5 Abbreviations

BiP binding protein, Hsp70 C-terminal carboxy-terminal ER endoplasmic reticulum FRET fluorescence resonance energy transfer GFP green fluorescent protein GPCR G-protein coupled receptor HMM hidden Markov model IMAC immobilized metal affinity chromatography IPTG isopropyl-β-D-thiogalactoside Lep signal peptidase I, leader peptidase Mo-MPT molybdenum-molybdopterin N-terminal amino-terminal NR nitrate reductase ORF open reading frame PE phosphatidylethanolamine PhoA alkaline phosphatase Pmf proton motive force SRP signal recognition particle Tat twin arginine translocation TEV tobacco etch virus TMs transmembrane segments UPR unfolding protein response

6 Amino acid designations

Alanine Ala A Cysteine Cys C Aspartic acid Asp D Glutamic acid Glu E Phenylalanine Phe F Glycine Gly G Histidine His H Isoleucine Ile I Lysine Lys K Leucine Leu L Methionine Met M Asparagine Asn N Proline Pro P Glutamine Gln Q Arginine Arg R Serine Ser S Threonine Thr T Valine Val V Tryptophan Trp W Tyrosine Tyr Y

7 List of publications This thesis is based upon the following publications:

Paper I. Drew D, Sjöstrand D, Nilsson J, Urbig T, Chin CN, de Gier JW, von Heijne G. Rapid topology mapping of Escherichia coli inner-membrane proteins by prediction and PhoA/GFP fusion analysis. Proc Natl Acad Sci U S A. 2002 Mar 5;99(5):2690-5.

Paper II. Rapp M, Drew D, Daley DO, Nilsson J, Carvalho T, Melen K, De Gier JW, von Heijne G. Experimentally based topology models for E. coli inner membrane proteins. Protein Sci. 2004 Apr;13(4):937-45.

Paper III. Drew D, von Heijne G, Nordlund P, de Gier JW. Green fluorescent protein as an indicator to monitor membrane protein overexpression in Escherichia coli. FEBS Lett. 2001 Oct 26;507(2):220-4.

Paper IV. Drew D, Slotboom D, Friso G, Reda T, Genevaux P, Rapp M, Meindl-Beinker N, Lambert W, Lerch M, Daley DO, van Wijk KJ, Hirst J, Kunji E, de Gier JW. A scalable, GFP-based pipeline for membrane protein overexpression screening and purification. Protein Sci. 2005 Aug;14(8):2011-7.

Other Publications

Urbanus ML, Fröderberg L, Drew D, Bjork P, de Gier JW, Brunner J, Oudega B, Luirink J. Targeting, insertion, and localization of Escherichia coli YidC. J Biol Chem. 2002 Apr 12;277(15):12718-23.

Drew D, Fröderberg L, Baars L, de Gier JW. Assembly and overexpression of membrane proteins in Escherichia coli. Biochim Biophys Acta. 2003 Feb 17;1610(1):3-10. Review.

Daley DO, Rapp M, Granseth E, Melen K, Drew D, von Heijne G. Global topology analysis of the Escherichia coli inner membrane proteome. Science. 2005 May 27;308(5726):1321-3.

8 1. Introduction

All cells are surrounded by a membrane, a barrier that separates the cell from the environment it faces. The membrane of the cell is mainly composed of lipid and protein at an average ratio of 1:1 (Boon and Smith, 2002). Lipids are dual natured. They consist of polar head groups that favor contact with water, and hydrophobic tails - made up of acyl carbon chains - which implicitly avoid water. The lipids pack into a fluid bilayer whereby the tails face each other and the head groups, e.g., phosphate, are in contact with the surrounding water, Figure 1.

Figure 1. Schematic representation of a lipid bilayer; blue spheres represent polar head-groups, yellow sticks represent lipid tails, coloured cylinders represent membrane proteins, and attached sugars are represented by black antlers.

The driving force in the formation of a lipid bilayer is the spontaneous packing of hydrophobic tails, as the entropy of water is increased during reduction of the hydrated hydrophobic surface, i.e., the hydrophobic effect. The outcome is a hydrophobic barrier that is impermeable for most molecules to cross without the aid of proteins which are embedded in it. Not only are these ‘membrane proteins’ required to facilitate the transport of various compounds either passively or actively across the membrane, but they also e.g., impart structural

9 support, maintain voltage differences, enable interactions with other cells, and transfer information from the outside to the inside of the cell. In other words membrane proteins are essential for life, and roughly one-quarter of our genes code for membrane proteins (Wallin and von Heijne, 1998). Strikingly, at least ~50% of all drugs manufactured today are targeted to membrane proteins (Muller, 2000). Unfortunately, our understanding of membrane proteins lags behind that of soluble proteins, and is best reflected in the fact that only 0.5% of the structures deposited in the protein data-bank (PDB) are of membrane proteins (White, 2004). This discrepancy has arisen because their hydrophobicity, which enables them to exist in a lipid environment, has made them resistant to most traditional approaches used for procuring knowledge from their soluble counter- parts. As such, novel methods are required to facilitate our knowledge acquisition of membrane proteins. In this thesis a generic approach for rapidly obtaining information on membrane proteins from the classic bacterial encyclopedia Escherichia coli is described. We have developed a Green Fluorescent Protein (GFP) C-terminal tagging approach, with which we can acquire information as to the topology and ‘expressibility’ of membrane proteins in a high-throughput manner. This technology has been applied to the E. coli inner membrane proteome, and stands as an important advance for further membrane protein research.

Before we discuss this work in detail a clearer understanding of membrane proteins is required.

10 1.2 Membrane proteins

Like lipids, membrane proteins consist of hydrophobic and hydrophilic parts. These parts come together to produce two types of membrane protein architecture, α-helical membrane proteins and β-barrel membrane proteins, Figure 2. β-barrel membrane proteins are composed of an even number of anti- parallel β-strands which hydrogen bond laterally to each other in the formation of the barrel (Schulz, 2003). Amino acid side chains of mixed polarity extend into the aqueous pore, whilst amino acids with apolar side chains line the outside of the barrel and project into the lipid bilayer. Because this class of membrane proteins is restricted to outer membranes of Gram-negative bacteria, mitochondria and the outer envelope membrane of chloroplasts, it is not further discussed here.

Figure 2: Two types of membrane protein architecture; (a) an example of a α- helical membrane protein and (b) an example of a β-barrel membrane protein (Walian et al., 2004).

11 1.2.1 α-helical architecture The majority of membrane proteins are α-helical membrane proteins (Wallin and von Heijne, 1998), henceforth they will be referred to simply as ‘membrane proteins’. α-helical secondary structure is stabilized by main-chain hydrogen bonding between backbone amide and carbonyl groups four amino acids apart. Amino acid side chains with different physicochemical properties can extend at predominantly right angles from the helix, i.e., amino acids with apolar side chains project into the hydrophobic core of the lipid bilayer. Three dimensional (3D) structures confirm that α-helices typically span the full-width of the lipid bilayer, and are often referred to as trans-membrane segments (TMs). Statistically, TMs are around 20-25 amino acids long with an average tilt angle of 24° to the membrane normal (Ulmschneider et al., 2005), though this tilt can change to accommodate the thickness of the lipid bilayer (Park and Opella, 2005).

- Helix core- A distance of ± 15Å from the centre of the membrane defines the core region of the lipid bilayer, it has the lowest dielectric constant, and as such, charged residues are uncommon in the middle of TMs (<6%) and hydrophobic amino acids leucine, valine, isoleucine, alanine are abundant ~45% (Ulmschneider et al., 2005), Figure 3. Amino acids with small side chains e.g., glycine and serine, are also common, 7% each, facilitating packing between TMs (see section 1.2.3). Biophysical and biological scales are broadly consistent with statistical analysis. Charged residues arginine, aspartate, glutamate, and lysine are clearly

disfavored in the middle of the helix (∆Gapp 2.5 to 3.5 kcal/mol) whereas hydrophobic amino acids leucine, valine, isoleucine, phenylalanine are favoured

(∆Gapp of -0.5 to -0.3 kcal/mol) (Hessa et al., 2005a). Proline while unfavorable is often found in TMs to induce a helical angle change of some functional significance (Senes et al., 2004), e.g., the sixth TM segment of the voltage-gated potassium channel Kv1.2, contains a conserved Pro-X-Pro motif which forms a

12 receptor for its voltage sensor (Long et al., 2005a). Indeed, proline has one of the largest phenotypic propensities in TM sequences from the Human Gene Mutation Database (Senes et al., 2004).

Figure 3: Schematic representation of a TM segment in a lipid bilayer; residues with positional preference are indicated by their short-hand nomenclature, e.g., W= tryptophan (see abbreviations).

- Interfacial regions-

Aromatic amino acids tryptophan and tyrosine have a clear preference (∆Gapp - 0.6 kcal/mol) for the lipid interface (-25 to -15Å and 15 to 25Å), as these residues can match their amphipathic side-chain character with that of the interfacial lipid region, Figure 3 (Hessa et al., 2005a). The penalty of moving tryptophan or tyrosine from the interface to the aqueous domain has been calculated to be 1.85 and 0.94 kcal/mol, respectively (Wimley and White, 1996). In addition, the terminal placement of tryptophan in a model polyleucine TM segment is enough to promote a C-terminal-in-orientation (Higy et al., 2004). In contrast,

13 phenylalanine has no positional preference for the interface (Hessa et al., 2005a; Ulmschneider et al., 2005). Charged residues make up one-fifth of the amino acids found in this region (Ulmschneider et al., 2005). Along with polar residues, they often extend their side-chains to the aqueous domain to help anchor TMs (Chamberlain et al., 2004). This ‘snorkeling’ phenomenon is calculated to be stronger in positively charged residues lysine and arginine, either because their side-chains are longer and/or for the reason that they also interact favourably with negatively-charged lipid head-groups (Strandberg and Killian, 2003). Snorkeling is also apparent for the positively charged residues in interfacial helices which make up 30% of the non-TM fold (Granseth et al., 2005b). Interestingly, lysine can make π-cation interactions with tyrosine. This pairing promotes additional long-range electrostatic interactions with negatively charged lipid head-groups (Gromiha and Suwa, 2005). As proline can destabilizes helices it more likely to be found at either end of the helix (interfacial region), with the C-terminal end better tolerated over the N-terminal end (Yohannan et al., 2004). The destabilizing effect of proline is calculated to be stronger in straight TMs compared to angled TMs (Senes et al., 2004). Proline may also aid protein folding by promoting the formation of random coils (Ulmschneider et al., 2005), which make up 70% of the non-TM segment fold found in this region (Granseth et al., 2005b).

-Non-membranous domains- The hydrophilic membrane protein parts are composed of N- and C- terminal tails and ‘loops’ that connect TMs. In all organisms, the frequency of positively charged residues is higher in cytoplasmically localized non-membranous domains, an observation that was coined the ‘positive-inside-rule’ (von Heijne, 1989). The preference of these positively charged residues for the cytoplasmic domain influences the topology of connecting TMs accordingly. The ability of

14 positively charged residues to dictate the orientation of a TM segment seems to depend on the overall hydrophobicity of the TM segment, and the distance of the charged residues from it (Higy et al., 2004; Nilsson et al., 2005). The basis for the rule still remains unclear. Although, it was demonstrated some time ago that the proton-motive-force is required in the establishment of this phenomenon in E. coli (Andersson and von Heijne, 1994), it offers only a partial explanation as there is no apparent electrochemical potential across the ER membrane. Recently, it was reported that charged residues in the translocon itself, by either attracting or repelling charged amino acids may play a role, i.e., to promote the orientation of a TM segment before it inserts into the lipid-bilayer (Goder et al., 2004).

Cytoplasmic N- and C- terminal tail orientations are predicted to be preferred in all cells (Wallin and von Heijne, 1998). The percentage of E. coli membrane proteins with both their N- and C-terminal ends in the cytoplasm was experimentally measured at 60% (Daley et al., 2005). It appears that helices may also have a preference for inserting into lipids as pairs (Hermansson and von Heijne, 2003); the targeting to and insertion of membrane proteins into the membrane is discussed below.

1.2.2 Membrane protein biogenesis Although soluble domains of membrane proteins can fold autonomously into the aqueous milieu of the cell, hydrophobic ΤΜs need to be actively assisted into the lipid bilayer. This assistance surpasses the input of energy required to overcome the insertion activation barrier imposed by the lipid bilayer, and prevents ΤΜs from aggregating in the cytoplasm. How does this work? If a (presumably) α-helical and sufficiently hydrophobic stretch of amino acids has exited the tunnel, it will be interpreted by the cell as a ‘signal’ for targeting to the membrane (Batey et al., 2000; Huber et al., 2005). This signal is often present at the N-terminus of the membrane protein, and is typically recognized by the signal recognition particle

15 (SRP) (Luirink and Sinning, 2004). SRP binds to the polypeptide chain, at least in eukaryotes, halts further translation whilst targeting the nascent chain to the lipid bilayer. Whether or not the ribosome can ‘prime’ SRP by sensing the presence of a TM segment before it exits the ribosome is a matter of debate (Houben et al., 2005; Woolhead et al., 2004). At the membrane, SRP makes contact with the SRP receptor, and the nacent chain is subsequently transferred to the Sec translocon; a multimeric protein-conducting channel embedded in the lipid bilayer (Driessen et al., 2001; Van den Berg et al., 2004). Translation subsequently resumes, and if the targeted nascent chain and/or other segments downstream are hydrophobic and long enough, they will pass laterally through a opening in the Sec translocon (Rapoport et al., 2004). The degree of insertion seems to depend solely on energetically favorable helix-lipid interactions (Hessa et al., 2005b). In the lipid bilayer, TM folding can be aided early on by other membrane bound chaperones, such as YidC in the cytoplasmic bacterial membrane (Houben et al., 2005).

1.2.3 Membrane protein folding Membrane protein structures can (almost) be considered as ‘inside-out’ soluble proteins, as the average hydrophobic exterior of a membrane protein is twice that of its interior (Adamian et al., 2005). For membrane proteins with multiple TMs, the TMs must come together to form a functional protein. From a global perspective the hydrophobic effect drives the formation and subsequent insertion of α-helices through the translocon - unfolding a 20 amino acid helix in the lipid bilayer would cost ~40-80 kcal/mol (Schneider, 2004) - but what pushes helices together? Thermodynamic contributions of this process have been difficult to assess because membrane proteins are difficult to purify and do not fold reversibly under standard laboratory conditions (DeGrado et al., 2003); a requirement for measuring folding equilibria, albeit that a fully reversible system was recently

16 established for the β-barrel membrane protein OmpA (Hong and Tamm, 2004). Considerable understanding of this process has been based on helix dimerization of glycophorin A (gpA), whereby physical association of GlyXXXGly (a widespread motif in TMs, Senes et al., 2000) can be conveniently monitored, e.g., by analytical ultracentrifugation, fluorescence resonance energy transfer (FRET), gel-electrophoresis, etc. (White and von Heijne, 2005). The predominant view is that once inserted in the membrane, helix-helix association is driven by the formation of favourable electrostatic interactions between side chains of polar amino acids (Dawson et al., 2002). This is in line with statistical analyses, as polar residues occupy 20% of all the residues found in TMs (Dawson et al., 2003), and with the observation that in every TM segment of every multispanning membrane solved so far, there is at least one inter-helical hydrogen bond (Senes et al., 2004). Perceptually, the flip- side of promiscuous electrostatic interactions between TMs is that it could lead to aggregation by forming erroneous hydrogen bonds (Schneider, 2004). Yet this is not the case. Once driven together, helical packing is coordinated by close, specific Van der Waals interactions of non-polar residues, which often interlink to build ‘knobs-into-holes’ packing (Engelman et al., 2003). As demonstrated for TMs in mechanosensitive ion channels, this helix packing can be fine-tuned to control the function of the protein in a most exquisite way (Edwards et al., 2005). Mechanosensitive channels are force transducing molecules which move TMs to open a channel in response to membrane tension (Kung, 2005). Mutations made in a pore forming TM segment of a bacterial mechanosensitive channel to strengthen knobs-into-holes packing to interacting TMs, leads to a loss-of- function as the channel does not open under the same magnitude of membrane tension; in contrast, an amino acid mutation to the polar amino acid serine makes the channel easier to open as ‘wild-type’ helical packing is lost (Edwards et al., 2005).

17 At short distances, hydrogen bonding between main chain Cα− H … O donors may also stabilize helices (Senes et al., 2001), although their interaction is weak, there can be many such interactions, e.g., in photosystem I there are 34 TMs and

75 Cα− H … O hydrogen-bonds (Jordan et al., 2001). Helix association can be strong enough to maintain oligomerisation even in the absence of lipids and in the presence of a harsh detergent, i.e., potassium channel KcsA remains a tetramer in SDS (Krishnan et al., 2005). Beyond the two-stage model of membrane protein folding (single TM insertion and packing), bringing of helices together depends also on insertion of co-factors, extramembranous polypeptide segment folding, and the assembly of membrane protein complexes (Engelman et al., 2003). Lastly, it has been speculated that the lipids themselves might drive interactions between TMs, as computational measurements postulate that lipid entropy increases as the protein-lipid interface decreases (Helms, 2002).

1.2.4 Membrane proteins and lipids Clearly membrane proteins and lipids go hand-in-hand; they define favorable amino acid residues in helices and dictate the insertion and folding rate of TMs through the translocon. Not only is it becoming increasingly clear that certain lipids interact more favorably with some amino acids (e.g., lysine/tyrosine π- cation long-range interactions to phosphate head groups, Gromiha and Suwa, 2005), or to some membrane proteins (e.g., cardiolipin in the purple bacterial photosynthetic reaction centre, Fyfe et al., 2005), but lipids to some extent must also supply different lateral pressure to different membrane proteins (Jensen et al., 2004). Lateral pressure in different membranes is increased by the addition of lipids with unsaturated chains and/or non-bilayer head-groups, e.g., phosphatidylethanolamine (PE). One idea is that membrane proteins insert easier into a bilayer of lower curvature stress (e.g., as shown by in vitro folding studies

18 of bacteriorhodopsin into different liposomes), but that a certain degree of lateral pressure is still needed to maintain a functional state (Booth, 2005). Interlinked is the membrane bilayer thickness to TM segment length, that is, the degree of hydrophobic mismatch between the α−helices and lipid (Jensen and Mouritsen, 2004). As demonstrated in vitro with the bacterial melibiose transporter (there are many analogous examples), maximum transport is only reached at specific acyl carbon chain lengths (Jensen and Mouritsen, 2004). Indeed, to seemingly match the thickness of the lipid bilayer, on average, TMs of Golgi membrane proteins are five amino acids shorter than those of plasma membrane proteins (Munro, 1998). Lastly, it is clear that lipid composition can affect membrane protein topology (see next section). For instance, in the absence of PE the first six TMs of lactose permease (LacY) are inverted; addition of PE after assembly of this partly inverted protein restores the correct topology (Bogdanov et al., 2002).

19 2.1 Membrane protein topology

It is envisaged that in the future more rules that govern the architecture of a membrane protein will be resolved, eventually allowing the construction of meaningful in silico membrane protein 3D-structure predictions from amino acid sequence (White and von Heijne, 2005). At present, to bridge the void created by the lack of membrane protein structures, one can formulate 2D-structure models using computer algorithms. 2D-structures are commonly referred to as ‘topology’ models, and define the number, position, and orientation of TMs relative to the membrane.

2.1.1 Topology prediction algorithms The most simplistic topology models are produced solely by computer algorithms. The five topology predictors used in this thesis are described below.

[1] The algorithm TopPred scans for a TM segment in a given amino acid sequence by searching for ‘threshold’ hydrophobicity over a typical TM segment length (trapezoid-shaped window of 21aa). The positive-inside-rule is then used to decide upon TM segment orientation (von Heijne, 1992). [2] The Memsat algorithm increases the number of states used in TopPred from two (helix or loop) to five (inside loop, outside loop, inside helix cap, helix core, and outside helix). The probability that amino acids of an inputted amino acid sequence belong in these states, their likelihood, is calculated based on a membrane protein database of well-characterized topology. The most probable outcome, i.e., the topology, is formulated by the statistical method ‘expectation maximization’ and orientation/location agreed upon by incorporating another dynamic programming algorithm (Jones et al., 1994). [3] The PHDhtm algorithm estimates only two states (helix or loop), but unlike TopPred, improves the signal by feeding off a multiple sequence

20 alignment. Notably, the algorithm has been ‘trained’ using neural networks from a set of membrane proteins with known topology (Rost et al., 1996). [4 and 5] The latest generation topology prediction programs HMMTOP (Tusnady and Simon, 1998) and TMHMM (Krogh et al., 2001), are the ‘best’ combination of the aforementioned programs. Like Memsat, HMMTOP and TMHMM take into account different states, five and seven respectively, and analogous to PHDhtm use machine-learning algorithms, in this case, hidden Markov models (HMM) to look for amino acid distribution patterns similar to those defined in the training set. One advantage of TMHMM compared to the other algorithms is that reliability scores are also generated. Recently, a newer version of TMHMM was developed, like PHDhtm, it allows the input of multiple sequence alignments. The TMHMM prediction performance is improved by ~8% (Viklund and Elofsson, 2004).

2.1.2 Reliability of topology prediction TMHMM is able to accurately predict the topology of 75% of the membrane proteins used in training its HMM algorithm (Krogh et al., 2001). However, as this training sample set is quite small, the predictive power is poorer for previously unseen membrane proteins, 55-60% (Melen et al., 2003). The sample set is also biased, as experimental determined topologies have favored those membrane proteins that were easier to analyze owing to the fact they have had clearly defined topological features, i.e., unusually hydrophobic TM segments and/or an obvious positive charge difference between inside and outside loops (Melen et al., 2003). As many of the easy to analyze proteins are prokaryotic in origin, eukaryotic membrane proteins are underrepresented in all training sets (Ott and Lingappa, 2002). Thus, the predictive performance by TMHMM for eukaryotic membrane proteins is slightly worse, ~50% (Melen et al., 2003). Highly reliable topology models can be generated by combining the aforementioned five prediction methods, TopPred, Memsat, PHDhtm,

21 HMMTOP, and TMHMM; when all methods agree the topology is virtually certain to be correct, whereas the fraction of correct topologies decreases with increasing disagreement between the methods (Nilsson et al., 2000). An approach to improve the membrane protein topology prediction is to bioinformatically anchor domains in a prediction which are 100% certain to lie on either one or the other side of the membrane, e.g., a cytosolic tyrosine phosphatase domain. In eukaryotic genomes such domains provide 11% coverage (Bernsel and Von Heijne, 2005). Alternatively, one can experimentally map the location of loops and tails in a membrane protein by a variety of methods (explained below). Just determining the C-terminal tail location of E. coli membrane proteins helps TMHMM to improve its overall prediction accuracy from 55 to 70%, i.e., as these domains can now be fixed in the topology prediction (Melen et al., 2003).

2.1.3 Experimental topology mapping Experimental approaches are often used to refine in silico topology models which are not only biased, but (in general) are likely to miss details which are hard, if not impossible, to predict, e.g., unanticipated inter- and intra- protein interactions (Ott and Lingappa, 2002). One approach of obtaining information is to use site-directed mutagenesis to introduce amino acids which are compatible to different topology determination methods, e.g., cysteine scanning, glycosylation mapping, and proteolytic cleavage. For eukaryotic membrane proteins the most common method is glycosylation mapping, which takes advantage of the fact that N-linked glycosylation - the addition of ~2.5kDa worth of sugars to Asn-X-Ser/Thr acceptor sequences - is possible only within the luminal compartment of the ER. In practice, after adding glycosylation acceptor sequences into the predicted soluble parts of the membrane protein by site-directed mutagenesis, the membrane protein is transcribed and translated in vitro. The addition of sugars to

22 the membrane protein is distinguished from unglycosylated forms by the slight difference in molecular weight after separation by SDS-PAGE (Nilsson and von Heijne, 1993). Perhaps the most labor intensive, and yet the most informative and least invasive topology mapping method is cysteine scanning. In this method cysteines are recombinantly added to a cysteine-less membrane protein, and their localization within the membrane protein mapped by membrane permeable or impermeable thiol-reagents (Bogdanov et al., 2005). This is a powerful method as it is possible to elucidate the local environment of a single amino acid. This approach was nicely demonstrated for the secondary-active transporter LacY (Frillingos et al., 1998).

Another approach for obtaining topology information is to fuse a reporter to all of the predicted solvent-exposed domains in the membrane protein. The reporter can be fused end-to-end on, or ‘sandwiched’ (if chimera retains activity), into different loops such that the full-length membrane protein is always expressed (van Geest and Lolkema, 2000). When produced in E. coli the two most common reporters are enzymes that catalyze a reaction on either one or the other side of the membrane; the cytoplasm or periplasmic space (see below).

[1] Alkaline phosphatase (PhoA) is a soluble bacterial protein that is only folded and functional when exported to the periplasm of E. coli where it can form essential disulfide-bonds. It was one of the first, and still remains to be, one of the most commonly used topology reporters. PhoA activity - the hydrolysis of phosphoric esters – is measured easily with a substrate that changes colour upon hydrolysis, e.g., p-nitrophenyl phosphate turns yellow. If PhoA remains in the reducing environment of the cytoplasm it is sensitive to proteolysis because it cannot form disulfide bonds (Manoil, 1991).

23 [2] β-galactosidase (LacZ) is a large tetrameric cytoplasmic enzyme, part of the classic ‘lac operon’ which hydrolyzes lactose into galactose and glucose. It complements PhoA as it is only active in the cytoplasm; when targeted to the periplasm it becomes trapped in the membrane, and inactive. Its activity can also be measured colorimetrically, as it turns the chromogenic substrate X-gal (5- bromo-4-chloro-3-indoyl-β-D-galactoside) blue (Manoil, 1991). To avoid false-negatives, reporter activity is usually normalized against protein expression. Protein expression is typically measured by Western-blotting or immunoprecipitations (IPs) (van Geest and Lolkema, 2000). Thus, analyzing many fusions is often labor intensive. A disadvantage with LacZ is that it may generate false-positives as a result of many artifacts, e.g., saturation of the export machinery. In contrast, PhoA is reported to be more reliable because an active fusion has to be successfully exported to the periplasm. In principle, a combination of PhoA / LacZ reporters to the same sites in the membrane protein is best. Unfortunately, ambiguous high LacZ and PhoA reporter activities to identical fusion sites have been reported in many cases (van Geest and Lolkema, 2000).

Papers I and II This thesis deals with the development of GFP as a high-throughput cytoplasmic membrane protein topology reporter. GFP can be used in combination with the periplasmic reporter PhoA, to rapidly establish the C-terminal tail orientation of a membrane protein. The usefulness of combining this information with bioinformatics to generate reliable topology models is shown.

24 2.2 High-throughput topology mapping of E. coli membrane proteins

High-throughput topology mapping requires a methodology that can simultaneously handle many membrane proteins, is reliable, robust, and easy to use. We have found that this is most easily accomplished for E. coli and Saccharomyces cerevisiae membrane proteins in their respective hosts, by combining topology prediction with minimal experimental information (Paper I; Kim et al., 2003). Here we will focus only on the high-throughput topology mapping of membrane proteins in E. coli. Topology prediction is best generated by a ‘consensus approach’ or by constraining TMHMM (as explained in section 2.1.2). For analyzing many membrane proteins in E. coli, in favor of the other approaches (section 2.1.3), minimal experimental information is best obtained using single end-to-end C-terminal reporter-protein fusions.

2.2.1 A consensus approach for generating topology models For about 80 out of the predicted 737 multispanning membrane proteins in E. coli, five prediction programs (section 2.1.1) agree on the location of the N- terminus, but disagree on the location of the C-terminus because of - plus or minus - one TM segment. When the analysis of such cases was applied to a membrane protein test set of known topology, the correct topology could always be inferred from either one of the two majority predictions (Nilsson et al., 2000). Thus, the reliability of the prediction is very high when all the methods agree, and the correct topology can be simply determined by assigning the C-terminal tail location of the membrane protein.

2.2.2 Using GFP as a cytoplasmic membrane protein topology reporter in E. coli Because of the artifactual tendency of historically used cytoplasmic reporters (e.g., LacZ), it was decided that the development of a new topology reporter would benefit greatly the C-terminal mapping of many membrane proteins in E.

25 coli. For this reason, we sought to establish if GFP could be used to monitor membrane protein topology. GFP was selected because it is incorrectly folded and does not fluoresce when targeted to the periplasm of E. coli with a Sec-type signal peptide (Feilmeier et al., 2000). This finding suggested that it would be likewise inactive when fused to periplasmic membrane protein segments. Importantly, GFP is compatible with the aforementioned high-throughput criterion; fluorescence from E. coli cells expressing membrane protein-GFP fusions is easy to measure, and only the amount of protein that is membrane embedded is fluorescent (Paper III). To test if GFP could be used to assign the C- terminal tail orientation of a membrane protein, GFP was fused to the C-terminal tail of the membrane protein leader peptidase (Lep/periplasmic C-terminus) and to its positive charge rearrangement mutant, inverted leader peptidase (Lepinv/cytoplasmic C-terminus). Lep/Lepinv-GFP fusions were expressed under standard conditions (section 3.1.4). Induced expression at a temperature of 37°C produced clear differences in Lep and Lepinv GFP fluorescence. The mutant Lepinv with the cytoplasmic C- terminus was ~10-fold more fluorescent in liquid culture than Lep (Paper I). At the lower temperature of 25°C the difference was less, therefore, cells were always cultured at 37°C, Figure 4a. After Western-blotting using antibodies directed against either GFP or Lep, it was apparent that the Lep-GFP fusion was degraded, Figure 4c. As a further control, other membrane protein-GFP fusions with cytoplasmic C-terminal tails were tested, Figure 4b. Membrane proteins with periplasmic C-terminal tails contain less fusion, perhaps due to degradation, and are consistently less fluorescent (Paper I).

26

Figure 4: GFP as an E. coli cytoplasmic topology reporter. A) Lep-GFP vs. Lepinv- whole-cell GFP fluorescence, B) ExbB-, SecF-, Lepinv-, Lep-, Sec- GFP whole-cell GFP fluorescence, C) Western-blotting of Lep-GFP and Lepinv-GFP after induced expression at 25°C (lanes 2, 5) or 37°C (lanes 3, 6); decorated with either Lep antibody (top panel) or GFP antibody (bottom panel), D) Contrasting PhoA (top graph)/GFP (bottom graph) activities for 12 E. coli membrane proteins that adhere to the majority-vote criterion (Paper I).

27 2.2.3 Combining C-terminal orientation analysis with a consensus-prediction approach PhoA and GFP C-terminal fusions were made to an initial set of 12 membrane proteins, MarC, PstA, TatC, YaeL, YcbM, YddQ, YdgE, YedZ, YgjV, YiaB, YigG, and YnfA, out of a possible 80 or so E. coli membrane proteins that adhered to our consensus criterion. After expression of fusions, as before, GFP and PhoA activities were measured. Cut-off values for what was considered ‘high’ or ‘low’ GFP fluorescence were arbitrarily decided based on the differences between Lepinv- GFP (cytoplasmic C-terminus), and Lep-GFP (periplasmic C-terminus) fluorescence (Paper I). A ‘high’ fluorescent signal over a certain threshold (12,000 units) allowed a cytoplasmic location to be tentatively assigned. A ‘low’ fluorescent signal was considered ambiguous, as it is impossible to distinguish between poorly expressing membrane proteins and those with periplasmic C- terminal tails. The location of the C-terminus was established when the fluorescent activity was in agreement with the activity from the periplasmic reporter PhoA, Figure 4d (section 2.1.3). Only two of the 12 membrane proteins (YaeL, YigG) had insufficient differences between the PhoA and GFP activities to be certain of the location of the C-terminus. For these two membrane proteins and a control, truncated GFP fusions were made to clarify the C-terminal tail orientation. The final C-terminal tail locations were then used to ascertain the correct topology predictions (Paper I).

2.2.4 The reliability of topologies generated by a consensus approach Encouraged by the consistent contrasting PhoA/GFP activity profiles used to map topologies of 12 E. coli membrane proteins, C-terminal PhoA/GFP fusions were made to another 37 E. coli membrane proteins and analyzed (Paper II). A few membrane proteins included in this test set had a known topology. The GFP activity from these membrane proteins were used to refine the original ad-hoc

28 cut-offs values made from contrasting Lep/Lepinv-GFP activity, in the assignment of unambiguous C-terminal tail locations. For 34 out of the 37 membrane proteins, contrasting PhoA and GFP activities were sufficient to assign a C-terminal tail location. This brought the total number of topologies mapped up to 46 (Paper II). After analyzing these 46 topologies it was clear that the majority prediction is most likely to offer the correct topology; when 4 out of the 5 topology predictors agree the majority prediction was correct - in regards to the location of the C-terminus - 90% of the time. How do these topology models compare to other topology studies? While the topology prediction for TatC (an essential component of the TAT-translocase, Palmer and Berks, 2003), with 6 TMs and cytoplasmic N-, C- termini was later interpreted to have only 4TMs (Gouffi et al., 2002), other independent studies have concurred with the topology prediction generated by our approach (Behrendt et al., 2004). The topology determined for YaeL, a protein that belongs to a family of membrane-embedded metalloproteases (Rudner et al., 1999), was also the same as that previously determined for the related Bacillus subtilis protein SpoIVFB as regards the location of the conserved HEXXH and NPDG motifs relative to the inner membrane (Green and Cutting, 2000). The consensus approach and the use of GFP as a topology reporter has since been used by other researchers (Culham et al., 2003; Gandlur et al., 2004; Jakubowski et al., 2004; McMurry et al., 2004; Severance et al., 2004).

2.2.5 Generating topology models by constraining TMHMM Although the consensus approach is a useful strategy for generating reliable topology models, it covers only ~10% of the α-helical membrane proteins in E. coli. An alternative approach is to ‘feed’ into TMHMM the location of experimentally determined amino acids, e.g., C-terminal tails. When this was tested in silico, using a data set of 233 membrane proteins of known topology, the

29 overall prediction performance for TMHMM increased from ~70% unconstrained to ~80% constrained (Melen et al., 2003). Somewhat unexpectedly, the prediction performance actually gets worse if the residue to be fixed is not restricted to the N- or C- terminus, but is chosen based on the "lowest probability loop residue" selected from a TMHMM probability prediction profile. The main reason for this is that loop regions predicted with greatest uncertainty, in fact, frequently correspond to true transmembrane regions making this approach unfeasible (Paper II). To establish the C-terminal tail orientation, as before, dual PhoA/GFP fusion reporters can be used (Papers I and II). The constraining of TMHMM for generating improved topology models has been successfully applied to the entire E. coli inner membrane proteome (Daley et al., 2005). Contrasting PhoA/GFP activities were sufficient to assign unambiguous C-terminal tail locations for 75% of the inner membrane proteome. Many of these proteins shared high homology to another membrane protein in the genome. These membrane proteins were used to assign C-terminal tail locations to membrane proteins not initially mapped by this approach; the final coverage was ~90%. This topological information has been extrapolated to assign topology maps to another 51,208 homologous membrane proteins in other bacterial genomes (Granseth et al., 2005a).

2.2.6 Why does GFP work as a topology reporter? Given that it is possible to export correctly folded GFP to many cellular organelles (Tsien, 1998), including the periplasm of E. coli with a Sec independent TAT-signal peptide (Thomas et al., 2001), why is GFP not fluorescent in the periplasm when targeted to this compartment with a Sec-type signal sequence? As it is possible, after acid-base treatment, to refold periplasmic GFP so that it becomes fluorescent, it suggests that Sec-exported GFP is simply incorrectly folded (Feilmeier et al., 2000). Our results indicate that the misfolded GFP is

30 sensitive to proteolysis when fused to periplasmic membrane protein segments (Papers I and II); similar degradation has been noted for a few soluble proteins terminally fused to membrane protein segments (Pourcher et al., 1996). GFP and PhoA have now been used to assign the C-terminal tail location of over 500 E. coli membrane proteins. In 71 out of 72 of the cases where the C-terminus of the membrane protein was convincingly established beforehand (i.e., 3D-structure or biochemical analyses), the PhoA/GFP assignments were in total agreement (Daley et al., 2005).

2.2.7 Comparing 2D maps to 3D-structure How often do topology predictions get it right? This is difficult to address as there are so few membrane protein structures. If we consider topology as the number of TM segments and their orientation relative to the membrane, the constrained TMHMM topology predictions, compared to structure, are more than 80% correct; the most frequent error is to leave one TM out. If we include identifying reentrant loops, interfacial helices, and the exact positioning of helices, topology predictions are (presently) only a first-step towards understanding structure-function relationships. Understanding structural details to this level is typically only possible with a high-resolution structure; section 3 will expand on this challenge.

2.2.8 Summary of high-throughput membrane protein topology mapping In the absence of a 3D structure, one way to gain structural information of any membrane protein is to determine its topology, i.e., the number, position, and the overall in-out orientation of TMs relative to the membrane. In E. coli, this step is usually accomplished by using reporter enzymes such as PhoA or LacZ fused to different portions of the membrane protein. Usually, the number of reporter fusions that needs to be made and analyzed for a complete topology

31 determination is equal to or larger than the number of TMs in the membrane protein, thus requiring significant experimental effort. We have shown that a reliable membrane protein topology can be simply and rapidly deduced from a combination of in silico topology predictions and single C-terminal PhoA/GFP reporter-protein fusions (Paper I). Although this approach might have been possible using classical PhoA and LacZ fusions, GFP offers an attractive alternative; the assay requires little experimental set-up, measurements are completed in seconds, and as the GFP fluorescence is linear to the amount which is folded - in contrast to enzymatically active fusions - GFP activity does not need to be normalized to (quantified) protein expression (Paper I). Indeed, after ambiguous results with classical PhoA/LacZ fusions, GFP has been used to clarify the topology of the ABC transporter, DrrB (Gandlur et al., 2004). After a few modifications, this approach was possible on a larger scale format (Paper II), and was extended to determine C-terminal locations, and subsequently constrained TMHMM topology models for the entire E. coli inner membrane proteome (Daley et al., 2005). This proteome information has been used to up-date the Swiss-Prot and NCBI databases.

32 3.1 Membrane protein overexpression

One of the main obstacles towards understanding membrane proteins is the difficulties associated with obtaining pure material for biochemical and structural analysis (Grisshammer and Tate, 1995). Most membrane proteins overexpress very poorly - typically less than < 1 mg/L - if they do at all. This is a huge problem. Recently, in the magazine Nature it was stated that “… labs around the world aim to add membrane proteins (structures) to international databases over the next five years. But to do so, they must first be able to churn out milligrams of easily purified protein ” (Hoag, 2005).

3.1.1 Limited availability of biogenesis factors and/or lipid space may hamper membrane protein overexpression Why do membrane proteins overexpress poorly? Intuitively, it seems that there might be a limit to the availability of membrane protein biogenesis components and space available in the lipid bilayer. Not only does the overexpression of membrane proteins require the availability of components like, e.g., SRP and the Sec translocon, to faithfully target and insert multiple copies of a membrane protein into a suitable lipid bilayer, but the lipid bilayer is also obliged to accommodate this ‘extra’ protein without compromising the membrane integrity of the cell (Drew et al., 2003). In support of this idea are the following observations; - it has been shown that upon overexpression of membrane proteins in E. coli SRP is titrated (Valent et al., 1997), - that the overexpression of membrane proteins in yeast can lead to activation of the unfolded protein response (UPR) (Griffith et al., 2003) (a mechanism against ER stress caused by unfolded protein (Kaneko and Nomura, 2003), - by keeping expression levels low enough to reduce the UPR response, one can increase the amount of functionally expressed membrane protein (Griffith et al., 2003), - that the functional expression of the

33 serotonin transporter in insect cells can be enhanced nearly 3-fold by co- expressing ER luminal folding chaperones calnexin, and to a lesser degree, calreticulin and BiP (Tate et al., 1999). In terms of lipid capacity, it was shown that expressing GPCRs in the eye of the fly - a membrane dedicated almost exclusively to the GPCR rhodopsin - is highly successful (Eroglu et al., 2002), and that the bacterium Lactococcus lactis is a suitable host for membrane protein overexpression perhaps because of the small number of endogenous membrane proteins (Kunji et al., 2003). Lastly, E. coli mutant strains with improved membrane protein overexpression characteristics were isolated (Miroux and Walker, 1996). After overexpression of a membrane protein, the cells were biochemically analyzed and visualized under an electron microscope; it was clear that for one of these strains the cell had proliferated extra internal membranes (Arechaga et al., 2000).

3.1.2 ‘Trial-and-Error’ As the focus of the majority of expression studies has been to obtain functionally expressed membrane protein, rather than analyzing membrane protein overexpression per se, we do not know how generic the aforementioned problems are. What is clear is that this is not the whole story. There are many other case- by-case examples of further factors which may influence the ability to obtain well-expressed functional membrane protein; - the membrane protein is susceptible to degradation, e.g., by the ATP- dependent integral membrane protein protease FtsH (Ito and Akiyama, 2005), - the membrane protein is unstable if overexpressed without its complex partner(s) e.g., SecY, the pore forming component of the translocon, is rapidly degraded if expressed without SecE (Ito and Akiyama, 2005; Kihara et al., 1995), - the composition of the membrane is unsuitable (Freedman et al., 1999), - the membrane protein needs to be post-translationally modified; impossible in most bacterial expression systems, e.g., N-linked glycosylation (Tate and Blakely,

34 1994) - the mRNA for the membrane protein is unstable (Afonyushkin et al., 2003; Arechaga et al., 2003). In principle, by studying the expression of a large number of membrane proteins one could find some correlation between membrane proteins that ‘express poorly’ to those that ‘express well’ (Drew et al., 2003), e.g., membrane proteins with multiple TMs are thought to give lower expression than those containing fewer TMs (Grisshammer and Tate, 1995). Unfortunately, von Heijne and co-workers did not find any correlation in any amino acid sequence parameter tested between poor vs. well expressing membrane proteins for more than 300 E. coli membrane proteins expressed in E. coli, e.g., size, degree of hydrophobicity, number of TMs (Daley et al., 2005). Our current lack of understanding means that membrane protein ‘expressibility’ cannot be predicted prior to experimental testing.

3.1.3 Choosing a membrane protein overexpression host There are many approaches used in the overexpression of membrane proteins. In general, it is preferred to overexpress membrane proteins into the membrane, as the success rate of refolding membrane proteins from inclusion bodies is very low (Drew et al., 2003). For obvious reasons, one would like to overexpress membrane proteins in their endogenous host. This is not always possible; the higher the organism from which the membrane protein comes from, the greater the cost and time needed for successful overexpression in the most comparable host to the membrane protein. E. coli is often the first vehicle tested in the overexpression of both pro- and eukaryotic membrane proteins; it is widely available, it is easy to work with it, it is very versatile, and is cheap to use. Because of these factors numerous membrane protein structures have been solved from material overexpressed in E. coli; transporters (Abramson et al., 2003; Huang et al., 2003; Hunte et al., 2005; Locher et al., 2002; Ma and Chang, 2004; Reyes and Chang, 2005; Yamashita et al.,

35 2005) respiratory proteins (Abramson et al., 2000; Bertero et al., 2003), ion channels (Chang et al., 1998; Doyle et al., 1998; Dutzler et al., 2002), and other channels (Fu et al., 2000; Khademi et al., 2004; Savage et al., 2003; Van den Berg et al., 2004). Unfortunately, there is only one example of a eukaryotic membrane protein structure elucidated from overexpressed material, i.e., the rat voltage- gated shaker K+ channel Kv1.2 (Long et al., 2005a). In this case the material was not obtained by expression in E. coli, but in the yeast Pichia pastoris. Other eukaryotic membrane protein structures have been solved, but with a membrane protein that was isolated from naturally abundant sources, e.g., rhodopsin from the bovine eye (Palczewski et al., 2000). While eukaryotic membrane proteins can express well in E. coli, see e.g., (Quick and Wright, 2002), expression levels are typically several orders of magnitude less than their bacterial counter-parts (Tate, 2001). If we want to solve eukaryotic membrane protein structures it seems that the development of new E. coli strains or the use of hosts other than E. coli is required. Indeed, it is possible to overexpress functional eukaryotic membrane proteins in yeast, insect and mammalian cells, e.g., GPCR’s in yeast (Sarramegna et al., 2002; Schiller et al., 2001), serotonin transporter in Sf9 cells using baculovirus system (Tate et al., 1999), and rat glutamate transporter in BHK cells using Semiliki Forest virus system (Raunser et al., 2005). Interestingly, the Gram- positive bacterium Lactococcus lactis has shown to be a successful host for the overexpression of eukaryotic mitochondrial carriers (Kunji et al., 2005; Kunji et al., 2003).

3.1.4 General strategies for membrane protein overexpression in E. coli There are many different strategies in each of the host systems used for overexpression of a membrane protein. In general they involve adjusting the type of promoter/plasmid system, culture conditions, and the protein itself by

36 truncations, mutations, and/or additions of various fusion tags. Here, we will focus only on the expression of membrane proteins in E. coli.

In E. coli, the membrane protein to be expressed is usually cloned into a plasmid under control of a tightly regulated and inducible promoter. The number of plasmid copies per cell, the strength of the promoter, and the homogeneity of the inducer across the cell population can all affect final yields. In general, membrane protein overexpression strategies are the same as the ones used for soluble proteins, see e.g., (Sorensen and Mortensen, 2005), with a few additional points worth mentioning (outlined forthwith). Membrane proteins typically inhibit cell growth when overexpressed, thus it is advisable to use a tight promoter system, e.g., the pBAD promoter (Morgan-Kiss et al., 2002), or the T7-based promoter in combination with the plasmid pLysS (Pan and Malcolm, 2000). As membrane proteins typically contain an N-terminal targeting signal, fusing a soluble protein to the N-terminus of the membrane protein might be problematic, additionally so, for membrane proteins with periplasmic N-terminal tails; large N-terminal domains are almost (ProW is a notable exception), non-existent in the E. coli inner membrane proteome (Daley et al., 2005). If the membrane protein naturally has a large extra-cytoplasmic N- terminal domain e.g., like many GPCRs, an N-terminal signal sequence has shown to be required for functional expression (Grisshammer et al., 1993; Weiss and Grisshammer, 2002; Yeliseev et al., 2005). Indeed, for all GPCRs in sequenced genomes N-terminal tails longer than 60 amino acids are considerably more likely to contain a signal peptide (Wallin and von Heijne, 1995). Membrane proteins seem to express better in E. coli at a temperature of 20-25°C rather than 37°C. Although expression at lower temperatures is often successful for soluble protein as well (Sorensen and Mortensen, 2005), membrane proteins maybe more sensitive to temperature as they fold co-translationally (section 1.2.2), i.e., over this temperature range, the translation rate decreases linearly with temperature

37 (Farewell and Neidhardt, 1998). Plasmids with a cytoplasmic antibiotic resistance marker (e.g., kanamycin) are recommended over periplasmic antibiotic resistance markers (e.g., β-lactamase), as it may avoid any extra workload on the Sec translocon (Ito and Akiyama, 1991).

3.1.5 The BL21(DE3) pET system In this thesis, membrane proteins were overexpressed using the BL21(DE3) pET system from a modified pET-28a plasmid (Waldo et al., 1999), which harbors a kanamycin resistance marker. Protein expression in the pET vector is under the control of the strong T7 promoter that in concert with the E. coli strain BL21(DE3), is switched on in the presence of isopropyl-β-D-thiogalactoside (IPTG), i.e., IPTG induces expression of the gene encoding the T7 RNA polymerase that is located on the chromosome integrated λ phage gene DE3 (Studier and Moffatt, 1986). As membrane protein overexpression can be toxic (Miroux and Walker, 1996), the BL21(DE3) strain is typically used in combination with a plasmid which constitutively expresses a T7 lysozyme gene, i.e., pLysS/E. The T7 lysozyme has a low affinity for the T7 RNA polymerase, and dampens ‘leaky’ expression (Pan and Malcolm, 2000). Although, some argue that the pET- based system is not applicable to membrane proteins because it is too strong (Wang et al., 2003), these plasmid and strain combinations have been used extensively to successfully overexpress membrane proteins and to obtain material for structure determination (Kastner et al., 2000; Miroux and Walker, 1996).

3.1.6 Membrane protein purification All membrane proteins are routinely purified using a detergent (Seddon et al., 2004). A detergent at a critical concentration will form a hydrophobic pocket, typically a spherical micelle that retains the integrity of the membrane protein as it extracts it from the lipid. Solubilization of membranes with detergent results in

38 a mixture of detergent, protein and lipid, in which the amount of lipid is progressively reduced as the membrane protein is purified in a buffer containing detergent (Seddon et al., 2004), e.g. by immobilized metal affinity chromatography (IMAC), anion/cation exchange, size-exclusion chromatography, etc. Finding the right detergent that retains the function of the membrane protein can be tricky. The use of shorter chain detergents, e.g., n- octyl-β-D-glycopyranoside to increase the number of protein-protein contacts for protein crystallization, most often results in the membrane protein aggregating instead. It is becoming increasingly clear that removal of too much lipid can be detrimental (Fyfe et al., 2005; Long et al., 2005b). A number of structures have revealed that certain lipids can play definitive functional and/or structural roles, e.g., cardiolipin in the purple bacteria reaction centre (Fyfe et al., 2004).

Papers III and IV One of the biggest obstacles towards understanding membrane protein structure-function relationships is the difficulties associated with obtaining milligram quantities of membrane protein. This thesis tackles this challenge by developing GFP-based methodology to monitor membrane protein overexpression in the E. coli membrane, and to use GFP as an aid in the subsequent purification of membrane proteins.

39 3.2 High-throughput membrane protein overexpression in E. coli

Traditional membrane protein overexpression screening methods in E. coli are quite laborious. In order to remove inclusion bodies, membranes are typically first isolated from whole-cells before the overexpressed protein - via SDS-PAGE - is detected by Coomaisse staining and/or Western-blotting; neither of which methods are the most ideal for quantifying protein expression. Here, we present an alternative, superior method. We show that the amount of GFP fluorescence from E. coli cells expressing membrane protein-GFP fusions is a simple, fast, and accurate estimate of expression. Not only does it complement the topology mapping of membrane proteins (section 2.2), but it is easily transferable to many laboratories, and enables the protein to be visualized during detergent solubilization and purification.

3.2.1 Inclusion bodies of membrane protein-GFP fusions are not fluorescent Waldo and co-workers showed that a C-terminal GFP fusion could be used to reliably estimate the overexpression of soluble proteins in E. coli (Waldo et al., 1999). In short, if a soluble protein was expressed into inclusion bodies, GFP did not fold and was not fluorescent. In contrast, if the soluble protein was correctly folded, GFP did fold and was fluorescent. The use of GFP to monitor the overexpression of soluble proteins in E. coli has been reinforced by others (Hedhammar et al., 2005). GFP is ideal for this purpose as it requires no substrates for its fluorescence, is stable, and is easy to measure and quantify (Tsien, 1998). To ascertain the reliability of monitoring membrane protein expression with a C-terminal GFP moiety, GFP was fused to the C-terminus of a number of well-characterized pro- and eukaryotic membrane proteins (Paper III). This was important to verify, as the folding pathway for membrane proteins is very different to soluble proteins (section 1.2.3) (Drew et al., 2003). Two of the test-set membrane proteins (rat olfactory GST-GPCR and M13-procoat) were

40 known to express into inclusion bodies (Kiefer et al., 1996; Krebber et al., 1997). The GFP used to test this contains the folding (F64L) and chromophore (S65T) mutations (Tsien, 1998), and has been evolved in E. coli to have 42-fold higher (soluble) expression than ‘wild-type’ GFP (Crameri et al., 1996). Under typical culture conditions in the E. coli strain BL21(DE3)pLysS, membrane protein-GFP fusions were overexpressed essentially as described in section 3.1.5. This system was also the same as that shown to be successful for monitoring the expression of soluble protein-GFP fusions in E. coli (Waldo et al., 1999). After overexpression, cells were lysed and fractionated by differential centrifugation. The amount of membrane protein-GFP fusion in these fractions was measured by a combination of fluorescence and quantitive immunoblotting (Paper III). After analyzing membrane protein fractions (high-speed spin) it was clear that GFP fluorescence from isolated membranes was a good estimate of expression. In contrast, it was apparent that not all of the membrane protein-GFP fusion left in the unbroken E. coli cells (low-speed spin) was fluorescent. This was particularly obvious for the M13 / GST-GPCR GFP fusions which were previously shown to express into inclusion bodies. This was alternatively visualized by Western-blotting an equivalent amount of GFP fluorescence from the low-speed and the high-speed spin fraction. Inclusion bodies from M13-GFP and GST-GPCR-GFP were in the order of ~50 and 90% of the total expressed protein, respectively. They were later isolated by a sucrose step gradient and were not fluorescent (Paper III); we have since verified this with other membrane protein-GFP fusions. Therefore, if the overexpressed fusion protein ends up in the insoluble fraction as inclusion bodies GFP is not florescent; in contrast, if the fusion is expressed in the cytoplasmic membrane, GFP does fold and is fluorescent, and the amount of GFP fluorescence correlates with the amount of protein integrated in the E. coli membrane (Papers III & IV). Recently, it has been shown that the highest amount of GFP fluorescence from LacS-GFP overexpression in E. coli -

41 LacS is an Streptococcus thermophilus lactose transporter - coincides with maximum LacS-GFP transport activity and not maximum LacS-GFP production as judged by Western-blotting (Geertsma, 2005); GFP in this case only monitored the amount of functionally expressed membrane protein.

3.2.2 GFP tagging works only for membrane proteins with a cytoplasmic C-terminus As shown in section 2.2, GFP is inactive when targeted to the periplasm with a Sec-type signal peptide (Paper I), thus, to use this approach membrane proteins must have their C-terminus localized to the cytoplasm. As most membrane proteins acquire a Cin topology this is only a minor drawback. The percentage of multispanning membrane proteins with a Cin topology has been experimentally measured at 80 and 83% in the E. coli and Saccharomyces cerevisiae genomes, respectively (Daley et al., 2005; Kim, In preparation), and is predicted to be 70- 75% in all other sequenced genomes (Wallin and von Heijne, 1998).

3.2.3 GFP as a membrane protein folding indicator in whole cells

Since GFP is a slow folding protein ~t1/2 30 min (Fukuda et al., 2000; Waldo et al., 1999), it is conceivable that GFP works as a folding indicator - when placed at the C-terminal end of a membrane protein - because there is sufficient time for the membrane protein to misfold before GFP has folded. The misfolded membrane protein is most likely degraded by the cell or retained as inclusion bodies (Chang et al., 2005). Nevertheless as GFP is very stable, once folded, it can remain fluorescent even if the membrane protein itself is later degraded. This is evident from cytosolic GFP frequently found in the supernatant of recovered membranes (Paper III). It seems that the amount of cytosolic GFP is proportional to the stability of the membrane protein. Similar observations have been made for membrane proteins fused to other soluble protein tags, e.g., PhoA (Danielsen et al., 1995; Pourcher et al., 1996). This means that an overexpression estimate made from whole cells can be misleading. For this reason, the most accurate way to

42 estimate overexpression is to measure GFP fluorescence in isolated membranes, as cytosolic GFP is not recovered in this fraction (Paper III). However, as the isolation of membranes is somewhat laborious, to be ‘high-throughput’ a reliable estimate has to be possible from whole-cells rather than recovered membranes. The reliability that can be placed on whole-cell estimation was investigated more thoroughly. In short, 48 E. coli membrane protein-GFP fusions were overexpressed, and in order to cover different membrane protein overexpression levels, 9 were purified. Satisfactorily, there is a clear correlation between the amount of whole-cell fluorescence and the amount of purified membrane protein-GFP fusion (Paper IV).

3.2.4 GFP-based screen to optimize membrane protein overexpression For a number of membrane protein-GFP fusions the whole-cell GFP fluorescence was measured from 1 ml and 1 L cultures. As there were no significant differences in the amount of fluorescence per ml, 1 ml is a satisfactory culture volume for overexpression screening, Figure 5 (Paper IV).

Figure 5: The comparison of GFP fluorescence from 13 membrane-protein GFP fusions cultured in either 1 ml or 1L.

43 As 5 ml cultures grown in a 24-well format are comparable to that of the 1 ml culture condition, the expression of many membrane proteins can be rapidly tested. This was demonstrated by the global analysis of the E. coli inner membrane proteome (Daley et al., 2005). Based on the fluorescence from membrane protein-GFP fusions it is possible to quickly optimize overexpression of a single membrane protein. Slightly varying standard culture parameters can dramatically change

overexpression yields, e.g., IPTG induction at cellular OD600 of 0.4 compared to 0.6, or IPTG concentration of 0.1 compared to 0.4 mM, can almost double yields of the putative amino acid transporter YbaT (Drew, In preparation). Each membrane protein can respond differently to these parameters in different BL21 strains i.e., BL21(DE3), BL21(DE3)pLysS, C41/43 walker strains (Miroux and Walker, 1996). At present we are determining the parameters worth screening. So far, the most consistent parameter for improving yields is to induce expression at a temperature of 20-25°C instead of 37°C (Paper IV).

3.2.5 In-gel GFP fluorescence The monitoring of membrane protein expression from whole-cells can be further improved by subjecting a whole-cell sample to standard SDS-PAGE. GFP remains partially intact under these conditions (were most proteins are denatured), and exposure of the polyacrylamide gel to UV-light enables detection of the GFP with a CCD-camera (Drew, In preparation). Thus, the amount of full-length membrane protein-GFP fusion can also be monitored, Figure 6.

44

Figure 6. Verification of whole-cell fluorescence from liquid culture with an in- gel fluorescence assay. A. Expression of YedZ-GFP, and quantification of fluorescent 'bands' correlates with whole-cell fluorescence. B. Optimizing expression of YciS-GFP in the Bl21(DE3)pLysS strain by lowering temperature after induction to 30 or 25 degrees and culturing cells after induction from 4 - 22 hours. (Drew, In preparation)

45 3.2.6 GFP-based purification pipeline

To establish a generic purification procedure or ‘pipeline’, a His8 tag was fused to

the C-terminus of GFP, i.e., gives membrane protein-GFP-His8. After standard

membrane protein-GFP-His8 overexpression and isolation, a number of membrane proteins were purified by a combination of IMAC and size-exclusion chromatography. Milligram amounts of E. coli membrane protein fusions, similar levels to that found by others e.g., (Eshaghi et al., 2005), were routinely purified from one liter cultures (Paper IV). The GFP is a useful tool in the purification of membrane proteins. With GFP present one can monitor the ability of different detergents to extract an overexpressed fusion protein from the membrane. Even though the final choice of detergent will also depend on the ability to preserve the membrane protein in a fully functional state, poorly extracting detergents can be quickly eliminated in this step. The GFP moiety of the membrane protein-GFP fusion also enables the purification to be followed visually, and the binding efficiency of a fusion to a column can be seen directly. Lastly, the GFP moiety of the membrane protein- GFP fusion means it is possible to quickly and accurately determine protein concentrations (Paper IV).

3.2.7 Recovery of membrane proteins from GFP fusions using a site specific protease There are a few cases where purified membrane protein-GFP fusions have been shown to be functional in vitro, e.g., (Quick and Wright, 2002). Membrane protein-GFP fusions can also be active in vivo, as we showed for the essential E. coli membrane protein YidC (see section 1.2.2), i.e., YidC-GFP is functional at expression levels similar to endogenous amount of YidC, and localizes to the E. coli cell-poles (Urbanus et al., 2002). However, as GFP may interfere with the function of the protein and hinder protein crystallization, a Tobacco Etch Virus protease cleavage site (ENLYFQG/S) was added to clip off the GFP-His8 moiety

from the membrane protein-TEV-GFP-His8 fusion. TEV protease was chosen

46 because; - it is a non-commercial specific protease, - it is easily produced in large quantities, - and is active in the presence of many detergents (Mohanty et al., 2003).

TEV protease was tested by incubating purified YbaT-TEV-GFP-His8 (a putative amino acid transporter), GltP-TEV-GFP-His8 (a glutamate transporter)

(Wallace et al., 1990), and YedZ-TEV-GFP-His8 (a protein of unknown function)

with His10-TEV protease. After incubation, digestion was complete, and the

His10-TEV protease, undigested membrane protein-TEV-GFP-His8 fusion and

clipped-off GFP-His8 were easily removed by batch-binding material to metal affinity resin (Paper IV). It was possible to recover intact functional full-length membrane proteins from membrane protein-GFP fusions. Throughout, GFP fluorescence could be used to monitor both the effectiveness of the TEV digestion, and the purity of the recovered membrane protein. Are isolated membrane proteins functional? Purified GltP was reconstituted into lipid vesicles and its activity was compared to purified GltP-

His8. There was no difference in the glutamate uptake activity between GltP

recovered from GltP-TEV-GFP-His8 and purified GltP-His8 (Paper IV). For further verification the YedZ protein was analysed in detail (section 4).

3.2.8 How does this GFP-based method compare to other high-throughput approaches? Many high-throughput membrane protein overexpression initiatives have estimated membrane protein expression by the quantification of ‘bands’ visible on a polaycrylamide gel after Coomaisse staining and/or Western-blotting (Dobrovetsky et al., 2005; Korepanova et al., 2005); in these cases membranes are first isolated to remove any inclusion bodies. However, Coomassie staining is inaccurate, lacks sensitivity, and Western-blotting is time consuming and not always reliable, i.e., membrane proteins with different hydrophobicity bind Coomaisse or can transfer to a semi-solid support inconsistently. To rapidly judge (in a 96-well format) the expression of many His-tagged membrane

47 proteins in E. coli, Nordlund and co-workers developed a dot-blot detection method which does not require an electrophoretic transfer step (Eshaghi et al., 2005). This method is elegant, and is capable of simultaneously screening the expression and detergent solubilization efficiency of numerous membrane proteins. The main disadvantage of this method is that the amount of full-length protein is estimated only after the binding and elution of overexpressed material to 96-well coated Ni-NTA resin. This is expensive, and maybe unaffordable for many laboratories.

3.2.9 Summary of high-throughput membrane protein overexpression One of the main obstacles towards understanding membrane proteins is the difficulties associated with obtaining pure material for biochemical and structural analysis (Grisshammer and Tate, 1995). Unfortunately, in comparison to soluble proteins, overexpression of membrane proteins typically yields little protein. Novel approaches are badly needed to identify ‘workable’ material. In this thesis, we have shown that a simple C-terminal GFP fusion is a reliable folding indicator for membrane proteins expressed in E. coli with a

cytoplasmic C-terminal tail (Paper III). By incorporating a C-terminal His8 tag to

the end of GFP, and a site for the TEV protease to clip off the GFP-His8 fusion, we show we can use an efficient, standardized purification protocol to purify protein to yields >1 mg per liter of culture (Paper IV). As proof-of-principle of this purification pipeline an E. coli membrane protein of previously unknown function was characterized (next section).

48 4. Characterization of the membrane protein YedZ

YedZ belongs to a bacterial protein family of unknown function, UPF0191 (www.sanger.ac.uk). YedZ originally attracted our attention since cells

overexpressing YedZ-TEV-GFP-His8 were orange instead of green. Although its orange colour suggested binding of some kind of cofactor, none of the Web- based prediction tools used to analyze its amino acid sequence identified any potential cofactor binding motifs.

4.1.1 A test case for the GFP-based purification pipeline: YedZ Purification of the YedZ protein by our purification pipeline yields a protein that is orange. Under both oxidizing and reducing conditions, optical spectra of the purified YedZ protein were recorded (Paper IV). Under reducing conditions the YedZ protein demonstrated an absorption spectra characteristic of cytochrome b5, with a maximum α-peak at 558 nm. This annotation could be corroborated by analysis of the purified sample by means of mass spectrometry. A major monoisotopic peak was identified at 617 Da; a mass equal to that of heme b. With an assay for heme, the YedZ to heme ratio was calculated at 1:1. Because of an atypical absorption peak in the 450-500 nm regions, YedZ was also suspected to bind flavin. This was confirmed by subjecting YedZ to reverse-phase liquid chromatography. YedZ contains FMN rather than FAD, with a molar ratio of 0.7 FMN per YedZ molecule (Paper IV).

4.1.2 YedZ is a novel integral membrane flavocytochrome How does YedZ bind the heme b? YedZ is a very hydrophobic membrane protein (23% leucine and 7% valine), and consists of six TMs connected by very short loops (Paper I). The topology of YedZ is consistent with an alignment of bacterial homologs, whereby putative loop regions fall into stretches of amino acid

49 residues with low similarity, Figure 7a. In contrast, the transmembrane segments II-V are very well conserved in YedZ and its homologs. In helix III there is one conserved histidine, and in helix V there are two. The two parallel histidines (H92 in helix III and H164 in helix V) close to the periplasmic face of the membrane are clearly the most likely histidine pair for ligation of the heme b, Figure 7b. It is plausible that these transmembrane segments form a four-helical bundle, and coordinate heme b in a similar manner as in other cytochrome b containing membrane proteins, see e.g., (Iwata et al., 1999). How does YedZ bind FMN? The coordination of heme b by integral membrane proteins has been well described, but binding of FMN in the plane of the membrane is unprecedented. Based on our current knowledge of flavin- binding protein structures it is difficult to envisage how the very short and unconserved loops of YedZ could fold to bind FMN. To date, all examples of flavin binding protein structures deposited in the Protein Data Bank have a fold architecture of at least 100 amino acids to envelope the ligand e.g., a TIM-barrel or a Rossman type fold (Fraaije and Mattevi, 2000; Hefti et al., 2003). On the other hand, the conserved amino acids W and Y as observed for globular flavodoxin proteins (Lostao et al., 2003) at the start and in the middle of transmembrane segment V, could be the key residues for FMN binding in YedZ. Thus, in light of the topology for YedZ, the unknown YedZ protein was annotated as the first integral membrane flavocytochrome (Paper IV).

50

A. I II III IV

(1) 1 10 20 30 40 50 60 70 80 90 100 110 120 130 140 151 YEDZ_ECOLI/ 7-199 (1) -QVTWLKVC------LHLAGLLPFLWLVWAINHG---GLGADPVKDIQHFTGRTALKFLLATLLITPLARYAKQPLLIRTRRLLGLWCFAWATLHLTSYALLELGVNNLALLGKELITRPYLTLGIISWVILLALAFTSTQ-AMQRKLG- Y0G1_YERPE/ 7-206 (1) -HITWLKIA------IWLAATLPLLWLVLSINLG---GLSADPAKDIQHFTGRMALKLLLATLLVSPLARYSKQPLLLRCRRLLGLWCFAWGTLHLLSYSILELGLSNIGLLGHELINRPYLTLGIISWLVLLALALTSTR-WAQRKMG- Y304_BRUME/ 9-210 (1) -KKKTPRPGQWKLW-LLYTAGFVPAVWTFYLGATG---QLGADPVKTFEHLLGLWALRFLILTLLVTPMRDLTG-ITLLRYRRALGLLAFYYALMHFTTYMVLDQGL-NLSAIITDIVRRPFITIGMISLALLVPLALTSNN-WSIRKLG- Y538_PASMU/ 1-201 (1) -MLSLFRII------IHVCCLGPVAWLAWVLLSGDESQLGADPIKEIQHFLGFSALTILLIMFILGKVFYLLKQPQLQVLRRALGLWAWFYVVLHVYAYLALELGY-DFSLFVQELVNRGYLIIGAIAFLILTLMALSSWS-YLKLKMG- YAJ1_PSEAE/ 2-198 (1) -RYWYLRLA------VFLGALAVPAWWLYQAWIF---ALGPDPGKTLVDRLGLGALVLLLLTLAMTPLQKLSGWPGWIAVRRQLGLWCFTYVLLHLSAYYVFILGL-DWGQLGIELSKRPYIIVGMLGFVCLFLLAITSNR-FAMRKLG- YD82_RHIME/ 8-207 (1) -PKRLHGPS---IW-ALYILGFLPAVWGFYLGATG---RLPGNAVKEFEHLLGIWALRFLIATLAITPIRDLFG-VNWLRYRRALGLLAFYYVMMHFLTYMVLDQTL-LLPAIVADIARRPFITIGMAALVLLIPLAVTSNI-WSIRRLG- YG46_XANAC/ 12-211 (1) -KTLVHAAA---LA-PIALLGWQ--FWQVWQSGSD---ALGADPVAEIEHRTGLWALRLLLITLAITPLRQLTGQAVVIRFRRMLGLYAFFYATVHLAAYLTLDLRG-FWTQIFEEILKRPYITVGFAAWLLLMPLAITSTQGW-MRRLK- YJ20_AGRT5/ 11-211 (1) PKRYQPAA----IW-SLYVIGLCPGLWYFYLAATG---GLGFNPVKDFEHLLGIWALRFLCLGLLVTPLRDLFN-VNLIAYRRALGLIAFYYVLAHFTVYLVLDRGL-ILGSIAGDILKRPYIMLGMAGLIILIPLALTSNR-WSIRRLG- YP37_DEIRA/ 7-198 (1) -PYAWLGPG------VVLGGLLPTVFLLWDALSG---GLGANPVKQATHQTGQLALIVLTLSLACTPARVWLGWTWAARIRKALGLLAAFYAVLHFGIYLRGQDFS--LGRIWEDVTERPFITSGFAALLLLLPLVLTSGK-GSVRRLGF YR47_CAUCR/ 7-209 (1) -KKRPSKLQDTLVYGLVWLACFAPLAWLAWRGYAG---ELGANPIDKLIRELGEWGLRLLLVGLAITPAARILKMPRLVRFRRTVGLFAFAYVALHLLAYVGIDLFF-DWNQLWKDILKRPFITLGMLGFMLLIPLAVTSTNGWVIRMGR- YT80_RALSO/ 10-211 (1) -SLRAVRIA------VWLLALVPFLRLVVLGATD---RYGANPLEFVTRSTGTWTLVLLCCTLAVTPLRRLTGMNWLIRIRRMLGLYTFFYGTLHFLIWLLVDRGL-DPASMVKDIAKRPFITVGFAAFVLMIPLAATSTN-AMVRRLGG Consensus (1) RIA LWLAGLLP LWLVW G TG GLGADPVKDI H LG WALRLLLLTLAITPLR L G LIR RR LGLWAFFYALLHL AYLVLDLGL LG I DILKRPYITLGMIAFLLLIPLALTS WSIRKLG

V VI

(151) 151 160 170 180 190 200 215 YEDZ_ECOLI/ 7-199 (139) -KHWQQLHNFVYLVAILAPIHYLW--SVKIISPQPLIYAGLAVLLLALR------YKKLRSLFN Y0G1_YERPE/ 7-206 (139) -ARWQKLHNWVYVVAILAPIHYLW--SVKTLSPWPIIYAVMAALLLLLRYKLLLPRYKKFRQWFR Y304_BRUME/ 9-210 (143) -RRWSSLHKLVYIAIAGSAVHFLM--SVKSWPAEPVIYAAIVAALLLWRLARP--YLRTRKPALR Y538_PASMU/ 1-201 (141) -KWWFYLHQLGYYALLLGAIHYVW--SVKNVTFSSMLYLILSIMILCDALYG-LFIKRKGRSTSA YAJ1_PSEAE/ 2-198 (138) -SRWKKLHRLVYLILGLGLLHMLW--VVRADLEEWTLYAVVGASLMLLR--LPSIARRLPRLRTR YD82_RHIME/ 8-207 (140) -QRWNKLHRLVYVIAAAGALHFAM--SVKVVGPEQMLYLFLVAVLVAWRAVRKR-FLRWRRQGTA YG46_XANAC/ 12-211 (139) -RNWGRLHMLIYPIGLLAVLHFWW--LVKSDIREPALYAGILAVLLGWRVWKKLSARQTTARRST YJ20_AGRT5/ 11-211 (140) -SRWNTLHKLVYLVLIVGVLHFVL--ARKSITLEPVFYISTMVVLLGYRLVRPSIMTMKRNKRAR YP37_DEIRA/ 7-198 (137) FARWTLLHRLVYLAAALGALHYWW--GVKKDHSGPLLAVLVLAALGLAR------LKTPARLNR YR47_CAUCR/ 7-209 (146) -AAWSRLHRLVYLIVPLGVAHYYL--LVKADHRPPIIYGAVFVALMLWRVWE----GRRTASKSS YT80_RALSO/ 10-211 (138) GRRWQWLHRLVYVTGVLGILHYWWHKAGKHDFAEVSIYAAVMAVLLGLRVWWVWRGARQGAIAGG Consensus (151) RW LHRLVYLIAILG LHYLW SVK EPIIYA VLAVLL RL R R

B.

Figure 7. YedZ alignment and membrane topology A. Representative amino acid sequence alignment of YedZ bacterial homologs; ECOLI (E. coli) YERPE (Yersinia pestis), BRUME (Brucella melitensis) RHIME (Rhodospirillum rubrum) AGRT (Agrobacterium tumefaciens) CAUCR (Caulobacter crescentus), RALSO (Ralstonia solanacearum), PASMU (Pasteurella multocida), PSEAE (Pseudomonas aeruginosa) XANAC (Xanthomonas campestris), DEIRA (Deinococcus radiodurans) The bottom sequence is the consensus outlined in red. Predicted transmembrane segments for E. coli YedZ are marked with gray bars and numbered I-VI. 100% conserved amino acid residues are in red text and highlighted in yellow. B. YedZ consists of 6 hydrophobic transmembrane segments, it has N- and C- terminal cytoplasmic ends and short interconnecting loops.

51 4.1.3 The possible function of YedZ To further understand YedZ, we analyzed the DNA surrounding the yedZ gene in the E. coli genome. The gene encoding YedZ is in a putative operon together with a gene encoding YedY. YedY encodes a soluble periplasmic protein. The putative yedYZ operon possesses an upstream σ70 consensus sequence, suggesting that it is likely constitutively expressed, Figure 8a. A region of YedY shares at least 25% sequence identity to a molybdenum-molybdopterin (Mo-MPT) binding domain that is present in soluble assimilatory nitrate reductases (NR) found in plants, algae and fungi, Figure 8b. These enzymes catalyze the conversion of nitrate to nitrite to assimilate inorganic nitrogen (Barber et al., 2002).

Figure 8. Analysis of the putative yedYZ operon. A. The putative yedYZ operon possesses a σ70 consensus sequence. The stop codon of yedY is immediately followed by the start codon of yedZ and the ribosome binding site (RBS) of the yedZ gene is in the end of the yedY coding region. B. YedY belongs to a protein family of oxidoreductase molybdopterin binding proteins Pfam P11605, and shares 25 % sequence identity to the Mo-MPT domain in eukaryotic assimilatory nitrate reductases. The example shown is tobacco NR NIA2. C. E. coli ∆tatC and control cells expressing YedY-HA were labeled with 35S-methionine. YedY-HA was immunoprecipitated with an antiserum against the HA-epitope tag and immuno-precipitates were analyzed by means of SDS-PAGE and fluorography (unpublished).

52 The amino acid signature for Mo-MPT in bacteria and eukaryotes is usually different as bacteria further modify this domain to Mo-MGD (Campbell, 1996). However, a recent 3D-structure confirmed that YedY contained the unmodified cofactor Mo-MPT (Loschi et al., 2004). Like other periplasmic co-factor containing proteins it also contains a twin arginine motif (K-R-R-Q-V-L-K), which is a signal for export via the TAT translocase; a proteinaceous channel that preferentially translocates fully-folded proteins with cytosolically incorporated co-factors (Gohlke et al., 2005), Figure 8c.

Eukaryotic assimilatory NR is a homodimer of two soluble ~100 kDa subunits, each subunit containing 3 modular units in a 1:1:1 ratio of Mo-MPT, cytochrome b5 and flavin, usually FAD. Each modular unit in NR is thought to have evolved independently, and the three units are linked by highly variable hinge-like sequences (Campbell, 2001; Hyde et al., 1991). Our analysis of the putative yedYZ operon suggests that the membrane-bound YedZ protein could be equivalent to

the soluble cytochrome b5 and flavin-binding domains, and together with the globular Mo-MPT containing protein YedY, constitute an assimilatory periplasmic NR, Figure 9.

Figure 9. YedYZ is a novel nitrate reductase. Schematic representation of the relationship between YedYZ proteins and eukaryotic assimilatory NR, the example shown is the NR NIA2 from tobacco (unpublished).

53 In support of this hypothesis we have found that the periplasmic NR capacity in a yedYZ deletion mutant strain (MG1655 derivative) was slightly lower than in a control grown aerobically (unpublished data). More striking, was a clear increase in NR activity for a single yedZ deletion. Presumably uncoupling YedY allows greater access of the artificial electron donor (reduced methyl viologen), used in the NR assay to the Mo-MPT site (unpublished data).

54 5. Conclusions

In E. coli, membrane protein topology can be rapidly deduced from a combination of computer predictions and single C-terminal PhoA/GFP reporter- protein fusions (Paper I). Reporter fusions made to either the N- or C-terminal ends of membrane proteins are more informative than fusions placed elsewhere (Paper II), and are enough to appreciably improve membrane protein topology predictions (Daley et al., 2005; Melen et al., 2003). This approach has been applied to the entire E. coli inner membrane proteome (Daley et al., 2005). As expected, ~80% of membrane proteins contain cytoplasmic C-termini, and thus, can be further analyzed by a GFP-based overexpression and purification ‘pipeline’ (Papers III and IV). This pipeline allows highly overexpressed membrane proteins to be rapidly and easily screened for. As demonstrated for the membrane proteins GltP and YedZ, this pipeline can recover intact, full- length functional membrane proteins from membrane protein-GFP fusions (Paper IV). The usefulness of combining the aforementioned topology information with the GFP-based pipeline was brought to the fore with the characterization of the membrane protein YedZ; the first identified integral membrane flavocytochrome. It is envisaged that this GFP-based methodology in combination with the E. coli membrane protein-GFP library, will facilitate the characterization of the many E. coli membrane proteins without a known function. As demonstrated by the functional expression of the human KDEL receptor-GFP fusion in L. lactis (Paper IV), this technology also holds great promise for the eagerly awaited functional and structure characterization of eukaryotic membrane proteins.

55 References

Abramson, J., Riistama, S., Larsson, G., Jasaitis, A., Svensson-Ek, M., Laakkonen, L., Puustinen, A., Iwata, S. and Wikstrom, M. (2000) The structure of the ubiquinol oxidase from Escherichia coli and its ubiquinone binding site. Nat Struct Biol, 7, 910-917. Abramson, J., Smirnova, I., Kasho, V., Verner, G., Kaback, H.R. and Iwata, S. (2003) Structure and mechanism of the lactose permease of Escherichia coli. Science, 301, 610-615. Adamian, L., Nanda, V., DeGrado, W.F. and Liang, J. (2005) Empirical lipid propensities of amino acid residues in multispan alpha helical membrane proteins. Proteins, 59, 496-509. Afonyushkin, T., Moll, I., Blasi, U. and Kaberdin, V.R. (2003) Temperature-dependent stability and translation of Escherichia coli ompA mRNA. Biochem Biophys Res Commun, 311, 604-609. Andersson, H. and von Heijne, G. (1994) Membrane protein topology: effects of delta mu H+ on the translocation of charged residues explain the 'positive inside' rule. EMBO J, 13, 2267-2272. Arechaga, I., Miroux, B., Karrasch, S., Huijbregts, R., de Kruijff, B., Runswick, M.J. and Walker, J.E. (2000) Characterisation of new intracellular membranes in Escherichia coli accompanying large scale over-production of the b subunit of F(1)F(o) ATP synthase. FEBS Lett, 482, 215-219. Arechaga, I., Miroux, B., Runswick, M.J. and Walker, J.E. (2003) Over-expression of Escherichia coli F1F(o)-ATPase subunit a is inhibited by instability of the uncB gene transcript. FEBS Lett, 547, 97-100. Barber, M.J., Desai, S.K., Marohnic, C.C., Hernandez, H.H. and Pollock, V.V. (2002) Synthesis and bacterial expression of a gene encoding the heme domain of assimilatory nitrate reductase. Arch Biochem Biophys, 402, 38-50. Batey, R.T., Rambo, R.P., Lucast, L., Rha, B. and Doudna, J.A. (2000) Crystal structure of the ribonucleoprotein core of the signal recognition particle. Science, 287, 1232-1239. Behrendt, J., Standar, K., Lindenstrauss, U. and Bruser, T. (2004) Topological studies on the twin-arginine translocase component TatC. FEMS Microbiol Lett, 234, 303- 308. Bernsel, A. and Von Heijne, G. (2005) Improved membrane protein topology prediction by domain assignments. Protein Sci, 14, 1723-1728. Bertero, M.G., Rothery, R.A., Palak, M., Hou, C., Lim, D., Blasco, F., Weiner, J.H. and Strynadka, N.C. (2003) Insights into the respiratory electron transfer pathway from the structure of nitrate reductase A. Nat Struct Biol, 10, 681-687. Bogdanov, M., Heacock, P.N. and Dowhan, W. (2002) A polytopic membrane protein displays a reversible topology dependent on membrane lipid composition. EMBO J, 21, 2107-2116. Bogdanov, M., Zhang, W., Xie, J. and Dowhan, W. (2005) Transmembrane protein topology mapping by the substituted cysteine accessibility method (SCAM(TM)): application to lipid-specific membrane protein topogenesis. Methods, 36, 148- 171. Boon, J.M. and Smith, B.D. (2002) Chemical control of phospholipid distribution across bilayer membranes. Med Res Rev, 22, 251-281. Booth, P.J. (2005) Sane in the membrane: designing systems to modulate membrane proteins. Curr Opin Struct Biol, 15, 435-440. Campbell, W.H. (1996) Nitrate Reductase Biochemistry Comes of Age. Plant Physiol, 111, 355-361.

56 Campbell, W.H. (2001) Structure and function of eukaryotic NAD(P)H:nitrate reductase. Cell Mol Life Sci, 58, 194-204. Chamberlain, A.K., Lee, Y., Kim, S. and Bowie, J.U. (2004) Snorkeling preferences foster an amino acid composition bias in transmembrane helices. J Mol Biol, 339, 471-479. Chang, G., Spencer, R.H., Lee, A.T., Barclay, M.T. and Rees, D.C. (1998) Structure of the MscL homolog from Mycobacterium tuberculosis: a gated mechanosensitive ion channel. Science, 282, 2220-2226. Chang, H.C., Kaiser, C.M., Hartl, F.U. and Barral, J.M. (2005) De novo Folding of GFP Fusion Proteins: High Efficiency in Eukaryotes but Not in Bacteria. J Mol Biol, 353, 397-409. Crameri, A., Whitehorn, E.A., Tate, E. and Stemmer, W.P. (1996) Improved green fluorescent protein by molecular evolution using DNA shuffling. Nat Biotechnol, 14, 315-319. Culham, D.E., Hillar, A., Henderson, J., Ly, A., Vernikovska, Y.I., Racher, K.I., Boggs, J.M. and Wood, J.M. (2003) Creation of a fully functional cysteine-less variant of osmosensor and proton-osmoprotectant symporter ProP from Escherichia coli and its application to assess the transporter's membrane orientation. Biochemistry, 42, 11815-11823. Daley, D.O., Rapp, M., Granseth, E., Melen, K., Drew, D. and von Heijne, G. (2005) Global topology analysis of the Escherichia coli inner membrane proteome. Science, 308, 1321-1323. Danielsen, S., Boyd, D. and Neuhard, J. (1995) Membrane topology analysis of the Escherichia coli cytosine permease. Microbiology, 141 ( Pt 11), 2905-2913. Dawson, J.P., Melnyk, R.A., Deber, C.M. and Engelman, D.M. (2003) Sequence context strongly modulates association of polar residues in transmembrane helices. J Mol Biol, 331, 255-262. Dawson, J.P., Weinger, J.S. and Engelman, D.M. (2002) Motifs of serine and threonine can drive association of transmembrane helices. J Mol Biol, 316, 799-805. DeGrado, W.F., Gratkowski, H. and Lear, J.D. (2003) How do helix-helix interactions help determine the folds of membrane proteins? Perspectives from the study of homo-oligomeric helical bundles. Protein Sci, 12, 647-665. Dobrovetsky, E., Lu, M.L., Andorn-Broza, R., Khutoreskaya, G., Bray, J.E., Savchenko, A., Arrowsmith, C.H., Edwards, A.M. and Koth, C.M. (2005) High-throughput production of prokaryotic membrane proteins. J Struct Funct Genomics, 6, 33-50. Doyle, D.A., Morais Cabral, J., Pfuetzner, R.A., Kuo, A., Gulbis, J.M., Cohen, S.L., Chait, B.T. and MacKinnon, R. (1998) The structure of the potassium channel: molecular basis of K+ conduction and selectivity. Science, 280, 69-77. Drew, D., Froderberg, L., Baars, L. and de Gier, J.W. (2003) Assembly and overexpression of membrane proteins in Escherichia coli. Biochim Biophys Acta, 1610, 3-10. Drew, D., Lerch, M, Kunji, E, Slotboom, DJ, and de Gier, JW. (In preparation) Optimizing membrane protein overexpression and purification using GFP fusions. Nature Methods. Drew, D., Sjostrand, D., Nilsson, J., Urbig, T., Chin, C.N., de Gier, J.W. and von Heijne, G. (2002) Rapid topology mapping of Escherichia coli inner-membrane proteins by prediction and PhoA/GFP fusion analysis. Proc Natl Acad Sci U S A, 99, 2690- 2695. Drew, D., Slotboom, D.J., Friso, G., Reda, T., Genevaux, P., Rapp, M., Meindl-Beinker, N.M., Lambert, W., Lerch, M., Daley, D.O., Van Wijk, K.J., Hirst, J., Kunji, E. and De Gier, J.W. (2005) A scalable, GFP-based pipeline for membrane protein overexpression screening and purification. Protein Sci, 14, 2011-2017.

57 Drew, D.E., von Heijne, G., Nordlund, P. and de Gier, J.W. (2001) Green fluorescent protein as an indicator to monitor membrane protein overexpression in Escherichia coli. FEBS Lett, 507, 220-224. Driessen, A.J., Manting, E.H. and van der Does, C. (2001) The structural basis of protein targeting and translocation in bacteria. Nat Struct Biol, 8, 492-498. Dutzler, R., Campbell, E.B., Cadene, M., Chait, B.T. and MacKinnon, R. (2002) X-ray structure of a ClC chloride channel at 3.0 A reveals the molecular basis of anion selectivity. Nature, 415, 287-294. Edwards, M.D., Li, Y., Kim, S., Miller, S., Bartlett, W., Black, S., Dennison, S., Iscla, I., Blount, P., Bowie, J.U. and Booth, I.R. (2005) Pivotal role of the glycine-rich TM3 helix in gating the MscS mechanosensitive channel. Nat Struct Mol Biol, 12, 113-119. Engelman, D.M., Chen, Y., Chin, C.N., Curran, A.R., Dixon, A.M., Dupuy, A.D., Lee, A.S., Lehnert, U., Matthews, E.E., Reshetnyak, Y.K., Senes, A. and Popot, J.L. (2003) Membrane protein folding: beyond the two stage model. FEBS Lett, 555, 122-125. Eroglu, C., Cronet, P., Panneels, V., Beaufils, P. and Sinning, I. (2002) Functional reconstitution of purified metabotropic glutamate receptor expressed in the fly eye. EMBO Rep, 3, 491-496. Eshaghi, S., Hedren, M., Nasser, M.I., Hammarberg, T., Thornell, A. and Nordlund, P. (2005) An efficient strategy for high-throughput expression screening of recombinant integral membrane proteins. Protein Sci, 14, 676-683. Farewell, A. and Neidhardt, F.C. (1998) Effect of temperature on in vivo protein synthetic capacity in Escherichia coli. J Bacteriol, 180, 4704-4710. Feilmeier, B.J., Iseminger, G., Schroeder, D., Webber, H. and Phillips, G.J. (2000) Green fluorescent protein functions as a reporter for protein localization in Escherichia coli. J Bacteriol, 182, 4068-4076. Fraaije, M.W. and Mattevi, A. (2000) Flavoenzymes: diverse catalysts with recurrent features. Trends Biochem Sci, 25, 126-132. Freedman, S.D., Katz, M.H., Parker, E.M., Laposata, M., Urman, M.Y. and Alvarez, J.G. (1999) A membrane lipid imbalance plays a role in the phenotypic expression of cystic fibrosis in cftr(-/-) mice. Proc Natl Acad Sci U S A, 96, 13995-14000. Frillingos, S., Sahin-Toth, M., Wu, J. and Kaback, H.R. (1998) Cys-scanning mutagenesis: a novel approach to structure function relationships in polytopic membrane proteins. FASEB J, 12, 1281-1299. Fu, D., Libson, A., Miercke, L.J., Weitzman, C., Nollert, P., Krucinski, J. and Stroud, R.M. (2000) Structure of a glycerol-conducting channel and the basis for its selectivity. Science, 290, 481-486. Fukuda, H., Arai, M. and Kuwajima, K. (2000) Folding of green fluorescent protein and the cycle3 mutant. Biochemistry, 39, 12025-12032. Fyfe, P.K., Hughes, A.V., Heathcote, P. and Jones, M.R. (2005) Proteins, chlorophylls and lipids: X-ray analysis of a three-way relationship. Trends Plant Sci, 10, 275- 282. Fyfe, P.K., Isaacs, N.W., Cogdell, R.J. and Jones, M.R. (2004) Disruption of a specific molecular interaction with a bound lipid affects the thermal stability of the purple bacterial reaction centre. Biochim Biophys Acta, 1608, 11-22. Gandlur, S.M., Wei, L., Levine, J., Russell, J. and Kaur, P. (2004) Membrane topology of the DrrB protein of the doxorubicin transporter of Streptomyces peucetius. J Biol Chem, 279, 27799-27806. Geertsma, E.R. (2005) What lies between: Functional interfaces in a dimeric transporter. Department of Biochemistry. University of Groningen, Groningen, p. 101. Goder, V., Junne, T. and Spiess, M. (2004) Sec61p contributes to signal sequence orientation according to the positive-inside rule. Mol Biol Cell, 15, 1470-1478.

58 Gohlke, U., Pullan, L., McDevitt, C.A., Porcelli, I., de Leeuw, E., Palmer, T., Saibil, H.R. and Berks, B.C. (2005) The TatA component of the twin-arginine protein transport system forms channel complexes of variable diameter. Proc Natl Acad Sci U S A, 102, 10482-10486. Gouffi, K., Santini, C.L. and Wu, L.F. (2002) Topology determination and functional analysis of the Escherichia coli TatC protein. FEBS Lett, 525, 65-70. Granseth, E., Daley, D.O., Rapp, M., Melen, K. and von Heijne, G. (2005a) Experimentally constrained topology models for 51,208 bacterial inner membrane proteins. J Mol Biol, 352, 489-494. Granseth, E., von Heijne, G. and Elofsson, A. (2005b) A study of the membrane-water interface region of membrane proteins. J Mol Biol, 346, 377-385. Green, D.H. and Cutting, S.M. (2000) Membrane topology of the Bacillus subtilis pro- sigma(K) processing complex. J Bacteriol, 182, 278-285. Griffith, D.A., Delipala, C., Leadsham, J., Jarvis, S.M. and Oesterhelt, D. (2003) A novel yeast expression system for the overproduction of quality-controlled membrane proteins. FEBS Lett, 553, 45-50. Grisshammer, R., Duckworth, R. and Henderson, R. (1993) Expression of a rat neurotensin receptor in Escherichia coli. Biochem J, 295 ( Pt 2), 571-576. Grisshammer, R. and Tate, C.G. (1995) Overexpression of integral membrane proteins for structural studies. Q Rev Biophys, 28, 315-422. Gromiha, M.M. and Suwa, M. (2005) Structural analysis of residues involving cation-pi interactions in different folding types of membrane proteins. Int J Biol Macromol, 35, 55-62. Hedhammar, M., Stenvall, M., Lonneborg, R., Nord, O., Sjolin, O., Brismar, H., Uhlen, M., Ottosson, J. and Hober, S. (2005) A novel flow cytometry-based method for analysis of expression levels in Escherichia coli, giving information about precipitated and soluble protein. J Biotechnol, 119, 133-146. Hefti, M.H., Vervoort, J. and van Berkel, W.J. (2003) Deflavination and reconstitution of flavoproteins. Eur J Biochem, 270, 4227-4242. Helms, V. (2002) Attraction within the membrane. Forces behind transmembrane protein folding and supramolecular complex assembly. EMBO Rep, 3, 1133-1138. Hermansson, M. and von Heijne, G. (2003) Inter-helical hydrogen bond formation during membrane protein integration into the ER membrane. J Mol Biol, 334, 803-809. Hessa, T., Kim, H., Bihlmaier, K., Lundin, C., Boekel, J., Andersson, H., Nilsson, I., White, S.H. and von Heijne, G. (2005a) Recognition of transmembrane helices by the endoplasmic reticulum translocon. Nature, 433, 377-381. Hessa, T., White, S.H. and von Heijne, G. (2005b) Membrane insertion of a potassium- channel voltage sensor. Science, 307, 1427. Higy, M., Junne, T. and Spiess, M. (2004) Topogenesis of membrane proteins at the endoplasmic reticulum. Biochemistry, 43, 12716-12722. Hoag, H. (2005) Expression of interest. Nature, 437, 164-165. Hong, H. and Tamm, L.K. (2004) Elastic coupling of integral membrane protein stability to lipid bilayer forces. Proc Natl Acad Sci U S A, 101, 4065-4070. Houben, E.N., Zarivach, R., Oudega, B. and Luirink, J. (2005) Early encounters of a nascent membrane protein: specificity and timing of contacts inside and outside the ribosome. J Cell Biol, 170, 27-35. Huang, Y., Lemieux, M.J., Song, J., Auer, M. and Wang, D.N. (2003) Structure and mechanism of the glycerol-3-phosphate transporter from Escherichia coli. Science, 301, 616-620. Huber, D., Boyd, D., Xia, Y., Olma, M.H., Gerstein, M. and Beckwith, J. (2005) Use of thioredoxin as a reporter to identify a subset of Escherichia coli signal sequences that promote signal recognition particle-dependent translocation. J Bacteriol, 187, 2983-2991.

59 Hunte, C., Screpanti, E., Venturi, M., Rimon, A., Padan, E. and Michel, H. (2005) Structure of a Na+/H+ antiporter and insights into mechanism of action and regulation by pH. Nature, 435, 1197-1202. Hyde, G.E., Crawford, N.M. and Campbell, W.H. (1991) The sequence of squash NADH:nitrate reductase and its relationship to the sequences of other flavoprotein oxidoreductases. A family of flavoprotein pyridine nucleotide cytochrome reductases. J Biol Chem, 266, 23542-23547. Ito, K. and Akiyama, Y. (1991) In vivo analysis of integration of membrane proteins in Escherichia coli. Mol Microbiol, 5, 2243-2253. Ito, K. and Akiyama, Y. (2005) Cellular functions, mechanism of action, and regulation of ftsh protease. Annu Rev Microbiol, 59, 211-231. Iwata, M., Okada, K. and Iwata, S. (1999) [Structure of cytochrome bc1 complex from bovine heart mitochondria]. Tanpakushitsu Kakusan Koso, 44, 643-654. Jakubowski, S.J., Krishnamoorthy, V., Cascales, E. and Christie, P.J. (2004) Agrobacterium tumefaciens VirB6 domains direct the ordered export of a DNA substrate through a type IV secretion System. J Mol Biol, 341, 961-977. Jensen, M.O. and Mouritsen, O.G. (2004) Lipids do influence protein function: the hydrophobic matching hypothesis revisited. Biochim Biophys Acta, 1666, 205- 226. Jensen, M.O., Mouritsen, O.G. and Peters, G.H. (2004) The hydrophobic effect: molecular dynamics simulations of water confined between extended hydrophobic and hydrophilic surfaces. J Chem Phys, 120, 9729-9744. Jones, D.T., Taylor, W.R. and Thornton, J.M. (1994) A model recognition approach to the prediction of all-helical membrane protein structure and topology. Biochemistry, 33, 3038-3049. Jordan, P., Fromme, P., Witt, H.T., Klukas, O., Saenger, W. and Krauss, N. (2001) Three-dimensional structure of cyanobacterial photosystem I at 2.5 A resolution. Nature, 411, 909-917. Kaneko, M. and Nomura, Y. (2003) ER signaling in unfolded protein response. Life Sci, 74, 199-205. Kastner, C.N., Dimroth, P. and Pos, K.M. (2000) The Na+-dependent citrate carrier of Klebsiella pneumoniae: high-level expression and site-directed mutagenesis of asparagine-185 and glutamate-194. Arch Microbiol, 174, 67-73. Khademi, S., O'Connell, J., 3rd, Remis, J., Robles-Colmenares, Y., Miercke, L.J. and Stroud, R.M. (2004) Mechanism of ammonia transport by Amt/MEP/Rh: structure of AmtB at 1.35 A. Science, 305, 1587-1594. Kiefer, H., Krieger, J., Olszewski, J.D., Von Heijne, G., Prestwich, G.D. and Breer, H. (1996) Expression of an olfactory receptor in Escherichia coli: purification, reconstitution, and ligand binding. Biochemistry, 35, 16077-16084. Kihara, A., Akiyama, Y. and Ito, K. (1995) FtsH is required for proteolytic elimination of uncomplexed forms of SecY, an essential protein translocase subunit. Proc Natl Acad Sci U S A, 92, 4532-4536. Kim, H., Melen, K. and von Heijne, G. (2003) Topology models for 37 Saccharomyces cerevisiae membrane proteins based on C-terminal reporter fusions and predictions. J Biol Chem, 278, 10208-10213. Kim, H., Unby, M, Melén, K, Warringer, J, Blomberg, A, von Heijne, G. (In preparation) A global topology map of the Saccharomyces cerevisiae membrane proteome. Korepanova, A., Gao, F.P., Hua, Y., Qin, H., Nakamoto, R.K. and Cross, T.A. (2005) Cloning and expression of multiple integral membrane proteins from Mycobacterium tuberculosis in Escherichia coli. Protein Sci, 14, 148-158. Krebber, C., Spada, S., Desplancq, D., Krebber, A., Ge, L. and Pluckthun, A. (1997) Selectively-infective phage (SIP): a mechanistic dissection of a novel in vivo selection for protein-ligand interactions. J Mol Biol, 268, 607-618.

60 Krishnan, M.N., Bingham, J.P., Lee, S.H., Trombley, P. and Moczydlowski, E. (2005) Functional Role and Affinity of Inorganic Cations in Stabilizing the Tetrameric Structure of the KcsA K+ Channel. J Gen Physiol, 126, 271-283. Krogh, A., Larsson, B., von Heijne, G. and Sonnhammer, E.L. (2001) Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Mol Biol, 305, 567-580. Kung, C. (2005) A possible unifying principle for mechanosensation. Nature, 436, 647- 654. Kunji, E.R., Chan, K.W., Slotboom, D.J., Floyd, S., O'Connor, R. and Monne, M. (2005) Eukaryotic membrane protein overproduction in Lactococcus lactis. Curr Opin Biotechnol, 16, 546-51. Kunji, E.R., Slotboom, D.J. and Poolman, B. (2003) Lactococcus lactis as host for overproduction of functional membrane proteins. Biochim Biophys Acta, 1610, 97-108. Locher, K.P., Lee, A.T. and Rees, D.C. (2002) The E. coli BtuCD structure: a framework for ABC transporter architecture and mechanism. Science, 296, 1091-1098. Long, S.B., Campbell, E.B. and Mackinnon, R. (2005a) Crystal structure of a mammalian voltage-dependent Shaker family K+ channel. Science, 309, 897-903. Long, S.B., Campbell, E.B. and Mackinnon, R. (2005b) Voltage Sensor of Kv1.2: Structural Basis of Electromechanical Coupling. Science, 309, 903-908. Loschi, L., Brokx, S.J., Hills, T.L., Zhang, G., Bertero, M.G., Lovering, A.L., Weiner, J.H. and Strynadka, N.C. (2004) Structural and biochemical identification of a novel bacterial oxidoreductase. J Biol Chem. Lostao, A., Daoudi, F., Irun, M.P., Ramon, A., Fernandez-Cabrera, C., Romero, A. and Sancho, J. (2003) How FMN binds to anabaena apoflavodoxin: a hydrophobic encounter at an open binding site. J Biol Chem, 278, 24053-24061. Luirink, J. and Sinning, I. (2004) SRP-mediated protein targeting: structure and function revisited. Biochim Biophys Acta, 1694, 17-35. Ma, C. and Chang, G. (2004) Structure of the multidrug resistance efflux transporter EmrE from Escherichia coli. Proc Natl Acad Sci U S A, 101, 2852-2857. Manoil, C. (1991) Analysis of membrane protein topology using alkaline phosphatase and beta-galactosidase gene fusions. Methods Cell Biol, 34, 61-75. McMurry, J.L., Van Arnam, J.S., Kihara, M. and Macnab, R.M. (2004) Analysis of the cytoplasmic domains of Salmonella FlhA and interactions with components of the flagellar export machinery. J Bacteriol, 186, 7586-7592. Melen, K., Krogh, A. and von Heijne, G. (2003) Reliability measures for membrane protein topology prediction algorithms. J Mol Biol, 327, 735-744. Miroux, B. and Walker, J.E. (1996) Over-production of proteins in Escherichia coli: mutant hosts that allow synthesis of some membrane proteins and globular proteins at high levels. J Mol Biol, 260, 289-298. Mohanty, A.K., Simmons, C.R. and Wiener, M.C. (2003) Inhibition of tobacco etch virus protease activity by detergents. Protein Expr Purif, 27, 109-114. Morgan-Kiss, R.M., Wadler, C. and Cronan, J.E., Jr. (2002) Long-term and homogeneous regulation of the Escherichia coli araBAD promoter by use of a lactose transporter of relaxed specificity. Proc Natl Acad Sci U S A, 99, 7373-7377. Muller, G. (2000) Towards 3D structures of G protein-coupled receptors: a multidisciplinary approach. Curr Med Chem, 7, 861-888. Munro, S. (1998) Localization of proteins to the Golgi apparatus. Trends Cell Biol, 8, 11- 15. Nilsson, I.M. and von Heijne, G. (1993) Determination of the distance between the oligosaccharyltransferase active site and the endoplasmic reticulum membrane. J Biol Chem, 268, 5798-5801. Nilsson, J., Persson, B. and von Heijne, G. (2000) Consensus predictions of membrane protein topology. FEBS Lett, 486, 267-269.

61 Nilsson, J., Persson, B. and von Heijne, G. (2005) Comparative analysis of amino acid distributions in integral membrane proteins from 107 genomes. Proteins, 60, 606- 616. Ott, C.M. and Lingappa, V.R. (2002) Integral membrane protein biosynthesis: why topology is hard to predict. J Cell Sci, 115, 2003-2009. Palczewski, K., Kumasaka, T., Hori, T., Behnke, C.A., Motoshima, H., Fox, B.A., Le Trong, I., Teller, D.C., Okada, T., Stenkamp, R.E., Yamamoto, M. and Miyano, M. (2000) Crystal structure of rhodopsin: A G protein-coupled receptor. Science, 289, 739-745. Palmer, T. and Berks, B.C. (2003) Moving folded proteins across the bacterial cell membrane. Microbiology, 149, 547-556. Pan, S.H. and Malcolm, B.A. (2000) Reduced background expression and improved plasmid stability with pET vectors in BL21 (DE3). Biotechniques, 29, 1234-1238. Park, S.H. and Opella, S.J. (2005) Tilt angle of a trans-membrane helix is determined by hydrophobic mismatch. J Mol Biol, 350, 310-318. Pourcher, T., Bibi, E., Kaback, H.R. and Leblanc, G. (1996) Membrane topology of the melibiose permease of Escherichia coli studied by melB-phoA fusion analysis. Biochemistry, 35, 4161-4168. Quick, M. and Wright, E.M. (2002) Employing Escherichia coli to functionally express, purify, and characterize a human transporter. Proc Natl Acad Sci U S A, 99, 8597- 8601. Rapoport, T.A., Goder, V., Heinrich, S.U. and Matlack, K.E. (2004) Membrane-protein integration and the role of the translocation channel. Trends Cell Biol, 14, 568- 575. Rapp, M., Drew, D., Daley, D.O., Nilsson, J., Carvalho, T., Melen, K., De Gier, J.W. and Von Heijne, G. (2004) Experimentally based topology models for E. coli inner membrane proteins. Protein Sci, 13, 937-945. Raunser, S., Haase, W., Bostina, M., Parcej, D.N. and Kuhlbrandt, W. (2005) High-yield expression, reconstitution and structure of the recombinant, fully functional glutamate transporter GLT-1 from Rattus norvegicus. J Mol Biol, 351, 598-613. Reyes, C.L. and Chang, G. (2005) Structure of the ABC transporter MsbA in complex with ADP.vanadate and lipopolysaccharide. Science, 308, 1028-1031. Rost, B., Fariselli, P. and Casadio, R. (1996) Topology prediction for helical transmembrane proteins at 86% accuracy. Protein Sci, 5, 1704-1718. Rudner, D.Z., Fawcett, P. and Losick, R. (1999) A family of membrane-embedded metalloproteases involved in regulated proteolysis of membrane-associated transcription factors. Proc Natl Acad Sci U S A, 96, 14765-14770. Sarramegna, V., Talmont, F., Seree de Roch, M., Milon, A. and Demange, P. (2002) Green fluorescent protein as a reporter of human mu-opioid receptor overexpression and localization in the methylotrophic yeast Pichia pastoris. J Biotechnol, 99, 23-39. Savage, D.F., Egea, P.F., Robles-Colmenares, Y., O'Connell, J.D., 3rd and Stroud, R.M. (2003) Architecture and selectivity in aquaporins: 2.5 a X-ray structure of aquaporin Z. PLoS Biol, 1, E72. Schiller, H., Molsberger, E., Janssen, P., Michel, H. and Reilander, H. (2001) Solubilization and purification of the human ETB endothelin receptor produced by high-level fermentation in Pichia pastoris. Receptors Channels, 7, 453-469. Schneider, D. (2004) Rendezvous in a membrane: close packing, hydrogen bonding, and the formation of transmembrane helix oligomers. FEBS Lett, 577, 5-8. Schulz, G.E. (2003) Transmembrane beta-barrel proteins. Adv Protein Chem, 63, 47-70. Seddon, A.M., Curnow, P. and Booth, P.J. (2004) Membrane proteins, lipids and detergents: not just a soap opera. Biochim Biophys Acta, 1666, 105-117.

62 Senes, A., Engel, D.E. and DeGrado, W.F. (2004) Folding of helical membrane proteins: the role of polar, GxxxG-like and proline motifs. Curr Opin Struct Biol, 14, 465- 479. Senes, A., Gerstein, M. and Engelman, D.M. (2000) Statistical analysis of amino acid patterns in transmembrane helices: the GxxxG motif occurs frequently and in association with beta-branched residues at neighboring positions. J Mol Biol, 296, 921-936. Senes, A., Ubarretxena-Belandia, I. and Engelman, D.M. (2001) The Calpha ---H...O hydrogen bond: a determinant of stability and specificity in transmembrane helix interactions. Proc Natl Acad Sci U S A, 98, 9056-9061. Severance, S., Chakraborty, S. and Kosman, D.J. (2004) The Ftr1p iron permease in the yeast plasma membrane: orientation, topology and structure-function relationships. Biochem J, 380, 487-496. Sorensen, H.P. and Mortensen, K.K. (2005) Soluble expression of recombinant proteins in the cytoplasm of Escherichia coli. Microb Cell Fact, 4, 1-8. Strandberg, E. and Killian, J.A. (2003) Snorkeling of lysine side chains in transmembrane helices: how easy can it get? FEBS Lett, 544, 69-73. Studier, F.W. and Moffatt, B.A. (1986) Use of bacteriophage T7 RNA polymerase to direct selective high-level expression of cloned genes. J Mol Biol, 189, 113-130. Tate, C.G. (2001) Overexpression of mammalian integral membrane proteins for structural studies. FEBS Lett, 504, 94-98. Tate, C.G. and Blakely, R.D. (1994) The effect of N-linked glycosylation on activity of the Na(+)- and Cl(-)-dependent serotonin transporter expressed using recombinant baculovirus in insect cells. J Biol Chem, 269, 26303-26310. Tate, C.G., Whiteley, E. and Betenbaugh, M.J. (1999) Molecular chaperones stimulate the functional expression of the cocaine-sensitive serotonin transporter. J Biol Chem, 274, 17551-17558. Thomas, J.D., Daniel, R.A., Errington, J. and Robinson, C. (2001) Export of active green fluorescent protein to the periplasm by the twin-arginine translocase (Tat) pathway in Escherichia coli. Mol Microbiol, 39, 47-53. Tsien, R.Y. (1998) The green fluorescent protein. Annu Rev Biochem, 67, 509-544. Tusnady, G.E. and Simon, I. (1998) Principles governing amino acid composition of integral membrane proteins: application to topology prediction. J Mol Biol, 283, 489-506. Ulmschneider, M.B., Sansom, M.S. and Di Nola, A. (2005) Properties of integral membrane protein structures: derivation of an implicit membrane potential. Proteins, 59, 252-265. Urbanus, M.L., Froderberg, L., Drew, D., Bjork, P., de Gier, J.W., Brunner, J., Oudega, B. and Luirink, J. (2002) Targeting, insertion, and localization of Escherichia coli YidC. J Biol Chem, 277, 12718-12723. Valent, Q.A., de Gier, J.W., von Heijne, G., Kendall, D.A., ten Hagen-Jongman, C.M., Oudega, B. and Luirink, J. (1997) Nascent membrane and presecretory proteins synthesized in Escherichia coli associate with signal recognition particle and trigger factor. Mol Microbiol, 25, 53-64. Van den Berg, B., Clemons, W.M., Jr., Collinson, I., Modis, Y., Hartmann, E., Harrison, S.C. and Rapoport, T.A. (2004) X-ray structure of a protein-conducting channel. Nature, 427, 36-44. van Geest, M. and Lolkema, J.S. (2000) Membrane topology and insertion of membrane proteins: search for topogenic signals. Microbiol Mol Biol Rev, 64, 13-33. Viklund, H. and Elofsson, A. (2004) Best alpha-helical transmembrane protein topology predictions are achieved using hidden Markov models and evolutionary information. Protein Sci, 13, 1908-1917. von Heijne, G. (1989) Control of topology and mode of assembly of a polytopic membrane protein by positively charged residues. Nature, 341, 456-458.

63 von Heijne, G. (1992) Membrane protein structure prediction. Hydrophobicity analysis and the positive-inside rule. J Mol Biol, 225, 487-494. Waldo, G.S., Standish, B.M., Berendzen, J. and Terwilliger, T.C. (1999) Rapid protein- folding assay using green fluorescent protein. Nat Biotechnol, 17, 691-695. Wallace, B., Yang, Y.J., Hong, J.S. and Lum, D. (1990) Cloning and sequencing of a gene encoding a glutamate and aspartate carrier of Escherichia coli K-12. J Bacteriol, 172, 3214-3220. Walian, P., Cross, T. and Jap, B.K. (2004) Structural genomics of membrane proteins. Genome Biology, 5, 215.1-8. Wallin, E. and von Heijne, G. (1995) Properties of N-terminal tails in G-protein coupled receptors: a statistical study. Protein Eng, 8, 693-698. Wallin, E. and von Heijne, G. (1998) Genome-wide analysis of integral membrane proteins from eubacterial, archaean, and eukaryotic organisms. Protein Sci, 7, 1029-1038. Wang, D.N., Safferling, M., Lemieux, M.J., Griffith, H., Chen, Y. and Li, X.D. (2003) Practical aspects of overexpressing bacterial secondary membrane transporters for structural studies. Biochim Biophys Acta, 1610, 23-36. Weiss, H.M. and Grisshammer, R. (2002) Purification and characterization of the human adenosine A(2a) receptor functionally expressed in Escherichia coli. Eur J Biochem, 269, 82-92. White, S.H. (2004) The progress of membrane protein structure determination. Protein Sci, 13, 1948-1949. White, S.H. and von Heijne, G. (2005) Transmembrane helices before, during, and after insertion. Curr Opin Struct Biol, 15, 378-386. Wimley, W.C. and White, S.H. (1996) Experimentally determined hydrophobicity scale for proteins at membrane interfaces. Nat Struct Biol, 3, 842-848. Woolhead, C.A., McCormick, P.J. and Johnson, A.E. (2004) Nascent membrane and secretory proteins differ in FRET-detected folding far inside the ribosome and in their exposure to ribosomal proteins. Cell, 116, 725-736. Yamashita, A., Singh, S.K., Kawate, T., Jin, Y. and Gouaux, E. (2005) Crystal structure of a bacterial homologue of Na(+)/Cl(-)-dependent neurotransmitter transporters. Nature, 437, 215-223. Yeliseev, A.A., Wong, K.K., Soubias, O. and Gawrisch, K. (2005) Expression of human peripheral cannabinoid receptor for structural studies. Protein Sci, 14, 2638-2653. Yohannan, S., Yang, D., Faham, S., Boulting, G., Whitelegge, J. and Bowie, J.U. (2004) Proline substitutions are not easily accommodated in a membrane protein. J Mol Biol, 341, 1-6.

64 Acknowledgements

This will undoubtedly be the most read portion of the thesis. I hope I have managed to convey my heart-felt thanks in a way that is genuine. Enjoyable work has been the fruit of working with good people.

Jan-Willem de Gier: For giving me freedom to pursue my research, for always supporting me, and for probably being the most loyal supervisor you will find anywhere! Gunnar von Heijne: For sharing your fountain of wisdom, for your patience, and for being a testimony to the fact that not all nice guys finish last! Edmund Kunji: For welcoming me warmly into your laboratory and for you infectious excitement of membrane protein research. Pär Nordlund: For your great hospitality, and for introducing me to some of those Swedish traditions. Dan Daley: For all those great training runs, for all those wonderful cooked meals of Geneth’s and for your enormous encouragement, help and time … thanks mate! Magnus Monne: For probably being one of the most generous Swedes anywhere, so much so, that you never complained about the random location of my socks, and you allowed me to share your king-size bed with you for 3 months!! Louise Baars: For your kindness, integrity, and intellect. Joy Kim: For being so joyful, for our fruitful scientific discussions, and for all your kindness and consideration. Wow. Mikaela Rapp: For being so much fun to work with - in the lab and around the coffee table - and for making me feel so welcome when I first arrived from NZ. Mirjam Lerch: For many interesting scientific discussions, for buying me English tea, and for your passion to just about anything. Tara Hessa: For your encouragement and for your ability to laugh when nothing is working. Linda Fröderberg: For calling a spade a spade, for all that endless administration, and for breaking-in Jan-Willem for us! Dirk Slotboom: For pushing me to use my brain and for your generosity. David Wickström: For your fantastic working attitude, for your kindness, and for passing the ball to me playing football … even if I really suck. Samuel Wagner: For all that chocolate you let me eat, and for the gift of earplugs. Marie Unby: For your cheery demeanor which makes working here so much fun. IngMarie Nilsson: For your kindness, generosity, and constant hard work. Marika Cassel: For your kindness, and patience when I am noisy in the office. Filippa Stenberg / Carolina Lundin: The new girls on the block … who make sure that no one will forget to buy the cake and/or champagne. Others (past and present) in DBB which makes this a great environment for working, special thanks to: Karl-Magnus, Gisela, Pavel, Shashi, Lotta, Pelle, Nadja, Bogos, Stefan, Inger, Kicki, and Anki. The rest of the group in Cambridge and in New Zealand: Peter M., Ted, Heather, Judy, Torsten, Ka Wai, Lisa, Marilyn and Peter H. To my good mates in Sweden: Anders, Lisa, Stig, Maria, Joel, Brenda, Bas, Sebastian, Geneth, and the wonderful Uhrnell family. To my good mates in New Zealand (my extended whanau): Daniel, Jamie, Esther, Nathan, Tammy, Steve, Katie, Strahan, Rachel, Jeremy, and Peter Haebel (honorary kiwi) … save a few waves for me boys!! To my family with whom life is very rich indeed: Mum, John, Dad, Jill, Marge, Geoffrey, Aaron, Carolina, Sofia, Paul, Jay, Jonathon, Lorraine, Johan, and my dearest twin, Natasha. To my wife Anna Maria: ….well, words are just not enough to describe how happy you make me! To my Lord: for his agape love and faithfulness.

65