From Biogenesis to Over- expression of Membrane Proteins in Escherichia coli

Samuel Wagner

Stockholm University © Samuel Wagner, Stockholm 2008

ISBN 978-91-7155-594-6, Pages i-81

Printed in Sweden by Universitetsservice AB, Stockholm 2008 Distributor: Department of Biochemistry and Biophysics, Stockholm University

To Claudia.

Abstract

In both pro- and 20-30% of all genes encode α-helical transmem- brane domain proteins, which act in various and often essential capacities. Notably, membrane proteins play key roles in disease and they constitute more than half of all known drug targets. The natural abundance of membrane proteins is in general too low to con- veniently isolate sufficient material for functional and structural studies. Therefore, most membrane proteins have to be obtained through overexpres- sion. Escherichia coli is one of the most successful hosts for overexpression of recombinant proteins, and T7 RNA polymerase-based expression is the major approach to produce recombinant proteins in E. coli. While the pro- duction of soluble proteins is comparably straightforward, overexpression of membrane proteins remains a challenging task. The yield of membrane lo- calized recombinant membrane protein is usually low and inclusion body formation is a serious problem. Furthermore, membrane protein overexpres- sion is often toxic to the host cell. Although several reasons can be postu- lated, the basis of these difficulties is not completely understood. It is gener- ally believed, that the complex requirements of membrane protein biogenesis significantly contribute to the difficulty of membrane protein overexpres- sion. Therefore, an understanding of membrane protein biogenesis is a pre- requisite for understanding membrane protein overexpression and for de- signing rational strategies to improve overexpression yields. The objective of my Ph.D. studies has been to improve membrane protein overexpression in E. coli by a) understanding membrane protein overexpres- sion from the perspective of membrane protein biogenesis, b) systematically investigating the physiological response to overexpression of membrane proteins and c) engineering strains that are optimized for membrane protein overexpression based on insights resulting from these studies. By working toward these objectives, I was able to identify and alleviate one of the major bottlenecks of membrane protein overexpression in E. coli: saturation of the Sec-translocon could be overcome by harmonizing transla- tion and membrane insertion of the recombinant membrane protein. This minimized the toxic effects of overexpression and thus resulted in increased membrane protein-producing biomass.

List of Publications

Primary publications This thesis is based on the following publications, which will be referred to by their roman numerals.

I. Baars L., Wagner S., Wickström D., Klepsch M., Ytterberg A.J., van Wijk K.J., de Gier J.W. (2008). Effects of SecE depletion on the inner and outer membrane proteomes of Escherichia coli. J. Bacteriol. in press.

II. Wagner S.*, Pop O.*, Haan G.J.*, Baars L., Klepsch M.M., Genevaux P., Luirink J., de Gier J.W. Biogenesis of MalF and the MalFGK2 maltose transport complex in Escherichia coli requires YidC. J. Biol. Chem. in revi- sion.

III. Wagner S., Baars L., Ytterberg A.J., Klussmeier, A., Wagner C.S., Nord O., Nygren P.Å., van Wijk K.J., de Gier J.W. (2007). Consequences of membrane protein overexpression in Escherichia coli. Mol. Cell. Proteomics 6:1527-50.

IV. Wagner S., Klepsch M.M., Schlegel S., Appel A., Draheim R., Tarry M., Högbom M., van Wijk K.J., Slotboom D.J., Persson J.O., de Gier J.W. Tuning Escherichia coli for membrane protein overexpression. Submitted manuscript.

Additional publications Klepsch M.M., Schlegel S., Wickström D., Friso G., van Wijk K.J., Persson J.O., de Gier J.W., Wagner S. (2008). Immobilization of the first dimension in 2D blue native/SDS-PAGE allows the relative quantification of membrane proteomes. Methods 46 commissioned manuscript.

Wagner S., de Gier J.W. Expression system for proteins. Patent application filed on 02/29/2008, Reference number: #P08207SE00.

Wagner S.*, Lerch-Bader M.*, Drew D.*, de Gier J.W. (2006). Rationaliz- ing membrane protein overexpression. Trends Biotechnol. 24:364-71.

Baars L., Ytterberg A.J., Drew D., Wagner S., Thilo C., van Wijk K.J., de Gier J.W. (2006). Defining the role of the Escherichia coli chaperone SecB using comparative proteomics. J. Biol. Chem. 281:10024-34.

Marani P.*, Wagner S.*, Baars L., Genevaux P., de Gier J.W., Nilsson I., Casadio R., von Heijne G. (2006). New Escherichia coli outer membrane proteins identified through prediction and experimental verification. Protein Sci. 15:884-9.

* authors contributed equally

Contents

Abstract ...... v

List of Publications ...... vi Primary publications ...... vi Additional publications...... vi

Abbreviations ...... x

1 Introduction ...... 1 Biological membranes ...... 2 Definition and classification of membrane proteins...... 3 Biogenesis of membrane proteins...... 3 Targeting to the cytoplasmic membrane...... 4 Integration into the cytoplasmic membrane...... 4 Assembly and folding in the membrane...... 6 How membranes shape protein structure...... 7 Quality control of membrane proteins ...... 9 Biogenesis of secretory proteins ...... 11 Folding of cytoplasmic proteins & heat shock response...... 11 Folding of cytoplasmic proteins ...... 12 Heat shock response ...... 13 Recovery and degradation of aggregated proteins ...... 15 Host systems for the overexpression of proteins ...... 17 Protein overexpression in E. coli...... 17 Promoter systems ...... 19 T7 RNA polymerase based expression ...... 21 Metabolic burden of protein overexpression...... 21 Inclusion body formation and how to prevent it ...... 23 Targeted/localized overexpression ...... 25 Membrane protein overexpression in E. coli...... 26 Which properties make a well expressing membrane protein?...... 27 Monitoring membrane protein overexpression ...... 27 Bottlenecks of membrane protein overexpression ...... 28 Heterologous expression ...... 30 Target protein optimization ...... 32 Host strain optimization...... 34

Optimization of expression conditions ...... 35

2 Methodology...... 37 Flow cytometry of bacterial cells ...... 37 Aggregate purification ...... 39 Preparation of inner and outer membranes ...... 39 2-dimensional polyacrylamide gel electrophoresis ...... 40 2-dimensional IEF/SDS polyacrylamide gel electrophoresis...... 41 2-dimensional blue native/SDS polyacrylamide gel electrophoresis...... 42 Image analysis of 2-dimensional polyacrylamide gels...... 43 Statistical analysis of 2-dimensional polyacrylamide gels ...... 45 Identification of proteins by MALDI-TOF and peptide mass fingerprinting...... 46 Red swap...... 48 Gene deletion...... 49 Promoter swap...... 49

3 Results & Discussion ...... 51 Paper I: Effects of SecE depletion on the inner and outer membrane proteomes of Escherichia coli...... 51

Paper II: Biogenesis of MalF and the MalFGK2 maltose transport complex in Escherichia coli requires YidC...... 54 Paper III: Consequences of membrane protein overexpression in Escherichia coli.....56 Paper IV: Tuning Escherichia coli for membrane protein overexpression...... 60 Future perspectives...... 63

Sammanfattning på Svenska ...... 64

Deutsche Zusammenfassung ...... 65

Acknowledgements ...... 66

References...... 68

Abbreviations

aa amino acids AAA+ ATPases associated with a variety of cellular activities ACA ε-amino-n-caproic acid ANOVA analysis of variance approx. approximately BN blue native bp base pairs CRP catabolite repressor protein DDM n-dodecyl-β-D-maltopyranoside ER endoplasmic reticulum FRT FLP recognition target GFP green fluorescent protein HSP heat shock protein IEF isoelectric focusing IM/CM inner membrane/cytoplasmic membrane I-CLiP intramembrane cleaving IPG immobilized pH gradient IPTG isopropyl-β-D-thiogalactoside m/z mass/charge ratio MALDI matrix assisted laser desorption/ionization MBP maltose binding protein MCS multiple cloning site MS mass spectrometry OM outer membrane PAGE polyacrylamide gel electrophoresis PMF peptide mass fingerprinting QC quality control RNC ribosome nascent chain complex SD Shine-Dalgarno site SDS sodium dodecyl sulfate SRP signal recognition particle Tat twin-arginine translocation TEV tobacco etch virus TMD transmembrane domain TOF time of flight

1 Introduction

If a living cell were an ancient city, the city wall would correspond to the greasy confining the cell. The lipids of the membrane represented the bricks that build up the wall. As a city wall is not only an enclosure that separates the inside from the outside, but contains gates al- lowing entry and trade, membranes have numerous installations dedicated to interaction with its environment. Those installations are usually membrane proteins in all its facets mediating e.g., uptake of nutrients, exchange of ions, transmitting signals and sensing stimuli. However, membrane protein func- tions are not limited to communication purposes; membranes also harbor devices for energy conversion, construction and maintenance1. It becomes obvious that biological membranes are much more complex than city walls: in bacteria, membrane proteins functioning in various capacities account for 75% of the membrane’s weight2. Due to the exposed location at the border of cells and a multitude of often essential functions, membrane proteins play key roles in disease and they constitute more than half of all known drug targets3. Thus, they are a very interesting and relevant object to study. The natural abundance of membrane proteins is in general too low to conveniently isolate sufficient material for functional and structural studies. Therefore, most membrane proteins have to be obtained through overexpres- sion. Escherichia coli is one of the most successful hosts for overexpression of recombinant proteins, and T7 RNA polymerase-based expression is the major approach to produce recombinant proteins in E. coli. While the pro- duction of soluble proteins is comparably straightforward, overexpression of membrane proteins remains a challenging task. The yield of membrane lo- calized recombinant membrane protein is usually low and inclusion body formation is a serious problem. Furthermore, membrane protein overexpres- sion is often toxic to the host cell. Although several reasons can be postu- lated, the basis of these difficulties is not completely understood. It is gener- ally believed, that the complex requirements of membrane protein biogene- sis significantly contribute to the difficulty of membrane protein overexpres- sion. Therefore, an understanding of membrane protein biogenesis is a prerequisite for understanding membrane protein overexpression and for designing rational strategies to improve overexpression yields. The objective of my Ph.D. studies has been to improve membrane protein overexpression in E. coli by a) understanding membrane protein overexpres- sion from the perspective of membrane protein biogenesis, b) systematically

1 investigating the physiological response to overexpression of membrane proteins and c) engineering strains that are optimized for membrane protein overexpression based on insights resulting from these studies.

In the first section of the introduction I will give an overview of biomem- branes, membrane proteins and protein folding. In the second section I will focus on protein overexpression in general and membrane protein overex- pression in E. coli in particular.

Biological membranes As the term ‘membrane proteins’ suggests, membrane proteins are associ- ated with biological membranes. A biological membrane is a ~50 Å thick bilayer consisting of a mosaic of lipids and proteins4. The idea of a “bimol- ecular layer of lipoids” was already born more than 80 years ago5, but it took several decades until it was generally accepted6. Aliphatic chains of lipid molecules of both leaflets of the bilayer face towards the core of the mem- brane while the polar headgroups are directed towards the aqueous phase surrounding the membrane7. In this way, the nonpolar fatty acid regions are sequestered away from contact with water, thereby maximizing hydrophobic interactions. Phospholipids are the major lipid component of biomembranes, complemented by sphingolipids, cholesterol and cardiolipin, depending on the type of membrane and function of the cell8. Minor lipids like phosphati- dylinositol often have critical cellular functions, for instance in signal trans- duction. Lipid bilayers are impermeable to most polar or charged solutes and permeable to nonpolar compounds, but proteinaceous transporters and chan- nels make membranes selectively permeable enabling the cell to regulate crossings of molecules through the membrane. Movement of membrane proteins and lipids from either one side of the bilayer to the other is re- stricted, which maintains a sidedness of the membrane2. For a long time it was believed that membrane proteins are dispersed and float around freely in the two-dimensional plane of a uniform membrane4. However, it became clear that the lateral movement of both lipids and pro- teins can be restricted and that regions of biased lipid and protein composi- tion are functionally relevant. Single-particle tracking revealed that crowd- ing, ectodomain collisions, transbilayer interactions, adhesion sites and cy- toskeletal structure limit the motion of proteins and lipids9. Furthermore, membrane proteins are often organized into functional complexes and super- complexes10, and lipid composition is not uniform throughout the mem- brane, in which rafts of lipids form areas of functional specialization11. These findings made a re-evaluation of the fluid mosaic model necessary, which now illustrates “that membranes are patchy, with segregated regions of structure and function, that lipid regions vary in thickness and composi-

2 tion, and that crowding and ectodomains limit exposure of lipids to the adja- cent aqueous regions” (from Ref. 12).

Definition and classification of membrane proteins Integral membrane proteins are empirically defined as proteins that cannot be extracted from the with alkaline buffer or high concentra- tions of urea or salt. They include monotopic proteins, which bind strongly to the membrane but do not span it, bitopic proteins, which span the mem- brane once and polytopic proteins, which span the membrane several times13 (Fig. 1). Further, monomeric and homo- and heterooligomeric membrane proteins and membrane protein complexes can be distinguished. There are two kinds of integral membrane proteins based on the membrane traversing secondary structure: α-helical transmembrane domain (TMD) proteins and β-barrel proteins. While α-helical membrane proteins populate all mem- branes, β-barrels are only located in bacterial outer membranes and the outer membranes of mitochondria and chloroplasts. In both pro- and eukaryotes 20-30% of all genes encode α-helical membrane proteins14. This thesis will mostly deal with α-helical membrane proteins and therefore, they will sub- sequently be referred to as ‘membrane proteins’. 15

Fig. 1. Classes of membrane pro- teins. Monotopic proteins bind strongly but do not span the mem- brane. Bitopic proteins span the membrane once. Polytopic proteins span the membrane several times. Adapted from Ref. 15.

Biogenesis of membrane proteins Biogenesis of membrane proteins is based on very conserved mechanisms with homologues of key components present in all kingdoms of life16. Here, I will only deal with the biogenesis of membrane proteins in E. coli as this organism has been the focus of my studies. Differences in membrane protein biogenesis between E. coli and eukaryotes and their implications for the heterologous overexpression of membrane proteins will be discussed in a dedicated chapter.

3 Targeting to the cytoplasmic membrane Ribosome/membrane protein nascent chain complexes (RNCs) are targeted in general in a co-translational fashion to the cytoplasmic or inner membrane (CM, IM) of E. coli via the Signal Recognition Particle (SRP)-pathway17 (Fig. 2). In E. coli, the SRP consists of a 4.5S RNA scaffold and the protein Ffh (fifty-four homologue), which is homologues to the 54 kDa protein of the eukaryotic SRP18. Upon emergence of a nascent membrane protein from the ribosome, one part of the SRP contacts the ribosome close to the poly- peptide exit tunnel while another binds the emerging transmembrane seg- ment19. Subsequently, the SRP-RNC associates with the SRP-receptor FtsY and the entire complex binds to the cytoplasmic membrane20. An intricate interplay of interactions of the SRP-RNC complex with the translocon and lipids and the concerted hydrolysis of two GTPs, one by the SRP and one by FtsY, triggers the transfer of the RNC to the translocon and the subsequent dissociation of the SRP and FtsY21. 22

Fig. 2. Membrane protein biogene- sis and degradation in E. coli. CM, cytoplasmic membrane; OM, outer membrane. Adapted from Ref. 22.

Integration into the cytoplasmic membrane The Sec-translocon is a protein conducting channel that facilitates both the translocation of hydrophilic polypeptide chains across the membrane and the insertion of transmembrane segments into the lipid bilayer23. In E. coli, the core of the translocon is composed of the subunits SecY, SecE and the non- essential component SecG17. The structure of an archean homologue of the Sec-translocon revealed that SecY resembles a clam-shell hinged by SecE24 (Fig. 3). The actual translocation channel lies at the center of SecY25, and has an hourglass shape with a ring of six isoleucines constituting a sealing constriction24. Under non-translocating conditions, the pore of the channel is

4 closed by a plug at the periplasmic side of the translocon26. During initiation of translocation, a signal sequence or transmembrane segment intercalates into the lateral gate at the mouth of the clam-shell and causes the plug to move toward the back of SecY27. It has been proposed that ‘molecular breathing’ of the translocon allows the sampling of transiting polypeptides by surrounding membrane lipids28. Sufficiently hydrophobic transmembrane segments or signal sequences of nascent polypeptides equilibrate with lipids and partition into the lipid bilayer through the lateral gate29, 30.

Fig. 3. Structure of the Sec-translocon (PDB 1RHZ)24. Left, side view. Indicated is the hydrophobic constriction sealing the channel during translocation and residues presumably involved in topology determination of TMDs. Residue numbers corre- spond to Sec61p in Saccharomyces cerevisiae31. Right, top view. Indicated is the ‘plug’ sealing the closed translocon and the clam-shell like structure including the putative exit for TMDs and signal sequences. The subunits of the translocon are shown in green & orange, the two halves of the α-subunit (SecY), blue, the γ- subunit (SecE) and yellow, the β-subunit (no homologue in E. coli). Adapted from Ref. 32, © Elsevier B.V.

The stoichiometry of the translocon is still a matter of debate. Dimers and even tetramers of the SecYEG complex have been observed by blue native polyacrylamide gel electrophoresis (PAGE) as well as cryo-electron micros- copy33-35. However, the active translocation pore is presumably a single Se- cYEG trimer, possibly utilizing another copy of the complex to increase the number of interaction sites with channel partners36, 37. The topology of membrane proteins, i.e., the orientation of transmem- brane segments across the membrane, follows the positive-inside rule, which describes that positive charges in ectodomains of membrane proteins are asymmetrically distributed, favoring the cytoplasmic side of the mem- brane38. It is not completely clear, how the positive inside rule is imple- mented. There are indications that interactions between charged residues in membrane proteins and the charge of lipid head groups play a role in topol- ogy determination39. The membrane potential in bacteria40 as well as specific residues in the Sec translocon of yeast may also be instrumental31 (Fig. 3).

5 The integration of polytopic membrane proteins proceeds in an orches- trated alternation of integration of transmembrane segments, translocation of periplasmic loops and elongation of cytoplasmic loops. The translocation of sizeable periplasmic loops requires the ATPase SecA, which is a peripheral subunit of the translocon41 (Fig. 2). SecA is the only component of the Sec targeting pathway of which the regulation of expression is known in detail. SecA translation is enhanced when the Sec-translocon dependent transloca- tion of the secretion monitor SecM is hampered42, 43. In addition to the common SRP/Sec targeting and translocation pathway, there are a number or proteins that utilize the essential inner membrane pro- tein YidC for their insertion into the membrane44, 45 (Fig. 2). YidC is a mem- ber of the conserved Oxa1/Alb3/YidC family, which comprises proteins involved in the biogenesis of membrane proteins in the cytoplasmic mem- brane of bacteria, the mitochondrial inner membrane and the thylakoid membrane in chloroplasts46, 47. Oxa1, Oxa2 and Alb3 can substitute for YidC in E. coli, and vice versa, YidC can substitute for Oxa1 and Oxa2 in S. cere- visiae, which indicates that Oxa1, Oxa2, Alb3 and YidC share a similar function48-50. It is still a matter of debate whether the co-translational target- ing of membrane proteins to the YidC insertase involves SRP or not51 (Fig. 2). So far, only a handful of inner membrane proteins that insert via the YidC-only pathway have been identified. All these proteins are small and do not contain more than two TMDs and sizable periplasmic domains. How- ever, it is not clear how YidC assists the integration of these proteins.

Assembly and folding in the membrane Partitioning of TMDs into the lipid bilayer is followed by folding of the membrane protein into its native conformation. TMDs may already associate during membrane integration within or in close proximity to the translo- con52. Sequential triage of TMDs from their initial entry site into multiple secondary sites within the translocon was proposed to provide a means to facilitate early folding events before release into the lipid bilayer53. The prevalence of helical hairpins, i.e., two closely spaced hydrophobic trans- membrane helices separated by a short turn54, as a structural element sug- gests that two adjacent TMDs may partition pair wise into the lipid bilayer or at least influence each other during integration55, 56. As the solved struc- ture of the translocon probably represents its closed state, it is not exactly clear yet, how many TMDs the active, ribosome-bound translocon can ac- commodate. It is conceivable that two copies of SecYEG associate front-to- front thereby creating a larger pore that can accommodate several helices at a time35. In E. coli, YidC has been identified as a factor that assists in the integra- tion, folding, and assembly of inner membrane proteins in concert with the Sec-translocon44, 45 (Fig. 2). During the biogenesis of SRP/Sec-translocon

6 dependent inner membrane proteins, YidC specifically interacts with the TMDs of these proteins (e.g.,52, 57-59). It has been suggested that YidC medi- ates the transfer of TMDs from the Sec-translocon into the lipid bilayer51. YidC could be co-purified with the Sec-translocon, which suggests a physi- cal connection59. It has been shown that the first TMDs of the nascent poly- topic inner membrane protein MtlA assemble simultaneously at the Sec/YidC insertion site52. Recently, docking of the projection maps of YidC and the SecYEG-translocon indicated that there is a potential contiguous space for integrating TMDs60, which could provide an environment where folding and assembly of TMDs can occur prior to their release into the lipid bilayer. It has been speculated that the ER localized TRAM protein has a similar function as YidC61. A recent in vitro study using lactose permease (LacY) reveals a novel function for YidC in the co-translational folding of this inner membrane protein rather than its insertion into the membrane62. The observation that YidC depletion leads to the induction of the Cpx and σE envelope stress responses, which both sense protein misfolding in the cell envelope63, also points to a role of YidC in the folding of inner membrane proteins64. It was further suggested that SecY is also involved in membrane protein folding. In certain secY mutants, LacY was found to be inserted into the membrane, but was misfolded and susceptible to proteolysis. These secY mutants exibited elevated membrane stress through the σE stress-response pathway as well. While YidC and SecY may facilitate the folding of TMDs, folding of soluble cytoplasmic domains may be supported by cytoplasmic chaperones like DnaK and folding of periplasmic domains by periplasmic chaperones, such as DegP (Fig. 2). Furthermore, the lipid composition of the membrane may also influence membrane protein folding65. After folding of individual subunits, complex formation of oligomeric membrane proteins follows. While there have been some attempts to study the sequence of assembly steps of selected complexes66, not much is known about factors involved in their formation. However, recently it could be shown that the YidC homologue Alb3 facilitates the insertion of the D1 pro- tein into the photosystem II complex67.

How membranes shape protein structure At this point, it would be appropriate to undertake a discussion of the envi- ronment within the lipid bilayer and how this influences the folding of mem- brane proteins. What does it mean to be a membrane protein in a lipid environment? Firstly, it means the absence of the hydrophobic effect, which drives the folding of soluble proteins68, and a particular importance of in- trachain hydrogen bonds as surrounding lipids cannot contribute to them69. These restraints early led to the conclusion that TMDs consist of hydropho- bic α-helices that maximize intrahelical hydrogen bonding and minimize

7 hydrophobic mismatches with its surrounding70. Conversely, unfolded poly- peptides are highly unfavorable within the membrane and it takes 80 kcal mol-1 to unfold an α-helix therein7. Based on the biochemical and structural knowledge of their time, Popot and Engelman proposed the ‘two-stage model’ for membrane protein folding in 199069. In this model, TMDs are seen as autonomous folding domains that accrete into a final structure after each of them “has reached thermodynamic equilibrium with the lipid and aqueous phases”. Notably, the model was based on the observation that TMDs were long hydrophobic α-helices that were oriented more or less per- pendicular to the membrane plane (as seen in bacterial photosynthetic reac- tion centers and bacteriorhodopsin71, 72). As von Heijne recently reviewed, 18 years and many membrane protein structures later “the picture is not so simple anymore. Membrane-embedded helices can be short, long, kinked or interrupted in the middle of the membrane, they can cross the membrane at oblique angles, lie flat on the surface of the membrane, or even span only a part of it and then turn back” 73. Consequently, the two-stage model does not hold completely true anymore. As discussed above, TMDs may not always partition individually into the lipid bilayer but instead may gather at the translocon or YidC before release. Furthermore, TMDs are not necessarily autonomous folding domains as adjacent helices can influence each other’s topology and membrane integration propensity74. Nonetheless, the basic questions addressed in this model remain of utmost importance: How are folded structures of membrane proteins maintained if general assumptions for soluble proteins do not apply? How is the entropy overcome that favors helix separation? Ectodomains of the membrane protein may stabilize the transmembrane helix bundle and links between helices will limit their possi- ble diffusion in the plane of the membrane. Hydrogen bonds and ion pairs can be used to drive the association of local regions of a pair of helices, in- teractions with prosthetic groups provide packing constraints and antiparallel association between adjacent helices is promoted by the helix dipoles69, 75. Furthermore, tight ‘knobs-into-holes’ packing has been found to be a com- mon characteristic of helix-helix interactions, allowing a more efficient packing between helices than between helices and lipids76. The GXXXG motif found in glycophorin A is the most prominent example of this cate- gory77. Just recently a grouping of all helix-helix interactions observed in existing membrane protein structures was undertaken78. This study promoted the development of computed α-helical peptides that can target specific TMDs79. In summary, the above mentioned interactions are so efficient that membrane proteins fragmented into smaller pieces assembly autonomously into functional proteins if co-expressed or combined after separate refold- ing80, 81.

8 In addition to being a two-dimensional solvent for membrane proteins comprising charged surfaces and a hydrophobic core, inherent bilayer prop- erties can play important roles for membrane protein folding, stability and structure. The shape mismatch of non-bilayer lipids, i.e., lipids with small head groups and a large radius of the hydrocarbon tails, creates a driving force for the bilayer leaflets to curve away from one another82. Forcing them into a bilayer creates a curvature frustration, which can be simplistically described as an overpacking in the hydrocarbon region and an underpacking in the headgroup region. The drive to relieve curvature elastic energy can shape membrane protein structure as shown for the β-barrel protein OmpA and α-helical bacteriorhodopsin83, 84. Different lipid compositions can alter the natural bilayer thickness. If the length of a hydrophobic TMD does not match the thickness of the hydrocar- bon core of the bilayer, a tension is created because of the drive to shield hydrophobic surfaces from water. This tension can be resolved by either tilting the TMD to submerge more completely in the membrane, or by re- cruitment of longer lipids to the mismatching membrane protein85. Recent experiments suggest that in native membranes, the thickness of the bilayer is dictated by membrane proteins rather than lipids86.

Quality control of membrane proteins Non-native membrane proteins, which arise as a result of mutations, errors during biosynthesis or cellular stress, can easily disrupt membranes and lead to cell death. As a protection, the cell possesses quality control (QC) mecha- nisms that detect and dispose of non-native membrane proteins. In the dena- tured state, membrane proteins retain considerable secondary structure, al- though tertiary and quaternary interactions might be disrupted87. Secondary structures are stabilized because intrachain hydrogen bonding is strongly favored in the hydrophobic core of the membrane (see above). The molecu- lar mechanism that determines the recognition of a non-native membrane protein is largely unknown15. In analogy to soluble proteins, misfolded membrane proteins might expose hydrophobic patches of their ectodomains to the aqueous environment, which subsequently could be recognized by chaperones. Within TMDs, recognition is not that obvious. Here, polar or charged residues exposed in the membrane core might be a signal for detec- tion. A different possibility is that QC mechanisms recognize non-native membrane proteins by their propensity to disrupt the membrane15. A mem- brane breach could be detected by e.g., a flow of ions or a redistribution of lipids. It is tempting to speculate that the phage shock response, which re- sponds to proton leakage and damaged membranes in E. coli88, is involved in sensing non-native membrane proteins.

9 In E. coli, mechanisms of QC have only recently been studied. The key factor appears to be the FtsH89 (Fig. 2). FtsH is attached to the bac- terial inner membrane by two N-terminally located TMDs. AAA+ ATPase and Zn2+-metalloprotease domains protrude into the cytoplasm. In vivo, the homo-hexameric FtsH acts in a holoenzyme complex together with its part- ners HflK, HflC and QmcA90-92. These accessory proteins of the holoenzyme complex modulate the proteolytic activity of FtsH. FtsH is conserved among bacteria as well as mitochondria and chloroplasts. Its homologues play e.g., critical roles in QC of mitochondrial inner membrane proteins93. Recently, cross-links between components of the FtsH holoenzyme and YidC were detected, which suggests a linked role for these proteins in quality control of newly inserted membrane proteins94. The processive dislocation and proteolysis reactions of FtsH, which are stimulated by the proton motive force, can be initiated from either end of a membrane protein if it protrudes sufficiently into the cytosol95, 96. The ATP- ase activity of FtsH is not required for proteolysis but for proper presentation of the substrate protein to the protease active site89. In case of membrane protein substrates, this may include dislocation of proteins from the mem- brane, which is energetically very costly and requires fairly strong forces97. HtpX, another membrane protease, is also involved in QC of membrane proteins in E. coli98, 99. Like FtsH, HtpX is a membrane-bound Zn2+- metalloprotease. However, HtpX does not have an ATPase activity and will only act against cytoplasmic regions of membrane protein substrates. This supports the notion that ATPase activity is essential for dislocation of TMDs. Nonetheless, FtsH and HtpX exhibit synthetic lethality, suggesting that both proteases either cooperate or have overlapping functions. htpX is part of the Cpx regulon98. The Cpx response is one of three systems in E. coli sensing stress in the cell envelope63. Notably, alkaline pH and misfolded periplasmic proteins and perturbations in the inner membrane are sensed by this two-component system. Induction of the Cpx response up-regulates genes whose products are involved in maintenance of the cell envelope such as periplasmic chaperones and proteases, including HtpX100. As mentioned above, the σE envelope stress response is also up-regulated upon membrane protein folding stress64. However, it is not yet known, how induction of this response contributes to the alleviation of perturbations of the cytoplasmic membrane. One of the bona fide substrates of FtsH and HtpX is the major translocon subunit SecY. SecY is rapidly degraded by these proteases in the absence of its complex partner SecE98, 101-103. FtsH also degrades the ATP synthase sub- 104 unit Foa when it fails to associate with its cognate complex . Notably, both substrates were identified upon their overexpression without co- overexpressing the respective complex partners. E. coli encodes a number of intramembrane cleaving proteases (I-CLiP) like the GlpG and the S2P metalloprotease RseP. These

10 proteases cleave within TMDs of specific target proteins involved in e.g., signal transduction. Even though GlpG and RseP exhibit a fairly wide sub- strate specificity, their role in quality control of membrane proteins has not been established yet105, 106.

Biogenesis of secretory proteins Secretory proteins are synthesized as pre-proteins containing a (cleavable) N-terminal signal sequence. These are 20-30 aa long extensions, comprising a positive charge at the N-terminus followed by a short hydrophobic stretch of 7-15 aa107. Unlike in eukaryotes, where secretory proteins are targeted co- translationally by the SRP to the Sec-translocon in the ER membrane108, bacteria engage mostly post-translational targeting, which is aided by the chaperone SecB and the ATPase SecA (Fig. 2). Binding of SecB to secretory proteins is the first step of post-translational targeting109. SecB is a tetrameric protein comprising an extended peptide binding groove110. It rec- ognizes stretches of ~9 residues favoring aromatic and basic amino acids111. SecB maintains secretory proteins in a translocation-competent state by keeping them in an unfolded conformation112. SecA interacts with SecB and accepts the chaperoned polypeptide, probably binding both the signal se- quence and the subsequent segment109. Thereafter, the SecA-pre-protein complex binds to the membrane and docks onto SecYEG. Membrane bind- ing of SecA relies on a low affinity to acidic phospholipids but a high affin- ity to the Sec-translocon113. The stoichiometry of active translocon-bound SecA is still controversial: some experiments indicate that it functions as a monomer during translocation, others suggest the importance of a dimeric form114. The delivery of secretory proteins to the translocon requires a full cycle of ATP hydrolysis by SecA37. Once inserted into the channel, the pre- protein is translocated by an ATP-fueled pushing mechanism115. Transloca- tion is supported by an electrochemical gradient across the membrane116. An accessory hetero-trimeric complex composed of the integral inner membrane proteins SecD, SecF and YajC presumably stimulates transloca- tion and supports subsequent release of translocated substrate proteins from the membrane117, 118. Once translocated, the signal peptide is usually cleaved off by leader peptidase yielding the mature protein119 (Fig. 2).

Folding of cytoplasmic proteins & heat shock response Membrane protein overexpression has consequences for the physiology of the entire organism, and in particular for the folding of cytoplasmic proteins.

11 Therefore, I will also discuss protein folding in the cytoplasm of E. coli and the cell’s reaction to folding stress: the so called heat shock response.

Folding of cytoplasmic proteins The native structure of a protein is encoded in its amino acid sequence. An- finsen’s proof that proteins can fold unassisted was a break-through in bio- chemistry120. However, within the crowded environment of the cell, folding of many proteins requires the assistance of so-called molecular chaperones. Chaperones are defined as “proteins that help the folding of other proteins, usually through cycles of binding and release, without forming part of their final native structure” (from Ref. 121). Two modes can be distinguished in the chaperone assisted folding path- way of cytoplasmic proteins in E. coli: co-translational folding, which in- volves mainly the ribosome bound chaperone trigger factor (TF), and post- translational folding mediated by the Hsp60 chaperonin GroEL and its co- chaperone GroES121 (Fig. 4). However, this division may not be particularly strict, as it has been shown that GroEL/S can accept substrates co- translationally for subsequent post-translational completion of folding122, 123. The Hsp70 chaperone system comprising DnaK, DnaJ and GrpE can engage in both modes of folding.

Fig. 4. Chaperone assisted protein folding in the E. coli cytosol. Co-translational fold- ing is supported by TF and DnaK/J. Post-translational folding is concertedly chaper- oned by DnaK/J and GroEL/S. Adapted from Ref. 121.

TF is the first chaperone nascent polypeptides meet when they emerge from the ribosome124. It sits at the polypeptide exit tunnel of the ribosome and forms a ‘molecular cradle’ that shields hydrophobic stretches of nascent polypeptides from proteases and aggregation125, 126.

12 The ATP-dependent chaperone DnaK promotes folding by transiently binding to short hydrophobic segments of newly synthesized proteins and thereby protecting them from aggregation and folding into non-native con- formations. DnaK does so by switching between two states: it has a low affinity and fast substrate exchange rates in its ATP bound state and high affinity but slow substrate exchange rates if ADP is bound127. ATP hydroly- sis of DnaK is stimulated by binding of its co-chaperone DnaJ whereas GrpE promotes nucleotide exchange128. Viability of TF/dnaK double mutants is severely compromised, which can be partially rescued by overexpression of GroEL or SecB129-131. Interestingly, the functional cooperation of TF and DnaK in co-translational folding improves folding yield but markedly delays folding relative to translation132. The chaperonin GroEL builds up a 14-mer barrel-like structure compris- ing a lid of heptameric GroES133. Cycles of ATP hydrolysis open and close the barrel triggering substrate capture and release134. Additionally, binding of ATP stimulates the unfolding of bound substrate proteins before release into the cavity135, 136, and binding of ATP and GroES induce a structural conver- sion of the inner GroEL surface from hydrophobic to hydrophilic that proba- bly facilitates folding of substrate proteins. Folding of encapsulated sub- strates is promoted by preventing non-productive interactions with other unfolded polypeptides and by geometric confinement within the GroEL bar- rel, which improves the protein’s folding landscape137. Two proteome wide hunts for GroEL substrates have yielded ~250 proteins, of which ~85 exhibit an obligate dependence on GroEL for successful folding138, 139. Due to spatial constraints, only substrates up to ~60 kDa can be encapsulated by GroEL. However, folding of larger proteins might be promoted by interactions of unfolded polypeptides with hydrophobic surfaces of the open GroEL barrel and by the above mentioned substrate unfolding upon binding of ATP135, 136.

Heat shock response The heat shock response is probably the most critical stress response for the overexpression of recombinant proteins in E. coli and other bacteria as it deals with a key issue of protein production, namely, the folding and aggre- gation of overexpressed proteins. In E. coli, regulation of the heat shock response is mediated by rpoH, which encodes the σ32 transcription factor140. Sigma factors are subunits of bacterial RNA polymerases accounting for their promoter specificity141. Besides σ32, other sigma factors include σ70, regulating the transcription of housekeeping genes, σE, involved in the extracytoplasmic stress response and σS, which is responsible for transcription of genes required for stationary phase survival141. Three phases of the heat shock response can be distinguished: induction, adaptation and shut-off phase. Upon induction, heat shock proteins (HSPs)

13 are rapidly expressed to very high levels. These levels decrease somewhat during the adaptation phase to reach a new steady-state level140. Relief of the stress signal leads to the shut off of the heat shock response by degradation of σ32 and repression of its activity.

Fig. 5. Regulation of the heat shock response. Expression of rpoH is regulated at the levels of transcription and translation. Binding of DnaK sequesters σ32 and pro- motes its degradation by FtsH, thereby regulating σ32 activity and stability. An in- creased number of misfolded proteins titrates DnaK resulting in free, active and stable σ32 that can promote transcription of the heat shock regulon. Heat shock in- duced accumulation of DnaK reduces the number of free σ32, which leads to a new steady state level and eventually shuts off the heat shock response.

The heat shock response is regulated by controlling transcription, transla- tion, activity and stability of the heat shock sigma factor σ32 140 (Fig. 5). Transcriptional regulation of the σ32 gene rpoH has only a minor effect on heat shock induction as rpoH is mainly expressed from promoters depending on the housekeeping sigma factor σ70. However, at very high temperatures, the extracytoplasmic stress sigma factor σE can induce rpoH expression142. A temperature upshift from 30°C to 42°C can increase the translation of σ32 more than 10-fold. This is accomplished by the thermal melting of a σ32 mRNA secondary structure at the translation initiation region, which facili- tates ribosome entry and active translation143. However, the major regulation of the heat shock response occurs on the level of σ32 activity and stability, and is mediated by DnaK (Fig. 5). In this model of regulation, which has been coined the ‘homeostatic regulation model’ 140, misfolded proteins and

14 σ32 compete for binding to DnaK, with the DnaK– σ32 complex inactive in transcription. In addition, DnaK binding to σ32 facilitates σ32 degradation presumably by FtsH144. When the ratio of misfolded proteins to DnaK is low, the transcription inactive DnaK– σ32 complex predominates and σ32 is rapidly degraded, resulting in only basal expression of HSPs. However, when the ratio of misfolded proteins to DnaK is increased due to folding stress, DnaK is titrated away from σ32; the active, stable, chaperone-free state of σ32 predominates; and the heat shock response is induced. HSP- mediated refolding or degradation of misfolded proteins relieves the induc- ing signal and frees the DnaK system to shut off the heat shock response. Recently it has been suggested that also GroEL/S is involved in this mecha- nism of regulation145. Finally, regulation of the heat shock response includes competition of σ32 with σ70. The latter is more abundant at 30°C and therefore prevents binding of σ32 to the RNA polymerase. σ70 tends to aggregate at high temperatures leading to an increased ability of σ32 to bind to RNA polymerase146. The heat shock regulon comprises at least 51 promoters that drive the ex- pression of 49 transcriptional units147. Target genes of the heat shock regulon encode proteins in all compartments of the bacterial cell except the outer membrane147. Among these target genes are chaperones involved in protein folding (e.g., DnaK/J/E and GroEL/S), proteins involved in inclusion body formation and disaggregation (e.g., ClpB and IbpA/B) as well as proteases responsible for the cleanup of misfolded proteins (e.g., ClpAP, HslUV and Lon)147. Interestingly, this study revealed, that 25% of known regulon mem- bers reside in the cytoplasmic membrane of E. coli147. This raises the ques- tion whether the heat shock response has a yet underestimated function to preserve the membrane upon heat and protein folding stress. In summary, induction of the heat shock response increases the protein folding capacity of the cell, prevents misfolding and aggregation and facili- tates disaggregation, refolding or degradation of already aggregated proteins.

Recovery and degradation of aggregated proteins Aggregation of proteins occurs when massive protein misfolding exceeds the buffering capacity of the cell’s protein folding systems, e.g., upon severe heat stress or upon overexpression of recombinant proteins. In the latter case it is most often referred to as ‘inclusion body formation’ 148. Aggregation of proteins is often toxic to cells149. Several factors may contribute to reduced viability, including an intrinsic toxicity of aggregated proteins and a net loss of active (essential) proteins. Upon heat shock at least, the latter aspect ap- pears to be more important as refolding of disaggregated proteins and not just their removal and degradation is essential for cell survival and thermo- tolerance150.

15 In E. coli, aggregates usually contain large amounts of the small HSPs IbpA and IbpB151. IbpA/B are hardly expressed under non-stressed condi- tions but are strongly induced upon thermal or overexpression stress; there- fore, they are a good marker for aggregate formation within the cell152, 153. While IbpA/B are not involved in aggregation prevention152, they keep mis- folded proteins in a recovery competent state, accelerate disaggregation and constitute a functional triade together with ClpB and DnaK154. This is cor- roborated by the fact that ibpA/B mutants are viable while a triple knock out of ibpA/B and dnaK is synthetically lethal154. Furthermore, IbpA/B may sup- press degradation of aggregated proteins in favor of their recovery155. The major heat shock proteins required for disaggregation are DnaK and ClpB152. ClpB is a hexameric AAA+ ATPase that unfolds aggregated pro- teins by threading them through its central pore in an ATP dependent man- ner150. DnaK is presumably required for initial substrate unfolding processes, helping ClpB to extract polypeptides from aggregates150. ClpB and DnaK act synergistically to achieve disaggregation152, and a double knock out of clpB and dnaK is synthetically lethal156. Refolding of disaggregated proteins is supported by DnaK and GroEL157. As mentioned above, reactivation of disaggregated material is crucial for cell survival at high temperatures and thermotolerance, but degradation of terminally misfolded or mistargeted protein is essential as well158, 159. E. coli expresses a number of ATP-dependent cytoplasmic proteases or protease complexes, namely HslUV, ClpAP, ClpXP, Lon and FtsH. These proteases are somewhat redundant, however, deletion of two or more results in tem- perature sensitivity158. All of these proteases comprise an ATP-independent protease activity and an ATPase responsible for substrate unfolding and feeding of the protease160. In case of HslUV, ClpAP and ClpXP, the activi- ties reside on separate polypeptides while Lon and FtsH contain both activi- ties in a single . Substrate specificity, if pronounced, is maintained by the ATPase subunit or by specialized adapter proteins recognizing certain signals at the N- or C-terminus of a target protein160. With the exception of FtsH, these proteases are induced upon heat shock. While FtsH and HslUV have a fundamental role in the homeostasis of σ32 158, 161, Lon and to a minor extent ClpXP are responsible for degradation of mis- folded proteins, in particular in cells with limiting amounts of DnaK159. In contrast to ClpB, these proteases do not seem to require DnaK for disaggre- gation prior to degradation159.

After having reviewed the principles of biological membranes, membrane proteins and protein folding and degradation, I will proceed with the specific issue of protein overexpression and membrane protein overexpression in E. coli in particular.

16 Host systems for the overexpression of proteins Protein overexpression can be carried out in many different hosts or expres- sion systems. Bacterial overexpression systems, in particular E. coli, provide the benefits of inexpensiveness, fast growth, easy genetics, scalability and high expression. On the downside, bacteria express many proteases that are a danger to the overexpressed material, inclusion body formation can be cum- bersome, and the overexpression of eukaryotic proteins suffers from the lack of post-translational modifications, a different codon usage and overly rapid kinetics of transcription, translation and, if applicable, targeting. Overex- pression in different yeast species like S. cerevisiae and Pichia pastoris is also inexpensive, relatively easy and safe, however, a thick cell wall makes lysis difficult and post-translational modifications exist but are not mammal- ian-like. In contrast, the use of mammalian expression systems is usually expensive, time-consuming and difficult to scale up while providing optimal conditions for the expression of mammalian proteins themselves. Recently, the use of cell-free expression systems has become more popular162. It has the advantage of high yields in small volumes, highly controllable expres- sion conditions and no toxicity to a host. However, cell free expression sys- tems are fairly expensive and also difficult to scale up. As all of my work has been done in E. coli, I will focus on this organism in the following section.

Protein overexpression in E. coli E. coli is among the most successful hosts for overexpression of recombi- nant proteins. As mentioned above, overexpression in E. coli offers many advantages, including growth on inexpensive media, straight forward genet- ics, a large number of mutant host strains and expression vectors, rapid bio- mass accumulation, amenability to high cell-density fermentations and sim- ple process scale-up163-166. Due to these advantages, E. coli is also the major workhorse for recent structural genomics efforts167. Usually, recombinant proteins are expressed from medium to high copy number plasmids (15-60 copies per cell) based on a ColE1 derived origin of replication164. Expression plasmids are stabilized by the use of an antibiotic resistance marker, conferring resistance to e.g., ampicillin or kanamycin. However, a high recombinant gene dosage can also be achieved by expres- sion from the bacterial chromosome. By this strategy, the use of antibiotics can be avoided, albeit, by the cost of inflexibility when the overexpression of many different proteins is desired164.

17

18 Fig. 6. Promoter systems for the overexpression of recombinant proteins in E. coli. A, the lac promoter region indicating the two operators, O1 & O3, the CRP , the -10 and -35 promoter regions as well as the Shine-Dalgarno site (SD) and the start codon. The bold ‘A’ depicts the transcription start. B, the arabi- nose inducible pBAD system. The bi-functional AraC dimer represses transcription by binding to the regulatory elements O2 and I1 in the absence of arabinose and promotes transcription by binding to I1 and I2 in the presence of arabinose. Binding of CRP-cAMP enhances transcription. C, the rhamnose inducible pRha system. In the presence of rhamnose, RhaR activates transcription of rhaR and rhaS, resulting in an accumulation of RhaS. RhaS in turn acts as the rhamnose-dependent positive regulator of the rhaBAD promoter. Binding of CRP-cAMP enhances transcription. D, the IPTG inducible T7 RNA polymerase system. T7RNAP transcription is con- trolled by the catabolite insensitive lacUV5 promoter. The lac repressor represses transcription from the lacUV5 promoter in the absence of IPTG by binding to the lac operator O3 and O1. Correspondingly, it represses transcription from the T7lac promoter on the expression plasmid. IPTG bound lac repressor dissociates from the lac operator. This has two consequences: i) transcription of T7RNAP by E. coli RNAP from the chromosomal lacUV5 promoter. ii) transcription of the target pro- tein by T7RNAP from the T7lac promoter. T7RNAP activity can be dampened by co-expression of T7Lys from plasmids pLysS or pLysE.

Promoter systems In E. coli, promoters are located 10-100 bp upstream of the ribosome- binding site. Promoters are usually under the control of a regulatory protein, which is encoded on the host chromosome or on the plasmid itself. Promot- ers of E. coli consist of two hexanucleotide sequences 35 bp (-35 region) and 10 bp (-10 region) upstream of the transcription initiation site, respec- tively168. The two regions are separated by a short spacer of 16-18 nucleo- tides and may be supplemented by an upstream element, which enhances transcription166. Downstream of the promoter and within the ribosome bind- ing site is the Shine-Dalgarno (SD) sequence, which is complementary to the 3’ end of the 16S rRNA and required for initiation of translation. The SD sequence lies 4-14 bp upstream of the start codon with 8 bp being the opti- mal spacing for efficient initiation of translation169 (Fig. 6A). A useful promoter exhibits several desirable features: it is strong and ide- ally, the strength of protein expression is adjustable, it has a low basal ex- pression level (i.e., it is tighly regulated), it works strain independent and its induction is simple and cost-effective166. A strong promoter is required for overexpression of e.g., soluble proteins, as expression should result in the accumulation of recombinant protein constituting up to 50% of the total cellular protein. However, an adjustable promoter strength is desired if ex- pression of a protein is more difficult, e.g., as it is for membrane proteins. Leaky basal expression without inducer can compromise growth and result

19 in plasmid instability or loss of protein production. Therefore, tight regula- tion is essential for the production of proteins that are toxic to the host166. A multitude of different promoters has been employed for the overexpres- sion of proteins. Induction can be thermally, by the addition of small mole- cules, or even by changes in pH or osmolarity of the culture medium166. The most common promoters used for recombinant protein expression are de- rived from operons of sugar utilization systems, namely from lactose, arabi- nose and recently also rhamnose operons. All these promoters are subject to catabolite repression, which means that transcription can be repressed by the addition of glucose. The absence of glucose leads to a rise in cAMP levels. cAMP bound to catabolite repressor protein (CRP) activates transcription by binding to a site near the promoter and stabilizing the promoter-RNA poly- merase complex170. The often used lacUV5 promoter is a catabolite- insensitive lac promoter mutant that has a high affinity for E. coli RNA po- lymerase171. Transcription from the lac promoter is controlled by the lac operator. This is a nearly symmetric sequence, to which the lac repressor binds172. Binding of allolactose or the non-hydrolysable analog IPTG to the lac rep- ressor leads to a conformational change in the protein and consequently a release of repression173. To enhance repression, the lacIQ allele is often used. A single nucleotide mutation in the -35 region of the lacI promoter leads to a more than five-fold increase in the number of lac repressor molecules and hence a stronger repression, which is sufficient to control expression from multi-copy number plasmids174. The transcriptional strenght of lac promoters induced with IPTG can be adjusted if mutant hosts of the lactose permease LacY are used175. The arabinose inducible araBAD promoters were introduced to provide a tight regulation of expression in the presence of glucose and a better adjust- ment of expression intensity176 (Fig. 6B). However, araBAD promoters are not particularly strong. It turned out that autoinduction of arabinose per- meases results in an all-or-nothing expression phenotype at low inducer con- centrations177. This problem can be overcome by driving expression of ara- binose permeases from constitutive promoters178. Expression from araBAD promoters is controlled by the bi-functional AraC protein, which can repress transcription in the absence of arabinose and promote transcription in its presence179. A combination of tight regulation, high-level induction and the possibility to adjust transcription intensity is provided by recently introduced expres- sion systems based on rhamnose promoters180 (Fig. 6C). In the presence of L-rhamnose, the RhaR protein activates transcription of rhaR and rhaS, re- sulting in an accumulation of RhaS. RhaS then acts as the L-rhamnose- dependent positive regulator of the rhaBAD promoter181.

20 T7 RNA polymerase based expression T7 RNA polymerase (T7RNAP) based expression is the major approach to produce recombinant proteins in E. coli. In 2003, it accounted for more than 90% of protein productions referred to in PDB165. T7RNAP recognizes an exceptionally long (20 bp) promoter sequence that is not naturally occurring within the E. coli genome182. Recombinant proteins are usually expressed from pET system vectors where the target gene is positioned downstream of a T7lac hybrid promoter consisting of the strong T7 φ10 promoter and a lac operator183. This allows induction of transcription by addition of IPTG (see above). Expression of target proteins requires host strains encoding T7RNAP. In E. coli BL21(DE3) and its derivatives, T7RNAP is expressed from the λ- lysogen DE3, and its expression is under the control of the IPTG-inducible lacUV5 promoter (Fig. 6D). Furthermore, strains are available where T7RNAP is expressed from within the chromosomal lac operon or from araBAD promoters184, 185. Leaky expression of T7RNAP and hence also the target protein is a seri- ous problem when employing T7RNAP based systems, in particular if the target protein is toxic to the host cell. Leakiness is slightly reduced by the use of above mentioned T7lac hybrid promoters. Unfortunately, the addition of glucose to the culture medium is hardly relieving the problem due to the catabolite-insensitive lacUV5 promoter driving T7RNAP expression. T7RNAP activity can be dampened by the specific inhibitor T7 lysozyme (T7Lys)186. Therefore, co-expression of T7Lys is often used to allow the transformation of host cells with vectors encoding toxic recombinant pro- teins187. In this set-up, T7Lys is expressed from the pACYC based plasmids pLysS or pLysE, which are not in the same incompatibility group as the pBR322 derived pET vectors and hence can be stably maintained in the same cell (Fig. 6D). pLysS and pLysE differ in the expression intensity of T7Lys: low expression leads to weak inhibition of T7RNAP using pLysS and high expression results in strong T7RNAP inhibition using pLysE.

Metabolic burden of protein overexpression Maintenance of high copy number plasmids and high-level expression of recombinant proteins often induce a multiplicity of cellular stress responses within the host188. Such stress responses resemble environmental stress situa- tions such as heat shock, amino acid depletion and starvation. The strong overexpression of target proteins from T7 promoters poses a significant metabolic burden onto host cells. In this context it is notable, that T7RNAP transcribes DNA with an transcription rate 8 times greater than E. coli RNA polymerase189.

21 The ‘metabolic burden’ describes the difference in growth rate between expressing and non-expressing, not plasmid containing cells and is defined as “the amount of resources (raw material and energy) that are withdrawn from the host metabolism for maintenance and expression of foreign DNA and protein” 190. Generally, the specific growth rate of cells expressing a product correlates inversely with the rate of recombinant protein synthe- sis188. Consequently, the expression of recombinant proteins usually results in impaired growth rates and reduces formation of biomass. Protein synthesis is the most energy demanding cellular process. To meet the increased energy requirements of protein production, the host adjusts catabolism and anabolism. It has been observed that overexpressing cells increase the flux through glycolysis and TCA cycle by regulating enzymatic activities and expression191, 192. Furthermore, alternative pathways of sub- strate-level phosphorylation are often promoted. This less efficient way of energy generation frequently leads to the accumulation of cell-toxic levels of acetate in the culture medium193. High fluxes into energy-generating path- ways upon expression of recombinant proteins are coupled to a reduced sup- ply of precursors for biomass formation191. Under normal growth conditions, components of the protein producing system, e.g., ribosomal proteins, are expressed in balance with the cellular needs, i.e., a close regulation by growth rate is established. However, upon protein overexpression, this regu- lation is misbalanced and these components are often not sufficiently sup- plied192. In general, it seems that the expression of house-keeping genes is inversely correlated with the overexpression of recombinant protein194. The drain of precursors towards recombinant protein production can re- sult in amino acid starvation and the induction of the stringent response. The stringent response is mediated by accumulation of the alarmone ppGpp and it involves an extensive reprogramming of gene expression including a down-regulation of many genes involved in transcription and translation as well as in amino acid biosynthesis195. Another potential problem of T7 based overexpression is related to the coupling of transcription and translation. In E. coli, the average rate of tran- scription matches that of translation, and both processes are often tightly coupled. Synchronization of transcription and translation is important for the stability of at least some nascent bacterial mRNAs196. T7RNAP uncouples these processes by its fast transcription. This results in naked mRNA stretches that are prone to degradation by RNase E and thus reduces protein expression yields196. Consistent with this observation, the use of a mutant T7RNAP that transcribes three times more slowly than the wild type enzyme has been shown to yield more overexpressed protein197. Besides the effect on transcript instability, abundant free mRNA resulting from overexpression may lead to destruction of ribosomal RNAs and thus compromise host cell viability198.

22 Inclusion body formation and how to prevent it Protein overexpression overwhelms the folding machinery of the host with a high load of nascent recombinant protein. An insufficient folding capacity easily results in the aggregation of recombinant protein in inclusion bodies and the induction of the heat shock response (see above). The likelihood of misfolding is increased by the routine use of strong promoters and high in- ducer concentrations163. In particular, eukaryotic proteins are prone to mis- folding as they often possess complex tertiary and quaternary structures and require the formation of multiple disulfide bonds or other post-translational modifications to reach their native, biologically active conformation163. A statistical analysis on the molecular characteristics determining the propen- sity of inclusion body formation yielded six relevant parameters: charge average, turn-forming residue fraction, cysteine fraction, proline fraction, hydrophilicity and total number of residues. While the first two parameters are strongly correlated with inclusion body formation, the last four parame- ters show a weak correlation199. Inclusion bodies can accumulate in the cytoplasm or in the periplasm of E. coli, depending on the target localization of the overexpressed protein. Typically, the overexpressed protein accounts for 80-95% of the aggregated protein and is contaminated by outer membrane proteins, ribosomal compo- nents and small amounts of phospholipids and nucleic acids200. In addition to the major chaperones DnaK and GroEL, IbpA/B are often associated to in- clusion bodies151, 201, 202. Inclusion bodies are porous structures that appear rather amorphous if examined by electron microscopy. However, recent experiments suggest that inclusion bodies exhibit amyloid-like properties with a high level of organized intermolecular β-sheet structure203. If this holds true, it would also suggest that aggregate formation is a specific, se- quence dependent process, comparable to amyloid fibril formation, rather than a non-specific association of hydrophobic surfaces in folding interme- diates204. Because inclusion bodies are largely resistant to proteolysis and contain substantial amounts of relatively pure material, overexpression into inclusion bodies is often exploited for the production of proteins that are toxic, unsta- ble or easy to refold163. General strategies for refolding of recombinant pro- tein have been established205, however, it is to be considered, that refolding is not always successful and is only amenable for soluble proteins and β- barrel membrane proteins. Even the best protocol only refolds a small frac- tion of the input protein, and it is difficult to purify the refolded protein167. Furthermore, as usually only successful cases are reported, the success rate of refolding is significantly overestimated164. Besides the aggregation of recombinant proteins in non-native structures, it has been observed, that inclusion bodies can contain a fraction of protein in its native and biologically active conformation148. However, strictly spo-

23 ken these aggregates should be referred to as precipitates rather than inclu- sion bodies204. Alleviation of protein misfolding and prevention of inclusion body forma- tion is often achieved by reducing the rate of transcription and translation. This can be accomplished by the use of weaker promoters, lower inducer concentrations or the expression at lower temperatures. The latter approach has the additional advantage that it reduces the strength of interactions be- tween hydrophobic groups that contribute to protein misfolding163 and en- hances the quality of the overexpressed material206. Other approaches that have been employed to minimize the formation of inclusion bodies include engineering of the target protein, growth of cells in the presence of osmo- lytes like sorbitol and glycine betaine, alteration of the pH of the culture medium and the use of fusion partners (e.g., thioredoxin, maltose binding protein (MBP) and glutathione S- (GST))166. The most probable reason for improved folding of C-terminally fused passenger proteins is that the fusion partner efficiently achieves a native conformation, thereby pro- moting the acquisition of correct structure in downstream folding units by favoring on-pathway isomerization reactions164. Proper in vivo folding of recombinant cytoplasmic proteins can be pro- moted by co-expression of chaperones, typically from co-transformed plas- mids carrying several chaperones with synergistic effects. Commonly used are TF, the DnaK and GroEL chaperone systems as well as ClpB207, 208. The choice depends on whether the recombinant protein requires folding assis- tance in earlier (TF, DnaK) or later (GroEL) folding steps, or both. Co-expression of chaperones has also been explored for membrane pro- tein overexpression. As most membrane proteins engage a co-translational targeting pathway to the inner membrane (see above), the usefulness of co- expressing cytoplasmic chaperones is probably limited to the folding of ectodomains of membrane proteins or the palliation of side effects like pro- tein aggregation. However, Wang and co-workers showed that overexpres- sion levels of the magnesium transporter CorA could be improved by the co- expression of the DnaK system209. Recently, the structure of CorA was solved210, 211. The CorA monomer consists of a large N-terminal cytoplasmic domain and two transmembrane segments at the very C-terminus. The archi- tecture of CorA suggests that unlike most other membrane proteins, it is targeted post-translationally to the inner membrane. Thus, it is probable that DnaK is involved in the targeting and folding of CorA and it could explain why co-expressing DnaK improves CorA overexpression yields. In vitro, GroEL has been shown to support the integration of some mem- brane proteins into liposomes212. However, these experiments have only limited value for the enhancement of membrane protein overexpression in vivo as the system used was very artificial.

24 Targeted/localized overexpression Recombinant proteins can be directed to five different compartments of the cell, namely the cytoplasm, the inner membrane, the periplasm, the outer membrane and into the culture medium (Fig. 7). I will briefly discuss the potential of periplasmic expression and extracellular secretion in this sec- tion, while overexpression into the inner membrane will be presented in detail at a later point. Overexpression into the outer membrane has hardly been exploited for protein production purposes; therefore, I will refrain from discussing it here. Overexpression into the cytoplasm of E. coli is not suitable for all recombinant proteins. In particular, many eukaryotic proteins require the formation of disulfide bonds to reach their native conformation, which is not supported in the reducing environment of the E. coli cytoplasm. Albeit this can be circumvented by the use of e.g., thioredoxin reductase (trxB) mutant strains that create a more oxidizing milieu in the cytoplasm213, secretion of recombinant proteins into the periplasm is often more successful. The periplasm is conducive to disulfide bond formation because of the presence of the Dsb machinery214. Additional advantages of periplasmic expression include the presence of fewer proteases and the option of selective release of the overexpressed material by lysozyme treatment and osmotic shock166. In this way, purification is greatly facilitated. To target recombinant proteins in a Sec-dependent fashion beyond the inner membrane, the use of an N- terminal signal sequence is required. Commonly used are signal sequences from endogenous outer membrane and periplasmic proteins, e.g., from OmpA, OppA, PhoA, or MBP. The authentic N-terminus of the recombinant protein is obtained after removal of the signal sequence by leader peptidase (see above). Periplasmic expression can be limited by the capacity of the Sec-targeting and translocation machinery. However, the translocation of secretory proteins appears to be much faster and more efficient than the integration of membrane proteins into the inner membrane. Co-expression of targeting components can alleviate this bottleneck as well as co-expression of periplasmic chaperones may supports the folding of overexpressed mate- rial166.

Fig. 7. Protein overexpres- sion into the different com- partments of E. coli. Un- folded proteins are exported via the Sec pathway; folded proteins are exported via the Tat pathway.

25 The potential of the bacterial twin-arginine translocation (Tat) system for periplasmic expression has not yet been fully explored215. The Tat transloca- tion machinery recognizes signal sequences containing a twin-arginine mo- tive and only allows passage of folded substrate proteins. Thus, this system combines cytoplasmic folding and periplasmic localization of the end prod- uct, and it could be a valuable tool for the secretion of recombinant proteins that assume a folded structure before reaching the Sec machinery. However, the Tat- is easily saturated and proton leakage during the translo- cation process may be a serious problem215. Co-expression of PspA im- proves the secretory capacity of the Tat-machinery, possibly by sealing leaky membranes216. Recombinant proteins can also be targeted to the culture medium, which presents the benefits of very low levels of proteolysis and simple purification as there are hardly contaminant proteins in the medium. However, the high dilution of the target protein is a disadvantage. Unfortunately, E. coli usually does not support extracellular secretion166. This problem is often solved by the induction of limited leakage of the outer membrane, which, however, is not beneficial for the viability of host cells166. Recently, it was shown that the E. coli lipoprotein YebF is secreted into the culture medium by a hitherto unknown mechanism217. A YebF-fusion approach was exploited to direct the extracellular production of recombinant proteins. Another strategy to induce extracellular secretion in E. coli is the utilization of secretion pathways in- volved in pathogenicity. A prominent example is hemolysin, which has been used for the construction of secreted hybrid proteins218. Unfortunately, hemolysin mediated secretion is not a very efficient process. The exploita- tion of auto-transporters, of which some are naturally expressed to very high levels without eliciting stress responses in the cell, may be more promis- ing219. It remains to be explored how much replacement of the passenger domain by a target protein can facilitate high yield expression into the cul- ture medium.

Membrane protein overexpression in E. coli While the production of soluble proteins is comparably straightforward, overexpression of membrane proteins remains a challenging task22. This is presumably caused by the complex requirements of membrane protein bio- genesis (see above). Overexpression into inclusion bodies is not an option as there are only very few examples of α-helical membrane proteins that have been successfully refolded after denaturing isolation220, 221. Therefore, it is preferred to overexpress α-helical membrane proteins in a membrane sys- tem, from which they can be purified after detergent extraction.

26 In the following sections, I will highlight the recent knowledge of mem- brane protein overexpression in E. coli, its limitations and potential solu- tions.

Which properties make a well expressing membrane protein? Several overexpression screens of prokaryotic membrane proteins in E. coli led to the conclusion that well expressed membrane proteins tend to be small with relatively few transmembrane helices222-224. In contrast, the analysis of a large-scale overexpression screen of E. coli membrane proteins has shown that levels of membrane integrated protein did not correlate with – and hence could not be predicted by – obvious sequence characteristics such as codon usage, protein size, hydrophobicity and number of transmembrane helices225. These conflicting conclusions may be explained by the different approaches that were used to express and monitor membrane protein overexpression. Expression levels of membrane proteins in the first three studies were screened by means of Western- and dot-blotting, respectively. It is possible that the transfer of small membrane proteins with relatively few transmem- brane helices during Western-blotting is simply more efficient than the transfer of larger membrane proteins. In the second study, constructs were expressed for 2 h at 37°C, which is suboptimal for membrane protein over- expression. Therefore, the expression yields may not be representative.

Monitoring membrane protein overexpression Monitoring the localization, quantity and quality of overexpressed mem- brane proteins is important to allow assessment and optimization of overex- pression yields. Since it is unpredictable whether overexpressed membrane proteins end up in the membrane or in inclusion bodies, the first step in monitoring membrane protein overexpression is usually fractionation of the overexpression vehicle into a soluble, an insoluble (inclusion bodies) and a membrane fraction22. Separation of whole cells or fractionated material by SDS polyacrylamide gel electrophoresis (PAGE) and subsequent visualization of proteins by Coomassie staining is most often used to analyze the purity, integrity as well as quantity of overexpressed membrane proteins. Western-blotting, using antibodies against the target protein or a purification tag therein is also widely employed to detect membrane proteins after SDS PAGE. However, due to the hydrophobic nature of membrane proteins their transfer from a gel to a blotting membrane can be cumbersome, making Western-blotting less suitable for quantitative purposes. Recently, detection of membrane protein overexpression levels by means of dot-blotting (i.e., spotting samples directly on a nitrocellulose filter) was explored226. Detergent solubilized membrane protein purified by affinity

27 chromatography was spotted on a nitrocellulose filter and subsequently de- tected with antibodies against the affinity tag. The method is fast and avoids tedious SDS PAGE and Western-transfer. Dot-blotting, however, does not provide any information about the integrity of the overexpressed membrane protein (except that the affinity tag is still intact) and solubilization of aggre- gated overexpressed protein in harsh detergents may lead to an overestima- tion of membrane integrated protein. In another recent report, a detergent adapted version of colony filtration (CoFi) blotting was introduced to screen for well expressing colonies among transformants of a library of randomly mutated membrane proteins227. Based on diffusion properties, CoFi blotting can separate soluble or detergent solu- bilized protein from inclusion bodies by a filtration step at the colony level228. Unfortunately, this screening approach does not assess the correct assembly or integrity of the overexpressed membrane protein as this may be independent from solubility aspects. Therefore, further evaluation of positive colonies is needed. Fusing green fluorescent protein (GFP) to the C-terminus of membrane proteins enables monitoring the levels of overexpressed proteins in intact cells. The GFP-moiety only folds properly and becomes fluorescent if a membrane protein is stably inserted into the membrane229. Using whole cells as starting material, the expression of the membrane protein GFP-fusion can be detected through spectrofluorometry or in-gel fluorescence in standard SDS PAGE230. In addition, the GFP-moiety greatly facilitates the monitoring of purification steps and precrystallisation screening230, 231. Just recently, the use of GFP as a folding indicator was extended to simultaneously detect well-folded and aggregated proteins232. This allows a more detailed assess- ment of the quality of overexpressed membrane protein. A drawback of this method is that it is largely based on time-consuming SDS PAGE. It would be faster to monitor protein aggregation by using an IbpA/B reporter con- struct as the expression of IbpA/B correlates extremely well with aggregate formation (unpublished observation). The above described methods to monitor the quantity and quality of over- expressed membrane proteins do not provide information on the proper fold- ing and functionality of the overexpressed material; integration into the membrane is no guarantee for this. To monitor the functionality of a protein, not only its function should be known but also an activity assay needs to be available.

Bottlenecks of membrane protein overexpression Membrane protein overexpression is limited by a number of bottlenecks, which presumably relate to their complex biogenesis. The most general bot- tleneck concerns toxicity to the host, which severely limits biomass forma- tion and yield. Toxicity however, may be the net result of bottlenecks on the

28 molecular level of membrane protein biogenesis and therefore, I will discuss these aspects in more detail below. The availability of endogenously expressed factors assisting the biogene- sis of membrane proteins can be insufficient to support proper processing of large quantities of overexpressed protein. In E. coli it has been shown that upon overexpression of membrane proteins the SRP is titrated out233. This is conceivable as the number of SRPs in the cytoplasm is very limited (approx. one protein Ffh per 100 ribosomes234). However, recent studies in our labo- ratory have shown that the SRP requirement in vivo may be strongly overes- timated as the targeting of many membrane proteins is unaffected upon Ffh depletion (David Wickström, personal communication). Furthermore, pro- teomic profiling of membrane protein overexpression and Ffh depletion showed little overlap153 (David Wickström, personal communication) sug- gesting that a limitation of the SRP may not be critical for membrane protein overexpression. This is supported by the observation that the supply of Ffh or FtsY for in vitro expression of membrane proteins is not yield limiting235. In contrast, the proteomic profile of E. coli cells depleted for the Sec- translocon component SecE exhibited a fairly large overlap with the profile of membrane protein overexpression153, 236. Aggregation of secretory pro- teins in the cytoplasm and reduced accumulation levels of membrane and secretory proteins were in common and point to a limiting capacity of the Sec-translocon as the major bottleneck of membrane protein overexpression. Once inserted into the membrane, the capacity of generic membrane pro- tein chaperones may not be sufficient to support the folding of overex- pressed membrane proteins. Folding of ectodomains of membrane proteins may require cytoplasmic and periplasmic chaperones as suggested by the stronger recruitment of DnaK to inner membranes upon membrane protein overexpression153. Also these chaperones may not always be present in suffi- cient numbers to support the folding of overexpressed material in addition to their endogenous substrates. The membrane space required to accommodate overexpressed membrane proteins represents another potential bottleneck. It has been shown for the E. coli b subunit of F1Fo ATP synthase that membrane proliferation upon mem- brane protein overexpression improved overexpression yields237, 238. The mechanism that induces the proliferation of membranes upon membrane protein overexpression is not understood. Notably, proliferation of mem- branes is not limited to membrane protein overexpression as it has also been observed upon addition of Mg2+ to the culture medium and upon depletion of the SRP component Ffh239. The accumulation of non-native or foreign structures in a membrane can induce stress responses and activate proteolytic systems of the overexpres- sion host. As mentioned above, in E. coli, the inner membrane proteases FtsH and HtpX are involved in the degradation of improperly assembled membrane proteins. Their action could also account for the proteolytic deg-

29 Fig. 8. Effect of htpX deletion on mem- brane protein over- expression and sta- bility. A, per cell overexpression levels 4 hours after induc- tion with IPTG. B, stability of overex- pressed LepI-GFP assessed by in gel fluorescence. radation of recombinant membrane proteins. Surprisingly, however, deletion of HtpX resulted in reduced expression levels of a number of overexpressed membrane proteins, pointing at the importance of functional quality control mechanisms in the cell (Fig. 8A). Furthermore, stability of the degradation prone protein LepI-GFP was not affected by deletion of HtpX (Fig. 8B). The stability of overexpressed membrane proteins can sometimes be enhanced by engineering of tightly folded fusion partners to the C-termini (see below).

Heterologous expression Yields of overexpressed membrane proteins foreign to the expression host are often extremely low (see paper IV Fig. 6). Some problems related to heterologous expression are of general nature and not exclusive for membrane proteins. The most prominent example con- cerns codon usage. Several codons used commonly in eukaryotes are rare in E. coli. Two solutions are generally employed to overcome this problem: either codon optimization of the target gene or overexpression in a host that co-expresses rare tRNAs. As the de novo synthesis of long genes is very expensive, the latter approach, which works almost as well, is chosen more often167. Another general problem concerns the domain organization and folding of proteins: bacteria have on average smaller proteins than eukaryo- tes and a less complex multi-domain architecture240. Furthermore, the cou- pling of translation and folding is presumably different in bacterial and eu- karyotic cells132. Both aspects may contribute to the circumstance that pro- teins from eukaryotes often fold inefficiently in bacteria164. Heterologous expression of membrane proteins encounters a number of additional difficulties, which relate mainly to their biogenesis and their physical properties.

30 Targeting and insertion The targeting and insertion of heterologously overexpressed membrane pro- teins have been studied in E. coli. In vitro translation cross-linking ap- proaches have shown that heterologously expressed membrane proteins make contact with the SRP, the Sec-translocon and YidC241. Nonetheless, this does not necessarily mean that the E. coli components are optimal for assisting the biogenesis of heterologously expressed membrane proteins. The successful overexpression of mitochondrial carriers in the Gram-positive bacterium Lactococcus lactis indicates that membrane proteins can be effi- ciently targeted and inserted in a heterologous system via pathways not used under normal conditions242. Heterologous overexpression of membrane proteins can be hampered by different synthesis, targeting, insertion and folding characteristics in the overexpression host243. For instance, in prokaryotes the polypeptide elonga- tion and protein folding rates are considerably higher than in eukaryotes. In particular, in E. coli formation of the RNC-SRP complex does not involve stalling of translation244. It is well conceivable that this causes mistargeting and misfolding of heterologously expressed membrane proteins. Incompati- bility of factors involved in the processing of membrane proteins presents a general problem. For instance, some of the conserved charged residues in the yeast Sec61-translocon that are suggested to be important for topology de- termination (Fig. 3) are not conserved in the prokaryotic Sec-translocon31. Thus, it is quite possible that subtle differences in the insertion process con- tribute to the difficulties experienced in the heterologous overexpression of membrane proteins75.

Folding, quality control & post-translational modifications Some membrane proteins, such as some transporters in yeast and some rhodopsins, require membrane protein-specific chaperones for proper fold- ing62, 245-247. The absence of specialized chaperones required for the heterolo- gous overexpression of membrane proteins may represent a bottleneck to their overexpression. The quality control of membrane proteins is considerably different in prokaryotes and eukaryotes: Folding of ER-glyco(membrane)proteins is chaperoned in a complex quality control cycle248 and removal of misfolded proteins involves a dislocation machinery, the ubiquitin system and the pro- teasome249. Many of the involved elements are foreign to E. coli or are sub- stantially different, which could mean that correct folding and quality assur- ance of heterologously expressed protein is difficult to achieve. Glycosylation of eukaryotic membrane proteins can be essential for proper folding, stability and also function250. E. coli usually does not glyco- sylate integral inner membrane proteins. However, this does not disqualify E. coli per se as a host for the functional overexpression of eukaryotic mem-

31 brane proteins that are normally glycosylated as for instance, the human CB2 receptor, can be functionally overexpressed in E. coli251.

Lipid requirements Membranes from prokaryotes, yeast or higher eukaryotes differ in their lipid composition, which may lead to problems of the functional overexpression of heterologous membrane proteins. This is illustrated by the observation that the mammalian presynaptic serotonin transporter (SERT) is only func- tionally overexpressed in the presence of cholesterol, which is not available in prokaryotic membranes243. Thus, differences in membrane bilayer proper- ties can have a significant effect on the insertion, folding and functioning of a membrane protein252, 253. This may not necessarily be a dead end pathway as the addition of certain lipids during/after purification can restore func- tionality of overexpressed membrane proteins.

Target protein optimization Besides the optimization of expression systems or conditions the membrane protein itself can be modified to improve its expression yields. Here, I dis- cuss how this can be achieved (See also figure 9).

N-terminal truncations and addition of signal sequences In both prokaryotes and eukaryotes there is a strong preference for N- terminal tails of membrane proteins to be localized to the cytoplasm225, 254. Translocation of an N-terminal tail depends on the ability of the N-terminus to remain unfolded (which is seemingly easier if the N-terminal tail is shorter) the number of positively charged residues in the tail region and the ‘strength’ of the first transmembrane segment255. The inability to efficiently translocate the N-terminal tail of a membrane protein may hamper the over-

Fig. 9. Target protein obimization. A, modifi- cation of translocated N- termini. B, modification of cytoplasmic termini.

32 expression of membrane proteins with an N-out orientation. Therefore, trun- cation of N-tails can greatly improve its translocation and the overexpression of the respective protein22. A signal peptide fused to the N-terminus of a membrane protein can also facilitate its translocation. Likewise, functional expression of the GPCRs neurotension (NTR) and adenosine (A2a) receptors is improved in E. coli by fusing MBP containing a secretory signal sequence to their N-termini256, 257. Interestingly, MBP alone is targeted co- and post- translationally to the Sec-translocon258. Post-translational targeting of the extremely hydrophobic GPCR is completely detrimental and is likely to result in cytoplasmic aggregation of the protein. This means that even though the overall effect of fusing MBP to the N-terminus of a GPCR is beneficial, it is by no means optimal, as a large proportion of the protein may aggregate.

N-terminal protein fusions N-terminal soluble protein fusion partners are routinely tested for their abil- ity to improve the expression of poorly expressed membrane proteins, e.g., GST, NusA and GFP. For aforementioned reasons, without a signal se- quence the approach is likely to be more successful if the N-terminal tail is cytoplasmic. Recently, the N-terminal addition of Mistic, a small protein that is unique to Bacillus subtilis and closely related species, was reported to improve expression of a number of eukaryotic membrane proteins and bacte- rial sensor histidine kinases in E. coli 259, 260. By autonomously associating with the bacterial membrane, Mistic was speculated to chaperone a down- stream membrane protein into the lipid bilayer without assistance from the Sec-translocon. At this stage, further experiments are needed to verify if this targeting hypothesis is correct.

C-terminal protein fusions As processing through the translocon is by and large unidirectional, (i.e., from the N- to C- terminus)261, C-terminal tails of multispanning membrane proteins can be threaded through on either side of the membrane as long as they are in an unfolded state. Thus, translocation of large extra-cytoplasmic C-terminal tails is unlikely to be as problematic as for N-terminal tails, and the addition of any protein fusion seems reasonable. As mentioned above, the advantage of adding fusion proteins is one of stability, as proteases, like the E. coli FtsH-complex, can unravel and degrade membrane proteins from free N- or C-terminal ends89. The most significant improvements of overex- pression were obtained with thioredoxin and GFP, which is presumably at- tributed to the remarkable stability of those proteins22. Removal of fusion partners can be facilitated by the use of tobacco etch virus (TEV) or Preci- sion protease cleavage sites located at the boundaries of the target protein230.

33 Mutagenizing the target protein Slight differences in the amino acid sequence of membrane proteins can have a big effect on overexpression yields262.This is one of the reasons, why most often the expression of many homologues of a target is tested to select the best expressing and functional candidate. It also offers the opportunity to randomly mutate a given membrane protein and screen for improved expres- sion. In a recent screening of nine proteins, one cycle of directed evolution resulted in significant improvements of membrane protein overexpression yield for five targets227 showing the potential of this strategy. However, ran- dom mutagenesis bears the danger of selecting against the function of a pro- tein and well expressing mutant proteins may not always be well crystalliz- ing ones. In this screen, the well expressing mutants showed no preference for certain amino acid substitutions or for locations of the substitutions rela- tive to the predicted topology. As expression is a multi-component mecha- nism, beneficial mutations can affect transcription, translation, targeting, insertion, folding and stability – i.e., the complete range of biogenetic proc- esses. Therefore, it is unlikely to find preferences for beneficial mutations solely related to the secondary structure based on such a small dataset.

Host strain optimization With increasing knowledge of membrane protein biogenesis and the avail- ability of powerful genetic techniques, the possibility of engineering or se- lecting host strains optimized for the overexpression of membrane proteins comes within reach. Analogous to the co-expression of chaperones to improve the folding of soluble proteins, the co-expression of SRP pathway targeting factors and Sec-translocon components has been tried to improve membrane protein targeting to and insertion into the cytoplasmic membrane. However, so far the attempts were not productive209. Presumably, the choreography of target- ing, insertion and assembly is too complex and co-expression of components has to be extremely well fine-tuned to be successful. This is emphasized by the observation that SecY overexpression without co-expression of SecE results in degradation of the Sec-translocon101. More than a decade ago, two derivatives of BL21(DE3), named C41(DE3) and C43(DE3), were isolated in a genetic screen that selected for the ability to cope with toxic effects of the overexpression of two particular membrane proteins263. These two strains are recognized for their superior ability to produce a wide array of membrane proteins to high yields. The beneficial effects may partly result from reduced transcription of recombi- nant mRNA in C41(DE3) and C43(DE3) (also often called ‘Walker strains’) compared to BL21(DE3) and from a delayed onset of transcription in C43(DE3)263. Furthermore, the Walker strains exhibit improved plasmid

34 stability during the expression of toxic recombinant proteins264, and they can induce the proliferation of intracellular membranes that accommodate large quantities of overexpressed membrane protein238. C41(DE3) and C43(DE3) are now widely and very successfully used to overexpress membrane pro- teins226, 230. Based on the observation that overexpression of some membrane proteins leads to the proliferation of internal membranes, it has been suggested to exploit this phenomenon by strain engineering to increase the available space for membrane protein production within the cell265. N-linked glycosylation is the most abundant post-translational modifica- tion in eukaryotic cells. Unfortunately, E. coli is naturally not capable of N- linked glycosylation, which represents a bottleneck for heterologous overex- pression. An N-linked glycosylation system was found in Campylobacter jejuni, which could be functionally transferred to E. coli266. Although bacte- rial N-glycans differ structurally from their eukaryotic counterparts, this opens up the possibility of engineering strains that are better suited for het- erologous overexpression.

Optimization of expression conditions It was already mentioned above that a reduction of expression temperature has many beneficial effects. This is not only the case for the folding of solu- ble proteins but also for the biogenesis of membrane proteins. For this rea- son, membrane proteins are usually expressed at temperatures between 15°C and 30°C and almost never at 37°C. E. coli supports expression of mem- brane proteins at low temperatures with a little ‘trick’: the composition of the membrane is adjusted in such way that the fluidity remains constant over different temperatures. This is achieved by modulating the ratio of saturated vs. unsaturated fatty acids and the fatty acid chain length267. Overexpression can also be optimized by the composition of the culture medium. Often, very rich media like terrific broth support the growth of bacteria to higher cell densities than Luria broth does. However, it has to be noted, that optimal growth conditions are not always optimal production conditions. For instance, Weiss and Grisshammer mentioned256 that “the addition of glucose (0.2% (w/v) or more) to the medium increased the final cell density from an A600 of about 1.5 to an A600 of about 6. The number of receptors per cell, monitored by binding of a receptor specific antagonist to whole E. coli cells, increased about fourfold by including 0.2% glucose, but decreased again at higher glucose concentrations (0.4%)”. The same paper mentions that supplementation of the culture medium with a ligand of the overexpressed receptor improved expression levels by 10-30%256. Excellent expression results have also been achieved by the use of auto- inducing media268. Auto-induction leads to the gradual induction of overex- pression due to the slow release of glucose-mediated catabolite repression.

35 The gradual start of overexpression may provide the cell with time to slowly adjust to overexpression needs, e.g., by the up-regulation of folding media- tors, thereby preventing overloading of the biosynthetic machinery and in- duction of stress responses. However, it is not completely clear if the benefi- cial effects result from auto-induction per se or from the fact that the particu- lar medium is enriched e.g., in phosphate, trace elements, carbon sources and salts, supporting higher cell densities (unpublished observations).

In conclusion, we recognize that membrane proteins are challenging targets for overexpression. Their physical and chemical properties make them diffi- cult to handle and membrane protein biogenesis is complex. However, pro- ducing membrane proteins in E. coli has proven successful in many cases, even though yields are often suboptimal. Recently, the perception of mem- brane protein overexpression has turned from a mandatory exercise to a sci- entific problem that can be systematically and rationally addressed. While the understanding of membrane protein biogenesis may have come far enough to support this approach, the physiological consequences of mem- brane protein overexpression have not been investigated. Therefore, my studies aimed at understanding membrane protein overexpression from the perspective of membrane protein biogenesis and at systematically investigat- ing the physiological response to their overexpression in E. coli. This may provide a platform for designing strategies to improve membrane protein overexpression yields.

36 2 Methodology

In the second part of my thesis, I want to explain and discuss techniques central to the work presented in my papers. I will refrain from reviewing common techniques such as SDS PAGE and Western-blotting, and I will not provide detailed descriptions and recipes of e.g., buffers as those can be found in the papers and references therein. Rather, I want to discuss general aspects, and the strengths and weaknesses of the techniques I have used fre- quently. Workflows of the proteomics experiments can be found in supplementary figure 1 of paper I and in figure 1 of paper III. Expression conditions and ways to monitor membrane protein overexpression were already considered in the preceding sections.

Flow cytometry of bacterial cells Flow cytometry is a technique to analyze single particles. A fluidic system leads to the hydrodynamic focusing of particles that subsequently pass, one by one, through the beam of a laser269. Light scattering and fluorescence emission (if the particle is labeled with a fluorophore) provide information about the particle’s properties. In this way, more than 1000 cells can be ana- lyzed per second, and typically, 10,000 events are accumulated. Traceable properties that do not depend on a fluorophore include particle size, which is correlated to forward scatter, and granularity, correlated to side scatter. The assessment of other properties is only limited by the availability and suitabil- ity of fluorophores and ranges from the amount of nucleic acids or lipids over intracellular pH and ion concentration to the expression of proteins. The latter is often detected by the use of antibodies conjugated to fluoropho- res. Despite of its widespread use, flow cytometry has not been much recog- nized for the analysis of bacterial cells. The major reason for this under- representation is presumably the small size of bacteria. A small cell volume is correlated with few targets for fluorescent molecules, no matter if a chemical or a physical property of the cell is measured. The consequence is that light scattering and fluorescence emission from bacterial cells are often at the detection limit of common machines and close to contaminating sig- nals from e.g., small air bubbles. A further obstacle is the rod shape of many

37 bacteria like E. coli. While round cells pass the beam always in the same way, the orientation of rods can vary, leading to the typical bi-modal shape observed in the forward and side scatter plot (Figure 10). In this way, emis- sion intensities of bacterial cells can easily cover an order of a magnitude, which compromises the resolution. I used flow cytometry to assess the expression of membrane-protein-GFP fusions, the morphology of the bacterial cell and the cell’s viability. Infor- mation about morphology was derived from forward and side scatter as well as from the use of the membrane specific fluorophore FM4-64. FM4-64 is a red fluorescent lipophilic styryl dye that predominantly stains the cytoplas- mic membrane of E. coli270. I assumed that the amount of membranes corre- lates with cell size and that no internal membranes were present (which was confirmed by electron microscopy). Furthermore, I used the specificity of FM4-64 to distinguish bacteria from air bubbles and other contaminants (Figure 10).

Fig. 10. Filtering flow cytometry data by the help of FM4-64. Left, pseudo-color dot blot of raw data from E. coli. Forward scatter vs. side scatter shows the typical bi-modal shape of the population. Contamination is visible in the low intensity side scatter. Middle, histogram showing the FM4-64 fluorescence. FM4-64 positive signals are gated. Right, same as in left panel but showing only the FM4-64 positive signals gated in the middle panel.

Propidium iodide was used to assess the viability of cells depleted of SecE. Propidium iodide is a hydrophilic dye that fluoresces red upon interca- lation into nucleic acids. Cytoplasmic membranes of unstressed, healthy cells are impermeable for propidium iodide while it permeates readily through membranes of dead cells271. It is to consider that the terms ‘viability’ and ‘viable’ can be misleading. Traditionally, viability described the ability to form a colony on an agar plate, therefore rather meaning ‘culturability’. However, the existence of viable cells that are not able to form colonies is obvious. It is better to distinguish further between ‘metabolically active’, ‘intact’ and ‘permeabilized’ cells271. Only the latter are stained by propidium iodide. Flow cytometry data were analyzed with the software ‘FlowJo’ (Tree Star).

38 Aggregate purification Our procedure for the purification of aggregates was based on a protocol developed by Tomoyasu et al.159. Cells were lysed by lysozyme treatment, osmotic shock and sonication. After removal of unbroken cells and debris by a low speed spin, pellets containing membranes and aggregates were repeat- edly washed and sonicated in buffer containing 2% of the mild detergent NP-40 (Fig. 11). In this way, membranes were solubilized but aggregates remained largely intact. Only minute amounts of proteins were detected in unstressed cells. Those ‘contaminants’ were outer membrane proteins, some DNA associated proteins and proteins involved in glycogen metabolism. The presence of these proteins can largely be explained by a certain ‘stickiness’ of the molecules, they are associated with. Contrary, a multitude of proteins was detected in aggregates purified from cells with secretion stress (secB deletion, SecE depletion, membrane protein overexpression). Stressed cells that did not tend to form larger amounts of aggregates consistently showed a distinct pattern of aggregation of ribosomal subunits, which was not ob- served in unstressed cells.

Fig. 11. Workflow of aggregate purification. After cell lysis by sonica- tion, unbroken cells were removed and membrane contaminated aggregates were collected. Mem- branes were removed by repeatedly washing with 2% NP-40.

Strikingly, we were largely unable to identify integral membrane proteins in aggregates derived from cells with secretion stress (except the overex- pressed one), irrespective of the method we used for the separation and iden- tification of aggregated proteins.

Preparation of inner and outer membranes The protocol for cell fractionation was a fusion of two methods described previously272, 273. I employed two subsequent sets of sucrose density step gradient centrifugation to separate inner and outer membranes (Fig. 12). Outer membranes have a higher buoyant density than inner membranes and therefore equilibrate with a higher percentage sucrose solution during cen-

39 trifugation274. While it was easy to separate inner and outer membranes in control cells, overexpression of inner membrane proteins resulted in a higher buoyant density of the inner and a lower buoyant density of the outer mem- branes, making it more difficult to obtain pure fractions. Furthermore, ag- gregates tended to co-sediment with and hence contaminate the outer membrane fraction273, 275 (Fig. 12B, Fig. 13).

Fig. 12. Preparation of inner and outer membrane fractions. A, workflow. After French pressing and removal of unbroken cells, crude membranes were spun onto a bed of 55% (w/w) sucrose solution. Membranes were diluted and subsequently, inner and outer membranes were separated on a 6 step sucrose gradient. B, SDS PAGE gel of inner and outer membrane fractions of BL21(DE3)pLysS overexpress- ing YidC-GFP and of control cells. Overexpression of YidC-GFP led to a more difficult separation of inner and outer membranes. Outer membranes were slightly contaminated by aggregates as indicated by the presence of IbpA/B.

2-dimensional polyacrylamide gel electrophoresis 2-dimensional (2D) PAGE is a technique to separate proteins based on two different physical properties. In the first dimension, proteins are often sepa- rated based on their charge or complex size while in the second dimension, proteins are separated by their molecular mass. Protein spots are visualized by staining gels with e.g., Coomassie or silver stain. 2D gel images are ana- lyzed to compare the protein spot patterns and identify differences between samples or groups of samples. 2D PAGE is widely used in proteomics research due to its low tech pro- cedure and reasonable costs compared to mass spectrometry based ap- proaches. While the latter offer a more exhaustive proteome coverage, gel based systems are visually very accessible, which is a feature not to underes- timate. The disadvantages of 2D PAGE approaches are a low throughput, a

40 fairly high technical variation and a low proteome coverage, which is mainly caused by a limited sensitivity and a small dynamic range. I used two different techniques to separate proteins in the first dimension. The first described here is more suitable for soluble proteins and β-barrel membrane proteins while the second is used for the analysis of complexes of the inner membrane proteome.

2-dimensional IEF/SDS polyacrylamide gel electrophoresis Soluble and outer membrane proteomes, respectively, were analyzed by isoelectric focusing (IEF), which separates proteins in the first dimension based on their net charge. Basic and acidic amino acid side chains, C- and N-termini and potential post-translational modifications contribute to the net charge of a protein. Depending on the pH, proteins are more negatively (at high pH) or more positively charged (at low pH). The pH, at which a pro- tein’s net charge equals zero, is called isoelectric point (pI). Consequently, at this pH, a protein will not migrate in an electric field. Most E. coli proteins have a pI between 4 and 7. Proteins can be separated based on their pI by making use of immobilized pH gradients (IPG) in polyacrylamide gels276. In an electric field, proteins migrate towards the pH that corresponds to their pI. Due to the zero net charge at the pI, protein migration ceases and proteins are focused by increasing electric field strength. IEF has a high resolution power, especially if narrow range pH gradients are used. During sample preparation for IEF, proteins are solubilized and denatured in high molar urea buffer, after nucleic acids have been removed. Detergents are critical for solubilization, however as IEF depends on the net charge of proteins, preferentially non-ionic and zwitterionic detergents are used. Un- fortunately, these solubilization conditions are too mild for a number of pro- teins, in particular for α-helical integral membrane proteins. Furthermore, solubility of proteins is smallest at their respective pI, often leading to the precipitation of proteins (and again in particular of membrane proteins) within the first dimension gel. Following IEF, protein containing IPG gel strips are equilibrated in SDS buffer to prepare them for the second dimension. I employed both, conven- tional SDS PAGE with glycine as trailing ion277, as well as tricine gels that facilitate the resolution of small proteins278. Both techniques separate pro- teins based on their size. Usually, up to twelve gels were run in parallel to minimize technical variations. 2D IEF/SDS PAGE is particularly suited for the analysis of secretion. As a signal sequence of secretory proteins is around 2 kDa and contains basic residues at the N-terminus, precursor and processed forms of secretory pro- teins migrate differently in the 1st and in the 2nd dimension and therefore, they can be readily distinguished based on their location on the gel (Fig. 13). Not surprisingly, 2D IEF/SDS PAGE has been used successfully for the

41 identification of substrates of components of different secretion systems279, 280 and it also proved very valuable for the analyses presented in this thesis.

Fig. 13. Precursor accumulation of secretory proteins analyzed by 2D IEF/SDS PAGE. Top, precursor and processed forms of OmpA in purified aggregates of YidC-GFP overexpressing cells due to a slight contamination of aggregates by outer membranes. Bottom, precursor and processed forms of OmpF in purified outer membranes of YidC-GFP overexpressing cells due to co-sedimentation of aggregates with outer membranes in sucrose density centrifugation.

2-dimensional blue native/SDS polyacrylamide gel electrophoresis As membrane proteins are poorly resolved on 2D IEF/SDS PAGE, other methods must be used to analyze membrane proteomes. 2D blue native polyacrylamide gel electrophoresis (2D BN/SDS PAGE) is the method of choice for the global analysis of membrane proteomes and the subunits of membrane protein complexes281. For the first dimension of 2D BN/SDS PAGE, membrane proteins com- plexes are solubilized using mild non-ionic detergents, labeled with the ani- onic dye Coomassie Brilliant Blue G, and are subsequently separated ac- cording to size in a gradient gel. In the second dimension, the subunits of the separated complexes are resolved by SDS PAGE as mentioned above. Solubilization of membrane proteins and membrane protein complexes during sample preparation is a critical step as the structure and activity of proteins must be preserved. In contrast to IEF, only very mild, non- denaturing detergents can be used. Most common are n-dodecyl-β-D- maltopyranoside (DDM), Triton X-100 and digitonin. Solubilization and preservation are further influenced by the ionic strength and the pH of the buffer and by the concentration of ε-amino-n-caproic acid (ACA) and Coomassie blue. Inner membranes of E. coli are best solubilized by DDM282. Coomassie blue is also responsible to provide negative charge to the solubi- lized proteins and secure unidirectional mobility during the electrophoresis run. In the currently available protocols the first dimension BN gel lanes get distorted during their tedious transfer to the second dimension separation gels. This leads to low reproducibility and high variation of 2D BN/SDS gels, rendering them unsuitable for comparative analysis. I have developed a 2D BN/SDS PAGE protocol where the first dimension BN gel is cast on a GelBond PAG (i.e., polyester) film153. Immobilization of the BN gel not

42 only speeds up the transfer of BN gel lanes onto second dimension separa- tion gels, it also prevents swelling/shrinking of the lanes during the equili- bration step preceding the second dimension and further prevents distortion when the lanes are placed on the second dimension gels. Immobilization of the BN gel lowers variation and improves reproducibility of 2D BN/SDS gels significantly, enabling reliable comparative analysis (Fig. 14).

Fig. 14. 2D BN/SDS PAGE methodology suitable for relative quantification153. A. To be able to use 2D BN/SDS PAGE for relative quantification, a 1.0 mm thick first dimension gel is cast on a polyester support (GelBond PAG Film, Cambrex) (1). After the first dimension blue native run, lanes are cut (2), equilibrated in dena- turing, reducing and alkylating buffer and submerged into a low-melting agarose solution on top of a 1.5 mm thick 10% duracrylamide gel (3). 6-12 gels are run in parallel in the second dimension in an Ettan Dalt twelve system (GE Healthcare) (4). Relative quantification of spot intensities is performed using ‘PDQuest’ (5).

Image analysis of 2-dimensional polyacrylamide gels 2D PAGE gels were stained with Coomassie or silver stain to visualize pro- tein spots. Silver stain has the advantage of higher sensitivity, typically re- sulting in ~1000 spots, while Coomassie (~800 spots) offers a higher dy- namic range and less often leads to spot saturation. The dynamic range is further influenced by the bit depth of the scanner used. With a larger bit depth, typically 16 bit, more gray shades can be distinguished. After scanning, the gel image was cropped, filtered (noise reduction) and the background was subtracted. In ‘PDQuest’ (Bio-Rad), the software I used for image analysis, background subtraction is performed by the so called ‘floating ball method’. If one imagines the gel image as an upside-down 3D

43 landscape as shown in figure 15, the background signal is subtracted where a floating ball touches the landscape. The radius of the floating ball is calculated based on the largest spot cluster in the gel image.

Fig. 15. ‘Floating ball method’ for back- ground subtraction of 2D polyacrylamide gels. The background signal is subtracted where a floating ball touches the 3D land- scape of the gel image. The radius of the floating ball is calculated based on the larg- est spot cluster in the gel image.

Fuzzy, streaked, or overlapping spots in a raw image can be difficult to accurately quantify. Because the image profile of an ideal spot conforms to a Gaussian curve, ‘PDQuest’ uses Gaussian modeling to create ‘ideal’ spots that can be easily identified and quantitated. A Gaussian spot is a precise three-dimensional representation of an original scanned spot. Gaussian curves are fitted to the scanned spot in the X and Y dimensions, and then additional modeling is performed to create the final Gaussian spot. For com- parison, a 3D visualization of raw, filtered and Gaussian images is shown in figure 16.

Fig. 16. 3D visualization of raw, filtered and Gaussian images of 2D IEF/ SDS polyacrylamide gels.

Initial spot detection was done automatically; the sensitivity of spot de- tection is based on a selected faint spot. Spot detection was manually curated and spots were matched among the gels. Spot detection in 2D BN/SDS gels was performed exclusively manually by using the spot boundary tools of ‘PDQuest’. This was necessary because spots in 2D BN/SDS gels often ex- hibit an irregular shape. Spots created using the spot boundary tools are not Gaussian modeled. To compensate for technical variations in spot intensity, spot volumes were normalized based on the total density in the gel image or on the total

44 quantity of valid spots in a particular gel. After normalization, spot volumes were quantified and exported for statistical analysis.

Statistical analysis of 2-dimensional polyacrylamide gels Statistical analysis of 2D gels is necessary to compare protein spot patterns and identify significant differences between samples or groups of samples. Differences between samples can be caused by technical or biological vari- ability, and only the latter is of interest. As mentioned above, technical variation was compensated by normalization. Furthermore, technical varia- tion was minimized by performing all procedures in parallel: cultures, sam- ple preparation, IEF or BN, 2nd dimension SDS PAGE and gel staining. To compensate for variation within a replicate group, 3-4 independent biologi- cal samples, i.e., cultures from different colonies were analyzed. Raw data from 2D gels are usually not normally distributed. However, the normal distribution of data is an assumption, on which many statistical methods are based. It is therefore necessary to transform the data by taking the logarithm. Usually, this also has the positive effect that variances are stabilized283. Missing values are a common problem in 2D PAGE. A value can be missing for biological reasons or by chance (random error). Missing data in some gels of a replicate group are interpreted as random error. Spots with missing values in a complete replicate group might be either for biological reasons or just at random. Whether missing data in a complete replicate group can be interpreted as a biological effect depends on the random error risk. If it is assumed that missing values occur completely at random with the same probability, the random error risk can be estimated as the propor- tion of missing values among replicates with at least 1 non-missing value. This risk was 0.076 for the whole cell analysis set presented in paper IV. The risk, for a specific spot, that at least one replicate group (of 6) shows no signal by chance was then 0.0014. If the assumption is valid it seems quite safe to interpret missing data in a complete replicate group as a biological effect, and more specifically, as a qualitative change. As there is no use of assessing the significance of a qualitative change, I omitted spots with miss- ing values in a complete replicate group from the ANOVA analyses in paper IV. Traditional analysis of proteome data has applied primarily a ‘spot by spot’ approach (or univariate analysis), e.g., of treated versus control sam- ples using Student’s t-test. This was also applied in studies I and III were only two different replicate groups were compared with each other. The use of Student’s t-test for proteomics analysis is slightly problematic for two reasons that will be explained in the following paragraphs. In proteomics experiments, often very many tests are performed, e.g., up to 1000 in case of a 2D gel loaded with E. coli whole cell lysates. If the

45 probability α of a false positive decision, i.e., the decision, that the expres- sion of two proteins is significantly different in control and treated cells, even though it is not, is set to 0.05, 50 out of 1000 decisions can be expected to be wrong, which is unacceptable. Therefore, the α-level has to be con- trolled more stringently. In paper IV, we have used the ‘false discovery rate’ (FDR) to control the error rates among all tests in the multiple testing proce- dures. The FDR is defined as “the ratio of the number of false positive deci- 283 sions to the number of all positive decisions” . Therefore, an FDR0.05 means that only 5% of all positive decisions are expected to be false. If changes in the expression of several proteins combined create a consis- tent pattern of variation that is related to a specific biological mechanism or if more than two replicate groups are compared with each other (as in paper IV), a univariate approach may not be sufficient. Then, the application of a global approach (or multivariate analysis) is more applicable, where the test is based on the results of an analysis of variance (ANOVA) considering all the observations together. The proteomics experiments in paper IV (analysis of whole cell lysates by 2D IEF/SDS PAGE and cytoplasmic membranes by 2D BN/SDS PAGE) were analyzed by two-way ANOVA with ‘treatment’ (± membrane protein overexpression) and ‘strain’ as factors, resulting in a p- value for each spot. The p-values were ordered and the FDR0.05-level was estimated. A test with a p-value below the FDR0.05-level was considered as significant. A significant test indicates the presence of differences between some combination of treatment and strain for that specific spot. A grouping of the spots on the kind of effect was done. The effects were ‘treatment’, ‘strain’, ‘treatment and strain’ (no interaction) and ‘interaction’. ‘Interaction’ means that e.g., differences upon ‘treatment’ depend on the strain and vice versa.

Identification of proteins by MALDI-TOF and peptide mass fingerprinting It is only possible to make sense out of data from 2D PAGE if the identities of protein spots are known. Often, this is done by mass spectrometry (MS) and peptide mass fingerprinting (PMF). PMF is based on a comparison of the masses of tryptic peptides of an unknown protein with masses of all pos- sible tryptic peptides of a proteome284. The probability of protein identifica- tion depends on the number and accuracy of matching peptides and the size of the searched database. Trypsin is the preferred protease for the digestion of analytes for several reasons: Trypsin specifically hydrolyzes peptide bonds at the carboxyl side of lysine (Lys, K) and arginine (Arg, R) residues, creating positively charged protonated peptides. This is important, as only charged peptides can be ana- lyzed by MS. Arg and Lys occur on average every 9th residue, leading to

46 tryptic peptides of approximately 1000 Da, which is ideally suited for analy- sis by MS. The tryptic digestion of membrane proteins is often suboptimal, which is caused by the few basic residues within TMDs of membrane pro- teins. A double digest of trypsin and cyanogen bromide, which cleaves polypeptides only on the carboxyl side of methionine residues, can solve this problem285.

Fig. 17. Peptide mass fingerprinting. A, mass spectra are calibrated and annotated. The insert shows different isotopes of a peptide. B, peak list extracted from mass spectra. Trypsin peaks removed by the software Peak Eraser are in blue. C, a data- base search is performed with the extracted peak list. D, resulting protein identifica- tion showing protein ID, probability, mass and pI. E, matching peptides and se- quence coverage. F, error distribution of the matching peptides is within 40 ppm.

Tryptic peptides are usually analyzed by matrix assisted laser desorption/ ionization-time of flight (MALDI-TOF) MS286. MS instruments consist of an ion source and a mass analyzer. MALDI is the first, while TOF describes the latter. In MALDI, ionization is triggered by a laser. Matrix molecules protect the co-crystallized peptides from decomposition by the laser and facilitate their vaporization and ionization. I used a matrix of 2,5-dihydroxybenzoic acid. The mass of peptides is analyzed by measuring the time it takes the accelerated ion to reach the detector. The flight time depends on the ion’s mass/charge ratio. As MALDI predominantly leads to the formation of sin- gly charged ions, the mass/charge ratio (m/z) equals the mass in Dalton. Usually, peptides in the range of m/z 800-5000 are analyzed, even though well resolved ion peaks above m/z 2500 are rare. Each peptide gives rise to a number of peaks in the mass spectrum that differ by 1 Da (Fig. 17). The peaks represent different isotopes, dependent on the number of 13C in the

47 molecule. Only the monoisotopic mass, i.e., the mass spectral peak repre- senting a peptide not containing 13C, is relevant for further database searches. I analyzed my mass spectra with the software ‘Moverz’. The mass spectra were internally calibrated based on known m/z of peaks from auto- proteolytic trypsin fragments. Monoisotopic masses of peaks rising above a certain signal to noise ratio threshold were exported and subsequently, ma- trix, auto-proteolytic trypsin fragments, and/or contaminant peaks (e.g., keratin) were removed. The resulting peptide mass lists were used to search the SwissProt database for E. coli with the Mascot search engine. Several parameters are important to achieve a significant (p<0.05) hit of peptides matching to an in silico trypsinized protein: The error distribution of the matching peptides should be within ±50 ppm, peptide modifications have to be taken into account (such as oxidation, cystein modifications) and possible missed cleavages should be considered (i.e., trypsin does not cleave next to a proline). Furthermore, at least 4 matching peptides should be found, 15% sequence coverage obtained and the number of not matching peptides should not exceed 4 times the number of matching ones. To control the probability of false positive identifications, it is important to run searches against scrambled databases. The false positive rate we determined was below 1%. Even though PMF is a powerful tool for the identification of sizable solu- ble proteins, it has only limited use for small proteins (>10 kDa) and α- helical membrane proteins. For both classes, trypsinisation yields only very few peptides, often less than 4 or not enough to cover the protein sequence sufficiently. In these cases, sequencing of peptides by tandem MS (MS/MS) is required. The precise sequence of a single 10 aa peptide can be sufficient to identify a protein with high probability.

Red swap Red swap is a method for the integration of foreign DNA into bacterial chromosomes287. Foreign DNA is electroporated into cells in the form of linear PCR products that contain an antibiotic resistance marker flanked by homology regions mediating integration specificity. Normally, E. coli is not readily transformable with linear DNA fragments because of the presence of intracellular exonucleases that degrade linear DNA, in particular exonucle- ase V of the RecBCD recombination complex288. However, in this procedure recombination is mediated by the recombination complex of the λ Red phage. The λ Red system comprises three genes (γ, β, exo) encoding the proteins Gam, Bet and Exo289. Gam inhibits the host RecBCD exonuclease V so that Bet and Exo can gain access to DNA ends to promote recombination. The λ Red recombinase is provided on a low copy number plasmid and its transcription is governed by an arabinose inducible promoter.

48 The basic strategy is to replace a chromosomal sequence with a selectable antibiotic resistance cassette that is generated by PCR using primers with 50 bp homology extensions (H1 and H2 in Fig. 18A). Replacement is ac- complished by Red-mediated recombination in these flanking homology regions. After selection, the resistance cassette can be eliminated by trans- forming cells with a helper plasmid expressing the FLP recombinase, which acts on the directly repeated FRT (FLP recognition target) sites flanking the resistance gene290. The Red and FLP helper plasmids can be cured by grow- ing cells at 37°C as they comprise temperature-sensitive replicons.

Gene deletion Gene deletion by Red swap is simply accomplished by replacing a target gene by the above mentioned antibiotic resistance cassette. The homology region H1 includes the respective start codon and 47 bp upstream, and the homology region H2 includes codons for the six C-terminal residues, the stop codon and 29 bp downstream. FLP-mediated excision of the FRT- flanked resistance cassette creates a scar sequence in-frame with the target gene initiation codon and its C-terminal 18 bp coding region. Translation from the authentic target gene Shine-Dalgarno site and the start codon pro- duces a 34-residue scar peptide with an N-terminal methionine, 27 scar- specific residues and six C-terminal, gene specific residues291 (Fig. 18A). In this way, gene deletions are expected to be non-polar, i.e., should not affect the expression of genes downstream of the target gene.

Promoter swap To be able to introduce additional DNA sequences into the chromosome by Red swap, I modified the above described method somewhat: First, a multi- ple cloning site was introduced between the FRT site and priming site P2 of pKD13 by site directed mutagenesis (Fig. 18B). The resulting plasmid was designated pKD13i. Promoter sequences were cloned into pKD13i resulting in pKD13i-promoter. A linear DNA fragment comprising the antibiotic re- sistance cassette flanked by the FRT sites and a target promoter sequence was amplified and used for Red swap as described above. Introduction of the scar sequence upstream of the promoter had no obvious impact on the regu- lation of promoter activity.

49 A.

Fig. 18. Principles of gene deletion and promoter swap by Red swap. A, primers for gene deletion anneal upstream (P1) and downstream (P2) of the FRT sites flank- ing the kanamycin resistance cassette. Furthermore, the primers comprise 5’ ends of 50 bp homologous to upstream and downstream sequences for targeting the gene deletion. H1 includes the target gene initiation codon. H2 covers 21 bp at the 3’ end of the gene and 29 bp downstream. Primers U, D, k1 and k2 are used to sequence upstream and downstram junctions created by insertion of the resistance cassette. After looping out the resistance cassette by Flp mediated recombination at the FRT sites, a scar is created encoding a 34 aa in-frame peptide. B, a multiple cloning site (MCS) was engineered into pKD13 to facilitate the introduction of additional DNA sequences (e.g., the lac promoter mutants) into the chromosome. The white arrows indicate that the same experimental steps were undertaken as in A. Figure adapted from Ref. 291, © Nature publishing group.

50 3 Results & Discussion

The main objective of my Ph.D. studies has been to improve membrane pro- tein overexpression in E. coli. As I noted above, a sound understanding of membrane protein biogenesis and of the physiology of overexpression are prerequisites for the design of rational strategies to achieve this objective. Therefore, in paper I and II, we studied the biogenesis of membrane proteins using both global and focused approaches. Paper III describes the analysis of the physiological consequences of membrane protein overexpression in E. coli while paper IV extends this analysis and provides a solution to the major bottleneck of membrane protein overexpression.

Paper I: Effects of SecE depletion on the inner and outer membrane proteomes of Escherichia coli Our knowledge of membrane protein biogenesis has significantly increased in the last decade. However, the role of the Sec-translocon in protein trans- location and insertion in E. coli has only been studied using focused ap- proaches and a very limited set of model substrates. Therefore, we have studied protein translocation and insertion in E. coli cells depleted of the essential translocon component SecE using comparative proteomics. Deple- tion of SecE resulted in 90% reduction in SecE compared to control cells, while SecY and SecG were reduced by 50% and 20%, respectively. 1D BN PAGE based Western-blotting showed that the level of the SecYEG- complex was strongly reduced upon SecE depletion. While an increased expression of SecA indicated secretion stress, SecE depletion did not affect the accumulation levels of SecB or Ffh. As described in paper III, overex- pression of membrane proteins does not lead to increased levels of Ffh ei- ther. These observations reflect that Ffh levels are not critical for targeting (see ‘bottleneck’ section) and hence up-regulation of Ffh cannot compensate for a limiting Sec-translocon capacity. Our analysis of the subproteomes of cells with strongly reduced Sec- translocon levels resulted in three main observations and conclusions that are explained and discussed below.

51 Reduced Sec-translocon levels result in precursor accumulation and aggregation of secretory proteins in the cytoplasm Translocation of secretory proteins is strongly hampered upon SecE deple- tion. This is indicated by a hampered processing of secretory proteins, by reduced accumulation levels of processed secretory proteins in the periplasm and outer membrane and by the accumulation and aggregation of precursors of secretory proteins in the cytoplasm. Precursors of secretory proteins are particularly aggregation prone due to their hydrophobic signal sequences. Precursor accumulation and aggregation of secretory proteins induced a heat shock response, leading to increased levels of the cytoplasmic chaperones IbpA/B, DnaK, GroEL and ClpB. Similar to the aggregates resulting from membrane protein overexpres- sion, only a few inner membrane proteins were identified in the aggregates. This may be explained by the efficient degradation by SsrA mRNA depend- ent tagging of stalled nascent chains of co-translationally targeted membrane proteins and subsequent turnover by proteases292. As mentioned above, inefficient secretion led to up-regulation of SecA and an increased recruitment of SecA to the inner membrane. Interestingly, our 2D BN/SDS PAGE analysis revealed that SecA dimers accumulated at the inner membrane of SecE depleted cells. Recently, it has been shown that the Sec-translocon catalyzes the monomerization of the SecA dimer293. Thus, a lack of translocons could explain the observed effect.

SecE depletion reduces the insertion and steady-state levels of most outer membrane proteins A direct correlation between the effects on the outer membrane proteome and Sec-translocon dependence is difficult to demonstrate since key players involved in outer membrane protein biogenesis – like YaeT and Skp294 – are affected upon SecE depletion. Nonetheless, the analysis of the outer mem- brane proteome indicates that translocation of proteins across the inner membrane is hampered but not blocked upon SecE depletion. One would expect that translocation of secretory proteins is more robust to translocon deficiencies than the processing of co-translationally inserted membrane proteins. Post-translational targeting provides a constant pool of translocation competent substrates. Coupling of co-translational targeting and insertion at the translocon is tighter and therefore likely to be more prone to targeting defects. While membrane proteins presumably have only one chance for successful targeting, secretory proteins may have several. Furthermore, aggregation of mistargeted membrane proteins is likely to be terminal while aggregated secretory proteins may be recovered. The components of the outer membrane proteome were differentially af- fected by the depletion of SecE. OmpA, FadL and Pal were unaffected or increased in the outer membrane upon SecE depletion, while levels of most

52 other outer membrane proteins were reduced to different extents. One expla- nation for this difference is superior access of these proteins to the Sec- translocons still present upon SecE depletion. However, we were not able to identify any signal sequence characteristics (e.g., hydrophobicity, charge distribution, length) that correlated with the differential effects of SecE de- pletion on the constituents of the outer membrane proteome. The use of backup pathways (e.g., Tat, SecA only) for translocation of Sec-dependent secretory proteins is possible but obviously not sufficiently efficient to relieve translocation stress as precursor accumulation and aggre- gation still occurs.

SecE depletion has differential effects on the inner membrane proteome Depletion of SecE resulted in reduced steady state levels and integration of approximately 30 integral membrane proteins, while almost as many were not significantly affected or even increased. In most cases, our observations are very consistent with previously published results. Our analysis of the inner membrane proteome of SecE depleted and con- trol cells allowed us to search for common features among proteins that were reduced, unaffected or increased in the membrane of SecE depleted cells. Our observations could be explained by differences in access to the remain- ing Sec-translocons and insertion efficiency. However, we found no correla- tions between the effect of SecE depletion and topology, hydrophobicity, and the energy required for membrane integration of the first transmembrane domain (ΔGapp) as would be expected if affinity to the translocon or insertion efficiency were decisive. Interestingly, the difficulty to correlate the effect of SecE depletion to these physical properties is reminiscent of the difficulty to correlate hydrophobicity and topology to overexpression yields of mem- brane proteins225. The inner membrane proteins that were either unaffected or increased upon depletion of SecE lacked large periplasmic domains and/or consisted of only one or two transmembrane segments. Notably, all the proteins that so far have been shown to integrate via the YidC only pathway share similar features44. Thus, it is tempting to speculate that those proteins are potential substrates of the YidC only pathway. However, insertion of some proteins via hitherto unknown mechanisms cannot be excluded. It is intriguing to compare the above observation to the notion that smaller membrane proteins tend to overexpress better. If the Sec-translocon represents the major bottle- neck of membrane protein overexpression, it is sensible that smaller pro- teins, which are able to bypass the translocon, are easier to overexpress. Recombinant membrane proteins using alternative targeting and insertion pathways are expected to exhibit different overexpression characteristics. For instance, overexpression of those proteins should not result in precursor accumulation and aggregation of secretory proteins in the cytoplasm and

53 hence overexpression should be less toxic to the host. Overexpression of Mistic-fusion-membrane proteins is supposed to exploit the very benefit of bypassing the Sec-translocon259.

Paper II: Biogenesis of MalF and the MalFGK2 maltose transport complex in Escherichia coli requires YidC The biogenesis of the polytopic membrane protein MalF has been studied to quite some extent, but not all studies on its SRP/Sec-dependence were con- clusive295-297. We have re-evaluated targeting and insertion of MalF and we have monitored the role of YidC in MalF biogenesis using a combination of in vitro cross-linking approaches, in vivo biogenesis assays and BN PAGE. Thereby, we show that YidC plays an important role in the folding of MalF and the formation of the hetero-oligomeric MalFGK2 maltose transport complex.

Targeting and insertion of MalF depend on the SRP, SecA, SecY and SecE TMD1 of nascent MalF cross-linked in vitro to Ffh and the in vivo targeting of full length MalF was strongly hampered in cells depleted of the SRP component 4.5S RNA. This suggests that MalF is targeted to the inner membrane via the SRP-pathway. In short MalF nascent chains, TMD1 not only interacted with Ffh, but also with TF and the ribosomal protein L23, which is located near the exit of the ribosomal tunnel. The SRP and TF sam- ple nascent chains on the ribosome in a nonexclusive fashion, and binding of FtsY to the ribosome-bound SRP excludes TF from the ribosome allowing the docking of the ribosome to the Sec-translocon298. So far, involvement of SecA in MalF assembly was assessed by the use of the SecA inhibitor sodium azide. To examine the role of SecA more directly, we monitored MalF assembly in a SecA depletion strain. Insertion of MalF into the inner membrane was strongly hampered upon depletion of SecA, which does not affect the rest of the Sec-translocon299. SecA dependence of MalF relates presumably to the translocation of the large periplasmic loop between helix 3 and 4. The first three TMDs of MalF inserted and remained in the vicinity of SecY and SecE during the biogenesis of MalF and in vivo assembly of MalF was affected upon SecE-depletion. Notably, MalF comprises a large perip- lasmic domain and 8 TMDs, and by this falls into the category of SecE de- pendent proteins presented in paper I. There was no effect on the insertion of MalF into the membrane in a secG null mutant background.

54 MalF assembles in close proximity to the Sec-translocon and YidC before release into the lipid bilayer Using two different assays, we showed that in vivo insertion and topogenesis of MalF into the membrane were not affected upon YidC depletion. How- ever, in vitro cross-linking experiments demonstrated that the first three TMDs of MalF not only inserted into the membrane close to SecY and SecE but also near YidC. Notably, they remained in this environment at least until the first three TMDs of MalF have emerged from the ribosome in the co- translational insertion process. Similar observations have been made for the polytopic membrane protein MtlA52. This suggests that membrane protein assembly and folding in close proximity to the Sec-translocon and YidC is a rather general mechanism. In the particular case of the MalFGK2 complex the sampling of MalF in close proximity to the Sec translocon could be im- portant to keep MalF in a folding competent state until MalG is translated from the polycistronic mRNA of the malE operon and inserted into the membrane subsequently after MalF integration. In contrast to the TMDs of MalF and MtlA, the two TMDs of the inner membrane protein leader peptidase (Lep) partition one by one into the lipid bilayer via the Sec/YidC interface according to the linear insertion model of TMs 300. A possible explanation for this discrepancy may relate to the role of the TMDs in the respective proteins. The TMDs of Lep merely act as a membrane anchor for the periplasmic P2 domain of Lep, to which the cata- lytic activity of Lep is confined, whereas the TMDs of MtlA and MalF play key roles in the functioning of these proteins.

YidC supports folding of MalF before incorporation into the maltose transport complex The cross-linking data are consistent with the notion that YidC functions in co-translational folding of complex inner membrane proteins62. We used 1D BN PAGE combined with Western-blotting to investigate if YidC is also required for the formation of the maltose transport complex. Accumulation of the MalFGK2 complex was ~80% reduced upon YidC depletion. Pre- sumably, the reason for reduced complex formation is a decreased stability of MalF upon YidC depletion as judged by pulse chase analysis. Although MalF was rapidly degraded upon YidC depletion, the absence of the other constituents of the MalFGK2 complex did not affect the stability of the pro- tein. However, trypsin accessibility experiments showed that the folding of the large periplasmic loop of MalF301 is not completed before it is incorpo- 302 rated into the MalFGK2 complex (unpublished observations). Possibly, YidC mainly chaperones the initial folding of MalF by acting on its integral membrane part, which is likely to be critical for MalF stability and hence for assembly of MalFGK2 complexes.

55 Summarizing, we have shown that the polytopic integral membrane protein MalF, which is a constituent of the MalFGK2 complex, is targeted to the inner membrane via the SRP-pathway to the Sec/YidC insertion site. Despite close proximity of nascent MalF to YidC during insertion, YidC is not re- quired for the insertion of MalF into the membrane. However, YidC is re- quired for the stability of MalF and the formation of the MalFGK2 maltose transport complex. Our data indicate that YidC supports the folding of MalF into a stable conformation before it is incorporated into the maltose transport complex. The importance of integral membrane chaperones for folding of mem- brane proteins has been disregarded concerning the functional overexpres- sion of membrane proteins. Membrane localized expression is the only at- tribute that is routinely monitored, e.g., by GFP fluorescence303. Assessment of proper folding is more difficult and is best monitored by measuring the activity of the respective protein. Interestingly, functional expression of in vitro translated MtlA largely failed presumably due to limiting availability of YidC in membrane vesicles235. The more intricate folding of large poly- topic membrane proteins may be another reason why the successful overex- pression of membrane proteins is biased towards smaller proteins. Proper folding of overexpressed membrane protein may also be critical for the in- tegrity of the membrane and hence for viability of the overexpression host15.

Paper III: Consequences of membrane protein overexpression in Escherichia coli Besides the understanding of membrane protein biogenesis, an understand- ing of the physiological response to membrane protein overexpression is needed to design rational strategies to improve yields of overexpression in E. coli. Therefore, we characterized cells overexpressing three different mem- brane protein-GFP fusions in the cytoplasmic membrane. Overexpression of a soluble-GFP fusion protein was used to distinguish between general over- expression effects and effects related to overexpression into the cytoplasmic membrane. Proteomes of total lysates, purified aggregates and cytoplasmic membranes were analyzed by 1D and 2D PAGE and mass spectrometry, complemented with flow cytometry, microscopy, Western-blotting and pulse-labeling experiments. We showed in paper III that the toxicity of membrane protein overexpres- sion is primarily caused by a limiting Sec-translocon capacity having a se- vere impact on both the cell envelope proteome and, surprisingly, also the cytoplasmic proteome. Two effects of the limiting Sec-translocon capacity are especially noteworthy: i) the aggregation of precursors of secretory pro- teins and cytoplasmic proteins in the cytoplasm and ii) the shifted and ineffi-

56 cient energy metabolism, likely caused by redox activation of the Arc two component system. Our observations are illustrated in figure 11 of paper III and discussed in more detail below.

Toxicity due to limiting capacity of the Sec-translocon Upon membrane protein overexpression SecA levels were increased, which indicates that the Sec-translocon capacity is not sufficient under these condi- tions. Quantitative analysis of the cytoplasmic membrane proteome showed that the levels of many endogenous membrane proteins and complexes in the bacterial cytoplasmic membrane were significantly reduced, indicating that the overexpressed membrane protein competes out endogenous membrane proteins. This is nicely illustrated by the observation that overexpression of the soluble protein GST-GFP leads to increased levels of the IPTG induced LacY transporter in the cytoplasmic membrane whereas membrane protein overexpression did not. Furthermore, the levels of membrane-associated ribosomes were increased upon membrane protein overexpression, whereas total levels of ribosomes in the cells did not change. This is in agreement with previous E. coli studies that showed that high rates of membrane pro- tein translation result in more membrane associated ribosomes304. Our data suggest that upon membrane protein overexpression most Sec-translocons are engaged in co-translational protein translocation, i.e., in the biogenesis of (overexpressed) membrane proteins. As a consequence, only few Sec- translocons are available for post-translational protein targeting, i.e., for the translocation of secretory proteins. Indeed, the levels of the processed forms of many secretory proteins were diminished, while their precursors accumu- lated in the cytoplasm as aggregates. Taken together, the insufficient Sec-translocon capacity has a severe im- pact on the composition and as a consequence also the functioning of the cell envelope, as is evidenced by hampered cell division and a reduced capacity of the respiratory chain. Surprisingly, we did not detect up-regulation of markers for envelope stress, such as the periplasmic chaperone Skp or chap- erone/protease DegP (unpublished observation). However, this does not necessarily mean that there is no envelope stress: Skp and DegP are both secretory proteins whose translocation might be hampered upon membrane protein overexpression. Furthermore, integral membrane components of the stress sensing and signal transduction machinery might be compromised under these conditions making it impossible to elicit an envelope stress re- sponse.

57 Toxicity due to aggregation of cytoplasmic and mistargeted secretory proteins in the cytoplasm We observed accumulation of protein aggregates in the cytoplasm of the membrane protein overexpressors, but not upon overexpression of soluble GST-GFP. The overexpressed membrane proteins constituted only about 20% of the total aggregated protein. Approximately one quarter of the total protein in the aggregates was made up of precursor forms of secretory pro- teins supporting the above mentioned severe secretion defect caused by the overexpression of membrane proteins. Together with the mistargeted recom- binant membrane protein, the hydrophobic signal sequences of secretory proteins are excellent seeds for aggregation. As mentioned before, hardly any endogenous membrane proteins could be detected in the aggregates. The remaining proteins in the aggregate preparations were mainly chaperones and proteases, which are most likely to be involved in the disaggregation, refolding and degradation of the protein aggregates (see above).

Fig. 19. Correlation of aggregation and toxicity. Strain Lemo21(DE3) expressing YidC-GFP from a T7lac promoter was grown in the presence of varying concen- trations of rhamnose to induce expression of the T7RNAP inhibitor T7Lys to vary- ing degrees (see paper IV). A, A600 and GFP fluorescence increases with rham- Is aggregate formation a ma- nose concentration. A600 was measured jor contributor to toxicity of 4 h after induction of expression, GFP membrane protein overexpres- fluorescence 8 h. B & C, aggregation decreases with rhamnose concentration. sion and if so, what is the reason Samples were taken after 4 h expression. for this toxicity? These questions C, quantification of the fraction of aggre- are not easy to answer. While I gated proteins. could not correlate the extent of aggregation to toxicity of overexpression in a screen of 29 different mem- brane proteins (unpublished observation), toxicity and aggregation were linked upon titration of YidC-GFP transcription (Fig. 19). As mentioned

58 above, toxicity of aggregates can originate from a net loss of active protein or from an intrinsic toxicity of aggregates150. Cell survival upon severe pro- tein aggregation at very high temperatures (50°C) depends on the recovery of aggregated proteins and hence toxicity is likely related to a net loss of active proteins150. It is conceivable that aggregation of essential proteins, like the cell divison protein MinD or the elongation factor-Tu, contribute to the toxicity of membrane protein overexpression. However, upon overexpres- sion of YidC-GFP in BL21(DE3) ~7% total protein ended up in aggregates, of which >20% was recombinant protein. A net loss of ~5% active protein – which is little compared to 20% in rpoH mutants at 42°C159 – may be easy to bear by the host cell. Furthermore, neither deletion nor co-expression of ClpB affected viability of membrane protein overexpressing cells (unpub- lished observation), suggesting that it is unlikely that recovery of aggregated proteins is crucial for membrane protein overexpression and a net loss of active protein the major problem. Therefore, the intrinsic toxicity of aggre- gates resulting from membrane protein overexpression may be more rele- vant. It is conceivable that hydrophobic TMDs and signal sequences of ag- gregated proteins interfere with cellular processes and compromise the integ- rity of the membrane. It should be noted in this context that production of soluble proteins into inclusion bodies is in general not toxic305.

Toxicity due to a less efficient energy metabolism Membrane protein overexpression strongly affected the composition of the cytoplasmic membrane proteome; e.g., levels of the key respiratory chain complexes succinate dehydrogenase and cytochrome bd and bo3 oxidases were reduced. The state of the respiratory chain in E. coli is monitored by a number of redox sensor systems to maintain redox homeostasis306. One of these systems is the Arc two component system, which mediates adaptive responses to changing respiratory conditions by monitoring the redox state of the Q pool. Our proteomics data, in particular the induction of the acetate- pta pathway and the lower accumulation levels of TCA cycle components, indicate that the Arc system is activated upon membrane protein overexpres- sion307, which typically occurs when the Q pool is mostly in a reduced state (paper III Figure 11). Together with the up-regulation of pyruvate dehydro- genase, this may lead to an increased generation of ATP via the acetate-pta pathway resulting in secretion of acetate into the medium. Indeed, the pH of the culture medium was lowered upon membrane protein overexpression. Since we grew E. coli under fully aerated conditions, it is unlikely that the Arc system is induced by a reduced Q pool resulting from O2 limitation, but Arc activation is rather resulting from a reduced activity of terminal oxidases and hence it is ‘misled’ by effects of membrane protein overexpression. In- terestingly, depletion of SecE resulted in diminished levels of cytochrome bo3 oxidase but no significant Arc response was observed.

59 It is notable that adjustments in energy metabolism differ considerably between the overexpression of membrane proteins and soluble proteins. Whereas the production of recombinant soluble proteins results in increased flux though glycolysis and TCA cycle by regulating enzymatic activities and expression191, 192, membrane protein overexpression results in repression of the TCA cycle. It is in common though, that alternative pathways of sub- strate-level phosphorylation are promoted to meet the increased energy re- quirements of protein production. However, it becomes clear that ATP is produced inefficiently under the conditions usually employed to overexpress membrane proteins, most likely as a side effect of a limiting Sec-translocon capacity.

In summary, our comparative proteome analysis of aggregates, purified cy- toplasmic membranes and whole cell lysates gives important physiological insight into the toxicity of membrane protein overexpression in E. coli. It provides a basis to design generic strategies to improve membrane protein overexpression yields in E. coli.

Paper IV: Tuning Escherichia coli for membrane protein overexpression Paper III resulted in a plethora of potential leads for engineering strains with improved membrane protein overexpression characteristics. To narrow down the number of leads, we studied the consequences of membrane protein overexpression in the Walker strains C41(DE3) and C43(DE3)263. These strains were selected almost a decade ago in a screen that was designed to isolate derivatives of BL21(DE3) with improved membrane protein overex- pression characteristics263. Overexpression of many membrane proteins in these BL21(DE3) derivatives is hardly toxic; i.e., growth is only marginally affected often resulting in high membrane protein overexpression yields. Using a combination of proteomics and genetics we show that mutations in the promoter governing expression of T7 RNA polymerase are key to the improved membrane protein overexpression characteristics of the Walker strains.

Mutations in the promoter governing T7RNAP expression are key to the overexpression characteristics of the Walker strains While the expression yield of YidC-GFP was 6-7 fold higher in the Walker strains, the overall perturbations of the host proteome and physiology caused by membrane protein overexpression were milder in the Walker strains than in BL21(DE3)pLysS, exemplified by less aggregation, a milder heat shock response and smaller changes in the expression of metabolic . We

60 observed that the initial expression rates in C41(DE3) and C43(DE) were low but increased over time whereas expression in BL21(DE3)pLysS was strong and steady from the start. These differences were due to lower re- combinant transcript levels in the Walker strains resulting from three muta- tions in the promoter region governing expression of T7 RNA polymerase. Sequencing of the promoter regions revealed that two mutations were lo- cated in the -10 region of the promoter and one in the lac operator O1 just upstream of the symmetric part of the lac repressor binding site (Fig. 6A). The two mutations in the -10 region revert the lacUV5 promoter back into the much weaker wild-type lac promoter171. In contrast to the lacUV5 pro- moter, the wild-type lac promoter is susceptible to catabolite repression and requires activation by CRP-cAMP170, 308. To show directly that these three mutations are key to the improved membrane protein overexpression charac- teristics, we swapped the lac promoters between BL21(DE3) and C43(DE3). In this way, BL21(DE3) could be converted into a Walker strain and vice versa.

Mutations in the lacUV5 promoter can be mimicked by dampening T7RNAP activity Given that low levels of transcription are crucial for the successful overex- pression of membrane proteins, we reasoned that the titration of T7RNAP activity by adjustable co-expression of the T7RNAP inhibitor T7Lys should make it possible to optimize transcription levels of recombinant proteins individually. Therefore, we expressed T7Lys from a rhamnose inducible promoter (Fig. 4A paper IV). Supplementing the culture medium with vary- ing amounts of rhamnose resulted in titration of T7Lys, which is inhibiting T7RNAP activity to different degrees. In this way, a rate of transcription could be achieved that was optimal for efficient overexpression of any given membrane protein. These observations demonstrate that the mutations in the lacUV5 promoters in the Walker strains can be mimicked in BL21(DE3) by dampening T7RNAP activity.

How can such minor modifications have such a large impact on membrane protein overexpression yields? In E. coli, the average rates of transcription and translation match, and both processes are often tightly coupled. As discussed in the ‘metabolic burden’ section, uncoupling of transcription and translation by the fast processing T7RNAP can result in reduced transcript stability and destruction of ri- bosomes. In our hands, high levels of recombinant mRNA are unlikely to be the reason for toxicity, as the overexpression of frameshifted YidC-GFP mRNA lacking a start codon had no effects on viability (unpublished obser- vation). Miroux and Walker concluded that toxicity of protein overexpres- sion resulted from the uncoupling of transcription and translation, and that C41(DE3) and C43(DE3) overcome this problem by the production of lower

61 recombinant mRNA levels263. However, it is important to distinguish be- tween a slower speed of transcription by T7RNAP and less transcription events because of lower T7RNAP levels. Only the former will result in a better coupling of transcription and translation. As there were no mutations within the gene encoding T7RNAP of BL21(DE3) and the Walker strains, respectively, we have to assume that transcription proceeds with the same speed. Instead, lower levels of recombinant mRNA follow from reduced amounts of T7RNAP transcribed from a mutated lacUV5 promoter. Presumably, the major advantage of lower transcript levels is reduced translation, which as a consequence poses a smaller load on post- translational processes. In general, post-translational processes of cytoplas- mic proteins are less demanding than the complex targeting mechanisms of secretory and membrane proteins. This is reflected by the fact that the Walker strains are primarily recognized for their ability to successfully over- express membrane proteins rather than soluble proteins. In fact, reduced transcription of many cytoplasmic proteins directly leads to lower expression yields (unpublished observation). Our data suggest that the superiority of C41(DE3) and C43(DE3) for membrane protein overexpression results from a better coupling of transla- tion, targeting and insertion. Presumably, the E. coli SRP does not stall translation until the ribosome docks at the SecYEG translocon as it occurs in eukaryotes244. Therefore, the bacterial targeting machinery may by particu- larly sensitive to too high translation rates of membrane proteins, which results in the production of more protein than the Sec-translocon can process with the side effects described in paper III. Either by the mutations in the lacUV5 promoter or by dampening T7RNAP activity with T7Lys, transcrip- tion and hence translation of most genes encoding recombinant membrane proteins can be tuned such that saturation of the Sec-translocon does not occur. Thus, by harmonizing translation and insertion into the membrane of the recombinant membrane protein, the toxic effects of overexpression are minimized.

In conclusion, the systems biotechnology approach used here to characterize the Walker strains has not only enabled us to identify the mutations that are key to their improved membrane overexpression characteristics but has also made it possible for us to engineer an E. coli BL21(DE3)-derived strain, Lemo21(DE3), that is tunable for overexpression. This strain conveniently allows optimizing overexpression of both membrane and soluble proteins using only a single strain and a simple rhamnose titration rather than a multi- tude of different strains, and it is ideally suited for high-throughput screen- ing procedures.

62 Future perspectives By studying the biogenesis of membrane proteins and the physiological con- sequences of their overexpression in E. coli, I was able to identify and alle- viate one of the major bottlenecks of membrane protein overexpression: saturation of the Sec-translocon could be overcome by harmonizing transla- tion and membrane insertion of the recombinant membrane protein. This minimized the toxic effects of overexpression and thus resulted in increased membrane protein-producing biomass. Notably, the key to improved membrane protein overexpression yields in the Walker strains is not an increased amount of overexpressed membrane protein per cell, but increased biomass. This indicates that yields per cell are bound to a maximum possible due to spatial constraints. It is known that E. coli is able to form additional membranes, which can harbor more overex- pressed membrane protein238. Studying the mechanism of additional mem- brane formation may provide the basis to engineer strains with an inherent higher capacity for membrane protein overexpression. While reduced transcription alleviates the bottleneck of a limiting Sec- translocon capacity it also limits the rate of overexpression. Instead of treat- ing the symptoms and palliating the limiting translocon capacity, treatment of the causes by fine-tuned co-expression of Sec-translocons, targeting com- ponents and folding mediators should increase the rate of insertion of re- combinant membrane proteins and hence result in an increased productivity. In other words, it is better to widen or bypass a bottleneck than to alleviate its consequences. Finally, heterologous membrane protein overexpression in E. coli is still a major unresolved problem. More work is required to elucidate the causes of low yields and to provide a basis to engineer strains and design rational het- erologous overexpression strategies.

63 Sammanfattning på Svenska

Liksom en historisk stad var omgiven av en ringmur omsluter ett biologisk membran biologiska celler. Medan ringmuren består av stenar byggs biologiska membran upp av fetter, eller lipider. Och liksom ringmurar har portar, torn och skottgluggar, för att möjliggöra handel och försvar, innehåller cellmembran proteiner, som fyller många olika funktioner, till exempel transport av näringsämnen, generering av användbar energi eller vidarebefordran av signaler. Proteiner som är associerade med cellmembran kallar man membranproteiner. På grund av sitt utsatta läge i cellkanten och att dom oftast har livsviktiga funktioner, är det ofta membranproteiner som läkemedel är tilverkade för att påverka. För att studera membranproteiners struktur och funktion, behöver man stora mängder rent protein. Eftersom de naturliga mängderna av membranproteinerna oftast inte uppfyller detta krav, producerar man membranproteiner i biologiska celler, man kallar det överuttryck. Men på grund av membranproteinernas fysikaliska egenskaper – de är dels vattenlösliga och dels fettlösliga – och deras komplexa biologiska uppkomst, eller biogenes, är det svårt att framställa membranproteiner i tillräckliga mängder. Oftast aggregerar det producerade membranproteinet därför att det inte placeras rätt i cellmembranen eller därför att produktionen i sig själv är giftig för värdcellerna. I min avhandling har jag undersökt problemet med membranproteinproduktion i kolibakterier. Escherichia coli är kanske det mest använda och bäst kända verktyget för framställning av proteiner. Jag har studerat hur bakteriecellerna reagerar på överuttryck av membranprotein och vad som är flaskhalsen för produktionen. För att förstå framställningsprocessen bättre, har jag också undersökt biogenes av membranproteiner på ett djupare sätt. Målet med min forskning var att skapa en bas för att utveckla strategier för en mer effektiv membranproteinproduktion. Speciellt ville jag möjliggöra konstruktionen av kolibakterier med bättre förmåga att producera membranproteiner. Genom min forskning var det möjligt att hitta och avlägsna den förmodligen viktigaste flaskhalsen vid membranproteinproduktion i E. coli: mättnad av de så kallade Sec-transloconen visade sig vara huvudorsaken till att membranproteinerna aggregerar och är toxiska vid överproduktion. Genom att harmonisera omvandlingen av genetisk information till membranprotein med deras insättning i membranen avlägsnades mättnaden och därmed aggregeringen och toxiciteten. Byggt på den insikten kunde jag konstruera en kolibakterie vars produktionshastighet kan justeras på ett sånt sätt att negativa biverkningar av framställningen undviks och proteinutbytet maximeras.

64 Deutsche Zusammenfassung

Wie eine historische Stadt von einer Stadtmauer umgeben war, so sind biologische Zellen von Membranen umschlossen. Während Stadtmauern jedoch aus Steinen bestehen, sind biologische Membranen aus Fetten, oder Lipiden, aufgebaut. Und genau so, wie Stadtmauern Tore, Türme und Schießscharten hatten, um sowohl Handel zu ermöglichen als auch die Stadt zu verteidigen, so beinhalten Zellmembranen Proteine, die vielerlei Funktionen erfüllen: Aufbau und Erhaltung der Zellhülle, Transport von Nährstoffen, Erzeugung von Energie und Weiterleitung von Signalen, um nur einige zu nennen. Proteine, die mit Membranen verbunden sind, nennt man Membranproteine. Wegen ihrer ausgesetzten Lage und ihren oft lebenswichtigen Funktionen spielen Membranproteine eine große Rolle bei Krankheiten und bieten vorzügliche Ansatzpunkte für Medikamente. Um die Struktur und Funktion von Membraneproteinen zu studieren, benötigt man relativ große Mengen reinen Proteins. Weil die natürlichen Vorkommen diese Vorrausetzung selten erfüllen, werden Membranproteine für gewöhnlich in biologischen Zellen produziert, man spricht auch von Überexprimierung. Doch auf Grund der physikalischen Eigenschaften von Membraneproteinen – sie sind teils wasserlöslich und teils fettlöslich – und der komplexen Art und Weise ihrer biologischen Entstehung, oder Biogenese, ist es schwierig, sie in ausreichenden Mengen herzustellen. Oft fallen überexprimierte Membranproteine aus, weil sie nicht in die Membran inseriert werden konnten oder die Produktion per se toxisch für die Wirtszellen ist. In meiner Doktorarbeit habe ich mich dem Problem der Membranprotein- produktion in Kolibakterien zugewandt. Ich habe untersucht, wie Escherichia coli auf Membranproteinproduktion reagiert und wo die Engstellen der Herstellung liegen. Um den Produktionsprozess besser zu verstehen, habe ich zudem die Biogenese von Membranproteinen eingehend studiert. Ziel meiner Studien war es, eine Grundlage für die Entwicklung von Strategien zu legen, die es erlauben, Membranproteine effizienter herzustellen. Insbesondere sollte die Konstruktion von leistungsfähigeren Kolibakterien ermöglicht werden. Durch meine Arbeit war es mir möglich, die vermutlich wichtigste Engstelle der Membranproteinproduktion in E. coli zu identifizieren und zu beheben: Die Sättigung des Biogenesewegs am sog. Sec-translocon, die auch Hauptgrund für den Ausfall von Membranproteinen und für die Toxizität ihrer Produktion ist, konnte behoben werden, indem die Umsetzung der genetischen Information in das Membranprotein und dessen folgende Insertion in die Membran besser harmonisiert wurde. Auf diesen Erkenntnissen aufbauend, gelang es mir, einen Kolistamm zu konstruieren, dessen Produktionsrate sich so justieren lässt, dass die negativen Nebenwirkungen der Überexprimierung vermieden werden. Somit kann eine höhere Proteinausbeute erzielt werden.

65 Acknowledgements

I am very grateful for everybody that was involved in making my doctoral studies a successful experience and a great time of my life: First of course, I want to thank Jan-Willem, for being the best possible Ph.D. supervisor. Your enthusiasm for science and research infected me from the first moment. You had a hard time to make me appreciate E. coli, but you managed. Thank you especially for the enormous freedom on my scientific playground throughout all my studies (also financially, I have to admit). Thanks for always having time for me (except you were on the phone ;-) and also thanks for the open ears for my ideas. You taught me more than just science; a whole lot of science politics, too, in particular your unbelievable networking skills. Thanks also for your enthusiasm for Lemo and Xbrane. It was and is truly fun to explore these things with you. It was a great time in your lab and I enjoyed it a lot. Louise, thanks for introducing me into practical proteomics, and for all the intense discussions about science and beyond. You were a great lab mate. You get another big hug for making it happen that I always had more chocolate on Mondays than on the Friday before ;-). David, the inventor of ‘stompfler’, for being such a lively colleague. There was never a laugh missing in your presence. Mirjam, for spreading a great atmosphere throughout the lab. Even though it was sometimes a bit stinky ;-). No thanks for winning settlers to often ;-). But a big hug for all the unselfish investment of time into Lemo! Susan, for not taking me too serious all the time, for baking excellent cake and for being a generous supplier of German chocolate. And also you a big hug for all the time you stayed long night to measure hundreds of GFP fluorescences in Lemo & Co! Linda for paving the way and shaping Jan-Willem. Dave, for giving me such a warm welcome in the group and for being an extremely knowledgeable colleague never minding to discuss science. All my students: Anja, Max, Ansgar and Katharina. Thanks a lot for all your help. Gunnar, for an open door and open ear anytime and for the opportunity to work at CBR. It is hard to avoid getting spoiled working in such a great environment. Thanks to the big ‘Gunnar-clan’ past & present: Joanna, Marie, Morten, Susanna, Marika, Mikis, Mirjam, Nadja, Tara & Yoko; Dan, Filippa & Kalle; Joy & Salomè; Ingis, Carolina & Kalle (a big spe- cial thanks for correcting my ‘sammanfattning på svenska’). Working to- gether with you folks was fun, exciting and inspiring! What’s-up-Roger, for

66 sharing the fate of late-night-working. Not for winning settlers ;-) but for correcting my thesis. And together with Mike and Martin, thanks for the tetraspanners! Elzbieta’s and Peter’s group for being such nice lab neighbors, always open for a chat and to share cookies and chemicals. Rosa, for the great support and company during teaching. Tatiana, for your endless care about the centrifuges and for sharing scanner, tubes and trypsin. Tiago, for the tasty gifts from Portugal. Stefan, for being an excellent, dedicated and approachable head of the Ph.D. program. I couldn’t imagine anyone doing the job better! Anki, Ann, Lotta, Kiki, Bogos, Eddi, Peter, Torbjörn & Bengt: DBB would be unthinkable without you. Thanks for a great job! Klaas, for hour hospitality and for teaching me what a real scientific discussion is. I hope I can catch some of your analytical thinking. Jimmy, for introducing me into mass spectrometry. Lisa, Andrea and the rest of the group, for making my stays at Cornell such a lovely event. Joen, for being such a critical thinker and sometimes kind of my bad scientific conscience (in particular concerning proteomics). DJ, Edwin, Wouter, Ovidiu, Ana & Zhong, thanks for a great ‘Dutch-connection’ experience. Thanks to everybody that supported the realization of my ideas: Per-Åke, Olof, Jan-Olov, Neus, Inger, Petra, Sandra, Joachim & Peter. And not to forget Erik, Ashkan, Saeid & Mårten, for investing time, enthusiasm, ideas and energy into Xbrane. Johannes for being such a good friend here in Stockholm and for the ample supply of excellent wine. Sven for being a good old ‘Hubi’-friend and -colleague. I still can’t believe that someone truly likes enzymology… ;-). Michael & Pia for being nice and caring friends and neighbors and for always watering our plants. Göran, Annika, Nora, Emil, Marie, Julia, Jan, Elina, Ester, Jakob and Maria, for adding the ‘Swedish experience’ to my stay in Stockholm och för att ni aldrig pratade engelska med mig. Papa & Mama, euch einen herzlichen Dank, dass es überhaupt so weit gekommen ist. Danke dafür, die Freude und Neugier an der Natur in mir zu wecken und mich auf meinem Weg zu unterstützen. Mama, ich find es supercool, dass Du Schwedisch gelernt hast. Du bist bestimmt mittlerweile viel besser als ich! Ein dicker Dank auch an Daniel & Tim-David. Claudia, for being the best possible wife! You complement all my weaknesses perfectly. I think, we are a great team together. A big hug for all the help in the lab and beyond. Thanks for sharing the good times and the not-so-good times. Also thanks for being so critical. You always manage to bring me back down to earth when I think I am flying. It is often hard to accept, but I love you for that (even if it may not immediately appear so ;-)!

67 References

1. G. von Heijne. The membrane protein universe: what's out there and why bother? J Intern Med 261, 543-57 (2007). 2. D. L. Nelson & M. M. Cox. Lehninger Principles of biochemistry (W. H. Freeman, New York, 2004). 3. A. L. Hopkins & C. R. Groom. The druggable genome. Nat Rev Drug Discov 1, 727-30 (2002). 4. S. J. Singer & G. L. Nicolson. The fluid mosaic model of the structure of cell membranes. Science 175, 720-31 (1972). 5. E. Gorter & F. Grendel. On bimolecular layers of lipoids on the chromocytes of the blood. J Exp Med 41, 439-43 (1925). 6. M. Edidin. Lipids on the frontier: a century of cell-membrane bilayers. Nat Rev Mol Cell Biol 4, 414-8 (2003). 7. S. H. White et al. How membranes shape protein structure. J Biol Chem 276, 32395-8 (2001). 8. G. van Meer, D. R. Voelker & G. W. Feigenson. Membrane lipids: where they are and how they behave. Nat Rev Mol Cell Biol 9, 112-24 (2008). 9. M. J. Saxton & K. Jacobson. Single-particle tracking: applications to membrane dynamics. Annu Rev Biophys Biomol Struct 26, 373-99 (1997). 10. H. Schagger & K. Pfeiffer. Supercomplexes in the respiratory chains of yeast and mammalian mitochondria. EMBO J 19, 1777-83 (2000). 11. K. Simons & E. Ikonen. Functional rafts in cell membranes. Nature 387, 569-72 (1997). 12. D. M. Engelman. Membranes are more mosaic than fluid. Nature 438, 578-80 (2005). 13. G. Blobel. Intracellular protein topogenesis. Proc Natl Acad Sci USA 77, 1496- 1500 (1980). 14. A. Krogh et al. Predicting topology with a hidden Markov model. Application to complete genomes. J Mol Biol 305, 567-80 (2001). 15. M. P. Krebs et al. Quality control of integral membrane proteins. Trends Biochem Sci 29, 648-55 (2004). 16. M. Pohlschroder et al. Diversity and evolution of protein translocation. Annu Rev Microbiol 59, 91-111 (2005). 17. J. W. de Gier & J. Luirink. Biogenesis of inner membrane proteins in Escherichia coli. Mol Microbiol 40, 314-22 (2001). 18. S. L. Wolin. From the elephant to Escherichia coli: SRP-dependent protein targeting. Cell 77, 787-90 (1994). 19. C. Schaffitzel et al. Structure of the Escherichia coli signal recognition particle bound to a translating ribosome. Nature 444, 503-6 (2006). 20. J. Luirink et al. An alternative protein targeting pathway in Escherichia coli: studies on the role of FtsY. EMBO J 13, 2289-96 (1994). 21. S. O. Shan & P. Walter. Co-translational protein targeting by the signal recognition particle. FEBS Lett 579, 921-6 (2005). 22. S. Wagner et al. Rationalizing membrane protein overexpression. Trends Biotechnol 24, 364-71 (2006).

68 23. T. A. Rapoport. Protein translocation across the eukaryotic endoplasmic reticulum and bacterial plasma membranes. Nature 450, 663-9 (2007). 24. B. van den Berg et al. X-ray structure of a protein-conducting channel. Nature 427, 36-44 (2004). 25. K. S. Cannon et al. Disulfide bridge formation between SecY and a translocating polypeptide localizes the translocation pore to the center of SecY. J Cell Biol 169, 219-25 (2005). 26. W. Li et al. The plug domain of the SecY protein stabilizes the closed state of the translocation channel and maintains a membrane seal. Mol Cell 26, 511-21 (2007). 27. P. C. Tam et al. Investigating the SecY plug movement at the SecYEG translocation channel. EMBO J 24, 3380-8 (2005). 28. T. A. Rapoport et al. Membrane-protein integration and the role of the translocation channel. Trends Cell Biol 14, 568-75 (2004). 29. T. Hessa et al. Recognition of transmembrane helices by the endoplasmic reticulum translocon. Nature 433, 377-81 (2005). 30. T. Hessa et al. Molecular code for transmembrane-helix recognition by the Sec61 translocon. Nature 450, 1026-30 (2007). 31. V. Goder, T. Junne & M. Spiess. Sec61p contributes to signal sequence orientation according to the positive-inside rule. Mol Biol Cell 15, 1470-8 (2004). 32. S. H. White & G. von Heijne. Transmembrane helices before, during, and after insertion. Curr Opin Struct Biol 15, 378-86 (2005). 33. P. Bessonneau et al. The SecYEG preprotein translocation channel is a conformationally dynamic and dimeric structure. EMBO J 21, 995-1003 (2002). 34. J. F. Menetret et al. Architecture of the ribosome-channel complex derived from native membranes. J Mol Biol 348, 445-57 (2005). 35. K. Mitra et al. Structure of the Escherichia coli protein-conducting channel bound to a translating ribosome. Nature 438, 318-24 (2005). 36. J. F. Menetret et al. Ribosome binding of a single copy of the SecY complex: implications for protein translocation. Mol Cell 28, 1083-92 (2007). 37. A. R. Osborne & T. A. Rapoport. Protein translocation is mediated by oligomers of the SecY complex with one SecY copy forming the channel. Cell 129, 97-110 (2007). 38. G. von Heijne & Y. Gavel. Topogenic signals in integral membrane proteins. Eur J Biochem 174, 671-8 (1988). 39. M. Bogdanov, P. N. Heacock & W. Dowhan. A polytopic membrane protein displays a reversible topology dependent on membrane lipid composition. EMBO J 21, 2107-116 (2002). 40. H. Andersson & G. von Heijne. Membrane protein topology: Effects of DmH+ on the translocation of charged residues explain the 'positive inside' rule. EMBO J 13, 2267-2272 (1994). 41. H. Andersson & G. von Heijne. Sec dependent and sec independent assembly of Escherichia coli inner membrane proteins: the topological rules depend on chain length. EMBO J 12, 683-91 (1993). 42. H. Nakatogawa & K. Ito. Secretion monitor, SecM, undergoes self-translation arrest in the cytosol. Mol Cell 7, 185-92. (2001). 43. H. Nakatogawa & K. Ito. The ribosomal exit tunnel functions as a discriminating gate. Cell 108, 629-36. (2002). 44. K. Xie & R. E. Dalbey. Inserting proteins into the bacterial cytoplasmic membrane using the Sec and YidC . Nat Rev Microbiol 6, 234-44 (2008). 45. D. Kiefer & A. Kuhn. YidC as an essential and multifunctional component in membrane protein assembly. Int Rev Cytol 259, 113-38 (2007).

69 46. J. Luirink, T. Samuelsson & J. W. de Gier. YidC/Oxa1p/Alb3: evolutionarily conserved mediators of membrane protein assembly. FEBS Lett 501, 1-5 (2001). 47. R. E. Dalbey & A. Kuhn. YidC family members are involved in the membrane insertion, lateral integration, folding, and assembly of membrane proteins. J Cell Biol 166, 769-74 (2004). 48. E. van Bloois et al. Saccharomyces cerevisiae Cox18 complements the essential Sec-independent function of Escherichia coli YidC. FEBS J 274, 5704-13 (2007). 49. E. van Bloois et al. The Sec-independent function of Escherichia coli YidC is evolutionary-conserved and essential. J Biol Chem 280, 12996-3003 (2005). 50. M. Preuss et al. Evolution of mitochondrial oxa proteins from bacterial YidC. Inherited and acquired functions of a conserved protein insertion machinery. J Biol Chem 280, 13004-11 (2005). 51. J. Luirink et al. Biogenesis of inner membrane proteins in Escherichia coli. Annu Rev Microbiol 59, 329-55 (2005). 52. K. Beck et al. YidC, an assembly site for polytopic Escherichia coli membrane proteins located in immediate proximity to the SecYE translocon and lipids. EMBO Rep 2, 709-14 (2001). 53. H. Sadlish et al. Sequential triage of transmembrane segments by Sec61alpha during biogenesis of a native multispanning membrane protein. Nat Struct Mol Biol 12, 870-8 (2005). 54. D. M. Engelman & T. A. Steitz. The spontaneous insertion of proteins into and across membranes: The helical hairpin hypothesis. Cell 23, 411-22 (1981). 55. G. von Heijne. Recent advances in the understanding of membrane protein assembly and structure. Quart Rev Biophys 32, 285-307 (1999). 56. S. U. Heinrich & T. A. Rapoport. Cooperation of transmembrane segments during the integration of a double-spanning protein into the ER membrane. EMBO J 22, 3654-63 (2003). 57. E. N. Houben et al. Nascent Lep inserts into the Escherichia coli inner membrane in the vicinity of YidC, SecY and SecA. FEBS Lett 476, 229-33 (2000). 58. M. L. Urbanus et al. Sec-dependent membrane protein insertion: sequential interaction of nascent FtsQ with SecY and YidC. EMBO Rep 2, 524-9 (2001). 59. P. A. Scotti et al. YidC, the Escherichia coli homologue of mitochondrial Oxa1p, is a component of the Sec translocase. EMBO J 19, 542-9 (2000). 60. M. Lotz et al. Projection structure of YidC: a conserved mediator of membrane protein assembly. J Mol Biol 375, 901-7 (2008). 61. A. E. Johnson & M. A. van Waes. The translocon: A dynamic gateway at the ER membrane. Annu Rev Cell Dev Biol 15, 799-842 (1999). 62. S. Nagamori, I. N. Smirnova & H. R. Kaback. Role of YidC in folding of polytopic membrane proteins. J Cell Biol 165, 53-62 (2004). 63. N. Ruiz & T. J. Silhavy. Sensing external stress: watchdogs of the Escherichia coli cell envelope. Curr Opin Microbiol 8, 122-6 (2005). 64. N. Shimohata et al. SecY alterations that impair membrane protein folding and generate a membrane stress. J Cell Biol 176, 307-17 (2007). 65. M. Bogdanov et al. A phospholipid acts as a chaperone in assembly of a membrane transport protein. J Biol Chem 271, 11615-8 (1996). 66. F. Stenberg, G. von Heijne & D. O. Daley. Assembly of the cytochrome bo3 complex. J Mol Biol 371, 765-73 (2007). 67. F. Ossenbuhl et al. Efficient assembly of photosystem II in Chlamydomonas reinhardtii requires Alb3.1p, a homolog of Arabidopsis ALBINO3. Plant Cell 16, 1790-800 (2004). 68. K. A. Dill. Dominant forces in protein folding. Biochemistry 29, 7133-55 (1990).

70 69. J. L. Popot & D. M. Engelman. Membrane protein folding and oligomerization - The 2-stage model. Biochemistry 29, 4031-7 (1990). 70. R. Henderson. The structure of the purple membrane from Halobacterium hallobium: analysis of the X-ray diffraction pattern. J Mol Biol 93, 123-38 (1975). 71. J. Deisenhofer et al. Structure of the protein subunits in the photosynthetic reaction centre of Rhodopseudomonas viridis at 3Å resolution. Nature 318, 618-24 (1985). 72. R. Henderson et al. A Model for the structure of bacteriorhodopsin based on high resolution electron cryo-microscopy. J Mol Biol 213, 899-929 (1990). 73. G. von Heijne. Membrane-protein topology. Nat Rev Mol Cell Biol 7, 909-18 (2006). 74. N. M. Meindl-Beinker et al. Asn- and Asp-mediated interactions between transmembrane helices during translocon-mediated membrane protein assembly. EMBO Rep 7, 1111-6 (2006). 75. J. U. Bowie. Solving the membrane protein folding problem. Nature 438, 581-9 (2005). 76. K. R. MacKenzie, J. H. Prestegard & D. M. Engelman. A transmembrane helix dimer: Structure and implications. Science 276, 131-3 (1997). 77. M. A. Lemmon et al. A dimerization motif for transmembrane a-helices. Nature Struct Biol 1, 157-63 (1994). 78. R. F. Walters & W. F. DeGrado. Helix-packing motifs in membrane proteins. Proc Natl Acad Sci U S A 103, 13658-63 (2006). 79. H. Yin et al. Computational design of peptides that target transmembrane helices. Science 315, 1817-22 (2007). 80. J.-L. Popot, S.-E. Gerchman & D. M. Engelman. Refolding of bacteriorhodopsin in lipid bilayers. A thermodynamically controlled two-stage process. J Mol Biol 198, 655-76 (1987). 81. E. Bibi & H. Kaback. In vivo expression of the lacY gene in two segments leads to functional lac permease. Proc Natl Acad Sci USA 87, 4325-9 (1990). 82. S. M. Gruner. Intrinsic curvature hypothesis for biomembrane lipid composition: a role for nonbilayer lipids. Proc Natl Acad Sci U S A 82, 3665-9 (1985). 83. H. Hong & L. K. Tamm. Elastic coupling of integral membrane protein stability to lipid bilayer forces. Proc Natl Acad Sci U S A 101, 4065-70 (2004). 84. S. J. Allen et al. Controlling the folding efficiency of an integral membrane protein. J Mol Biol 342, 1293-304 (2004). 85. A. G. Lee. Lipid-protein interactions in biological membranes: a structural perspective. Biochim Biophys Acta 1612, 1-40 (2003). 86. K. Mitra et al. Modulation of the bilayer thickness of exocytic pathway membranes by membrane proteins rather than cholesterol. Proc Natl Acad Sci U S A 101, 4083-8 (2004). 87. T. Haltia & E. Freire. Forces and factors that contribute to the structural stability of membrane proteins. BBA-Bioenergetics 1228, 1-27 (1995). 88. A. J. Darwin. The phage-shock-protein response. Mol Microbiol 57, 621-8 (2005). 89. K. Ito & Y. Akiyama. Cellular functions, mechanism of action, and regulation of FtsH protease. Annu Rev Microbiol 59, 211-31 (2005). 90. A. Kihara, Y. Akiyama & K. Ito. A protease complex in the Escherichia coli plasma membrane: HflKC (HflA) forms a complex with FtsH (HflB), regulating its proteolytic activity against SecY. EMBO J 15, 6122-31 (1996). 91. N. Saikawa, Y. Akiyama & K. Ito. FtsH exists as an exceptionally large complex containing HflKC in the plasma membrane of Escherichia coli. J Struct Biol 146, 123-9 (2004).

71 92. S. Chiba, K. Ito & Y. Akiyama. The Escherichia coli plasma membrane con- tains two PHB (prohibitin homology) domain protein complexes of opposite ori- entations. Mol Microbiol 60, 448-57 (2006). 93. M. Koppen & T. Langer. Protein degradation within mitochondria: versatile activities of AAA proteases and other peptidases. Crit Rev Biochem Mol Biol 42, 221-42 (2007). 94. E. van Bloois et al. Detection of cross-links between FtsH, YidC, HflK/C suggests a linked role for these proteins in quality control upon insertion of bacterial inner membrane proteins. FEBS Lett (2008). 95. S. Chiba, Y. Akiyama & K. Ito. Membrane protein degradation by FtsH can be initiated from either end. J Bacteriol 184, 4775-82 (2002). 96. Y. Akiyama. Proton-motive force stimulates the proteolytic activity of FtsH, a membrane-bound ATP-dependent protease in Escherichia coli. Proc Natl Acad Sci U S A 99, 8066-71 (2002). 97. A. Kihara, Y. Akiyama & K. Ito. Dislocation of membrane proteins in FtsH- mediated proteolysis. EMBO J 18, 2970-81 (1999). 98. N. Shimohata et al. The Cpx stress response system of Escherichia coli senses plasma membrane proteins and controls HtpX, a membrane protease with a cytosolic . Genes Cells 7, 653-62 (2002). 99. M. Sakoh, K. Ito & Y. Akiyama. Proteolytic activity of HtpX, a membrane-bound and stress-controlled protease from Escherichia coli. J Biol Chem 280, 33305- 10 (2005). 100. P. N. Danese & T. J. Silhavy. The sigmaE and the Cpx signal transduction systems control the synthesis of periplasmic protein-folding enzymes in Escherichia coli. Genes Dev 11, 1183-93 (1997). 101. T. Taura et al. Determinants of the quantity of the stable SecY complex in the Escherichia coli cell. J Bacteriol 175, 7771-5 (1993). 102. A. Kihara, Y. Akiyama & K. Ito. FtsH is required for proteolytic elimination of uncomplexed forms of SecY, an essential protein translocase subunit. Proc Natl Acad Sci USA 92, 4532-6 (1995). 103. Y. Akiyama et al. FtsH (HflB) is an ATP-dependent protease selectively acting on SecY and some other membrane proteins. J Biol Chem 271, 31196-201 (1996). 104. Y. Akiyama, A. Kihara & K. Ito. Subunit a of proton ATPase Fo sector is a substrate of the FtsH protease in Escherichia coli. FEBS Lett 399, 26-8 (1996). 105. S. Maegawa et al. The intramembrane active site of GlpG, an Escherichia coli rhomboid protease, is accessible to water and hydrolyses an extramembrane peptide bond of substrates. Mol Microbiol 64, 435-47 (2007). 106. K. Koide, K. Ito & Y. Akiyama. Substrate recognition and binding by RseP, an Escherichia coli intramembrane protease. J Biol Chem 283, 9562-70 (2008). 107. G. von Heijne. The signal peptide. J Membr Biol 115, 195-201 (1990). 108. W. Wickner & R. Schekman. Protein translocation across biological membranes. Science 310, 1452-6 (2005). 109. L. L. Randall & S. J. Hardy. SecB, one small chaperone in the complex milieu of the cell. Cell Mol Life Sci 59, 1617-23 (2002). 110. C. Dekker, B. de Kruijff & P. Gros. Crystal structure of SecB from Escherichia coli. J Struct Biol 144, 313-9 (2003). 111. N. T. Knoblauch et al. Substrate specificity of the SecB chaperone. J Biol Chem 274, 34219-25 (1999). 112. D. Collier et al. The antifolding activity of SecB promotes the export of the Escherichia coli maltose-binding protein. Cell 53, 273-83 (1988). 113. F. U. Hartl et al. The binding cascade of SecB to SecA to SecY/E mediates preprotein targeting to the Escherichia coli plasma membrane. Cell 63, 269-79 (1990).

72 114. E. Papanikou, S. Karamanou & A. Economou. Bacterial protein secretion through the translocase nanomachine. Nat Rev Microbiol 5, 839-51 (2007). 115. A. Economou & W. Wickner. SecA promotes preprotein translocation by undergoing ATP-driven cycles of membrane insertion and deinsertion. Cell 78, 835-43 (1994). 116. E. Schiebel et al. Delta mu H+ and ATP function at different steps of the catalytic cycle of preprotein translocase. Cell 64, 927-39 (1991). 117. F. Duong & W. Wickner. Distinct catalytic roles of the SecYE, SecG and SecDFyajC subunits of preprotein translocase holoenzyme. EMBO J 16, 2756- 68 (1997). 118. S. Matsuyama, Y. Fujita & S. Mizushima. SecD is involved in the release of translocated secretory proteins from the cytoplasmic membrane of Escherichia coli. EMBO J 12, 265-70 (1993). 119. C. Zwizinski & W. Wickner. Purification and characterization of leader (signal) peptidase from Escherichia coli. J Biol Chem 255, 7973-7 (1980). 120. C. B. Anfinsen. Principles that govern the folding of polypeptide chains. Science 181, 223-30 (1973). 121. J. C. Young et al. Pathways of chaperone-mediated protein folding in the cytosol. Nat Rev Mol Cell Biol 5, 781-91 (2004). 122. B. W. Ying et al. Co-translational involvement of the chaperonin GroEL in the folding of newly translated polypeptides. J Biol Chem 280, 12035-40 (2005). 123. B. W. Ying, H. Taguchi & T. Ueda. Co-translational binding of GroEL to nascent polypeptides is followed by post-translational encapsulation by GroES to mediate protein folding. J Biol Chem 281, 21813-9 (2006). 124. T. Hesterkamp et al. Escherichia coli trigger factor is a prolyl that associates with nascent polypeptide chains. Proc Natl Acad Sci U S A 93, 4437-41 (1996). 125. L. Ferbitz et al. Trigger factor in complex with the ribosome forms a molecular cradle for nascent proteins. Nature 431, 590-6 (2004). 126. A. Hoffmann et al. Trigger factor forms a protective shield for nascent polypeptides at the ribosome. J Biol Chem 281, 6539-45 (2006). 127. J. S. McCarty et al. The role of ATP in the functional cycle of the DnaK chaperone system. J Mol Biol 249, 126-37 (1995). 128. K. Liberek et al. Escherichia coli DnaJ and GrpE heat shock proteins jointly stimulate ATPase activity of DnaK. Proc Natl Acad Sci U S A 88, 2874-8 (1991). 129. E. Deuerling et al. Trigger factor and DnaK cooperate in folding of newly synthesized proteins. Nature 400, 693-6 (1999). 130. P. Genevaux et al. In vivo analysis of the overlapping functions of DnaK and trigger factor. EMBO Rep 5, 195-200 (2004). 131. R. S. Ullers et al. SecB is a bona fide generalized chaperone in Escherichia coli. Proc Natl Acad Sci U S A 101, 7583-8 (2004). 132. V. R. Agashe et al. Function of trigger factor and DnaK in multidomain protein folding: increase in yield at the expense of folding speed. Cell 117, 199-209 (2004). 133. Z. Xu, A. Horwich & P. Sigler. The crystal structure of the asymmetric GroEL- GroES-(ADP)7 chaperonin complex. Nature 388, 741-50 (1997). 134. B. Bukau & A. Horwich. The Hsp70 and Hsp60 chaperone machines. Cell 92, 351-66 (1998). 135. Z. Lin, D. Madan & H. S. Rye. GroEL stimulates protein folding through forced unfolding. Nat Struct Mol Biol 15, 303-11 (2008). 136. S. Sharma et al. Monitoring protein conformation along the pathway of chaperonin-assisted folding. Cell 133, 142-53 (2008). 137. Y. C. Tang et al. Structural features of the GroEL-GroES nano-cage required for rapid folding of encapsulated protein. Cell 125, 903-14 (2006).

73 138. W. A. Houry et al. Identification of in vivo substrates of the chaperonin GroEL. Nature 402, 147-54 (1999). 139. M. J. Kerner et al. Proteome-wide analysis of chaperonin-dependent protein folding in Escherichia coli. Cell 122, 209-20 (2005). 140. F. Arsene, T. Tomoyasu & B. Bukau. The heat shock response of Escherichia coli. Int J Food Microbiol 55, 3-9 (2000). 141. R. A. Mooney, S. A. Darst & R. Landick. Sigma and RNA polymerase: an on- again, off-again relationship? Mol Cell 20, 335-45 (2005). 142. T. Yura & K. Nakahigashi. Regulation of the heat-shock response. Curr Opin Microbiol 2, 153-8 (1999). 143. M. T. Morita et al. Translational induction of heat shock transcription factor sigma32: evidence for a built-in RNA thermosensor. Genes Dev 13, 655-65 (1999). 144. T. Tatsuta et al. Heat shock regulation in the ftsH null mutant of Escherichia coli: dissection of stability and activity control mechanisms of sigma32 in vivo. Mol Microbiol 30, 583-93 (1998). 145. E. Guisbert et al. A chaperone network controls the heat shock response in Escherichia coli. Genes Dev 18, 2812-21 (2004). 146. A. Blaszczak et al. Both ambient temperature and the DnaK chaperone machine modulate the heat shock response in Escherichia coli by regulating the switch between sigma70 and sigma32 factors assembled with RNA polymerase. EMBO J 14, 5085-93 (1995). 147. G. Nonaka et al. Regulon and promoter analysis of the Escherichia coli heat- shock factor, sigma32, reveals a multifaceted cellular response to heat stress. Genes Dev 20, 1776-89 (2006). 148. S. Ventura & A. Villaverde. Protein quality in bacterial inclusion bodies. Trends Biotechnol 24, 179-85 (2006). 149. D. J. Selkoe. Folding proteins in fatal ways. Nature 426, 900-4 (2003). 150. J. Weibezahn et al. Thermotolerance requires refolding of aggregated proteins by substrate translocation through the central pore of ClpB. Cell 119, 653-65 (2004). 151. S. P. Allen et al. Two novel heat shock genes encoding proteins produced in response to heterologous protein expression in Escherichia coli. J Bacteriol 174, 6938-47 (1992). 152. A. Mogk et al. Identification of thermolabile Escherichia coli proteins: prevention and reversion of aggregation by DnaK and ClpB. EMBO J 18, 6934- 49 (1999). 153. S. Wagner et al. Consequences of membrane protein overexpression in Escherichia coli. Mol Cell Proteomics 6, 1527-50 (2007). 154. A. Mogk et al. Small heat shock proteins, ClpB and the DnaK system form a functional triade in reversing protein aggregation. Mol Microbiol 50, 585-95 (2003). 155. H. LeThanh, P. Neubauer & F. Hoffmann. The small heat-shock proteins IbpA and IbpB reduce the stress load of recombinant Escherichia coli and delay degradation of inclusion bodies. Microb Cell Fact 4, 6 (2005). 156. A. Mogk et al. Refolding of substrates bound to small Hsps relies on a disaggregation reaction mediated most efficiently by ClpB/DnaK. J Biol Chem 278, 31033-42 (2003). 157. C. Schlieker et al. Solubilization of aggregated proteins by ClpB/DnaK relies on the continuous extraction of unfolded polypeptides. FEBS Lett 578, 351-6 (2004). 158. M. Kanemori et al. Synergistic roles of HslVU and other ATP-dependent proteases in controlling in vivo turnover of sigma32 and abnormal proteins in Escherichia coli. J Bacteriol 179, 7219-25 (1997).

74 159. T. Tomoyasu et al. Genetic dissection of the roles of chaperones and prote- ases in protein folding and degradation in the Escherichia coli cytosol. Mol Mi- crobiol 40, 397-413 (2001). 160. R. T. Sauer et al. Sculpting the proteome with AAA(+) proteases and disassembly machines. Cell 119, 9-18 (2004). 161. T. Tomoyasu et al. Escherichia coli FtsH is a membrane-bound, ATP- dependent protease which degrades the heat-shock transcription factor sigma32. EMBO J 14, 2551-60 (1995). 162. L. Jermutus, L. A. Ryabova & A. Pluckthun. Recent advances in producing and selecting functional proteins by using cell-free translation. Curr Opin Biotechnol 9, 534-48 (1998). 163. F. Baneyx & M. Mujacic. Recombinant protein folding and misfolding in Escherichia coli. Nat Biotechnol 22, 1399-408 (2004). 164. F. Baneyx. Recombinant protein expression in Escherichia coli. Curr Opin Biotechnol 10, 411-21 (1999). 165. H. P. Sorensen & K. K. Mortensen. Advanced genetic strategies for recombinant protein expression in Escherichia coli. J Biotechnol 115, 113-28 (2005). 166. S. Makrides. Strategies for achieving high-level overexpression of genes in Escherichia coli. Microbiol Rev 60, 512-38 (1996). 167. S. Graslund et al. Protein production and purification. Nat Methods 5, 135-46 (2008). 168. D. K. Hawley & W. R. McClure. Compilation and analysis of Escherichia coli promoter DNA sequences. Nucleic Acids Res 11, 2237-55 (1983). 169. S. Ringquist et al. Translation initiation in Escherichia coli: sequences within the ribosome-binding site. Mol Microbiol 6, 1219-29 (1992). 170. A. Kolb et al. Transcriptional regulation by cAMP and its receptor protein. Annu Rev Biochem 62, 749-95 (1993). 171. R. R. Arditti, J. G. Scaife & J. R. Beckwith. The nature of mutants in the lac promoter region. J Mol Biol 38, 421-6 (1968). 172. W. Gilbert & A. Maxam. The nucleotide sequence of the lac operator. Proc Natl Acad Sci U S A 70, 3581-4 (1973). 173. M. Lewis et al. Crystal structure of the lactose operon repressor and its complexes with DNA and inducer. Science 271, 1247-54 (1996). 174. M. P. Calos. DNA sequence for a low-level promoter of the lac repressor gene and an 'up' promoter mutation. Nature 274, 762-5 (1978). 175. A. Khlebnikov & J. D. Keasling. Effect of lacY expression on homogeneity of induction from the Ptac and Ptrc promoters by natural and synthetic inducers. Biotechnol Prog 18, 672-4 (2002). 176. L. M. Guzman et al. Tight regulation, modulation, and high-level expression by vectors containing the arabinose PBAD promoter. J Bacteriol 177, 4121-30 (1995). 177. D. A. Siegele & J. C. Hu. Gene expression from plasmids containing the araBAD promoter at subsaturating inducer concentrations represents mixed populations. Proc Natl Acad Sci U S A 94, 8168-72 (1997). 178. A. Khlebnikov et al. Homogeneous expression of the PBAD promoter in Escherichia coli by constitutive expression of the low-affinity high-capacity AraE transporter. Microbiology 147, 3241-7 (2001). 179. E. Englesberg, C. Squires & F. Meronk, Jr. The L-arabinose operon in Escherichia coli B-r: a genetic demonstration of two functional states of the product of a regulator gene. Proc Natl Acad Sci U S A 62, 1100-7 (1969). 180. M. J. Giacalone et al. Toxic protein expression in Escherichia coli using a rhamnose-based tightly regulated and tunable promoter system. Biotechniques 40, 355-64 (2006).

75 181. J. F. Tobin & R. F. Schleif. Positive regulation of the Escherichia coli L- rhamnose operon is mediated by the products of tandemly repeated regulatory genes. J Mol Biol 196, 789-99 (1987). 182. F. W. Studier & B. A. Moffatt. Use of bacteriophage T7 RNA polymerase to direct selective high-level expression of cloned genes. J Mol Biol 189, 113-30 (1986). 183. J. W. Dubendorff & F. W. Studier. Controlling basal expression in an inducible T7 expression system by blocking the target T7 promoter with lac repressor. J Mol Biol 219, 45-59 (1991). 184. W. Jack et al. An optimal host for expression of DNA-processing enzymes via T7 RNA polymerase. New England Biolabs internal (1996). 185. D. R. Wycuff & K. S. Matthews. Generation of an AraC-araBAD promoter- regulated T7 expression system. Anal Biochem 277, 67-73 (2000). 186. B. A. Moffatt & F. W. Studier. T7 lysozyme inhibits transcription by T7 RNA polymerase. Cell 49, 221-7 (1987). 187. F. W. Studier. Use of bacteriophage T7 lysozyme to improve an inducible T7 expression system. J Mol Biol 219, 37-44 (1991). 188. F. Hoffmann & U. Rinas. Stress induced by recombinant protein production in Escherichia coli. Adv Biochem Eng Biotechnol 89, 73-92 (2004). 189. I. Iost, J. Guillerez & M. Dreyfus. Bacteriophage T7 RNA polymerase travels far ahead of ribosomes in vivo. J Bacteriol 174, 619-22 (1992). 190. W. E. Bentley & D. S. Kompala. Optimal induction of protein synthesis in recombinant bacterial cultures. Ann N Y Acad Sci 589, 121-38 (1990). 191. J. Weber, F. Hoffmann & U. Rinas. Metabolic adaptation of Escherichia coli during temperature-induced recombinant protein production: 2. Redirection of metabolic fluxes. Biotechnol Bioeng 80, 320-30 (2002). 192. F. Hoffmann, J. Weber & U. Rinas. Metabolic adaptation of Escherichia coli during temperature-induced recombinant protein production: 1. Readjustment of metabolic enzyme synthesis. Biotechnol Bioeng 80, 313-9 (2002). 193. G. W. Luli & W. R. Strohl. Comparison of growth, acetate production, and acetate inhibition of Escherichia coli strains in batch and fed-batch fermentations. Appl Environ Microbiol 56, 1004-11 (1990). 194. J. Vind et al. Synthesis of proteins in Escherichia coli is limited by the concentration of free ribosomes. Expression from reporter genes does not always reflect functional mRNA levels. J Mol Biol 231, 678-88 (1993). 195. D. E. Chang, D. J. Smalley & T. Conway. Gene expression profiling of Escherichia coli growth transitions: an expanded stringent response model. Mol Microbiol 45, 289-306 (2002). 196. I. Iost & M. Dreyfus. The stability of Escherichia coli lacZ mRNA depends upon the simultaneity of its synthesis and translation. EMBO J 14, 3252-61 (1995). 197. O. V. Makarova et al. Transcribing of Escherichia coli genes with mutant T7 RNA polymerases: stability of lacZ mRNA inversely correlates with polymerase speed. Proc Natl Acad Sci U S A 92, 12250-4 (1995). 198. H. Dong, L. Nilsson & C. G. Kurland. Gratuitous overexpression of genes in Escherichia coli leads to growth inhibition and ribosome destruction. J Bacteriol 177, 1497-504 (1995). 199. D. L. Wilkinson & R. G. Harrison. Predicting the solubility of recombinant proteins in Escherichia coli. Biotechnology (N Y) 9, 443-8 (1991). 200. P. Valax & G. Georgiou. Molecular characterization of beta-lactamase inclusion bodies produced in Escherichia coli. 1. Composition. Biotechnol Prog 9, 539-47 (1993). 201. M. M. Carrio & A. Villaverde. Construction and deconstruction of bacterial inclusion bodies. J Biotechnol 96, 3-12 (2002). 202. M. M. Carrio & A. Villaverde. Localization of chaperones DnaK and GroEL in bacterial inclusion bodies. J Bacteriol 187, 3599-601 (2005).

76 203. M. Carrio et al. Amyloid-like properties of bacterial inclusion bodies. J Mol Biol 347, 1025-37 (2005). 204. S. Ventura. Sequence determinants of protein aggregation: tools to increase protein solubility. Microb Cell Fact 4, 11 (2005). 205. K. Tsumoto et al. Practical considerations in refolding proteins from inclusion bodies. Protein Expr Purif 28, 1-8 (2003). 206. A. Vera et al. The conformational quality of insoluble recombinant proteins is enhanced at low growth temperatures. Biotechnol Bioeng 96, 1101-6 (2007). 207. K. Nishihara et al. Overexpression of trigger factor prevents aggregation of recombinant proteins in Escherichia coli. Appl Environ Microbiol 66, 884-9 (2000). 208. A. de Marco. Protocol for preparing proteins with improved solubility by co- expressing with molecular chaperones in Escherichia coli. Nat Protoc 2, 2632- 9 (2007). 209. Y. Chen et al. DnaK and DnaJ facilitated the folding process and reduced inclusion body formation of magnesium transporter CorA overexpressed in Escherichia coli. Protein Expr Purif 32, 221-31 (2003). 210. V. V. Lunin et al. Crystal structure of the CorA Mg2+ transporter. Nature 440, 833-7 (2006). 211. S. Eshaghi et al. Crystal structure of a divalent metal ion transporter CorA at 2.9 angstrom resolution. Science 313, 354-7 (2006). 212. J. Deaton et al. Functional bacteriorhodopsin is efficiently solubilized and delivered to membranes by the chaperonin GroEL. Proc Natl Acad Sci U S A 101, 2281-6 (2004). 213. A. I. Derman et al. Mutations that allow disulfide bond formation in the cytoplasm of Escherichia coli. Science 262, 1744-7 (1993). 214. H. Kadokura, F. Katzen & J. Beckwith. Protein disulfide bond formation in prokaryotes. Annu Rev Biochem 72, 111-35 (2003). 215. T. Bruser. The twin-arginine translocation system and its capability for protein secretion in biotechnological protein production. Appl Microbiol Biotechnol 76, 35-45 (2007). 216. M. P. DeLisa et al. Phage shock protein PspA of Escherichia coli relieves saturation of protein export via the Tat pathway. J Bacteriol 186, 366-73 (2004). 217. G. Zhang, S. Brokx & J. H. Weiner. Extracellular accumulation of recombinant proteins fused to the carrier protein YebF in Escherichia coli. Nat Biotechnol 24, 100-4 (2006). 218. N. Mackman et al. Release of a chimeric protein into the medium from Escherichia coli using the C-terminal secretion signal of haemolysin. EMBO J 6, 2835-41 (1987). 219. N. Dautin & H. D. Bernstein. Protein secretion in gram-negative bacteria via the autotransporter pathway. Annu Rev Microbiol 61, 89-112 (2007). 220. H. Kiefer. In vitro folding of alpha-helical membrane proteins. Biochim Biophys Acta 1610, 57-62 (2003). 221. S. E. Bane, J. E. Velasquez & A. S. Robinson. Expression and purification of milligram levels of inactive G-protein coupled receptors in Escherichia coli. Protein Expr Purif 52, 348-55 (2007). 222. A. Korepanova et al. Cloning and expression of multiple integral membrane proteins from Mycobacterium tuberculosis in Escherichia coli. Protein Sci 14, 148-58 (2005). 223. E. Dobrovetsky et al. High-throughput production of prokaryotic membrane proteins. J Struct Funct Genomics 6, 33-50 (2005). 224. G. Psakis et al. Expression screening of integral membrane proteins from Helicobacter pylori 26695. Protein Sci 16, 2667-76 (2007).

77 225. D. O. Daley et al. Global topology analysis of the Escherichia coli inner mem- brane proteome. Science 308, 1321-3 (2005). 226. S. Eshaghi et al. An efficient strategy for high-throughput expression screening of recombinant integral membrane proteins. Protein Sci 14, 676-83 (2005). 227. D. M. Molina et al. Engineering membrane protein overproduction in Escherichia coli. Protein Sci 17, 673-80 (2008). 228. T. Cornvik et al. Colony filtration blot: a new screening method for soluble protein expression in Escherichia coli. Nat Methods 2, 507-9 (2005). 229. D. Drew et al. A scalable, GFP-based pipeline for membrane protein overexpression screening and purification. Protein Sci 14, 2011-7 (2005). 230. D. Drew et al. Optimization of membrane protein overexpression and purification using GFP fusions. Nat Methods 3, 303-13 (2006). 231. T. Kawate & E. Gouaux. Fluorescence-detection size-exclusion chromatography for precrystallization screening of integral membrane proteins. Structure 14, 673-81 (2006). 232. E. R. Geertsma et al. Quality control of overexpressed membrane proteins. Proc Natl Acad Sci U S A 105, 5722-27 (2008). 233. Q. A. Valent et al. Nascent membrane and presecretory proteins synthesized in Escherichia coli associate with signal recognition particle and trigger factor. Mol Microbiol 25, 53-64 (1997). 234. C. G. Jensen & S. Pedersen. Concentrations of 4.5S RNA and Ffh protein in Escherichia coli: the stability of Ffh protein is dependent on the concentration of 4.5S RNA. J Bacteriol 176, 7148-54 (1994). 235. J. J. Wuu & J. R. Swartz. High yield cell-free production of integral membrane proteins without refolding or detergents. Biochim Biophys Acta (2008). 236. L. Baars et al. Effects of SecE depletion on the inner and outer membrane proteomes of Escherichia coli. J Bacteriol (2008). 237. J. H. Weiner et al. Overproduction of fumarate reductase in Escherichia coli induces a novel intracellular lipid-protein . J Bacteriol 158, 590-6 (1984). 238. I. Arechaga et al. Characterisation of new intracellular membranes in Escherichia coli accompanying large scale over-production of the b subunit of F1Fo ATP synthase. FEBS Lett 482, 215-9 (2000). 239. A. A. Herskovits et al. Accumulation of endoplasmic membranes and novel membrane-bound ribosome-signal recognition particle receptor complexes in Escherichia coli. J Cell Biol 159, 403-10 (2002). 240. W. J. Netzer & F. U. Hartl. Recombination of protein domains facilitated by co- translational folding in eukaryotes. Nature 388, 343-9 (1997). 241. A. Raine et al. Targeting and insertion of heterologous membrane proteins in Escherichia coli. Biochimie 85, 659-68 (2003). 242. E. R. Kunji et al. Eukaryotic membrane protein overproduction in Lactococcus lactis. Curr Opin Biotechnol 16, 546-51 (2005). 243. C. G. Tate. Overexpression of mammalian integral membrane proteins for structural studies. FEBS Lett 504, 94-8 (2001). 244. T. Powers & P. Walter. Co-translational protein targeting catalyzed by the Escherichia coli signal recognition particle and its receptor. EMBO J 16, 4880-6 (1997). 245. J. Kota & P. O. Ljungdahl. Specialized membrane-localized chaperones prevent aggregation of polytopic proteins in the ER. J Cell Biol 168, 79-88 (2005). 246. J. Kota, C. F. Gilstring & P. O. Ljungdahl. Membrane chaperone Shr3 assists in folding amino acid permeases preventing precocious ERAD. J Cell Biol 176, 617-28 (2007). 247. H. Matsumoto et al. Gene encoding cytoskeletal proteins in Drosophila rhabdomeres. Proc Natl Acad Sci U S A 84, 985-9 (1987).

78 248. L. Ellgaard & A. Helenius. Quality control in the endoplasmic reticulum. Nat Rev Mol Cell Biol 4, 181-91 (2003). 249. B. Meusser et al. ERAD: the long road to destruction. Nat Cell Biol 7, 766-72 (2005). 250. A. Helenius & M. Aebi. Roles of N-linked glycans in the endoplasmic reticulum. Annu Rev Biochem 73, 1019-49 (2004). 251. A. A. Yeliseev et al. Expression of human peripheral cannabinoid receptor for structural studies. Protein Sci 14, 2638-53 (2005). 252. A. G. Lee. How lipids affect the activities of integral membrane proteins. Biochim Biophys Acta 1666, 62-87 (2004). 253. C. Hunte. Specific protein-lipid interactions in membrane proteins. Biochem Soc Trans 33, 938-42 (2005). 254. E. Wallin & G. von Heijne. Properties of N-terminal tails in G-protein coupled receptors - A statistical study. Protein Engineer 8, 693-8 (1995). 255. M. Monne et al. Competition between neighboring topogenic signals during membrane protein insertion into the ER. FEBS J 272, 28-36 (2005). 256. H. M. Weiss & R. Grisshammer. Purification and characterization of the human adenosine A2a receptor functionally expressed in Escherichia coli. Eur J Biochem 269, 82-92 (2002). 257. J. Tucker & R. Grisshammer. Purification of a rat neurotensin receptor expressed in Escherichia coli. Biochem J 317, 891–9 (1996). 258. L. G. Josefsson & L. L. Randall. Processing in vivo of precursor maltose- binding protein in Escherichia coli occurs post-translationally as well as co- translationally. J Biol Chem 256, 2504-7 (1981). 259. T. P. Roosild et al. NMR structure of Mistic, a membrane-integrating protein for membrane protein expression. Science 307, 1317-21 (2005). 260. G. Kefala et al. Application of Mistic to improving the expression and membrane integration of histidine kinase receptors from Escherichia coli. J Struct Funct Genomics 8, 167-72 (2007). 261. R. D. Dalbey, A. Kuhn & G. von Heijne. Directionality in protein translocation across membranes: The N-tail phenomenon. Trends Cell Biol 5, 380-383 (1995). 262. Y. Huang et al. Structure and mechanism of the glycerol-3-phosphate transporter from Escherichia coli. Science 301, 616-20 (2003). 263. B. Miroux & J. E. Walker. Over-production of proteins in Escherichia coli: Mutant hosts that allow synthesis of some membrane proteins and globular proteins at high levels. J Mol Biol 260, 289-98 (1996). 264. L. Dumon-Seignovert, G. Cariot & L. Vuillard. The toxicity of recombinant proteins in Escherichia coli: a comparison of overexpression in BL21(DE3), C41(DE3), and C43(DE3). Protein Expr Purif 37, 203-6 (2004). 265. J. E. Walker, B. Miroux & I. Arechaga. Expression system for membrane proteins. WO/2001/029236 Great Britain (2000). 266. M. Wacker et al. N-linked glycosylation in Campylobacter jejuni and its functional transfer into Escherichia coli. Science 298, 1790-3. (2002). 267. Y. M. Zhang & C. O. Rock. Membrane lipid homeostasis in bacteria. Nat Rev Microbiol 6, 222-33 (2008). 268. F. W. Studier. Protein production by auto-induction in high density shaking cultures. Protein Expr Purif 41, 207-34 (2005). 269. H. M. Shapiro. Practical flow cytometry (Wiley-Liss, Hoboken, 2003). 270. I. Fishov & C. L. Woldringh. Visualization of membrane domains in Escherichia coli. Mol Microbiol 32, 1166-72 (1999). 271. C. J. Hewitt & G. Nebe-Von-Caron. The application of multi-parameter flow cytometry to monitor individual microbial cell physiological state. Adv Biochem Eng Biotechnol 89, 197-223 (2004).

79 272. T. De Vrije, J. Tommassen & B. De Kruijff. Optimal posttranslational transloca- tion of the precursor of PhoE protein across Escherichia coli membrane vesi- cles requires both ATP and the protonmotive force. Biochim Biophys Acta 900, 63-72 (1987). 273. E. Laskowska et al. Aggregation of heat-shock-denatured, endogenous proteins and distribution of the IbpA/B and Fda marker-proteins in Escherichia coli WT and grpE280 cells. Microbiology 150, 247-59 (2004). 274. M. J. Osborn et al. Mechinism of assembly of the outer membrane of Salmonella typhimurium. Isolation and characterization of cytoplasmic and outer membrane. J Biol Chem 247, 3962-72 (1972). 275. P. Marani et al. New Escherichia coli outer membrane proteins identified through prediction and experimental verification. Protein Sci 15, 884-9 (2006). 276. A. Gorg, W. Weiss & M. J. Dunn. Current two-dimensional electrophoresis technology for proteomics. Proteomics 4, 3665-85 (2004). 277. U. K. Laemmli. Cleavage of structural proteins during the assembly of the head of bacteriophage T4. Nature 227, 680-5 (1970). 278. H. Schagger & G. von Jagow. Tricine-sodium dodecyl sulfate-polyacrylamide gel electrophoresis for the separation of proteins in the range from 1 to 100 kDa. Anal Biochem 166, 368-79 (1987). 279. L. Baars et al. Defining the role of the Escherichia coli chaperone SecB using comparative proteomics. J Biol Chem 281, 10024-34 (2006). 280. D. A. Widdick et al. The twin-arginine translocation pathway is a major route of protein export in Streptomyces coelicolor. Proc Natl Acad Sci U S A 103, 17927-32 (2006). 281. H. Schagger & G. von Jagow. Blue native electrophoresis for isolation of membrane protein complexes in enzymatically active form. Anal Biochem 199, 223-31 (1991). 282. F. Stenberg et al. Protein complexes of the Escherichia coli cell envelope. J Biol Chem 280, 34409-19 (2005). 283. W. Urfer, M. Grzegorczyk & K. Jung. Statistics for proteomics: a review of tools for analyzing experimental data. Proteomics 6 Suppl 2, 48-55 (2006). 284. M. Mann, P. Hojrup & P. Roepstorff. Use of mass spectrometric molecular weight information to identify proteins in sequence databases. Biol Mass Spectrom 22, 338-45 (1993). 285. T. T. Quach et al. Development and applications of in-gel CNBr/tryptic digestion combined with mass spectrometry for the analysis of membrane proteins. J Proteome Res 2, 543-52 (2003). 286. M. Karas & F. Hillenkamp. Laser desorption ionization of proteins with molecular masses exceeding 10,000 daltons. Anal Chem 60, 2299-301 (1988). 287. K. A. Datsenko & B. L. Wanner. One-step inactivation of chromosomal genes in Escherichia coli K-12 using PCR products. Proc Natl Acad Sci U S A 97, 6640-5 (2000). 288. M. Oishi & S. D. Cosloy. The genetic and biochemical basis of the transformability of Escherichia coli K12. Biochem Biophys Res Commun 49, 1568-72 (1972). 289. K. C. Murphy. Use of bacteriophage lambda recombination functions to promote gene replacement in Escherichia coli. J Bacteriol 180, 2063-71 (1998). 290. P. P. Cherepanov & W. Wackernagel. Gene disruption in Escherichia coli: TcR and KmR cassettes with the option of Flp-catalyzed excision of the antibiotic- resistance determinant. Gene 158, 9-14 (1995). 291. T. Baba et al. Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection. Mol Syst Biol 2, 2006 0008 (2006). 292. J. H. Withey & D. I. Friedman. A salvage pathway for protein structures: tmRNA and trans-translation. Annu Rev Microbiol 57, 101-23 (2003).

80 293. M. Alami et al. Nanodiscs unravel the interaction between the SecYEG channel and its cytosolic partner SecA. EMBO J 26, 1995-2004 (2007). 294. N. Ruiz, D. Kahne & T. J. Silhavy. Advances in understanding bacterial outer- membrane biogenesis. Nat Rev Microbiol 4, 57-66 (2006). 295. B. Traxler & C. Murphy. Insertion of the polytopic membrane protein MalF is dependent on the bacterial secretion machinery. J Biol Chem 271, 12394-400 (1996). 296. N. D. Ulbrandt, J. A. Newitt & H. D. Bernstein. The Escherichia coli signal rec- ognition particle is required for the insertion of a subset of inner membrane pro- teins. Cell 88, 187-96 (1997). 297. H. Tian, D. Boyd & J. Beckwith. A mutant hunt for defects in membrane protein assembly yields mutations affecting the bacterial signal recognition particle and Sec machinery. Proc Natl Acad Sci U S A 97, 4730-5 (2000). 298. I. Buskiewicz et al. Trigger factor binds to ribosome-signal-recognition particle (SRP) complexes and is excluded by binding of the SRP receptor. Proc Natl Acad Sci U S A 101, 7902-6 (2004). 299. H. Y. Qi & H. D. Bernstein. SecA is required for the insertion of inner mem- brane proteins targeted by the Escherichia coli signal recognition particle. J Biol Chem 274, 8993-7 (1999). 300. E. N. Houben et al. The two membrane segments of leader peptidase partition one by one into the lipid bilayer via a Sec/YidC interface. EMBO Rep 5, 970-5 (2004). 301. M. L. Oldham et al. Crystal structure of a catalytic intermediate of the maltose transporter. Nature 450, 515-21 (2007). 302. B. Traxler & J. Beckwith. Assembly of a hetero-oligomeric membrane protein complex. Proc Natl Acad Sci USA 89, 10852-6 (1992). 303. D. E. Drew et al. Green fluorescent protein as an indicator to monitor mem- brane protein overexpression in Escherichia coli. FEBS Lett 507, 220-4 (2001). 304. R. Gropp, F. Gropp & M. C. Betlach. Association of the halobacterial 7S RNA to the polysome correlates with expression of the membrane protein bacte- rioopsin. Proc Natl Acad Sci U S A 89, 1204-8 (1992). 305. N. Gonzalez-Montalban et al. Bacterial inclusion bodies are cytotoxic in vivo in absence of functional chaperones DnaK or GroEL. J Biotechnol 118, 406-12 (2005). 306. J. Green & M. S. Paget. Bacterial redox sensors. Nat Rev Microbiol 2, 954-66 (2004). 307. X. Liu & P. De Wulf. Probing the ArcA-P modulon of Escherichia coli by whole genome transcriptional analysis and sequence recognition profiling. J Biol Chem 279, 12588-97 (2004). 308. A. E. Silverstone, R. R. Arditti & B. Magasanik. Catabolite-insensitive rever- tants of lac promoter mutants. Proc Natl Acad Sci U S A 66, 773-9 (1970).

81