5 BUNDLES AND BARRELS


Structures of helical bundle and β-barrel membrane proteins differ in many respects, seen in the ribbon diagrams of the photosynthetic reaction center from Rb. sphaeroides (a) and the maltoporin trimer from E. coli outer membrane (b). (a) Redrawn from Jones, M. R., et al., Biochim Biophys Acta. 2002, 1565: 206–214. © 2002 by Elsevier. Reprinted with permission from Elsevier; (b) redrawn from Wimley, W. C., Curr Opin Struct Biol. 2003, 13:404–411. © 2003 by Elsevier. Reprinted with permission from Elsevier.

The thermodynamic arguments discussed in the previous chapter make it clear that the TM segments of proteins will utilize secondary structure to satisfy the hydrogen bond needs of the peptide backbone. While a variety of combinations of secondary structures might be imagined in polytopic membrane proteins, all known protein structures cross the bilayer with either α-helices or β-strands. This produces two classes of integral membrane proteins: helical bundles and β-barrels. This chapter looks at the structure and function of the paradigmatic proteins in each class, then covers the surprising variation among β-barrels, and ends with a look at proteins that fit neither class.

HELICAL BUNDLES

Transmembrane (TM) α-helices have dominated the picture of membrane proteins, guided by early structural information on bacteriorhodopsin and by the first x-ray structure solved for membrane proteins, that of the photosynthetic reaction center (RC). The majority of integral membrane proteins whose high-resolution structures have been solved by x-ray crystallography exhibit the helical bundle motif (see many examples in Chapters 9 through 13). Helix–helix interactions have been analyzed in many of these, providing details of both tertiary and quaternary interactions. Identification of new integral membrane proteins in the proteome relies heavily on prediction of TM helices, as described in Chapter 6.

Bacteriorhodopsin

If a single protein dominated the thinking about structure, dynamics, and assembly of membrane proteins in the decades following 1970, that protein was bacteriorhodopsin (bR) from the purple membranes of the salt-loving bacterium Halobacterium salinarum. From early structural images and spectroscopic characterizations, bR became the paradigm for ion transport proteins and indeed for α-helical TM proteins in general. From the wealth of studies of its structure and function, a truly detailed understanding of this protein has emerged.

bR is the only protein species in the discrete membrane domains called purple membranes, the light-sensitive regions of the plasma membranes of H. salinarum (Figure 5.1). Together with specialized lipids, this protein forms functional trimers that pack as ordered two-dimensional arrays on the bacterial cell. In photophosphorylation bR functions to pump protons out of the cell in response to the absorption of light by its chromophore, retinal, converting light energy into an electrochemical proton gradient that supports the synthesis of ATP. The ability of reconstituted vesicles containing bR and beef heart mitochondrial ATP synthase to synthesize ATP in response to light provided crucial early support for Mitchell’s chemiosmotic hypothesis that the energy of an electrochemical gradient across the membrane could be used to do work (Figure 5.2).

5.1 Schematic showing the different membrane domains of a halobacterium cell. The patches of purple membrane containing bacteriorhodopsin (bR) are separate from the regions of membrane containing the respiratory chain and the ATP synthase. Protons are pumped out of the cell in response either to light absorption by bR in the purple membrane or to cytosolic substrates for the electron transport chain. The ATP synthase normally uses the uptake of protons to drive the synthesis of ATP, although it can act as an ATPase and eject protons at the expense of ATP. Redrawn from Stoeckenius, W., Sci Am. 1976, 234:38–46. © 1976 by Scientific American. Reprinted with permission from Scientific American.

5.2 Schematic of reconstituted vesicles containing bR and ATP synthase. When the light is turned on, ATP is synthesized from ADP + Pi. When the light is turned off, ATP synthesis stops. These vesicles gave important evidence to support the chemiosmotic theory of Peter Mitchell. Redrawn from Garrett, R. H., and C. M. Grisham, Biochemistry, 2nd ed., Brooks/Cole, 1999, p. 897. © 1999. Reprinted with permission from Brooks/Cole, a division of Thomson Learning: www.thomsonrights.com.

Like rhodopsin, the light-absorbing protein in the rod outer segments of the eye’s retina (see Chapter 10), bR has seven helices labeled A to G that span the membrane, and a retinal that is bound to a lysine residue in helix G via a protonated Schiff base (Figure 5.3). The two proteins respond to light differently. bR undergoes a light-induced photocycle in which the retinal converts from all-trans to 13-cis, accompanied by conformational changes in the protein that result in proton transfer across the membrane. Visual rhodopsin’s activation cascade instead involves an 11-cis to all-trans conversion of the retinal, followed by its dissociation from the protein. In addition to bR, H. salinarum contains three related rhodopsins, and similar molecules are found in eubacteria and unicellular eukaryotes.

5.3 Retinal bound to lysine 216 in bacteriorhodopsin. The Schiff base linkage (shaded) between the aldehyde of retinal and the ε-amino group of lysine 216 is protonated, as shown, before light stimulation. The all-trans retinal converts to 13-cis retinal upon absorption of a photon. Redrawn from Neutze, R., et al., Biochim Biophys Acta. 2002, 1565:144–167. © 2002 by Elsevier. Reprinted with permission from Elsevier.

The purple membrane is composed of 75% protein and 25% lipid by weight, with ten halobacterial lipids per bR monomer. These native lipids are based on archeol (see Figure 2.6), so they differ in headgroups but not in length of acyl chains. Delipidation by treatment with a mild detergent affects the kinetics of the bR reaction; addition of halobacterial lipids but not phospholipids restores activity. When bR is crystallized (see below) the bound lipids that are retained from the membrane fit well into the grooves along the protein surface (see Figure 8.20).

bR was the first integral membrane protein whose topological organization in the membrane was elucidated. Electron diffraction of the native two-dimensional crystalline arrays of purple membrane provided early images of bR trimers, revealing the monomer structure to be seven TM α-helices arranged in an arc-like double crescent in the plane of the bilayer (Figure 5.4a,b). Models fitting the primary structure of the protein, with 70% of its 248 residues being hydrophobic, to the observed images were aided by the sensitivity of exposed loop residues to partial proteolysis in situ (carried out on bR in the membrane), although the precise beginning and end of each helix were uncertain for years. Such studies also showed that a few N-terminal amino acids are exposed to the exterior and the last 17 to 24 amino acids of the C terminus are accessible in the cytoplasm. The location of retinal in the center of the protein (Figure 5.4b) was determined by neutron diffraction, and the lysine to which it binds was identified by reduction of the Schiff base with NaBH4. Once it was clear that the retinal binds to Lys216, nearby residues were investigated by site-directed mutagenesis, which identified residues that interact with the retinal, such as Asp212 and Arg82, as well as residues crucial to proton pumping, such as Asp85 and Asp96. These results led to a model for the topology of bR that was further modified as genetic studies defined the positions of many residues (Figure 5.4c).

5.4 EM structure of bR. (A) The electron density profile of the two-dimensional crystalline purple membrane shows arrays of bR trimers. Each subunit of the trimer is arc-shaped with three well-resolved peaks in the inner layer and four less-resolved peaks in the outer layer. From Hayward, S. B., et al., Proc Natl Acad Sci USA. 1978, 75:4320–4324. (B) Seven helices are modeled to correspond to the seven peaks of a bR monomer. Based on neutron diffraction data, a retinal has been placed in the center of the protein. From Subramaniam, S., and R. Henderson, Biochim Biophys Acta. 2000, 1450:157–165. © 2000 by Elsevier. Reprinted with permission from Elsevier. (C) A topology model of bR shows the predicted sequence composition of the seven helices and their connecting loops. The model was adjusted periodically based on genetic mutations of targeted residues (colored boxes) until the x-ray structure was solved. From Khorana, H. G., J Biol Chem. 1988, 263:7439–7442. © 1988 by American Society for Biochemistry & Molecular Biology. Reproduced with permission of American Society for Biochemistry & Molecular Biology via Copyright Clearance Center.

Extensive mutagenesis of the gene for bR provided a large collection of mutant proteins that could be studied in cell suspensions or reconstituted in lipid vesicles, with changes of pH, temperature, and salt conditions used to further characterize the protein function. In addition, a variety of retinal analogs were incorporated to observe their effects on the absorption spectrum and activity. Researchers used a number of pH-sensitive dyes to investigate the stoichiometry of proton pumping. Over many years, increasingly sophisticated instrumentation for visible and ultraviolet absorbance, fluorescence, circular dichroism, Raman, and infrared spectroscopy has been employed to follow the response of bR to light.

The primary event when bR absorbs a photon is the isomerization of retinal. This event triggers subsequent structural changes and pKa shifts in the protein that allow deprotonation of the Schiff base, vectorial transfer of the proton to the extracellular side of the membrane, and uptake of a proton from the cytosol. These processes are accompanied by differences in the absorbance spectrum of bR, allowing detection of intermediates with lifetimes varying from a few picoseconds (ps, 10⁻¹² sec) to a few milliseconds (msec, 10⁻³ sec).

The light-induced changes in bR are summarized in a photoreaction cycle, or photocycle (Figure 5.5). bR in its resting state has a λmax of 570 nm (purple); when it absorbs a photon, it rapidly isomerizes to the K intermediate (λmax 590 nm) and then converts to the L intermediate (λmax 550 nm). The transition from L to M (λmax 410 nm) occurs when the proton from the Schiff base is transferred to Asp85, the primary acceptor. At this point a structural rearrangement, described as M1 → M2, switches the accessibility of the Schiff base. This switch is essential for vectorial proton transport because it prevents reprotonation from the extracellular side, which would result in a zero net effect. The next three steps are slower, each taking around 5 msec. Transfer of a proton to the Schiff base from Asp96 creates the N intermediate (λmax 560 nm). With the return from 13-cis to all-trans retinal, the O intermediate (λmax 640 nm) is formed, and the release of a proton from Asp85 completes the cycle.

The first high-resolution structure of bR was achieved by x-ray crystallography of microcrystals prepared in bicontinuous cubic phase lipids, either monoolein or monopalmitolein. (These are not phospholipids but rather racemic mixtures of glycerol esterified to one acyl chain, either oleoyl or palmitoleoyl, on C1.) In cubic phase-grown crystals, bR trimers are stacked in layers that have the same orientation and lipid content as that observed in purple membrane. Furthermore, spectroscopic studies show that bR in cubo undergoes the main steps of the photocycle. As expected from the electron density images, the seven TM helices cross the membrane nearly perpendicular to the plane of the bilayer and are packed closely together and connected by short loops (Figure 5.6).

The structure has now been refined to better than 2 Å resolution and provides sufficient detail to trace the proton channel, including several important water molecules (Figure 5.7a). The central cavity that contains retinal in Schiff base linkage to Lys216 is quite rigid, with the π-bulge in helix G stabilized by an H-bond from Ala215 to a water molecule. In the resting state, the positive charge on the protonated Schiff base (pKa ∼13.5) is stabilized by the nearby deprotonated carboxylate groups of Asp85 and Asp212. Polar side chains and water molecules make a clear proton path in the extracellular half of the molecule from Asp85, the proton acceptor, to the extracellular surface where the proton is released (Figure 5.7b). At the cytoplasmic surface are several acidic groups that may be involved in transferring protons from the cytoplasm, but no clear proton path connects the central cavity to them. The pKa of Asp96, the proton donor during reprotonation of the retinal from the cytoplasmic side, is very high due to its nonpolar environment and to its side chain H-bond to the side chain of Thr46. This part of the protein is more flexible than the extracellular half and must undergo a conformational change to open a proton path between the cytoplasmic surface and Asp96.
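The role of these pKa values can be made concrete with the Henderson–Hasselbalch relation, which gives the fraction of a titratable group that carries its proton at a given pH. The sketch below is only an illustration: the Schiff base pKa of 13.5 comes from the text, while the pH of 7.0 and the pKa of 3.9 for an unperturbed aspartate side chain are assumed textbook values, and the function name is hypothetical.

```python
def fraction_protonated(pka: float, ph: float) -> float:
    """Henderson-Hasselbalch: fraction of a group in its protonated form."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


ASSUMED_PH = 7.0          # assumption: near-neutral surroundings
SCHIFF_BASE_PKA = 13.5    # value quoted in the text for the resting state
FREE_ASP_PKA = 3.9        # assumption: typical unperturbed Asp side chain

schiff_base = fraction_protonated(SCHIFF_BASE_PKA, ASSUMED_PH)
free_asp = fraction_protonated(FREE_ASP_PKA, ASSUMED_PH)

print(f"Schiff base protonated fraction: {schiff_base:.7f}")
print(f"Unperturbed Asp protonated fraction: {free_asp:.5f}")
```

Under these assumptions the Schiff base is essentially always protonated in the resting state, whereas an aspartate with an ordinary pKa would almost never hold a proton. This is consistent with the statement that Asp96 needs an unusually high pKa to serve as a proton donor.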

5.5 The photocycle of bR. In response to light, bR undergoes a series of transitions through intermediates K, L, M1, M2, N, and O, which have different lifetimes (τ) and different absorbance maxima, as shown. The photocycle is initiated by isomerization of the retinal from all-trans to 13-cis (bR → K, L), followed by transfer of a proton (L → M), the conformational change that switches the accessibility of the Schiff base from the extracellular side to the cytoplasmic side (M1 → M2), another proton transfer (M → N), and conversion of the retinal back to all-trans (N → O). From Neutze, R., et al., Biochim Biophys Acta. 2002, 1565:144–167. © 2002 by Elsevier. Reprinted with permission from Elsevier.

5.6 The first high-resolution structure of bR. This overview of the structure shows the seven TM helices labeled A–G, and the residues involved in proton translocation as well as the retinal. From Pebay-Peyroula, E., et al., Biochim Biophys Acta. 2000, 1460:119–132. © 2000 by Elsevier. Reprinted with permission from Elsevier.

5.7 Proton path in bR. (A) The ribbon diagram for the seven TM helices, labeled A–G, is marked to show proton transfer steps indicated by arrows numbered in chronological order from 1 to 5. Step 1 is release of a proton from the Schiff base to Asp85. In step 2 a proton is released to the extracellular medium, possibly via Glu204 or Glu194. In step 3 the Schiff base is reprotonated by Asp96. Step 4 is the reprotonation of Asp96 from the cytoplasmic medium. Step 5 is the final proton transfer step from Asp85 to the group involved in proton release at the extracellular side, either Glu204 or Glu194. From Neutze, R., et al., Biochim Biophys Acta. 2002, 1565:144–167. © 2002 by Elsevier. Reprinted with permission from Elsevier. (B) Details of the proton path on the extracellular side of the Schiff base reveal a network of hydrogen bonds between Asp85, Asp212, Arg82, Glu194, and Glu204 and discrete water molecules (red, labeled W). Interatomic distances are given in angstroms. From Pebay-Peyroula, E., et al., Biochim Biophys Acta. 2000, 1460:119–132. © 2000 by Elsevier. Reprinted with permission from Elsevier.

To relate the elegant structure of bR in its resting state to the dynamic events of its photocycle, structural information has been obtained for different intermediates in the photocycle. Two approaches have been used: crystallization of mutants prevented from completing the photocycle (such as D96N, which stops at the late M state), and “kinetic crystallography” of wild type crystals, which uses low temperatures and different wavelengths of light to trap a significant population of the molecules in the crystals in one state. By now over 20 high-resolution structures show significant movements of the cytoplasmic portions of helices F and G and the extracellular portion of helix C during the photocycle (Figure 5.8). Although there is disagreement over the details of the structural changes, together they complement the spectroscopic data to portray the steps of the cycle. Other methods, such as EPR and time-resolved wide angle x-ray scattering, continue to add insights into the dynamic mechanism of this light-driven proton pump, nature’s simplest photosynthetic machine.

Photosynthetic Reaction Center

Nearly a decade before the first high-resolution x-ray structure for bR was published, the 1988 Nobel Prize in Chemistry was awarded to Hartmut Michel, Johann Deisenhofer, and Robert Huber for the elucidation of the x-ray structure of the photosynthetic RC from Rhodopseudomonas viridis, the first high-resolution structure achieved for integral membrane proteins. When Michel and coworkers first crystallized the RC, the gene sequences encoding its protein constituents were not even available! The beautiful structure of this multicomponent complex provided specific descriptors of its protein domains including TM helices, along with the locations of cofactors involved in light absorption and electron transfer.

In photosynthesis light energy is converted into chemical energy when the absorption of a photon drives an electron transfer that is otherwise thermodynamically unfavorable. The subsequent passage of electrons through spatially arranged carriers is coupled with the expulsion of protons, just as it is in oxidative phosphorylation, thus providing the proton gradient that drives the ATP synthase. Plants have two types of photosystems, called PSI and PSII, which differ in the electron acceptors used. The much simpler photosynthetic RCs of purple bacteria are considered the ancestors of PSII. Recent x-ray structures of both PSI and PSII reveal common structural features with the RCs in spite of their enormous size and complexity.

RCs were discovered in photosynthetic purple bacteria and characterized by biophysical techniques such as EPR (see Box 4.2) and optical spectroscopy before they proved amenable to crystallization. Soon after elucidation of the structure from R. viridis (renamed Blastochloris viridis), another high-resolution structure was obtained for the RC from Rhodopseudomonas sphaeroides (renamed Rhodobacter sphaeroides). More recently, the x-ray structure for the RC from Thermochromatium tepidum was solved at 2.2 Å resolution. While the cofactor–protein interactions are nearly the same in all three complexes, they represent two types of RCs. The Rb. sphaeroides RC is a member of group I and contains three protein subunits called L, M, and H, along with ten cofactors (see Frontispiece). The other two RCs are members of group II, and they contain an additional subunit, a c-type cytochrome with its four heme cofactors (Figure 5.9). The three protein subunits, L (light), M (medium), and H (heavy), were named for their apparent molecular weights determined with SDS gel electrophoresis.


5.8 Significant conformational changes during the photocycle in bR. The ribbon diagram of the resting conformation (green) of bR is superposed with intermediate (black) and late (red) conformations to show the movements of the F, G, and C helices once the retinal is converted from all-trans (green) to 13-cis (magenta). bR is oriented with the cytoplasmic end (CP) at the top and the extracellular end (EC) at the bottom. From Andersson, M., et al., Structure. 2009, 17:1265–1275.
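The spectral states that accompany these conformations can be compared on an energy scale by converting each absorbance maximum quoted in the text to the energy of a photon of that wavelength, E = hc/λ. The following is a minimal sketch under stated assumptions: the wavelengths are those given for bR and its K, L, M, N, and O intermediates, the constants are standard SI values, and the function and variable names are hypothetical. The numbers describe photons at each λmax and say nothing about the free energy stored in any intermediate.

```python
# Physical constants (exact SI definitions).
PLANCK_J_S = 6.62607015e-34
LIGHT_SPEED_M_S = 2.99792458e8
AVOGADRO_PER_MOL = 6.02214076e23

# Absorbance maxima (nm) quoted in the text for each photocycle state.
LAMBDA_MAX_NM = {"bR": 570, "K": 590, "L": 550, "M": 410, "N": 560, "O": 640}


def photon_energy_kj_per_mol(wavelength_nm: float) -> float:
    """Energy of one mole of photons at the given wavelength, in kJ/mol."""
    joules_per_photon = PLANCK_J_S * LIGHT_SPEED_M_S / (wavelength_nm * 1e-9)
    return joules_per_photon * AVOGADRO_PER_MOL / 1000.0


energies = {state: photon_energy_kj_per_mol(nm) for state, nm in LAMBDA_MAX_NM.items()}

for state, kj in energies.items():
    print(f"{state:>2}  {LAMBDA_MAX_NM[state]} nm  {kj:6.1f} kJ/mol")
```

A photon at the 570 nm maximum of resting bR carries roughly 210 kJ/mol. The blue-shifted M state, in which the Schiff base is deprotonated, sits at the high-energy end of the list, and the O state at the low-energy end.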

5.9 The structure of the photosynthetic reaction center from Blastochloris viridis, a group II reaction center. The complex contains four protein subunits, L, M, H, and a cytochrome, and 14 cofactors (red). The TM helices are highlighted in yellow. Compare it with the group I reaction center shown in the chapter frontispiece. Redrawn from Nelson, D. L., and M. M. Cox, Lehninger Principles of Biochemistry, 4th ed., W. H. Freeman, 2005, p. 376. © 2005 by W. H. Freeman and Company. Used with permission.

113 Bundles and barrels

The Proteins negative charge on the periplasmic side and a net positive charge on the cytoplasmic side. The membrane-spanning The B. viridis RC has a size of ∼130 Å by ∼70 Å; its TM surface is very hydrophobic. There are no charged amino domain consists of five α-helices from L, five α-helices acids in this 30 Å-wide band around the center of the RC, from M, and one α-helix from H. The remainder of the and very few water molecules associate with it. Like other H subunit caps the structure on the cytoplasmic side, membrane proteins, it has tyrosine and tryptophan resi- and the c-type cytochrome lies on the periplasmic side dues distributed at the interfacial borders. Surprisingly, (Figure 5.9). The TM helices composed of 21 to 28 amino the polarity of the interior of the membrane-spanning acids cross nearly perpendicular to the plane of the domain is like that of the interior of soluble proteins, inter- membrane. Three helices from each L and M subunit are mediate between the polarities of amino acids exposed to nearly straight, while one is curved and one is bent more water and the hydrophobic interior of the bilayer. The ten than 30° at a proline residue and ends with a 3 helix (a 10 TM helices of L and M together have 74 polar side chains. slightly narrower helix with three amino acids per turn). Furthermore, most of these polar residues do not appear The structural similarity of L and M (Figure 5.10) gives to be involved in forming hydrogen bonds, as there are at a high degree of two-fold symmetry in the TM domain most two hydrogen bonds between any pair of TM heli- in spite of only 26% sequence identity. The cytochrome ces. Their predominant roles seem to be interactions with with its two pairs of hemes also has two-fold symme- cofactors and protein subunits. try, unrelated to that of Land M. 
Its compact structure Comparison of related RC sequences indicates that consists of five segments: the N-terminal segment (resi- residues buried in the interior of the structure are con- dues C1 to C66); the first heme-binding segment (C67 to served more than are residues on the surface. The M, C142); a connecting segment (C143 to C225); the second L, and H subunits of the Rb. sphaeroides RC have 59%, heme-binding segment (C226 to C315); and the C-termi- 49%, and 39% homology to subunits in B. viridis, respec- nal segment (C316 to C336). tively, and are very similar in structure. While some dif- As the first available detailed structure of integral ferences in amino acid sequence lead to differences in membrane proteins, the RC confirmed expectations about the interactions of the cofactors with the peptide chains, distribution of amino acids on its surface, yet it revealed the complex has the same approximate two-fold symme- a new concept of interior polarity. The surface of the RC try. The site-specific mutants first available in Rb. sphaer- complex is polar in the peripheral subunits, with a net oides revealed the roles of GluL212 and SerL223 for reduction and protonation of the quinone and TyrM210 for efficient electron transfer.

Lipids H L Based on its size and shape, the RC is predicted by EPR to have 30–35 annular lipids (see Chapter 4). However, most of the lipids are replaced by detergent during puri- fication. For crystallization, the RC is solubilized with LDAO (N,N-dimethyl-dodecylamine-N-oxide), and a few detergent molecules are included in the crystal structure. While it is often difficult to ascertain whether an acyl chain detected in the structure is from a lipid or a deter- gent, the x-ray structures give evidence for specific lipids (see Chapter 8), including a cardiolipin and a PE, which cyt M fit closely into hydrophobic grooves at the surface of the protein exposed to the nonpolar membrane domain.

The Cofactors

The proteins provide a scaffold for the cofactors, holding them in the same spatial arrangement in all three RCs crystallized. The RC core has ten cofactors: • Four bacteriochlorophylls (BChl), which resemble 5.10 The peptide backbones of L, M, H, and cytochrome heme except for the replacement of iron by Mg2+ subunits of the photosynthetic reaction center from B. viridis. ions, a cyclopentenone ring fused to one pyrrole From Deisenhofer, J., et al., Nature. 1985, 318:618–624. © ring, and different substituents off two of the pyr- 1985. Reprinted by permission of Macmillan Publishers Ltd. role rings.

114 Helical bundles

• Two bacteriopheophytins (BPh), which are BChl roles (see below) and the primary quinone is in a more with two protons in place of the Mg2+. hydrophobic environment than the secondary quinone. • Two quinones (one ubiquinone and one menaqui- After being reduced and protonated, the secondary qui- none in B. viridis). none (now quinol) diffuses from the RC, its leaving facili- • A nonheme ferrous ion. tated by nearby carboxylate groups. The similarities in the • A carotenoid, which is a largely linear C40 polyene arrangement of cofactors in all photosystems, including such as β-carotene. the iron–sulfur type PSI complexes, allow definition of a common motif: a dimer of tetrapyrrole molecules flanked There are two types of both BChl and BPh, a and b. by four monomeric tetrapyrrole molecules with an iron The a type is found in the RC from Rb. sphaeroides and at the center. has either a phytyl or geranylgeranyl side chain, whereas The cytochrome of group II RCs has four hemes, the b type, found in the RC from B. viridis and T. tepi- located in pairs on the two heme-binding segments of dum, has only a phytyl side chain and has one more C=C the polypeptide, each having a helix followed by a turn in the side chain of ring II. and the Cys-X-Y-Cys-His sequence typical of c-type The cofactors form two symmetrically related branches cytochromes. The hemes lie parallel to the axis of the within the hydrophobic environment of the closely packed helix with thioether bonds between each heme and the TM helices (Figure 5.11). At the top, two molecules of two cysteines (Figure 5.12). The fifth ligand to the iron BChl are positioned so close together that the edges of is His, and for three of the four hemes the sixth ligand their tetrapyrrole rings overlap. Called the special pair, is a methionine residue within the helix. The reduction they receive the photon of light and release an electron potentials for the hemes vary, and paradoxically the in the primary event of photosynthesis. 
Each branch has another BChl (called the accessory BChl), a BPh, and a quinone. The nonheme iron is between the two quinones. The two branches follow the same local symmetry dis- played by the L and M chains. The symmetry is not per- fect, and the two branches have different electron transfer properties – in fact, electron transfer uses only the branch that associates with the L subunit. The cofactors are dif- ferentiated by groups of the protein that alter their envi- ronments. For example, the two quinones play different

PB PA

Crt BB BA

HB HA

QB Fe QA

5.11 The arrangement of the RC cofactors. The tetrapyrrole rings of BChl-b, BPh-b, and the quinone 5.12 Structure of the tetraheme cytochrome subunit of the headgroups follow the same local symmetry displayed by B. viridis reaction center. Location of the hemes is apparent the L and M chains. The figure shows the special pair, Pa when the polypeptide backbone of the cytochrome subunit and Pb (coral); the accessory BChl molecules, Ba and Bb is represented by a thread, while the heme groups are (rose); the two BPh, Ha and Hb (cyan); the ubiquinones, represented by space-filling models. The crosses mark the Qa and Qb (yellow); carotenoid, Crt (purple); and non-heme visible positions of the sulfur atoms in the thioether side iron atom (gray). The electron path utilizes only the A half, chains for the heme bridges, as well as free cysteines. indicated by the arrows. From Jones, M. R., et al., Biochim From Knaff, D. B., et al., Biochemistry. 1991, 30:1303– Biophys Acta. 2002, 1565:206–214. © 2002 by Elsevier. 1310. © 1991 by American Chemical Society. Reprinted Reprinted with permission from Elsevier. with permission from American Chemical Society.

115 Bundles and barrels

arrangement of hemes by potential does not follow the internal symmetry of the cytochrome. The closest heme to the special pair is heme-3, with the highest reduction potential (370 mV). However, in order of increasing distance from the special pair, the heme reduction potentials follow the sequence high, low, high, low.¹ Spectroscopic measurements indicate that electrons are transferred through heme-2 and heme-3 to the special pair, perhaps through heme-4, which is located between them but has a much lower reduction potential.

Antennae

To maximize the absorption of light, the membrane contains about 100 times more BChl molecules than RC complexes. These other chlorophylls function as antennae to collect photons and pass the energy on to the special pair, greatly increasing the efficiency of each RC. These chromophores are organized by light-harvesting proteins into complexes called LH1 and LH2. LH1 is the core complex, found in a fixed stoichiometry with the RC. LH2 is peripheral, and its synthesis depends on factors such as light intensity (Figure 5.13). LH2 absorbs light at shorter wavelengths than LH1, so it rapidly passes the energy to LH1, which passes it to the RC. Both LH1 and LH2 contain many copies of two short protein subunits called α and β (in LH2, α has 56 amino acids and β has 45 amino acids), each containing a single TM helix.

5.13 Model of the antenna system in purple photosynthetic bacteria. Viewed from above the plane of the membrane, each photosynthetic reaction center is surrounded by a light-harvesting complex called LH1. In addition, several LH2 complexes associate with it and with each other to rapidly transfer the excitation generated by absorption of light. From Voet, D., and J. Voet, Biochemistry, 3rd ed., John Wiley, 2004, p. 878. © 2004. Reprinted with permission from John Wiley & Sons, Inc.

In low-resolution EM images of native B. viridis membrane, the RC appears to be surrounded by a ring of 15–17 LH1 molecules. A crystal structure of LH2 shows a double cylinder formed of an inner ring of its α subunits and an outer ring of its β subunits, with eight- or nine-fold symmetry, depending on the organism of origin. A ring of nine-fold symmetry has 18 BChl molecules between the rings, arranged like a water wheel, and nine more BChl molecules between the helices of its outer wall, along with accessory pigments such as carotenoids (Figure 5.14a). However, both theoretical considerations and spectroscopic data suggest that the ring is not intact but is C-shaped. Dimers of LH1 observed at low resolution indicate that two C-shaped complexes face each other to form an S-shape (Figure 5.14b). Dimers of LH1 could be isolated only in the presence of another membrane protein called PufX, named for photosynthetic unit formation. This protein, which is required for anaerobic photosynthetic growth, associates closely in a 1:1 ratio with RC–LH1 complexes.

The Reaction Cycle

The overall reaction carried out by the photosynthetic RC is the reduction of ubiquinone by cytochrome c2, using the energy from light to initiate the electron transfer. It occurs through the following steps (Figure 5.15):

(1) The light energy absorbed via antenna complexes LH1 and LH2 is transmitted to the RC, where it promotes a charge separation with the oxidation of the special pair of BChl and the two-electron reduction of quinone to quinol. When the special pair absorbs light, it gives a distinct optical absorption band in the near infrared. The cofactors in one branch are close enough to delocalize the electrons in a conjugated system, so electron transfer is fast – within 200 ps the electron reaches the primary quinone. The primary quinone accepts one electron and releases it to the secondary quinone, which then accepts a second electron from a second photon event. Next the secondary quinone picks up two protons from the cytoplasm and enters the membrane’s quinone pool as quinol (QBH2).

(2) The quinol diffuses in the membrane to the cytochrome bc1 complex, where it is reoxidized, with concomitant release of electrons to periplasmic electron carriers, either cytochrome c2 or in some species (including T. tepidum) the abundant high-potential iron–sulfur protein (HiPIP). The cytochrome bc1 complex is similar to complex III in the mitochondrial membrane. It releases protons into the periplasmic space, from which they reenter the cytosol via the F1F0-ATP synthase (see Chapter 11) to drive the synthesis of ATP.

¹ In B. viridis, the reduction potentials are heme-1, –60 mV; heme-2, +300 mV; heme-3, +370 mV; and heme-4, +10 mV.
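The heme potentials quoted in the footnote can be turned into rough free-energy changes with ΔG = −nFΔE. The short Python sketch below does this for single-electron hops. It treats the midpoint potentials as if they were standard potentials, which is a simplifying assumption, and the function and variable names are illustrative rather than taken from the text.

```python
# Rough free-energy change for one-electron transfer between hemes,
# using dG = -n * F * dE. Midpoint potentials are the B. viridis values
# quoted in the footnote; treating them as standard potentials is an
# assumption made only for this illustration.

FARADAY_C_PER_MOL = 96485.33212  # Faraday constant, C/mol

HEME_POTENTIALS_MV = {
    "heme-1": -60.0,
    "heme-2": 300.0,
    "heme-3": 370.0,
    "heme-4": 10.0,
}


def delta_g_kj_per_mol(donor_mv, acceptor_mv, n_electrons=1):
    """Free-energy change (kJ/mol) for moving n electrons from donor to acceptor."""
    delta_e_volts = (acceptor_mv - donor_mv) / 1000.0
    return -n_electrons * FARADAY_C_PER_MOL * delta_e_volts / 1000.0


def path_free_energies(path):
    """Return (donor, acceptor, dG) for each consecutive hop along a path of heme names."""
    return [
        (a, b, delta_g_kj_per_mol(HEME_POTENTIALS_MV[a], HEME_POTENTIALS_MV[b]))
        for a, b in zip(path, path[1:])
    ]


if __name__ == "__main__":
    # Hypothetical route suggested in the text: heme-2 -> heme-4 -> heme-3
    for donor, acceptor, dg in path_free_energies(["heme-2", "heme-4", "heme-3"]):
        print(f"{donor} -> {acceptor}: {dg:+.1f} kJ/mol")
```

Under these assumptions the hop from heme-2 to heme-4 is uphill by roughly 28 kJ/mol, while the net step from heme-2 to heme-3 is downhill by about 7 kJ/mol, in keeping with the observation that heme-4 has a much lower reduction potential than its neighbors.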


5.14 Organization of the light-harvesting complexes. (A) Crystal structure of LH2 from Rhodopseudomonas acidophila, viewed from the periplasmic side. The complex has nine-fold symmetry, shown with α subunits in yellow, β subunits in green, BChl in gray, and carotenoids in orange. (B) Projection map of membranes from Rb. sphaeroides grown under photosynthetic conditions in the presence of nitrate, in which two C-shaped LH1–RC complexes dimerize to an S shape. From Vermeglio, A., and P. Joliot, Trends Microbiol. 1999, 7:435–440. © 1999 by Elsevier. Reprinted with permission from Elsevier.

5.15 Illustration of the photosynthetic electron transfer reactions in purple bacteria. The RC (pink) accepts energy from the antenna complexes LH1 and LH2 (purple). After charge separation in the RC complex reduces the secondary quinone (QB) to quinol, it diffuses through the membrane to the cytochrome bc1 complex (green), where it is oxidized back to quinone. Cytochrome bc1 transfers the electrons to soluble electron carrier proteins (blue), either cytochrome c2 or HiPIP, which carry them back to the RC. Cytochrome bc1 also generates a proton gradient used by the ATP synthase (H+-ATPase, yellow) to drive the synthesis of ATP. Redrawn from Nogi, T., and K. Miki, J Biochem (Tokyo). 2001, 130:319–329. © 2001. Reprinted with permission from the Japanese Biochemical Society. Inset: midpoint potentials (in volts) of cofactors involved in electron transfer in purple bacteria. The dotted line represents the absorption of light by the special pair of BChl, and the solid line represents the energy transfer steps in the RC to the quinones, QA and QB (shown in Figure 5.11). The dashed line represents steps that occur outside the RC. Redrawn from Allen, J. P., and J. C. Williams, FEBS Lett. 1998, 438:5–9. © 1998 by the Federation of European Biochemical Societies. Reprinted with permission from Elsevier.


5.16 Recognition of electron carriers by the cytochrome subunit of the RC. Panels: (A) Tch. tepidum cytochrome subunit, hydrophobic surface around heme-1; (B) Tch. tepidum HiPIP, hydrophobic surface around [4Fe-4S]; (C) Blc. viridis cytochrome subunit, negatively charged surface around heme-1; (D) Blc. viridis cytochrome c2, positively charged surface around heme c. In the case of HiPIP, the interacting surfaces on both proteins are hydrophobic ((A) and (B)), while the recognition of cytochrome c2 is electrostatic ((C) and (D)), as the B. viridis cytochrome c2 has a large basic surface and the cytochrome subunit of RC has a large acidic surface. Negatively charged surfaces are red, and positively charged surfaces are blue. The green dotted circles indicate the interacting surfaces involved in recognition. From Nogi, T., and K. Miki, J Biochem (Tokyo). 2001, 130:319–329. © 2001. Reprinted with permission from the Japanese Biochemical Society.

(3) The periplasmic electron carriers bring electrons to the RC to reduce the special pair. Cytochrome c2 is very similar to mitochondrial cytochrome c and diffuses along the surface of the membrane to the RC. The Rb. sphaeroides RC has been co-crystallized with cytochrome c2, which appears to bind quite near the special pair and to reduce it directly. In group II RCs, the soluble electron carriers dock on the cytochrome subunit of the RC and reduce its hemes. The binding sites envisioned on crystal structures of the separate redox components indicate that the cytochrome c2 interaction with the RC cytochrome is electrostatic, while that between HiPIP and the T. tepidum RC is hydrophobic (Figure 5.16).

Completion of the cycle takes less than 100 μs and gives a quantum yield close to one, meaning that for almost every photon absorbed by the RC, one electron is transferred. There is much more to learn about how the RC regulates the electrochemical properties of its cofactors and precisely how it takes up protons from the cytosol (when it reduces quinone) and electrons from reduced periplasmic carriers (to reduce the special pair at the end of the cycle). To address these and other questions, researchers are now exploring structural differences between RCs in the light and those adapted to darkness, between wild type and mutants, and between isolated RC and the complex consisting of RC and its partners of the photosynthetic membrane.

β-BARRELS

While the structures of bR and the photosynthetic RC came to represent the image of membrane proteins, a wholly different picture of membrane protein structure was emerging in studies of bacterial porins. Early data from circular dichroism and infrared spectroscopy

118 β-Barrels

5.17 Tracing of the porin from Rhodobacter capsulatus, the first porin with an x-ray crystal structure. (A) View from the side of the monomer, with chain ends marked by dots. (B) View of the trimer from the top. From Weiss, M. S., et al., FEBS Lett. 1990, 267:379–382. © 1990 by the Federation of European Biochemical Societies. Reprinted with permission from Elsevier.

indicated that these pore-forming proteins contain β-structure and little or no helical content; EM and x-ray diffraction studies suggested they span the membrane as β-barrels. The first x-ray crystal structure of a TM β-barrel was that of the porin from Rhodobacter capsulatus (Figure 5.17) and was quickly followed by the high-resolution structures of other bacterial porins.

The cell envelope of Gram-negative bacteria consists of two membranes separated by an aqueous compartment called the periplasm and a thin layer of peptidoglycan, a net-like polymer of amino acid and sugar residues that confers structural stability (see Figure 1.1b). The β-barrel proteins are found in the outer membrane, while most proteins in the inner (plasma) membrane are bundles of α-helices. This difference is attributed to mechanisms of biogenesis, since the more hydrophilic β-barrel proteins are secreted into the periplasm before incorporation into the outer membrane, while the inner membrane proteins are incorporated directly into the bilayer from the translocon (see Chapter 7). New types of β-barrel proteins with a surprising diversity of function have been discovered and many more can be expected, since in E. coli alone the outer membrane is estimated to have about 100 proteins. β-barrel proteins are predicted to make up 2% to 3% of the proteome of Gram-negative bacteria. They also are found in mitochondria and chloroplasts of eukaryotes but not in archaea or in Gram-positive bacteria, except Mycobacteria.

Due to its extended peptide backbone, a β-strand needs only seven amino acid residues to cross the nonpolar domain of the membrane. Typically TM β-strands have 9–11 residues, usually tilted ∼45° from perpendicular to the plane of the bilayer, with observed tilts varying from 20° to 45°. To satisfy the H-bonding of the carbonyl and amino groups along their peptide backbones within the nonpolar domain, β-strands must partner with other β-strands. When the strands form a circular pattern in a barrel, none is left without a partner. The interstrand H-bonding makes the structure rigid, as well as very stable: for a small β-barrel of eight TM strands, the H-bonding contributes a stabilization of around 40 kcal mol–1 (eight strands of ∼10 amino acids each, forming 80 H-bonds × 0.5 kcal mol–1 per H-bond).

Among known β-barrel structures of membrane proteins, with one exception (VDAC, see below) the number of β-strands is an even number that varies from 8 to 24. In the smallest of these proteins, the center is filled with polar residues, while barrels of intermediate sizes (16–18 strands) contain aqueous pores. The larger barrels with 22 strands have “hatch” or plug domains that close their channels. An even number of strands allows the N and C termini to meet since the barrels are made from meandering antiparallel strands, which means all of them are hydrogen-bonded to their next neighbors along the peptide chain. With strands connected by tight turns or loops, the twisted β-sheet rolls into a cylinder. In the known structures, the periplasmic ends of the β-strands form tight turns and the other ends are loops of varying lengths, either pointing into the extracellular space or folding back inside the barrel.

In general, the β-strands are amphipathic, with polar side chains pointing to the aqueous interior of the protein alternating with nonpolar side chains contacting the hydrophobic bilayer. This means the composition of the lipid-exposed surfaces of β-barrels, like those of α-helical integral membrane proteins, is rich in aromatic and nonpolar amino acids (Phe, Tyr, Trp, Val, Leu, Ile; Figure 5.18). At the two bilayer interfaces, aromatic residues are ∼40% of the lipid-exposed residues. These aromatic side chains form girdles whose intermediate


5.18 The structure of OmpX, a small, closed β-barrel protein. Because OmpX is monomeric, all of its exposed side chains go into the bilayer. Additionally, four of its eight β-strands protrude on the outside of the cell. The barrel interior is filled with polar residues that form a hydrogen-bonding network. (A) Ribbon drawing shows nonpolar side chains (yellow) on the surface of OmpX. From Schulz, G. E., Curr Opin Struct Biol. 2000, 10:443–447. © 2000 by Elsevier. Reprinted with permission from Elsevier. (B) Topology model shows the primary structure and distribution of amino acids in OmpX, with lipid-exposed (red), interior (black), and external loops (green). The y-axis gives the transbilayer location as the distance from the bilayer mid-plane (Å), with the center of the bilayer as 0. To show the interstrand hydrogen bonds between each pair of strands, the eighth strand is repeated on the left (gray). (C) Bar graphs show the relative abundance of amino acid residues on the external surface (left) and in the interior (right). (B) and (C) from Wimley, W. C., Curr Opin Struct Biol. 2003, 13:404–411. © 2003 by Elsevier. Reprinted with permission from Elsevier.
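The two estimates quoted for TM β-strands, the stabilization from interstrand H-bonds and the number of residues needed to span the nonpolar domain, are simple arithmetic and can be checked with a short Python sketch. The rise of about 3.3 Å per residue for an extended strand is an assumed typical value that is not given in the text, and the function names are illustrative.

```python
import math

# Assumed typical rise per residue along an extended beta-strand (not from the text).
RISE_PER_RESIDUE_A = 3.3
# From the text: an untilted strand needs about seven residues to cross the nonpolar domain.
MIN_RESIDUES_UNTILTED = 7


def residues_to_span(tilt_deg, thickness_a=MIN_RESIDUES_UNTILTED * RISE_PER_RESIDUE_A):
    """Residues needed for a strand tilted tilt_deg from the bilayer normal
    to cross a nonpolar slab of the given thickness (in angstroms)."""
    return thickness_a / (RISE_PER_RESIDUE_A * math.cos(math.radians(tilt_deg)))


def hbond_stabilization_kcal(n_strands, residues_per_strand, kcal_per_hbond=0.5):
    """Stabilization estimate following the text's count of one interstrand
    H-bond per TM residue, each worth kcal_per_hbond (kcal/mol)."""
    n_hbonds = n_strands * residues_per_strand
    return n_hbonds * kcal_per_hbond


if __name__ == "__main__":
    print(f"Eight strands x 10 residues: {hbond_stabilization_kcal(8, 10):.0f} kcal/mol")
    for tilt in (0, 20, 45):
        print(f"tilt {tilt:2d} deg: {residues_to_span(tilt):.1f} residues")
```

With these inputs an eight-stranded barrel with 10 residues per strand gives 40 kcal/mol, and a strand tilted 45° needs about 10 residues, in line with the 9–11 residues typical of TM β-strands.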

polarity helps define the two nonpolar–polar interfaces of the membrane, also observed in α-helical membrane proteins. The exposed loops at the extracellular surface provide binding sites for colicins and bacteriophage, as well as recognition sites for antibodies. Interestingly, half of the strands of the small β-barrel called OmpX protrude into the external medium and nonspecifically bind proteins with exposed β-strands (see Figure 5.18a and Box 5.1). Because of this unusual feature, OmpX, which is not a porin, is called an adhesion protein. It is


BOX 5.1 NMR DETERMINATION OF β-BARREL MEMBRANE PROTEIN STRUCTURE

Some of the first membrane proteins to have structures solved by solution NMR (see Box 4.2) are β-barrel membrane proteins. Structures of β-barrels are easier to solve by NMR than those of helical bundles because (1) the 1H chemical shift dispersion is larger for β-sheets than for α-helices, especially with alternating hydrophilic/hydrophobic residues, and (2) β-barrels have higher thermal stability, so they can withstand higher temperatures for the longer times needed to get well-resolved NMR spectra. Furthermore, the two β-barrel proteins described here could be overexpressed in E. coli and purified in denatured form from inclusion bodies before refolding in detergent micelles (see Chapter 7), resulting in a mixed micelle particle size of 60–80 kDa. Uniform labeling of the proteins with 2H, 13C, and 15N allows detection of the protein NMR signals with little or no interference from the signals of the unlabeled detergent molecules.

OmpX from E. coli

Triply labeled OmpX, a 148-residue protein, was solubilized from inclusion bodies with guanidinium chloride and reconstituted in dihexanoyl-PC (DHPC) micelles. Comparison of the two-dimensional COSY (Correlation Spectroscopy) and TROSY spectra (Figure 5.1.1a) shows the enhancement

5.1.1 OmpX structure solved by solution NMR. (A) NMR spectra of OmpX in DHPC micelles obtained using 1H-15N COSY (i) and 1H-15N TROSY (ii), plotted as 15N versus 1H chemical shift (ppm). (B) Structure of OmpX determined by NMR. When the NMR assignments are mapped onto the x-ray structure of OmpX (i), the barrel portion is well-structured (red) and three loops are disordered (yellow). The flexibility of the loops is clear in the NMR structure represented by the superposition of 20 conformers (ii). From Fernandez C., et al., FEBS Lett. 2001, 504:173–178. © 2001 by the Federation of European Biochemical Societies. Reprinted with permission from Elsevier.


5.1.2 OmpA structure solved by NMR. (A) TROSY-based 1H-15N spectrum of the TM domain of OmpA in DPC micelles, with assigned resonances labeled. (B) NMR structure of OmpA. The NMR solution structure of OmpA is represented by the superposition of the ten conformers having lowest energy (i). Again, the β-barrel (red) is well-defined, while the loops are disordered. Four flexible loops are resolved in the ribbon diagram based on the NMR results (ii). (C) Comparison of the NMR (red) and x-ray (blue) structures for OmpA shows good agreement on the β-strands and considerable differences in the loops. From Arora, A., et al., Nat Struct Biol. 2001, 8:334–338. © 2001. Reprinted by permission of Macmillan Publishers Ltd.


obtained with TROSY, which allowed the eight-stranded fold of the polypeptide backbone of OmpX to be determined from 107 Nuclear Overhauser Effect (NOE)-derived distance constraints and 140 dihedral angle constraints. The structure was further refined with selective protonation of the methyl groups of Val, Leu, and Ile on a perdeuterated background, which enabled assignment of 526 NOE distance constraints. When 20 NMR conformers are superimposed, the result clearly reveals the β-barrel with its flexible loops (part ii of Figure 5.1.1b). The ribbon diagram in part i of Figure 5.1.1b maps the NMR results on the x-ray structure and distinguishes the well-structured regions (red) from the disordered loops (yellow). The NMR result is strikingly similar to the structure from x-ray crystallography and in particular confirmed the extension of β-strands on the exterior of the protein.

OmpA TM domain from E. coli

The β-barrel domain of OmpA (residues 172–325) was purified after denaturation in urea and refolding into dodecylphosphocholine (DPC) micelles. The protein was labeled with 15N, 13C, and 2H, and TROSY experiments were carried out at 600 and 750 MHz. The protein in a large excess of DPC gave the best NMR spectra, shown in Figure 5.1.2a with several of the assigned resonances labeled. TROSY experiments were carried out with several specific amino acid-labeled samples to aid in assignments. The backbone fold of the OmpA TM domain was initially calculated from 91 NOE distance constraints and 142 torsional angle constraints. It was refined by introducing 116 H-bond constraints between adjacent β-strands that were identified in the initial fold calculations. The structure of the eight-stranded β-barrel is well defined. Figure 5.1.2b shows ten superimposed conformers (i) and the corresponding ribbon diagram of the solution structure (ii) of OmpA. Like the OmpX structure, the NMR structure of OmpA shows much agreement with its x-ray structure, illustrated in Figure 5.1.2c, where the solution structure determined by NMR (red) is overlain with the x-ray crystal structure (blue). As NMR gives information about protein dynamics, some of the poorly defined loops are thought to have intrinsic high mobility. A study of the dynamics of the backbone from 15N relaxation times indicates the H-bonded core is not completely rigid but moves on the microsecond–millisecond time scale.

Several other β-barrel membrane protein structures have been solved by NMR, including the eukaryotic voltage-dependent anion channel VDAC, the outer membrane protein OmpG, and two conformations of the outer membrane enzyme PagP.

induced by stress and is thought to be a part of the virulence mechanism of E. coli.

There is much variation in quaternary structure among the known β-barrels. Several of them are monomers, while all the porins are homo-oligomers (usually trimers) with very tight interactions between the subunits. The enzyme activity of outer membrane phospholipase A (OMPLA; see Chapter 9) is regulated by its quaternary structure: it is only active as a homodimer. A mycobacterial porin forms a β-barrel similar to those of anthrax toxin protective antigen and α-hemolysin (see Chapter 4) in that each subunit contributes a β-hairpin to make the 16-stranded barrel. The rich diversity of structure and function among β-barrels continues to be uncovered, yet the paradigm for this group of membrane proteins is the porins.

Porins

The outer membranes of Gram-negative bacteria protect the cells from harmful agents while allowing nutrient uptake via porins and other transport proteins. Porins function as passive diffusion channels and allow rapid diffusion of their solutes, even at 0°C. Porins are grouped in two categories: general porins, like OmpF and OmpC, and specific porins, like PhoE (phosphoporin), LamB (maltoporin), ScrY (sucrose porin), and Tsx (the nucleoside channel). The channels of general porins do not discriminate among solutes that are hydrophilic, under ∼600 Da, and not highly charged, which allows them to take up many nutrients such as mono- and disaccharides. In contrast, the specific porins have channels that are selective for their solutes, although the line of demarcation is blurred as described below for PhoE. Note that aquaporin, which is a TM channel for water molecules in plasma membranes (see Chapter 12), is not a β-barrel.

Porins were first detected by the permeability of the outer membrane of Gram-negative bacteria to hydrophilic antibiotics, which allowed them to be assayed by the rate of hydrolysis of β-lactams (penicillin and its derivatives) in the periplasm of intact cells. Channel activity of porins is evident when the purified proteins are reconstituted with lipids: porins make lipid vesicles permeable to small, hydrophilic solutes and make voltage-gated conductance channels in black films (see Chapter 3). Porins were first purified as SDS-resistant two-dimensional crystalline aggregates. They can also be solubilized by extraction with nonionic detergents, and they bind 0.6 g detergent per gram of protein (around 200 detergent monomers per protein trimer). These remarkably stable proteins are highly resistant to proteases, chaotropic agents, and most organic solvents, in addition to many detergents.

OmpF and OmpC

Typically Gram-negative bacteria have ∼10⁵ copies of general porins per cell, with the number of species of porins varying in different strains. The outer membrane of E. coli K12 is dominated by OmpF and OmpC porins, both homotrimers of around 115 kDa whose levels are controlled by osmolarity of the growth medium and other environmental parameters. This regulation is carried out by a two-component system made up of EnvZ, the protein sensor of the environment, and OmpR, the transcriptional activator for their two genes. OmpC has been called osmoporin, because its expression is induced by high osmolarity as well as high pH and high temperature. On the other hand, higher levels of OmpF are expressed in a medium of low osmolarity. In proteoliposomes OmpF exhibits somewhat faster rates of disaccharide uptake than does OmpC; in planar bilayers OmpC is slightly more cation selective than OmpF. With 60% amino acid identity, it is not surprising that the two structures are very similar.

The first crystal structure of porin from Rb. capsulatus (see above) was quickly followed by the high-resolution structures of E. coli porins OmpF, PhoE, and LamB, and later OmpC. Now there are many structures for porins from other bacteria, along with other specific porins. These porins are homotrimers, in which each subunit forms a β-barrel (Figure 5.19). At the borders of the barrel two rings of aromatic residues interact with the interfacial regions of the lipid bilayer. The pore through each subunit is delineated by one or more of the loops that fold back into the channel, forming a constriction belt or eyelet. The OmpF barrel consists of 16 antiparallel β-strands tilted by 45°, with a salt bridge between the amino and carboxyl termini to complete the cyclic structure.

OmpF is probably the most thoroughly studied porin. Crystal structures of the wild type and several mutant OmpF proteins show how the dimensions of the channel limit the size of solutes, while the constellation of


5.19 X-ray structure of OmpF porin. Ribbon diagram of the OmpF trimer is shown from the periplasmic surface (A) and from the plane of the membrane (B). In (a) the lipid bilayer (light blue) is indicated with the extracellular and periplasmic sides marked. From Yamashita, E. et al., EMBO J. 2008, 27:2171–2180.


charged residues in the pore determines other transport properties. Loop 3 in OmpF, which has a highly conserved sequence motif PEFGG, constricts the pore at the middle of the barrel to 15 × 22 Å (Figure 5.20). With two acidic residues on Loop 3 across from a cluster of basic residues on the opposite wall, Loop 3 produces a local transverse electric field that accounts for the cation specificity observed in conductance studies (see Figure 3.16). Loss of five of the charges by mutations of these residues gave a larger pore size (larger radius in the crystal structure and higher uptake rates for disaccharides in the liposome-swelling assay) but lower single channel conductance. Even in the double mutant with both acidic residues mutated (D113N/E117Q), the conductance drops by 50% and the cation selectivity is removed. The role of the acidic residues in drawing cations through the constriction was illustrated by Brownian dynamics simulations.

5.20 The constriction zone of OmpF porin. (A) Viewed from the plane of the membrane, the front β-strands have been removed to show the constriction of the OmpF porin monomer by Loop 3 (orange). The black lines delineate the membrane bilayer with the extracellular side (EC) on top and the periplasmic side (Peri) on the bottom. (B) A view from the periplasmic side shows the Loop 3 eyelet with two acidic residues (D113 and E117, red) across from a cluster of basic residues (K16, R42, R82, and R132, blue) in the opposite barrel wall. From Delcour, A. H., Biochim Biophys Acta. 2009, 1794:808–816.

The pore size and overall dimensions of OmpC are very like OmpF, with the same five charged residues in the eyelet of OmpC, even though its cation selectivity is slightly higher and its single channel conductance is lower. The pore lining at the extracellular side differs considerably, and external loops have different conformations, so perhaps they influence conductance through the pore.

VDAC, a mitochondrial porin

The outer membrane of mitochondria shares many characteristics of the outer membranes of Gram-negative bacteria, so it is not unexpected that it has a general porin called the voltage-dependent anion channel (VDAC). The structure of VDAC from human mitochondria, solved by a combination of x-ray crystallography and NMR, is a β-barrel with 19 strands (Figure 5.21). While the first 18 β-strands are antiparallel, the last strand is parallel to the first strand. Otherwise, it shares the general architecture of the E. coli porins, with features such as two aromatic girdles at the bilayer perimeters. VDAC has an N-terminal α-helix that folds into the barrel to cross the channel midway within the pore, in the position of Loop 3 in OmpF and OmpC. VDACs from several species exist in different oligomerization states, from monomers to hexamers or higher oligomers. NMR relaxation rates for human VDAC in the detergent LDAO are indicative of a monomer–dimer equilibrium. VDACs share high sequence identity, suggesting this first structure will be representative of the family.

Specific Porins

Sharing many characteristics of general porins, the specific porins add the ability to discriminate among solutes. Since they carry out passive diffusion, they cannot rely on energized conformational changes to release substrates from high-affinity binding sites (see below and Chapter 11). Rather, the specific porins have low affinities for their substrates and have achieved selectivity through features of their channel architecture. Their specificity is apparent only with larger molecules because small solutes easily permeate the nonspecific channel interiors. Clues to their specific functions come from their regulation: for example, in E. coli, PhoE protein is induced under phosphate starvation and LamB protein is induced by growth on the carbon source maltose.

PhoE, the Phosphoporin

In spite of the very high homology between PhoE, OmpF, and OmpC, PhoE has a specific transport function. When their functions are compared in whole cells, mutants constructed to have only PhoE take up phosphate and phosphorylated compounds much more


5.21 The structure of VDAC, a β-barrel with 19 β-strands. (A) The structure of VDAC from human mitochondria is shown from the plane of the membrane as a ribbon presentation (rainbow coloring from blue at the N terminus to red at the C terminus) with the β-strands numbered. Note the parallel strands β1 and β19. The residues from Tyr7 to Val17 make an α-helix inside the barrel. (B) The view from the top shows the α-helix is completely enclosed by the β-barrel. From Bayrhuber, M., et al., Proc Natl Acad Sci USA 2008, 105:15370–15375.

efficiently (with a nine-fold decrease in Km of transport) than mutants with only OmpF or OmpC. Conductance studies with purified PhoE protein demonstrate a strong anion selectivity. Furthermore, polyphosphates such as ATP inhibit the flux of small ions through reconstituted PhoE channels.

Like OmpF and OmpC, the PhoE protein is a 16-stranded β-barrel, with the differences between it and OmpF confined to the loop regions and a single short turn. The constriction zone of the PhoE pore has two additional basic groups, which make the calculated electrostatic potential of the pore more strongly positive than that of OmpF. Mutation of one of these to an acidic residue (K125E) changes the ion selectivity of the PhoE pore. Interestingly, when OmpF was engineered to have the two additional lysine residues, it did not convert to an anion-specific pore, indicating that other residues in PhoE must contribute to its selectivity. Even though it lacks a well-defined specific binding site, PhoE facilitates the transport of phosphorylated compounds into cells, and its blockage by polyphosphates is very similar to the inhibition of the LamB pore by maltodextrins.

LamB, the Maltoporin

LamB protein is named for its role as the receptor for phage λ. It is required for growth of E. coli in chemostats with limiting maltose as the sole carbon source, and was shown to be specific for maltose and maltodextrins by comparing rates of sugar uptake in the liposome-swelling assay (see Figure 3.26). The weak affinity for maltodextrins (Kds in the low mM range) can be quantitated by their inhibition of glucose uptake as well as by binding to immobilized starch. Maltodextrins also block conductance through the LamB channel when reconstituted in black films.

Overall, the high-resolution structure of LamB protein shows similar architecture to the other porins: it is a homotrimer in which each monomer has 18 strands in the β-barrel (see Frontispiece). Three of the loops fold in to constrict the channel to a minimum diameter of 5 Å. Six aromatic residues line one side of the channel, forming the “greasy slide,” a hydrophobic path through the otherwise aqueous pore (Figure 5.22a). When soaked into the maltoporin crystals, maltotriose lies lengthwise in the channel with the hydrophobic sides of its pyranose rings along the greasy slide. Sucrose soaked into the crystals gets stuck above the channel constriction, which explains its very low rate of uptake into liposomes. Interestingly, a sucrose-specific porin, called ScrY, is homologous to LamB protein, with a few strategic differences in the residues revealed in its high-resolution structure. Thus the selectivity of these pores for their sugar substrates is achieved by the specificity built into their channels.
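The nine-fold lower Km reported above for PhoE can be put in perspective with a Michaelis–Menten-style rate expression, a common first approximation for saturable transport. The Python sketch below is an illustration only: the Vmax and Km values are hypothetical placeholders, equal Vmax for the two channels is an assumption, and only the nine-fold ratio comes from the text.

```python
def transport_rate(substrate, vmax, km):
    """Michaelis-Menten-style uptake rate: v = Vmax * [S] / (Km + [S])."""
    return vmax * substrate / (km + substrate)


def uptake_advantage(substrate, km_reference, fold_lower_km=9.0, vmax=1.0):
    """Ratio of uptake rates for a channel whose Km is fold_lower_km times
    lower than the reference, assuming (hypothetically) equal Vmax."""
    fast = transport_rate(substrate, vmax, km_reference / fold_lower_km)
    slow = transport_rate(substrate, vmax, km_reference)
    return fast / slow


if __name__ == "__main__":
    km_ref = 9.0  # hypothetical reference Km, arbitrary concentration units
    for s in (0.001, 1.0, 1000.0):
        print(f"[S] = {s:>8}: advantage = {uptake_advantage(s, km_ref):.2f}x")
```

In this simple picture the advantage of the lower Km approaches nine-fold when the substrate is scarce and disappears near saturation, which fits with PhoE being induced under phosphate starvation.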


5.22 β-barrels with substrate specificity. (A) When crystals of LamB protein are soaked in maltotriose, electron density corresponding to the substrate (cyan) is detected at the “greasy slide” made up of six aromatic residues: Trp74 from the adjacent monomer, Tyr41, Tyr6, Trp420, Trp358, and Phe227 (stick side chains, yellow). In this cut-away view of the LamB monomer, Loop 1 has been removed, and Loops 3 (red) and 6 (green) can be seen folding into the barrel. From Schirmer, T., et al., Science 1995, 267:512–514. (B) Crystals of Tsx protein that have been soaked in thymidine exhibit electron density corresponding to three molecules of the nucleoside (pink) within the barrel along a greasy slide (stick models, blue) similar to that in LamB protein, as seen in this cut-away view. From Ye, J. and B. van den Berg, EMBO J. 2004, 23:3187–3195. (C) The crystal structure of FadL shows a hydrophobic groove and pocket which are occupied by detergent molecules and are presumably on the path for fatty acid transport. Two molecules of C8E4 (green stick) are in the groove and an LDAO molecule (green stick) is in the pocket. Positively charged residues (Arg157 and Lys317, blue) are within hydrogen-bonding distance of the LDAO headgroup, while hydrophobic residues (salmon) are close to its alkyl chain. Note that the N-terminal residues 1–42 make a hatch domain (light brown). From van den Berg, B., Curr Opin Struct Biol. 2005, 15:401–407.

Other β-barrel Transporters

Other specific transporters in the outer membrane of E. coli have similarly selective pores. One example is the nucleoside transporter, Tsx, named because it is the receptor for T6 phage. The Tsx transporter is a 12-stranded monomer with several distinct nucleotide-binding sites along its narrow channel (Figure 5.22b). The pore has a greasy slide like that in LamB protein, with substrate sites in it consisting of pairs of aromatic residues flanked by ionizable residues.

Transport of hydrophobic compounds across the outer membrane is accomplished by a family of proteins including FadL, the long chain fatty acid transporter from E. coli. These transporters provide long-chain fatty acids for phospholipid biosynthesis and for catabolism as an energy source. The x-ray structure of FadL shows a very long (∼50 Å) 14-stranded β-barrel that lacks an interior channel since there is a plug – called a hatch domain – that consists of three short helices near the N terminus (Figure 5.22c). The barrel protrudes far into the extracellular space and has a hydrophobic groove that points toward a prominent hydrophobic pocket. Occupied by detergent molecules in the x-ray structure, these are thought to be low-affinity and high-affinity binding sites for substrates. A second crystal structure shows a conformational change that allows the detergent in the pocket to move toward the periplasm, although it does not show the movement of the hatch required for it to be released. A similar need to remove a plug domain is observed in


the iron receptors (see below); however, FadL differs in that it does not bind TonB for energy input.

Iron Receptors

Receptors involved in iron transport are β-barrel proteins with high affinities for their substrates. Iron is abundant but unavailable in the environment due to its insolubility as ferric hydroxides. To solubilize this essential mineral, microorganisms synthesize and secrete siderophores, iron chelators of 500–1500 Da with extremely high affinities for ferric ions. The enteric bacteria, including E. coli, synthesize and transport an iron chelator called enterobactin and also have transport systems for siderophores secreted by other microorganisms, such as ferrichrome. These transport systems consist of outer membrane receptors, periplasmic binding proteins, and inner membrane transport proteins. The iron receptors in the outer membrane are also used by colicins and antibiotics to enter the cell.

Unlike the porins, the outer membrane iron receptors bind their substrates with high affinities (Kd ∼0.1 μM) and pump them into the periplasm at the expense of energy. The energy is provided via a complex of three proteins, TonB, ExbB, and ExbD, anchored in the inner membrane. In E. coli the iron receptors, along with BtuB, the receptor for vitamin B12, interact with TonB at a conserved sequence of five amino acids near the N terminus called the “TonB box” (see Chapter 11). Although these interactions have been thoroughly studied, the mechanism of energy delivery is unknown.

The first high-resolution structures of iron receptors, the E. coli receptors for enterobactin (FepA) and ferrichrome (FhuA), show a common architecture consisting of two domains, a C-terminal domain that makes a 22-stranded β-barrel, and a globular N-terminal domain of ∼150 residues that fills the interior, making a plug (Figure 5.23). The barrel has a diameter of ∼40 Å and extends beyond the bilayer on the outside. Some of the 11 loops on the external membrane surface are unusually long, up to 37 residues in FepA. The plug domain, a four-stranded β-sheet and interspersed α-helices and loops, has two loops that extend 20 Å beyond the outer membrane interface to frame a pocket with the binding site for the siderophores. The largest difference between the two receptors is the nature of this site, which is tailored for siderophores that differ in composition and charge. Comparison of the x-ray structures of FhuA in the presence and absence of siderophores reveals small conformational changes in the N-terminal domain upon ligand binding, insufficient to open a transport channel.

There are now structures of a dozen TonB-dependent transporters, including other iron and colicin receptors, with architecture similar to FepA and FhuA and customized ligand-binding sites. In addition, structures have been solved for iron receptors in complex with the periplasmic domain of TonB (see Chapter 11). The understanding of iron transport in microorganisms, especially


5.23 Structures of FepA and FhuA proteins involved in iron transport. (A) Side views with the extracellular surface on top and the periplasmic end on the bottom. (B) Views from the external solvent show the blockage of the interior. The FhuA β-barrel is blue and plug domain is yellow; the FepA β-barrel is green and plug domain is orange. From Ferguson, A. D., and J. Deisenhofer, Biochim Biophys Acta. 2002, 1565:318–381. © 2002 by Elsevier. Reprinted with permission from Elsevier.


iron receptors in pathogenic bacteria, is important from a medical standpoint because acquisition of iron is critical for invading microbes, which extract iron from host proteins such as transferrin and hemoglobin.

Some pathogenic bacteria gain a competitive advantage by scavenging heme from their environment. Serratia marcescens excretes a heme-binding protein HasA to capture heme and then binds the HasA hemophore with an outer membrane receptor, HasR. HasA forms a tight complex with HasR, allowing heme to transfer to a binding site in HasR. A TonB-type of energy transducer is required to release heme into the periplasm. The high-resolution structure of a HasA–HasR complex shows the HasR structure is similar to other TonB-dependent transporters with a C-terminal β-barrel and an N-terminal plug domain (Figure 5.24). HasR has long extracellular loops that bind HasA.

From the porins to the heme receptor, outer membrane transporters have much in common. The β-barrel structure is also used by outer membrane enzymes, including two that are discussed in Chapter 9, OMPLA and OmpT. While the first β-barrels characterized, the porins, consist of β-strands of fairly uniform lengths, others are distorted by varied strand lengths that make the barrel very asymmetric, as seen in OmpX and FadL. In addition, many have small regions of α-helix, for example the short appendage to the barrel in Tsx, the constriction across the barrel in VDAC, and the mixed α/β plug domains in the iron receptors FepA, FhuA, and HasR. Even the amazing TolC that spans the periplasm (described in Chapter 11) forms a classic β-barrel to cross the outer membrane. Until recently it could be said that all outer membrane proteins shared this basic architecture.

Outer Membrane Secretory Proteins: Not β-barrels

Export across the outer membrane is required for secretion of polymers synthesized in the periplasm, such as the capsular polysaccharide that envelops bacterial cells and helps them to evade the host immune system. The high-resolution structure of Wza, which exports capsular polysaccharide, is the first crystal structure of an outer membrane secretory protein. It shows a layered, elongated structure that ends in a unique α-helical barrel (Figure 5.25). Wza is an octamer, in which each subunit contributes an amphipathic α-helix to form a barrel with a hydrophobic exterior and a hydrophilic interior. The channel in Wza has an internal diameter of 17 Å, allowing extended polysaccharides to pass. The lower domains of Wza have hydrophilic exteriors and protrude into the periplasm to interact with the complex of proteins that assemble the polymer on the outside of the inner membrane. Since EM images of other outer membrane secretory proteins, including those for the fibers that make the bacterial pili, appear very similar to Wza, it is likely that this unusual structure will be seen again. With its α-helical barrel, Wza clearly does not fit into the standard two classes of integral membrane proteins.

5.24 X-ray structure of the ternary complex HasA–HasR–Heme. Ribbon representations of the hemophore HasA (red) and its receptor HasR (blue, with the plug domain in orange) show the structure after the heme (green wire model) has passed into HasR. The first five strands and loops L1–L3 of HasR are omitted to allow a view into the barrel interior. The portions of the extracellular loops (Loops 6–11, labeled) and long β-strands that are essential for HasA binding are highlighted (yellow). From Krieg, S., et al., Proc Natl Acad Sci USA 2009, 106:1045–1050.

CONCLUSION

The two large classes of integral membrane proteins, bundles of α-helices and β-barrels, show very different characteristics. Paradigms for each are drawn from early examples: bacteriorhodopsin and photosynthetic reaction center, and the porins. Many more examples of helical membrane proteins are described in later chapters. Although fewer in number, β-barrel proteins show surprising diversity. Tremendous progress has produced many elegant structures of β-barrel membrane proteins,



5.25 X-ray structure of the secretory protein Wza. The structure of the Wza octamer (ribbon representation with each subunit a different color) is viewed from the plane of the membrane (A) and the extracellular side looking down through the helical barrel (B). The four segments of Wza are numbered D1–D4. (C) A side view of the monomer (rainbow colored with the N terminus blue and the C terminus red) shows the extended C-terminal region that makes the helical barrel. From Collins, R. F. and J. P. Derrick, Trends Microbiol. 2007, 15:96–100.

along with enhanced understanding of their transport mechanisms. Even so, it is likely the known examples represent a small fraction of the proteins in this class of membrane proteins. If the genomic analyses are correct, there are many more β-barrel proteins to be characterized. The next chapter looks at bioinformatics techniques used to predict and analyze membrane proteins and shows how they are grouped in families.

FOR FURTHER READING

Bacteriorhodopsin

Reviews
Andersson, M., E. Malmerberg, S. Westenhoff, et al., Structural dynamics of light-driven proton pumps. Structure. 2009, 17:1265–1275.
Haupts, U., J. Tittor, and D. Oesterhelt, Closing in on bacteriorhodopsin. Ann Rev Biophys Biomol Struct. 1999, 28:367–399.
Hirai, T., S. Subramaniam, and J. K. Lanyi, Structural snapshots of conformational changes in a seven-helix membrane protein: lessons from bacteriorhodopsin. Curr Opin Struct Biol. 2009, 19:433–439.
Lanyi, J. K., and H. Luecke, Bacteriorhodopsin. Curr Opin Struct Biol. 2001, 11:415–419.
Neutze, R., E. Pebay-Peyroula, K. Edman, A. Royant, J. Navarro, and E. M. Landau, Bacteriorhodopsin: a high resolution structural view of vectorial proton transport. Biochim Biophys Acta. 2002, 1565:144–167.

Seminal Papers
Henderson, R., and P. N. T. Unwin, Three-dimensional model of purple membrane obtained by electron microscopy. Nature. 1975, 257:28–32.
Oesterhelt, D., and W. Stoeckenius, Functions of a new photoreceptor membrane. Proc Natl Acad Sci USA. 1973, 70:2853–2857.
Pebay-Peyroula, E., G. Rummel, J. P. Rosenbusch, and E. M. Landau, X-ray structure of bacteriorhodopsin at 2.5 angstroms from microcrystals grown in lipidic cubic phases. Science. 1997, 277:1676–1681.
Winget, G. D., N. Kanner, and E. Racker, Formation of ATP by the adenosine triphosphatase complex from spinach chloroplasts reconstituted together with bacteriorhodopsin. Biochim Biophys Acta. 1977, 460:490–499.

Photosynthetic Reaction Centers

Reviews
Deisenhofer, J., and H. Michel, Structures of bacterial photosynthetic reaction centers. Annu Rev Cell Biol. 1991, 7:1–23.
Nogi, T., and K. Miki, Structural basis of bacterial photosynthetic reaction centers. J Biochem (Tokyo). 2001, 130:319–329.
Vermeglio, A., and P. Joliot, The photosynthetic apparatus of Rhodobacter sphaeroides. Trends Microbiol. 1999, 7:435–440.

Seminal Papers
Deisenhofer, J., O. Epp, K. Miki, R. Huber, and H. Michel, Structure of the protein subunits in the photosynthetic reaction centre of Rhodopseudomonas viridis at 3 Å resolution. Nature. 1985, 318:618–624.
Deisenhofer, J., O. Epp, I. Sinning, and H. Michel, Crystallographic refinement at 2.3 Å resolution and refined model of the photosynthetic reaction centre from Rhodopseudomonas viridis. J Mol Biol. 1995, 246:429–457.

β-Barrels
Bishop, R. E., Structural biology of membrane-intrinsic β-barrel enzymes: sentinels of the bacterial outer membrane. Biochim Biophys Acta. 2008, 1778:1881–1896.
Buchanan, S. K., β-barrel proteins from bacterial outer membranes: structure, function and refolding. Curr Opin Struct Biol. 1999, 9:455–461.
Fairman, J. W., N. Noinaj, and S. K. Buchanan, The structural biology of β-barrel membrane proteins: a summary of recent reports. Curr Opin Struct Biol. 2011, 21:523–531.
Schulz, G. E., β-barrel membrane proteins. Curr Opin Struct Biol. 2000, 10:443–447.
Wimley, W. C., The versatile β-barrel membrane protein. Curr Opin Struct Biol. 2003, 13:404–411.

Porins
Achouak, W., T. Heulin, and J.-M. Pages, Multiple facets of bacterial porins. FEMS Microbiol Lett. 2001, 199:1–7.
Delcour, A. H., Solute uptake through general porins. Frontiers Biosci. 2003, 8:1055–1071.
Delcour, A. H., Outer membrane permeability and antibiotic resistance. Biochim Biophys Acta. 2009, 1794:808–816.
Dutzler, R., Y.-F. Wang, P. J. Rizkallah, J. P. Rosenbusch, and T. Schirmer, Crystal structures of various maltooligosaccharides bound to maltoporin reveal a specific sugar translocation pathway. Structure. 1996, 4:127–134.
Nikaido, H., Porins and specific channels of bacterial outer membranes. Mol Microbiol. 1992, 6:435–442.
Schirmer, T., General and specific porins from bacterial outer membranes. J Struct Biol. 1998, 121:101–109.

Iron Receptors
Chimento, D. P., R. J. Kadner, and M. C. Wiener, Comparative structural analysis of TonB-dependent outer membrane transporters: implications for the transport cycle. Proteins. 2005, 59:240–251.
Clarke, T. E., L. W. Tari, and H. J. Vogel, Structural biology of bacterial iron uptake systems. Curr Top Med Chem. 2001, 1:7–30.
Krewulak, K. D., and H. J. Vogel, TonB or not TonB: is that the question? Biochem Cell Biol. 2011, 89:87–97.
Létoffé, S., G. Heuck, P. Delepelaire, N. Lange, and C. Wandersman, Bacteria capture iron from heme by keeping tetrapyrrol skeleton intact. Proc Natl Acad Sci USA. 2009, 106:11719–11724.
Noinaj, N., M. Guillier, T. J. Barnard, and S. K. Buchanan, TonB-dependent transporters: regulation, structure and function. Ann Rev Microbiol. 2010, 64:43–60.

Outer Membrane Secretory Proteins
Collins, R. F., and J. P. Derrick, Wza: a new structural paradigm for outer membrane proteins? Trends Microbiol. 2007, 15:96–100.
Cuthbertson, L., I. L. Mainprize, J. H. Naismith, and C. Whitfield, Pivotal roles of the outer membrane polysaccharide export and polysaccharide copolymerase protein families in export of lipopolysaccharides in Gram-negative bacteria. Microbiol Mol Biol Rev. 2009, 73:155–177.

First Crystal Structures of Proteins Discussed in Chapter 5
Baslé, A., G. Rummel, P. Storici, J. P. Rosenbusch, and T. Schirmer, Crystal structure of osmoporin OmpC from E. coli at 2.0 Å. J Mol Biol. 2006, 362:933–942.
Bayrhuber, M., T. Meins, M. Habeck, et al., Structure of the human voltage-dependent anion channel. Proc Natl Acad Sci USA. 2008, 105:15370–15375.
Buchanan, S. K., B. S. Smith, L. Venkatramani, et al., Crystal structure of the outer membrane active transporter FepA from Escherichia coli. Nat Struct Biol. 1999, 6:56–63.
Cowan, S. W., T. Schirmer, G. Rummel, et al., Crystal structures explain functional properties of two E. coli porins. Nature. 1992, 358:727–733.
Deisenhofer, J., O. Epp, K. Miki, R. Huber, and H. Michel, Structure of the protein subunits in the photosynthetic reaction centre of Rhodopseudomonas viridis at 3 Å resolution. Nature. 1985, 318:618–624.
Dong, C., K. Beis, J. Nesper, et al., Wza the translocon for E. coli capsular polysaccharides defines a new class of membrane protein. Nature. 2006, 444:226–229.
Ferguson, A. D., E. Hofmann, J. W. Coulton, K. Diederichs, and W. Welte, Siderophore-mediated iron transport: crystal structure of FhuA with bound lipopolysaccharide. Science. 1998, 282:2215–2220.
Forst, D., W. Welte, T. Wacker, and K. Diederichs, Structure of the sucrose-specific porin ScrY from Salmonella typhimurium and its complex with sucrose. Nat Struct Biol. 1998, 5:37–45.
Kreusch, A., A. Neubüser, E. Schiltz, J. Weckesser, and G. E. Schulz, Structure of the membrane channel porin from Rhodopseudomonas blastica at 2.0 Å resolution. Protein Sci. 1994, 3:58–63.
Krieg, S., F. Huche, K. Diederichs, et al., Heme uptake across the outer membrane as revealed by crystal structures of the receptor-hemophore complex. Proc Natl Acad Sci USA. 2009, 106:1045–1050.
Pebay-Peyroula, E., G. Rummel, J. P. Rosenbusch, and E. M. Landau, X-ray structure of bacteriorhodopsin at 2.5 angstroms from microcrystals grown in lipidic cubic phases. Science. 1997, 277:1676–1681.
Schirmer, T., T. A. Keller, Y. F. Wang, and J. P. Rosenbusch, Structural basis for sugar translocation through maltoporin channels at 3.1 Å resolution. Science. 1995, 267:512–514.
van den Berg, B., P. N. Black, W. M. Clemons, and T. A. Rapoport, Crystal structure of the long chain fatty acid transporter FadL. Science. 2004, 304:1506–1509.
Weiss, M. S., A. Kreusch, E. Schiltz, et al., The structure of porin from Rhodobacter capsulatus at 1.8 Å resolution. FEBS Lett. 1991, 280:379–382.
Ye, J. and B. van den Berg, Crystal structure of the bacterial nucleoside transporter Tsx. EMBO J. 2004, 23:3187–3195.

FUNCTIONS AND FAMILIES 6

[Figure: hydropathy plot for bacteriorhodopsin. Hydropathy index is plotted against residue number (10 to 250), with the seven TM helices numbered 1–7 between the amino terminus (outside) and the carboxyl terminus (inside).]

A hydropathy plot predicts bilayer-spanning regions of membrane proteins like bacteriorhodopsin, shown here with its TM helices colored to match the corresponding hydrophobic peaks on the plot. From Nelson, D. L., and M. M. Cox, Lehninger Principles of Biochemistry, 4th ed., W. H. Freeman, 2005, pp. 375–376. © 2005 by W. H. Freeman and Company. Used with permission.
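The calculation behind a hydropathy plot of this kind can be sketched as a sliding-window average of per-residue hydropathy values. The sketch below assumes the Kyte–Doolittle scale, a 19-residue window, and a cutoff of 1.6 for flagging candidate TM windows; none of these choices is stated in the caption, and the toy sequence is hypothetical rather than the bacteriorhodopsin sequence.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the caption does not name one).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Return the mean hydropathy of every full window along the sequence."""
    sequence = sequence.upper()
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[residue] for residue in sequence]
    return [
        sum(scores[start:start + window]) / window
        for start in range(len(sequence) - window + 1)
    ]


def candidate_tm_windows(profile, threshold=1.6):
    """Return start indices of windows whose mean hydropathy meets the cutoff."""
    return [index for index, value in enumerate(profile) if value >= threshold]


# Hypothetical toy sequence: a leucine-rich stretch flanked by polar residues.
toy_sequence = "MKT" + "L" * 20 + "DEKR"
toy_profile = hydropathy_profile(toy_sequence)
toy_hits = candidate_tm_windows(toy_profile)
```

Peaks in such a profile only suggest bilayer-spanning helices; the window length and cutoff are adjustable parameters, and the prediction tools discussed later in the chapter refine this basic idea.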

The functions of most membrane proteins are traditionally described as enzymes, transporters and channels, and receptors, although the demarcations of these groups are blurred. For example, ATPases that are ion channels and other active transport proteins (“permeases”) are studied as enzymes as well as transporters. Some receptors involved in signaling are also tyrosine kinases; other receptors are gated ion channels. Membrane proteins can also be classified using data from bioinformatics, which describe families of proteins, functions of homologous domains, and evolutionary relationships. This chapter utilizes both approaches to look at the roles of membrane proteins before turning to tools for prediction of membrane protein structure and genomic analysis of membrane proteins. It starts by describing the general characteristics of membrane enzymes, transporters, and receptors, giving specific examples and briefly identifying their protein families.

MEMBRANE ENZYMES

The enzymes found in membranes carry out diverse catalytic functions. In addition to solute transport and signaling, membrane-bound enzymes participate in electron transport chains and other redox reactions, as well as the metabolism of membrane components such as phospholipids and sterols. Many membrane enzymes require specific lipids or particular types of lipids for activity (see Chapter 4). In addition, soluble enzymes may bind to the membrane periphery for catalysis involving substrates in the membrane, for efficient access to substrates that are passed along a series of bound enzymes to avoid dilution in the cytoplasm, or for regulation by modulation of their activities, as described in Chapter 4. In all these cases, the heterogeneity and dimensionality of the membrane affect the activity of the enzymes.

Whether membrane enzymes require lipids for their activity or catalyze reactions involving membrane-bound substrates, they are restricted to the available lipid or substrate near them in the membrane. This means the rate of enzyme activity (velocity) depends on the concentration of available lipid or substrate in the bilayer in the vicinity of the enzyme, not the concentration in the bulk solution. For kinetic treatment of these cases, the amount of substrate available to the enzyme is not determined by its bulk concentration but by its concentration in the lipid bilayer, so it is described with surface concentration terms, either


BOX 6.1 SURFACE DILUTION EFFECTS A description of the actions of lipid-dependent enzymes must consider both three-dimensional bulk interactions that occur in solution and two-dimensional surface interactions in the bilayer. Thus a kinetic model for surface dilution effects takes into account the concentrations of the required lipid in both phases, whether the enzyme is reconstituted into micelles or liposomes. The detergent–lipid mixed micelle is particularly amenable for study of surface dilution kinetics because it allows both the bulk concentration and the surface concentration of a lipid substrate to be varied. Then the surface concentration of the lipid is expressed in terms of its mole fraction, [lipid] / ([lipid] + [detergent]). With a fixed number of lipids at several different micelle concentrations, the bulk concentration of lipids does not vary, but the lipid:micelle ratio changes. This is illustrated by the lipid-dependent binding of a phorbol ester by PKC in mixed detergent–lipid micelles (see Figure 6.1.1). PKC requires PS, as described in Chapter 4. When the level of Triton X-100 is increased as the amount of PS is held constant, the activity drops: the required PS is not as available to activate the enzyme. If the ratio of PS to Triton X-100 is held constant as the concentration of Triton X-100 increases, the activity stays the same because the mole percent of PS is maintained. Thus PKC shows a loss of activity at high detergent concentration (without addition of PS) because the surface concentration of lipid has been diluted.

[Plot: [3H]phorbol dibutyrate bound to protein kinase C (pmol, 0 to 1.2) versus Triton X-100 (w/v %, 0 to 2.0), with one curve at constant mol % phosphatidylserine and one at constant bulk concentration of phosphatidylserine.]

6.1.1 Surface dilution effect on the lipid-dependent enzyme, PKC. Redrawn from Gennis, R. B., Biomembranes: Molecular Structure and Function, Springer-Verlag, 1989, p. 228. © 1989 by American Society for Biochemistry & Molecular Biology. Reproduced with permission of American Society for Biochemistry & Molecular Biology via Copyright Clearance Center.
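The two experimental designs described in this box can be illustrated with a short calculation of the mole fraction, [lipid] / ([lipid] + [detergent]). This is a minimal sketch with hypothetical amounts in arbitrary molar units; the numbers are not taken from the PKC experiment.

```python
def mole_fraction(lipid, detergent):
    """Surface concentration of a lipid in a mixed micelle, as a mole fraction."""
    total = lipid + detergent
    if total <= 0:
        raise ValueError("total amount of lipid plus detergent must be positive")
    return lipid / total


# Hypothetical detergent levels (arbitrary molar units, for illustration only).
detergent_levels = [4.0, 9.0, 19.0]

# Design 1: bulk PS held constant while detergent increases.
# The surface concentration of PS falls (surface dilution).
ps_fixed = 1.0
constant_bulk = [mole_fraction(ps_fixed, d) for d in detergent_levels]

# Design 2: PS raised in proportion to detergent (fixed PS:detergent ratio).
# The mole fraction, and hence the surface concentration, is unchanged.
ps_to_detergent = 0.25
constant_ratio = [mole_fraction(ps_to_detergent * d, d) for d in detergent_levels]
```

In the first design the mole fraction of PS falls as detergent is added even though the bulk amount of PS is unchanged, which is the situation in which activity is lost; in the second it stays fixed.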

The surface dilution kinetic model was developed for cobra venom PLA2 (also described in Chapter 4) in Triton X-100/PC mixed micelles. In the case of such peripheral proteins, the analysis includes the binding step that takes place in the three-dimensional bulk solution and results in restricting the interactions to the two-dimensional surface of the membrane. Thus the first step is the binding of the soluble enzyme to the micelles, followed by binding the substrate in the bilayer:

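\[
\underbrace{E + A \;\underset{k_{-1}}{\overset{k_{1}}{\rightleftharpoons}}\; EA}_{\text{Bulk step}}
\qquad
\underbrace{EA + B \;\underset{k_{-2}}{\overset{k_{2}}{\rightleftharpoons}}\; EAB \;\underset{k_{-3}}{\overset{k_{3}}{\rightleftharpoons}}\; EA + Q}_{\text{Surface step}}
\]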

where E is the enzyme, A is the mixed micelle, EA is the enzyme–mixed micelle complex, B is the substrate, EAB is the catalytic complex, and Q is the product. (Note that the equation for the first step applies whether the enzyme binds nonspecifically, in which case A is the sum of the molar concentrations of the lipid and the detergent, or specifically to a phospholipid species, in which case A is the molar concentration of that lipid.) Once bound, the association between EA and B is a function of their surface concentrations, expressed in units of mole fraction or mole percent. For


a water-insoluble integral membrane enzyme, the protein is delivered to the assay as a detergent–protein mixed micelle, which is likely to fuse with lipid micelles. In this case, E represents the concentration of the enzyme–detergent complex. The kinetic equation becomes

\[
v = \frac{V_{\max}[A][B]}{K_s^{A} K_m^{B} + K_m^{B}[A] + [A][B]},
\]

where the dissociation constant, \(K_s^{A} = k_{-1}/k_{1}\), and the interfacial Michaelis constant, \(K_m^{B} = (k_{-2} + k_{3})/k_{2}\), are expressed in surface concentration units.
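The rate equation in this box can be evaluated directly. The sketch below is an illustration under stated assumptions: the function and variable names are hypothetical, the rate constants are arbitrary illustrative values, [A] is taken as the bulk concentration of micellar lipid plus detergent, and [B] as the mole fraction of substrate at the surface.

```python
def dissociation_constant(k1, k_minus1):
    """K_s^A = k_-1 / k_1 for the bulk binding step."""
    return k_minus1 / k1


def interfacial_michaelis_constant(k2, k_minus2, k3):
    """K_m^B = (k_-2 + k_3) / k_2 for the surface step."""
    return (k_minus2 + k3) / k2


def surface_dilution_rate(a, b, vmax, ks_a, km_b):
    """v = Vmax[A][B] / (Ks^A * Km^B + Km^B * [A] + [A][B])."""
    denominator = ks_a * km_b + km_b * a + a * b
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return vmax * a * b / denominator


# Hypothetical constants, chosen only to make the arithmetic easy to follow.
ks_a = dissociation_constant(k1=2.0, k_minus1=2.0)                         # 1.0
km_b = interfacial_michaelis_constant(k2=4.0, k_minus2=1.0, k3=1.0)        # 0.5

# Rate at one bulk concentration and one surface concentration.
rate = surface_dilution_rate(a=2.0, b=0.5, vmax=10.0, ks_a=ks_a, km_b=km_b)

# Surface dilution: same bulk concentration [A], lower mole fraction [B].
diluted_rate = surface_dilution_rate(a=2.0, b=0.25, vmax=10.0, ks_a=ks_a, km_b=km_b)
```

When [A] is large compared with the dissociation constant, the expression reduces to the familiar Michaelis–Menten form in surface units, Vmax[B] / (Km + [B]), consistent with the statement in the main text that the classical equation applies once surface concentration units are used.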

mole fraction ([substrate] / {[substrate] + [total lipid]}) or mole percent (mole fraction × 100). Then the classical Michaelis–Menten equation is applicable, with surface concentration units used for the substrate concentration (see Box 6.1). When an enzyme that loses activity upon dilution of the lipid is solubilized and reconstituted into micelles or liposomes, the amount of lipid remaining will affect its activity. For this reason, the extent of separation of lipid and protein components during solubilization of the membrane components (see Chapter 3) is critical in studies of membrane enzymes.

Only a few high-resolution structures of integral membrane proteins that are classical enzymes are available (see Chapter 9) in addition to those involved in energy transduction and transport. However, many membrane enzymes have been extensively characterized biochemically. In addition, some enzymes that are integral membrane proteins have large soluble portions that can be removed by proteolytic cleavage and crystallized. When the soluble portion of the enzyme carries out the catalysis, its structure reveals the binding site and catalytic groups to give a picture of the enzyme function, even though it is missing the portion that anchors the enzyme in the membrane and perhaps plays a regulatory role. Diacylglycerol kinase (DAGK) is an example of a membrane enzyme that was thoroughly investigated long before its high-resolution structure was known. Some of the P450 cytochromes provide examples of membrane enzymes whose soluble portions have been crystallized and their structure solved. Both of these examples are enzymes that occur in mammals in numerous isoforms, different forms of the enzymes that are encoded by different genes. Isoforms, also called isozymes, are catalytically and structurally similar and are typically located in different tissues of the organism, where they respond to different regulators.

Diacylglycerol Kinase

Diacylglycerol kinase carries out the reaction

Diacylglycerol + MgATP → Phosphatidic acid + MgADP

with Michaelis–Menten kinetics and rates limited by substrate diffusion. Both the substrate and product of the DAGK reaction are lipid-soluble allosteric effectors and second messengers in signal transduction in mammals, which have ten isoforms of DAGK. Localized to the cytosol or the nucleus, the mammalian DAGKs are peripheral proteins that dock on the membrane to access their substrate. Two of the isozymes are activated by both PE and cholesterol and inhibited by sphingomyelin when reconstituted in large unilamellar vesicles. The mammalian isozymes appear to have specialized roles in signaling based on their different sites and patterns of expression.

The E. coli DAGK provides an example of a very well-characterized integral membrane enzyme that has resisted formation of well-diffracting crystals. Located in the inner membrane, DAGK functions to replenish phosphatidic acid. The phosphatidic acid is needed in a surprising turnover of membrane phospholipid that provides the cell with osmoprotectants called membrane-derived oligosaccharides (MDOs). MDOs are made in the periplasm under conditions of low osmolarity, when they can account for up to 5% of cell dry weight. These hydrophilic glucans are too large and highly charged to diffuse through the porins, so MDOs stay in the periplasm and balance the high osmolarity of the cytoplasm. MDOs contain 6–12 glucose units joined by β-1,2 and β-1,6 linkages that are variously substituted with sn-1-phosphoglycerol, phosphoethanolamine, and O-succinyl ester residues. The phosphoglycerol and phosphoethanolamine are enzymatically added from PG and PE, respectively, leaving diacylglycerol (DAG). A second source of DAG is the utilization of phosphoethanolamine from PE in the biosynthesis of lipopolysaccharide. Too much DAG could affect the membrane by driving formation of nonbilayer phases. It is the job of DAGK to phosphorylate the DAG to return it to the phospholipid pool in the membrane. The activity of DAGK in E. coli is determined by the rate of TM flip-flop supplying diacylglycerol from the outer to the inner leaflet of the plasma membrane.


The smallest known kinase, E. coli DAGK, is a homotrimer of 13-kDa subunits (121 amino acids) with three active sites. Purified DAGK has good activity when reconstituted in lipid–detergent mixed micelles and phospholipid vesicles, as well as in bicelles, amphipols, and cubic phase monolein. Formation of a DAGK–DAG–MgATP ternary complex does not depend on order of substrate addition, supporting a random equilibrium kinetic mechanism with direct transfer of the phosphoryl group from ATP to DAG. The KM for DAG corresponds to the concentration of DAG in the membrane of cells grown in low osmolarity when the MDO cycle is active.

In native membranes DAGK is unusually stable, active even after a few minutes at 100°C, and mutants have been engineered to increase its stability even more. The effects of mutations that alter every residue in the sequence show that most critical residues of DAGK for both folding and catalysis are located in or next to the active site, along with several critical residues clustered near the N terminus. Patterns of disulfide bond formation between single-cysteine mutants allowed prediction of intramolecular distances. These results, along with spectroscopic studies and topology predictions, indicate the DAGK monomer contains three TM helices plus two short amphipathic helices at its N terminus (Figure 6.1).

Despite years of effort, no highly diffracting crystals of DAGK suitable for x-ray crystallography have been obtained. In a major accomplishment, the backbone structure of DAGK in dodecyl PC micelles was solved by solution NMR, employing structural constraints from paramagnetic relaxation enhancement-derived long-range distances and residual dipolar coupling-based orientational restraints, in addition to distances derived from biochemical disulfide mapping. DAGK was one of the first polytopic α-helical membrane proteins to be solved by NMR (see Box 4.2). The nine TM helices (three from each subunit) are arranged around a central core of the three TM2 helices in a left-handed parallel bundle (Figure 6.2). TM3 is domain-swapped, meaning it interacts with TM1 and TM2 from the adjacent subunit. The entire molecule can be viewed as three overlapping four-helix bundles, each containing helices from all three subunits (Figure 6.2c,d). The interface between each four-helix bundle contains a cavity bound by a portico: TM1 and TM3 are pillars and the loop between TM2 and TM3 forms the cap. The spatial limits of the portico determine the substrate specificity, restricting the head group of the lipid binding in the cavity to glycerol without restricting the composition of the acyl chains. (A similar lipid binding site in a portico is seen in OMPLA, see Chapter 9). Titrations of DAGK with its substrates DAG and MgATP, product (phosphatidic acid), and a nonhydrolyzable ATP monitored by NMR show no global conformational change upon binding. Given the structure lacks side chain positions, it is not possible to see a well-defined nucleotide-binding site, and the N terminus is too dynamic to appear.

The E. coli DAGK is clearly unlike the water-soluble DAGKs, and it lacks the structural and functional motifs found in kinases in general. The only other member of the family is undecaprenol kinase (UDPK), which plays an analogous role in Gram-positive bacteria. UDPK phosphorylates undecaprenol, another lipid carrier of hydrophilic molecules needed for biosynthesis – in this

6.1 Topology of DAGK, showing sites critical for catalysis and/or folding. Mutants were generated by replacing each residue in a cysteineless mutant form of DAGK with cysteine. Residues are represented by spheres that are colored to show their roles: Blue have at least 20% wild type activity, yellow fold but have less than 20% wild type activity, magenta do not fold (hence inactive). Highly conserved residues (>95% sequence identity across orthologs) have a bold outline. The diagram is divided into cytosol, membrane, and periplasm regions; the legend reads: mutant is active; mutant is catalytically impaired; mutant is folding defective and catalytically impaired; no mutant available. From Van Horn, W. D., et al., Science. 2009, 324:1726–1729.

136 Presenilin, an intra-membrane protease


6.2 Backbone structure of DAGK solved by solution NMR. Twenty residues at the N terminus are missing. The three subunits are colored red, blue, and green. (A) Ribbon diagram of the trimer from the plane of the membrane highlighted to show a single portico- like active site. (B) Close-up of the active site, with the highly conserved residues indicated in black stick representation. (Additional highly conserved residues are found on the missing N terminus.) (C) Cytosolic view of the structure, again with active site residues of one site in stick representation. (D) View of clipped TM helices to show the topology, with one four-helix bundle comprising xxx in a dashed square. Note the domain swapping: each TM2 interacts with TM1 and TM3 from the adjacent subunit. From Van Horn, W. D. and Sanders, C. R., Ann Rev Biophysics. 2012, 41:81–101.

case, of peptidoglycan in the cell wall. The dgkA gene that encodes both DAGK and UDPK is widely distributed in prokaryotes but rarely found in archaea and eukaryotes, and the two enzymes have no significant homology with any other proteins. The two enzymes present a unique solution to the need for catalysis within membranes.

PRESENILIN, AN INTRA-MEMBRANE PROTEASE

Remarkably, intra-membrane proteases or I-CLiPs (intramembrane-cleaving proteases) are able to hydrolyze peptide bonds in the hydrophobic milieu of the membrane utilizing the catalytic mechanisms of soluble proteases. Four families of intra-membrane proteases have been identified: rhomboid, signal peptide peptidase, site 2 protease, and presenilin. Rhomboid proteases are serine proteases that are important in embryogenesis in eukaryotes. A bacterial rhomboid, GlpG, has been crystallized and is described in Chapter 9. Signal peptide peptidase (SPP) is an aspartyl protease that plays a role in the biogenesis of membrane and secreted proteins by digesting the signal peptides removed during their export from the cytosol (see Chapter 7). Site 2 protease (SP2) is a metalloprotease that releases a transcription factor from the sterol regulatory element-binding protein involved in control of cholesterol and FA synthesis.

137 Functions and families


6.3 Role of membrane proteases in Aβ production. The sequential cleavage of the amyloid precursor protein (APP) occurs by two pathways. (A) Nonamyloidogenic processing involves cleavage by α-secretase within the Aβ peptide region (red) to create a soluble fragment (APPsα) and a membrane-bound C-terminal fragment (α-CTF), followed by γ-secretase cleavage of the α-CTF to release a small peptide (p3) and an intracellular C-terminal fragment (AICD). (B) Amyloidogenic processing starts with cleavage by β-secretase to release a soluble fragment (APPsβ) from the C-terminal fragment (β-CTF, also called C99) in the membrane. Then γ-secretase cleaves off the Aβ peptide and again releases AICD. From O’Brien, R. J. and P. C. Wong, Ann Rev Neurosci. 2011, 34:185–204.
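The two processing routes in the caption of Figure 6.3 can be laid out as a small lookup table. This is only an illustrative sketch: the fragment names come from the caption, while the dictionary layout and the function names `process_app` and `releases_abeta` are hypothetical and chosen here for clarity.

```python
from typing import Dict, List

# Fragment names follow the Figure 6.3 caption; the data layout is an
# illustrative assumption, not a published model.
APP_PATHWAYS: Dict[str, Dict[str, object]] = {
    "alpha": {  # nonamyloidogenic: alpha-secretase cuts within the Abeta region
        "soluble_fragment": "APPsα",
        "membrane_fragment": "α-CTF",
        "gamma_products": ["p3", "AICD"],
    },
    "beta": {  # amyloidogenic: beta-secretase leaves the Abeta region intact
        "soluble_fragment": "APPsβ",
        "membrane_fragment": "β-CTF (C99)",
        "gamma_products": ["Aβ", "AICD"],
    },
}


def process_app(first_secretase: str) -> List[str]:
    """Return the fragments released after the first cut and gamma-secretase."""
    if first_secretase not in APP_PATHWAYS:
        raise ValueError("first_secretase must be 'alpha' or 'beta'")
    pathway = APP_PATHWAYS[first_secretase]
    released = [str(pathway["soluble_fragment"])]
    # gamma-secretase then cleaves the membrane-bound C-terminal fragment.
    released.extend(pathway["gamma_products"])  # type: ignore[arg-type]
    return released


def releases_abeta(first_secretase: str) -> bool:
    """True only for the pathway that yields the Abeta peptide."""
    return "Aβ" in process_app(first_secretase)
```

Calling `process_app("beta")` lists the amyloidogenic products, and `releases_abeta("alpha")` is false because α-secretase cuts inside the Aβ region.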

Both SPP and SP2 require precleavage of their substrates, which is how they are regulated.

Presenilin is an aspartyl protease included in the γ-secretase complex that carries out the final step in the release of the Aβ peptide associated with Alzheimer’s disease (AD). In fact, the enzyme is named for its role in the formation of senile plaques that accumulate in the brain, causing dementia in AD patients. Because it affects millions of people worldwide, the formation of Aβ is the focus of much research.

Aβ is produced from amyloid precursor protein (APP), a bitopic membrane protein expressed at high levels in the brain, through proteolytic steps carried out by secretases in the membrane. In spite of intense investigation, the physiological function of APP is not known. In mice, deletion of APP is not deleterious, and overexpression of APP is not harmful. APP metabolism is complicated by alternate pathways of proteolysis, only some of which generate Aβ (Figure 6.3). When APP is cleaved first by α-secretase and then γ-secretase (usually on the cell surface), it releases a peptide that does not form toxic amyloid fibrils. When it is cleaved by β-secretase and then γ-secretase (usually in endosomes formed from clathrin-coated pits) Aβ is produced and released to the outside of the cell, where it may aggregate and result in fibril formation. Therefore, segregation of APP into lipid rafts that are more likely to undergo endocytosis favors the amyloidogenic pathway.

Neuronal β-secretase (also called BACE1) is an aspartyl protease that cleaves APP outside the membrane at one of two sites, numbered +1 and +11 based on the sequence of Aβ. This cleavage generates two C-terminal fragments C99 and C83. C99 is the substrate for γ-secretase, which can cleave it at one of several residues in the γ-site within the TM segment (Figure 6.4). With several cleavage sites for γ-secretase, Aβ of different lengths are observed, of which Aβ1–40 (Aβ40) and Aβ1–42 (Aβ42) are the most common. Several lines of experiments indicate that Aβ42 is more toxic than Aβ40. Clearly the ratio of different proteolytic products is critical in the development of amyloid disease, which is why most of the mutations in APP that result in early onset AD cluster around the cleavage sites.

γ-secretase is a multiprotein complex composed of presenilin (PS1 or PS2), nicastrin (Nct), anterior pharynx defective-1 (Aph-1), and presenilin enhancer-2 (Pen-2) in a ratio of 1:1:1:1. PS1 and PS2 are isoforms whose expression varies in different tissues; they are the catalytic subunits with two key Asp residues in the active site. Nct has a single TM segment and a large glycosylated


6.4 Cleavage sites in amyloid precursor protein. The different fragments of the amyloid precursor protein result from different cleavage sites for α-, β-, and γ-secretase (red scissors). Clustered near the cleavage sites are sites of familial mutations (residues in green at black arrows) that affect the N- and C-terminal regions of Aβ. An additional cluster is adjacent to a critical turn region in Aβ that is related to amyloid formation. From Straub, J. E. and Thirumalai, D., Ann Rev Phys Chem. 2011, 62:437–463.

extracellular domain and assists in substrate selection. Aph-1 has seven TM segments and promotes complex formation and trafficking. Pen-2 has two TM segments and triggers endoproteolysis that is involved in regulation.

The active site of presenilin is similar to that in SPP, with the Asp residues in the same motif, YDXnLGhGD, where X is any amino acid and h is a hydrophobic residue. Interestingly, they are oriented oppositely in the membrane, allowing them to cleave oppositely oriented TM helices in their substrates: presenilin cleaves type I proteins (with their C terminus in the cytosol), while SPP cleaves type II proteins. The low-resolution structure of the γ-secretase complex indicates it has an interior aqueous cavity.

Because presenilin has other substrates that are involved in cell growth and differentiation, inhibition of γ-secretase is not a feasible approach for drug therapy. New Alzheimer’s drugs are designed to modulate the enzyme to decrease cleavage of C99 while allowing it to accept the other substrates. Another strategy for therapy targets cholesterol: Elevated levels of cholesterol are correlated with amyloid formation, and amyloid plaques in AD patients are highly enriched in cholesterol. It has been shown by NMR that cholesterol binds APP, which favors partitioning into rafts and thus pushes more into the amyloidogenic pathway.

P450 Cytochromes

P450 cytochromes are a ubiquitous superfamily of heme-containing monooxygenases that are named for their absorption band at 450 nm. They are involved in metabolism of an unusually wide range of endogenous and exogenous substances. They participate in the metabolism of steroids, bile acids, fatty acids, eicosanoids, and fat-soluble vitamins, and they convert lipophilic xenobiotics (foreign compounds) to more polar compounds for further metabolism and excretion. P450 cytochromes catalyze hydroxylation of an organic substrate, RH, to R–OH with the incorporation of one oxygen atom of O2, while reducing the other oxygen atom to H2O. Their source of reducing power is NAD(P)H, with electrons either donated directly from a flavin-containing reductase (class II) or shuttled to the P450 by small soluble electron carrier proteins (class I; Figure 6.5). Some self-sufficient P450s in bacteria have been found to contain heme, flavin, and iron–sulfur centers in one polypeptide.
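The hydroxylation described above can be summarized as a balanced equation. The form below is the standard monooxygenase stoichiometry, given here as a sketch; the proton on the left-hand side is included to balance the equation and is an addition not stated in the text.

```latex
\[
\mathrm{RH} + \mathrm{O_2} + \mathrm{NAD(P)H} + \mathrm{H^+}
\;\longrightarrow\;
\mathrm{ROH} + \mathrm{H_2O} + \mathrm{NAD(P)^+}
\]
```

One oxygen atom of O2 ends up in the hydroxylated product and the other in water, with NAD(P)H supplying the two electrons.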


6.5 Examples of the two classes of P450 cytochromes. The classes are distinguished by their redox partners. Class I is represented by the P450 system from Pseudomonas putida with P450cam (with heme), putidaredoxin (with Fe–S), and putidaredoxin reductase (with FAD/NAD(P)H). Class II is represented by P450BM3 from Bacillus megaterium (with heme) and cytochrome P450 reductase from rat liver (with FMN/FAD/NADPH). The high-resolution structures for P450cam and P450BM3 have been solved for proteins lacking their TM segments. From Li, H., and T. L. Poulos, Curr Top Med Chem. 2004, 4:1789–1802. © 2004. Reprinted by permission of Bentham Science Publishers Ltd.

In eukaryotes, most P450 cytochromes are bound to the mitochondrial inner membrane or to the ER. The P450s in the ER are integral membrane proteins, each bound by a single N-terminal TM domain. Truncation of the N-terminal domain is not sufficient to express soluble protein for crystallization and must be accompanied by several point mutations to disrupt a peripheral membrane-binding site. Even so, detergent is needed to prevent aggregation. High-resolution structures of such constructs show the large soluble domain is a triangular prism shape with β-sheets along part of one side and α-helices forming the rest (see Figure 6.5). Structures of these P450 catalytic domains in the presence and absence of substrates reveal that dramatic conformational flexibility must be needed for binding such a variety of compounds. In order to react with partners that provide electrons to P450s, such as cytochrome b5, the TM segments of both are required. The first structure of a full-length cytochrome P450, the lanosterol 14α-demethylase (ScErg11p) from yeast, reveals an acute angle between the catalytic domain and the TM helix that brings the edge of the catalytic domain into the lipid bilayer.

About half of the 57 P450 cytochromes in the human genome metabolize endogenous compounds, while many of the rest metabolize drugs and other xenobiotics. Some plants have around 300 or more P450s. The genes are classified into families based on sequence identity: to the root symbol CYP is added a number for the family (one of more than 200 groups with >40% sequence identity), followed by a letter for subfamilies (having >55% identity), followed by a number for the gene. For example, sterol 27-hydroxylase is CYP27A and vitamin D 24-hydroxylase is CYP27B, because their amino acid sequences are >40% identical (placing them both in CYP27) but <55% identical. If there were another P450 cytochrome in the CYP27A class, then it would be CYP27A2. The presence of many gene clusters, each containing up to 15 CYP genes, in most genomes suggests that the diversity of P450 cytochromes arose from many gene duplications in addition to likely gene amplifications and lateral transfers.
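The naming rule above is essentially a pair of thresholds on pairwise sequence identity, which a short sketch makes concrete. The 40% and 55% cutoffs come from the text; the function names `cyp_relationship` and `cyp_name` are hypothetical, and real assignments are curated rather than made from a single pairwise comparison.

```python
# Thresholds taken from the text; everything else is an illustrative assumption.
FAMILY_THRESHOLD = 40.0     # percent identity above which two P450s share a family
SUBFAMILY_THRESHOLD = 55.0  # percent identity above which they share a subfamily


def cyp_relationship(percent_identity: float) -> str:
    """Classify two P450 sequences from their pairwise percent identity."""
    if not 0.0 <= percent_identity <= 100.0:
        raise ValueError("percent_identity must be between 0 and 100")
    if percent_identity > SUBFAMILY_THRESHOLD:
        return "same subfamily"
    if percent_identity > FAMILY_THRESHOLD:
        return "same family, different subfamily"
    return "different family"


def cyp_name(family: int, subfamily: str, gene: int) -> str:
    """Assemble a name as root symbol + family number + subfamily letter + gene number."""
    return f"CYP{family}{subfamily.upper()}{gene}"
```

For instance, two sequences that are 48% identical fall in the same family but different subfamilies, and `cyp_name(27, "A", 2)` builds the hypothetical second member of the CYP27A subfamily mentioned in the text.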

140 Transport proteins

TRANSPORT PROTEINS

Transport of molecules across the bilayer is obviously an important function of membrane proteins and utilizes a variety of mechanisms. These mechanisms are defined and classified by both the stoichiometry and the energetics of the transport process. The definitions are essential for understanding the systematic classification of all membrane transport proteins. Uniport is the movement of one molecule at a time across the membrane, and cotransport is the tightly coupled movement of more than one substrate. Symport is the tightly coupled transport of two different molecules in the same direction, and antiport is the tightly coupled transport of two different molecules in opposite directions. When symport and antiport involve transport of ions, they may be electroneutral – resulting in no net transfer of charge – or electrogenic – creating a charge separation across the membrane.

Proteins involved in transport carry out either active or passive transport. Active transporters enable a cell to accumulate solutes against electrical and/or concentration gradients by making use of energy sources to “pump” them thermodynamically “uphill,” whereas passive transporters allow the “downhill” flow of solutes across the membrane until their electrical and/or concentration gradients are dissipated. Like the specific porins described in the last chapter, passive transport proteins provide saturable pathways that are susceptible to competitive inhibition. The classic example is the glucose transporter in erythrocytes, a glycoprotein predicted to have 12 TM α-helices (Figure 6.6). Unlike the porins, the channel of the glucose transporter does not leak small molecules or ions, so it is described as a “gated pore” that opens alternately to one side of the membrane or the other, depending on its conformation (Figure 6.7). This alternating access mechanism for transport was proposed over 50 years ago and is now the dominant model for many active transporters as well (see Chapter 11).

Active transport is further divided into primary processes, which are directly driven by hydrolysis of ATP (or another exergonic chemical reaction), and secondary processes, which carry out symport or antiport coupled to ion gradients made by primary active transporters like ATPases. These distinctions are part of the functional basis for classification of all transporters into hierarchies of families and superfamilies.

6.6 Glucose transport into erythrocytes. (A) A plot of the flux for glucose uptake by erythrocytes at 5°C as a function of the external concentration of glucose shows the passive uptake of glucose follows Michaelis–Menten kinetics, with the flux (J, in units of mM cm sec⁻¹) in place of the velocity of a reaction. The saturable curve indicates that the flux utilizes a carrier. (B) The flux of glucose can be inhibited by 6-O-benzyl-D-galactose, and the kinetic analysis indicates it is competitive inhibition, providing evidence for a binding site on the carrier. Axes: (A) glucose flux J versus [Glucose] (mM), with Jmax and KM marked; (B) 1/J versus 1/[Glucose], with 1/Jmax and 1/KM marked. Redrawn from Voet, D., and J. Voet, Biochemistry, 3rd ed., John Wiley, 2004, pp. 728–729. © 2004. Reprinted with permission from John Wiley & Sons, Inc.

Transport Classification System

The transport classification (TC) system endorsed by the International Union for Biochemistry and Molecular Biology describes five broad classes: channels and pores, carriers or porters, primary active transporters, group translocators, and TM electron carriers, in addition to lists of accessory factors and incompletely characterized transport systems (Table 6.1). About 400 families of transport proteins are now included in the TC system, divided as follows:

• Class 1: The channels and pores are proteins (and peptides) that allow the relatively free flow of solute through the membrane. This class is divided into five subclasses: α-helical protein channels; β-barrel porins (see Chapter 5); channel-forming toxins, including colicins, diphtheria toxin, and others (see


6.7 The model for conformational changes involved in transporting glucose via a gated pore. The two sides of the membrane are labeled OUTSIDE and INSIDE. Step 1, glucose binds from outside; step 2, a conformational change closes the gate to the outside; step 3, conformational change to inward facing; step 4, gate opens to inside, releasing glucose to the cytoplasm. All the steps are reversible, as indicated, so uptake is driven by the concentration gradient across the membrane.

Chapter 4); nonribosomally synthesized channels, such as gramicidin and alamethicin (see Chapter 4); and holins, which function in export of enzymes that digest bacterial cell walls in an early step of cell lysis.
• Class 2: The carriers form complexes by binding their solutes before transporting them across the bilayer by the secondary processes of symport or antiport. For this reason, class 2 is also called electrochemical potential-driven transporters. A very large family of porters in this class is called the major facilitator superfamily (MFS) and is exemplified by the lactose permease (see Chapter 11). Class 2 also contains the ionophores like valinomycin and nigericin, which are nonribosomally synthesized ion carriers, as well as the TonB family of proteins involved in transferring energy to the outer membrane of Gram-negative bacteria (see Chapter 11).
• Class 3: The class of primary active transporters includes the ATPases and the ATP-binding cassette transporters described next. Other examples of primary active transporters are those driven by oxidoreduction reactions and those driven by light, including the photosynthetic reaction center (RC; see Chapter 5).
• Class 4: The group translocators provide a special mechanism for the phosphorylation of sugars as they are transported into bacteria, described below.
• Class 5: The TM electron transfer carriers in the membrane include two-electron carriers, such as the disulfide bond oxidoreductases (DsbB and DsbD in E. coli) as well as one-electron carriers such as NADPH oxidase. Often these redox proteins are not considered transport proteins.

Cells have multiple transport systems to take up many nutrients, and the families of uptake systems employed for any nutrient type are not limited to one class. For example, the uptake of sugars is carried out by nine families of ABC transporters, 20 families of secondary carriers, seven families of porins, and six families of group translocation systems. In addition to descriptions of transport functions, the TC system derives evolutionary relatedness from sequence information obtained using the tools of bioinformatics described below that allow investigators to compare primary sequences, predict topologies, and analyze genomes.

TABLE 6.1 THE TRANSPORTER CLASSIFICATION SYSTEM^a

Class                                            Subclass
1  Channels/pores                                1a  α-helical protein channels
                                                 1b  β-barrel protein porins
                                                 1c  Toxin channels
                                                 1d  Nonribosomally synthesized channels
                                                 1e  Holins
2  Electrochemical potential-driven transporters 2a  Protein porters
                                                 2b  Nonribosomally synthesized porters
                                                 2c  Ion gradient-driven energizers
3  Primary active transporters                   3a  P–P-bond hydrolysis-driven systems
                                                 3b  Decarboxylation-driven systems
                                                 3c  Methyl transfer-driven systems
                                                 3d  Oxidoreduction-driven systems
                                                 3e  Light absorption-driven systems
4  Group translocators                           4a  Phosphotransfer-driven systems
5  Transmembrane electron carriers               5a  Two-electron transfer carriers
                                                 5b  One-electron transfer carriers

^a The five main classes are listed on the left, with subclasses provided on the right.
Source: Saier, M., et al., The Transporter Classification Database: recent advances. Nucl Acid Res. 2009, 37:D274–D278.

Superfamilies of ATPases

The TC system has two superfamilies of ATPases, one for the P-type ATPases that are found in plasma membranes and include the Na+-K+ ATPase and the Ca2+ pump (see Chapter 9). The other superfamily consists of three other types of ATPases: F-type, such as the mitochondrial and bacterial ATP synthases (see Chapter 13); A-type, which transports anions such as arsenate and is mainly found in archaea; and V-type, which maintains the low pH of vacuoles in plant cells and lysosomes, endosomes, the Golgi, and secretory vesicles of animal cells. Both the subunit compositions and their protein sequences indicate that the A-type is more similar to the V-type ATPases than to the F-type.

The Na+-K+ ATPase, called the Na+-K+ pump, transports two K+ in and three Na+ out for every ATP hydrolyzed,


establishing gradients that are critical to osmotic balance in all animal cells and to the electrical excitability of nerve cells, as well as driving the uptake of glucose and amino acids. The high-resolution structure of the Na+-K+ ATPase reveals details of its architecture, which consists of a catalytic α subunit, a bitopic β subunit, and a tissue-specific γ subunit, and its mechanism (see Chapter 9). ATP phosphorylates an Asp residue in the highly conserved sequence DKTG only in the presence of Na+; the aspartyl-phosphate thus produced is hydrolyzed only in the presence of K+, giving strong evidence for two conformational states. Thus the mechanism of coupling active transport with ATP hydrolysis involves shifting from the phosphorylated form with high affinity for K+ and low affinity for Na+ to the dephosphorylated form with high affinity for Na+ and low affinity for K+. This electrogenic process helps create the typical TM potential of –50 to –70 mV (inside negative) across the plasma membrane of most cells. The Na+-K+ ATPase is inhibited by digitalis in the well-known treatment for congestive heart failure.

6.8 Organization of the structures of ABC transporters. The subunit composition of the ABC transporter system varies in different organisms. The two nucleotide-binding domains (NBDs) (white) and two TM domains (shaded) may be four separate subunits or fused into one, two, or three proteins, as shown in panels (a) through (f), each drawn across the inner membrane between periplasm and cytoplasm. Redrawn from Higgins, C. F., Res Microbiol. 2001, 152:205–210. © 2001 by Elsevier. Reprinted with permission from Elsevier.
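The stoichiometry of the Na+-K+ ATPase described above (three Na+ out and two K+ in per ATP, with phosphorylation requiring Na+ and dephosphorylation requiring K+) can be tallied in a few lines. This is a deliberately simplified sketch: the names `PumpTally` and `run_cycles` are hypothetical, and the model counts only completed cycles, treating the pump as stalled whenever either ion is absent.

```python
from dataclasses import dataclass

NA_OUT_PER_ATP = 3  # Na+ exported per ATP hydrolyzed (from the text)
K_IN_PER_ATP = 2    # K+ imported per ATP hydrolyzed (from the text)


@dataclass
class PumpTally:
    atp_hydrolyzed: int = 0
    na_exported: int = 0
    k_imported: int = 0

    @property
    def net_positive_charges_exported(self) -> int:
        # One net positive charge leaves per cycle, which makes the pump electrogenic.
        return self.na_exported - self.k_imported


def run_cycles(n_cycles: int, na_present: bool = True, k_present: bool = True) -> PumpTally:
    """Tally ion movements over completed pump cycles (simplified model)."""
    tally = PumpTally()
    for _ in range(n_cycles):
        # Phosphorylation of the Asp residue needs Na+; hydrolysis of the
        # aspartyl-phosphate needs K+. Without both, no cycle completes.
        if not (na_present and k_present):
            break
        tally.atp_hydrolyzed += 1
        tally.na_exported += NA_OUT_PER_ATP
        tally.k_imported += K_IN_PER_ATP
    return tally
```

Ten completed cycles export 30 Na+ and import 20 K+, a net outward movement of ten positive charges, consistent with the inside-negative potential the text describes.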

ABC Transporter Superfamily

An important class of ATP-dependent transporters is the superfamily of ABC transporters, named for their ATP-binding cassettes, which are nucleotide-binding domains. ABC transporters are found in large numbers in animals, plants, and microorganisms; the genome of E. coli has 80, while the human genome has 49. Some ABC transporters function in uptake and others in efflux. Some are very specific for their substrates while others are quite promiscuous. The first ones to be identified functioned in uptake of amino acids, peptides, sugars, and vitamin B12; the high-resolution structures of the maltose transporter and the vitamin B12 transporter are described in Chapter 11. ABC transporters also export cell wall polysaccharides, participate in cytochrome c biogenesis, and secrete cellulases, proteinases, and toxins. The first ABC exporter to have a crystal structure was the drug efflux protein Sav1866 from Staphylococcus aureus (see Chapter 11).

All ABC transporters have two nucleotide-binding domains and two membrane domains, which are separate subunits in archaea and eubacteria and are usually fused together in higher organisms (Figure 6.8). In addition, the ABC transporters that function in uptake have a substrate-specific domain or protein on the external side (the periplasmic binding proteins in Gram-negative bacteria). The membrane domains typically have 12 TM helices in either a single peptide or two subunits.

The nucleotide-binding domains of ABC transporters are highly conserved, whether they are separate subunits or are fused to the TM domains. Several of them have been crystallized either as aqueous proteins or water-soluble fragments of transporters, and they may be seen in x-ray structures that have been solved for intact transporters (see Chapter 11). As expected, the various NBDs show strong structural similarities. Each NBD contains an ATPase subdomain with similarities to the F1 ATPase (see Chapter 13) and a helical subdomain specific to the ABC transporters (Figure 6.9a). The ATPase subdomain contains typical Walker A and Walker B motifs found in proteins that hydrolyze ATP and has a β-sheet region that positions the base and ribose moieties of the nucleotide. The helical subdomain contains the signature motif of ABC transporters, LSGGQ. In the dimer, the Walker A motif from one subunit binds the same ATP as the LSGGQ from the other subunit, thus the two NBDs make an “ATP sandwich.” Each round of ATP hydrolysis involves three conformations of the NBDs: a nucleotide-free conformation, an ATP-binding conformation, and a more open conformation after the hydrolysis reaction that are displayed in the MalK protein, the NBD of the maltose transport system, crystallized in all three conformations (Figure 6.9b). The movements of the NBD are transmitted to the TM regions of the membrane transporter to drive active transport.

The substrate-specific components of ABC transporter systems in the periplasm of Gram-negative bacteria are obligatory to the transport of their substrates. High-resolution structures of several of the periplasmic binding proteins with and without bound substrate indicate large conformational changes typically occur upon binding. Each protein has two lobes that close over the substrate when it binds, giving a very high substrate affinity (micromolar Kds). For some transport systems, mutants of the inner membrane transporters have been characterized in which delivery of substrate by these soluble binding proteins is no longer obligatory.

Many of the ABC transporters in humans have medical significance. The CFTR protein, the cystic fibrosis


[Figure 6.9 panel labels: (A) helices h1–h8 of each NBD, with the Walker A (WA), Walker B (WB), and LSGGQ motifs marked; (B) Resting state; Pre-hydrolysis state (ATP); Hydrolysis; Post-hydrolysis state (ADP, Pi).]

6.9 The MalK dimer, an example of an NBD of an ABC transporter. (A) The x-ray structure of the NBDs of the MalK dimer shows the ATP-binding site. Each NBD has two subdomains, an ATPase subdomain (green) and a helical subdomain (cyan). The conserved segments shown are the Walker A motif (WA, red), Walker B motif (WB, blue), LSGGQ motif (magenta), and the Q-loop (yellow) that connects the ATP-binding region to the helical subdomain, as well as to the TM portions of the ABC transporter. The bound ATP is shown in a ball-and-stick model. The regulatory domain of MalK has been omitted for clarity. From Davidson, A. L., and J. Chen, Annu Rev Biochem. 2004, 73:241–268. © 2004 by Annual Reviews. Reprinted with permission from the Annual Review of Biochemistry, www.annualreviews.org. (B) The conformation of the MalK dimer varies from the resting state (with the two NBDs separated from each other), the ATP-bound state (closed), and the post-hydrolysis state (open). Each subunit has an NBD (green and blue or cyan) and a regulatory domain (yellow). The Walker A motif (red) shows where the nucleotide (ball-and-stick model) binds. From Lu, G., et al., Proc Natl Acad Sci USA. 2005, 102:17969– 17974. © 2005 by National Academy of Sciences, USA. Reprinted by permission of PNAS.


trans-membrane conductance regulator, is a Cl− channel that is defective in most cases of cystic fibrosis (see Chapter 7). In humans the P-glycoprotein that confers multidrug resistance to cancer cells is in this class (see Chapter 11), as is TAP, the transporter for antigen processing in the immune system (see Chapter 14).

Group Translocation

A different mechanism of coupling to an exergonic reaction is utilized for group translocation of sugars in bacteria, in which the sugar substrates are phosphorylated during the transport process. This unusual group of transport systems constitutes class 4 in the TC system. The best-characterized example is the uptake of glucose, fructose, mannose, and other sugars by the phosphoenolpyruvate-dependent phosphotransferase system (PEP PTS) of E. coli (Figure 6.10). The two cytoplasmic proteins involved in transfer of the phosphoryl group from PEP, EI (enzyme I) and HPr (histidine-containing phosphocarrier protein), are used in transport of all the sugar substrates of this system. Uptake of each sugar uses a specific TM component called EII, which in some cases has one or more separate cytoplasmic subunits, called EIIA and EIIB. For example, the TM component of the PTS for glucose is EIIBCglc, which transports and phosphorylates glucose and transfers it to the cytosolic component. The cytosolic component, EIIAglc, is sensitive to catabolite repression. Thus transfer of the high-energy phosphoryl group in PEP to glucose drives its uptake as glucose-6-phosphate. Many components of the PEP PTS have been characterized thoroughly, and several of the soluble proteins have high-resolution structures. Although the EI and HPr components are common elements that are shared by transport systems for different substrates, different forms of them are usually present, encoded by more than one gene. For example, the PTS families are represented in the E. coli genome by five genes for EI homologs and six for HPr homologs. There are 21 EII complexes, including seven for the fructose family and seven for the glucose family, in addition to some regulatory proteins.

Symporters

Secondary active transport that uses ion gradients typically involves cotransport of the substrate and the ion. The well-characterized lactose permease from E. coli (see Chapter 11) carries out symport of protons and lactose, driven by the proton gradient across the membrane

[Figure 6.10 labels: Inside; Outside; lactose and H+ entering via LacY; glycerol kinase (glycerol + ATP to glycerol-3-phosphate + ADP); Inhibition; PEP and pyruvate; EI and EI~P; HPr and HPr~P; EIIAglc and EIIAglc~P; EIIBC; glucose and glucose-6-P; Activation; adenylate cyclase (ATP to cAMP + PPi).]

6.10 The PEP PTS for glucose in E. coli. Group translocation utilizes a combination of shared and specialized proteins, as exemplified by the PTS for glucose. EI and HPr are the shared cytoplasmic proteins utilized for transfer of phosphoryl groups. EIIAglc is a specialized cytoplasmic protein that phosphorylates the glucose as it enters the cytoplasm and is sensitive to catabolite repression. EIIBCglc is the TM glucose transporter. In the cytoplasm EI is phosphorylated by PEP; the phosphoryl group is transferred to HPr, and then to a specific EIIA. Then EIIA phosphorylates the membrane protein(s), in this case EIIBC, which then phosphorylate the substrate as it enters. Variations in the number of EII proteins are seen with different substrates. LacY is shown transporting lactose, which plays an inhibitory role in uptake of glucose by the PEP PTS. Redrawn from Voet, D., and J. Voet, Biochemistry, 3rd ed., John Wiley, 2004, p. 745. © 2004. Reprinted with permission from John Wiley & Sons, Inc.


[Figure 6.11 diagram. Labels: proton pump (inhibited by CN–), fuel, CO2, lactose transporter, lactose, H+, outside, inside.]

6.11 Symport of lactose and protons in E. coli. The proton gradient made by the respiratory chain or other proton pumps is used to drive the uptake of lactose by the lactose permease. When cells are treated with CN– to inhibit the energy-yielding oxidation pathways that support the proton pump, active transport of lactose is abolished. Under this condition, the lactose permease carries out passive transport. Redrawn from Nelson, D. L., and M. M. Cox, Lehninger Principles of Biochemistry, 4th ed., W. H. Freeman, 2005, p. 404. © 2005 by W. H. Freeman and Company. Used with permission.

[Figure 6.12 diagram. Labels: intestinal lumen, apical surface, microvilli, Na+–glucose symporter (driven by high extracellular [Na+]), 2 Na+, glucose, basal surface, blood, glucose uniporter GLUT2 (facilitates downhill efflux), Na+K+ ATPase, 2 K+, 3 Na+.]

6.12 Symport of glucose and Na+ ions in the intestinal epithelium. The Na+ gradient made by the Na+, K+, ATPase provides the driving force for glucose uptake from the intestinal lumen. A passive glucose uniporter exports glucose to the bloodstream. Redrawn from Nelson, D. L., and M. M. Cox, Lehninger Principles of Biochemistry, 4th ed., W. H. Freeman, 2005, p. 405. © 2005 by W. H. Freeman and Company. Used with permission.

(Figure 6.11). Bacterial proton symporters are also used to accumulate arabinose, ribose, and some amino acids, while Na+ symport is used in E. coli for uptake of melibiose, glutamate, and other amino acids, as well as in the intestinal epithelium for uptake of glucose and some amino acids (Figure 6.12).

When symport uses Na+, it is driven by the Na+ gradient created by the Na+-K+ ATPase. The energy used can be calculated from the chemical potential and electrical potential of the ion gradient. For example, for the Na+ gradient,

$$\Delta G = nRT \ln\left([\mathrm{Na}^{+}]_{\mathrm{in}}/[\mathrm{Na}^{+}]_{\mathrm{out}}\right) + n\mathcal{F}\,\Delta E.$$

In intestinal Na+–glucose symport, n = 2 for both the concentration and electrical terms, as two Na+ ions enter per glucose. Using typical values for ΔE (–50 mV), [Na+]in (12 mM) and [Na+]out (145 mM), the ΔG for glucose transport is –22.5 kJ, enough to pump glucose inside until its concentration is 9000 times the external concentration.

Antiporters

Antiport is important for ion exchange across many biological membranes. For example, the Na+/H+ antiporter in the plasma membrane exchanges extracellular Na+ for intracellular H+ and is activated in response to mitogenic agents such as epidermal growth factor. The slight increase in cytoplasmic pH produced by this exchanger may activate other enzymes in the mitogenic response.

A prominent antiporter in the erythrocyte membrane is the anion exchanger called Band 3, which is about 25% of the total protein in that membrane. It carries out a one-to-one exchange of Cl− and bicarbonate (HCO3−) to facilitate the removal of CO2 from respiring cells. (Carbonic anhydrase in the erythrocyte converts waste CO2 to HCO3−, which then circulates to the lungs where it reenters erythrocytes to be converted back to CO2 and exhaled.) Since the exchange of anions is obligatory, the net flow of transport depends on the sum of the chloride and bicarbonate concentration gradients. The membrane portion of the Band 3 protein is predicted to have 12 or 14 TM helices, and the cytoplasmic portion interacts with cytoskeletal components while the extracytoplasmic portion carries the carbohydrate antigenic determinants for several blood groups. Besides the anion exchanger in the erythrocyte, humans have two other anion exchangers that are closely related to Band 3. Similar exchangers are also found in plants and microorganisms.

Not all antiporters are electroneutral; for example, the ADP/ATP carrier, the most abundant protein in the mitochondrial inner membrane, exchanges one ATP4− (exiting) for one ADP3− (entering). This electrogenic exchange is driven by the inside-negative potential across the mitochondrial membrane and utilizes one-third of the energy stored in the electrochemical proton gradient generated by the respiratory chain! To meet the energy needs of


the cell, it facilitates the exchange of adenine nucleotides with a turnover number of 500 per minute. Two natural inhibitors bind to alternate sides of the transporter, indicating that it has two conformations that allow its adenine nucleotide binding site to face in or out. The high-resolution structure of the ADP/ATP translocator in the presence of one of these inhibitors shows a bundle of six TM α-helices forming a basket with the inhibitor inside (see Chapter 11).

The ADP/ATP carrier belongs to the mitochondrial carrier family (MCF), whose functions ensure that metabolites (such as pyruvate, tricarboxylates and dicarboxylates of the Krebs cycle, carnitine, and citrulline) as well as nucleotides and reducing power are exchanged between the cytosol and the mitochondrial matrix. The family also includes uncoupling protein of brown adipose tissue, which carries protons back into the matrix, allowing respiration to generate heat, and is involved in type II diabetes. The transporters in this family are in the inner membrane of mitochondria and share 20% sequence identity and a common three-fold symmetry in their structure. New members of the family are readily identified by the MCF signature, which consists of three copies of the sequence PX[D,E]XX[K,R], along with the three-fold sequence repeats that generate the overall fold. Mitochondria in different eukaryotes contain from 35 to 55 different mitochondrial carriers; the human genome encodes 48.

Ion Channels

Most eukaryotic membranes contain selective ion channels that are unlike the ion pumps driven by the hydrolysis of ATP. They are characterized by ion specificity, high fluxes (up to 10^8 ions sec–1), and gating (opening and closing) in response to ligands or voltage detected by patch clamp measurements (see Chapter 3).

High-resolution structures for a number of ion channels reveal their basis of selectivity and transport mechanisms as described in Chapter 12. Ion channels that are involved in the passage of signals in nerves and muscle cells are also receptors for agonists and antagonists that control their activity.

MEMBRANE RECEPTORS

TABLE 6.2 FAMILIES OF SOME REPRESENTATIVE RECEPTORS IN EUKARYOTES

Immunoglobulin superfamily – many diverse functions
    T cell receptor (α, β subunits)
    Major histocompatibility complex II MHC (α, β subunits)
    Lymphocyte function-associated antigen-3 (LFA-3)
    CD2 (T cell LFA-2)
    Immunoglobulin A (IgA)/IgM receptor
    IgG Fc receptor
    High-affinity IgE receptor (α subunit)
    Surface immunoglobulins (heavy, light chains)
    N-CAMs (neuronal cell adhesion molecules)
    Myelin-associated glycoprotein

Integrins – bind to extracellular matrix and adhesion proteins
    Fibronectin receptors
    Vitronectin receptors
    Platelet glycoprotein complex (IIb/IIIa)
    Leukocyte adhesion proteins (LFA-1, Mac 1)
    T cell very late antigens (VLA family)

Mitogen/growth factor receptors with tyrosine kinase activity – stimulate cell growth
    Epidermal growth factor (EGF) receptor
    Platelet-derived growth factor (PDGF) receptor
    Insulin receptor
    Insulin-like growth factor-1 (IGF-1) receptor
    Colony stimulating factor-1 (CSF-1) receptor

Neurotransmitter receptors/ion channels – receptor-operated channels
    Nicotinic acetylcholine receptor (nAChR)
    γ-aminobutyric acid (GABA) receptor
    Glycine receptor

Receptors that activate G proteins
    β-adrenergic receptors (β1, β2)
    α-adrenergic receptors (α1, α2)
    Opsins (rhodopsin)
    Muscarinic acetylcholine receptors (M1, M2)

Miscellaneous
    Asialoglycoprotein receptors
    Low-affinity (lymphocyte) IgE receptor (unknown function)
    Insulin-like growth factor-2 (IGF-2) receptor
    Cation-independent mannose-6-phosphate receptor
    Cation-dependent mannose-6-phosphate receptor (extracytoplasmic domain)

Source: Gennis, R. B., Biomembranes: Molecular Structure and Function, Springer Verlag, 1989, p. 324. © 1989 by Robert B. Gennis. Reprinted by permission of the author.

Membrane receptors are integral proteins that function in information transfer by triggering a response to their binding of ligands. Receptors have many diverse functions, listed in Table 6.2. Many receptors are involved in cell surface interactions, while some trigger endocytosis or recycling of membrane components. Among the receptors with enzyme activity are many receptors involved in signaling, such as the insulin receptor. The insulin receptor is an α2β2 tetramer that responds to


insulin by autophosphorylation of three Tyr residues on each β subunit, activating its protein kinase activity to start a signal cascade.
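The free energy quoted for intestinal Na+–glucose symport in the Symporters section above can be checked numerically. The sketch below is only illustrative: it evaluates the two terms of that equation (concentration term plus electrical term, with the Faraday constant as the proportionality factor) for n = 2, ΔE = –50 mV, 12 mM inside and 145 mM outside. The temperature of 310 K is an assumption, since the text does not state one, and the function name is hypothetical.

```python
import math

R = 8.314      # gas constant, J mol^-1 K^-1
F = 96485.0    # Faraday constant, J V^-1 mol^-1

def symport_delta_g(n, c_in, c_out, delta_e_volts, temp_k=310.0):
    """Free energy (J/mol) for n ions entering the cell:
    n*R*T*ln(c_in/c_out) + n*F*delta_E. temp_k = 310 K is an assumed value."""
    chemical = n * R * temp_k * math.log(c_in / c_out)
    electrical = n * F * delta_e_volts
    return chemical + electrical

# Values quoted in the text for intestinal Na+-glucose symport
dg = symport_delta_g(n=2, c_in=12e-3, c_out=145e-3, delta_e_volts=-0.050)
print(f"{dg / 1000:.1f} kJ/mol")   # prints -22.5 kJ/mol
```

With these inputs the concentration term contributes roughly –12.8 kJ/mol and the electrical term roughly –9.6 kJ/mol, so both parts of the gradient matter. Any accumulation ratio derived from the result depends on the temperature assumed.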

Nicotinic Acetylcholine Receptor

One of the most thoroughly studied receptors is the nicotinic acetylcholine receptor, a neurotransmitter receptor in the Cys-loop superfamily. It differs greatly from the muscarinic acetylcholine receptor, a GPCR (see below). The Cys-loop superfamily, named for the signature 13-residue loop linked by a disulfide bond, includes excitatory receptors, such as those for acetylcholine and serotonin, and inhibitory receptors, such as those for γ-aminobutyric acid and glycine. The first high-resolution x-ray structure representing this family is that of the glutamate-gated chloride channel from Caenorhabditis elegans described in Chapter 10.

The nicotinic acetylcholine receptor (nAChR) is expressed throughout the nervous system and affects a wide variety of physiological functions. It is perhaps best known for its role mediating voluntary movement in skeletal muscle, which is affected by disease-causing mutations. When nAChR binds acetylcholine, it opens a channel for cations (Na+, Ca2+, and K+), triggering depolarization of the membrane that causes the muscle fiber to contract. After several milliseconds the channel closes and acetylcholine is released.

Knowledge of the structure of the nAChR from two-dimensional EM has been complemented by an x-ray structure of an acetylcholine-binding protein from snail, which has homologous ligand-binding domains. The detailed structural model provides much insight into how the receptor functions (Figure 6.13). The nAChR is a hetero-pentamer, composed of α2βδε in adult muscle (ε is replaced by γ in the fetus). Each subunit comprises three domains: a largely β-sheet extracellular domain, an α-helical TM domain, and an α-helical cytoplasmic domain. Acetylcholine binds to two sites at the interfaces of each α subunit with either ε or β in pockets lined by aromatic residues and capped by a mobile loop, Loop C. The TM domain of each highly homologous subunit has four TM helices (M1–M4), of which M2 lines the channel. Reversible gating of the channel by constriction and dilation at the narrowest point allows control of ion flux. The open state of nAChR has a much greater affinity for acetylcholine than the closed state has. Mutant studies have probed the coupling of ligand binding to channel opening, which appears to involve the loops that connect the β-sheet domain with the α-helical channel. Congenital myasthenic syndromes that cause severe neuromuscular disorder are caused by mutations in muscle nAChR that affect either gating or agonist binding and produce either slow channels or fast channels (Figure 6.14).

The Cys-loop neurotransmitter receptors are considered part of an even larger superfamily, the ligand-gated ion channels (LGICs). Together these superfamilies have over 200 entries in the LGIC database.

6.13 Structural model of the acetylcholine receptor from Torpedo electric organ. The hetero-pentamer is viewed from the plane of the membrane as a ribbon diagram highlighting the α-subunits (gold; other subunits, cyan). The channel is visualized as the outer surface of the central vestibule (dark blue). The membrane boundaries (arrows) correspond to the interfaces of the three domains. From Sine, S. M., and A. G. Engel, Nature. 2006, 440:448–455.

6.14 Activity of the nAChR revealed in single-channel currents from wild type and congenital myasthenic syndrome. Mutations associated with congenital myasthenic syndrome cause either fast-channel or slow-channel activity, shown here with single-channel currents measured for representative mutant receptors and the wild type expressed in human embryonic kidney fibroblasts. Both phenotypes cause muscle weakness but by different mechanisms. From Sine, S. M., and A. G. Engel, Nature. 2006, 440:448–455.

G Protein-Coupled Receptors

The largest and most diverse group of receptors involved in signaling is the G protein-coupled receptors (GPCRs). The majority of these receptors respond to ligand binding by activating heterotrimeric guanine nucleotide binding proteins (G proteins) on the cytoplasmic surface of the membrane, while a few open ion channels directly. The G proteins transmit and amplify the signals by triggering changes in the concentrations of second messengers, such as cAMP (Figure 6.15). GPCRs respond to many intercellular messenger molecules as well as sensory messages. The more than 1000 GPCRs in humans include receptors for nucleotides, neurotransmitters (acetylcholine, dopamine, histamine, and serotonin), prostaglandins and other eicosanoids, and many hormones, as well as olfactory and gustatory receptors. With a common protein fold, the GPCRs are called serpentine receptors because their seven TM α-helices “snake” across the membrane. The first member of the GPCRs to have its structure solved at high resolution is rhodopsin (described in Chapter 10). Intense interest has led to numerous high-resolution structures of GPCRs, which are expected to aid in drug design, as approximately half of modern pharmaceuticals are targeted at GPCRs. The GPCRs form a superfamily that is divided into six families with no statistically significant sequence similarity between them. Their seven-TM topology was attributed to convergent evolution until some of the bioinformatics tools described next enabled searches for distant homology and new members of the superfamilies.

[Figure 6.15 diagram. Labels: hormone, receptor, G protein, adenylate cyclase, GDP, GTP, Pi, ATP, cAMP + PPi.]

6.15 Interaction of a receptor with a G protein to stimulate the production of cAMP. The activation/deactivation cycle for hormonally stimulated adenylate cyclase (AC) starts with inactive AC and guanosine diphosphate (GDP) bound to the G protein. When hormone binds to its receptor (a), the hormone–receptor complex stimulates the G protein to exchange GDP for guanosine triphosphate (GTP; b). The GTP–G protein complex binds to AC, activating it to produce cAMP (c). When the G protein catalyzes hydrolysis of GTP to GDP, it dissociates from AC, inactivating it. Redrawn from Voet, D., and J. Voet, Biochemistry, 3rd ed., John Wiley, 2004, p. 674. © 2004. Reprinted with permission from John Wiley & Sons, Inc.


BOX 6.2 BIOINFORMATICS BASICS

Bioinformatics develops statistical methods and computational models for analysis of biological data to identify and predict the composition and structure of biological molecules. The rapid growth of sequence data and the huge numbers of structures solved by x-ray crystallography or NMR created the need for a central repository for structure information, the Protein Data Bank (PDB) (http://www.pdb.org/). In use since 1971, the PDB provides atomic coordinates, chemical and biochemical features, details of structure determination, and features derived from the structures of over 80 000 proteins, with more added yearly. Many of the PDB entries are redundant, because the same structure was determined by different groups or resulted when the sequence contained a point mutation, so the Astral compendium (http://astral.berkeley.edu) offers tools that supply nonredundant data based on unique protein domains.

Swiss-Prot (http://expasy.ch) is a widely used annotated protein sequence databank that contains over 500 000 entries for sequences assigned to proteins based on homologies or known identities. It provides a description of the proteins’ features, such as disulfide bonds, binding sites, and secondary structure elements (including TM α-helices), along with references and links to other databases. It also points out conflicts between data provided by different references.

Many bioinformatics tools use the structural information of the PDB to display images that allow visualization of the structures, to simulate their dynamic interactions, to classify proteins into families and superfamilies, and to predict three-dimensional structures based on sequence homologies. Two programs in use for pairwise sequence alignments are BLAST, a Basic Local Alignment Search Tool (http://www.ncbi.nlm.nih.gov/BLAST), and FASTA (http://www.ebi.ac.uk/fasta33), which search a chosen database for sequences that align with the query sequence. A more powerful search can be carried out with the new program PSI-BLAST (at the same URL as BLAST), an iterative process that extends the search based on the profile first generated by BLAST.

Created to probe evolutionary relationships, SCOP, the Structural Classification of Proteins (http://scop.mrc-lmb.cam.ac.uk/scop), is a database that describes the three-dimensional structure of proteins. SCOP organizes proteins by family, superfamily, fold, and class, based on common structural domains (see Figure 6.2.1). A similar database, CATH, for Class, Architecture, Topology, and Homologous superfamily (www.cathdb.info), takes a slightly different approach to classify folds into 30 major protein architectures, such as α-bundle, β-barrel, αβ-propeller, and so on. There are over 2000 superfamilies of proteins identified in CATH and SCOP.

Other tools probe the interactions between proteins (see, for example, DIP, the Database of Interacting Proteins; http://dip.doe-mbi.ucla.edu) and between proteins and their ligands or their solvent. These tools were developed for soluble proteins; bioinformatics of membrane proteins relies more on specialized algorithms described later in the chapter.

[Figure 6.2.1 diagram: SCOP classes, March 2001. All α, 138 folds; all β, 93 folds; α/β, 97 folds; α+β, 184 folds. Other proteins: multiple domain, membrane and cell surface, small S-S stabilized, coiled coil, low resolution, small peptides, designed proteins.]

6.2.1 SCOP classifies proteins into four classes based on common folds. From Reddy, B. V. B., and P. E. Bourne, in P. E. Bourne and H. Weissig (eds.), Structural Bioinformatics, Wiley-Liss, 2003, pp. 239–248. © 2003. Reprinted with permission from John Wiley & Sons, Inc.

A useful text is Structural Bioinformatics, edited by P. E. Bourne and H. Weissig, published by Wiley-Liss in 2003.
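As a small illustration of sequence-based family identification, the MCF signature quoted earlier in this chapter (three copies of PX[D,E]XX[K,R]) can be written as a regular expression and searched for in a sequence. This is a minimal sketch, not the procedure used by any particular database: the function names and the toy sequence are hypothetical, and a real search would also look for the three-fold sequence repeats that the text mentions.

```python
import re

# P-X-[D or E]-X-X-[K or R], where X is any residue
MCF_MOTIF = re.compile(r"P.[DE]..[KR]")

def find_signature(seq):
    """Return 0-based start positions of non-overlapping motif matches."""
    return [m.start() for m in MCF_MOTIF.finditer(seq.upper())]

def has_three_copies(seq):
    """Crude filter: at least three copies of the motif are present."""
    return len(find_signature(seq)) >= 3

# Invented toy sequence with three motif copies (not a real carrier)
toy = "MAAPLDTVK" + "GGSS" + "PVELIK" + "AAAA" + "PFDRAR" + "LL"
print(find_signature(toy))      # [3, 13, 23]
print(has_three_copies(toy))    # True
```

A six-residue pattern of this kind can occur by chance in unrelated proteins, which is one reason the signature is used together with the repeat structure and not on its own.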


BIOINFORMATICS TOOLS FOR MEMBRANE PROTEIN FAMILIES

Because purification and crystallization of membrane proteins are complicated by the presence of lipids and detergent (see Chapters 3 and 8), the number of high-resolution structures of membrane proteins, while growing rapidly, is still a small proportion of those predicted in the genome. With nearly 400 structures of membrane proteins in the Protein Data Bank (see Box 6.2), analysis of the wealth of genomic information available relies on methods for interpretation and comparison of the approximately 540 000 identified protein sequences in Swiss-Prot, up to one-third of which are likely membrane proteins. The planar dimensionality and the hydrophobicity of the bilayer simplify the application of bioinformatics to membrane proteins to give topology models describing the number and orientation of TM segments.

Classification of membrane proteins into families has relied on both sequence homologies and common topologies or folds. An early advance was the construction of hydrophobicity plots to identify potential TM α-helices based on occurrences of ∼18 or more contiguous nonpolar amino acids. The problem then became how best to assign values for the hydrophobicity of the different residues, with numerous scales sometimes offering contrasting results. Application of various statistical tools brought large improvements in prediction methods, and methods were added to identify β-barrels. The rest of this chapter looks at these developments in some detail.

Structural predictions of integral membrane proteins have evolved from fairly simple computations using physical data to the increasingly sophisticated algorithms that are the tools of bioinformatics today. At the heart of the searches for structural similarities that are the basis for determining families and superfamilies is the ability to predict TM segments from the primary structure data.

Predicting TM Segments

With the vast majority of membrane proteins expected to be bundles of α-helical TMs, the prediction of integral membrane protein structure has focused on recognizing those portions of polypeptides with favorable free energy changes for transfer from the aqueous solvent to the nonpolar milieu of the membrane. The transfer free energy change (ΔGtr, see Chapter 1) for partitioning of each amino acid between ethanol and water was the basis of the first scale for hydrophobicity used to predict which sequences of amino acids were likely to form a TM helix. To emphasize the polarity of the side chain, each ΔGtr was normalized to that of glycine to subtract the contributions of the amino group, the carboxyl group, and the α carbon:

$$\Delta G_{tr} = RT\ln\left(\chi^{w}/\chi^{o}\right) - RT\ln\left(\chi^{w}_{gly}/\chi^{o}_{gly}\right),$$

where χ is the mole fraction of the amino acid in the aqueous phase (w) or the organic phase (o) at equilibrium.

Numerous other hydrophobicity scales have been developed to model partitioning into the membrane interior based on partitioning into other organic solvents, vapor pressures of side chain analogs, or distributions of amino acids in the interior of soluble proteins. Two of the most frequently used are the Goldman–Engelman–Steitz (GES) and the Wimley–White (WW) scales (Table 6.3). The GES scale is based on calculated estimates for the ΔGtr of amino acids in α-helical peptides arrived at by combining terms for the appropriate hydrophobic and hydrophilic contributions. To the data from partitioning experiments the hydrophobic component adds a function for the surface area of the amino acid side chain in an α-helix, computed based on geometric considerations. The hydrophilic component considers the pKas for charged groups and the energy required to produce an uncharged species by protonation or deprotonation. In contrast, the WW scale is an empirical scale based on water/octanol partitioning of small random-coil peptides, so it provides an empirical measure that includes the contribution from the peptide backbone. It is augmented by varying the charged states of Asp, Glu, and His residues to allow for formation of salt bridges. For example, neutralizing Asp115 in bR improved detection of helix D as a TM segment (see below). These biophysical hydrophobicity scales can now be compared with a biological hydrophobicity scale that describes quantitatively the tendency for each amino acid to be inserted in the membrane as part of a TM helix during biogenesis of membrane proteins in cells (see Chapter 7).

Hydrophobicity Plots

A simple yet powerful advance was the creation of hydrophobicity plots, also called hydropathy plots, to display the distribution of hydrophobic segments along the linear peptide chain and thus predict its two-dimensional topology. The original Kyte–Doolittle algorithm moves a sliding window along the sequence of amino acids and calculates the mean ΔGtr at each position. To correspond to the length of bilayer-spanning helices, the size of the window should be around 18 amino acids. When plotted with the primary structure along the x-axis and hydrophobicity along the y-axis, with positive values (for hydrophobic residues) above the line, the peaks correspond to predicted TM segments. This method became widely used, with results depending heavily on the hydrophobicity scale used (Figure 6.16). Hydrophobicity plots give a satisfying picture of the predicted fold of many integral membrane proteins (see Box 6.3), yet they are quite imprecise at the end points of the TM helices. The typical bands of aromatic Trp and Tyr residues at the boundaries of nonpolar and interfacial regions (see Chapter 5) can help to define the ends of the TM region.
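The glycine-normalized transfer free energy defined under Predicting TM Segments can be made concrete with a short calculation. The following is a minimal sketch under stated assumptions: the mole fractions are invented for illustration and are not measured values, the temperature of 298.15 K is assumed, and the function name is hypothetical. Energies are returned in kcal/mol.

```python
import math

R_KCAL = 1.987e-3   # gas constant, kcal mol^-1 K^-1

def delta_g_tr(x_w, x_o, x_w_gly, x_o_gly, temp_k=298.15):
    """Glycine-normalized transfer free energy (kcal/mol):
    RT ln(x_w/x_o) - RT ln(x_w_gly/x_o_gly).
    x_w, x_o are mole fractions in water and in the organic phase."""
    rt = R_KCAL * temp_k
    return rt * math.log(x_w / x_o) - rt * math.log(x_w_gly / x_o_gly)

# Invented example: a residue that favors the organic phase 10-fold,
# compared with a glycine that shows no preference.
print(round(delta_g_tr(0.01, 0.10, 0.5, 0.5), 3))   # -1.364
```

A residue that partitions exactly like glycine gives zero, and a negative value marks a side chain that prefers the organic phase, which matches the sign convention of the hydrophobic residues in Table 6.3.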


TABLE 6.3 HYDROPHOBICITY SCALES FOR PARTITIONING OF THE AMINO ACIDS INTO NONPOLAR PHASES, AS DETERMINED BY TANFORD; GOLDMAN, ENGELMAN, AND STEITZ (GES); AND WIMLEY AND WHITE (WW)a

Amino acid   Nozaki and Tanford   Goldman–Engelman–Steitz   Wimley–White   Wimley–White with charged side chains
Phe          –2.65                –3.70                     –1.71
Cys          –                    –2.00                     –0.02
Ile          –2.96                –3.11                     –1.12
Met          –1.30                –3.39                     –0.67
Leu          –2.41                –2.80                     –1.25
Val          –1.68                –2.61                     –0.46
Trp          –3.01                –1.90                     –2.09
His          –0.67b               3.01                      0.11b          2.33
Tyr          –2.53                0.70                      –0.71
Ala          –0.73                –1.60                     0.50
Thr          –0.44                –1.20                     0.25
Gly          –                    –1.00                     1.15
Asn          0.01                 4.80                      0.85
Ser          –0.04                –0.60                     0.46
Pro          –                    0.20                      0.14
Gln          0.10                 4.11                      0.77
Glu          –0.55b               8.20                      0.11b          3.63
Asp          –0.54b               9.20                      0.43b          3.64
Arg          –0.73                12.3                      1.81
Lys          –1.50                8.80                      2.80

a The ΔGtr is given for the transfer of single amino acids from water to ethanol (Tanford), from water to an α-helix in the membrane (GES), and for the transfer from water to octanol of the amino acid in a peptide (WW). Note that Wimley and White compare the un-ionized and ionized states of Asp, Glu, and His. They also determined the ΔGtr for residues linked in salt bridges, such as Arg+ . . . Asp– (not shown).
b Values for the un-ionized state.
Sources: Jones, M. N., and D. Chapman, Micelles, Monolayers, and Biomembranes, Wiley-Liss, 1995, p. 16 (Tanford and GES); Jayasinghe, S., et al., Energetics, stability, and prediction of transmembrane helices. J Mol Biol. 2001, 312:927–934.
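The sliding-window calculation described under Hydrophobicity Plots can be sketched with the Wimley–White column of Table 6.3. This is an illustration only, not the Kyte–Doolittle or MPEx implementation: the window of 19 follows the MPEx default mentioned in Box 6.3, the un-ionized values are used for His, Glu, and Asp, and the function names and toy sequence are hypothetical. Published plots often flip the sign so that hydrophobic stretches appear as peaks; the sketch keeps the table's sign, so the most hydrophobic window is the one with the lowest mean.

```python
# Wimley-White values transcribed from Table 6.3
# (un-ionized values used for His, Glu, Asp)
WW = {
    "F": -1.71, "C": -0.02, "I": -1.12, "M": -0.67, "L": -1.25,
    "V": -0.46, "W": -2.09, "H": 0.11, "Y": -0.71, "A": 0.50,
    "T": 0.25, "G": 1.15, "N": 0.85, "S": 0.46, "P": 0.14,
    "Q": 0.77, "E": 0.11, "D": 0.43, "R": 1.81, "K": 2.80,
}

def window_means(seq, window=19):
    """Mean scale value for every full window; entry i starts at residue i."""
    values = [WW[aa] for aa in seq]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

def most_hydrophobic_window(seq, window=19):
    """Return (start index, mean) of the window with the lowest mean value."""
    means = window_means(seq, window)
    start = min(range(len(means)), key=means.__getitem__)
    return start, means[start]

# Invented toy sequence: 19 leucines flanked by polar and charged residues
toy = "KDKSE" + "L" * 19 + "RKDSE"
start, mean = most_hydrophobic_window(toy)
print(start, mean)   # 5 -1.25
```

Swapping in the GES column, or the charged values for Asp, Glu, and His, changes which windows stand out, which is the scale dependence the text describes for Figure 6.16.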

Even then, hydrophobicity plots are unable to establish the orientation of the TM helices.

Orientation of Membrane Proteins

Establishing the correct orientation of an integral membrane protein has long been a challenge for membrane biochemists, and many biochemical techniques have been used to label the external portions of membrane proteins, with varying degrees of success. Chemical modification with impermeant reagents may be invalidated by findings that the reagent can permeate the membrane after all. Protease sensitivity is limited by the extensive protease resistance of many membrane proteins, once they are fully folded and inserted into the bilayer. Epitopes recognized by antibodies do identify external binding sites as long as the antibody-binding data give unambiguous results.

A successful, though laborious, genetic approach to determining orientation was developed in E. coli using gene fusions with reporter enzymes inserted into predicted loop regions of the membrane protein. Three reporter enzymes that each require a particular location in the cell are commonly used. Alkaline phosphatase (PhoA) is normally exported to the periplasm, where essential disulfide bonds are formed, so if it is expressed in the cytoplasm (on a cytoplasmic domain of a membrane protein) it gives little or no activity. β-lactamase (Bla) is also effective only in the periplasm, because its substrate is ampicillin, which inhibits peptidoglycan formation; thus only when this reporter is exported to the periplasm does it confer ampicillin resistance to the cell. The third reporter enzyme, β-galactosidase (LacZ), is active only in the cytoplasm because attempts to translocate it across the inner membrane leave it embedded in the membrane, which renders it inactive. When


normalized to their expression levels, the enzyme activities obtained for different inserts can be correlated with the sites of localization and used to map the results as a topology model (Figure 6.17).

While the gene fusion approach works well for bacterial proteins, relatively few eukaryotic membrane proteins have been cloned into E. coli for expression and testing with these markers. Identification of glycosylation sites has long been used to indicate which loops or domains are exported from the cytoplasm in eukaryotes, since the sugar chains are always extracellular. Another strategy to localize proteins in eukaryotes uses fusions to green fluorescent protein, which can reveal the fusion protein’s compartmentalization if the resolution of the fluorescence microscope is high enough.

[Figure 6.16 plots. (A) Kyte–Doolittle and (B) Goldman–Engelman–Steitz: hydrophobicity versus residue number. (C) Wimley–White: ΔG of window (kcal mol–1) versus center amino acid in window, with traces for Asp115 neutral and Asp115 charged.]

6.16 Hydrophobicity plots for bR using different hydrophobicity scales. (A) Kyte–Doolittle; (B) Goldman–Engelman–Steitz (GES); (C) Wimley–White (WW) with varying charge on Asp115. The agreement between the WW plot and the structure of bR is indicated by the lines for identified (red) and known (blue) TM helices in (C), and is illustrated in the frontispiece to this chapter. (A) and (B) redrawn from Gennis, R. B., Biomembranes: Molecular Structure and Function, Springer-Verlag, 1989, p. 125. © 1989 by Robert B. Gennis. Reprinted by permission of the author; (C) from Jayasinghe, S., et al., J Mol Biol. 2001, 312:927–934. © 2001 by Elsevier. Reprinted with permission from Elsevier.

BOX 6.3 MAKING AND TESTING HYDROPHOBICITY PLOTS

Membrane Protein Explorer (MPEx, http://blanco.biomol.uci.edu/mpex) is a tool for the simple generation of hydrophobicity plots using the WW scale and a window size set initially to 19. It allows the user to choose a protein sequence from the PDB database or to choose a protein of known topology. The site divides the latter proteins into three groups: “3D-helix” contains helical membrane proteins whose structures are known from x-ray crystallography or NMR; “1D-helix” contains proteins with evidence from gene fusions and other techniques that supports their predicted topology; and “3D-other” contains β-barrels and monotopic proteins. Alternative functions allow for analysis of β-barrels and for use of the translocon-based biological partitioning scale (see Chapter 7). It is easy and fun to generate and compare hydrophobicity plots for known integral membrane proteins. However, MPEx predicts TM segments in a large fraction (up to 43%) of soluble proteins chosen from the PDB because it picks up segments that are simply buried in the hydrophobic interior of a globular protein, as well as cleavable signal sequences on secreted proteins (see Chapter 7). As a research tool, it must be combined with other methods to improve the success of topology predictions (see below).

Source: Snider, C., et al., MPEx: A tool for exploring membrane proteins. Protein Sci. 2009, 18:2624–2628.

The Positive-Inside Rule

Using the topological data on E. coli inner membrane proteins plus data from the proteins in the photosynthetic RC, a statistical analysis of the amino acids of internal and external loops revealed a prevalence of basic residues in the cytoplasmic loops. In general the non-translocated loops of membrane proteins have two to four times more Lys and Arg residues than found in translocated domains. This characteristic is the basis of the von Heijne positive-inside rule and makes it possible


6.17 Analysis of membrane protein topology using alkaline phosphatase fusions. (A) Diagram that demonstrates the method: When the fusion puts alkaline phosphatase in the periplasm, as at site 1, it is active, and when it puts it in the cytoplasm, as at site 2, it is inactive. From Manoil, C., and J. Beckwith, Trends Genet. 1988, 4:223–226. © 1998 by Elsevier. Reprinted with permission from Elsevier. (B) Application of the method to the LacY protein. The sites of fusions are labeled by the level of alkaline phosphatase activity they produced: high (dark blue arrows), >190 units; medium (hatched green arrows), 46–99 units; and low (light blue arrows), <35 units. The ends of some TM segments are adjusted to maximize their hydrophobicity. Redrawn from Calamia J., and C. Manoil, Proc Natl Acad Sci USA. 1990, 87:4937–4941.

to predict the orientation of the TM helices from the amino acid sequences. Specifically, a loop with several basic residues within 30 residues from the end of a TM helix will be an internal loop, probably due to restricted insertion of helical hairpins carrying highly positively charged sequences. The positive-inside rule was tested by engineering the charge distribution in the E. coli inner membrane protein leader peptidase (Lep). Lep has two TM helices (H1 and H2) separated by a cytoplasmic domain (P1), with a short N terminus and a large C terminus (P2), both in the periplasm. The small P1 domain has nine positively charged residues. When basic residues were redistributed to put more in the N terminus than the P1 domain, the orientation of Lep flipped to the inverted topology (Figure 6.18a). More tests of engineered proteins with at least four TM segments show that altering the charge distribution can invert or “frustrate” the topology of the protein (Figure 6.18b). An example of the positive-inside rule at work during evolution is the pair of highly homologous proteins called RnfA and RnfE, membrane proteins involved in nitrogen fixation in Rhodobacter capsulatus that differ in orientation across the membrane and exhibit opposite distributions of Lys and Arg residues (Figure 6.19).

Analysis of the amino acid distributions in integral membrane proteins from over 100 genomes (see below) indicates that the positive-inside rule is universal. Combining the positive-inside rule with hydrophobicity plots significantly improves the power of topology predictions, as carried out by TopPred, an algorithm that calculates a standard hydrophobicity profile and ranks the possible topologies according to the positive-inside rule. Structural predictions also benefit from knowledge of which side of the membrane the N or C terminus is on. Another approach is the use of gene fusions to rapidly identify the localization of C termini using PhoA as a periplasmic marker and green fluorescent protein as a cytoplasmic marker, since it does not fold correctly in the E. coli periplasm. This approach enabled a global topology analysis of over 700 inner membrane proteins in E. coli.

Inverted Repeats

Repeated structural elements that are common in transporters have proven to be useful in predicting alternate conformations once one structure is available. Such structural repeats consisting of two or more groups of TM helices are recognized in the fold of many secondary transporters (see Chapter 11). They likely arise from gene duplication but are often difficult to detect in sequence analyses due to sequence divergence during evolution. However, they may be recognized in the topology once the three-dimensional architecture is known (Figure 6.20). The repeats are frequently “inverted” because they have an antiparallel orientation related by a pseudo-symmetry axis perpendicular to the plane of the membrane. In other words, after a rotation of approximately 180° of one half of the repeat, the structures are closely aligned when superimposed.

154 Bioinformatics tools for membrane protein families

[Figure 6.18 panels: (A) topologies of Lep wt, Lep, and Lep-inv, with H1, H2, P1, and P2 placed relative to the periplasm and cytoplasm; (B) four-TM Lep constructs 3K/0K/0K/3K and 0K/3K/3K/0K, placed relative to the lumen and cytoplasm.]

6.18 Demonstration of the positive-inside rule with leader peptidase (Lep). (A) Mutations that affect charge distribution in the loops can change the topology of Lep. At left is wild type Lep in its normal orientation with P1 in the cytoplasm. The middle construct is a mutant with a shortened P1, still in normal orientation. The right illustrates that the addition of four basic residues to the N-terminal end gives an inverted orientation. From von Heijne, G., Annu Rev Biophys Biomol Struct. 1994, 23:167–192. © 1994 by Annual Reviews. Reprinted with permission from the Annual Review of Biophysics and Biomolecular Structure, www.annualreviews.org. (B) A variety of Lep constructs have been engineered from duplicates of the lep gene to have four TM segments and a varying number of lysine residues in soluble portions, as indicated. Some of them, such as the 3K/0K/0K/3K construct, are “frustrated,” meaning that regions that would normally span the membrane cannot do so. Redrawn from Gafvelin, G., et al., J Biol Chem. 1997, 272:6119–6127. © 1997 by American Society for Biochemistry & Molecular Biology, Reproduced with permission of American Society for Biochemistry & Molecular Biology via Copyright Clearance Center.

The symmetry of the structural repeats observed in transporters is likely related to the alternating access model of transport described earlier. Often a transporter, such as LacY, energetically favors one conformation (open to the cytosol in the case of LacY) so much that it is difficult to crystallize in the other conformation. To predict the other conformation (LacY open to the periplasm), the conformations of the structural repeats are swapped in silico, generating a model of the alternate transport state (Figure 6.21). This approach has been validated with examples in which the crystal structure of the second state was then obtained, such as GltPh, a bacterial sodium/aspartate symporter related to glutamate transporters in the brain (see Chapter 11).

Genomic Analysis of Membrane Proteins

Identification of membrane proteins in genomes requires fast, accurate methods designed to start with the primary structure of a protein derived from a genetic sequence and predict its topology from the locations of TM segments (Figure 6.22). A number of algorithms carry out sophisticated statistical analyses (Table 6.4). Some, for example TMHMM and HMMTOP, use hidden Markov models, while MEMSAT uses dynamic programming to optimally “thread” a polypeptide chain through a set of topology models, and PHD builds a neural network for predicting secondary structure from multiple sequence alignments (see Box 6.4). As these methods


[Figure 6.19 diagram: topologies of ORF 193 (RnfA homolog) and YdgQ (RnfE homolog), showing the TM segment boundaries and the number of positive charges in each periplasmic and cytoplasmic loop and tail.]

6.19 Topology of the homologous proteins RnfA and RnfE from R. capsulatus. The E. coli homologs, ORF193 (RnfA) and YdgQ (RnfE), were tested by PhoA fusion analysis. The opposite orientations of the two proteins are consistent with the positive-inside rule, indicated by the number of Lys + Arg residues in each loop and tail (with the number of + charges shown). Redrawn from von Heijne, G., Q Rev Biophys. 1999, 32:285–307. © 1999. Reprinted with permission from Cambridge University Press.
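The charge counting summarized in this figure can be sketched in a few lines of Python. This is a minimal illustration of the positive-inside rule only, not the procedure of TopPred or any other published program; the function names and the loop sequences in the usage note are hypothetical.

```python
def count_positive(segment):
    """Number of Lys (K) and Arg (R) residues in a loop or tail."""
    return sum(segment.count(aa) for aa in "KR")


def predict_orientation(loops):
    """Apply the positive-inside rule to the soluble segments of a protein.

    `loops` lists the tails and loops from N to C terminus. Because each TM
    helix crosses the membrane, consecutive segments lie on alternating sides:
    even-numbered entries share a side with the N-terminal tail.
    """
    same_side_as_n = sum(count_positive(s) for s in loops[0::2])
    opposite_side = sum(count_positive(s) for s in loops[1::2])
    if same_side_as_n > opposite_side:
        return "N-in"    # the N terminus is on the more positive (cytoplasmic) side
    if opposite_side > same_side_as_n:
        return "N-out"
    return "undetermined"
```

For example, `predict_orientation(["MS", "KRLKR", "DG", "RKK"])` returns `"N-out"`, because all seven basic residues of these made-up segments lie on the side opposite the N-terminal tail.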

were applied, a number of difficulties became evident and newer predictors were developed.

One challenge is the need to distinguish between signal peptides and TM segments, since both contain stretches of hydrophobic amino acids. Signal peptides occur at the N terminus of many secreted and membrane proteins and are usually removed by membrane peptidases (see Chapter 7). Independent programs designed to determine the presence of a signal peptide, such as SignalP (an artificial neural network) and SignalP-HMM, can be run prior to performing the topology prediction. To avoid this extra step, combination programs such as Phobius (using HMM) were developed to predict both TM segments and signal peptides.

Since most predictors define a TM segment as a stretch of 15–25 amino acids, another kind of problem emerges from the irregularities of helical bundles in actual membrane proteins. Such irregularities include wide variations in lengths of TM helices, kinks in helices, and reentrant loops that vary in secondary structure. OCTOPUS is a combination of artificial neural networks and HMM models that attempts to identify reentrant regions and interface helices along with TM segments. SPOCTOPUS adds the prediction of signal peptides.

Another difficulty arises when, after testing the algorithms with known data sets, they are less accurate with other proteins. It is a common experience to apply several algorithms to a protein of interest and get varying results. In a comparison with a test set of 60 bacterial integral membrane proteins with known topology, the highest fraction (around three-quarters) was correctly predicted with the two algorithms that use hidden Markov models, TMHMM and HMMTOP, while MEMSAT and TOPPRED had less success and the lowest fraction (around one-half) was correctly predicted with the neural network predictor, PHD. The best results were achieved by carrying out analyses by all five and searching for consensus: a very high rate of correct predictions occurs when all five or four of the five agree. Therefore algorithms were developed to compare the results of different methods: CONPRED compares the results of nine, while TOPCONS compares the results of five algorithms. As expected, these methods give the highest accuracy in comparisons with the individual algorithms.

Since the five topology predictors incorporated into TOPCONS require a PSI-BLAST search against a sequence database to incorporate evolutionary information, the process is relatively slow: running a database of 101 proteins took 4483 sec, compared to 2–26 sec for the individual methods. In order to speed the process to scan whole genomes, a combination of topology predictors that do not involve the PSI-BLAST search was constituted as TOPCONS – single. This program took only 64 sec for the database of 101 proteins and could analyze


[Figure 6.20 panels: (A) LacY, TM1–12; (B) ATP/ADP carrier, TM1–6; (C) LeuT, TM1–12, with the core, V motif, and arm marked; (D) GltPh, TM1–8, with the core, V motif, arm, HP1, and HP2 marked. Ext and Int mark the two sides of the membrane, and N and C mark the termini.]

6.20 Structural repeats in transporters. The membrane topologies of many transporters show parallel and inverted structural repeats. Examples shown are for transporters described in detail in Chapter 10: LacY (A), the ATP–ADP carrier (B), LeuT (C), and GltPh (D). The structural repeats are highlighted by trapezoids (shaded gray) to emphasize their relative orientation, parallel (in (A) and (B)) or inverted (in (C) and (D)), with TM helices (rods) colored to show their relationships. TM helices that are not part of the symmetrical repeats are white, and non-helical regions in the middle of core TMs are shown as lines. From Boudker, O. and Verdon, G., Trends in Pharmac Sci. 2010, 31:418–426.

the complete human genome (∼21 000 sequences) in about one hour.

The strengths and weaknesses of the methods in Table 6.4 are revealed when they are compared using different data sets. For example, TOPCONS, SCAMPI, and MEMSAT3 perform the best (∼80% correct predictions) when tested on sets of experimentally verified topologies, and they do even better when the location of the C terminus is


6.21 Detection of inverted repeats in GltPh. The structure of GltPh, first crystallized in an outward-facing conformation, shows the relationship between the topology (A) and the sequence alignment for inverted repeats (B). The four repeat segments, I–IV, are colored the same in (A) and (B) to illustrate domain swapping. For example, segment I (blue) uses the structure of segment II (green) as a template, which flips its orientation for the prediction. With the four segments flipped and the conformation of the loop between TM3 and TM4 (3L4, gray) based on the x-ray structure, the template provides the predicted inward-facing structure, later validated by the x-ray structure of GltPh in the inward-facing conformation. From Chrisman, T. J., et al., Proc Natl Acad Sci USA, 2009, 106:20752–20757.

[Figure 6.22 labels: experimentally determined structure (3D), drawn in the lipid membrane bilayer between the cell exterior and the cell interior; input, protein sequence (1D); prediction algorithms; output, predicted helical transmembrane segments (1D), with bars for the predicted and observed helical transmembrane segments and reliability indices given as numbers.]

Protein sequence (1D):
FHEPIVMVTIAGIILGGLALVGLITYFGKWTYLWKEWLTS
VDHKRLGIMYIIVAIVMLLRGFADAIMMRSQQALASAGEA
GFLPPHHYDQIFTAHGVIMIFFVAMPFVIGLMNLVVPLQI

Reliability indices:
9999999864442001112346689999999999999999
9999863202567788888887765214678899999999
9999887622567788888888888888777665544442

6.22 Strategy for predicting topology from protein sequence. Algorithms such as TMHMM, PHD, and MEMSAT all derive topology information from the primary structure of proteins. The example shows a partial sequence of cytochrome o ubiquinol oxidase subunit 1 (cyob), drawn to show the three TM helices determined experimentally (top) and the analysis by PHDhtm (bottom). The bars in the last panel allow comparison between predicted (white) and observed (shaded) TM segments, and the numbers below the line give the reliability of the prediction for each residue on a scale of 0 to 9. The low reliability values for the top segment illustrate how TM segments can be underpredicted. Redrawn from Rost, B., et al., Protein Sci. 1995, 4:521–533. © 1995. Reprinted with permission from Protein Science.


TABLE 6.4 MEMBRANE TOPOLOGY PREDICTORS IN CURRENT USE

Method             Release year   Algorithm                  URL                                     MSA   SP   Constrained
HMM-TM             2006           HMM                        http://biophysics.biol.uoa.gr/HMM-TM    –     –    +
HMMTOP             2001           HMM                        www.enzim.hu/hmmtop                     *     –    +
MEMSAT1            1994           ANN + grammar              http://single.topcons.net               –     –    –
MEMSAT3            2007           ANN + grammar              http://bioinf.cs.ucl.ac.uk/psipred      +     –    –
MEMSAT-SVM         2009           SVM + model                http://bioinf.cs.ucl.ac.uk/psipred      +     +    –
OCTOPUS            2008           ANN + HMM                  http://octopus.cbr.su.se                +     –    +
Philius            2008           DBN                        www.yeast-ctermrc.org/philius           –     +    –
Phobius            2004           HMM                        http://phobius.sbc.su.se                –     +    +
PolyPhobius        2005           HMM                        http://phobius.sbc.su.se/poly.html      +     +    –
PRO                2004           HMM                        http://topcons.net                      +     –    +
PRODIV             2004           HMM                        http://topcons.net                      +     –    +
SCAMPI             2008           Hydrophobicity + model     http://topcons.net                      +     –    +
SCAMPI – single    2008           Hydrophobicity + model     http://scampi.cbr.su.se                 –     –    +
SPOCTOPUS          2008           ANN + HMM                  http://octopus.cbr.su.se                +     +    –
TMHMM              2001           HMM                        www.cbs.dtu.dk/services/TMHMM           –     –    –
TOPCONS            2009           HMM – consensus            http://topcons.net                      +     –    +
TOPCONS – single   2011           HMM – consensus            http://single.topcons.net               –     –    –
TopPred2           1994           Hydrophobicity profiles    http://mobyle.pasteur.fr                –     –    –

MSA: the predictor uses multiple sequence alignments as input; *: the original version of HMMTOP is capable of utilizing homologous sequences to improve predictions, however we benchmark the single-sequence version; SP: the predictor also predicts signal peptides; Constrained: the predictor allows constrained predictions, i.e., it allows using (experimentally derived) topology information as input. Source: Tsirigos, K. D., et al., A guideline to proteome-wide α-helical membrane protein topology predictions. Proteomics. 2012, 12:2282–2294.
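The consensus idea behind predictors such as CONPRED and TOPCONS can be illustrated with a simple vote. The sketch below is a plain majority count and is not claimed to be how either program combines its inputs; the topology strings (N-terminal side plus number of TM helices) and the predictor labels in the usage note are hypothetical.

```python
from collections import Counter


def consensus_topology(predictions, min_agree=4):
    """Return (topology, votes) for the most common prediction.

    `predictions` maps a predictor name to a topology label. The topology is
    reported only if at least `min_agree` predictors agree; otherwise the
    first element of the result is None.
    """
    counts = Counter(predictions.values())
    topology, votes = counts.most_common(1)[0]
    if votes >= min_agree:
        return topology, votes
    return None, votes
```

With five hypothetical results such as `{"p1": "in:12", "p2": "in:12", "p3": "in:12", "p4": "in:12", "p5": "out:11"}`, the call returns `("in:12", 4)`, matching the observation in the text that agreement among four or five of five methods is a strong indicator.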

known. On the other hand, Phobius and related programs, SPOCTOPUS, HMMTOP, and TMHMM, more successfully predict identified extracellular glycosylation sites in over 700 proteins. PolyPhobius is the best predictor of GPCRs in a set of 1300 proteins suspected to be GPCRs. Finally, Phobius, PolyPhobius, Philius, and SCAMPI are most successful at distinguishing between membrane and nonmembrane proteins when tested with sets of cytosolic and secreted proteins. Even so, when applied to genomes they perform far worse than is typically reported (99%) for small test sets. A recommended strategy is to filter nonmembrane proteins with PolyPhobius and then predict the topology of the remaining proteins with TOPCONS.

A different approach is the application of ROSETTA to membrane proteins. ROSETTA is a protein-folding algorithm that predicts three-dimensional structures from ab initio calculations. To adapt ROSETTA to membrane proteins requires embedding the protein chain into a model membrane in order to maximize the


BOX 6.4 STATISTICAL METHODS FOR TM PREDICTION While most of the methods listed in Table 6.4 employ hydrophobicity analysis and the positive- inside rule to find TM domains, some are greatly enhanced by the computational power of the statistical methods they employ. These methods give improved accuracy over the simple sliding- window approach that is limited to analysis of the linear sequence of the protein of interest. In particular, they can be based on multiple sequence alignment since there is now a sufficient databank of membrane proteins of known topology to be used in the search for other membrane proteins. Two successful approaches are neural networks and hidden Markov models.
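For comparison with the statistical methods, the simple sliding-window approach mentioned above can be sketched as follows. The sketch uses the Kyte–Doolittle hydropathy scale, a 19-residue window, and a cutoff of 1.6; these are commonly quoted choices adopted here as assumptions, not values taken from this chapter (which uses the GES scale elsewhere), and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale for this sketch).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Average hydropathy of every full window along the sequence."""
    if len(seq) < window:
        return []
    scores = [KD[aa] for aa in seq]
    return [sum(scores[i:i + window]) / window
            for i in range(len(seq) - window + 1)]


def candidate_tm_windows(seq, window=19, threshold=1.6):
    """Start indices of windows whose average hydropathy reaches the cutoff."""
    profile = hydropathy_profile(seq, window)
    return [i for i, avg in enumerate(profile) if avg >= threshold]
```

Because each window looks only at the linear sequence of one protein, this approach cannot use the evolutionary information in an alignment, which is the limitation the methods below address.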

Neural Networks The program PHD uses neural networks to score “preferences” for TM helices generated from multiple sequence alignments. Neural networks compute relationships between units (amino acid residues) in layers, described as the input layer, one or more hidden layers, and the output layer, that together define the network architecture (Figure 6.4.1). For multiple protein sequences, the input is a sequence profile of a protein family, with each sequence position represented by the amino acid residue frequencies derived from multiple sequence alignments in the database. The hidden layer is set up to determine the state of each unit as it depends on the states of the units in the previous layer to which it is connected and the weights of the connections. The output layer has the results of the analysis.

[Figure 6.4.1 labels: protein alignments; profile table; input layer (s0); first or hidden layer (s1); second or output layer (s2); pick maximal unit; current prediction.]

6.4.1 Neural networks compute relationships between residues in layers to make predictions based on aligned sequences. From Rost, B., and C. Sander, Proc Natl Acad Sci USA. 1993, 90:7558–7562. © 1993 by National Academy of Sciences, USA. Reprinted by permission of PNAS.
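The sequence profile that serves as network input can be sketched as follows. This is a minimal illustration of turning an alignment into per-position residue frequencies and cutting it into windows; it is not PHD's actual input encoding, and the function names and the gap symbol `-` are assumptions.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sequence_profile(alignment):
    """Per-column residue frequencies for a list of aligned sequences.

    Gaps ('-') are excluded from the frequencies and counted separately.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    profile = []
    for col in range(length):
        column = [seq[col] for seq in alignment]
        counts = Counter(aa for aa in column if aa != "-")
        total = sum(counts.values())
        freqs = {aa: (counts[aa] / total if total else 0.0)
                 for aa in AMINO_ACIDS}
        profile.append({"freqs": freqs, "gaps": column.count("-")})
    return profile


def profile_windows(profile, size=13):
    """Yield (center index, window) for every full window of the profile."""
    half = size // 2
    for center in range(half, len(profile) - half):
        yield center, profile[center - half:center + half + 1]
```

Each window of 13 profile columns would then be flattened into the numeric input for one prediction, one window per central residue.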

PHD analyzes protein sequences with two levels of neural network analysis, so that the output from the first level is the input for the second. For the first level, the local input is a weighted term including the frequency of occurrence of each amino acid at that position in the multiple alignments, plus the number of insertions and deletions in the alignment for that residue. The output of the first level indicates whether or not the segment is predicted to be a TM helix. This is the local input for the second level. The global input for both levels describes characteristics of the protein outside the window of 13 residues and includes its amino acid composition and length, plus the distance


6.4.2 PHD uses two levels of neural networks to score structure preferences generated from multiple sequence alignments. From Rost, B., et al., Protein Sci. 1995, 4:521–533. © 1995. Reprinted with permission from Protein Science.


(number of residues) from the first residue in the window of 13 adjacent residues to the N terminus and the distance from the last residue in the window to the C terminus. The output is expressed in units that signify TM helix and not-TM helix. As shown in Figure 6.4.2, the first level of PHD analyzes the protein sequence in windows of 13 amino acids (only seven are shown) and produces the output code for HTM (helix TM) or NotHTM, which becomes the input for the second level. The box on the left shows the data from sequence information from the protein family used for both local and global factors. The center box shows how at each position in the alignment the amino acid frequencies are compiled, the number of insertions and deletions are counted, and a conservation weight is computed for the local input. Global information gives the amino acid composition, length of the protein, and position of the current window from the N and C termini. All of this was codified and fed into the neural network input for the first level and then to the second level, shown in the box on the right. When PHD was tested with an initial data set of 69 membrane proteins with experimentally determined locations of TM segments, its accuracy was >95%. Assessments of its accuracy when tested on a data set not used in setting up the analysis are considerably lower. One weakness of the system is the step it uses to filter out TM helices that are too long (>35 residues) or too short (<17 residues). If too long, they are split in the middle into two helices; if too short, they are deleted or elongated.
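The length filter described in the last two sentences can be sketched as a small post-processing step. The sketch below splits long helices at the midpoint and simply deletes short ones; the elongation alternative is not modeled, and the function name and the half-open `(start, end)` convention are assumptions rather than details of PHD.

```python
def filter_tm_segments(segments, min_len=17, max_len=35):
    """Post-process predicted TM helices given as half-open (start, end) ranges.

    Helices longer than `max_len` are split in the middle into two helices.
    Helices shorter than `min_len` are deleted (elongation is not modeled).
    """
    result = []
    for start, end in segments:
        length = end - start
        if length > max_len:
            mid = start + length // 2
            result.extend([(start, mid), (mid, end)])
        elif length < min_len:
            continue
        else:
            result.append((start, end))
    return result
```

For instance, `filter_tm_segments([(0, 40), (50, 60), (70, 90)])` returns `[(0, 20), (20, 40), (70, 90)]`: the 40-residue helix is split and the 10-residue one is dropped, which shows how a genuinely long or short helix would be mishandled by such a rule.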

Hidden Markov Models Hidden Markov models (HMMs) apply statistical profiles to describe a series of states connected by transition probabilities. For proteins, each state corresponds to the columns of a multiple sequence alignment, which are intermediate steps in the algorithm that the user does not see. A matrix describes the possible states and the transitions between the states, and an algorithm is employed for a “random walk” through the states, that is, one that derives each possibility from the previous state. For sequence alignment, the HMM generates profiles compiled of high scores (if sequence is highly conserved), low scores (if sequence is weakly conserved), and negative scores (if sequence is unconserved) and identifies sequences with the highest scores.

[Figure 6.4.3 diagram: a polypeptide crossing the membrane as a series of helices, with outside loops and tails above and inside loops and tails below. An example amino acid sequence is annotated with its state sequence (o, h, i, O) and the resulting topology (tail, helix, short loop, long loop).]

6.4.3 Structural states defined for HMMTOP, where thick lines represent tails while thin lines are loops. Redrawn from Tusnady, G. E., and I. Simon, J Mol Biol. 1998, 283:489–506. © 1998 by Elsevier. Reprinted with permission from Elsevier.


[Figure 6.4.4 panels: (A) overall layout from the cytoplasmic side through the membrane to the non-cytoplasmic side, with states for globular regions, loops, helix caps, helix core, and short and long non-cytoplasmic loops; (B) loop and cap states; (C) helix core states.]

6.4.4 The layout of the hidden Markov model. Redrawn from Krogh, A., et al., J Mol Biol. 2001, 305:567–580.

In HMMs for membrane protein topology, structural states are defined to describe portions of the protein. HMMTOP uses five structural states: inside loop (I), inside tail (i, the region of a loop that is close to the TM helix), TM helix (h), outside tail (o), and outside loop (O) (Figure 6.4.3). TMHMM further divides the residues in TM helices according to whether they are in the center of a helix (helix core) or on one end of a helix (helix cap). Thus seven states are considered in TMHMM: helix core, helix caps, short loop on inside, short and long loop on outside, and globular regions. Each state has a probability distribution over the 20 amino acids, based on the data set with known topologies. The overall layout of the HMM is a function of the different structural states (Figure 6.4.4). Arrows show the possible connections between

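[Figure 6.4.5 diagram: HMMTOP states arranged across the membrane, with outside loop (O) and outside tail (o) states above, helix (h) states in the membrane, and inside tail (i) and inside loop (I) states below. Tail lengths (Lt) are bounded by MINLt and MAXLt, and helix lengths (Lb) by MINLb and MAXLb.]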

6.4.5 The architecture of HMMTOP. Redrawn from Tusnady, G. E., and I. Simon, J Mol Biol. 1998, 283:489–506. © 1998 by Elsevier. Reprinted with permission from Elsevier.


the different states, which are limited by the constraints of the protein structure as depicted in (a), with boxes corresponding to one or more states in the model. The connectivity of the different states varies. For the inside and outside loops and helix caps the connectivity is shown in (b), and for the helix core it is shown in (c). This model allows the helix core to be between 5 and 25 residues, which makes the entire length of TM helices (including caps) between 15 and 35 residues. The program follows rules of “grammar” that state a helix must be followed by a loop, and inside and outside loops must alternate. Then it calculates probabilities for the sequences of these states. This is depicted in the architecture of HMMTOP (Figure 6.4.5), which shows states within the same transition matrices (gray = helix states, yellow = tail states, red = loop states). The rectangular areas represent fixed-length states, which include the helices whose lengths need to be sufficient to cross the nonpolar domain of the bilayer and the tails that are defined to be the short segments of loops adjacent to ends of helices. The hexagonal areas represent the non-fixed-length states, which are the loops that are allowed to be any length. By considering fixed-length states and non-fixed-length states, the methods allow realistic length constraints on TM helices. While the current programs using these methods (listed in Table 6.4) are very powerful, new computational approaches can be expected to make them even more useful in the near future.
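The “grammar” of alternating loops and helices can be illustrated with a toy hidden Markov model decoded by the Viterbi algorithm. The sketch below is far simpler than HMMTOP or TMHMM: it has four states, no helix caps or length constraints, and its emission and transition probabilities are invented for illustration (with more Lys and Arg allowed inside than outside, in the spirit of the positive-inside rule). All names are hypothetical.

```python
import math

# i: inside loop, o: outside loop, M: helix running in->out, W: helix running out->in.
# Two helix states are what force inside and outside loops to alternate.
STATES = ("i", "M", "o", "W")
LABEL = {"i": "i", "M": "h", "o": "o", "W": "h"}
NEXT = {"i": "M", "M": "o", "o": "W", "W": "i"}    # the only allowed state change
PREV = {nxt: cur for cur, nxt in NEXT.items()}
P_STAY = 0.9                                        # invented transition probability
START = {"i": 0.5, "o": 0.5, "M": 0.0, "W": 0.0}
EMIT = {                                            # invented emission probabilities
    "i": {"hyd": 0.25, "pos": 0.25, "other": 0.50},
    "o": {"hyd": 0.25, "pos": 0.05, "other": 0.70},
    "M": {"hyd": 0.85, "pos": 0.02, "other": 0.13},
    "W": {"hyd": 0.85, "pos": 0.02, "other": 0.13},
}


def log(p):
    return math.log(p) if p > 0 else float("-inf")


def residue_class(aa):
    if aa in "KR":
        return "pos"
    if aa in "AILMFVWC":
        return "hyd"
    return "other"


def viterbi_topology(seq):
    """Most probable state path for `seq`, written with the labels i, h, and o."""
    first = residue_class(seq[0])
    score = {s: log(START[s]) + log(EMIT[s][first]) for s in STATES}
    pointers = []
    for aa in seq[1:]:
        cls = residue_class(aa)
        new_score, ptr = {}, {}
        for s in STATES:
            stay = score[s] + log(P_STAY)
            move = score[PREV[s]] + log(1 - P_STAY)
            if stay >= move:
                best, ptr[s] = stay, s
            else:
                best, ptr[s] = move, PREV[s]
            new_score[s] = best + log(EMIT[s][cls])
        pointers.append(ptr)
        score = new_score
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for ptr in reversed(pointers):      # trace the best path backward
        state = ptr[state]
        path.append(state)
    return "".join(LABEL[s] for s in reversed(path))
```

Applied to the made-up sequence `"KKRK" + "L" * 20 + "DDSD"`, the function returns four `i`, twenty `h`, and four `o`: the basic tail is assigned to the inside, the leucine stretch to a helix, and the acidic tail to the outside.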

exposure of hydrophobic residues to the hydrophobic region of the membrane and minimize their exposure to the hydrophilic environment outside. Fragments of the protein are built up from libraries of helix pairs and evaluated as additional helices are added to a growing chain. ROSETTA has been found to work well for families of related structures, such as the GPCRs. While ab initio methods are computationally expensive, they have the advantage of mimicking aspects of how membrane proteins fold in cells.

Analysis of 26 genomes for membrane proteins produced a total of 637 families of polytopic TM domains, based on a combination of TM helices predicted using TMHMM and those annotated in Swiss-Prot. The domains were classified based on homologies with known families or characterized using sequence alignment, hydrophobicity plots with the GES scale, and identification of consensus sequences for TM segments (Figure 6.23). When the families are sorted by the number of TM helices, the number of families with domains of a given number of TM helices decreases as the number of helices increases (Figure 6.24). Within this trend, the plot shows the highest occurrences of two- and four-TM helices, a slight excess of seven-TM helices, and a major spike at 12-TM helices due to the prevalence of this topology among transporters and channels.

Interestingly, the total number of membrane protein domains in the genome is roughly proportional to the number of open reading frames in all of the genomes except that of the nematode worm C. elegans, which has a huge number (754) of seven-TM chemoreceptors. In fact, the genome of C. elegans conforms to the general picture when its three large families of chemoreceptors are removed from the total. Clearly this organism that lacks sight and hearing has finely developed chemosensation to find its food!

Approximately half of the membrane proteins identified by genome analysis have an even number of TM helices with the Nin-Cin topology. The other three combinations of N- and C-terminal locations are about equally represented in the other half. The high proportion of proteins with both N and C termini inside is attributed to the mechanism of biogenesis, because this topology results from insertion of helical hairpins (see Chapter 7). Amino acid distributions in the TM segments of the putative membrane proteins in the 26 genomes show the expected high amounts of nonpolar amino acids, along with the polar residues Ser and Thr that participate in hydrogen bonding (see below; Figure 6.25a).

Similarly, the composition of nearly 50 000 TM segments annotated in the Swiss-Prot database (see Box 6.2) reveals that the six amino acids Leu, Ile, Val, Phe, Ala, and Gly make up two-thirds of TM residues. When the sequences are aligned to determine conserved residues (see Figure 6.23), the nonpolar amino acids are not prevalent in conserved positions, presumably because they are quite interchangeable. Interestingly, the amino acids that are prevalent at highly conserved positions in the helices are Gly, Pro, and Tyr (Figure 6.25b). The proline residues form kinks in the helices, which are conserved even after mutation of the Pro residues, while tyrosine residues play a special role near the interfaces due to their electronic properties (see Chapter 4). The positions of glycine residues are often highly conserved in soluble proteins because they occur at positions where the proteins do not accommodate larger side chains. This is also the case in TM helices, where Gly is commonly observed where two helices are in close contact. The GxxxG motif,

164 Helix–helix interactions

6.23 Classification of a family of polytopic membrane domains. The example shown is family PF01618. The steps involved are (top to bottom): sequence alignment, hydrophobicity plot based on GES scale, consensus sequence displayed by sequence logo, and consensus sequences of TM helices, where the nonconserved amino acids are represented by “x.” From Liu, Y., et al., Genome Biology, 2002, 3:54.1–54.12. © 2002 by Yang Liu, Donald M. Engelman, and Mark Gerstein. Reprinted with permission from the authors.

[Figure 6.24 plot: y-axis, occurrences (0 to 30); x-axis, number of TM helices (2 to 13).]

6.24 The number of TM helices in families of polytopic membrane domains. The number of families of polytopic membrane domains is plotted on the y-axis as a function of the number of TM helices, plotted on the x-axis. The green bars are the numbers from all studied families from the Pfam-A database. The yellow bars are the numbers from families from the Pfam-A database that are annotated as transporters and channels. Redrawn from Liu, Y., et al., Genome Biology, 2002, 3:54.1–54.12. © 2002 by Yang Liu, Donald M. Engelman, and Mark Gerstein. Reprinted with permission from the authors.

first observed in glycophorin dimers, places both Gly residues on the same side of the α-helix (see Figure 4.30). Genomic analysis for pair motifs shows a very high presence of GxxxG and GxxxxxxG pairs, as well as similar motifs with Ala or Ser replacing Gly residues. The resulting location of small side chains is expected to be important for helix–helix interactions in a broad range of membrane proteins.

HELIX–HELIX INTERACTIONS

The prevalence of the GxxxG motif in polytopic membrane proteins in the genome reflects both the close packing of TM helices observed in many helix-bundle proteins and the importance of protein–protein interactions in oligomeric membrane proteins and complexes. To analyze helical packing patterns within proteins, a comparison of pairs of helices in membrane proteins and pairs in soluble proteins shows that most helix–helix pairs in membrane proteins have homologs in soluble proteins. The exceptions to this correlation are the irregular helices of some membrane proteins (described in Chapter 4). The high occurrence of GxxxG and GxxxA in helices of membrane


[Figure 6.25 pie charts: (A) all residues; (B) positionally conserved residues. Each chart gives the percentage of each of the 20 amino acids.]

6.25 Amino acid distributions in TM helices. Amino acid distributions were determined for the TM segments from the 168 families from the Pfam-A database that have more than 20 members. (A) A pie diagram of the amino acid compositions of TM helices shows their high content of nonpolar residues, along with glycine, serine, and threonine. (B) Consensus sequences identified as shown in Figure 6.23 allow comparison of the amino acid residues in conserved positions of the TM helices. The diagram of the compositions of these positionally conserved residues shows that the prevalence of three amino acids (Gly, Pro, and Tyr) has increased significantly, indicating they are the ones whose positions are highly conserved. Redrawn from Liu, Y., et al., Genome Biology, 2002, 3:54.1–54.12. © 2002 by Yang Liu, Donald M. Engelman, and Mark Gerstein. Reprinted with permission from the authors.
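A scan for pair motifs such as GxxxG and GxxxxxxG, including variants with Ala or Ser in place of Gly, can be sketched with a regular expression. The function name is hypothetical, and the segment in the usage note is an illustrative string rather than a database entry; a lookahead is used so that overlapping motifs are all reported.

```python
import re


def find_pair_motifs(tm_segment, small="G", spacing=3):
    """Find small-residue pairs separated by `spacing` arbitrary residues.

    `small` is the set of residues allowed at both ends (for example "G" for
    GxxxG or "GAS" to include Ala and Ser). Returns (start index, matched
    text) tuples, including overlapping matches.
    """
    pattern = re.compile(rf"(?=([{small}].{{{spacing}}}[{small}]))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(tm_segment)]
```

For the illustrative segment `"LIIFGVMAGVIGTILL"`, `find_pair_motifs(seg)` returns `[(4, "GVMAG")]`, widening the set with `small="GAS"` adds `(7, "AGVIG")`, and `spacing=6` finds the GxxxxxxG pair `(4, "GVMAGVIG")`.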

proteins allows the helices to approach more closely than those in soluble proteins, forming “knob-into-hole” interactions, as well as to interact over longer distances, that is, over the width of the bilayer. The information on helix pairs within proteins probably also applies to helix–helix interactions between subunits because the TM helices from different subunits often align as much as TM helices within one subunit. This can be seen with oligomeric proteins, where a cross-section through the middle of the membrane shows the extensive interaction of helices from different subunits (Figure 6.26): the gray-scale image of cytochrome c oxidase in the figure shows how difficult it is to assign subunits from the relationships of the helices!

Additional information about helix–helix interactions comes from the analysis of interhelical three-body interactions to find “triplets” where three atoms from at least two different helices are closely packed. A program called INTERFACE-3, which computes interhelical atomic triplets based on algorithms for geometric shapes and distances, detects six different types of triplets in the glycophorin dimer: GGV, GTV, GVV, ILL, IIT, and ITT (Figure 6.27a). Likewise, bacteriorhodopsin and its homologs halorhodopsin and sensory rhodopsin II have three triplets that are conserved in sequence and in structure (Figure 6.27b).

Furthermore, an additional 13 triplets are conserved structurally but not formed by identical residues, allowing conservative mutations that suggest that the triplet interaction has been conserved to maintain the orientation of the helices in the three highly homologous proteins. Comparison with a set of soluble α-helical proteins identified triplets that are unique to membrane proteins, such as AGF, AGG, GLL, and GFF. Such triplets are often found in regions of the closest contact between helices, and are therefore often correlated with GxxxG-type motifs.

Close contact between TM helices is also stabilized by two types of hydrogen bonding. The hydrogen bonds observed in glycophorin dimers are relatively weak

166 Helix–helix interactions

A. B. a

Photosynthetic reaction center L(I) (Rhodopseudomonas virdis) L(II) Deisenhofer et al. (1995) T(III)

b G79:C V80:C V80:C

Cytochrome c oxidases in bovine heart mitochondria 6.27 Three-body contacts, or triplets, in helix–helix Tsukihara et al. (1996) Cytochrome bc1 complexes interactions. (A) The TM helices (residues 73–91) of a in bovine heart mitochondria Iwata et al. (1998) glycophorin dimer with one of the triplets illustrated in space-filling representation. The atoms involved, C from 6.26 Helix interactions viewed from the membrane Gly79 and Cα from Val80 on both chains, are shown in midplane. Positions of five-residue sections at the middle space-filling representation (a), and viewed from the top (b). of TM helices are shown for photosynthetic reaction center, (Orange, C from G79, chain A; green, Cα from V80, chain cytochrome c oxidase, and cytochrome bc1 complex, as A; blue, Cα from V80, chain B.) (B) A conserved triplet labeled. The subunits are colored differently. The grayscale in the archael rhodopsins is shown by superposition of image of cytochrome c oxidase shows that the subunit helices C, E, and F from bacteriorhodopsin, halorhodopsin, composition cannot be inferred from the relationships of and sensory rhodopsin II. The residues in the triplet are the helices. Redrawn from Liu, Y., et al., Proc Natl Acad Sci two Leu and a Thr, corresponding to L97, L152, and T178 USA. 2004, 101:3495–3497. © 2004 by National Academy in bR, again in space-filling representation. The retinal of Sciences, USA. Reprinted by permission of PNAS. is drawn in green. From Adamian, L, et al., J Mol Bio. 2003, 327:251–272. © 2003 by Elsevier. Reprinted with permission from Elsevier. bonds because they use the Cα proton as the donor. Each of these hydrogen bonds contributes less than 1 kcal mol–1 to the stability of the dimer. The second occur at the same frequency. The majority of these heli- type of hydrogen bond makes use of polar residues in cal pairs have one H-bond between two amino acids TM segments; such bonds are detected in high-resolu- from two neighboring helices. 
However, two other types tion structures of polytopic membrane proteins such as of H-bonding patterns emerge. One type is a “serine zip- bacteriorhodopsin. Interactions between polar residues per” motif, named for its similarity to a leucine zipper account for around 4% of all atomic interhelical con- (Figure 6.28). A search for homologous serine zippers tacts in membrane proteins, as well as in soluble pro- by PSI-BLAST identified more than 100 sequences with teins. In contrast to soluble proteins, where interacting highly conserved Ser residues in positions 7, 14, and 21 ionized residues typically form salt bridges, the types of of one helix and 1, 8, and 15 of the other. A three-body polar interactions are more varied in TM segments of analysis also found triplets of two serine residues with a membrane proteins, which also have H-bonds between leucine. The other type of H-bonding pattern is called a ionizable and polar residues (the most common are D–Y polar clamp because the side chain of an amino acid at and Y–R) and between polar nonionizable residues (the a given position i that is capable of forming at least two most common are Q–S and S–S). hydrogen bonds (i.e., E, K, N, Q, R, S, T) is “clamped” by How prevalent are these interhelical hydrogen bonds? H-bonding to two other residues, one at position i + 1 or In a data set based on the high-resolution structures of 13 i + 4 and one on the other helix (Figure 6.29). The polar membrane proteins, 134 unique TM helices form nearly clamp shown in rhodopsin, involving residues T160 and 300 helical pairs, 53% of which are connected by at least W161 (both on helix IV), and N78 (on helix II), is highly one H-bond. In the 299 interhelical H-bonds identified, conserved in the GPCR family. almost half involve Ser, Tyr, Thr, and His. 
Hydrogen When the numbers of intermolecular and intramo- bonds between two side chains and hydrogen bonds lecular helix–helix interactions are estimated for mem- between a side chain and a backbone carbonyl oxygen brane proteins in whole genomes, the totals are in the


6.28 The serine zipper motif in helix–helix interactions. (A) Schematic representation of hydrogen bonding between two serine residues on two adjacent helices. (B) An example of a serine zipper in bovine cytochrome c oxidase. The two helices shown are helices III and IV, and the hydrogen-bonded serine residues are S101–S156, S108–S149, and S115– S142. (C) The pairs of serines can be predicted from helical wheels of helices III and IV from subunit 1 of bovine cytochrome c oxidase. A helical wheel designates residues i and i + 3 as a and d, and shows how they fall on the same side of the helix. Nonpolar residues are shaded pink. Three Ser–Ser pairs and two Leu–Leu pairs form a zipper at different a and d positions in this example. (A) and (B) redrawn from Adamian, L., and J. Liang, Proteins. 2002, 47:209–218. © 2002. Reprinted with permission from John Wiley & Sons, Inc.; (C) from Adamian, L., et al., J Mol Biol. 2003, 327:251–272. © 2003 by Elsevier. Reprinted with permission from Elsevier.
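The seven-residue spacing of a serine zipper (Ser at positions 7, 14, and 21 of one helix and 1, 8, and 15 of the other; S101, S108, S115 and S142, S149, S156 in the cytochrome c oxidase example) can be scanned for in a sequence. The following is a minimal sketch, not the PSI-BLAST search described in the text; the function name and the toy helix sequences are hypothetical and chosen only to illustrate the spacing.

```python
def heptad_serine_starts(seq, repeats=3, step=7):
    """Return 1-based positions i where Ser occurs at i, i+step, i+2*step, ...

    seq     : one-letter amino acid sequence of a helix (toy input here)
    repeats : number of serines required in the ladder (3 in the text)
    step    : spacing between serines (7, i.e., about two helical turns)
    """
    seq = seq.upper()
    span = step * (repeats - 1)  # distance from first to last serine
    return [
        i + 1
        for i in range(len(seq) - span)
        if all(seq[i + k * step] == "S" for k in range(repeats))
    ]


# Toy helices: Ser at 7, 14, 21 on one helix and at 1, 8, 15 on the other.
helix_one = "L" * 6 + "S" + "L" * 6 + "S" + "L" * 6 + "S"
helix_two = "S" + "L" * 6 + "S" + "L" * 6 + "S" + "L" * 6

print(heptad_serine_starts(helix_one))  # start of the Ser ladder on helix one
print(heptad_serine_starts(helix_two))  # start of the Ser ladder on helix two
```

A match only shows that the spacing is present on each helix; whether the serines actually face each other and hydrogen bond depends on the helix packing, which a sequence scan cannot establish.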

millions. Clearly an understanding of them will deepen as the number of high-resolution structures increases and will provide in turn a more complete insight for predicting membrane protein structure. Furthermore, helix–helix interactions are a critical step in the assembly of polytopic membrane proteins, as described in the next chapter. Since the nonpolar residues are quite nonspecific in their interactions, the often conserved hydrogen bonds between polar residues in TM segments must be crucial in helix alignments.

6.29 Polar clamp hydrogen bonding in helix–helix interactions. (A) Three residues in rhodopsin, W161, T160, and N78, form a clamp between helix IV and helix II. The side chain of N78 (on helix II) makes two hydrogen bonds: its O atom is hydrogen-bonded to the N atom from W161, and one of its amide hydrogen atoms is hydrogen-bonded to the oxygen of T160. (B) A polar clamp in subunit I of cytochrome c oxidase from Thermus thermophilus involves residues S155 and S159 from helix α4 and Q86 from helix α2. From Adamian, L., and J. Liang, Proteins. 2002, 47:209–218. © 2002. Reprinted with permission from John Wiley & Sons, Inc.

PROTEOMICS OF MEMBRANE PROTEINS

With thousands of membrane proteins predicted by genomic analysis, many questions are raised about their identities and interactions. Using TMHMM predictions combined with PhoA/GFP fusions to localize the C termini, a global topology analysis of E. coli membrane proteins detected 601 inner membrane proteins. When these proteins are sorted by their known or predicted functions, 40% of them are involved in transport, while nearly 20% are involved in metabolism, biogenesis, and signaling (Figure 6.30). Over one-third are “orphans” with unknown functions.

The functions of some orphan membrane proteins can be ascertained from their association with known proteins in complexes. (The yeast two-hybrid analysis, an important proteomic tool for identifying protein–protein

6.30 Functional categorization of the E. coli inner membrane proteome. The 737 proteins identified in the inner membrane are assigned to different functional categories: transport/influx, 33%; transport/efflux, 7%; metabolism, 7%; biogenesis, 6%; signaling, 5%; lipid, 4%; channels, <1%; flagellar, <1%; unknown, 36%. Redrawn from Daley, D. O., et al., Science. 2005, 308:1321–1323. © 2005. Reprinted with permission from AAAS.

interactions in complexes, does not work with membrane proteins because loss of function in the fusion proteins generated can result from loss of correct compartmentalization, topology, or orientation in the membrane.) Complexes of membrane proteins can be detected by two-dimensional gel electrophoresis using native gel electrophoresis for the first dimension, combined with mass spectrometry to identify the proteins. To carry out this procedure with Gram-negative bacteria such as E. coli, the inner and outer membrane fractions were first separated by gradient centrifugation and solubilized in a mild detergent, such as 0.5% n-dodecyl-β-D-maltoside. When native gel electrophoresis was carried out in Coomassie blue, complexes involving 44 inner membrane proteins and 12 outer membrane proteins were isolated. The results enabled roles to be assigned to six orphan proteins of unknown function, in addition to identifying a number of known proteins. A majority of the inner membrane proteins in complexes are members of respiratory chains (e.g., succinate dehydrogenase, F1F0-ATP synthase, and cytochrome bo3 ubiquinol oxidase). Others are involved in biogenesis (including the SecYEG translocon; see Chapter 7) and some are transporters (including the intact PTS systems for mannose and galactitol, as well as ABC transporters for maltose and glutamine). The trimeric porins in the outer membrane were also identified, as the method also recognizes homo-oligomers.

β-BARRELS

The prediction methods described above for genomic analysis utilize algorithms that identify TM α-helices and are not useful for recognition of β-barrel membrane proteins. One way to illustrate this is to apply the hydropathy analysis of MPtopo to the β-barrel proteins in group 3 of the program (see Box 6.3). Now MPtopo also performs a β-barrel analysis, using one of several methods available. These prediction algorithms make use of the main characteristics of β-barrels (an even number of β-strands, antiparallel strands, periplasmic N and C termini, aliphatic residues on the surface except for the girdle of aromatic residues) along with their relatedness (amino acid composition and secondary structure element alignments). When applied to a sample outer membrane protein of known structure, MPtopo accurately identifies the β-strands (Figure 6.31). Other algorithms called PRED-TMBB, TMB-HMM, TMBETAPRED-RBF, and TMBpro use computational approaches discussed above, including statistical propensities, nearest neighbor methods, neural networks, and hidden Markov models. A new algorithm to predict β-barrels based on OCTOPUS and MEMSAT-3 is called BOCTOPUS. BOCTOPUS uses a combination of support vector machines and HMM and correctly predicted 30 out of 36 β-barrel proteins. The “Z-coordinate predictor” ZPRED can then construct a three-dimensional model fairly accurately, except in the case of flattened (elliptical) β-barrels such as OMPLA.

Application of these methods to genomes of different Gram-negative bacteria indicates that 2% to 3% of the genomes encode β-barrel proteins, which corresponds to <10% of integral membrane proteins. These results signify how many β-barrel proteins remain to be characterized: one-third of the approximately 100 β-barrels they identify in the E. coli genome are unknown proteins. There are now 35 families of β-barrels, including 29 in Gram-negative bacteria. The others are in mitochondria, chloroplasts, and the acid-fast Gram-positive bacteria, such as mycobacteria. There is no significant sequence similarity between members of different families, in spite of frequent resemblances in the fold.

Equipped with increasingly sophisticated computational tools, bioinformatics of membrane proteins provides a wealth of information about families of membrane proteins and their characteristics, especially for the large class of helical bundle proteins. It provides patterns and frameworks that clarify much of what is known about membrane proteins, and by viewing data on hundreds of predicted membrane proteins with unknown structures and functions, it stimulates the desire to know more about the many important roles these proteins play. The expansive reach of genomics will make bioinformatics important for many years to come.

6.31 Prediction of the β-barrel domain topology of BtuB, a β-barrel protein. The correlation between predicted (A) and actual (B) regions of the E. coli outer membrane protein BtuB is emphasized with color coding. The arrows show the peaks that correspond to the β-strands in the ribbon diagram of the x-ray structure. (Panel A plots the β-hairpin score, 0 to 16, against amino acid number, 0 to 600, with the hatch domain and the β-barrel domain labeled.) From Wimley, W. C., Curr Opin Struct Biol. 2003, 13:404–411. © 2003 by Elsevier. Reprinted with permission from Elsevier.


PROTEIN FOLDING AND BIOGENESIS 7

Folding and insertion of helical bundle and β-barrel membrane proteins utilize different mechanisms, according to results from in vitro folding studies. (Diagram labels: folding; insertion; a populated unfolded state in the membrane; coupled folding and insertion.) From Bowie, J. U., Proc Natl Acad Sci USA. 2004, 101:3995–3996.

With an appreciation of the structural characteristics and functional diversity of membrane proteins, the question of their biogenesis arises. How does a nascent peptide, a chain of amino acids emerging from the ribosome, fold into a three-dimensional structure and insert into the membrane bilayer? Evidence suggests that folding and insertion are coupled processes in cells. However, given the nature and complexity of these processes, different approaches are taken to study folding and insertion in vitro.

Protein folding studies give information about the thermodynamic forces that drive folding and about its kinetic pathways, identifying transient but detectable intermediates. Recently these techniques have been applied to purified membrane proteins of both the α-helical and β-barrel classes. While the folding mechanisms determined by in vitro studies of membrane proteins may differ significantly from mechanisms of their biogenesis in the cell, these studies do provide insights into the necessary steps, as well as their stability and their lipid requirements. Thermodynamic analysis of the in vitro folding process gives insights into the evolution of the complex machinery used to assemble membrane proteins in cells. Finally, the in vitro studies provide valuable practical information on refolding techniques that are applicable to other membrane proteins that have been denatured during purification.

Insertion of nascent proteins into the membrane involves their translocation out of the cytoplasm by the same export machinery used to secrete proteins.

There are strong similarities between prokaryotic and eukaryotic protein translocation systems, including recognition of the signal for export, structure of the main translocation apparatus, and mechanisms for insertion into the bilayer. Progress in defining and characterizing the components of these systems is leading to a detailed picture of the processes involved in export and integration of proteins. This chapter views how membrane proteins fold in vitro before turning to the complex cellular process of translocation and integration of nascent proteins into the membrane. It ends with a look at how misfolding of membrane proteins can lead to diseases in humans.

PROTEIN FOLDING

Before discussing folding studies of membrane proteins, it is useful to review some principles established with protein folding studies using purified small soluble proteins, with which both theory and methodology for in vitro folding studies were developed. Typically the protein of interest is unfolded with high concentrations of a denaturant such as urea or guanidinium chloride, whose removal by dilution or dialysis triggers refolding. Temperature and extremes of pH can also be used to cause denaturation, although frequently it is not reversible in these cases. Of course, solution conditions including temperature, pH, and ionic strength are critical in all folding studies.

A standard assay follows the changes in intensity and wavelength of the intrinsic fluorescence of the protein: As it unfolds, the aromatic side chains move to more polar environments and their fluorescence emissions shift to longer wavelengths. A complementary method monitors the loss of secondary structure in the protein using circular dichroism, a method of spectroscopy that detects secondary structure in a peptide backbone by following the differential absorption of circularly polarized light. Other techniques include monitoring changes with ultraviolet and visible absorption, FTIR spectroscopy, FRET, resonance Raman spectroscopy, NMR, and even AFM. Both kinetic and equilibrium measurements are informative: Kinetic experiments reveal intermediates in the folding pathway, and equilibrium measurements give thermodynamic constants if the process is reversible.

An intact peptide chain can fold to a vast set of different configurational isomers, whose relative free energies are determined by the interactions within the fold and the surfaces they present to the environment. A few configurations have a relatively low free energy, corresponding to multiple minima in the energy landscape that depicts how the energy of the system (comprising all configurations of the protein) depends on the positions and orientations of all its atoms during the folding process. The native protein samples this conformational space by dynamic motions, with small, fast fluctuations between configurations.

This dynamical complexity is overlooked in the elementary form of the folding reaction, which views it as a two-state process: U ↔ N, where U is the unfolded protein and N is the native (folded) protein. For small, one-domain globular proteins, the unfolding process appears to be a first-order transition, which typically exhibits cooperativity (Figure 7.1). When this reaction is fully reversible, it can go to equilibrium, at which point

$$\Delta G_{\text{folding}} = G_N - G_U = -RT \ln K_{eq}$$

where Keq = [N]/[U]. Keq is also equal to (1 – α)/α, with α defined as the average fraction of unfolding, so it is directly obtained from equilibrium folding or unfolding data.

In most unfolding studies, the environment of the protein is simply aqueous solvent; for membrane proteins, it is complicated by the presence of either detergent or lipids due to the need for an amphipathic system that provides a nonpolar domain to solubilize the native protein. However, once such a system has been defined, it can be exploited to give additional information, such as demonstrating the importance of bilayer elasticity by obtaining thermodynamic parameters in the presence of different lipids, as shown below.

Folding α-helical Membrane Proteins

Because integral membrane proteins require the special environment of the lipid bilayer (as described in Chapter 4), their in vitro folding process includes their insertion into a lipid bilayer, micelle, or other model membrane. An important early proposal for the mechanism of spontaneous membrane insertion was the helical hairpin hypothesis. It described how two hydrophobic α-helices connected by a hairpin turn can more favorably penetrate the nonpolar region of the bilayer as a pair of helices than as unfolded peptides (Figure 7.2). The next important development built on the recognition that TM helices of a polytopic membrane protein could independently insert into the membrane as a first step in assembling the helical bundle, the premise of the widely accepted two-stage model for folding membrane proteins (Figure 7.3). In stage I, the hydrophobic α-helical TM segments partition into the bilayer, and then in stage II, they assemble by packing together. For some proteins, there is a third stage that involves the binding of prosthetic groups, the folding of loops, the assembly of oligomers, or even the movement of other segments into the bilayer region of the protein, all of which can occur once the bundle of helices creates an environment that is both more organized and more polar than the bilayer itself.

7.1 Folding studies of the soluble enzyme phosphoglycerate kinase. The data for unfolding phosphoglycerate kinase show a two-state mechanism. (A) Unfolding as a function of guanidinium chloride (GdmCl) concentration is monitored by fluorescence (closed circles) and circular dichroism (open circles). Samples are allowed to go to equilibrium. The steepness of both curves indicates that unfolding is a highly cooperative process. (B) When the fraction unfolded is plotted as a function of the concentration of GdmCl, the two methods give the same unfolding curve, consistent with a two-state mechanism. (Axes: (A) fluorescence intensity and molar ellipticity at 220 nm (× 10–3) against GdmCl (M), 0 to 2.0; (B) fraction unfolded, 0 to 1, against GdmCl (M), 0 to 2.0.) Redrawn from Creighton, T. E., Proteins: Structures and Molecular Properties, 2nd ed., W. H. Freeman, 1992, p. 288. © 1992 by W. H. Freeman and Company. Used with permission.

Stage I of the two-stage model describes the insertion of each individual α-helix as driven by the hydrophobic effect and stabilized by hydrogen bonding along the backbone. The prediction that TM segments can insert independently is supported by the partitioning behavior of peptides that result from either gene splicing that cuts polytopic membrane proteins into separate TM pieces or synthesis of peptides that correspond to single TM domains of membrane proteins. Furthermore, the TM segments observed in known structures of membrane proteins fit quite well with those predicted based on hydrophobicity algorithms, supporting the idea that they each could interact with the lipid nonpolar domain before clustering together to decrease protein–lipid contacts.

For thermodynamic analysis, stage I can be further separated into three steps: partitioning, folding, and insertion (Figure 7.4). As described in Chapter 4, the hydrophobic side chains provide more than enough free energy for partitioning, even considering the changes in solvation of the protein (or peptide), the perturbation of the lipid by the protein, and the loss of degrees of freedom of the protein. Folding to α-helix is likely to be induced by partitioning, when the ΔG of partitioning the folded peptide is more negative than the ΔG of partitioning the unfolded peptide (see Box 7.1). Insertion of the helix to span the bilayer restricts both its rotational degrees of freedom and its space, in addition to ordering the nearby acyl chains. These entropic costs are also compensated by the hydrophobic effect.

Since stage I is determined by interactions relating to the hydrophobic effect, other factors must drive the assembly of helices in stage II. These factors include intrinsic forces such as packing, electrostatic effects, and interactions among the loops between helices, as well as interactions with prosthetic groups or with components at the surface of the membrane. Hydrogen-bonding side chains (Asp, Asn, Glu, Gln, Ser, Thr, Tyr, His) are often important in helix–helix interactions (see Chapters 4 and 6). As helix packing increases helix–helix interactions, it also increases lipid–lipid interactions while decreasing helix–lipid interactions. This effect lessens the entropic cost of helix–lipid interactions due to the packing of relatively “soft” lipids against the rigid, relatively “hard” helices.

Helix packing is sufficiently close for helix–helix interactions to be dominated by van der Waals forces. The tight packing between two helices typically involves “knobs” formed by branched residues such as valine and isoleucine fitting into “holes” formed by glycine residues. This was originally seen in the dimerization of glycophorin A, a protein with only one TM helix, as described in Chapters 4 and 6 (Figure 7.5). Similar knob-into-hole packing has been observed in many multispanning (polytopic) membrane proteins. Dimerization motifs for helix–helix interactions have been identified in many

7.2 The helical hairpin hypothesis describes the insertion of a hairpin structure composed of two helices into the nonpolar interior of the bilayer. It is driven by the free energy arising from burying hydrophobic helical surfaces, estimated to be –60 kcal mol–1 for a pair of membrane-spanning helices. The alternate pathway of inserting a random coil prior to folding in the bilayer, with a free energy change of +46 kcal mol–1, is so unfavorable that the insertion of an unfolded peptide effectively cannot occur. (Diagram labels: cytoplasm; polar headgroups; hydrophobic lipid region; helix 1; helix 2; helical hairpin. The four free energy steps are labeled ΔG°1 ≈ 0, ΔG°2 (60), ΔG°3 (46), and ΔG°4 (106).) Redrawn from Engelman, D. M., and T. A. Steitz, Cell. 1981, 23:411–422. © 1981 by Elsevier. Reprinted with permission from Elsevier.
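The free energies quoted for the helical hairpin hypothesis form a thermodynamic cycle, so the value for the one step not stated in the caption can be inferred from the others. The sketch below assumes that the diagram step labeled approximately zero is the folding of the hairpin in water (an assumption of this example), takes –60 and +46 kcal mol–1 from the caption, and solves for folding of the coil within the bilayer; the magnitude obtained matches the 106 shown on the diagram. Function and argument names are illustrative.

```python
def folding_in_bilayer(dg_fold_water, dg_insert_hairpin, dg_insert_coil):
    """Infer dG for folding inside the bilayer from thermodynamic cycle closure.

    Both routes connect the unfolded peptide in water to the inserted hairpin:
      route A: fold in water, then insert the folded hairpin
      route B: insert the random coil, then fold inside the bilayer
    Free energy is a state function, so the two route totals must be equal.
    """
    route_a_total = dg_fold_water + dg_insert_hairpin
    return route_a_total - dg_insert_coil


# Values in kcal/mol: about 0 (assumed step), -60 and +46 (from the caption).
dg_fold_membrane = folding_in_bilayer(0.0, -60.0, 46.0)
print(dg_fold_membrane)

# Route B total, for comparison with the -60 kcal/mol of route A.
print(46.0 + dg_fold_membrane)
```

The large negative value for folding within the bilayer does not make route B accessible, because its first step, inserting the unfolded coil, carries the +46 kcal mol–1 penalty described in the caption.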

membrane proteins, but as they are absent from helix pairs in others, sequence motifs are not sufficient to explain helix packing.

While insertion and packing of separate TM helices are emphasized in the two-stage model, the loops between helices can also be critical. This was demonstrated in experiments that tested the ability of bacteriorhodopsin (bR, described in Chapter 5) expressed as two fragments to assemble into stable, functional molecules in lipid bilayers. When its seven helices (A to G, see Figure 5.6) were combined as two fragments, such as AB + CDEFG, bR usually assembled as a stable protein of the correct structure. However, some fragment combinations, such as ABC + DEFG, produced less stable proteins. Therefore the connecting loops between helices C and D, and also between E and F, while not indispensable for assembly, are essential for the stability of the native protein. A different approach tested the contribution of each loop by

7.3 The two-stage model for folding helical membrane proteins. (A) The two-stage model for folding helical membrane proteins consists of (I) helix insertion and (II) helix interaction. In stage I, the stability of individual helices is postulated to allow them to insert as stable domains in the lipid bilayer. In stage II, side-to-side helix association results in a folded and functional protein. From Popot, J.-L., and D. M. Engelman, Annu Rev Biochem. 2000, 69:881–922. © 2000 by Annual Reviews. Reprinted with permission from the Annual Review of Biochemistry, www.annualreviews.org. (B) A third stage for some proteins involves additional folding steps, such as loop rearrangement (top), or addition of prosthetic groups (P, bottom). From Engelman, D. M., et al., FEBS Lett. 2003, 555:122–125. © 2003 by the Federation of European Biochemical Societies. Reprinted with permission from Elsevier.

175 Protein folding and biogenesis

7.4 The four-step model for the thermodynamics of folding helical membrane proteins breaks stage I into three steps: partitioning, folding, and insertion. All four steps are represented occurring in the aqueous phase as well as at the bilayer, with the possibility of partitioning at each step. The process can follow either the ΔGwiu or ΔGwif step, or a combination of both. The ΔG symbols indicate the standard transfer free energies, with subscripts indicating the steps: w, water; i, interface; h, hydrocarbon core of the bilayer; u, unfolded; f, folded; and a, association. For example, ΔGwif is the standard free energy of transfer from water to interface of a folded peptide. This allows the overall standard transfer free energy to be described by a sum of contributions from the various steps, which may be obtained from different types of experiments. From White, S. H., and W. C. Wimley, Annu Rev Biophys Biomol Struct. 1999, 28:319–365. © 1999 by Annual Reviews. Reprinted with permission from the Annual Review of Biophysics and Biomolecular Structure, www.annualreviews.org.

Figure labels: states unfolded (u) and folded (f); regions water, interface (i), and HC core (h); steps partitioning, folding, insertion, and association (a); an interface path (ΔGif, ΔGihf, ΔGha) and a water path (ΔGwf, ΔGwa) connected by the transfer steps ΔGwiu, ΔGwif, ΔGwhf, and ΔGwha.
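The additivity described in this caption can be made concrete with the per-residue values quoted in Box 7.1. The Python sketch below simply adds up those contributions for the backbone of a peptide. The function names are illustrative, and the calculation uses the unrounded per-residue values, so the unfolding cost for 20 residues comes out near 86 kcal mol–1 rather than the 84 kcal mol–1 that the box obtains after rounding to 4.2 kcal mol–1 per residue.

```python
# Per-residue transfer free energies quoted in Box 7.1 (kcal/mol, water -> alkane).
DG_PEPTIDE_BOND_HBONDED = 2.1  # internally hydrogen-bonded peptide bond
DG_PEPTIDE_BOND_FREE = 6.4     # peptide bond with no internal hydrogen bond
DG_ALPHA_CARBON = -0.95        # alpha-carbon of each residue


def helix_backbone_cost(n_residues: int) -> float:
    """Cost of moving the backbone of a folded helix from water into the bilayer."""
    return n_residues * (DG_PEPTIDE_BOND_HBONDED + DG_ALPHA_CARBON)


def coil_backbone_cost(n_residues: int) -> float:
    """Cost of moving the backbone of an unfolded peptide into the bilayer."""
    return n_residues * (DG_PEPTIDE_BOND_FREE + DG_ALPHA_CARBON)


def unfolding_cost_in_bilayer(n_residues: int) -> float:
    """Cost of unfolding a helix that already sits inside the bilayer."""
    return coil_backbone_cost(n_residues) - helix_backbone_cost(n_residues)


def net_insertion_energy(side_chain_dg: float, backbone_cost: float) -> float:
    """Sum of the favorable side chain term and the unfavorable backbone term."""
    return side_chain_dg + backbone_cost


if __name__ == "__main__":
    print("helix backbone, 20 residues:", round(helix_backbone_cost(20), 1))
    print("coil backbone, 20 residues:", round(coil_backbone_cost(20), 1))
    print("unfolding in bilayer, 20 residues:", round(unfolding_cost_in_bilayer(20), 1))
    # Glycophorin values as quoted in Box 7.1: -36 (insertion) and +26 (backbone).
    print("glycophorin net:", net_insertion_energy(-36.0, 26.0))
```

Because the α-carbon term appears in both the helix and the coil, it cancels in the unfolding cost, which reduces to the difference between a free and a hydrogen-bonded peptide bond (6.4 minus 2.1 kcal mol–1 per residue).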

genetically replacing each with a linker of Gly–Gly–Ser of the same length, and showed that four loops (AB, CD, EF, and FG) contribute to the resistance of bR to SDS. Furthermore, only helices A, B, D, E, and also C when its aspartate residues are protonated, are able to insert independently into phospholipid vesicles, as expected by the two-stage model. A great deal has now been learned about bR folding in vitro.

BOX 7.1 ENERGETICS OF FOLDING AND INSERTING A HYDROPHOBIC α-HELIX INTO THE BILAYER

While numerous peptide studies have been carried out, it is not feasible to obtain values of all the free energy changes depicted in Figure 7.4 from experiments, for several reasons. First, most peptides that are sufficiently hydrophobic to partition into the membrane will aggregate in water and possibly in the interfacial region of the lipid bilayer as well. Without changing experimental conditions, most such peptides will either partition to the interface or perhaps insert across the bilayer, and it is difficult to control the insertion step. Finally, peptides that form a helix in the bilayer will not unfold in the bilayer.

The last of these reasons is illustrated by considering the energetics associated with folding and insertion of a peptide of 20 hydrophobic amino acids. When the peptide forms an α-helix in aqueous solution, the net number of hydrogen bonds does not change significantly, since there is an exchange of some of the hydrogen bonds with the water for hydrogen bonds along the helical backbone. When the peptide partitions into the bilayer, the favorable partitioning of its nonpolar side chains into the bilayer gives an estimated free energy change on the order of –30 kcal mol–1, mainly due to the hydrophobic effect discussed in Chapter 1. This is the favorable ΔG for inserting the hydrophobic helix into the bilayer. In contrast, if the unfolded peptide transfers from water to the nonpolar milieu of the bilayer, it will dehydrate and the loss of hydrogen bonds is estimated to cost about 40 kcal mol–1.

Inserting the backbone of the helix into the nonpolar region of the membrane is unfavorable, since the hydrogen-bonded peptide bonds are hydrophilic. The cost of moving each peptide bond from the water to the bilayer, based on the ΔGtr obtained for partitioning an internally hydrogen-bonded peptide bond from water into an alkane, is +2.1 kcal mol–1. Including the partitioning of the Cα of each residue brings that cost down to +1.15 kcal mol–1, which can be viewed as the per residue cost of transferring a polyglycine α-helix into the bilayer. It means the full cost of dehydrating the backbone of a 20-residue α-helix is 23 kcal mol–1.

What is the cost if the peptide is not folded into a helix? The ΔGtr for a peptide bond that is not internally hydrogen bonded to partition into alkane is +6.4 kcal mol–1. Again, partitioning of the Cα is favorable, –0.95 kcal mol–1, so the cost of partitioning a residue of an unfolded peptide into the nonpolar milieu of the bilayer is about +5.4 kcal mol–1.

For a helix in the bilayer to unfold, the per residue cost is therefore +5.4 kcal mol–1 minus 1.15 kcal mol–1 (because the cost of dehydration has already been paid), which comes to ∼4.2 kcal mol–1. This means the cost to unfold a 20-residue helix in the bilayer is 84 kcal mol–1!

Clearly, the energetics depend on the amino acid composition of the TM helix. In the example of glycophorin, the membrane-spanning helix has 19 hydrophobic residues. The ΔG for its insertion is estimated to be –36 kcal mol–1. Since the cost of dehydrating the helix backbone is +26 kcal mol–1, the net energy favoring insertion is –10 kcal mol–1.

bR Folding Studies

bR was the first integral membrane protein to be fully unfolded and then refolded to regain its activity. Early studies demonstrated that its complete unfolding requires formic acid or anhydrous trifluoroacetic acid; when transferred into SDS, it regains about half its helical content, and then when added to retinal and mixed micelles (e.g., cholate or CHAPS and lipid), it refolds to the native structure. Since bR sufficiently unfolds in SDS to lose its native structure and its chromophore, kinetic studies can follow the folding and insertion of SDS-unfolded bR into micelles (or bilayers) using circular dichroism, fluorescence, or absorption spectroscopy. The protein in the absence of retinal is called bacterio-opsin (BO). Binding of retinal to BO is marked by the return of the absorbance maximum at ∼560 nm, as well as by its quenching of intrinsic fluorescence.

Stopped-flow absorption spectroscopy and circular dichroism reveal a rate-limiting, pH-dependent, and lipid-dependent step that produces an intermediate with most of the normal helical content. This could be attributed to the slow folding of some of the helices or to incomplete formation of all seven helices. The simplest pathway consistent with the kinetic data from folding SDS-denatured bR is:

BO → I1 → I2 (+ R) → IR → bR

The first intermediate, I1, is accompanied by changes in the surrounding lipid/detergent solvent. The second intermediate, I2, has native-like secondary structure but lacks some tertiary contacts of the native protein. Formation of I2 is rate-limiting in apoprotein formation, and this folding event must occur before retinal (R) can bind. I2 has a binding pocket that enables it to bind retinal noncovalently, forming the intermediate IR. Formation of the Schiff base between retinal and Lys216 converts IR to bR. The intermediate IR has at least two forms distinguished by their absorption maxima at 380 and 440 nm, which are populated differently at different values of pH, suggesting there are parallel folding pathways to the same native structure. Since the intermediates in BO folding are different from those in bR unfolding, pseudo two-state equilibria are attained by unfolding bR in SDS to IR (which has ∼56% of the α-helix content of folded bR and binds retinal) and then refolding. This system gives good agreement between the apparent equilibrium denaturation curves and the kinetic measurements.

The nature of the folding intermediate can be probed with a powerful method developed to characterize transition states of soluble proteins. This method is called Phi (ϕ) value analysis and involves measuring the changes in the free energy of intermediate, transition, and folded states of a protein due to single mutations. With conditions established for carrying out equilibrium folding reactions and studying the folding kinetics, the free energy of folding ΔGf and the activation energy ΔG† are determined for mutant and wild type proteins. The difference in free energies between the mutant and wild type is the ΔΔG for each. The ϕ value is the ratio of the differences ΔΔG†/ΔΔGf and gives the change in transition state energy relative to the change in the folded state. When ϕ = 1, the mutation has induced the same change in both states, so the transition state has the same structure as the folded state at the site of that mutation. When ϕ = 0, the structure at the site of the mutation is the same in the transition state as in the unfolded state. With site-directed mutations in regions of the protein structure, ϕ value analysis reports whether that region is folded or unfolded at the transition state.

To apply ϕ value analysis to bR, each residue of a single TM helix was mutated to Ala and its ΔΔG† and ΔΔGf determined from refolding studies in DMPC–CHAPS mixed micelles (Figure 7.6). Quite different results are obtained for the Ala scans of Helix B and Helix G: In the transition state, helix B is mainly folded while helix G is mainly unfolded. This result obtained from in vitro folding studies fits well with the mechanism of folding in the cell (see below): Since the TM helices insert in order of


7.5 “Knobs-into-holes” packing is observed at helix–helix interfaces in many helical membrane proteins. Glycophorin A dimerizes when the TM helices from two molecules associate, bringing the “knobs” of branched amino acid side chains into “holes” due to glycine residues. Viewed from the top (left), the close contacts between pairs of valine and glycine residues are evident. Viewed from the side (center), the proximity of such pairs allows the two helices to interact tightly. The third drawing (right) shows the close contacts between the two TM strands, with interatomic distances given in angstroms. From Smith, S. O., et al., Biochemistry. 2001, 40:6553–6558. © 2001 by American Chemical Society. Reprinted with permission from American Chemical Society.

Figure labels: N termini (N); residues Ile 76, Gly 79, Val 80, Gly 83, Val 84, and Thr 87; interatomic distances of 2.1 and 2.6 Å.

synthesis, the C-terminal helix G is last to be synthesized and last to be folded.

A detergent-free system for bR refolding provides the ability to investigate the effect on folding of the state of the bilayer lipids, in particular the effect of the lateral pressure that derives from the tendency of each leaflet to curve away from the plane of the bilayer (see Figure 2.18). This curvature stress confers bending rigidity to the bilayer, decreasing its elasticity. Such stress can be introduced into reconstituted bilayers by mixing bilayer-forming lipids with a nonbilayer former (e.g., PC and PE) and can be altered by varying acyl chain composition; for example, it is reduced with a shorter chain length that decreases crowding (pressure) near the middle of the bilayer.

The yields of bR formed from BO added to retinal and different lipid vesicles are clearly affected by curvature stress of the bilayer (Figure 7.7). The vesicles are made of DPoPC (dipalmitoleoyl PC) under conditions that give 70% refolding. The yield increases with addition of DMPC, and even more with addition of a lysophospholipid having one acyl chain removed to reduce the lateral pressure. The yield decreases with the addition of PE with unsaturated acyl chains, which increases the lateral pressure in the center of the bilayer. In bicelles consisting of different proportions of DHPC (dihexanoyl PC, chain length of six) and DMPC (chain length of 14), the bending rigidity of the bilayer is calculated to increase two-fold as the mole fraction of DMPC increases from 0.3 to 0.7. In refolding experiments in this system the rate of bR folding decreased ten-fold over the same range, revealing a clear effect of lateral pressure of the bilayer on kinetics of bR folding. These results suggest that BO has more difficulty inserting into stressed, rigid bilayers as it refolds to bR.

From ϕ value analysis of its folding transition to its refolding in different conditions of bilayer stress, in vitro folding studies with bR have started to produce a sophisticated understanding of the interplay of the protein with the membrane during the folding process. The elegant folding studies of bR provide a paradigm for other helical membrane proteins.

Folding Studies of β-barrel Membrane Proteins

How do β-barrels fit the model for folding and insertion of membrane proteins? They are fundamentally unlike bundles of α-helices, because each helix can be formed

178 Protein folding

7.6 ϕ value analysis of the transition state in bacteriorhodopsin folding. (A) A ribbon representation of bR (gray, with the cytoplasmic side at the top) shows the residues of Helix B mutated to Ala for ϕ value analysis. Most of the mutations give ϕ > 0.8 (deep red color), meaning they are sites that are folded in the transition state; Y43A and T46A give intermediate values that are harder to interpret. From Curnow, P. and P. J. Booth, Proc Natl Acad Sci USA, 2009, 106:773–778. (B) Schematic representation of the ϕ value data for Helix B and Helix G shows that in the transition state Helix B is largely folded (red), while Helix G is more like the unfolded state (blue). From Booth, P. J., Curr Op Str Biol. 2012, 22:465–475.

Figure labels: panel B shows the transition state structure, with a color key running from ϕ = 0 to ϕ = 1.
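The ϕ value ratio used for this figure is simple enough to sketch in a few lines of Python. In the sketch below, the conversion from refolding rates to ΔΔG† assumes a transition-state relation with the same pre-exponential factor for mutant and wild type, which is a common simplification and not something stated in the text. The 0.8 cutoff for "folded" follows the caption, while the 0.2 cutoff for "unfolded" and all function names are illustrative assumptions.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal mol^-1 K^-1


def ddg_activation_from_rates(k_wt: float, k_mut: float, temp_k: float = 298.15) -> float:
    """Change in activation free energy (mutant minus wild type), in kcal/mol.

    ASSUMPTION: both proteins share the same pre-exponential factor,
    so that ddG_act = -RT ln(k_mut / k_wt).
    """
    return -R_KCAL * temp_k * math.log(k_mut / k_wt)


def phi_value(ddg_activation: float, ddg_folding: float) -> float:
    """phi = ddG_activation / ddG_folding, both taken as mutant minus wild type."""
    if abs(ddg_folding) < 1e-12:
        raise ValueError("ddG of folding is too close to zero to define phi")
    return ddg_activation / ddg_folding


def interpret(phi: float, folded_cutoff: float = 0.8, unfolded_cutoff: float = 0.2) -> str:
    """Rough reading of phi; the lower cutoff is an illustrative choice."""
    if phi > folded_cutoff:
        return "folded-like in the transition state"
    if phi < unfolded_cutoff:
        return "unfolded-like in the transition state"
    return "intermediate, harder to interpret"


if __name__ == "__main__":
    # Hypothetical numbers for one Ala mutant, not data from the cited studies.
    ddg_act = ddg_activation_from_rates(k_wt=1.0, k_mut=0.2)
    phi = phi_value(ddg_act, ddg_folding=1.0)
    print(round(phi, 2), interpret(phi))
```

Values of ϕ are least reliable when ΔΔGf is small, because the ratio then amplifies experimental error; the guard against a near-zero denominator is a minimal nod to that problem.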

independently with hydrogen bonds along the helix axis while β-barrels have hydrogen bonds between neighboring strands, even between the strands closest to the N and C termini. A single β-strand is not a stable structure; rather, formation of at least three to five strands is required for stability in all β-structures, including domains of soluble proteins. Furthermore, geometric considerations make it difficult to envision a portion of a barrel folding first, so all the strands of a β-barrel can be expected to form at roughly the same time. Since the β-barrel proteins are found in outer membranes, they fold and insert from an aqueous environment, such as the periplasm of Gram-negative bacteria.

7.7 Effect of bilayer curvature stress on folding of bacteriorhodopsin. Three different lipids are added to DPoPC: DPoPE (red circles), which increases curvature stress; DMPC (black triangles), which decreases curvature stress; and lyso-PPC (blue squares), which relaxes the bilayer still further. The folding yields are determined from the area under the purple absorption band of folded functional bR. From Allen, S. J., et al., J Mol Biol. 2004, 342:1293–1304. © 2004 by Elsevier. Reprinted with permission from Elsevier.

The first and most extensive in vitro folding studies of a β-barrel membrane protein have been done with the β-barrel domain of OmpA (OmpABB), an eight-stranded monomer that can be unfolded in urea and refolded upon dilution into a suspension of PL vesicles. OmpA is an abundant protein in the outer membrane of E. coli that exhibits mobility shifts on SDS polyacrylamide gel electrophoresis upon folding. From the structure solved with NMR (see Box 5.1), the positions of its five tryptophan residues are oriented such that four of them are near one end of the barrel while the fifth is close to the opposite end. During assembly of the β-barrel, four


7.8 Folding studies of the TM barrel domain of OmpA. (A) Schematic drawing of four stages in the insertion of OmpA, with the locations of the Trp residues at each stage determined from fluorescence quenching with lipids carrying Br at different positions. In stage A, each circle represents a Trp residue; in B–D the Trp side chains are stick figures (purple). From Otzen, D. E. and K. K. Andersen, Arch Biochem Biophys. 2012, in press. (B) Schematic drawing to show when associations of neighboring β-strands during folding and insertion of OmpABB produce quenching of individual Trp residues due to nitroxyl labels on the indicated Cys residues (circles colored according to key). Strands β1, β2, and β3 are contiguous, while β1 and β8 only become neighbors when the barrel is formed. Closing of the barrel is required for insertion into the trans leaflet of the bilayer (E). Different intermediates can be trapped at different temperatures, with temperatures >20°C required to reach the final folded stage. From Kleinschmidt, J. A., et al., J Mol Biol. 2011, 407:316–332.

Figure labels: panel A shows stages (a) through (d); panel B includes a color key.


Trp residues must translocate to the outer leaflet of the bilayer, crossing its midsection, while the fifth remains in the inner leaflet. Using a series of OmpABB mutants with all but one of the Trp residues deleted, the locations of the remaining Trp could be determined during the folding process with a set of “spectroscopic rulers,” lipids carrying a quenching bromine atom at different positions on their acyl chains (Figure 7.8a). As predicted, during insertion four Trp residues cross the bilayer while the fifth does not. The time course of insertion suggests a model for β-barrel assembly from a flattened disc composed of β-strands at the interface to an inserted “molten globule” as the β-strands penetrate the bilayer in a relatively slow temperature-dependent step.

To further analyze the association of neighboring β-strands during the insertion of OmpABB, single Cys residues were introduced into strands adjacent to the position of the single Trp residues in the mutants (Figure 7.8b). To measure proximity between them during folding, the Cys was reacted with a nitroxide spin label that quenches fluorescence only at short ranges. Fluorescence was monitored during folding from urea into DOPC SUVs over a time of 60 min. Pairs of Trp/Cys on the trans-side of β-strands 1, 2, and 3 reached close proximity sooner than pairs on β-strands 1 and 8 (which come together to seal the barrel), while the last pairs to come close were on the cis-side (periplasmic side) of β-strands 1 and 2. The observed folding intermediates are consistent with simultaneous β-strand association and insertion into the bilayer.

Similar results are obtained with FRET experiments designed to follow folding by placing an acceptor molecule (a naphthalene derivative) in various locations relative to a single Trp residue donor (Figure 7.9). FRET spectra show decreasing distances between donor and acceptor as OmpABB folds from urea into DPPC over a period of 15–60 min for all donor–acceptor pairs except for one, where they are on the same strand and loop together in the unfolded state. The data support a model of initial insertion of a compact pore, followed by slower traversing of the bilayer.

7.9 Residues in OmpABB used in FRET experiments. Single Trp mutants of OmpABB were also used in FRET experiments, using Trp15, Trp57, and Trp143 as donors. The acceptor was a cysteine-linked naphthalene derivative placed at W7C or W57C. The distances between β-carbons of the donor and acceptor in the folded protein are given. Note that W7 and W15 are on the same strand. During folding the β-barrel forms as the strands insert into the membrane (direction of arrow), which brings the FRET pairs close enough for fluorescence energy transfer. From Kang, G., et al., Biochim Biophys Acta. 2012, 1818:154–161.

The full length OmpA protein (OmpAFL) consists of OmpABB (171 residues) and a periplasmic domain, OmpAPER (154 residues). Since OmpAPER lacks Trp residues, CD spectroscopy is used to monitor unfolding and refolding of OmpAPER and to investigate its effect on OmpABB. When OmpAFL is unfolded in urea, aggregation competes with refolding upon dilution. However, reversible unfolding and refolding of OmpAFL in octyl maltoside micelles was accomplished with guanidinium chloride and monitored on SDS polyacrylamide gel electrophoresis (PAGE), and produced kinetic results that indicate the existence of a folding intermediate.

Folding of OmpABB is also affected by stresses in the bilayer created by variations in lipid compositions that introduce curvature or increase lateral pressure. Refolding was observed both by fluorescence and electrophoretic mobility. The reference bilayer was POPC with 7.5 mol% POPG, which gives two-state folding with a free energy of unfolding of 3.4 kcal mol–1. The ΔG° for unfolding was linearly proportional to chain lengths when saturated and monounsaturated acyl chains were used, and inversely proportional to chain lengths when double-unsaturated acyl chains with more curvature stress were used (Figure 7.10). A possible explanation for the increasing folding rates with increased curvature is that the more curved surfaces bring the β-strands closer together when they adsorb on the surface prior to insertion. Refolding also increased with increased lateral pressure due to addition of PE. In spite of their different mechanisms for folding, the studies with both bR and OmpA make it clear that folding and stability of membrane proteins are affected by the material


characteristics of the bilayer, in particular the bending rigidity resulting from curvature stress (Figure 7.11).

7.10 Folding studies of the TM barrel domain of OmpA reveal the effects of bilayer stresses. The graph shows the free energy of unfolding (ΔG°u, in kcal mol–1) as a function of the hydrophobic thickness of the PC bilayer. For saturated and monounsaturated chains (filled circles), the ΔG°u increases linearly with chain length (thickness), while for cis double-unsaturated acyl chains (open circles), it decreases linearly with chain length. From Hong, H., and L. K. Tamm, Proc Natl Acad Sci USA. 2004, 101:4065–4070. © 2004 by National Academy of Sciences, USA. Reprinted by permission of PNAS.

Figure labels: vertical axis ΔG°u,H2O (kcal mol–1), 0 to 8; horizontal axis dhydrophobic (Å), 15 to 35; data points di-C10PC, di-C12PC, di-C14PC, C16:0C18:1PC, C18:0C18:1PC, di-C14:1PC, di-C16:1PC, di-C18:1PC, and di-C20:1PC.

7.11 Effect of protein insertion on bilayer curvature stress. Bilayer stress resulting from incorporation of a nonbilayer lipid (cone shaped) can be relieved by insertion of a protein with an hourglass shape. Redrawn from Bowie, J., Nature. 2005, 438:581–589. © 2005. Reprinted by permission of Macmillan Publishers Ltd.

Figure labels: (A) individual leaflets prefer to curve; (B) curvature frustration when forced into bilayer; (C) curvature frustration relieved by hourglass-shaped protein; cylindrical lipid (bilayer) and cone-shaped lipid (nonbilayer).

Other Folding Studies

Classical in vitro folding studies in lipids or detergents have now been carried out with many other membrane proteins, including the enzyme DAGK, the lactose permease LacY, the potassium channel KcsA, and the β-barrels OMPLA and PagP. Some studies have employed different strategies to gain different types of information about folding membrane proteins. For example, the stability of various mutants of DAGK was assessed with thermodynamic analysis of refolding the partially denatured protein. When DAGK is reversibly denatured in SDS, it retains most of its α-helical content and the refolding that takes place upon the removal of SDS is sufficient to differentiate the stability of different mutants.

To study the TM portion of the KcsA potassium channel (see Chapter 9), a novel approach combined partial de novo synthesis with folding. The semisynthetic TM protein was constructed by fusion of a recombinant peptide (residues 1–73) with a synthetic peptide (residues 74–125). The refolding of this construct was then used to define the lipid requirement for folding.

The Whole Protein Hydrophobicity Scale

Folding studies of the β-barrel enzyme outer membrane phospholipase A (OMPLA, see Chapter 9) provide quantitative assessment of the effect of substituting different amino acids into a TM segment that is the basis for a new hydrophobicity scale (see Chapter 6). The spontaneous, reversible unfolding of OMPLA into DLPC LUVs at pH 3.8 is followed by Trp fluorescence, and refolding is verified by enzyme assay and protease protection. OMPLA is the host for guest substitution of every other amino acid at Ala210, a position in the middle of the β-barrel exposed to lipid (Figure 7.12). By subtracting the free energy of unfolding the OMPLA host from the free energy of unfolding the host–guest, the difference (ΔΔGw,l) gives the energy of partitioning of the guest side chain from an aqueous environment into the membrane relative to the alanine side chain in the wild type sequence. Comparison


7.12 Folding of OMPLA as a host–guest system for measuring the membrane insertion energies of lipid-exposed amino acid residues. (A) Ribbon diagram of OMPLA showing the Ala210 site for guest substitutions (black sphere) at midplane (dashed blue line) of the transmembrane region. (B) The whole-protein hydrophobicity scale determined from the OMPLA host–guest system with each amino acid variant at position 210. The free energy of unfolding (ΔΔGw,l) of each amino acid variant is compared to the wild type OMPLA, with error bars for the standard errors of the mean from individual titration experiments. Residues are colored for types: F, I, L, M, P, V, W and Y (orange); C and G (gray); N, Q, S, and T (teal); D, E, H, K and R (blue). (C) Use of other guest sites shows how the energetics of side chain partitioning varies by depth in the membrane. The positions used are at residues 120, 164, 210, 212, 214, and 223 (black spheres). The ΔΔGw,l of leucine and arginine variants (relative to alanine variants) are shown aligned with the OMPLA image, along with normal distributions fit to the data. From Moon, C. P. and K. G. Fleming, Proc Natl Acad Sci USA. 2011, 108:10174–10177.
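The host–guest subtraction behind this scale can be sketched in Python as follows. The free energies in the example are invented placeholders, not values from the cited study, and the sign convention is an assumption: the sketch follows the wording of the text literally (variant unfolding free energy minus host unfolding free energy), so a guest that stabilizes the folded, membrane-inserted protein gets a positive value. A published scale may report the opposite sign.

```python
from typing import Dict, List, Tuple


def ddg_relative_to_host(
    dg_unfold_host: float, dg_unfold_variants: Dict[str, float]
) -> Dict[str, float]:
    """Subtract the host (Ala) unfolding free energy from each guest variant.

    All values are in kcal/mol. ASSUMED sign convention: variant minus host.
    """
    return {res: dg - dg_unfold_host for res, dg in dg_unfold_variants.items()}


def rank_guests(ddg: Dict[str, float]) -> List[Tuple[str, float]]:
    """Order guests from most to least stabilizing under the convention above."""
    return sorted(ddg.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical unfolding free energies for illustration only.
    host_ala = 1.0
    variants = {"L": 3.0, "S": 0.5, "R": -1.0}
    scale = ddg_relative_to_host(host_ala, variants)
    for residue, value in rank_guests(scale):
        print(residue, value)
```

Repeating the same subtraction at guest sites of different depth, as in panel C, would give one such dictionary per position.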


of ΔΔGw,l for every guest is the basis of the whole protein hydrophobicity scale (Figure 7.12b). Since low pH is required for reversibility, the acidic residues are less charged than at neutral pH and partition more readily. However, the scale is the first to represent the thermodynamics of water to bilayer partitioning in the context of a TM segment of a native protein. Further, by varying the position of the guest residue, the energetics of side chain partitioning can be determined as a function of depth in the membrane (Figure 7.12c).

Folding of membrane proteins can be a very important step of their purification after they are harvested in denatured form from insoluble inclusion bodies, which are lipid-bounded storage sites within the cytoplasm of bacterial cells that are overproducing the proteins from high-expression vectors. An example of a membrane protein that was refolded after it was obtained in inclusion bodies is the enzyme OMPLA (described in Chapter 9). The denatured OMPLA was solubilized in 8 M urea and diluted into Triton X-100, which produced a mixture of folded and unfolded OMPLA that was resolved by ion-exchange chromatography to recover the native enzyme. The choice of detergent is critical, as refolding studies under different conditions revealed a strong preference for Triton X-100 in the case of OMPLA.

Finally, the use of cell-free systems to synthesize membrane proteins reveals conditions needed for folding and insertion of the protein of interest. Using in vitro transcription/translation systems in the presence of lipid vesicles or micelles, combinations of lipids and detergents are tested. For example, in pioneering studies of the E. coli outer membrane protein PhoE (see Chapter 5), folding and insertion require lipopolysaccharide. In similar studies of the E. coli lactose permease (see Chapter 11), protein folding took place only with inside-out vesicles, mimicking the vectorial insertion of the lactose permease into the inner membrane. The common problems caused by expressing high amounts of eukaryotic membrane proteins in bacteria, where their targeting signals may not be recognized or their overexpression may be toxic, have led to increasing use of cell-free systems for eukaryotic proteins. In these cases information gleaned from in vitro folding may be relevant to folding into cell membranes.

BIOGENESIS OF MEMBRANE PROTEINS

Folding and insertion of membrane proteins are just two elements of the very complex process needed for the assembly of proteins into cell membranes. The first steps of that process are shared by membrane and secreted proteins and utilize a complex apparatus called a translocon that has been conserved through eukaryotic, bacterial, and archaeal kingdoms. In later steps integral membrane proteins are differentiated from secreted proteins by a process that allows their lateral integration into the lipid bilayer and determines their topology. The process of integration and topogenesis (the genesis of topology) will be examined after describing the process for export of nascent (newly synthesized) proteins from the cytosol and the partners involved in the translocation step.

Export from the Cytoplasm

Membrane proteins are targeted for their destinations as they exit the ribosomes. In eukaryotes, they may enter the ER and end up in the plasma membrane or endocytic organelles (the ER pathway), or they may follow other pathways for mitochondrial, chloroplast, and nuclear membranes. The destinations in prokaryotes are the plasma membrane and, in Gram-negative bacteria, the outer membrane. Once targeted for export, the processes of translation and translocation may be coupled (cotranslational translocation) or sequential (posttranslational translocation), each of which involves many specialized proteins (Figure 7.13).

The classical signal sequence, the first recognized, is located at the N terminus of the nascent peptide and is cleaved during export. (For other types of signals, see “Topogenesis in Membrane Proteins” below.) The signal sequence consists of ∼20 amino acids, with a few polar and basic residues followed by a stretch of 7–15 primarily hydrophobic residues before the polar cleavage site at the beginning of the mature protein. When cleavage is blocked, the larger size of the precursor protein is readily recognized by SDS PAGE (see Box 7.2). Although not all N-terminal signals are cleaved, this size difference between precursor and mature forms of the protein has been used to follow export in thousands of experiments.

Signal sequences are highly degenerate and tolerate many mutations as long as the nonpolar region is sufficiently hydrophobic and long enough (but not so long as to become a TM segment). This stretch of hydrophobic amino acids binds to a hydrophobic groove on a chaperone or piloting protein, which recognizes the signal as it emerges from the ribosome, protects the nascent chain from aggregation, and targets it to the membrane. This role is often carried out by a ribonucleoprotein complex called the signal recognition particle (SRP). The SRPs in prokaryotes and eukaryotes have homologous core components, as do their receptors in the membrane (Figure 7.14). Both SRP and its receptor are GTPases, and when the SRP is docked, the catalytic sites on both SRP and the SRP receptor come together to form a unique catalytic chamber that binds two GTPs. Once bound, they stimulate each other's GTPase activity. The hydrolysis of GTP by the complex releases SRP, which enables the ribosome to dock on the translocon and resume translation to carry

184 Biogenesis of membrane proteins

out cotranslational translocation. In E. coli, SRP and its receptor FtsY are required for integration of very hydrophobic proteins into the inner membrane.

7.13 Diagram illustrating posttranslational translocation and cotranslational translocation in bacteria. The Sec translocon (blue) spans the cytoplasmic membrane (CM) and consists of SecY, SecE, and SecG. SecA (red) is the peripherally associated motor protein. (A) Proteins synthesized at the ribosome (light yellow) that bind SecB (light blue) are brought to SecA at the translocon. Their posttranslational translocation may be aided by SecDF (gray) and/or by YidC (yellow), and their signal peptide cleaved by signal peptidase (green). (B) Cotranslational translocation begins with the nascent chain recognized by SRP (pink), which brings it to the SRP receptor, FtsY (purple), and targets it to the translocon. (C) A subset of proteins can be inserted by YidC alone. The three pathways use different energy sources: SecA hydrolyzes ATP, SRP/FtsY and the ribosome hydrolyze GTP, and YidC uses the proton motive force (pmf). From du Plessis, D. J. F., et al., Biochim Biophys Acta. 2011, 1808:851–865.

A different set of chaperones is used for posttranslational translocation (export that follows the synthesis of the entire peptide chain) of less hydrophobic proteins, including those destined for the outer membrane. In E. coli (and other bacteria) the signal sequence is recognized by SecB, which brings it to SecA. SecB is a tetrameric chaperone that prevents newly synthesized proteins from folding or aggregating in the cytosol by binding to exposed hydrophobic surfaces without an expenditure of ATP. SecB has well-defined binding sites for SecA and pilots the nascent peptide to the SecA protein. Transfer of the nascent peptide chains from SecB to SecA releases SecB and allows the initiation of translocation. SecA is the motor, a cytosolic ATPase that provides the energy for translocation. Highly conserved in bacteria, it functions as a dimer that binds to the membrane and can insert into the translocon (see Chapter 4). A groove called a clamp between the nucleotide binding domains and the peptide binding domains is hypothesized to hold the peptide while a two-helix finger pushes it through the translocon as SecA hydrolyzes ATP in repetitive cycles (Figure 7.15). An x-ray structure of SecA with the translocon is shown in Figure 7.20, below.

For posttranslational translocation in yeast (and probably other eukaryotes) several chaperones can bind the chain as it emerges from the ribosome in the cytosol to prevent aggregation prior to export. In the ER lumen an essential chaperone called BiP captures the chain during export and prevents it from sliding back into the cytosol.

In addition to the purification and characterization of the many components of the translocation process, its detailed mechanism has been probed with sophisticated studies designed to follow the progress of a nascent peptide. The first recognition of a nascent peptide destined


BOX 7.2 EVIDENCE FOR CLEAVABLE SIGNAL SEQUENCES INVOLVED IN PROTEIN TRANSLOCATION

The protease protection assay distinguishes between translocated membrane proteins and secreted proteins and provides evidence that both can have cleavable signal sequences. The experiment uses SDS polyacrylamide gel electrophoresis (SDS-PAGE) to distinguish products of in vitro protein synthesis in the presence or absence of membrane vesicles, with and without added protease.

When translation of a protein with a classical N-terminal cleavable signal sequence is carried out in vitro in the absence of membranes, a precursor containing the signal is produced. The precursor is not protected by membranes, so it is digested by added protease (sample 1 in Figure 7.2.1). When it is carried out in the presence of membrane vesicles, the signal inserts into the membrane, along with any TM segments of the protein (samples 2 and 3). Signal peptidase on the membrane cleaves the signal from both secreted and inserted proteins, resulting in mature forms that migrate faster on SDS-PAGE as shown.

To determine whether the protein is integrated into the membrane or secreted across it, a nonspecific protease such as proteinase K is added. If the protein is soluble and outside the vesicles, it is fully digested (sample 1); if it is integrated into the membrane, the external portion is digested, resulting in a smaller protein or fragments (sample 2); and if it is fully transported into the vesicle (secreted), it is protected from the protease and retains its size (unaltered mobility on the gel, sample 3). Many applications of this basic protocol have revealed the nature and role of signal sequences on hundreds of proteins.

[Figure 7.2.1 panel labels: (1) no translocation; (2) simple membrane protein; (3) full translocation of secreted protein. Signal peptidase acts at the membrane; external protease is then added. SDS-PAGE lanes 1–3 are shown before and after external protease, with bands for the precursor and the mature form.]

7.2.1 In vitro assay for protein insertion into membrane vesicles, utilizing protease sensitivity (top) and analysis by SDS-PAGE (bottom). Based on unpublished figure from Professor A. Driessen.
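The decision logic of the protease protection assay in Box 7.2 can be written out as a small lookup. The following Python sketch is only an illustration under stated assumptions: the function name `classify_sample` and the three outcome labels ("digested", "smaller", "unchanged") are hypothetical conveniences, not part of any published protocol, and real gels require judgment about partial digestion that this toy function ignores.

```python
# Illustrative sketch of the readout logic in Box 7.2.
# The function name and outcome labels are hypothetical, chosen for this example.

VALID_OUTCOMES = {"digested", "smaller", "unchanged"}


def classify_sample(signal_cleaved: bool, after_protease: str) -> str:
    """Interpret one sample of a protease protection assay.

    signal_cleaved: True if the faster-migrating mature form is seen
                    before external protease is added.
    after_protease: band behavior after adding external protease,
                    one of "digested", "smaller", or "unchanged".
    """
    if after_protease not in VALID_OUTCOMES:
        raise ValueError(f"unknown outcome: {after_protease!r}")

    # Sample 1: precursor only, fully digested -> never reached a membrane.
    if not signal_cleaved and after_protease == "digested":
        return "not translocated"
    # Sample 2: mature form, trimmed by protease -> external portion exposed.
    if signal_cleaved and after_protease == "smaller":
        return "integrated membrane protein"
    # Sample 3: mature form, unaltered mobility -> protected inside vesicle.
    if signal_cleaved and after_protease == "unchanged":
        return "fully translocated (secreted)"
    # Any other combination is not described in the box.
    return "inconclusive"


if __name__ == "__main__":
    for cleaved, outcome in [(False, "digested"), (True, "smaller"), (True, "unchanged")]:
        print(cleaved, outcome, "->", classify_sample(cleaved, outcome))
```

The three recognized combinations correspond to samples 1, 2, and 3 in Figure 7.2.1; any other combination is reported as inconclusive rather than guessed.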


7.14 The core components of the E. coli and human signal recognition particle (SRP) and its receptor are highly homologous. Both SRP54 in eukaryotes and the Ffh protein of the E. coli SRP have three domains. The M domain is methionine-rich and binds RNA (lavender). The N domain is the α-helical domain, and the G domain is a nucleotide-binding site with GTPase activity (drawn together in purple). The receptor, which is FtsY in E. coli and SRα in eukaryotes, also has similar domains, homologous N and G (light pink) and A, the N-terminal domain (dark pink). The remaining proteins of the eukaryotic SRP along with SRβ do not have homologs in E. coli, and the larger eukaryotic RNA has an Alu domain that, along with two of the proteins, functions in translational arrest. From Luirink, J., and I. Sinning, Biochim Biophys Acta. 2004, 1694:17–35. © 2004 by Elsevier. Reprinted with permission from Elsevier.


7.15 Model for the role of SecA in the posttranslational translocation of nascent polypeptides. SecA (gray) has a polypeptide crosslinking domain (PPXD, orange) with a clamp for binding the polypeptide, and a two-helix finger (light green). The SecB chaperone (lavender circles) binds the nascent chain with its signal sequence (red) and brings it to SecA. Step 1: this complex binds the SecY translocon (shown as a dimer, light blue), releasing SecB, and SecA inserts its two-helix finger into the translocon, carrying the bound precursor. Step 2: when SecA hydrolyzes the ATP, it pushes the polypeptide into the channel, then withdraws the two-helix finger from the translocon, while the peptide remains clamped within the channel. SecA binds ATP and another segment of the peptide chain to repeat the cycle (dotted line). Step 3: after sufficient repetitions, the peptide chain is fully translocated and SecA diffuses into the cytoplasm. From Park, E. and T. A. Rapoport, Ann Rev Biophys. 2012, 41:21–40.


for the membrane (or secretion) occurs in the exit tunnel of the ribosome. Fluorescence and crosslinking experiments indicate that two proteins of the large subunit make contact with helical TM segments of nascent membrane proteins and not with signal sequence segments of nascent secretory proteins. In cotranslational translocation in E. coli, one of the proteins at the ribosome exit, L23, makes contact with SRP and then with the translocon when the ribosome docks at the membrane (see Figure 7.13).

Once the SRP guides the ribosome translating a peptide to the translocon, or SecB/SecA chaperone a fully synthesized protein to it, the protein threads into the translocon and either crosses to the periplasm or the ER lumen or inserts into the lipid bilayer. To track nascent TM segments as they move through the translocon and into the lipid bilayer, photoactivated crosslinking groups are incorporated into nascent peptides at known locations and irradiated at different times (see Box 7.3). The nascent peptides carrying TM segments were observed to crosslink initially to cytosolic components, then to subunits of the translocon, and then to lipid (Figure 7.16). The data suggest that the translocon has a dual pathway, a prediction borne out by structural studies.

The Translocon

Remarkably, the insertion of nascent membrane proteins into either ER membranes or bacterial plasma membranes utilizes the same translocation apparatus that exports most proteins bound for extracellular destinations. Therefore the translocon is a protein-conducting channel that functions in two dimensions, allowing proteins to go into the membrane or to cross it into the lumen of the ER or the periplasm of Gram-negative bacteria. It does this while maintaining the permeability barrier of the membrane even though its channel is large enough to accommodate at least one α-helix.

The translocon is a passive pore that associates with various partners to provide the driving force for translocation (see above). A universally conserved heterotrimer, it is called the Sec61 translocon in eukaryotes, with subunits named Sec61 α, β, and γ, and the SecY translocon in bacteria, with subunits SecY, E, and G (see Table 7.1). Two of the three proteins are essential for viability, while the third (β or SecG) is not. SecG stimulates translocation as well as the ATPase activity of SecA in E. coli. Additional protein partners in E. coli include the YidC protein involved in insertion of hydrophobic TM segments into

BOX 7.3 CROSSLINKING TRACES NASCENT PEPTIDES THROUGH THE TRANSLOCON INTO THE BILAYER

To investigate the path of nascent membrane proteins through the translocon, full-length or truncated proteins are engineered with sites for incorporation of chemical crosslinking agents. Translation intermediates of various lengths are generated using polymerase chain reaction (PCR) with different downstream primers to amplify portions of the gene before transcription to produce a set of truncated mRNAs. In vitro translation is carried out in the presence of 35S-methionine and is stopped with cycloheximide, which inhibits the peptidyl transferase activity of the ribosome.

Many different experiments have employed crosslinking to study the fate of translocated proteins. For the protocol described here, the truncated mRNAs are designed to allow the incorporation of a photoactivatable derivative at a unique position in the TM segment of the protein to be studied by incorporating a UAG stop codon at the position of interest. A suppressor tRNA carrying the anticodon CUA will bind to the UAG stop codon and incorporate its amino acid into the growing chain, instead of letting translation terminate. When the suppressor tRNA carries trifluoromethyl-diazinyl phenylalanine, it incorporates this UV-sensitive crosslinking agent in that position (Figure 7.3.1). Only when irradiated will it react, forming a covalent bond with other molecules in its immediate environment.

In addition, chemical crosslinking was carried out with bis-maleimidohexane, a homobifunctional sulfhydryl-reactive agent (Figure 7.3.1). It can react with two cysteines, linking one in the protein substrate to another within ∼15 Å. In this case, the targeted cysteine is in the Sec61β subunit.


7.3.1 Chemical structures of the crosslinking agents used: trifluoromethyl-diazinyl phenylalanyl-tRNA (left), which is added during translation to incorporate a photoreactive derivative of Phe in the TM segment, and bis-maleimidohexane (right), a homobifunctional sulfhydryl-reactive crosslinking reagent with a spacer arm of 13 Å.
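The stop-codon suppression step relies on the suppressor tRNA anticodon CUA pairing with the UAG codon. Because codon and anticodon pair in antiparallel orientation, the match is easiest to see by reversing the anticodon before comparing bases. The short Python sketch below checks this; the function name `anticodon_reads` is hypothetical, and the check assumes strict Watson-Crick pairing only (wobble pairing is deliberately ignored).

```python
# Illustrative check of codon-anticodon complementarity.
# Assumption: strict Watson-Crick pairing; wobble pairing is not modeled.

WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_reads(anticodon: str, codon: str) -> bool:
    """Return True if the anticodon (5'->3') pairs with the codon (5'->3').

    Pairing is antiparallel, so the anticodon is reversed before
    comparing it base by base with the codon.
    """
    if len(anticodon) != len(codon):
        return False
    return all(
        WATSON_CRICK.get(a) == c
        for a, c in zip(reversed(anticodon.upper()), codon.upper())
    )


if __name__ == "__main__":
    print(anticodon_reads("CUA", "UAG"))  # suppressor tRNA vs amber stop codon
    print(anticodon_reads("CUA", "UAA"))  # a different stop codon
```

Under these assumptions the CUA anticodon reads UAG but not UAA, which is why placing a single UAG codon in the truncated mRNA directs the photoreactive residue to one defined position.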


The substrate protein for these experiments is a signal-anchor protein engineered for glycosylation at the N terminus as well as for crosslinking as described (Figure 7.3.2a). The TM domain from residues 19 to 41 (shaded gray) contains a site for incorporation of the photoreactive phenylalanine derivative (*, residue 28 labeled “stop codon”) as well as the cysteine (C) residue at position 35 for the chemical crosslinking. Below it the positions marked with (++) indicate where two arginine residues were inserted in experiments designed to test whether charges in the TM segment disrupted translocation and lipid partitioning (and they do). In vitro translation is carried out in the presence or absence of rough (i.e., not highly purified) microsomal membranes, and proteinase K is added to see at what point the translation intermediates are no longer protected from digestion across the membrane (see Box 7.2). The extent of glycosylation

7.3.2 (A) The signal-anchor membrane-protein used in photocrosslinking experiments. (B) Protection assay of radiolabeled chains of different lengths. Synthesis of the signal-anchor nascent chains of different lengths in the presence and absence of rough microsomes (RM) was followed with and without treatment with proteinase K (protK) by SDS-PAGE and autoradiography. From Heinrich, S., et al., Cell. 2000, 102:233–244. © 2000 by Elsevier. Reprinted with permission from Elsevier.


indicates whether the N terminus extends into the lumen of the microsomes. The results in Figure 7.3.2b are autoradiographs from experiments in which radiolabeled chains are synthesized from different length messages in the presence or absence of rough microsomes and in the presence and absence of proteinase K, as indicated. Samples are analyzed by SDS PAGE prior to autoradiography. The 35S-labeled products show that at least 61 amino acids (the 61mer) are needed to see some

7.3.3 Chain length requirement for membrane integration. (A) Chains of different lengths were crosslinked by exposure to ultraviolet irradiation and the membranes were sedimented at alkaline or neutral pH. Pellets (P) and supernatant (S) were analyzed. (B) Immunoprecipitation with antibodies to Sec61α showed which chains interacted with the translocon. (C) Treatment with phospholipase A2 showed which chains interacted with lipids. From Heinrich, S., et al., Cell. 2000, 102:233–244. © 2000 by Elsevier. Reprinted with permission from Elsevier.


protection from proteolysis by the addition of microsomal membranes (bands marked with arrows), indicating the ribosome-nascent chain complex has reached the translocon at that length. By the length of 85 amino acids (85mer), glycosylation can take place, producing slower-migrating bands (marked with stars), which indicates that the chain is emerging on the other side of the membrane. (Open arrows and open stars mark nonproteolyzed nonglycosylated and glycosylated chains, respectively; closed arrows and closed stars mark membrane-protected nonglycosylated and glycosylated chains, respectively, in the 85mer to the 154mer.)

When samples are irradiated after translation and then sedimented at either neutral or alkaline pH, crosslinks to the translocon (Sec61α and Sec61β) and to lipid are observed (Figure 7.3.3). Sedimentation at neutral pH that does not occur at alkaline pH indicates that a nascent peptide chain is associated at the membrane periphery and not yet integrated across it, as observed with the 61mer and 68mer in Figure 7.3.3a. These two nascent chains are also able to crosslink to the Sec61α of the translocon (marked with closed diamonds, lanes 14 and 22; see also Figure 7.3.3b). Chemical crosslinking to Sec61β appears with the 71mer, the 78mer, and the 85mer (closed circles). Membrane integration begins with the 71mer, with clear crosslinking to lipid (open diamonds; see also Figure 7.3.3c), while the 71mer does not crosslink to Sec61α. As the length increases, the lipid crosslinks remain, indicating the TM segment remains in the lipid bilayer. Figure 7.3.3b shows the results when the samples are immunoprecipitated with antibodies to Sec61α and verifies that the peptides of 61 and 68 residues are the only ones that remain in its close vicinity. Figure 7.3.3c shows what happens when the crosslinked 71mer is treated with phospholipase A2 and verifies that the faster-migrating band is the result of crosslinking to lipid.

When all the data are plotted according to the length of the translation intermediate, they give a sequential picture of contacts, indicating that the nascent TM segment is first exposed to the cytosol, and then enters the translocon and inserts into the lipid bilayer; finally, the N terminus becomes glycosylated as the C terminus diffuses away from the translocon into the cytosol (see Figure 7.16).

the bilayer and the SecDF heterodimer that facilitates protein translocation (see below).

The first images of translocons from canine ER, yeast, and E. coli were obtained by cryo-EM of two-dimensional crystals and showed large structures, suggestive of dimers or larger assemblies. The first high-resolution structure of the translocon from Methanococcus jannaschii shows a visible pore in the Sec heterotrimer, illustrating that it can function as a unit. A popular hypothesis is that dimers of translocons associate when contacted by a ribosome; however, a high-resolution EM of Sec61 in complex with a ribosome with an active synthesis of a translocated peptide shows that a monomer is sufficient (Figure 7.17).

THE TRANSLOCON STRUCTURE

The first crystal structure of a translocon, SecYαβγ from the archaeon M. jannaschii, greatly advanced understanding of the structure of the heterotrimeric translocon and suggested how it could move proteins both across the membrane and laterally into the bilayer. The α subunit has ten TM helices positioned in the membrane to form a sort of barrel with a rectangular shape when viewed from the cytosol and a pseudo-symmetry between two halves formed by TM1–5 and TM6–10 (Figure 7.18). The loop between TM5 and TM6 connecting the two halves is proposed to act as a hinge at the back side of the rectangle, with the γ subunit wrapping around to hold them together. Many of the helices are tilted up to 35° from the bilayer normal, contributing to the funnel shape of the central cavity. Not all the helices span the bilayer fully, and in particular TM2a extends only halfway through the membrane and is only partially hydrophobic. The nonessential β subunit, a single TM at the edge of the complex, makes limited contact with the α subunit near its N terminus, close to the C terminus of the γ subunit. The γ subunit makes a band along two sides of the rectangle. It consists of two helices: an amphipathic helix that lies on the cytoplasmic surface and a long, curved helix that runs along the back side of the α subunit. The M. jannaschii structure can be superimposed on the EM structure of E. coli translocon (SecYE) at 8 Å resolution with a good match of the TM α-helices (Figure 7.19). It also agrees very well with several bacterial translocons that have been crystallized more recently.

This first high-resolution structure provided a number of insights into the function of the translocon. The channel is shaped like an hourglass, with funnel shapes above and below its central constriction (see Figure 7.18b). The opening of the constriction is <5 Å in diameter and is lined by a ring of bulky hydrophobic side chains. Crosslinking experiments that engineered cysteine residues in SecY showed the constriction is the only region that bonded to a translocating polypeptide,


7.16 Crosslinking results indicate three stages in the biosynthesis of the SAI protein, a signal-anchor protein. Data were obtained as described in Box 7.3 and were quantitated to show the percentage of total chains that had each characteristic plotted (y-axis) as a function of the length of the truncated chain (x-axis). (Note: the top plot also gives a scale for the percentage sedimented at neutral pH, right axis.) The top graph shows crosslinking to Sec61, which requires a minimum of about 60 amino acids. The middle graph shows that over 70 residues are required before the nascent chain sediments at alkaline pH, at which length it is integrated into the membrane. At the same length, it can be crosslinked to lipid. The bottom graph shows that glycosylation occurs on chains longer than 85 residues, indicating they have reached the outside, while the proportion that is exposed to protease after translocation continues to grow. These stages are diagrammed along the side of the graphs. From Heinrich, S., et al., Cell. 2000, 102:233–244. © 2000 by Elsevier. Reprinted with permission from Elsevier.
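The sequence of stages summarized in Figure 7.16 can be expressed as a simple mapping from the length of the truncated chain to the most advanced contact observed. The Python sketch below is a rough illustration only: the cutoffs (61, 71, and 85 residues) are taken from the chain lengths named in Box 7.3 for this one signal-anchor protein, the function and label names are hypothetical, and the real data are percentages of a population of chains rather than sharp thresholds.

```python
# Illustrative mapping from nascent chain length to insertion stage,
# using approximate cutoffs quoted in Box 7.3 for one signal-anchor protein.
# These are assumptions for illustration, not sharp biological thresholds.

STAGES = [
    (85, "N terminus glycosylated in the lumen"),
    (71, "integrated into the bilayer (crosslinks to lipid)"),
    (61, "at the translocon (crosslinks to Sec61 alpha)"),
]


def stage_for_length(n_residues: int) -> str:
    """Return the most advanced stage expected for a chain of this length."""
    if n_residues < 0:
        raise ValueError("chain length cannot be negative")
    for min_length, label in STAGES:  # ordered from longest to shortest cutoff
        if n_residues >= min_length:
            return label
    return "exposed to the cytosol (no membrane protection)"


if __name__ == "__main__":
    for length in (50, 61, 68, 71, 78, 85, 154):
        print(f"{length}mer: {stage_for_length(length)}")
```

Listing the stages from the longest cutoff downward means each chain is assigned the latest stage it has reached, which mirrors the cytosol, translocon, lipid, and glycosylation order of contacts described in the box.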

giving strong evidence that this is the pore. However, the small diameter of the constriction means that other TM helices would have to shift to open the channel to 12–14 Å to allow passage of an extended peptide or an α-helical peptide; this may happen dynamically as a peptide is passing through the channel to accommodate the peptide without a leak. TM2a is proposed to be a plug that closes the channel. Indeed, when the TM2a segment in E. coli is locked by an engineered disulfide bond to the γ (SecE) subunit, the channel stays open. The plug is displaced (but not enough to open the channel) in the structure of SecY with SecA present.

TABLE 7.1 COMPONENTS OF THE PROTEIN TRANSLOCATION APPARATUS IN BACTERIA AND MAMMALIAN CELLS(a)

Function            Bacterial system: E. coli    Mammalian system: ER
Translocon core     SecYEG                       Sec61αβγ
Homologs            SecY                         Sec61α
                    SecE                         Sec61γ
                    SecG                         Sec61β
Piloting complex    Ffh + 4.5S RNA               SRP = 6 proteins + 7S RNA
Pilot receptor      FtsY                         SRP receptor
Driving force       SecA + ATP; pmf              Ribosome + GTP

(a) The bacterial system is represented by that of E. coli, and the mammalian system is that found in the ER.
Source: Based on Wickner, W., and R. Schekman, Science. 2005, 310:1452–1456, and earlier papers.

In a crystal structure of a complex of SecY and SecA from Thermotoga maritima with ADP plus beryllium fluoride present to mimic ATP, the SecA lies above the plane of the membrane with its clamp just over the SecY channel (Figure 7.20). Also the two-helix finger of SecA is inserted into the cytoplasmic side of the channel in


7.17 Cryo EM image of Sec61 translocon with a eukaryotic ribosome actively translating a polypeptide chain. The surface representation of a eukaryotic ribosome (40S, yellow, 60S gray) docked on the translocon or protein-conducting channel (PCC, multicolored TM helices in interior) is visualized by high-resolution cryo EM. The nascent chain (NC, green) extends from the P-tRNA (green) at the elongation site of the ribosome through the ribosome and through the translocon. The ribosome is interacting with a single Sec complex (rather than a dimer). From Becker, T., et al., Science. 2009, 326:1369–1373.

7.19 Superposition of the x-ray structure of the M. jannaschii translocon on the cryo-EM structure of the E. coli translocon. The M. jannaschii x-ray structure (numbered helices, colored as in previous figure) was visually docked onto the electron density map of the E. coli SecY complex from cryo-EM (light blue). The labeled gray cylinders represent TM helices in E. coli that have no correspondence in M. jannaschii. The diamond indicates the axis of two-fold symmetry in the E. coli complex. From van den Berg, B., et al., Nature. 2004, 427:36–44. © 2004. Reprinted by permission of Macmillan Publishers Ltd.

7.18 Crystal structure of the Methanococcus jannaschii SecY channel. (a) When the SecY channel is viewed from the cytosol, a pseudosymmetry is seen between TM 1–5 (blue) and TM 6–10 (red) of the α subunit (SecY in prokaryotes). The β subunit (SecG in prokaryotes, gray) is seen at the bottom, and the γ subunit (SecE in prokaryotes, beige) across the back and top. Across from the back is the suspected lateral gate, between TM2b and TM7. The pore ring consists of bulky nonpolar residues (transparent spheres with side chains, green). (b) In the view from the plane of the membrane, the hourglass shape shows the constriction at the pore ring. In addition, the plug helix (yellow) can be seen below the pore ring. From Park, E. and T. A. Rapoport, Ann Rev Biophys. 2012, 41:21–40.


7.20 X-ray structure of a complex of SecA and SecY from Thermotoga maritima. (A) A ribbon representation of the SecA–SecYEG complex (SecY, gray, SecE red, SecG, green) is viewed from the plane of the membrane (solid lines). The domain structure of SecA is highlighted, with two parts of the nucleotide-binding domain, NBD1 (blue) and NBD2 (cyan), the polypeptide-crosslinking domain (PPXD, yellow) and the helical wing domain (HWD, gray). (B) A ribbon representation of SecA with a space-filling representation of the translocon highlights the two-helix finger of SecA (red) that is inside the channel of the translocon (SecYEG, colored as in A) as well as the plug residues (orange). Ribbon representation highlights TM2b, TM8 and the tip of the 6–7 loop in SecY. From Zimmer, J., et al., Nature. 2008, 455:936–943.

SecY, in a position that might allow it to drag the peptide substrate into the pore.

TM Insertion

The x-ray structure of the M. jannaschii translocon also suggested how a TM segment of the nascent protein could be released into the lipid bilayer, with a hinge movement between TM5 and TM6 at the back that would open the front to allow the peptide access to the bilayer. An x-ray structure of the translocon from Pyrococcus furiosus containing its two essential subunits, SecY and SecE, in a different crystallographic packing shows the same general architecture as the M. jannaschii translocon except for a lateral exit portal opposite the hinge (Figure 7.21). The seal of the central ring of hydrophobic residues is compromised where a cavity large enough for a helix opens along TM7. This portal spans the membrane, providing a lateral gate for peptide substrates to contact the lipid bilayer while the plug blocks the opening to the trans side of the membrane. If the peptide is sufficiently nonpolar, it will partition into the lipid as a TM helical segment.

7.21 X-ray structure of the translocon from Pyrococcus furiosus with the lateral gate open. A ribbon representation of the SecYE from P. furiosus is viewed from the cytoplasm. The helices are numbered E1 and E2 from SecE (red) and TM1–TM10 from SecY (rainbow coloring, with N terminus blue and C terminus red). The TM2a plug occludes the channel in the center, and the lateral gate is open along TM7 (direction of arrow) dividing the two halves of SecY. From Egea, P. F., and R. M. Stroud, Proc Natl Acad Sci USA. 2010, 107:17182–17187.

Insertion of bacterial membrane proteins can also utilize YidC along with or independent of the SecY translocon. YidC has homologs in mitochondria (Oxa1) and in chloroplasts (Alb3). YidC was discovered for its role in inserting small membrane proteins that were categorized as Sec-independent proteins, such as the M13 phage coat protein. YidC is a polytopic membrane protein with five TM helices and a periplasmic domain with a β fold. Both YidC and the Sec translocon are required for the insertion of certain proteins, such as subunit a of the ATP synthase. While photo-crosslinking studies show these substrates interact with SecY before YidC, the precise role of YidC is not known.

Yet another site for integration of membrane proteins is the SecDF complex that uses the proton gradient (or in some bacteria, a sodium ion gradient) to enhance protein export. Its role was clarified with studies using a substrate protein, proOmpA, blocked during translocation by an engineered disulfide bond creating a 59-residue loop. When the loop was released by reduction with dithiothreitol, translocation via SecDF could resume in the absence of ATP, driven by the proton motive force. The crystal structure of SecDF from Thermus thermophilus shows a 12-TM membrane domain that conducts protons, along with two periplasmic domains that undergo conformational changes when they interact with substrate proteins (Figure 7.22). This heterodimer is a member of the RND family of transporters, with a proton channel similar to AcrB (see Chapter 11). It may be associated with YajC, a small bitopic membrane protein whose function is unknown.

7.22 X-ray structure of SecDF from Thermus thermophilus. A ribbon representation of the SecDF is viewed from the plane of the membrane. In T. thermophilus SecDF is composed of a single peptide chain, containing 12 TM helices that correspond to SecD (TM1–6, red) and SecF (TM7–12, blue) in other bacteria. In addition it has two major periplasmic domains, P1 (orange and green) and P4 (cyan). Two conformations of the periplasmic domains are observed in different crystals, which are related by a conformational movement at the hinge in P1 that moves the P1 head over the P1 base. From Tsukazaki, T., et al., Nature. 2011, 474:235–238.

An additional participant in membrane protein integration in eukaryotic cells is called TRAM (translocating chain-associated membrane protein). TRAM is required for translocation of some proteins and could be reconstituted with the Sec61 translocon and SRP to transport them in liposomes. Data from crosslinking experiments of the sort described in Box 7.3 show that TRAM specifically interacts with signal sequences during their passage through the Sec61 channel and associates with TM segments as they leave the translocon and integrate into the lipid bilayer. TRAM may guide the integration of TM segments to provide the correct topology.

Numerous other proteins that play important roles in the translocation of nascent polypeptides include enzymes and chaperones. Cleavable N-terminal signal sequences are removed by signal peptidase (also called leader peptidase, Lep), which is anchored with its active site on the noncytoplasmic side of the membrane. An E. coli chaperone called trigger factor (TF) competes with the SRP for emerging signal peptides to sort nascent peptides between the SecB/A pathway and the SRP pathway. Other E. coli chaperones in the periplasm, such as Skp and SurA, prevent aggregation of outer membrane proteins secreted by the translocon. Also in the periplasm, the Dsb system oxidizes cysteine residues to form disulfide bonds, and other enzymes make bonds to lipid or lipopolysaccharide. Similarly, covalent modification of the nascent proteins to bind sugars and/or lipid is a critical part of the maturation process in the ER lumen of eukaryotes. Clearly, the translocon is at the epicenter of a complex molecular machine that carries out many dynamic processes.

Other translocons or protein export complexes are much less well understood than the well-characterized Sec system. The cytoplasmic membrane of bacteria and the thylakoid membrane in plants have a poorly understood system called the twin arginine translocation (TAT) pathway that transports folded proteins across the membrane. Precursor proteins are recognized by their cleavable signal sequences with twin arginine residues in the motif S/T-RRXFLK. The protein components TatA, TatB, and TatC form homo- and hetero-oligomeric complexes, with pore formation induced by the folded Tat precursor proteins. The structure of the pore and the mechanism of transport are unknown.

Another interesting protein transport system is the assembly complex for insertion of β-barrel proteins into


the outer membrane, called the BAM complex (previously Omp85). The complex in E. coli consists of a pore-forming protein BamA and four lipoproteins, BamB, BamC, BamD, and BamE, of which BamA and BamD are essential for outer membrane biogenesis. BamA has a large N-terminal periplasmic domain and a C-terminal β-barrel domain in the membrane. The periplasmic portion of BamA has five smaller domains called POTRA (polypeptide-transport associated) domains that are thought to bind unfolded outer membrane protein precursors in the periplasm: they recognize a motif in the C-terminal β-strand that ends with an obligatory aromatic residue as the C terminus. A recent crystal structure shows BamB to be a β-propeller, but how it and the other lipoproteins interact with the BamA pore is not known. Nor is it understood how a β-barrel pore can enable other β-barrels to fold and insert into the membrane. Similar systems are found in the outer membranes of chloroplasts (Toc75) and mitochondria (the SAM complex), where they are one of the systems called translocators (transport complexes) in the specialized mechanisms for protein import into chloroplasts and mitochondria (see Box 7.4).

Biological Hydrophobicity Scale

The mechanism of integration of proteins into the membrane by opening the lateral gate in the translocon signifies that when nascent TM segments are exposed to the lipid as well as the aqueous channel of the translocon they partition into the bilayer due to their hydrophobicity. Thus the nonpolar character of these segments allows their prediction from the primary sequence based on hydrophobicity scales. As shown in Table 6.3, traditional biophysical scales for hydrophobicity give values of the free energy for partitioning of each amino acid into organic solvents representing the lipid phase.

A biological hydrophobicity scale encompassing the role of the translocon is derived from an ingenious set of experiments that determined the apparent free energy (ΔGapp) of inserting each amino acid, as a part of a TM helix, into the cell membrane. The amino acid is placed in a test segment engineered to follow the two TM segments of leader peptidase (see Figure 6.18) so that when synthesized in the cell, it is either inserted into the membrane or exported past it; in the latter case, it is glycosylated, which allows quantification (Figure 7.23a). The test segment consists of leucine and alanine residues flanked by GGPG linkers, with the amino acid of interest in the center position.

As expected, the nonpolar amino acids have ΔGapp < 0, promoting membrane insertion, while the polar and charged residues have ΔGapp > 0 (Figure 7.23b). The biological scale for membrane insertion correlates very well with the White–Wimley scale derived earlier, supporting the postulate that during translocation the segment is directly exposed to the bilayer. Interestingly, some amino acids give different values when they are positioned away from the center of the segment: notably, positions closer to the ends of the segment are more favorable than the center for Trp and Tyr, as well as Lys and Asn. Structural data show a prevalence of Trp and Tyr in aromatic girdles of integral membrane proteins at the interfacial regions of the bilayer, while “snorkeling” of Lys and Arg residues enables their basic groups to move to the more polar interface (see Chapter 4).

During the insertion of multispanning membrane proteins, TM segments that alternate with hydrophilic segments have stop transfer sequences, which are regions of the peptide sufficiently nonpolar to partition into the membrane. The presence of more than one stop transfer sequence in polytopic proteins allows each hydrophobic TM segment to be inserted sequentially as each one reaches the translocon. An engineered protein that contains four copies of a stop transfer sequence inserts each as a TM segment, provided the loops between them are of sufficient length (Figure 7.24). However, some polytopic proteins appear to exhibit cooperation between adjacent TM segments, which implies more than one can be in the translocon simultaneously. Although the channels observed in high-resolution structures of the translocon are only wide enough for the passage of an unfolded polypeptide or a single α-helix, it is possible that alternate conformations of the translocon involving dimers or higher oligomers allow more than one helical peptide segment to fit in the channel. This would explain the evidence that some TM segments can influence the insertion of neighboring TM domains in some polytopic proteins (see below).

Topogenesis in Membrane Proteins

In discussing the signals that determine the topology of integral membrane proteins, which have been studied extensively in the mammalian ER and also in E. coli, it is useful to refer to the outside compartment (lumen or periplasm) as the exocytoplasmic side of the membrane. Thus for bitopic membrane proteins, type I are oriented Nexo/Ccyt and type II are Ncyt/Cexo. Multispanning or polytopic proteins, called type III membrane proteins, may have an even or an odd number of TM segments and thus have their N and C termini either on the same side of the membrane or on opposite sides with either orientation. The first signal occurring in the sequence of a polytopic protein is treated much as the signal of a bitopic protein, and insertion of the remaining TM segments can be assumed to follow in sequence, even though there are other factors influencing insertion, as will be discussed.

Signals that govern topogenesis are classified as (1) signal sequences, (2) signal-anchor sequences, and (3) reverse signal-anchor sequences. The presence of one of these sequences determines the orientation of the first


BOX 7.4 IMPORT OF MITOCHONDRIAL PROTEINS

Nearly all (98%) of the >1000 mitochondrial proteins are synthesized in the cytosol and imported into the mitochondria, destined either for the matrix, the intermembrane space, the inner membrane, or the outer membrane. The imported proteins are classified in two major groups: precursor proteins with cleavable amino-terminal signals, called presequences; and hydrophobic membrane proteins with targeting signals dispersed throughout their primary structures. In addition, the proteins of the outer membrane and the intermembrane space constitute special classes.

All imported proteins are transported across the outer membrane of the mitochondria via the translocase of the outer membrane (TOM). In addition, the outer membrane has a complex for the assembly of β-barrel proteins into the outer membrane, after they have been taken into the intermembrane space via the TOM complex (Figure 7.4.1). This SAM complex is similar to the BAM complex in bacterial outer membranes. Two more translocators are located in the mitochondrial inner membrane: the TIM23 complex transports cleavable preproteins, while the TIM22 complex inserts hydrophobic proteins into the inner membrane (Figure 7.4.2). The components of these complexes are proteins that act as receptors, form pores, and/or exhibit chaperone functions and are well characterized in yeast. These complexes appear to accumulate at adhesion sites where the inner and outer membranes are 18–20 nm apart, which could allow protein complexes in the two membranes to cooperate during translocation.

The TOM Complex

Translocation across the outer membrane is a passive process mediated by receptors (Tom20, Tom22, and Tom70) and a Tom40 complex. The presequences of cleavable precursor proteins are positively charged and can form amphipathic helices that fit into a hydrophobic groove of Tom20. They are transferred to Tom22 by its recognition of the opposite, charged surface of the helices. Tom70 is a receptor for hydrophobic precursors and forms oligomers upon binding them, which could help prevent aggregation. The structures of the soluble receptor domains of Tom70 and the related Tom71 have been determined by x-ray crystallography, and the structure of Tom20 is known

7.4.1 Biogenesis of mitochondrial outer membrane proteins. The two translocons in the mitochondrial outer membrane are the TOM complex and the SAM complex. The TOM complex is used by all mitochondrial proteins that are synthesized in the cytosol, including those destined for the outer membrane. The precursors of β-barrel proteins use the TOM complex to enter the intermembrane space, where they are chaperoned by the Small Tim proteins. Then they are folded and inserted into the outer membrane by the SAM complex, and perhaps assembled with the help of morphology maintenance proteins, such as Mdm10. From Becker, T., et al., Curr Op Cell Biol. 2009, 21:484–493.




7.4.2 Biogenesis of mitochondrial inner membrane proteins. After transport across the outer membrane via the TOM complex, inner membrane proteins use either the TIM22 complex (a) or the TIM23 complex (b). (a) The TIM22 complex is the carrier pathway used by metabolite carrier proteins that are generally very hydrophobic. They are chaperoned in the intermembrane space by the Small Tim proteins prior to insertion into the inner membrane. (b) The proteins with presequences that are destined for the inner membrane or for the matrix use the TIM23 complex, along with its multi-subunit motor, the PAM complex. From Becker, T., et al., Curr Op Cell Biol. 2009, 21:484–493.

from both NMR and x-ray studies. The main channel is formed by Tom40, a β-barrel with a pore of ∼20 Å diameter – large enough to accommodate two α-helical polypeptide segments but not a folded protein. In the membrane, Tom40 functions with the smaller proteins Tom5, Tom6, and Tom7, which may help stabilize several Tom40 dimers to form a large, dynamic complex containing several pores. Interestingly, in biogenesis the import of precursors to the Tom proteins utilizes the TOM apparatus, before they are inserted into the membrane by SAM. Assembly of β-barrel proteins into the mitochondrial membrane utilizes a highly conserved complex with a poorly understood mechanism. The last β-strand of these proteins contains a β-signal that is crucial for targeting them to the SAM complex. Sam50 (analogous to BamA in bacteria) has been shown to bind Tom40 (as a substrate) before it inserts into the membrane. Two peripheral proteins are involved in the SAM complex on the cytosolic side: Sam35 interacts with the β-signal sequence, and Sam37 helps release the substrate proteins. The small Tim proteins, which are soluble hexameric α-propellers composed of Tim9/10 or Tim8/13, chaperone the unfolded


substrates in the intermembrane space. The morphology maintenance protein, Mdm10, associates with the SAM complex and appears to help Tom40 assemble with the other Tom proteins to form the TOM complex.

TIM Complexes

The pathways for two other classes of imported proteins diverge after translocation via TOM (see Figure 7.4.2). Translocation of presequence-containing proteins across the inner membrane uses the TIM23 complex, which consists of Tim23, Tim17, Tim21, and Tim50 in the inner membrane. The TIM23 complex may be sufficient to insert single-spanning proteins. To insert polytopic proteins the TIM23 complex interacts with a peripheral motor, the PAM complex anchored by the Pam18 membrane protein. In the PAM complex, Tim44 provides a binding site for mitochondrial Hsp70 (mHsp70 or Ssc1p). The mitochondrial Hsp70 hydrolyzes ATP in repeated cycles of binding of the transported peptides in the matrix to prevent their return. It works with a nucleotide exchange factor, Mge1, that promotes ADP release. The structures of the soluble domains of Tim14 and Tim21, as well as the peripheral proteins Tim44 and Tim16, are known from x-ray crystallography.

When reconstituted in liposomes, purified TIM23 forms a voltage-activated cation-selective channel that is activated by presequence peptides and a membrane potential (ΔΨ, negative inside), while inhibited by presequence peptides alone. While this suggests that ΔΨ is required for channel opening, translocation across the membrane also uses hydrolysis of ATP. The mitochondrial processing peptidase (MPP) cleaves the amino-terminal presequence in the matrix. TIM23 also interacts with complexes III and IV of the respiratory chain (see Chapter 13).

The TIM22 complex is a translocator responsible for insertion of hydrophobic inner membrane proteins, including the metabolite carrier proteins such as the ATP–ADP carrier (described in Chapter 12), as well as the Tim proteins themselves. It utilizes the Tim9–Tim10 complex in the intermembrane space, the peripheral protein Tim12, and three integral membrane proteins, Tim22, Tim54, and Tim18, of which only Tim22 is essential for viability. Tim22 contains a double channel and uses the membrane potential to drive insertion; when purified and reconstituted, it forms a gated channel with two pore sizes, the larger of which allows insertion of tightly packed loops. The mitochondrial translocators are clearly fruitful areas for research.

TM segment of the protein (Figure 7.25). Insertion of the later TM segments of polytopic membrane proteins is initiated by stop transfer sequences, which have sufficient length to span the membrane and can insert stably into the bilayer.

Signal sequences have already been discussed for their role in targeting the nascent chain to the membrane. Not all signal sequences are cleaved by signal peptidase. Those that are cleaved are released and degraded by signal peptide peptidase. After cleavage, which makes a new N terminus outside the membrane, the rest of the peptide is fully exported as a secreted protein unless it is followed by a stop transfer sequence to create a TM segment (see a and b of Figure 7.25).

A stop transfer sequence that can insert stably in the membrane is called a signal-anchor sequence. Signal-anchor sequences remain uncleaved and have enough nonpolar residues to span the hydrophobic domain of the bilayer. When it is the only TM segment and is not preceded by a cleavable signal sequence, a signal-anchor inserts to create a type II membrane protein (c in Figure 7.25).

Reverse signal-anchors are like signal-anchors except that they create type I membrane proteins (d in Figure 7.25). They are differentiated from signal-anchors not by obvious physical characteristics but by more subtle factors that affect their function, allowing them to insert with an Nexo/Ccyt orientation. The factors that influence the orientation of the signal-anchor sequence as it emerges from the translocon are just being elucidated.

The topologies that result from the different signals indicate that a TM segment can insert into the bilayer in either orientation and can thus translocate its C terminus or its N terminus (Figure 7.26). Current evidence points to five factors that influence the orientation of signals within the translocon.

(1) The charges flanking the hydrophobic sections of signals greatly influence the resulting topology. As described in the last chapter, the positive-inside rule, discovered in bacteria where positive residues are more abundant in cytoplasmic loops than in periplasmic loops, holds for all organisms. In eukaryotes the charge difference


7.23 Biological hydrophobicity scale for insertion of TM segments. (A) A test segment is engineered in leader peptidase (Lep) constructs that will be either inserted into the membrane or translocated across it. The test segment, H, consists of a flanking sequence GGPG, followed by 19 leucine residues, then the flanking sequence GPGG, with each amino acid to be tested inserted in its center. Two sites for N-glycosylation are also inserted, as indicated by G1 and G2. Each construct was expressed in BHK cells that were immunoprecipitated with Lep antiserum after labeling with 35S-methionine. The bands corresponding to singly and doubly glycosylated proteins were quantified on a phosphorimager after separation by SDS-PAGE. (B) The ΔGapp for membrane insertion of each amino acid placed in the center of a TM segment is calculated from the apparent equilibrium constant of the singly glycosylated and doubly glycosylated Lep molecules obtained with the constructs described in (A). The bar indicates the standard deviation in the ΔGapp for Ile; the standard deviation for each of the others is similar. Redrawn from Hessa, T., et al., Nature. 2005, 433:377–381. © 2005. Reprinted by permission of Macmillan Publishers Ltd.

[Panel A: Lep constructs with TM1, TM2, and the test segment H, shown inserted and translocated relative to the ER lumen and cytoplasm, with domains P1 and P2 and glycosylation sites G1 and G2. Panel B: bar chart of ΔGapp (kcal mol−1) against amino acid, in the order I, L, F, V, C, M, A, W, T, Y, G, S, N, H, P, Q, R, E, K, D.]
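The calculation summarized in the caption of Figure 7.23 can be illustrated with a short sketch. It is not the authors' analysis code. It assumes that the apparent equilibrium constant is the ratio of the singly glycosylated (inserted) fraction to the doubly glycosylated (translocated) fraction, that ΔGapp = −RT ln Kapp, and that the temperature is 298 K. The function name `delta_g_app` and its parameters are hypothetical names chosen for illustration.

```python
import math

# Gas constant in kcal mol^-1 K^-1, matching the units of Figure 7.23B.
R_KCAL = 1.987e-3


def delta_g_app(f_single, f_double, temp_k=298.0):
    """Apparent free energy of insertion (kcal/mol) from band fractions.

    f_single: fraction of singly glycosylated Lep (assumed here: inserted).
    f_double: fraction of doubly glycosylated Lep (assumed here: translocated).
    temp_k:   temperature in kelvin (298 K is an assumed default).
    """
    if f_single <= 0 or f_double <= 0:
        raise ValueError("both fractions must be positive")
    k_app = f_single / f_double          # apparent equilibrium constant
    return -R_KCAL * temp_k * math.log(k_app)


if __name__ == "__main__":
    # Hypothetical band intensities, not measured data.
    for f1, f2 in [(0.9, 0.1), (0.5, 0.5), (0.1, 0.9)]:
        print(f1, f2, round(delta_g_app(f1, f2), 2))
```

Under these assumptions, equal amounts of the two glycoforms give ΔGapp = 0, a segment that is mostly inserted gives a negative value, and a segment that is mostly translocated gives a positive value, which matches the sign convention used in the text.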

7.24 Functional topogenic determinants throughout the polypeptide. Downstream portions of polytopic membrane proteins may influence topology, depending on loop size. (A) Insertion of the bitopic ASGP receptor follows the positive-inside rule. However, in an engineered protein consisting of four copies of its signal-anchor sequence with flanking charges, separated from each other by ∼100 residues, the TM segments insert sequentially in spite of their flanking charges. It appears that the long loops between TM segments allow insertion to override the positive-inside rule. (B) A natural protein with 12 TM segments with short loops between them, Glut1, inserts according to the positive-inside rule (wild type, in center). Perturbing its topogenic determinants by introduction of different charges disturbs the local topology but not the other TM segments (right and left). From Goder, V., and W. Spiess, FEBS Lett. 2001, 504:87–93. © 2001 by the Federation of European Biochemical Societies. Reprinted with permission from Elsevier.


7.25 Three types of signals initiate topogenesis. Cleavable signal sequences (green, with arrowheads for cleavage sites) are translocated, leading the N terminus to the outside before they are cleaved. After cleavage by signal peptidase, they are digested by signal peptide peptidase outside the membrane (not shown). The protein is released (a) unless the cleaved signal is followed by another TM segment (b). Uncleaved signal-anchor sequences (red) induce translocation of the rest of the peptide while they remain TM (c). Reverse signal-anchors (blue) insert to become TM as the N terminus is translocated (d). Thus for bitopic proteins, both a cleaved signal and a reverse signal-anchor produce the Nexo/Ccyt orientation (b and d), while a signal-anchor produces the Ncyt/Cexo orientation (c). For polytopic proteins (e, f, and g), additional TM segments insert in alternating orientations (light red for Ncyt/Cexo and light blue for Nexo/Ccyt). The sequences shown represent a secreted protein (a, prolactin), two type I membrane proteins (b, asialoglycoprotein receptor; d, cation-dependent mannose-6-phosphate receptor), a type II membrane protein (c, synaptotagmin I) and three type III proteins (e, gap junction protein; f, vasopressin receptor V2; g, glucagon receptor). From Higy, M., et al., Biochemistry. 2004, 43:12716–12722. © 2004 by American Chemical Society. Reprinted with permission from American Chemical Society.

[Panels a–g: topology diagrams drawn relative to the exo and cyt sides of the membrane, with the linear sequences marked by signal type (cleavable signal, signal-anchor, reverse signal-anchor); scale bar, 100 aa.]
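The bookkeeping behind Figure 7.25 can be sketched in a few lines of code. This is a simplified illustration rather than a prediction method. It assumes that the first signal fixes the side of the N terminus and that every TM segment of the mature protein flips the side of the chain in strict alternation, ignoring the other factors that influence insertion. The function name `termini` and the signal labels are hypothetical.

```python
# Side on which the N terminus of the mature protein ends up, by signal type.
# A cleavable signal is removed, so it is not counted as a TM segment.
N_TERMINUS_SIDE = {
    "cleavable signal": "exo",
    "signal-anchor": "cyt",
    "reverse signal-anchor": "exo",
}


def termini(first_signal, n_tm):
    """Return (N-terminus side, C-terminus side) for a protein.

    first_signal: one of the keys of N_TERMINUS_SIDE.
    n_tm: number of TM segments in the mature protein, counting an
          uncleaved signal-anchor or reverse signal-anchor as the first one.
    """
    if first_signal not in N_TERMINUS_SIDE:
        raise ValueError(f"unknown signal type: {first_signal!r}")
    minimum = 0 if first_signal == "cleavable signal" else 1
    if n_tm < minimum:
        raise ValueError("too few TM segments for this signal type")

    n_side = N_TERMINUS_SIDE[first_signal]
    # Each membrane crossing moves the chain to the opposite side.
    if n_tm % 2 == 0:
        c_side = n_side
    else:
        c_side = "cyt" if n_side == "exo" else "exo"
    return n_side, c_side


if __name__ == "__main__":
    print(termini("cleavable signal", 0))       # secreted protein
    print(termini("signal-anchor", 1))          # type II
    print(termini("reverse signal-anchor", 1))  # type I
```

With these rules a single signal-anchor gives Ncyt/Cexo (type II), a single reverse signal-anchor or a cleavable signal followed by one TM segment gives Nexo/Ccyt (type I), and an even number of TM segments places both termini on the same side.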

between the flanking segments of a signal’s hydrophobic core, described as Δ(N–C), results in the more positive flanking sequence preferring the cytoplasmic side. Therefore if the signal can change directions while in the translocon, it will orient to allow the more positive flanking residues to stay on the more negative side of the membrane and perhaps to interact with residues of the translocon on the cytoplasmic side.

(2) Because a large globular folded domain will not get through the translocon, an N-terminal domain that folds into a large globular domain will not be translocated, even if the flanking charges suggest it should. Evidence for this comes from many studies with fusions and truncations that attach or eliminate rapidly folding globular domains. However, natural proteins that have extensive globular domains are probably protected from folding in the cytoplasm by chaperones.

(3) Another factor that can override the positive-inside rule is hydrophobicity, specifically the length and composition of the nonpolar sequence of the signal. When artificial signals are made up of leucine residues, orientation is a function of


7.26 Several factors affect the orientation of signals in the translocon. Signals translocate either their C or their N terminus once they enter the translocon, depending on factors such as the difference in positive charges, the hydrophobicity, and the folding of the N-terminal domain. These factors are thought to affect the orientation of signal-anchor and reverse signal-anchor sequences as they engage with the translocon. The resulting orientation determines whether the C terminus or the N terminus emerges. From Goder, V., and W. Spiess, FEBS Lett. 2001, 504:87–93. © 2001 by the Federation of European Biochemical Societies. Reprinted with permission from Elsevier.

[Figure labels: positive charge difference (N–C), folding of N-terminal domain, and hydrophobicity; the two sides of the membrane are marked Out and In.]
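The charge difference Δ(N–C) named in Figure 7.26 lends itself to a small worked example. The sketch below is only illustrative. It assumes that Lys and Arg count as +1 and Asp and Glu as −1, that a window of 15 residues on each side of the hydrophobic core is examined, and that the more positive flank stays in the cytoplasm. The window size, the scoring, and the function names are assumptions made here, and the other factors described in the text (folding of the N-terminal domain, hydrophobicity, cooperation between segments, glycosylation) are ignored.

```python
# Assumed side-chain charges at neutral pH; His is treated as uncharged.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def net_charge(segment):
    """Sum of the assumed side-chain charges in a peptide segment."""
    return sum(CHARGE.get(residue, 0) for residue in segment.upper())


def delta_n_minus_c(sequence, core_start, core_end, flank=15):
    """Charge difference between the N- and C-terminal flanks of a signal.

    core_start, core_end: Python slice indices of the hydrophobic core.
    flank: number of flanking residues examined on each side (assumed).
    """
    n_flank = sequence[max(0, core_start - flank):core_start]
    c_flank = sequence[core_end:core_end + flank]
    return net_charge(n_flank) - net_charge(c_flank)


def predicted_orientation(delta):
    """The more positive flank is assumed to stay on the cytoplasmic side."""
    if delta > 0:
        return "Ncyt/Cexo"
    if delta < 0:
        return "Nexo/Ccyt"
    return "undetermined"


if __name__ == "__main__":
    # Hypothetical signal: positive N flank, Leu20 core, negative C flank.
    toy = "MKKR" + "L" * 20 + "DESG"
    d = delta_n_minus_c(toy, core_start=4, core_end=24)
    print(d, predicted_orientation(d))
```

For the toy sequence the N-terminal flank carries three positive charges and the C-terminal flank two negative charges, so Δ(N–C) is positive and the sketch returns the Ncyt/Cexo orientation expected from the positive-inside rule.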

length: Leu7 drives mostly C-terminal translocation, resulting in Nin/Cout, while Leu22 and Leu25 produce N-terminal translocation, resulting in Nexo/Ccyt. When amino acids are ranked for their ability to promote N-terminal translocation, the best are the most hydrophobic, which also have the highest propensity to form α-helices in nonpolar environments.

(4) An additional factor for multispanning membrane proteins derives from cooperation with the rest of the molecule. Charge mutations designed to invert the topology of polytopic proteins can instead affect a region of the protein without inverting it, as observed with the human glucose transporter, Glut1, a 12-TM helix bundle (Figure 7.24). One or two hydrophobic segments can be “frustrated” enough by the mutation to remain outside the membrane, rather than perturbing the rest of the topology. The short loops between many TM segments suggest they insert as helical hairpins, preserving the expected topology, if the translocon accommodates two helices.

(5) Finally, any glycosylated proteins have sites for glycosylation on the outside. In the ER, an oligosaccharyl transferase associates with the translocon and glycosylates nascent chains as they emerge. An engineered site that becomes glycosylated will force that portion of the polypeptide to end up on the outside and prevent reorientation of that segment.

Several observations, such as the reverse orientation of type II proteins, raise questions of how nascent chains emerge from the ribosome and from the translocon. At what point do TM helices form? Proteolysis after pulse-chase labeling, FRET results, and cryoEM images make it clear that helices form in the last ∼30 Å of the ribosome exit tunnel (Figure 7.27). Furthermore, a TM peptide sequence that is helical inside the exit tunnel stays helical when it emerges if it is directly inserted into the membrane. (In contrast, peptide sequences from soluble proteins have negligible helicity both inside and upon emerging from the exit tunnel.)

Can two or more TM helices interact in the translocon? The topology of some polytopic proteins is dependent on the order of TM segments in the sequence, suggesting that “strong” TM domains can help integrate adjacent “weak” TM domains. The interactions between strong and weak helices seem to take place in the translocon when the two helices are close enough in the sequence to enter it together, because in experiments that prevent the interaction by engineering a longer distance between them, the “weak” helix no longer inserts into the bilayer. Therefore mechanisms may exist that allow simultaneous insertion of more than one helix from the translocon (Figure 7.28). Such mechanisms are not ruled out by the small


channels observed in x-ray structures of the translocon because they may utilize dynamic changes within the translocon or between interacting translocons. Finally, a number of polytopic proteins can insert into the membrane with alternative topologies, including the EmrE multidrug resistance protein (see Chapter 11), the prion protein, PrP, and the ion channel involved in cystic fibrosis, CFTR (see below). The mechanisms for their biogenesis are important subjects for research, especially for proteins whose altered topology results in disease.

MISFOLDING DISEASES

Since correct insertion of membrane proteins determines their topology (and therefore their fold) and mistakes cannot be remedied simply by refolding due to the barrier to reorienting TM segments across the bilayer, errors in this process can be costly. In an intriguing case, when the prion protein, PrP, is inserted with its N terminus in the ER lumen, it does not result in disease; however, when it is inserted with its N terminus in the cytoplasm, it can lead to neurodegeneration, apparently by a mechanism involving aggregation (Figure 7.29). In

7.27 Protein folding in the ribosome exit tunnel. A profile of the exit tunnel of the eukaryotic ribosome was constructed based on cryo-EM maps of different ribosome-bound nascent chains that are observed with different peptidyl tRNAs. Helix formation is observed in about 30 Å near the exit. From Fedyukina, D. V., and S. Cavagnero, Ann Rev Biophys. 2011, 40:337–359.

7.28 Models for mechanisms of TM integration. (A) The conventional model is for sequential integration of each TM helix independently from the translocon into the lipid bilayer. (B) A second model allows TM helices to transfer from the translocon as a pair. (C) It is possible that larger groups of helices collect in the translocon and transfer together into the lipid environment. From Skach, W. R., Nat Str Mol Biol. 2009, 16:606–611.

7.29 The three topologies of the prion protein, PrP. The TM segment of PrP can insert in the bilayer in both orientations, Next/Ccyt (NtmPrP) and Ncyt/Cext (CtmPrP). In addition, it can be secreted into the lumen in a soluble form (SecPrP), which can then become attached to a GPI-anchor. There is evidence for a role of CtmPrP in the neurodegenerative scrapie diseases. Redrawn from Ott, C. M., and V. R. Lingappa, Biochemistry. 2004, 43:11973–11982. © 2004 by American Chemical Society. Reprinted with permission from American Chemical Society.

[Figure 7.29 labels: ER lumen; NtmPrP, CtmPrP, and SecPrP, each drawn with its N and C termini marked.]


7.30 The CFTR protein and cystic fibrosis disease. (A) The topology of the cystic fibrosis TM conductance regulator (CFTR) is shown with 12 TM helices and three cytoplasmic domains. Two of the cytoplasmic domains are nucleotide-binding domains, NBD1 and NBD2, and the third is a regulatory domain (R domain) in which the protein is phosphorylated by cAMP-dependent protein kinase. The position of the most common mutation in cystic fibrosis patients, Phe508, is indicated. Redrawn from Nelson, D. L., and M. M. Cox, Lehninger Principles of Biochemistry, 4th ed., W. H. Freeman, 2005, p. 403. © 2005 by W. H. Freeman and Company. Used with permission. (B) The locations (starred sites) of other mutations in the CFTR sequence that result in cystic fibrosis disease. The hatched bars below the sequence correspond to TM segments of the protein. From Sanders, C. R., and J. K. Myers, Annu Rev Biophys Biomol Struct. 2004, 33:25–51. © 2004 by Annual Reviews. Reprinted with permission from the Annual Review of Biophysics and Biomolecular Structure, www.annualreviews.org.

[Panel A labels: outside and inside of the membrane, oligosaccharide chains of CFTR, the N and C termini, NBD1, NBD2, R domain, and Phe508. Panel B: the CFTR sequence, residues 1–1481, drawn in rows of 200 residues with NBD1, the regulatory domain, and NBD2 marked.]


7.31 Sites in membrane proteins of mutations linked to diseases. The sites for which mutations have been linked to disease are well distributed throughout the sequences of numerous membrane proteins. (A) Ribbon diagram of the structure of rhodopsin showing the locations of missense mutations implicated in retinitis pigmentosa. The affected side chains (shown in black) are distributed all over the structure. (B) The sequences of rhodopsin and other membrane proteins linked to diseases, with the positions of mutations (asterisks) and of TM segments (hatched bars) shown. From Sanders, C. R., and J. K. Myers, Annu Rev Biophys Biomol Struct. 2004, 33:25–51. © 2004 by Annual Reviews. Reprinted with permission from the Annual Review of Biophysics and Biomolecular Structure, www.annualreviews.org.

[Panel B rows: PMP22 (Charcot-Marie-Tooth), residues 1–161; aquaporin (diabetes insipidus), residues 1–272; vasopressin receptor (diabetes insipidus), residues 1–378; rhodopsin (retinitis pigmentosa), residues 1–349; connexin 32 (Charcot-Marie-Tooth), residues 1–284.]


general, damage can result from either accumulation of misfolded proteins or from loss of needed functions of the affected proteins.

In humans the misfolding of membrane proteins is known to cause several heritable diseases, such as cystic fibrosis and retinitis pigmentosa. Many of the 1000 point mutations causing cystic fibrosis result in the misfolding of the cystic fibrosis TM conductance regulator (CFTR), and even the wild type CFTR assembles with a low efficiency under some conditions. Almost one-third of the 55 genes linked to disorders of the human retina encode integral membrane proteins, with many of the disorders caused by missense mutations. Over 100 different rhodopsin mutants produce retinitis pigmentosa.

Cystic fibrosis is a relatively common hereditary disease that results in severe obstruction of the respiratory and gastrointestinal tracts of people with two defective copies of the gene for CFTR. The CFTR protein (Figure 7.30a) functions as a Cl− ion channel that is activated by cAMP-dependent protein kinase. Loss of the channel activity in epithelial cells allows the lungs to be clogged with debris and become subject to frequent bacterial infections that damage the lungs, reduce respiratory efficiency, and shorten the afflicted person's lifetime. In 70% of cystic fibrosis cases, a deletion of Phe508 from CFTR results in misfolding of the protein, which results in its degradation. In addition, many other disease-related mutations are distributed throughout the protein sequence (Figure 7.30b).

Like CFTR, the mutations in rhodopsin that lead to retinal disease are located in many different parts of the protein structure (Figure 7.31a). From the variety of domains affected by mutations, it is evident that the mutations do not target an active site, or even a couple of localized regions. In fact, this pattern of widely distributed disease-causing mutations is true in five other membrane proteins associated with diseases (Figure 7.31b). It strongly suggests that these are misfolding mutations, because a large variety of single changes in virtually any part of a protein are not expected to critically impair function. They could instead be crucial to folding by tipping the delicate energetic balance between correct folding and misassembly. The high frequency of misfolding of these membrane proteins provides new targets for drug development, as ligands that stabilize the native state could act as chemical chaperones that rescue the misfolded membrane proteins.

FOR FURTHER READING

Seminal Papers
Egea, P. F., and R. M. Stroud, Lateral opening of a translocon upon entry of protein suggests the mechanism of insertion into membranes. Proc Natl Acad Sci USA. 2010, 107:17182–17187.
Engelman, D. M., and T. A. Steitz, The spontaneous insertion of proteins into and across membranes: the helical hairpin hypothesis. Cell. 1981, 23:411–422.
Gruner, S. M., Intrinsic curvature hypothesis for biomembrane lipid composition. Proc Natl Acad Sci USA. 1985, 82:3665–3669.
Hartl, F. U., S. Lecker, E. Schiebel, J. P. Hendrick, and W. Wickner, The binding cascade of SecB to SecA to SecY/E mediates preprotein targeting to the E. coli plasma membrane. Cell. 1990, 63:269–279.
Heinrich, S. U., W. Mothes, J. Brunner, and T. A. Rapoport, The Sec61p complex mediates the integration of a membrane protein by allowing lipid partitioning of the transmembrane domain. Cell. 2000, 102:233–244.
Mitra, K., C. Schaffitzel, T. Shaikh, et al., Structure of the E. coli protein-conducting channel bound to a translating ribosome. Nature. 2005, 438:318–324.
Randall, L. L., Translocation of domains of nascent periplasmic proteins across the cytoplasmic membrane is independent of elongation. Cell. 33:231–240.
Simon, S. M., and G. Blobel, A protein-conducting channel in the endoplasmic reticulum. Cell. 1991, 65:371–380.
van den Berg, B., W. M. Clemons, I. Collinson, et al., X-ray structure of a protein-conducting channel. Nature. 2004, 427:36–44.
Zimmer, J., Y. Nam, and T. Rapoport, Structure of a complex of the ATPase SecA and the protein-translocation channel. Nature. 2008, 455:936–943.

Selected Reviews

Folding
Booth, P. J., A successful change of circumstance: a transition state for membrane protein folding. Curr Op Str Biol. 2012, 22:1–7.
Bowie, J., Solving the membrane protein folding problem. Nature. 2005, 438:581–589.
Kleinschmidt, J. H., Folding kinetics of the outer membrane proteins OmpA and FomA into phospholipid bilayers. Chem Phys Lipids. 2006, 141:30–47.
Popot, J.-L., and D. M. Engelman, Helical membrane protein folding, stability and evolution. Annu Rev Biochem. 2000, 69:881–922.
Sanders, C. R., and J. K. Myers, Disease-related misassembly of membrane proteins. Annu Rev Biophys Biomol Struct. 2004, 33:25–51.
White, S. H., and W. C. Wimley, Membrane protein folding and stability: physical principles. Annu Rev Biophys Biomol Struct. 1999, 28:319–365.


Biogenesis
Park, E., and T. A. Rapoport, Mechanisms of Sec61/SecY-mediated protein translocation across membranes. Ann Rev Biophys. 2012, 41:21–40.
Calo, D., and J. Eichler, Crossing the membrane in archaea, the third domain of life. Biochim Biophys Acta. 2011, 1808:885–891.
Dalbey, R. E., P. Wang, and A. Kuhn, Assembly of bacterial inner membrane proteins. Ann Rev Biochem. 2011, 80:161–187.
du Plessis, D. J. F., N. Nouwen, and A. Driessen, The Sec translocase. Biochim Biophys Acta. 2011, 1808:851–865.
Higy, M., T. Junne, and M. Spiess, Topogenesis of membrane proteins at the endoplasmic reticulum. Biochemistry. 2004, 43:12716–12722.
Sardis, M. F., and A. Economou, SecA: a tale of two protomers. Molec Microbiol. 2010, 76:1070–1081.
Skach, W. R., Cellular mechanisms of membrane protein folding. Nat Str Mol Biol. 2009, 16:606–611.
White, S. H., and G. von Heijne, Transmembrane helices before, during and after insertion. Curr Opin Struct Biol. 2005, 15:378–386.

Bacterial Outer Membrane Biogenesis
Hagan, C. L., T. J. Silhavy, and D. Kahne, β-barrel membrane protein assembly by the Bam complex. Ann Rev Biochem. 2011, 80:189–210.
Knowles, T. J., A. Scott-Tucker, M. Overduin, and I. R. Henderson, Membrane protein architects: the role of the BAM complex in membrane protein assembly. Nat Rev Micro. 2009, 7:206–214.

Mitochondrial Biogenesis
Becker, T., M. Gebert, N. Pfanner, and M. van der Laan, Biogenesis of mitochondrial membrane proteins. Curr Op Cell Biol. 2009, 21:484–493.
Chacinska, A., C. M. Koehler, D. Milenkovic, T. Lithgow, and N. Pfanner, Importing mitochondrial proteins: machineries and mechanism. Cell. 2009, 138:628–644.
Endo, T., K. Yamano, and S. Kawano, Structural insight into the mitochondrial protein import system. Biochim Biophys Acta. 2011, 1808:955–970.
