JBB2026 Fall 2018 Gil Privé

Protein Structure
• peptide conformations and residue preferences
• elements of secondary structure
• supersecondary structure and motifs
• packing of helices and sheets
• chain topologies
• internal packing
• protein interfaces
• membrane proteins
• multimeric proteins
• domain motions

The Machinery of Life
David S. Goodsell
http://mgl.scripps.edu/people/goodsell
http://www.3dmoleculardesigns.com/Teacher-Resources/Tour-of-a-Human-Cell.htm

Plasma cell - IgG secretion

Figure 1. Transcription and RNA processing in the nucleus.

Figure 2. Transport through the nuclear pore.

Figure 3. Endoplasmic reticulum.

Figure 4. Transport from the endoplasmic reticulum.

Figure 5. Protein sorting in the Golgi.

Figure 6. Transport from the Golgi.

Figure 7. Transport of a vesicle through the cytoplasm.

Figure 8. Export of proteins across the cell membrane.

Eukaryotic cell panorama

Biochemistry and Molecular Biology Education, Volume 39, Issue 2, pages 91-101, 28 MAR 2011
DOI: 10.1002/bmb.20494
http://onlinelibrary.wiley.com/doi/10.1002/bmb.20494/full#fig2

Colour key:
Yellow: DNA, proteins
Pink: RNA, proteins
Blue: Cytoplasmic proteins
Purple: Ribosomes
Green: Membranes, proteins

[Panels 1-8 of the panorama correspond to Figures 1-8 listed above.]

Tyr Thr Gly Cys Ile Ile Ala Gly

The structure (conformer) is defined by the dihedral angles:
main chain (φ, ψ)
side chains (χ1, χ2, …)

φ = 180°; ψ = 180°

φ = -60°; ψ = -45°

φ ψ ω φ ψ

Note: a saturated (single) C-N bond length is 1.45 Å; the peptide C-N bond is shorter (~1.33 Å).
- Peptide bond has ~40% double bond character
- dihedral is constrained: ω is constrained to ~180° (trans peptide - we will revisit this).

Hybridization | Bond lengths | Bond angles | Dihedrals
sp2-hybridized atoms | shorter | 120° (flat) | more restrained (trans)
sp3-hybridized atoms | longer | 109° (tetrahedral, often chiral) | less restrained (gauche-, gauche+, trans)

One way of categorizing the 20 amino acids - each amino acid has particular characteristics

Amino acid hydrophobicity Relevant amino acid properties

Size (number of atoms)
Shape (torsion angles)
Flexibility (how many degrees of freedom?)
Charge (N+, O-; pKa values)
Polarity (electronic structure)
Hydrophobicity
Aromaticity (F, Y, W)

Protein conformations

The conformation is the arrangement of the atoms in 3D space. The most stable conformation is at a potential energy minimum

Proper treatment is quantum mechanical - but this is intractable with proteins (too many atoms). We use Newtonian mechanics and describe the system as a set of potential energy terms, each with a particular form.

The overall potential energy can be broken down into a set of energy functions:

LOCAL
bond length (1,2) (strong)
bond angle (1,3) (strong)
dihedral angle (1,4) (medium)

NON-LOCAL
solvation / hydrophobic effect
van der Waals (packing, steric clashes)
electrostatics (incl. hydrogen bonds)
conformational entropy

These are additive: calculate the overall potential energy as the sum of these individual functions

LOCAL bond length (1,2)

bond angle (1,3)

dihedral angle (1,4)

NON-LOCAL
van der Waals (packing, steric clashes)
electrostatics (incl. hydrogen bonds)
hydrophobic effect
…

E = f(bonds, angles, dihedrals, vdW, electrostatics, hydrophobic effect, ...)
E = f(x,y,z)
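The slides do not give explicit functional forms for these terms. As an illustration only, one widely used molecular-mechanics form of the additive energy is sketched below. The force constants k_b and k_θ, the dihedral parameters V_n, n and γ, the well depth ε_ij, the contact distance r_{0,ij} and the dielectric constant ε_r are all assumed symbols, not taken from the lecture, and the hydrophobic effect and conformational entropy are not captured by a simple sum of this kind.

```latex
E(x,y,z) =
    \sum_{\text{bonds}} \tfrac{1}{2} k_b \,(r - r_0)^2
  + \sum_{\text{angles}} \tfrac{1}{2} k_\theta \,(\theta - \theta_0)^2
  + \sum_{\text{dihedrals}} \tfrac{V_n}{2} \left[ 1 + \cos(n\phi - \gamma) \right]
  + \sum_{i<j} \left\{
        \varepsilon_{ij} \left[ \left( \frac{r_{0,ij}}{r_{ij}} \right)^{12}
                              - 2 \left( \frac{r_{0,ij}}{r_{ij}} \right)^{6} \right]
      + \frac{q_i q_j}{4 \pi \varepsilon_0 \varepsilon_r \, r_{ij}}
    \right\}
```

The first three sums are the LOCAL (1,2), (1,3) and (1,4) terms; the pairwise sum holds the NON-LOCAL van der Waals and electrostatic terms.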

Can describe the energy of the system from the positions of the atoms! (need to consider the structure of the entire system, including waters)

The structure (conformer) is defined by the dihedral angles:
main chain (φ, ψ)
side chains (χ1, χ2, …)

The minimum energy conformer of the polymer is determined largely by the non-local interactions between the side chains.

D. Goodsell

Non-local molecular interactions

Type | Name / basis | Attractive or repulsive? | Distance dependence | Notes
Short range | Pauli exclusion | repulsive | 1/r^12 | Basis for van der Waals radii of atoms *
Electrostatic (charge-charge) | Coulomb’s law | either; depends on q1q2 | 1/r |
Dipole-dipole | Keesom interactions | either; depends on the directions of the dipole moments | 1/r^3 | involves polar molecules/groups
Dipole-induced dipole | Debye interactions | attractive | 1/r^4 | polarization: change in a dipole due to an external electric field
Charge-dipole | | attractive | |
Fluctuating dipoles | London dispersive interactions | attractive | 1/r^6 | resonant induced dipoles *
Hydrogen bond | | attractive | | often treated as electrostatic
Cation-pi | | attractive | | similar to charge-induced dipole
Hydrophobic effect | | | |

* Lennard-Jones “6-12” potential

Loren Williams website http://ww2.chemistry.gatech.edu/~lw26/structure/molecular_interactions/mol_int.html

Lennard-Jones “6-12” potential

ro: sum of the van der Waals radii of the two atoms
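A minimal sketch of the 6-12 potential, assuming the form in which r0 is the position of the energy minimum (the slide only defines r0 as the sum of the van der Waals radii). The well depth `epsilon` is a hypothetical value chosen for illustration; the O and N radii are the ones quoted on the following slide.

```python
def lennard_jones(r, r0, epsilon):
    """6-12 energy (same units as epsilon) at separation r (Å).

    Assumed form: E = epsilon * [(r0/r)**12 - 2*(r0/r)**6],
    which has its minimum, E = -epsilon, at r = r0.
    """
    ratio6 = (r0 / r) ** 6
    return epsilon * (ratio6 ** 2 - 2.0 * ratio6)


# O...N contact: r0 taken as the sum of the vdW radii (1.5 Å + 1.6 Å).
R0_ON = 1.5 + 1.6
EPSILON = 0.2  # kcal/mol, hypothetical well depth for illustration only

for r in (2.6, 2.9, 3.1, 3.5, 5.0):
    print(f"r = {r:.1f} Å  E = {lennard_jones(r, R0_ON, EPSILON):+.3f} kcal/mol")
```

The steep 1/r^12 wall is why a separation well below r0 is strongly penalized unless another attractive term, such as a hydrogen bond, compensates.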

van der Waals radii: r(oxygen) 1.5 Å; r(hydrogen) 1.0 Å; r(nitrogen) 1.6 Å

[Figure: O···H-N drawn at vdW contact (3.5 Å) and H-bonded (2.6 Å); the H-bonded pair makes a closer contact than the sum of the radii]

http://ww2.chemistry.gatech.edu/~lw26/structure/molecular_interactions/mol_int.html

Hydrogen bonds - often highly cooperative; zipper effect

Figure 25. Self assembly of biological macromolecules is driven by complementary hydrogen-bonding interactions. (Left) Base pairing between complementary hydrogen bond donors and acceptors on the sidechains of nucleic acids. (Center) Backbone assembly between self-complementary hydrogen bond donors and acceptors of the protein backbone to form anti-parallel β-strands in a β-sheet, and (Right) Self-complementary hydrogen bond donors and acceptors in carbohydrate, between glucose moieties within cellulose.

http://ww2.chemistry.gatech.edu/~lw26/structure/molecular_interactions/mol_int.html

Partial charges in the peptide produce dipoles

bond dipoles

3.7 Debye (1 Debye = 10^-18 esu·cm)

peptide group dipoles

These are additive and can produce a macroscopic dipole (esp. in α-helices).

Alpha helix Hydrogen bond between carbonyl of residue i with amide-H of residue i + 4

i i+1 i+2 i+3 i+4 i+5 i+6 i+7

N

N C

Milner-White, Protein Science 6, 2477 (1997)

The hydrophobic effect

Water molecules adjacent to a hydrocarbon molecule maintain their molecular interactions by sacrificing rotational and translational freedom.

fewer low entropy waters

Figure 32 shows how aggregation of hydrocarbon molecules causes the release of interfacial water molecules. Therefore the system gains entropy (positive TΔS) upon hydrocarbon aggregation. Release of low entropy interfacial water molecules into the bulk solution drives hydrocarbon aggregation. The bottom panel illustrates that there is more interfacial water on the left hand side of the equation than on the right hand side. http://ww2.chemistry.gatech.edu/~lw26/structure/molecular_interactions/mol_int.html

Dihedrals revisited

Main Chain

Calculated energy surface: “Classic” Ramachandran plot (theoretical)
Based on hard sphere potentials (sum of vdW radii; simple form of the L-J potential)

Conformational energy of butane as a function of the central torsion angle
Boltzmann distribution: populate according to energies

[Plot: potential energy (kcal/mol) vs. torsion angle (0-360°)]

Conformer:   e    g+    t    g-    e
Population:  0%   15%   70%  15%   0%
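The populations quoted for butane follow from Boltzmann weighting of the minima. The sketch below is an illustration under stated assumptions: it treats only the three staggered minima, and it assumes a gauche-trans energy difference of 0.9 kcal/mol at 298 K, a textbook value that is not given on the slide.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def boltzmann_populations(energies, temperature=298.15):
    """Return fractional populations for a dict of {state: energy in kcal/mol}."""
    kt = R_KCAL * temperature
    e_min = min(energies.values())
    weights = {s: math.exp(-(e - e_min) / kt) for s, e in energies.items()}
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}


# Assumed relative energies of the butane minima (kcal/mol).
butane = {"gauche+": 0.9, "trans": 0.0, "gauche-": 0.9}
populations = boltzmann_populations(butane)

for state, fraction in populations.items():
    print(f"{state}: {fraction:.0%}")
```

With these assumptions the result is close to the 15% / 70% / 15% split on the slide; the eclipsed states sit at energy maxima and are essentially unpopulated.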

Potential surfaces for side chain dihedral angles This defines the major rotamers for each amino acid.

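E.g. χ1-χ2 plot for phenylalanine

[Figure: χ1 vs. χ2 energy surface for Phe, with the gauche-, gauche+ and trans regions labeled; inset shows the χ1 and χ2 dihedrals relative to φ and ψ]

Why can’t Phe χ2 be trans?

Chi1 - Chi2 plots

[Figure: peptide backbone with the φ, ψ and omega dihedrals labeled]

From: Introduction to Protein Structure (Branden and Tooze)

Potential energy curve for the peptide omega dihedral angle: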

Barriers for dihedral angle rotation can be attributed to the electronic structure of the amide bond (delocalized).

Energy difference between cis and trans can be attributed to:

-the exchange interaction of electrons in adjacent bonds

-repulsive interactions between overlapping bond orbitals

- steric clashes between atoms (Clash between groups 1 and 4 in the 1,4 bond disfavors cis).

Note : only two minima here- why?

Xaa-Pro is an exception (peptide bond preceding a proline)
- lower barrier to interconversion
- only ~2 kcal/mol energy difference between cis and trans omega bond
- but slow interconversion (usu. needs to be catalyzed to equilibrate)
- ~6% of Xaa-Pro have cis omega angles, otherwise v. rare (<0.5%)

The only kind of symmetric three-dimensional structure for a linear polymer is a helix

Helix: combination of a rotation and a translation (screw)

n: residues per turn
d: rise per residue (Å)

(other parameters include pitch, twist, …)

In this example, n = 8

Snake toy

Features: • Fixed bond lengths and angles

• 8-fold torsional potential: minima at φ = 0, 45, 90, 135, 180, 225, 270, 315° (360° ≡ 0°)

• Linear polymer of 11 units: 8^11 ≈ 8.6 × 10^9 conformers! (not all are accessible)

• Torsion dihedral is not colinear with the chain
This makes it interesting… the toy would be pretty boring otherwise.
But note that the angles are not the same as in a peptide.

Twist/step: 45°, 90°, 180°

Name | Frequency* | φ (°) | ψ (°) | n | d (Å) | H-bonding
3_10 helix | ~4% | -74 | -4 | +3.0 | 2 | i,i+3
α helix | ~35% | -57 | -47 | +3.6 | 1.5 | i,i+4
αL helix | - | +57 | +47 | -3.6 | 1.5 | i,i+4
π helix | - | -57 | -70 | +4.3 | 1.1 | i,i+5
Collagen (PP type II) | Fibres | -78 | +149 | -3.3 | 2.9 |
planar β-sheet (para) | - | -115 | +115 | 2 | 3.2 |
twisted β-sheet, parallel | ~25% | -120 | +135 | -2.3 | 3.3 | interstrand
twisted β-sheet, antiparallel | ~8% | -139 | +135 | -2.3 | 3.3 | interstrand
B-DNA | - | - | - | 10 | 3.4 | interchain

*Crude estimate in globular proteins

[Figure: Ramachandran plot with the α and β regions labeled]
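The helix parameters n and d determine two derived quantities mentioned earlier, pitch and twist. The short sketch below computes them for the three helices with listed H-bonding patterns. It assumes the usual definitions pitch = |n| × d and twist = 360°/n, and reads the sign of n as handedness (+ for right-handed); that sign convention is an assumption, as the table does not state it.

```python
# (n residues per turn, d rise per residue in Å), values from the table above
HELICES = {
    "3_10 helix": (3.0, 2.0),
    "alpha helix": (3.6, 1.5),
    "pi helix": (4.3, 1.1),
}


def pitch(n, d):
    """Rise along the helix axis per full turn (Å)."""
    return abs(n) * d


def twist(n):
    """Rotation about the helix axis per residue (degrees)."""
    return 360.0 / n


for name, (n, d) in HELICES.items():
    print(f"{name}: pitch = {pitch(n, d):.2f} Å, twist = {twist(n):.1f}°/residue")
```

For the α helix this gives a pitch of about 5.4 Å and a twist of about 100° per residue.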

Three rules for secondary structure

1) Local “bonded” potentials must be minimized
- bond lengths (1,2)
- bond angles (1,3)
- dihedrals (1,4) (Ramachandran)
regular: all (phi,psi) the same

2) Satisfy main chain hydrogen bonding
- Typically, >90% of the potential backbone hydrogen bond donors and acceptors are involved in hydrogen bonds

3) No unfavourable steric interactions
- Ramachandran

Observed (φ,ψ) distributions from over 500 high quality experimental structures (97,368 residues)

From: Lovell et al. Proteins 50, 437 (2003).

General and special cases

[Figure: (φ,ψ) plots for two special cases]

Pre-Pro (any residue preceding proline)
- φ is relatively normal
- ψ is restricted to 90 - 180°

Proline
- φ “fixed” at -60°
- ψ = -55° or 145°

Residue-specific Ramachandran plots


Helix | 3_10 | α_R | π
H-bonding | i,i+3 | i,i+4 | i,i+5
n, d | 3.0, 2.0 Å | 3.6, 1.5 Å | 4.3, 1.1 Å
(n: residues per turn; d: rise per residue)

Long helices - rarely straight.
- smooth bends (e.g. tropomyosin - coiled-coil dimers)
- kinks
- waters often bridge the i,i+4 H-bond
- membrane proteins
- often amphipathic - one face interacting with bulk solvent, one with the protein core
- lots of strain due to longer-range contacts in the protein (non-local effects)

Transmembrane helices

60 Å radius of curvature: bending is not energetically expensive (< 2 kcal/mol for a 5-turn helix)

Saposin A: kink in alpha3

closed form open form (ligand bound) (apo)

Ahn et al. , PNAS (2003)

Y54 (n)

Ahn et al., Protein Science (2006)

Antiparallel

Parallel beta sheet

• Parallel sheets
  • Generally buried
  • Less twisted

• Antiparallel and mixed sheets
  • Generally, one side exposed
  • Can withstand greater distortions (twisting and beta-bulges)

A beta-bulge leads to higher twisting in a sheet

There are ~8 types of hairpin turns

Residue number: i, i+1, i+2, i+3

turn type | i+1 (φ, ψ) | i+2 (φ, ψ)
I’ | (60, 30) | (90, 0)
II’ | (60, -120) | (-80, 0)

These definitions are approximate (+/- 30°)

Intrinsically disordered proteins (aka natively unfolded proteins, intrinsically unstructured, ...)

- no stable secondary or tertiary structure under physiological conditions

- dynamic

- abundant in eukaryotes, less in bacteria and archaea

- highly abundant in certain classes of protein (e.g. signaling proteins)

- often involved in protein-protein interactions (disorder-order transitions)

- often have lower sequence complexity

- typically rich in polar residues and disorder-promoting residues (R, K, E, Q, S, P, G)

- typically depleted of hydrophobic and aromatic residues (I, L, V, W, Y, F)

- structure ensembles not equivalent to chemically denatured proteins that are natively folded

Natively unfolded proteins have low overall hydrophobicity and large net charge.

Charge-hydropathy (CH) plot for ordered proteins (open circles) and natively unfolded proteins (grey).

Uversky et al. Proteins 41, 415 (2000).
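A rough sketch of how a sequence can be placed on a charge-hydropathy plot. Everything numerical here is an assumption layered on the slide: the Kyte-Doolittle hydropathy scale rescaled to 0-1, histidine counted as neutral, and the commonly quoted boundary line ⟨H⟩ = (⟨R⟩ + 1.151) / 2.785 attributed to the cited paper. It is a whole-sequence heuristic, not a disorder predictor.

```python
# Kyte-Doolittle hydropathy values (range -4.5 to +4.5)
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydrophobicity(seq):
    """Mean hydropathy <H>, rescaled so each residue lies between 0 and 1."""
    return sum((KYTE_DOOLITTLE[aa] + 4.5) / 9.0 for aa in seq) / len(seq)


def mean_net_charge(seq):
    """Absolute mean net charge <R> per residue (His assumed neutral)."""
    net = sum(seq.count(aa) for aa in "KR") - sum(seq.count(aa) for aa in "DE")
    return abs(net) / len(seq)


def below_ch_boundary(seq):
    """True if the sequence falls on the 'natively unfolded' side of the line."""
    boundary = (mean_net_charge(seq) + 1.151) / 2.785  # assumed boundary
    return mean_hydrophobicity(seq) < boundary


for name, seq in {"poly-Glu": "E" * 20, "poly-Leu": "L" * 20}.items():
    side = "unfolded side" if below_ch_boundary(seq) else "ordered side"
    print(f"{name}: <H> = {mean_hydrophobicity(seq):.2f}, "
          f"<R> = {mean_net_charge(seq):.2f} -> {side}")
```

The two toy sequences are extreme cases chosen only to show the two sides of the boundary.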

Chain Topologies (tertiary structure)

Huge variety: some are regular (e.g. TIM barrel; β-barrel), some are not.

Proteins often assembled from “domains”

“Never” see knots

Minimum size for stability ~60 amino acids
if small - often stabilized by disulfides, co-factors, etc., e.g. zinc fingers

If extracellular, often have
- S-S bonds
- glycosylation

Salt bridges are not very common

Internal Packing of a folded protein

• Inside of a protein is packed as tightly as in an organic crystal - largely driven by the hydrophobic effect and van der Waals packing (also electrostatics, etc.)

• position of the side chain - the path of the main chain determines the Calpha-Cbeta vector

• side chain rotamers - coordinated - entropy effects - dihedral angles of the side chains are critical!

• Can think of a packed protein interior as a “3D jigsaw puzzle”

• small cavities can occur

Shape and Dynamics in self-assembled systems - detergent micelles vs. well-packed proteins

Amphiphiles are driven to self association by the hydrophobic effect. But the chains can't all point in since this would not produce a uniform packing density in the micelle (water is excluded, and nature abhors a vacuum). There is not one satisfactory packing arrangement. The micelle structure is highly dependent on the shape and size of the monomers.

The fast dynamics are due to the fact that no one packing arrangement is favored over another. The free energy profile has a very shallow minimum populated by a great many states with low barriers to interconversion. This is unlike a stably folded protein with a single native structure (rigid internal packing) at a deep free energy minimum.

John Holyoake, Régis Pomès

Protein Taxonomies

• All alpha

• All beta
  β-sandwiches
  β-propellers
  β-helices
  β-barrels
  Ig fold
  …

• Alpha/beta
  TIM barrel
  Rossmann fold

• Alpha + beta
  …

Some of the projects that classify proteins: SCOP (Structural Classification of Proteins) http://scop.mrc-lmb.cam.ac.uk/scop

CATH (Class, Architecture, Topology and Homologous superfamily) http://www.biochem.ucl.ac.uk/bsm/cath

Classes > Folds > Superfamilies > Families

Hierarchy of divergence according to evolutionary distance

Sequence > Function > Structure.

% identity
100: Identical sequences; identical function; identical structures.
75: Highly related sequences - high confidence that the two proteins have similar structure and function.
50: Similar structure - probably similar function.
25-35: “twilight zone”: sequences with 25-35% sequence identity have a 50:50 chance of having similar structure.
10-25: In general, any two unrelated proteins (i.e. different sequence, structure and function) can be aligned to produce 10-25% identity.
0: But note that proteins with unrelated sequences may have very similar structures! (i.e. the two proteins may have diverged to the point that there is no detectable sequence signal - but the structures remain similar).
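Percent identity is a number computed from an alignment, which is why it can carry a value where homology cannot. The sketch below counts identical positions in two pre-aligned sequences. The choice of denominator (aligned, non-gap columns) is one convention among several and is an assumption here, as are the example sequences.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap.

    Both inputs must already be aligned (equal length, gaps as '-').
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Hypothetical aligned fragments, for illustration only.
print(percent_identity("MKQLEDKV", "MKQLEEKV"))   # one mismatch in eight columns
print(percent_identity("MKQ-EDKV", "MKQLEDKV"))   # the gapped column is skipped
```

Different tools divide by the shorter sequence length or the full alignment length instead, so reported identities are only comparable when the convention is the same.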

Pet peeve:

“Protein A and Protein B are 68% homologous.”

‘Homology’ has a well-defined meaning when referred to proteins: ‘two proteins are homologous if they have a common origin’.

It is not possible to associate the term to an adjective as low or high, or indicate a degree of homology with a number, as an example a percentage value.

The misuse of terms in scientific literature. A. Marabotti and A. Facchiano, Bioinformatics (2010).

Supersecondary Structure

• Simple assemblies of 2-3 secondary structure elements
• Include turns (to re-orient the chain)
• Modules - used to build up the 3° level folds
• Generally not stable on their own
• May or may not be folding intermediates

Motif | Used to build … | Specialized function?
β-hairpin | antiparallel sheets |
α-hairpin | helix bundles |
β-α-β | parallel sheets, TIM |
Greek key | |
helix-turn-helix | | DNA binding
EF hand | | Ca++ binding
etc.

Much of the material in this section is from: Introduction to Protein Structure (Branden and Tooze)

[Figure panels: βαβ motif; β hairpin; αα motif]

Four-helix bundle (up-down-up-down)

- one of the simplest folds
- widely used as a protein design target
- two consecutive α-α motifs
- all helix-helix contacts are antiparallel

Helix-Helix associations

Can be intramolecular or intermolecular

Consider parallel and antiparallel here, but many other crossing angles occur.

Parallel coiled-coil (e.g. GCN4 transcription factor; bZIP)

[Figure: helical wheel with heptad positions a-g]

MKQLEDKVEELLSKNYHLENEVARLK
abcdefgabcdefgabcdefgabcde

“Leucine zipper”: a positions: M, V, N*, V d positions: L, L, L, L
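The a and d residues listed above can be read directly off the sequence and its heptad register. A minimal sketch follows; the function name `heptad_positions` is hypothetical, and it assumes the register starts at position a on the first residue, as in the alignment shown.

```python
def heptad_positions(sequence, register="abcdefg", start="a"):
    """Group residues by heptad position, given the position of residue 1."""
    offset = register.index(start)
    grouped = {position: [] for position in register}
    for index, residue in enumerate(sequence):
        grouped[register[(index + offset) % len(register)]].append(residue)
    return grouped


GCN4_FRAGMENT = "MKQLEDKVEELLSKNYHLENEVARLK"
positions = heptad_positions(GCN4_FRAGMENT)

print("a positions:", ", ".join(positions["a"]))
print("d positions:", ", ".join(positions["d"]))
```

Running it on the fragment reproduces the lists on the slide: M, V, N, V at a and L, L, L, L at d.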

3.6 residues/turn ---> 100° /residue

Often talk about a heptad repeat - but 7x100° is 700° , or 1.94 turns of helix. (Need to go 18 residues before reaching an integral number of turns).

The “7-pointed star” in helical wheel projection assumes 102.8° turn /residue.
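The arithmetic behind these statements can be checked with exact fractions. The sketch below assumes an ideal, straight α-helix with exactly 3.6 residues per turn; the variable names are illustrative.

```python
from fractions import Fraction

RESIDUES_PER_TURN = Fraction(18, 5)  # 3.6, kept exact

# Twist of an ideal alpha helix, in degrees per residue.
helix_twist = Fraction(360) / RESIDUES_PER_TURN

# Turns of helix covered by one heptad (7 residues).
turns_per_heptad = 7 * helix_twist / 360

# Smallest residue count giving a whole number of turns: N / 3.6 is an
# integer only when N is a multiple of the reduced numerator, 18.
residues_for_whole_turns = RESIDUES_PER_TURN.numerator
whole_turns = residues_for_whole_turns / RESIDUES_PER_TURN

# Twist assumed by a 7-pointed helical wheel: two full turns per heptad.
wheel_twist = Fraction(2 * 360, 7)

print(f"helix twist: {float(helix_twist):.1f} deg/residue")
print(f"one heptad: {float(turns_per_heptad):.2f} turns")
print(f"{residues_for_whole_turns} residues = {whole_turns} turns")
print(f"wheel twist: {float(wheel_twist):.2f} deg/residue")
```

The small shortfall per heptad (about 1.94 turns rather than 2) is what the supercoiling of the helix axis makes up in a coiled-coil.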

[Figure: heptad positions abcdefg/abcdefg/abcd mapped onto a straight helix axis and a supercoiled helix axis; positions a and d highlighted]

Often see a “heptad repeat” in natural protein sequences. These almost always fold as a “coiled-coil”.

In a coiled-coil, the helix axis is itself a helix. But can plot as a helical wheel with a straight axis in which the twist (turn/residue) is (2x360°)/7 = 102.8°

There are 3.6 residues/turn in an alpha helix, so an alpha-helix twists by 100°/residue.
1 turn = 360°, and 360°/3.6 = 100°
5 turns x 3.6 residues/turn = 18 residues
5 turns = 1800° (divisible by 360°)

These are values for ideal helices (unbent). The local twist is still 100° NOT 102.8°

Positions a and d are typically designated as the interface residues in a coiled-coil.

Many variations on the theme.

parallel antiparallel

dimer trimer tetramer pentamer etc.

Contacts are knobs-into-holes (or ridges-into-grooves)

Walshaw and Woolfson JMB (2001).

Apostolovic et al., Chem. Soc. Rev. (2010).

Coiled-coils are often used in structural proteins

Myosin tropomyosin (schematic)


β−α−β-motif

TIM barrel

•parallel beta-strands connected by longer regions containing alpha-helical segments

•almost always has a right-handed fold

Two examples of how the beta-alpha-beta motif can be used to build up tertiary structure.

Triosephosphate isomerase (TIM) Nucleotide binding domain (Rossmann fold)

Connections between adjacent β-strands

anti-parallel parallel: right handed (almost always)

the “returning” connection is an α-helix in the β−α−β motif

parallel: left handed connection (very, very rare)

β-hairpin

Bovine Pancreatic Trypsin Inhibitor (BPTI)
Snake venom toxin

Greek Key

Staphylococcus nuclease

Sheet facts
• Repeat distance is 7.0 Å
• R groups of the amino acids alternate up-down-up, above and below the plane of the sheet
• Strands are 2 - 15 amino acid residues long
• 2 - 15 strands per sheet
• Average of 6 strands with a width of 25 Å
• Parallel less stable than anti-parallel
• “always” twisted

[Figure panels: Thioredoxin; Domain from aspartate transcarbamylase; Flavodoxin; Plastocyanin]

The jelly roll fold

Repetitive structures are common

β-helix (1M8N)

LRR (Leucine-Rich Repeats): beta-loop-alpha, e.g. ribonuclease inhibitor

β-propeller (1ERJ)

In these cases, each repeat is 22-28 residues long (the individual repeats do not fold on their own)

Many proteins are modular and are made up of smaller domains

Examples of protein domain databases:
Interpro: www.ebi.ac.uk/interpro
Pfam: www.sanger.ac.uk/resources/databases/Pfam
SMART: http://smart.embl-heidelberg.de
Prosite: www.expasy.org/prosite

Multidomain protein: Src family tyrosine kinase Hck
F. Sicheri, J. Kuriyan

Ig-like domain

alpha/beta domain Ig-like domain

three-helix bundle

“monolithic” fold: Golgi Mannosidase II (D. Rose), 1045 residue monomer.

Quaternary structure / Protein interfaces

• Most proteins are oligomers

• Common 3° structure does not imply common 4° structure

• Most oligomeric proteins are not stable as monomers

- think about the free energy of the monomer

- can we describe the single chain folding of a protein with 4° structure purely at the 3° level?

• Subunit interfaces are often used as functional hotspots

• protein-protein interactions are a type of quaternary structure.

Quaternary structure: More than 80% of proteins are oligomeric

[Histogram: percentage of proteins (0-45%) vs. oligomeric state (1 to 12, and >12), with bars split into homooligomers and heterooligomers]

Goodsell and Olson, Annu. Rev. Biophys. Biomol. Struct. 29, 105 (2000)

The glycolytic enzymes

PDB “molecule of the month” Feb 2004.

http://www.rcsb.org

Interfaces in oligomeric proteins “geometric and electrostatic complementarity”

• Generally multiple, weak contacts (large surface area).
• Shape complementarity (VdW)
• Additional stabilization: disulfides, metals, cofactors, ...
• Huge variety - cannot make many generalizations
• Range of affinities/exchange rates: “stable” (low Kd, generally has a hydrophobic character) vs. “transient” or “exchangeable” (higher Kd, usually more hydrophilic intermolecular surface).

• Individual exchange rates (kon, koff) do not necessarily correlate with affinity (Kd)

(but Kd is given by their ratio koff/kon)
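A small numerical illustration of this point, using hypothetical rate constants chosen only for the example: two complexes with very different on and off rates can share the same Kd = koff/kon.

```python
import math


def dissociation_constant(k_off, k_on):
    """Kd (M) from k_off (1/s) and k_on (1/(M*s))."""
    return k_off / k_on


# Hypothetical rate constants, for illustration only.
fast_exchange = dissociation_constant(k_off=1.0, k_on=1e8)
slow_exchange = dissociation_constant(k_off=1e-4, k_on=1e4)

print(f"fast exchange: Kd = {fast_exchange:.1e} M, lifetime = {1 / 1.0:.0f} s")
print(f"slow exchange: Kd = {slow_exchange:.1e} M, lifetime = {1 / 1e-4:.0f} s")
print("same affinity:", math.isclose(fast_exchange, slow_exchange))
```

Both complexes have a Kd of about 10 nM, yet the mean lifetime of the bound state (1/koff) differs by four orders of magnitude.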

• Stable: usually shielding many hydrophobic groups
- Net result is a very strong association (i.e. these proteins typically exist exclusively in the oligomeric form)
- These interfaces are often indistinguishable from the interior of proteins

• Transient or exchangeable: usually more polar, often contain bridging waters
- buried surface typically 600-2000 Å², polar, 0-10 H-bonds, bridging waters

Advantages of oligomeric proteins

Evolutionary advantage - symmetric oligomers are more “fit”

Advantages are grouped as functional, genetic or physicochemical:

Large size
- Larger proteins more stable against denaturation (smaller surf/vol)
- More efficient use of intracellular water
- Bigger is better - but limitations of protein synthesis

Oligomeric
- Error control
- Coding efficiency

Symmetrical
- Stability of association
- Multivalent binding / allostery
- Self assembly

Goodsell and Olson, Annu. Rev. Biophys. Biomol. Struct. 29, 105 (2000)

Examples of Point Group symmetries

Chiral point groups in 3D - the operation does not change the object - rotations only (no reflections or inversions) - rotations axes intersect at a point - results in repeated, equivalent surfaces between chains - generates “closed” structures

Cyclic (Cn) n=2,3,4,... rotation about a single axis

Dihedral (Dn): n-fold rotational symmetry about one axis, and C2 rotations perpendicular to the n-fold axis (results in n C2 axes)

Cubic
T: tetrahedral (C2 and C3 axes)
O: Octahedral (C4, C3 and C2 axes)
I: Icosahedral (C5, C3 and C2 axes)

Line Group symmetry (helical symmetry)

• Combination of rotational and translational (1D) symmetries. Leads to fibrous structures.

• Combine rotation with translation along axis (screw rotation)
• Produces supramolecular helices, e.g. actin, microtubules, flagella, …

• Often used in structural/architectural roles

• “open” structures - how to terminate the chain?

need to cap the ends

Keratin

• rare (biologically): plane group (2 translations), space group (3 translations)

Generation of Oligomers by 3D Domain Swapping

- mechanism for the formation of new interfaces
- results in higher order quaternary structures
- requires only small changes in a “hinge loop”
- usually involves a C or N terminus

Domain swapping

Liu and Eisenberg, Protein Science 11, 1285 (2002).

2:2 complex of the BCL6 BTB domain with the SMRT BBD peptide

Ahmad et al., Mol Cell (2003).

Membrane Proteins

Some general rules for understanding membrane protein structure:

- Satisfy the main-chain hydrogen bonds
  - transmembrane alpha-helices
  - beta barrels
  - No unsatisfied H bonding groups exposed to lipid

- Match the hydrophobic properties of the side chains and lipids - align protein surfaces with the hydrocarbon chains

Exceptions occur!

Most membrane proteins can be classified into one of two structural classes

Very few exceptions known so far, but there are many variations to the theme. Proteins are complicated!

Alpha-helical bundles
• Single or multiple “passes” through the membrane
• Bundles with ~ aligned helices
• Commonly find cofactors

Beta-barrels
• antiparallel beta-sheet
• the “last” strand is H-bonded to the “first” strand to close the barrel. This solves the “edge” problem and satisfies all main chain H bonding.

Membrane proteins

• Low dielectric constant within the bilayer
• No water in the middle of the bilayer
• Main chain fully H-bonded (2° structure)
  - α-helices: local (i, i+4) H-bonding
  - β-barrels: H-bonding between widely separated parts of the chain
• TM region: simpler topologies (3° structure) than soluble proteins: α-helix bundles; β-barrels
• Side chains point outward from helices or barrels
• Energetics and folding pathways are very different than for soluble proteins
• Many MPs have both a membrane domain and a soluble domain

The membrane presents a very complex environment - and there are many types of membranes!

Bulk solvent

15 Å Headgroup region

30 Å Hydrocarbon region

15 Å Headgroup region

Bulk solvent

Contributions to the free energy of folding for membrane proteins are different (and often more complex) than for soluble proteins

• The “cost” of an unformed H-bond in a membrane is very high
  • No competing water in the bilayer
  • MPs largely limited to TM helices and beta-barrels. Simpler topologies
• The dielectric constant changes from 2 - 4 within the bilayer to ~80 on the outside surface.
  • Charged and H-bonding groups are driven away from the lipid phase.
• There is essentially no water in the middle of the bilayer
  - no “hydrophobic force” to drive 3° structure!
  - Helices pack in a hydrophobic environment

What holds them together?

MP structure is not driven by the effects of water - van der Waals (packing) term is important
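To give a feel for the dielectric effect mentioned above, the sketch below evaluates Coulomb's law for a pair of unit charges in a uniform dielectric. It is a deliberately crude illustration: the uniform-dielectric model, the 4 Å separation and the conversion constant 332.06 kcal·Å/(mol·e²) are assumptions introduced here, not values from the slides.

```python
COULOMB_CONSTANT = 332.06  # kcal*Å/(mol*e^2), converts e^2/Å to kcal/mol


def coulomb_energy(q1, q2, r, dielectric):
    """Charge-charge energy (kcal/mol) for charges in units of e, r in Å."""
    return COULOMB_CONSTANT * q1 * q2 / (dielectric * r)


SEPARATION = 4.0  # Å, hypothetical ion-pair distance

in_water = coulomb_energy(+1, -1, SEPARATION, dielectric=80.0)
in_bilayer = coulomb_energy(+1, -1, SEPARATION, dielectric=2.0)

print(f"dielectric 80 (bulk water):       {in_water:7.2f} kcal/mol")
print(f"dielectric 2 (hydrocarbon core): {in_bilayer:7.2f} kcal/mol")
print(f"ratio: {in_bilayer / in_water:.0f}x")
```

In this simple picture the same interaction is 40 times stronger in the hydrocarbon core than in water, which is one way to see why unsatisfied polar groups are so costly inside the bilayer.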

Composition of TM helices

• Rich in hydrophobic amino acids (Leu, Val, ...).

• Trp and Phe often observed at the interfacial region.

• Glycine, proline and cysteine are fairly common. But “no” disulfide bonds

• Sometimes find polar residues (H-bonding between TM helices)

• Salt bridges can occur.

• Channels, pores, transporters, etc. often have polar interiors.

• Polar amino acids in a TM helix often point in. Functionally important.

Residue distributions in alpha helices

[Pie charts of residue composition by amino acid; per-residue percentages are not recoverable from the extracted text]
Transmembrane (TM) domains in membrane proteins: 6% charged, 40% polar, 54% hydrophobic
Soluble proteins: 24% charged, 41% polar, 33% hydrophobic

F,L,I make up over 1/3 of the residues in TM helices

Example of an α-helical membrane protein: ABC transporter

α-helical membrane proteins occur in:
- eukaryotes: plasma membrane, most organelle membranes, inner membrane of the mitochondria
- bacteria: cell membrane (Gram positive); inner membrane (Gram negative)

Rules are sometimes broken!

The pink helix goes only halfway through the TM region!

GlpF - Glycerol facilitator: allows diffusion of glycerol
KcsA - potassium channel: “reentrant loop” forms the selectivity filter

some beta-barrel structures:

[Figure: β-barrels with n = 8, 12, 16 and 22 strands, plus one with n = 3x4 = 12]

Present in:
- outer membrane of Gram-negative bacteria
- outer membrane of mitochondria