<<

JBB2026 Fall 2012 Lectures 2 & 3 -- Gil Privé

Protein Structure • conformations and residue preferences • elements of secondary structure • and motifs • packing of helices and sheets • chain topologies • internal packing • interfaces • membrane • multimeric proteins • domain motions

The Machinery of Life David S. Goodsell http://mgl.scripps.edu/people/goodsell

Tyr Thr Gly Cys Ile Ile Ala Gly

φ = 180°; ψ = 180°

φ = -60°; ψ = -45°

Note: a C-N single bond length is 1.45 Å; the peptide C-N bond is shorter (~1.33 Å).
- The peptide bond has ~40% double-bond character
- The dihedral is constrained

                       Bond lengths   Bond angles                        Dihedrals
sp2-hybridized atoms   shorter        120° (flat)                        more restrained (trans)
sp3-hybridized atoms   longer         109° (tetrahedral, often chiral)   less restrained (gauche-, gauche+, trans)

Crambin

TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN

1CRN

Partial charges in the peptide produce dipoles

These are additive and can produce a macroscopic dipole (esp. in α-helices).

Figure from Branden and Tooze Intro to

One way of categorizing the 20 amino acids - each has particular characteristics

Amino acid hydrophobicity

Relevant amino acid properties

Size (number of atoms)
Shape (torsion angles)
Flexibility (how many degrees of freedom?)
Charge (N, O; pKa values)
Polarity (electronic structure)
Hydrophobicity
Aromaticity (F, Y)

Protein conformations

The conformation is the arrangement of the atoms in 3D space. The most stable conformation is at a potential energy minimum.

Proper treatment is quantum mechanical - but this is intractable with proteins (too many atoms). We use Newtonian mechanics and describe the system as a set of potential energy terms, each with a particular form.

The overall potential energy can be broken down into several energy functions:

LOCAL
bond length (1,2) (strong)
bond angle (1,3) (strong)
dihedral angle (1,4) (medium)

NON-LOCAL
solvation / hydrophobic effect
van der Waals (packing, steric clashes)
electrostatics (incl. hydrogen bonds)

conformational entropy

These are additive: calculate the overall potential energy as the sum of these individual functions.

LOCAL
bond length (1,2)
bond angle (1,3)
dihedral angle (1,4)

NON-LOCAL
van der Waals (packing, steric clashes)
electrostatics (incl. hydrogen bonds)
hydrophobic effect

E = f(bonds, angles, dihedrals, vdW, electrostatics, H)
E = f(x,y,z)

Can describe the energy of the system from the positions of the atoms! (Need to consider the structure of the entire system, including waters.)
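The additive picture above can be sketched in a few lines of Python. This is only an illustration, assuming the functional forms commonly used in molecular mechanics force fields (harmonic bonds and angles, a periodic dihedral, Lennard-Jones for van der Waals, Coulomb for electrostatics). The slides do not specify these forms. All function names and numerical parameters below are hypothetical, and the hydrophobic/solvation term is left out because it has no comparably simple form.

```python
import math

# Illustrative functional forms only; the slides do not specify them.
# Energies in kcal/mol, distances in Angstrom, angles in radians.

def bond_energy(r, r0, k):
    """Harmonic bond-length (1,2) term."""
    return 0.5 * k * (r - r0) ** 2

def angle_energy(theta, theta0, k):
    """Harmonic bond-angle (1,3) term."""
    return 0.5 * k * (theta - theta0) ** 2

def dihedral_energy(phi, barrier, periodicity, phase=0.0):
    """Periodic dihedral (1,4) term."""
    return 0.5 * barrier * (1.0 + math.cos(periodicity * phi - phase))

def lennard_jones(r, epsilon, r_min):
    """van der Waals term: minimum of depth -epsilon at r = r_min."""
    ratio = r_min / r
    return epsilon * (ratio ** 12 - 2.0 * ratio ** 6)

def coulomb(r, q1, q2, dielectric=1.0):
    """Electrostatic term; 332.06 converts e^2/Angstrom to kcal/mol."""
    return 332.06 * q1 * q2 / (dielectric * r)

# Made-up geometry and parameters, just to show the sum of terms.
terms = {
    "bond": bond_energy(r=1.35, r0=1.33, k=600.0),
    "angle": angle_energy(theta=math.radians(118.0), theta0=math.radians(120.0), k=100.0),
    "dihedral": dihedral_energy(phi=math.radians(170.0), barrier=2.0, periodicity=3),
    "vdW": lennard_jones(r=4.0, epsilon=0.1, r_min=3.8),
    "electrostatics": coulomb(r=3.0, q1=0.4, q2=-0.4, dielectric=4.0),
}
total = sum(terms.values())

for name, value in terms.items():
    print(f"{name:15s} {value:8.3f} kcal/mol")
print(f"{'total':15s} {total:8.3f} kcal/mol")
```

Every term depends only on distances and angles that follow from the atomic coordinates, which is the sense in which E = f(x,y,z).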

[Figure: peptide backbone with the φ, ψ and ω dihedral angles labeled]

“Classic” Ramachandran Plot

Calculated energy surface (theoretical)
Based on hard sphere potentials - a simple form of the van der Waals potential.

Conformational energy of butane as a function of the central torsion angle

Boltzmann distribution: populate according to energies

[Figure: potential energy (kcal/mol) vs. torsion angle (0-360°)]

Conformer:    e    g+    t     g-    e
Population:   0%   15%   70%   15%   0%

Potential surfaces for side chain dihedral angles

This defines the major rotamers for each amino acid.
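The butane populations quoted above (about 15% / 70% / 15% for g+ / t / g-) can be reproduced with a short Boltzmann calculation. The sketch below assumes a gauche-trans energy gap of 0.9 kcal/mol and a temperature of 298.15 K. Both are assumptions (the gap is a commonly quoted textbook value, not a number read off the slide), and the function name is hypothetical. The eclipsed conformations are energy maxima rather than states, so they are treated as unpopulated.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol K)

def boltzmann_populations(energies, temperature=298.15):
    """Fractional population of each state from its relative energy (kcal/mol)."""
    rt = R_KCAL * temperature
    e_min = min(energies.values())
    weights = {state: math.exp(-(e - e_min) / rt) for state, e in energies.items()}
    partition = sum(weights.values())
    return {state: w / partition for state, w in weights.items()}

# Assumed relative energies of the three staggered minima (kcal/mol).
butane_minima = {"g+": 0.9, "t": 0.0, "g-": 0.9}

populations = boltzmann_populations(butane_minima)
for state, fraction in populations.items():
    print(f"{state:3s} {100 * fraction:5.1f}%")
```

The same function applies to side chain rotamers: lower-energy rotamers are populated more, but higher-energy ones are not empty.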

E.g. χ1-χ2 plot for phenylalanine

[Figure: χ1-χ2 plot for Phe, with gauche-, gauche+ and trans regions labeled; χ1 and χ2 shown relative to the backbone φ and ψ]

Why can’t Phe χ2 be trans?

Chi1 - Chi2 plots

[Figure: backbone dihedral angles φ, ψ and omega]

From: Introduction to Protein Structure (Branden and Tooze)

Potential energy curve for the peptide omega dihedral angle:

Barriers for dihedral angle rotation can be attributed to the electronic structure of the amide bond (delocalized).

Energy difference between cis and trans can be attributed to:

-the exchange interaction of electrons in adjacent bonds

-repulsive interactions between overlapping bond orbitals

- steric clashes between atoms (Clash between groups 1 and 4 in the 1,4 bond disfavors cis).

Note: only two minima here - why?

Xaa-Pro is an exception (peptide bond preceding a proline)
- lower barrier to interconversion
- only ~2 kcal/mol energy difference between cis and trans omega bond
- ~6% of Xaa-Pro have cis omega angles, otherwise very rare (<0.5%)

The only kind of symmetric three-dimensional structure for a linear polymer is a helix.

Helix: combination of a rotation and a translation (screw)

n: residues per turn
d: rise per residue (Å)

(other parameters include pitch, twist, …)

In this example, n = 8

n: residues per turn
d: rise per residue

Cantor and Schimmel, Biophysical Chemistry

Snake toy

Features: • Fixed bond lengths and angles

• 8-fold torsional potential, with minima at φ = 0, 45, 90, 135, 180, 225, 270, 315° (360° is the same as 0°)

• Linear polymer of 11 units: 8^11 = 8.6 x 10^9 conformers! (not all are accessible)

• Torsion dihedral is not colinear with the chain This makes it interesting… the toy would be pretty boring otherwise. But note that the angles are not the same as in a peptide
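The conformer count for the snake toy is a plain power: states per torsion raised to the number of units. A minimal sketch, using the slide's numbers (8 states, 11 units); the function name is hypothetical:

```python
def conformer_count(states_per_unit, n_units):
    """Number of combinations when each unit independently takes one of several states."""
    return states_per_unit ** n_units

snake_toy = conformer_count(8, 11)
print(f"{snake_toy:,} conformers (~{snake_toy:.1e})")
```

The count grows exponentially with chain length, which is why exhaustive enumeration quickly becomes impractical for real polypeptides.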

Twist/step

45°

90°

180°

Name                           Frequency*   φ (°)   ψ (°)   n      d (Å)   H-bonding
3₁₀ helix                      ~4%          -74     -4      +3.0   2.0     i,i+3
α helix                        ~35%         -57     -47     +3.6   1.5     i,i+4
αL helix                       -            +57     +47     -3.6   1.5     i,i+4
π helix                        -            -57     -70     +4.3   1.1     i,i+5
Collagen (PP type II)          Fibres       -78     +149    -3.3   2.9
planar β-sheet (para)          -            -115    +115    2.0    3.2
twisted β-sheet parallel       ~25%         -120    +135    -2.3   3.3     interstrand
twisted β-sheet antiparallel   ~8           -139    +135    -2.3   3.3     interstrand
B-DNA                          -            -       -       10     3.4     interchain

*Crude estimate in globular proteins
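The n and d columns are enough to derive the other helix parameters mentioned earlier (pitch, twist). The sketch below assumes the usual definitions, pitch = |n| x d (rise per full turn) and twist = 360°/n (rotation per residue, with the sign of n giving the handedness). The dictionary and function names are hypothetical; the values are taken from the table.

```python
# (n: residues per turn, d: rise per residue in Angstrom), from the table above.
HELIX_PARAMETERS = {
    "3_10 helix": (3.0, 2.0),
    "alpha helix": (3.6, 1.5),
    "alpha-L helix": (-3.6, 1.5),
    "pi helix": (4.3, 1.1),
}

def pitch(n, d):
    """Rise along the helix axis per full turn (Angstrom)."""
    return abs(n) * d

def twist(n):
    """Rotation about the helix axis per residue (degrees); sign gives handedness."""
    return 360.0 / n

for name, (n, d) in HELIX_PARAMETERS.items():
    print(f"{name:14s} pitch = {pitch(n, d):4.2f} A, twist = {twist(n):+7.1f} deg/residue")
```

For the α helix this gives a pitch of 5.4 Å and a twist of 100° per residue.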

[Figure: plot with the β and α regions labeled]

Three rules for secondary structure

1) Local “bonded” potentials must be minimized
   - bond lengths (1,2)
   - bond angles (1,3)
   - dihedrals (1,4) (Ramachandran)
   regular: all (phi,psi) the same

2) Satisfy main chain hydrogen bonding
   - Typically, >90% of the potential backbone donors and acceptors are involved in hydrogen bonds

3) No unfavourable steric interactions
   - Ramachandran

Observed (φ,ψ) distributions from over 500 high quality experimental structures (97,368 residues)

From: Lovell et al. Proteins 50, 437 (2003).

General and special cases

[Figure: (φ, ψ) plots for pre-proline and proline residues]

Pre-Pro (any residue preceding proline)
- φ is relatively normal
- ψ is restricted to 90 - 180°

Proline
- φ “fixed” at -60°
- ψ = -55° or 145°

Residue-specific Ramachandran plots

Alpha helix

Hydrogen bond between the carbonyl of residue i and the amide-H of residue i + 4

[Figure: helix drawn from N- to C-terminus, residues labeled i to i+7]

                       3₁₀      αR       π
H-bonding              i,i+3    i,i+4    i,i+5
n: residues per turn   3.0      3.6      4.3
d: rise per residue    2.0 Å    1.5 Å    1.1 Å

Long helices - rarely straight.

smooth bends (e.g. tropomyosin - coiled-coil dimers)
kinks
waters often bridge i,i+4 H-bond.
membrane proteins
often amphipathic - one face interacting with bulk solvent, one with protein core.
lots of strain due to longer-range contacts in the protein (non-local effects).

Transmembrane helices

60 Å radius of curvature: bending is not energetically expensive (< 2 kcal/mol for a 5-turn helix)

Saposin A kink in alpha3

closed form open form (ligand bound) (apo)

Y54 (n)

Beta-sheets

Parallel

Antiparallel

Parallel

Antiparallel beta sheet

• Parallel sheets
  • Generally buried
  • Less twisted

• Antiparallel and mixed sheets
  • Generally, one side exposed
  • Can withstand greater distortions (twisting and beta-bulges)

A beta-bulge leads to higher twisting in a sheet

There are ~8 types of turns

residue number    i    i+1          i+2        i+3
turn type Iʼ           (60, 30)     (90, 0)
turn type IIʼ          (60, -120)   (-80, 0)

These definitions are approximate (+/- 30°)

Intrinsically disordered proteins (aka natively unfolded proteins, intrinsically unstructured, ...)

- no stable secondary or tertiary structure under physiological conditions

- dynamic

- abundant in eukaryotes, less in bacteria and archaea

- highly abundant in certain classes of protein (e.g. signaling proteins)

- often involved in protein-protein interactions (disorder-order transitions)

- often have lower sequence complexity

- typically rich in polar residues and disorder-promoting residues (R, K, E, Q, S, P, G)

- typically depleted of hydrophobic and aromatic residues (I, L, V, W, Y, F)

- structure ensembles not equivalent to chemically denatured proteins that are natively folded


Natively unfolded proteins have low overall hydrophobicity and large net charge.

CH plot for ordered proteins (open circles) and natively unfolded proteins (grey).

Uversky et al. Proteins 41, 415 (2000).
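The two coordinates of a charge-hydropathy (CH) plot can be computed directly from a sequence. The sketch below is an approximation under stated assumptions: it uses the Kyte-Doolittle hydropathy scale rescaled to 0-1, and a simple net charge that counts K and R as +1 and D and E as -1 (His and the termini are ignored). These choices, and the function names, are assumptions made for illustration; the cited paper should be consulted for the exact procedure and for the line separating the two classes.

```python
# Kyte-Doolittle hydropathy scale (range -4.5 to +4.5).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def mean_hydropathy(sequence):
    """Mean Kyte-Doolittle hydropathy, rescaled so that 0 = Arg and 1 = Ile."""
    scaled = [(KYTE_DOOLITTLE[aa] + 4.5) / 9.0 for aa in sequence]
    return sum(scaled) / len(scaled)

def mean_net_charge(sequence):
    """Absolute net charge per residue (K, R = +1; D, E = -1; assumed pH ~7)."""
    positive = sum(1 for aa in sequence if aa in "KR")
    negative = sum(1 for aa in sequence if aa in "DE")
    return abs(positive - negative) / len(sequence)

# Crambin (1CRN), the sequence shown earlier in these notes.
crambin = "TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN"
print(f"mean hydropathy = {mean_hydropathy(crambin):.3f}")
print(f"mean net charge = {mean_net_charge(crambin):.3f}")
```

A folded protein such as crambin is expected to sit at higher hydropathy and lower net charge than a natively unfolded one.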

Chain Topologies (tertiary structure)

Huge variety: some are regular (e.g. TIM barrel; β-barrel), some are not.

Proteins often assembled from “domains”

“Never” see knots

Minimum size for stability ~60 amino acids
if small - often stabilized by disulfides, co-factors, etc., e.g. Zinc fingers

If extracellular, often have
- S-S bonds
- glycosylation

Salt bridges are not very common

Internal Packing of a folded protein

• Inside of a protein is packed as tightly as in an organic crystal - largely driven by the hydrophobic effect and van der Waals packing (also electrostatics, etc.)

• position of the side chain - the path of the main chain determines the Calpha-Cbeta vector

• side chain rotamers
  - coordinated
  - entropy effects
  - dihedral angles of the side chains are critical!

• Can think of a packed protein interior as a “3D jigsaw puzzle”

• small cavities can occur

Shape and Dynamics in self-assembled systems - detergent micelles vs. well-packed proteins

Amphiphiles are driven to self association by the hydrophobic effect. But the chains can't all point in since this would not produce a uniform packing density in the micelle (water is excluded, and nature abhors a vacuum). There is not one satisfactory packing arrangement. The micelle structure is highly dependent on the shape and size of the monomers.

The fast dynamics are due to the fact that no one packing arrangement is favored over another. The free energy profile has a very shallow minimum populated by many, many states with low barriers to interconversion. This is unlike a stably folded protein with a single native structure (rigid internal packing) at a deep free energy minimum.

John Holyoake, Régis Pomès

Protein Taxonomies

•All alpha

•All beta
  β-sandwiches
  β-propellers
  β-helices
  β-barrels
  Ig fold
  ….

•Alpha/beta
  TIM barrel

•Alpha + beta
  …

Some of the projects that classify proteins: SCOP (Structural Classification of Proteins) http://scop.mrc-lmb.cam.ac.uk/scop

CATH (Class, Architecture, Topology and Homologous superfamily) http://www.biochem.ucl.ac.uk/bsm/cath

Classes > Folds > Superfamilies > Families

Hierarchy of divergence according to evolutionary distance: Sequence > Function > Structure.

% identity

100: Identical sequences; identical function; identical structures.

~75: Highly related sequences - high confidence that the two proteins have similar structure and function.

~50: Similar structure - probably similar function.

25-35: “twilight zone”: sequences with 25-35% sequence identity have a 50:50 chance of having similar structure.

10-25: In general, any two unrelated proteins (i.e. different sequence, structure and function) can be aligned to produce 10-25% identity.

But note that proteins with unrelated sequences may have very similar structures! (i.e. the two proteins may have diverged to the point that there is no detectable sequence signal - but the structures remain similar).

Pet peeve:

“Protein A and Protein B are 68% homologous.”

ʻHomologyʼ has a well-defined meaning when referred to proteins: ʻtwo homologous proteins have a common originʼ.

It is not possible to associate the term to an adjective as low or high, or indicate a degree of homology with a number, as an example a percentage value.

The misuse of terms in scientific literature A. Marabotti and A. Facchiano, Bioinformatics (2010).


Supersecondary Structure

• Simple assemblies of 2-3 secondary structure elements
• Include turns (to re-orient the chain)
• Modules - used to build up the 3° level folds
• Generally not stable on their own
• May or may not be folding intermediates

Motif              Used to build …        Specialized function?
β-hairpin          antiparallel sheets
α-hairpin          helix bundles
β-α-β              parallel sheets, TIM
greek key
helix-turn-helix                          DNA binding
EF hand                                   Ca++ binding
etc.

Much of the material in this section is from: Introduction to Protein Structure (Branden and Tooze)

βαβ motif
β hairpin
αα motif

Four-helix bundle (up-down-up-down)

- one of the simplest folds
- widely used as a target
- two consecutive α-α motifs
- all helix-helix contacts are antiparallel

Parallel coiled-coil (e.g. GCN4 transcription factor; bZIP)

[Figure: helical wheel with heptad positions a-g]

MKQLEDKVEELLSKNYHLENEVARLK
abcdefgabcdefgabcdefgabcde

“Leucine zipper”:
a positions: M, V, N*, V
d positions: L, L, L, L!
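The a and d assignments above can be checked by walking the heptad register along the sequence. A minimal sketch, assuming the first residue sits at position a as in the alignment shown; the function name is hypothetical:

```python
from collections import defaultdict

HEPTAD = "abcdefg"

def heptad_positions(sequence, start="a"):
    """Group residues by heptad position, given the position of the first residue."""
    offset = HEPTAD.index(start)
    grouped = defaultdict(list)
    for i, residue in enumerate(sequence):
        grouped[HEPTAD[(i + offset) % 7]].append(residue)
    return {position: "".join(residues) for position, residues in grouped.items()}

gcn4_fragment = "MKQLEDKVEELLSKNYHLENEVARLK"
positions = heptad_positions(gcn4_fragment)

print("a positions:", ", ".join(positions["a"]))
print("d positions:", ", ".join(positions["d"]))
```

For this fragment the a positions come out as M, V, N, V and the d positions as L, L, L, L, matching the slide.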

3.6 residues/turn ---> 100°/residue

Often talk about a heptad repeat - but 7 x 100° is 700°, or 1.94 turns of helix. (Need to go 18 residues before reaching an integral number of turns.)

The “7-pointed star” in helical wheel projection assumes 102.8° turn/residue.

abcdefg/abcdefg/abcd
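The heptad arithmetic can be verified with exact fractions. The sketch below takes 3.6 residues per turn as the only input; the variable and function names are hypothetical.

```python
from fractions import Fraction

RESIDUES_PER_TURN = Fraction(18, 5)  # 3.6 residues per turn for an ideal alpha helix

# Rotation per residue in a straight, ideal alpha helix.
twist_per_residue = Fraction(360) / RESIDUES_PER_TURN

# Turns spanned by one heptad (7 residues).
turns_per_heptad = 7 / RESIDUES_PER_TURN

# Twist per residue implied by a 7-pointed helical wheel (7 residues in exactly 2 turns).
wheel_twist = Fraction(2 * 360, 7)

def residues_for_whole_turns(residues_per_turn):
    """Smallest residue count spanning an integral number of turns.

    For a ratio p/q in lowest terms, p residues span exactly q turns.
    """
    return residues_per_turn.numerator

print(f"twist per residue : {float(twist_per_residue):.1f} deg")
print(f"turns per heptad  : {float(turns_per_heptad):.3f}")
print(f"wheel twist       : {float(wheel_twist):.3f} deg")
print(f"residues for an integral number of turns: {residues_for_whole_turns(RESIDUES_PER_TURN)}")
```

The gap between 100° and roughly 102.9° per residue is what the supercoiling of the helix axis absorbs in a coiled-coil.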

[Figure: straight helix axis vs. supercoiled helix axis, with positions a and d marked]

Often see a “heptad repeat” in natural protein sequences. These almost always fold as a “coiled-coil”.

In a coiled-coil, the helix axis is itself a helix. But can plot as a helical wheel with a straight axis in which the twist (turn/residue) is (2x360°)/7 = 102.8°

There are 3.6 residues/turn in an alpha-helix, so an alpha-helix twists by 100°/residue.

1 turn = 360°, and 360°/3.6 = 100°
5 turns x 3.6 residues/turn = 18 residues
5 turns = 1800° (divisible by 360°)

These are values for ideal helices (unbent). The local twist is still 100° NOT 102.8°

Positions a and d are typically designated as the interface residues in a coiled-coil.

Ho et al., PNAS (2008).

Many variations on the theme.

parallel antiparallel

dimer trimer tetramer pentamer etc.

Contacts are knobs-into-holes (or ridges-into-grooves)

Walshaw and Woolfson JMB (2001).

Apostolovic et al., Chem. Soc. Rev. (2010).

Coiled-coils are often used in structural proteins

Myosin tropomyosin (schematic)

β−α−β-motif

TIM barrel

•parallel beta-strands connected by longer regions containing alpha-helical segments

•almost always has a right-handed fold

Two examples of how the beta-alpha-beta motif can be used to build up tertiary structure.

Triosephosphate isomerase (TIM)
Nucleotide binding domain (Rossmann fold)

Connections between adjacent β-strands

anti-parallel: beta hairpin

parallel: right handed (almost always)

the “returning” connection is an α-helix in the β−α−β motif

parallel: left handed connection (very, very rare)

β-hairpin

Bovine Pancreatic Trypsin Inhibitor (BPTI)
Snake Venom toxin

Greek Key

Staphylococcus nuclease

Sheet facts
• Repeat distance is 7.0 Å
• R groups of the amino acids alternate up-down-up, above and below the plane of the sheet
• Strands are 2 - 15 amino acid residues long
• 2 - 15 strands per sheet
• Average of 6 strands with a width of 25 Å
• Parallel is less stable than anti-parallel
• “Always” twisted

Thioredoxin (1TRX)

Domain from Aspartate transcarbamylase
Flavodoxin
plastocyanin

The jelly roll fold

Repetitive structures are common

LRR (Leucine-Rich Repeats)
Beta-loop-alpha
E.g. Ribonuclease inhibitor

β-helix topology
E.g. Pectate lyase C

[Figure: N and C termini labeled]

Here, each repeat is 22-28 residues long (individual repeats do not fold on their own)

Many proteins are modular and are made up of smaller domains

Examples of databases:
Interpro: www.ebi.ac.uk/
Pfam: www.sanger.ac.uk/resources/databases/Pfam
SMART: http://smart.embl-heidelberg.de
Prosite: www.expasy.org/prosite

Multidomain protein: Src family tyrosine kinase Hck
F. Sicheri, J. Kuriyan

“monolithic” fold: Golgi Mannosidase II (D. Rose) ~1000 amino acids