BMB/Bi/Ch 170 – Fall 2017 Problem Set 1: I

Please use the ray-tracing feature for all the images you submit. Use either the Ray button on the right side of the command window in PyMOL or variations of the commands below:

ray 300, 500 #creates a ray-traced image of width (300) and height (500)

set ray_shadow, 0 #removes shadows in the ray-traced image

1. Basics of structure (18 points) Open the structure of murine Major Histocompatibility Complex Class I (MHC I) in complex with a peptide from Influenza A (IAV) (PDB ID: 3BUY) in PyMOL, either by downloading the PDB file and opening it or by simply using the command below:

fetch 3BUY #retrieves a protein structure from the PDB, saves it in your home directory, and loads it into PyMOL

a. The Peptide Bond: Make a stick figure of the residues 208-210 (FYP) from Chain A (Chain A is at the beginning of the sequence) and indicate all peptide bonds. Explain how the planarity of the bonds is achieved and briefly list relevant data that support this. Measure all ω torsion angles by using the dihedral measurement tool in the Wizard menu. Explain why proline adopts the cis-peptide conformation (ω ≈ 0°) in Xaa-Pro (Xaa: any amino acid) more often than the non-proline residues do. (8 points)

To show the primary sequence of the protein, check Sequence On under the Display menu. Locate the residues 208-210 (FYP) in the primary sequence and click on them. Doing so will highlight the residues in the graphics window below. Enter the following commands in the command line.

hide everything #hides everything

bg white #makes the background white

show sticks, sele #shows the selected residues in stick representation

Or,

select FYP, (resi 208-210) #selects and names residues 208-210 as ‘FYP,’ which appears as a new entry in the menu to the right side of the graphics window

show sticks, FYP #shows ‘FYP’ in stick representation

You may also find it helpful to show the polar hydrogens. You can do this by going to the side bar and selecting Action > hydrogens > add polar.

(Figure: stick representation of residues 208-210 with both peptide bonds labeled; ω1 = 179.5°, ω2 = 0.9°)
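The ω values reported by the measurement wizard can be cross-checked by hand from atomic coordinates. The following is a minimal sketch, assuming the four backbone atoms that define ω (Cα, C, N, Cα across one peptide bond) have already been read out as (x, y, z) tuples. The coordinates used here are idealized, made-up points for illustration and are not taken from 3BUY.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p1, p2, p3, p4):
    """Return the torsion angle (degrees) defined by four points."""
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b3 = _sub(p4, p3)
    n1 = _cross(b1, b2)          # normal to the plane p1-p2-p3
    n2 = _cross(b2, b3)          # normal to the plane p2-p3-p4
    b2_len = math.sqrt(_dot(b2, b2))
    x = _dot(n1, n2)
    y = b2_len * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))

# Idealized planar arrangements (hypothetical coordinates):
trans_like = dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0))
cis_like = dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0))
print(trans_like, cis_like)
```

A planar trans arrangement gives a magnitude of 180° and a planar cis arrangement gives 0°, which is the pattern seen in the two measured ω angles.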

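One line of evidence for the planarity of the peptide bond is the ~40% double-bond character of the C-N bond formed between the carboxyl group of one residue and the amino group of the next, which arises from resonance and restricts rotation about that bond. Another is that the C=O and C-N bond lengths in the peptide bond differ from those in ketones and methylamine, respectively: the C-N bond is shorter than a typical C-N single bond and the C=O bond is slightly longer than a typical carbonyl.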

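The energy difference between the cis and trans conformations is smaller for a peptide bond preceding proline than for one preceding any other residue. For non-proline residues, the cis conformation is much less favorable because of steric clashes between the neighboring Cα atoms and their substituents. In Xaa-Pro, the trans conformation is also crowded, because the preceding Cα lies close to the Cδ of the proline ring, so the relative penalty for cis is reduced. Proline isomerization is slow and can therefore slow down protein folding, which may help the rest of the protein fold properly.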

b. Ramachandran Plot: The classical version (1968) and refined version (2002) of the Ramachandran plot of glycine are given below (Figure 1). Why are the allowed regions determined by the φ and ψ values much smaller in the refined version compared to the classical version? Nowadays, the Ramachandran plot is used to evaluate the quality of three-dimensional protein structures. In Figure 2, you notice that several dots reside outside of the “allowed” regions on the plot. What is your guess on the identity of these residues? In what types of regions would you expect a residue of this nature to appear in the protein? Explain. (6 points)


(Figure 1) (Hovmöller et al. (2002) Acta Cryst D58: 768-776)

The refined version is based on a non-redundant set of 1,042 protein subunits extracted from the PDB in January 2002. These structures were determined by X-ray diffraction at a resolution better than 2.0 Å and were subjected to extensive refinement (R factor < 0.2). With this refined, high-resolution data, the residues cluster into much tighter regions within the areas predicted by the classical Ramachandran plot.

(FYI: Some studies have examined why the most occupied α-helical and turn regions were predicted to be only permissible in the classical version (Ho et al. (2003) Protein Science 12:2508). Others address why the populations in all representative regions (α-helix, β-sheet, and turn) are diagonal (Hu et al. (2003) Proteins 50:451).)

(Figure 2)

The dots that reside outside the “allowed” (yellow/red) regions probably belong to glycine residues. Glycine, unlike all the other amino acids, has a hydrogen instead of a carbon bound to Cα. Because it is small, side-chain steric limitations are reduced, so it can occupy regions that are restricted for other residues. Glycine residues tend to appear in regions, such as tight turns and loops, where the lack of a side chain is required to accommodate bulky side chains elsewhere.

c. The Hydrophobic Effect: A large part of proper protein folding depends on the hydrophobic effect. This effect, however, is not caused by attractive forces between hydrophobic side chains. Describe the nature of the hydrophobic effect and the significance of entropy as the driving force. (4 points)

The hydrophobic effect is the thermodynamic consequence of removing nonpolar side chains from their unfavorable contacts with water molecules, and it is dominated by entropy. Exposing hydrophobic residues to aqueous solvent forces water molecules to rearrange into cages around the hydrophobic entities. This “ordered” arrangement lets the water molecules satisfy their hydrogen-bonding needs, but it lowers the entropy of the system. Burying the nonpolar side chains releases the caged water into bulk solvent, and the resulting gain in solvent entropy drives folding. Proteins reside in the aqueous cytosol, and hydrophobic collapse into a “molten globule” state, led by the major stabilizing forces (the hydrophobic effect and hydrogen bonding), has been observed during protein folding.

2. Secondary structure (38 points) According to the Ramachandran plot, the polypeptide backbone has limited conformational variability (based on φ and ψ). The main allowed regions in the plot correspond to the main types of regular secondary structure (α-helices, β-sheets, and turns). Individual amino acid side chains also have strong preferences for certain conformations, called rotamers (based on the χ angles).

a. Helix Formation: Navigate to the first domain in chain A which contains a beta sheet framed by two alpha helices. Notice that the helices are not perfect. Now look at the sequence for these regions. What residues are causing these kinks? Why? (5 Points)

The kink in one helix is caused by a glycine residue. Glycine has only a hydrogen atom as its side chain, which gives the backbone greater flexibility and freedom of rotation. The second kink is caused by a proline and a glycine residue. Proline residues are more constrained than most amino acids because their side chain is bonded back to the backbone nitrogen, which makes them unable to adopt the preferred helical geometry. Both glycine and proline residues commonly break α-helices.

b. Helical Wheel Diagrams: One section of a helix in our murine MHC I is:

163 - ECVEWLHRYLKN -174

Navigate to the Helical Wheel Projections website (http://rzlab.ucr.edu/scripts/wheel/wheel.cgi?sequence=ABCDEFGHIJLKMNOP&subm) and submit this sequence to generate a helical wheel diagram shaped and color coded according to the chemical properties of each residue. This particular helix is placed externally on the protein such that one side faces the hydrophobic beta sheet below it and the other is solvent exposed. Label the two sides in your helical wheel diagram. What pattern in the primary sequence hints that this helix may have a buried face and a solvent exposed face? (5 points)

(Figure: helical wheel diagram with the solvent-exposed face and the hydrophobic face labeled)

(Shape: hydrophilic residues as circles, hydrophobic residues as diamonds, negatively charged as triangles, positively charged as pentagons/ Color: the most hydrophobic residue is green, and the amount of green is decreasing proportionally to the hydrophobicity, with zero hydrophobicity coded as yellow. Hydrophilic residues are coded red with pure red being the most hydrophilic (uncharged) residue, and the amount of red decreasing proportionally to the hydrophilicity. The charged residues are light blue.)

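In the primary sequence, runs of 1-2 polar residues alternate with runs of 1-2 hydrophobic residues. With roughly 3.6 residues per turn, this spacing places the hydrophobic residues on one face of the helix and the polar residues on the other, which indicates an amphipathic helix.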
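The alternating pattern can be checked numerically by placing each residue on a helical wheel. The sketch below assumes an ideal α-helix with 100° of rotation per residue and a simple, hypothetical hydrophobic set (A, V, L, I, M, F, W, Y, C). The classification used by the web server may differ.

```python
# Assumed hydrophobic residue set for illustration only.
HYDROPHOBIC = set("AVLIMFWYC")

def wheel_angles(sequence, degrees_per_residue=100):
    """Return (residue, angle) pairs for an ideal alpha-helical wheel."""
    return [(res, (i * degrees_per_residue) % 360)
            for i, res in enumerate(sequence)]

def split_faces(sequence):
    """Separate wheel positions into hydrophobic and polar residues."""
    hydrophobic, polar = [], []
    for res, angle in wheel_angles(sequence):
        if res in HYDROPHOBIC:
            hydrophobic.append((res, angle))
        else:
            polar.append((res, angle))
    return hydrophobic, polar

hydrophobic, polar = split_faces("ECVEWLHRYLKN")
print("hydrophobic:", sorted(hydrophobic, key=lambda pair: pair[1]))
print("polar:      ", sorted(polar, key=lambda pair: pair[1]))
```

Under these assumptions, the hydrophobic residues fall between 40° and 200° on the wheel and the polar residues occupy the remaining arc, consistent with one buried face and one solvent-exposed face.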

c. Helices: In your MHC I structure, find the helix from part b. Verify your findings by generating a figure displaying the hydrophobic patch of the helix as transparent orange spheres, and polar (both uncharged and charged) residues as transparent marine spheres. Use the following commands and variations thereof to accomplish this:

select helix, (resi 163-174) #selects and names the residues as ‘helix’ and an object is created in the menu on the right side of the graphics window

select hydrophobes, (resn phe+tyr+ala+leu+val+ile) and helix #selects and names the hydrophobic residues in the helix as ‘hydrophobes,’ in the menu

show spheres, hydrophobes #shows residues in the object in spheres

color orange, hydrophobes #shows residues in the object in orange

Use the same commands above for the polar residues (lys+thr+arg) in the helix, but name them ‘polar’ and color them ‘marine’

set sphere_transparency=0.5 #sets sphere transparency at 50%

Once you have made your figure showing hydrophobic residues, hide everything and show the helix as sticks. Make another figure that shows the hydrogen-bonding pattern holding the helix backbone together. What kind of helix is this? Explain your reasoning. (Note: Three kinds are covered in class.) (10 points)

This is an α-helix because the backbone carbonyl oxygen of residue i hydrogen bonds with the backbone amide nitrogen of residue i+4. (e.g. The backbones of E163 (i), W167 (i+4), and Y171 (i+8) are highlighted in yellow.)
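The reasoning above amounts to reading the residue spacing off the backbone hydrogen bonds. As a minimal sketch, assuming the three helix types in question are the 3-10, α, and π helices (i+3, i+4, and i+5 spacing, respectively), the most common spacing in a list of (acceptor, donor) residue pairs identifies the helix. The pair list is entered by hand from the residues named in the answer and is not computed from the structure.

```python
from collections import Counter

# Assumed mapping from hydrogen-bond spacing (donor minus acceptor residue
# number) to helix type.
HELIX_BY_SPACING = {3: "3-10 helix", 4: "alpha helix", 5: "pi helix"}

def classify_helix(hbond_pairs):
    """hbond_pairs: list of (acceptor_resi, donor_resi) backbone C=O to N-H pairs."""
    if not hbond_pairs:
        raise ValueError("at least one hydrogen bond pair is required")
    spacings = Counter(donor - acceptor for acceptor, donor in hbond_pairs)
    most_common_spacing, _count = spacings.most_common(1)[0]
    return HELIX_BY_SPACING.get(most_common_spacing, "not a standard helix")

# Pairs taken from the residues highlighted in the answer: E163-W167, W167-Y171.
print(classify_helix([(163, 167), (167, 171)]))
```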

d. β Sheets: Make a cartoon figure by selecting and highlighting the two adjacent β-strands from Chain B (sequences: 36-EIQMLK-41/ 78-YACRVK-83) with a different color in your MHC I structure.

In another figure, isolate the two strands and show them in both cartoon and stick representations. Color the residues by element using the menu on the right side of the graphics window (C (Color) → by element). Show all hydrogen bonds stabilizing the two β-strands. What type of β-sheet is this? (Note: Three kinds are covered in class.) Explain why this type of β-sheet is generally twisted. (8 points)


This is an example of an immunoglobulin (Ig) domain. These β-strands form an antiparallel β-sheet. Antiparallel β-sheets are usually twisted (left-handed) because the polypeptide backbone has an intrinsic tendency toward a right-handed twist due to the chirality of the amino acid residues.

e. Secondary Structure Prediction: Navigate to the three different secondary structure prediction servers listed below:

PSIPRED (http://bioinf.cs.ucl.ac.uk/psipred/)

RaptorX Property (http://raptorx.uchicago.edu/StructurePropertyPred/predict/)

SSpro (http://scratch.proteomics.ics.uci.edu/)

Go to the 3BUY PDB page and copy the FASTA Sequence under the dropdown menu ‘Download Files’. Submit your sequence for Chain A to each of the servers. Briefly describe how well each prediction worked by comparing the predictions to the 3-D crystal structure that was obtained experimentally using X-ray crystallography. What is the principle behind each prediction that you think causes variation? (10 points)

Crystal Structure: PHSMRYFETAVSRPGLEEPRYISVGYVDNKEFVRFDSDAENPRYEPRAPWMEQEGPEYWERETQKAKGQ EQWFRVSLRNLLGYYNQSAGGSHTLQQMSGCDLGSDWRLLRGYLQFAYEGRDYIALNEDLKTWTAADMA AQITRRKWEQSGAAEHYKAYLEGECVEWLHRYLKNGNATLLRTDSPKAHVTHHPRSKGEVTLRCWALGF YPADITLTWQLNGEELTQDMELVETRPAGDGTFQKWASVVVPLGKEQNYTCRVYHEGLPEPLTLRWEP


PSIPRED:

PSIPRED does not predict all of the α-helices or β-sheets in this region, and the α-helices and β-sheets that it does predict correctly are often shorter than the actual ones. This server uses a two-stage neural network with sequence profile information from PSI-BLAST.

RaptorX:

Overall, RaptorX does a reasonably good job of predicting where the helices and sheets are. It has trouble with the exact lengths of these secondary structures, but their overall number and general location in the sequence are correct. The algorithm used here is DeepCNF (Deep Convolutional Neural Fields). This method is especially suitable for proteins without any known homologous structures in the PDB or with sequences that carry little information from the evolutionary standpoint.

SSpro:

Predicted Secondary Structure (3 Class): CEEEEEEEEEECCCCCCCCEEEEEEEECCEEEEEEECCCCCCCCEECCHHHHHCCHHHHHHHHHHHHHH HHHHHHHHHHHHHHCCCCCCCCCEEEEEEEEEECCCCCEEEEEEEEEECCEEEEEECCCCCCEEECCHH HHHHHHHHHHCCHHHHHHHHCCCCHHHHHHHHHHHCHHHHCCCECCEEEEEEEECCCCEEEEEEEEEEE ECCCCEEEEEECCEECCCCCEECCCEECCCCCEEEEEEEEEECCCHHHEEEEEECCCCCCCEEECCCC E = extended strand, H = alpha-helix, C = the rest

The SSpro prediction is quite accurate, assigning not only the correct secondary structures but also their lengths. Unlike RaptorX Property, this server uses evolutionary information from sequence homology and from homologous structures in the PDB. The method uses an ensemble of bidirectional recurrent neural networks and PSI-BLAST profiles. The accuracy of the prediction is tied to the secondary structure assignments made by DSSP for proteins deposited in the PDB.
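A qualitative comparison like the ones above can be made quantitative with a per-residue three-state accuracy (often called Q3). The sketch below assumes that both the prediction and the observed assignment have already been reduced to equal-length strings over H, E, and C. The two example strings are hypothetical and are not the 3BUY data.

```python
def q3_accuracy(predicted, observed):
    """Fraction of positions where the predicted state matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    if not predicted:
        raise ValueError("strings must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(predicted)

# Hypothetical example strings (H = helix, E = strand, C = coil).
predicted = "CEEEEECCHHHHHHCC"
observed = "CCEEEECCHHHHHCCC"
print(f"Q3 = {q3_accuracy(predicted, observed):.3f}")
```

A single number hides where the errors are, so it is still worth checking whether mismatches sit at the ends of helices and strands (length errors) or represent entirely missed elements.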

3. Tertiary/quaternary structure (16 points) a. Structural motif: While the peptide-binding subunit of MHC class I (Chain A in our structure) has considerable variability, the smaller subunit of the complex, called β2 microglobulin (B2M), does not. B2M is required for MHC I surface expression. What structural fold/motif is B2M an example of? (2 points)

Immunoglobulin (Ig) domain or immunoglobulin fold

b. Structural Alignment: Copy the B2M structure from your original 3BUY structure into a new object. Rename it 3BUY_B2M. Open one of the homologous structures of murine B2M, 5CKG, in PyMOL. This PDB ID corresponds to a human B2M structure with two stabilizing mutations. Align the structures using the following command:

i. align 3BUY_B2M, 5CKG #aligns 3BUY_B2M and 5CKG

*Note that there are two molecules in the 5CKG structure. This is related to how the crystal structure was solved. You can either delete the second chain or ignore it.

Generate a cartoon figure of the structural alignment between the two. Launch PDBeFold (http://www.ebi.ac.uk/msd-srv/ssm/) with the two PDB IDs. When selecting chains, use Chain B from 3BUY and Chain A from 5CKG. Report the root mean square deviation (RMSD) value. What does this value tell you? What is the benefit of aligning two structures using the secondary-structure matching (SSM) algorithm in PDBeFold? (8 points)


(cyan: 3BUY Chain B/ dark blue: 5CKG Chain A)

The RMSD value for this structural alignment from PDBeFold is 1.09 Å. It reports the root-mean-square distance between the matched backbone atoms of the two aligned structures. Generally, the smaller the RMSD, the more closely the two structures match. The align command in PyMOL performs a sequence alignment first and then a structural superposition, followed by cycles of refinement to reject any structural outliers. To use this command, the sequence similarity between the two proteins to be aligned should be higher than 30%. One benefit of using the SSM algorithm is that its Q score, which represents the quality of a three-dimensional structure comparison, takes into account both the RMSD and the alignment length.
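To make the meaning of the RMSD concrete, the sketch below computes it for two lists of paired atom coordinates that are assumed to be already superposed, with atom i in one list matched to atom i in the other. It does not perform the superposition or the residue matching that PyMOL and PDBeFold carry out, and the coordinates are made-up values for illustration.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Hypothetical matched atoms: the second set is the first shifted by 1 A in z.
set_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
set_b = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0), (7.6, 0.0, 1.0)]
print(f"RMSD = {rmsd(set_a, set_b):.2f} A")
```

Because the deviations are squared before averaging, a few badly matched atoms raise the RMSD more than many slightly displaced ones, which is one reason outlier rejection changes the reported value.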

Most of the variability in the MHC I complex between species occurs in the MHC I peptide-binding domain, so it is not that surprising that the two B2M structures have similar folds. In fact, a sequence alignment of both proteins shows that they are very similar, although not identical. Notice that most of the differences are in the flexible loops that connect the β-strands. While this may be due to mutations, it is possible that these are just differences in how the proteins crystallized, especially if these are unstructured, flexible loops. Whenever looking at a protein structure, it is important to remember that proteins are flexible, dynamic molecules and a crystal structure is just one snapshot of their structure.

c. Sequence Alignment: Navigate to the EMBL MUSCLE website (http://www.ebi.ac.uk/Tools/msa/muscle/)

Download the sequences for 5CKG and 3BUY from their PDB pages as in problem 2e and perform a sequence alignment of the Chain B sequence from 3BUY and the Chain A sequence from 5CKG using MUSCLE. Note that your sequences must be in FASTA format (Here’s an example if you’ve never used it before http://prodata.swmed.edu/mummals/info/fasta_format_file_example.htm).


Paste your alignment in your answer.

Based on your alignment, are your results for part 3b surprising? Where are most of the differences? (6 Points)

The way to interpret your alignment’s consensus symbols is as follows: * = fully conserved residue; : = conservation between groups of strongly similar properties; . = conservation between groups of weakly similar properties; no symbol means not conserved.

Look at the figure of mouse B2M with the mutated residues highlighted on the next page. You’ll notice that the mutations are fairly spread out over the whole molecule. What may be a bit harder to see at first glance is that most of the mutations that are weakly conservative or not conservative localize predominantly to the flexible, unstructured loops. From a structural perspective, this makes sense because significant changes in the properties of the residues that make up the β-sheets may disrupt the secondary structure of the protein.


Left: 3BUY chain B.

Orange residues are strongly conservative changes (e.g. I to L).

Cyan residues are either weakly conservative changes (e.g. K to T) or non-conservative changes (e.g. E to T).

4. Protein-Protein Interactions (28 points) a. Electrostatics vs. hydrophobicity: In class, you talked about a number of different weak forces that underlie interactions between macromolecular complexes. Briefly describe the difference between electrostatic vs. hydrophobic protein-protein interactions. (3 Points)

Electrostatic protein interactions are characterized by attractions/repulsions between charged groups in proteins.

Hydrophobic protein interactions are characterized by the association of nonpolar groups, which reduces their unfavorable disruption of the interactions between polar solvent molecules.

b. Surface Charges: MHC I molecules are receptors that display peptides on the surface of cells to immune cells. This is a key way that the immune system can recognize intracellular pathogens. The binding groove for these peptides is located in between the helices packed against the β-sheet. The structure you have been working with has a peptide from influenza A bound in the binding pocket.

Make a figure showing the surface charges of the binding groove with the peptide bound. To do this, first you should extract the peptide into its own object. Select the residues for the peptide (Chain C). Go to the Action button on the right for selection and choose extract object. Feel free to rename the new peptide object.

To generate the electrostatic surface map, go to the Action (A) button for the new protein structure without the peptide (3BUY). Select generate, then vacuum electrostatics. To change the surface transparency, go to the Setting drop-down menu and select Transparency, then Surface. Set it to 40%. Show the peptide in stick form with side chains and the main structure in cartoon beneath the surface map. What do the blue and red patches represent? (8 Points)

The blue and red patches represent regions that are predicted to be positively charged (blue) or negatively charged (red). The more intense the patch color, the more charged the region is.

c. Models and Programs: When you made the electrostatic map, a warning came up in the final dialog box:

Why do you think the program developers chose to include this warning? Why is it important to take into consideration the limitations of models and programs? (5 points)

The programmers include this warning to point out that the electrostatics are calculated in a vacuum. This is a crystal structure, and depending on the buffer conditions in which the protein was purified and crystallized (e.g. pH, salt), the calculated charges may not accurately reflect the protein as it actually was in the crystal.

This structure, like all the other structures in the PDB, is a model. It is important to remember that protein models are only representations of protein structures. While they may be very accurate, these structures have been determined experimentally, and even the best experiments have sources of error. The best way to use any model is to be aware of its limitations. Likewise, when using different analysis tools, it is important to understand what they are actually showing you in order to avoid making assumptions that may not be correct.

d. Peptide Binding: One MHC I protein is able to bind many different peptides 8-9 residues long with very different sequences. T cell receptors that bind to MHC I with peptides identify the latter based on the way the peptide bulges out of the pocket. Look closely at the binding pocket. What are the biochemical properties of the residues that make up the peptide? Based on the biochemical character of the peptide and your electrostatic map of the binding groove, what kind of protein-protein interaction is this? (5 points)

The peptide is made up primarily of hydrophobic residues.

While there is a positively charged patch at one end of the groove, most of the groove is not strongly charged. You would expect this to be a predominantly hydrophobic interaction.

e. Salt Sensitivity: Would you expect the interaction between this MHC I and peptide to be sensitive to small changes in salt concentration? Propose an experiment you could do to check salt sensitivity of the interaction. (7 points)

Electrostatic interactions, which depend on charges, are more likely to be affected by small changes in salt. This is because raising or lowering the salt concentration makes ions in solution more or less available to interact with the charged atoms, potentially disrupting or stabilizing interactions that may be necessary for the protein-protein interaction. Because this interaction is driven mainly by hydrophobic interactions, it should not be strongly affected by small changes in salt concentration.

It is worth pointing out that large changes in salt concentration do affect both hydrophobic and electrostatic interactions (e.g. the salting-out effect).
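The salt dependence of electrostatic interactions can be illustrated with the Debye screening length, the distance over which dissolved ions damp the interaction between charges. The sketch below is an illustration that goes beyond the problem set itself. It uses the common approximation of 0.304/√I nm for a 1:1 salt in water at 25 °C, with I the ionic strength in mol/L, and the concentrations are arbitrary example values.

```python
import math

# Approximate prefactor for a 1:1 electrolyte in water at 25 C (assumption).
DEBYE_PREFACTOR_NM = 0.304

def debye_length_nm(ionic_strength_molar):
    """Approximate Debye screening length in nanometers."""
    if ionic_strength_molar <= 0:
        raise ValueError("ionic strength must be positive")
    return DEBYE_PREFACTOR_NM / math.sqrt(ionic_strength_molar)

# Example salt concentrations in mol/L (arbitrary illustrative values).
for concentration in (0.05, 0.15, 0.5):
    print(f"{concentration:.2f} M -> {debye_length_nm(concentration):.2f} nm")
```

The screening length shrinks as salt increases, so charge-charge contacts weaken with added salt, whereas a predominantly hydrophobic interface is expected to be far less sensitive over the same range.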

One experiment you could do is to look at complex formation between the peptide and the MHC molecule at increasing salt concentrations. The formation of a complex could be assessed using a gel filtration column, which separates proteins based on hydrodynamic radius, with larger complexes eluting from the column earlier and smaller ones eluting later. You would expect the peptide and the MHC molecule to elute in the same fraction if they have formed a complex. By analyzing the fractions from the column using SDS-PAGE and a protein stain, it is possible to see whether the peptide and MHC protein are in the same fraction. By repeating this experiment in buffers of increasing salt concentration, it would be possible to assess whether the interaction can be broken at higher salt concentrations.
