Protein Folding and Structure Prediction

A Statistician's View

Ingo Ruczinski
Department of Biostatistics, Johns Hopkins University

Proteins

[Figure: amino acids without peptide bonds, and amino acids with peptide bonds.]

Amino acids are the building blocks of proteins.

Proteins

[Figure: two renderings of the bacterial protein L.]

Both figures show the same protein (the bacterial protein L). The right figure also highlights the secondary structure elements.

Space

[Figure: length scale, distance in Å on a logarithmic axis from 1 to 100000, with 1 nm and 1 µm marked. Landmarks shown: C−C bond, glucose, hemoglobin, ribosome, resolution limit of a light microscope, bacterium, red blood cell.]

Energy

[Figure: energy scale in kcal/mol on a logarithmic axis from 0.1 to 1000. Landmarks shown: thermal, noncovalent bond, ATP, green light, C−C bond, glucose.]

Non-Bonding Interactions

Amino acids of a protein are joined by covalent bonding interactions. The polypeptide is folded in three dimensions by non-bonding interactions. These interactions, which can easily be disrupted by extreme pH, temperature, pressure, and denaturants, are:

- Electrostatic Interactions (5 kcal/mol)
- Hydrogen-bond Interactions (3-7 kcal/mol)
- Van Der Waals Interactions (1 kcal/mol)
- Hydrophobic Interactions (10 kcal/mol)

The total inter-atomic force acting between two atoms is the sum of all the forces they exert on each other.

Energy Profile

[Figure: energy profile with the denatured state, the transition state, and the native state.]

Radius of Gyration of Denatured Proteins

Do chemically denatured proteins behave as random coils?

- The radius of gyration $R_g$ of a protein is defined as the root mean square distance from each atom of the protein to their centroid.
- For an ideal (infinitely thin) random-coil chain in a solvent, the average radius of gyration is a simple function of its length $n$: $R_g \propto n^{0.5}$.
- For an excluded volume polymer (a polymer with non-zero thickness and non-trivial interactions between monomers) in a solvent, the average radius of gyration satisfies $R_g \propto n^{0.588}$ (Flory 1953).
- The radius of gyration can be measured using small angle x-ray scattering.
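The definition of the radius of gyration and the random-coil scaling law can be checked numerically. The following is a minimal sketch, not the analysis behind the slides: the function names, the unit step length, the chain lengths, and the number of simulated chains are all assumptions chosen for illustration. It computes $R_g$ from coordinates, simulates ideal (non-self-avoiding) chains, and estimates the scaling exponent as the slope of log $R_g$ against log $n$.

```python
import numpy as np

def radius_of_gyration(coords):
    """Root mean square distance of the points to their centroid."""
    coords = np.asarray(coords, dtype=float)
    centroid = coords.mean(axis=0)
    return float(np.sqrt(((coords - centroid) ** 2).sum(axis=1).mean()))

def ideal_chain(n, rng):
    """Ideal chain of n unit-length steps in 3D (no excluded volume)."""
    steps = rng.normal(size=(n, 3))
    steps /= np.linalg.norm(steps, axis=1, keepdims=True)
    return np.vstack([np.zeros(3), np.cumsum(steps, axis=0)])

def scaling_exponent(lengths, rg_values):
    """Least-squares slope of log(Rg) against log(n)."""
    slope, _intercept = np.polyfit(np.log(lengths), np.log(rg_values), 1)
    return float(slope)

# Hypothetical simulation settings, chosen only for illustration.
rng = np.random.default_rng(0)
lengths = np.array([16, 32, 64, 128, 256])
mean_rg = [
    np.sqrt(np.mean([radius_of_gyration(ideal_chain(n, rng)) ** 2
                     for _ in range(200)]))
    for n in lengths
]
nu = scaling_exponent(lengths, mean_rg)
print(f"estimated exponent for the ideal chain: {nu:.3f}")
```

For the ideal chain the estimated exponent should come out close to 0.5. Reproducing the excluded-volume exponent would require simulating self-avoiding chains, which this sketch does not attempt.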
Radius of Gyration of Denatured Proteins

[Figure: $R_g$ [Å] against length [residues] on logarithmic axes, for proteins ranging from Angiotensin II to Creatine Kinase. Confidence interval for the slope: [0.579 ; 0.635].]

Deviations from Random Coil Behaviour

Are there site-specific deviations from random coil dimensions?

Förster Resonance Energy Transfer enables us to measure the distance between two dye molecules within a certain range. This can be used to study site-specific deviations from random coil dimensions in highly denatured peptides.

Deviations from Random Coil Behaviour

[Figure: number of photons against time, shown for time 0 to 1000 and for time 0 to 50.]

Deviations from Random Coil Behaviour

[Figure: number of green photons against number of red photons.]

We have two underlying distributions for the green and red photons:

- One stemming from a peptide only having a donor dye.
- One stemming from a peptide being properly tagged with a donor and an acceptor dye.

Assume a photon has probability $p_0$ of being red in the former situation, and $p_1$ in the latter.

Deviations from Random Coil Behaviour

Assume we observe $n_t$ photons at time point $t$. Then the number of red photons among them is a sum of $n_t$ Bernoulli($p$) trials, that is, Binomial($n_t$, $p$), where $p$ is either $p_0$ or $p_1$. Assume that the probability of observing photons from a peptide without an acceptor dye at any time is $\pi$, independent of the total number of photons observed. Let $x_t$ be the number of red photons. Then

$$P(x_t \mid n_t) = \pi \, P(x_t \mid n_t, p_0) + (1 - \pi) \, P(x_t \mid n_t, p_1) = \binom{n_t}{x_t} \left[ \pi \, p_0^{x_t} (1 - p_0)^{n_t - x_t} + (1 - \pi) \, p_1^{x_t} (1 - p_1)^{n_t - x_t} \right]$$

and hence

$$L(p_0, p_1, \pi) = \prod_t \binom{n_t}{x_t} \left[ \pi \, p_0^{x_t} (1 - p_0)^{n_t - x_t} + (1 - \pi) \, p_1^{x_t} (1 - p_1)^{n_t - x_t} \right].$$

Deviations from Random Coil Behaviour

[Figure: number of red photons against total number of photons.]

Deviations from Random Coil Behaviour

[Figure: distribution of $n_{red} / (n_{red} + n_{green})$ on the interval 0 to 1, with the estimate $\hat{p}_1 = 0.431$.]

Energy Profile

[Figure: energy levels of the denatured (D), transition (T), and native (N) states for wildtype and mutant, with $\Delta\Delta G_{T-D}$ and $\Delta\Delta G_{N-D}$ marked.]

The $\phi$-value is defined as the ratio $\Delta\Delta G_{T-D} / \Delta\Delta G_{N-D}$.
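The two-population photon model from the Deviations from Random Coil Behaviour slides can be illustrated with a small simulation. This is a sketch under stated assumptions, not the estimation procedure used for the slides: it treats the red photon count at each time point as a two-component binomial mixture and fits it with a standard EM algorithm. The symbols `p0`, `p1`, and `pi` (probability of red for donor-only peptides, probability of red for properly tagged peptides, and the donor-only fraction), the simulated values, and the function names are all hypothetical.

```python
import math
import numpy as np

def mixture_loglik(red, total, p0, p1, pi):
    """Log-likelihood of the two-component binomial mixture."""
    ll = 0.0
    for x, n in zip(red, total):
        log_choose = (math.lgamma(n + 1) - math.lgamma(x + 1)
                      - math.lgamma(n - x + 1))
        a = math.log(pi) + x * math.log(p0) + (n - x) * math.log(1 - p0)
        b = math.log(1 - pi) + x * math.log(p1) + (n - x) * math.log(1 - p1)
        ll += log_choose + np.logaddexp(a, b)
    return float(ll)

def em_fit(red, total, p0=0.1, p1=0.5, pi=0.5, iters=200):
    """EM estimates of (p0, p1, pi); starting values are arbitrary guesses."""
    red = np.asarray(red, dtype=float)
    total = np.asarray(total, dtype=float)
    eps = 1e-9
    for _ in range(iters):
        # E-step: posterior probability that a time point is donor-only.
        # The binomial coefficient cancels between the two components.
        la = np.log(pi) + red * np.log(p0) + (total - red) * np.log(1 - p0)
        lb = np.log(1 - pi) + red * np.log(p1) + (total - red) * np.log(1 - p1)
        w = 1.0 / (1.0 + np.exp(np.clip(lb - la, -700, 700)))
        # M-step: weighted proportions, kept strictly inside (0, 1).
        pi = float(np.clip(w.mean(), eps, 1 - eps))
        p0 = float(np.clip((w * red).sum() / (w * total).sum(), eps, 1 - eps))
        p1 = float(np.clip(((1 - w) * red).sum() / ((1 - w) * total).sum(),
                           eps, 1 - eps))
    return p0, p1, pi

# Simulated data with made-up parameters (not the measurements in the slides).
rng = np.random.default_rng(1)
total = rng.integers(50, 101, size=2000)
donor_only = rng.random(2000) < 0.3
red = rng.binomial(total, np.where(donor_only, 0.05, 0.43))
p0_hat, p1_hat, pi_hat = em_fit(red, total)
print(f"p0 = {p0_hat:.3f}, p1 = {p1_hat:.3f}, pi = {pi_hat:.3f}")
```

EM is used here only because it is a common way to fit such mixtures; direct numerical maximisation of `mixture_loglik` would be an equally reasonable choice.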
Energy Profile

[Figure: two energy profiles for wildtype and mutant. In the first, $\Delta\Delta G_{T-D} = \Delta\Delta G_{N-D}$; in the second, $\Delta\Delta G_{T-D} = 0$.]

If the part of the protein that contains the mutant amino acid is fully structured in the transition state, we have $\Delta\Delta G_{T-D} = \Delta\Delta G_{N-D}$ and hence $\phi = 1$.

If the part of the protein that contains the mutant amino acid has the same structure in the denatured state and the transition state, we have $\Delta\Delta G_{T-D} = 0$, and hence $\phi = 0$.

Chevron Plots

$$\Delta\Delta G_{T-D} = RT \left[ \log(k_f^{wildtype}) - \log(k_f^{mutant}) \right]$$

$$\Delta\Delta G_{N-D} = RT \left[ \log(k_f^{wildtype}) - \log(k_u^{wildtype}) - \log(k_f^{mutant}) + \log(k_u^{mutant}) \right]$$

[Figure: chevron plot, $\log(k_{obs})$ against denaturant concentration (GuHCl [M], 0 to 8), for the wildtype and the mutation I28A.]

$$\log(k_{obs}) = \log \left[ \exp\left( \log(k_f) + \frac{m_f \, C_{GuHCl}}{RT} \right) + \exp\left( \log(k_u) + \frac{m_u \, C_{GuHCl}}{RT} \right) \right]$$

More Chevron Plots

[Figure: chevron plots, $\log(k_{obs})$ against denaturant concentration (GuHCl [M]), for the mutations I28A, I28L, I28V, V55A, V55M, and V55T, and for the wildtype. Legend: UC Santa Barbara, Rice University, UC Berkeley.]

Variability

[Figure: $\phi$-values and $\Delta\Delta G_{N-D}$ values by lab (UC Santa Barbara, Rice University, UC Berkeley) for the pairs Wild type−I28A, Wild type−I28L, Wild type−I28V, I28A−I28L, I28A−I28V, I28L−I28V, Wild type−V55A, Wild type−V55M, Wild type−V55T, V55A−V55M, V55A−V55T, V55M−V55T.]

Variability

[Figure: between-lab $\phi$-value standard deviation $\hat{\sigma}_{lab}$ against average $\Delta\Delta G_{N-D}$ values. Points are numbered as follows.]

1. Wild type−I28A
2. Wild type−I28L
3. Wild type−I28V
4. I28A−I28L
5. I28A−I28V
6. I28L−I28V
7. Wild type−V55A
8. Wild type−V55M
9. Wild type−V55T
10. V55A−V55M
11. V55A−V55T
12. V55M−V55T

Variability

[Figure: $\Delta\Delta G_{T-D}$ on the vertical axis, from −10 to 10.]
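The chevron model and the two $\Delta\Delta G$ formulas from the Chevron Plots slide can be written out as a short sketch. Several things here are assumptions rather than facts from the slides: natural logarithms, a temperature of 298 K, the gas constant in kcal/(mol K), and every numerical rate constant and m-value, which are made up for illustration.

```python
import numpy as np

R = 1.987e-3  # gas constant in kcal/(mol K); the unit choice is an assumption

def log_kobs(conc, log_kf, m_f, log_ku, m_u, temperature=298.0):
    """Chevron model: log of the sum of the folding and unfolding arms."""
    rt = R * temperature
    return np.logaddexp(log_kf + m_f * conc / rt, log_ku + m_u * conc / rt)

def ddg_td(log_kf_wt, log_kf_mut, temperature=298.0):
    """Delta-Delta-G between transition and denatured state."""
    return R * temperature * (log_kf_wt - log_kf_mut)

def ddg_nd(log_kf_wt, log_ku_wt, log_kf_mut, log_ku_mut, temperature=298.0):
    """Delta-Delta-G between native and denatured state."""
    return R * temperature * (log_kf_wt - log_ku_wt - log_kf_mut + log_ku_mut)

def phi_value(log_kf_wt, log_ku_wt, log_kf_mut, log_ku_mut):
    """Ratio of the two Delta-Delta-G values; RT cancels in the ratio."""
    return (ddg_td(log_kf_wt, log_kf_mut)
            / ddg_nd(log_kf_wt, log_ku_wt, log_kf_mut, log_ku_mut))

# Hypothetical rate constants (natural log scale) for a wildtype and a mutant.
phi = phi_value(log_kf_wt=6.0, log_ku_wt=-2.0, log_kf_mut=5.0, log_ku_mut=0.0)

# Hypothetical chevron curve: the folding arm falls and the unfolding arm rises.
conc = np.linspace(0.0, 8.0, 81)
curve = log_kobs(conc, log_kf=6.0, m_f=-1.5, log_ku=-2.0, m_u=0.5)
print(f"phi = {phi:.3f}, chevron minimum near {conc[np.argmin(curve)]:.1f} M")
```

Because $\phi$ is a ratio, small values of $\Delta\Delta G_{N-D}$ in the denominator amplify measurement noise in the rate constants, which is consistent with the between-lab variability shown in the slides.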
[Figure legend: UC Santa Barbara, Rice University, UC Berkeley. Horizontal axis: $\Delta\Delta G_{N-D}$, from −10 to 10.]

Some Simulation

[Figures, three slides: $\hat{\sigma}_{lab}$ (0.0 to 3.5) against average $\Delta\Delta G_{N-D}$ values (2 to 12).]

Some More Simulations

[Figure: the distribution $D(\hat{\phi} \mid \phi, \Delta\Delta G)$, with $\hat{\phi}$ between 0 and 1 plotted against $\Delta\Delta G_{N-D}$ values from 0 to 13; the value 0.37 is marked on the $\hat{\phi}$ axis.]

Some More Simulations

[Figure: nine panels of $\hat{\phi}$ (0 to 1) against average $\Delta\Delta G_{N-D}$ values (1 to 12), labelled 0.1 through 0.9.]

Phi-Value Estimation

[Figure: panels labelled $\Delta\Delta G_{TD}$ and $\Delta\Delta G_{ND}$.]

[Equation garbled in the source; the surviving fragments involve $\Delta\Delta G_{TD}$, $\Delta\Delta G_{ND}$, and terms indexed by F, U, W, and M.]

Phi-Value Estimation

[Equation garbled in the source; the surviving fragments involve $\Delta\Delta G_{TD}$ and $\Delta\Delta G_{ND}$.]

[Figure: distribution of $\hat{\Phi}$, horizontal axis from 0.55 to 0.75.]

Phi-Value Estimation

[Equation garbled in the source.]

[Figure: $\Phi$ on an axis from 0.0 to 1.0.]

Evolution and Folding Kinetics

Are amino acids in proteins conserved because of folding kinetics?

- To what extent does natural selection act to optimize the details of protein folding kinetics?
- Is there a relationship between an amino acid's evolutionary conservation and its role in protein folding kinetics?

Some comments:

Our studies of sequence conservation among residues known to participate in the folding nuclei of all of the appropriately characterized proteins reported to date have not provided any evidence that highly conserved residues are more likely to participate in the protein folding nucleus than poorly conserved residues.
