Investigating the Structural Determinants of Electrostatic Binding Among Protein-Protein Complexes: a Systematic, Large-Scale Computational Study

Investigating the Structural Determinants of Electrostatic Binding among Protein-protein Complexes: A Systematic, Large-scale Computational Study Emma Nechamkin April 2012 Wellesley College Wellesley, MA Submitted in Partial Fulfillment of the Prerequisites for Honors in Chemistry Acknowledgements Writing my senior thesis has been a wonderful experience, and there are many people to whom I owe thanks for support and guidance. First and foremost, my brilliant advisor Mala Radhakrishnan has been incredibly supportive and understanding. In dorkier terms, her advising style is incredibly close to the hypothetical optimum. Without her constant guidance, motivation, and feedback, my thesis simply would not exist. I am truly grateful to be a student in her lab. Second, the faculty on my thesis committee, David Haines and Don Elmore of the Chemistry Department, and YuJin Ko of the English Department, have dedicated significant time to my thesis. In particular, the chemistry faculty on my committee have provided insight on much of my work and offered much needed encouragement. Third, I would like to thank the entire Wellesley College Chemistry Department. Clearly the best department on campus, the chemistry department has an unparal- leled involvement with and dedication to its students. At a sometimes stressful and overwhelming school like Wellesley, I am glad to have found my niche. Fourth, each member of the Radhakrishnan Lab over my time here has taught me something about research and listened to at least one incredibly long story I have shared. For the group's helpfulness, reassurance, and companionship, I am deeply grateful. In particular, Ying Yi Zhang has shared some wonderful code with me. Fifth, my friends and family have demonstrated bottomless love, giving me both valu- able feedback and the ability to take a break and relax. I'd especially like to thank my mother, who read through this thesis multiple times to check syntax, grammar, and standardization. Last, my funding sources since first year have nurtured my scientific growth and understanding and enabled my thesis to happen. The Sherman Fairchild Foundation, Janina A. Longtine '76 Fund, HHMI, and Wellesley College have all enabled me to pursue this research. 1 0.1 Abstract We seek to understand how Nature uses electrostatics in protein binding. Though electrostatics are but one component of the total binding free energy of protein complexes, electrostatics are commonly manipulated in design and are extremely relevant to protein recognition. This project aims to map out electrostatic contributions to protein-protein binding as they relate to structural components of proteins. Using a continuum electrostatic model on a large set of protein complexes, we aim to quantify the electrostatic contributions of protein structural elements, such as backbone, side chain, proximal, and distal residues toward binding free energy. This study addresses whether trends exist among proteins when correlated to monopole or size, and ex- amines specific cases of promiscuity and specificity in proteins. A large dataset of protein structures has elucidated statistically significant trends. 2 Contents 0.1 Abstract . .2 1 Introduction 6 1.1 Overview . .6 1.2 Characteristics of Protein Binding . .8 1.3 Computational Modeling and the Continuum Electrostatic Model . 11 1.3.1 The Math behind the Continuum Electrostatic Model . 13 1.3.2 Solving the LPBE Using the Finite Difference Method . 19 1.3.3 Limitations of the Model . 21 1.4 Component Analysis . 23 1.4.1 Basics of Component Analysis . 23 1.4.2 The Math behind Component Analysis . 24 1.5 Goals of This Study . 29 2 Methods 31 2.1 Selection and Preparation of Protein Crystal Structures . 32 2.2 Continuum Electrostatics Calculations . 33 2.3 Component Analysis Calculations . 34 2.4 Robustness of Methods . 35 2.5 Statistical Analysis . 36 3 3 Results 38 3.1 Quantifying the Electrostatic Contributions of Backbone and side chain to Binding . 38 3.1.1 Statistical Analysis of Data . 39 3.1.2 Representative Structures of Established Biological Interest . 41 3.1.3 The Difference in Side Chain and Backbone Contributions Rel- ative to Size . 44 3.1.4 Analyzing Hydrogen Bonding in a Component Analysis Frame- work................................ 45 3.2 The Relationship between Monopole and ∆G . 45 3.3 Considering Electrostatic Contributions of Distal and Local Compo- nents of Proteins . 46 4 Discussion 56 4.1 Analysis of Results . 56 4.2 Direction of Future Study . 57 4.2.1 Further Analysis of Contributions of Dimers . 57 4.2.2 Further Analysis of Monopole and Long-Range Electrostatics . 57 4.2.3 Comparisons to Hypothetical Optima . 58 4.2.4 Component Analysis in the Framework of Protein Secondary Structure and Fold . 59 4.2.5 Reconsidering Some Parameters . 59 4.2.6 Addressing Further Traits of Proteins . 59 5 Appendix 60 5.1 Notes on Scripts for Preparing Structures . 60 5.1.1 Steps in the Code of wrapper.pl . 60 4 5.1.2 Setting Up and Cleaning Up Quickly . 66 5.1.3 Other Caveats . 66 5.1.4 Sample Configuration File . 67 5.1.5 Included Parameters . 69 5.2 Notes on Scripts for Running Calculations . 72 5.2.1 Steps in the Code of wrapper script.sh . 72 5.2.2 Sample Template File . 75 5.2.3 Parameters Included . 75 6 Works Cited 77 5 Chapter 1 Introduction 1.1 Overview Proteins are the main machinery of cells and function in almost all aspects of cellular activity, including immune defense, transport among and within cells, cellular messaging, and catalysis of cellular reactions. Such functions often occur via protein interactions that are mediated by protein recognition. Protein interactions are targeted|a protein has a binding partner, often with each binding interaction serving as a step within a larger interaction network [1]. Thus, studying protein recognition is crucial to understanding cellular function. One way proteins recognize each other is via electrostatic interactions, or interactions among charged and polar parts of proteins. Electrostatics are easily manipulated and studied, and can be modeled fairly accurately. Determining which parts of different types of proteins contribute most to the electrostatic free energy of binding may elucidate Nature's own underlying principles of molecular recognition. Using computational methods, this study will quantify electrostatic contributions of structural elements of proteins towards the electrostatic component of binding free 6 energy within a large and diverse set of protein-protein complexes. By gathering and analyzing the data acquired through this large scale study, we will begin to address the following questions: • Do proteins of higher overall charge (monopole) use electrostatics more than others? • Is electrostatically-mediated binding favored in certain protein folds? • How do electrostatic interactions differ between specific and promiscuous kinds of proteins? • What are the relative contributions of distal and local parts of proteins to electrostatic binding free energy? • Do electrostatics play different roles in proteins of different sizes? • Do backbone and side chain components of residues play similar or different roles in binding electrostatics? Taken together, the answers to these questions may shed insight on the structural determinants of molecular recognition. To address these questions, we will use a large set of categorized proteins and a technique that allows the determination of electrostatic interactions from particular parts of each protein. This introduction will first provide a simple review of proteins and a brief review of literature related to our experimental goals. Then, the theory behind our model, the numerical solution to our model, and calculations on our model will be described in detail. 7 1.2 Characteristics of Protein Binding Proteins are organizable polymers of monomeric amino acids. Amino acids have variable regions, called side chains, and invariable regions that join through peptide bonds to form a protein backbone. There are 20 naturally occurring amino acids whose side chains have dis- tinct characteristics: polar, non- polar, acidic, or basic (Figure 1.2). In a complete protein, every amino acid is called a residue, and the Figure 1.1: The Structure of Amino Acids sum of all residues' charges, the In this figure, the parts of an amino acid are shown. Backbone atoms are colored in blue and monopole. side chain parts, magenta. The names of pos- sible side chains and their characteristics are This study focuses on electro- shown under the figure of the monomer. static binding free energies of protein complexes. Though electrostatic energetics are but one facet of the binding energies of proteins, which also include hydrophobic and Van der Waal's forces, they are of particular importance for several reasons. First, amino acids can carry charges. Thus, electrostatics can play a large role in the interactions of proteins both inter- and intra-molecularly [2]. Precise electrostatic interactions are often essential for protein binding and mediate specific interactions between partners. For example, both anti-hen egg white lyzosyme-(HEL)-antibody- HEL complexes, as well as barnase-barstar complexes, are electrostatically regulated 8 [3]. More generally, the electrostatic interactions between charged and polar residues on binding interfaces often stabilizes binding. Second, electrostatic interactions have both long- and short-range effects. Locally, electrostatic forces govern H-bonds and ion pairs [3]. Other short-range electrostatic interactions, like salt-bridges, also contribute to protein conformation [4, 5]. Similarly, long-range electrostatic interactions are important to protein binding as well. For example, \action-at-a-distance" interactions occurring at least 7 A˚ away from the binding interface have been shown to contribute to proteins' abilities to modulate binding tightness and other functions, including catalysis [6]. Third, electrostatic interactions can help mediate specificity and promiscuity. Some proteins bind many targets and are called promiscuous, like trypsin. Trypsin is a serine protease that functions in digestion and promiscuity allows it to recognize and cleave many proteins, and thus function in digestion.

Investigating the Structural Determinants of Electrostatic Binding Among Protein-Protein Complexes: a Systematic, Large-Scale Computational Study

Backbone Dynamics of Free Barnase and Its Complex with Barstar Determined by 15N NMR Relaxation Study

Characterization of in Vitro Oxidized Barstar

The Folding of Groel-Bound Barnase As a Model for Chaperonin-Mediated Protein Folding (Chaperone/Protein Engineering)

FOR Environmental Release of Genetically Engineered Mustard

A General Approach for the Generation of Orthogonal Trnas

Part II – Summary

Identification and Characterization of Protein Folding Intermediates Stefano Gianni, Ylva Ivarsson, Per Jemth, Maurizio Brunori, Carlo Travaglini-Allocatelli

Spectrin SH3 Domain Xavier Periole,1* Michele Vendruscolo,2 and Alan E

Recombinant Expression of Barnase in Escherichia Coli and Its

Molecular Constructor on the Basis of Barnase–Barstar Module

Adaptation of E. Coli Towards Tryptophan Analog Usage

Using Stochastic Roadmap Simulation to Predict Experimental Quantities in Protein Folding Kinetics: Folding Rates and Phi-Values