Computational Modeling of Structural Heterogeneity in Folded Proteins

COMPUTATIONAL MODELING OF STRUCTURAL HETEROGENEITY IN FOLDED PROTEINS

A DISSERTATION SUBMITTED TO THE DEPARTMENT OF MECHANICAL ENGINEERING AND THE COMMITTEE ON GRADUATE STUDIES OF STANFORD UNIVERSITY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

Ankur Dhanik
August 2010

© 2010 by Ankur Dhanik. All Rights Reserved.
Re-distributed by Stanford University under license with the author.

This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License.
http://creativecommons.org/licenses/by-nc/3.0/us/

This dissertation is online at: http://purl.stanford.edu/vn290ds4143

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Jean-Claude Latombe, Primary Adviser

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Scott Delp, Co-Adviser

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Serafim Batzoglou

Approved for the Stanford University Committee on Graduate Studies.
Patricia J. Gumport, Vice Provost Graduate Education

This signature page was generated electronically upon submission of this dissertation in electronic format. An original signed hard copy of the signature page is on file in University Archives.

Abstract

Proteins are biomolecules that play a key role in a wide diversity of vital functions, such as metabolism and signal transmission. Each protein is a linear chain of amino acids that folds into a flexible three-dimensional structure. A protein’s flexibility is widely believed to be essential for its function. Indeed, proteins usually achieve their main functions by binding other molecules, called ligands.
Binding requires shape and chemical complementarity of the two molecules at their binding interface. Conformational selection theory suggests that the protein and the ligand exist as ensembles of continuously deforming conformations and that the most compatible conformations recognize each other and bind together. Binding conformations of proteins often differ significantly from non-binding ones. To understand a protein’s function, one must therefore be able to determine or predict these binding conformations.

Motion of a protein occurs at timescales that span several orders of magnitude. Thermal fluctuations, which occur in picoseconds, are small-amplitude, uncorrelated, harmonic motions of the individual atoms. In contrast, conformational deformations closely related to the protein’s function occur in microseconds to milliseconds. These slow deformations are usually large-scale, correlated, anharmonic motions that correspond to transitions between meta-stable states, such as binding and non-binding states. In this dissertation we are mainly interested in modeling the structural heterogeneity associated with such slow deformations.

This dissertation presents new computational methods to study the flexibility of folded proteins in the context of three important biological problems:

- Loop sampling: Loops between helices and/or strands are often highly flexible fragments of proteins that participate in binding sites. The problem is to determine whether a given protein loop can achieve a shape in which it may bind a specified ligand.
- Interpretation of electron density maps: X-ray crystallography is the most common experimental method to determine the folded structure of a protein. The experiment provides an electron density map (EDM) from which the positions of the protein atoms can be determined. However, when there is heterogeneity among the protein conformations present in the crystal, the EDM is blurred and extremely difficult to interpret. The problem is to model the structural heterogeneity present in the crystal.
- Determination of allosteric pathways: An allosteric protein is a protein whose shape changes when it binds an effector ligand at its allosteric site. This change alters the protein’s ability to bind another molecule at its functional site. The problem is to identify the sequences of side-chains through which the change in shape propagates from the allosteric site to the functional site.

Computational modeling of structural heterogeneity in the folded state of a protein is challenging, mainly because the protein’s conformation space is high-dimensional and its feasible motion space occupies a very small fraction of that space. Although our methods are specific to each of the three problems, they share the same sample-and-select approach: they combine efficient sampling algorithms, which represent structural heterogeneity in a folded protein by a collection of sampled conformations, with selection algorithms, which reliably pick the sampled conformations that provide a solution to the problem. In addition, they share several techniques, such as efficient kinematic modeling, fast collision detection to enforce van der Waals volume exclusion among atoms, and optimization. This dissertation demonstrates the power of geometric computation and efficient sampling for modeling structural heterogeneity in folded proteins.

Acknowledgements

I would like to thank my adviser, Professor Jean-Claude Latombe, for his invaluable guidance and support throughout my PhD. I have benefited immensely from the discussions we had on scientific concepts, research methodologies, and life in general. His worldwide mountaineering trips also gave me a unique opportunity to learn from his travel experiences. He is a role model for how to build a productive and joyful research career.
I would like to thank my co-adviser, Professor Scott Delp, for his guidance, which helped me achieve my PhD milestones. He is a very kind spirit. I would also like to thank my committee members, Professor Serafim Batzoglou, Professor Axel Brunger, Professor Gill Bejerano, Professor Russ Altman, and Professor Eric Darve, for their feedback on my research.

Most of the work in this dissertation was done in collaboration with Dr. Henry van den Bedem at the Joint Center for Structural Genomics. He always brought new insights into the research topics that are the focus of this dissertation. I have greatly benefited from his pointed critiques, which always helped in improving and refining my research. Thanks for all the help.

Many thanks to the members of the Latombe group (Peggy Yao, Guanfeng Liu, Ruixiang Zhang, Liangjun Zhang, Kris Hauser, Tim Bretl, Mitul Saha, Philip Fong, Nathan Marz, Ryan Propper, and Charles Kou), with whom I enjoyed discussions on research and general fun topics. Peggy Yao worked with me on some of the research presented in this dissertation and was very helpful throughout. Nathan, Ryan, and Charles worked with me on early software development that proved very useful later. The administrative assistant of the Latombe group, Alex Sandra Pinedo, generously helped me with many things that saved me a lot of time. Special thanks to her.

I would also like to thank my friends Benoit, Tarun, Menaka, Arjun, Gaurav, Supreet, Shloke, Anshika, Sachin, Renu, Deepak, Mini, Rajeev, Ujvalla, Nitin, Gauri, Tirthankar, Beatrice, Sonti, Desingh, Naveen, Ashok, and Musu, with whom I shared fun moments of my student life.

My parents, brother Vyom, sister Taru, and sister-in-law Shweta have been a constant source of love and support. My parents were always concerned about when I would finish my PhD, but they never lost their patience. Their unwavering confidence in my abilities was always reassuring.
I would also like to thank my in-laws, sister-in-law Niti, and brother-in-law Deepak for their love and best wishes.

This dissertation would not have been possible without the love and support of my wife Neha. She came into my life when I was entering the hectic final year of my PhD and provided all the support that allowed me to focus solely on research. She always helped me feel relaxed and provided constant motivation. Her valuable comments on my research reports, and especially on my PhD defense presentation, helped improve this dissertation. I share this dissertation with her.

Finally, I would like to thank Stanford University for providing me with the facilities and a conducive environment for research. I would also like to thank the Indian Institute of Technology Kanpur and the National University of Singapore for their contribution to my early technical education. This research was partially funded by NSF grant DMS-0443939 and by two research grants from the KAUST-Stanford Academic Excellence Alliance (AEA) program. Thanks for the support.

Contents

Abstract
Acknowledgements
1 Introduction
  1.1 Motivation and Goals
  1.2 Structural Heterogeneity of a Folded Protein
  1.3 Why Model Structural Heterogeneity?
  1.4 Computational Challenge
  1.5 Main Contributions
  1.6 Relation to Previous Work
2 Exploring the Motion Space of Protein Loops
  2.1 Related Work
  2.2 Seed Sampling Algorithm
    2.2.1 Sampling front/back-end conformations
    2.2.2 Sampling mid-portion conformations
    2.2.3 Placing side-chains
  2.3 Deformation Sampling Algorithm
    2.3.1 Overview
    2.3.2 Computation of a basis of the tangent space
    2.3.3 Selection of a direction in the tangent space
    2.3.4 Placing side-chains
  2.4 Collision Detection
  2.5 Experiments
    2.5.1 Seed sampling
    2.5.2 Deformation sampling
    2.5.3 Placements of side-chains
    2.5.4 Calcium-binding site prediction
  2.6 Conclusion
3 Modeling Structural Heterogeneity From X-ray Data
  3.1 Related Work
  3.2 SAMPLE-SELECT Algorithm
    3.2.1 Selection step
    3.2.2 Conformation sampling
  3.3 Side-chain Driven Heterogeneity
