Rotamer-Specific Statistical Potentials for Protein Structure Modeling

Rotamer-specific statistical potentials for protein structure modeling by Jungkap Park A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Mechanical Engineering) in the University of Michigan 2013 Doctoral Committee: Professor Kazuhiro Saitou, Chair Associate Professor Angela Violi Assistant Professor Matthew Young Associate Professor Yang Zhang © 2013 Jungkap Park All Rights Reserved Acknowledgements First and foremost, I would like to thank my advisor, Kazuhiro Saitou, for his gentle guidance, great inspiration, and financial support throughout my graduate study. Whenever I got lost in my research or frustrated with unexpected difficulties, he was always patient and understanding. He gave me much freedom to explore diverse research areas and to develop myself independently. I sincerely thank my dissertation committee members, Yang Zhang, Angela Violi and Matthew Young for their time, advice, and patience through the doctoral program. I wish to express my thanks to Naesung Lyu, who first introduced me my advisor, Kazu and helped me to start smoothly my graduate study in the lab. I have been lucky to great lab members, Karim Hamza, Mohammed Shalaby, Jihun Kim and Jean Chu. I have had the opportunity to work with great collaborators. Gus Rosania and Ye Li helped me to work on ChemReader project. Although it’s not directly involved in my thesis, it helped me a lot to build research insight and skills. I have been very lucky to meet great people in Michigan. I thanks to Jonghwa Yoon, Janghee Jeong, Seungjea Lee, Seunghwan Lee, Jeongseok Kim, Dongsuk Kum, Jonggirl Ok, Joosup Lim, Donghoon Song, Minjang Jin, Kyung-eun Lee, Youngki Kim, Youjin Choi, Mingoo Seok, Kyungjoon Lee, JaesunSeo, Seoungchul Yang, InhaPaick, Minjoong Kim, Jinyoung Kim, ii Sanghun Lee, Soohyung Park, Nayoung Park, Myoungdo Chung and Hyoncheol Kim. I will remember all the great memories made during my time in Michigan. Last, but definitely not least, I deeply thanks to my family (parents, wife and sisters, sister’s husband) for their patience and long-term support in my pursuit of this degree. The biggest thank-you goes to my wife Hyunjoo, who has always encouraged me and showed her faith in my potential. Without their supports and loves, I would not have been able to go all along my winding road through the program. iii Table of Contents Acknowledgements ......................................................................................................................... ii List of Tables ................................................................................................................................ vii List of Figures ................................................................................................................................ ix Abstract ........................................................................................................................................ xiii Chapter 1 : Introduction ............................................................................................................. 1 1.1 Background ...................................................................................................................... 1 1.2 Motivation ........................................................................................................................ 4 1.3 Thesis Goal ....................................................................................................................... 5 1.4 Thesis Outline .................................................................................................................. 6 Chapter 2 : Rotamer-specific Environmental Features for Computational Protein Design ...... 8 2.1 Abstract ............................................................................................................................ 8 2.2 Introduction ...................................................................................................................... 9 2.3 Materials and Methods ................................................................................................... 13 2.3.1 Preparation of PDB structures ................................................................................ 14 2.3.2 Grid-based description of the local steric environment of residues ........................ 15 2.3.3 Analysis of the relationship between the rotameric state and the local steric environment of residues ........................................................................................................ 18 2.3.4 Rotamer-specific environmental feature ................................................................. 20 2.3.5 Scoring function ...................................................................................................... 21 2.3.6 Test of side-chain prediction ................................................................................... 26 2.4 Results and Discussion ................................................................................................... 26 iv 2.4.1 Grid-based description of the local steric environment of residues ........................ 26 2.4.2 Relationship between the rotameric state and the local steric environment of residues. ................................................................................................................................ 28 2.4.3 Rotamer-specific environmental features ............................................................... 31 2.4.4 Derived scoring function......................................................................................... 34 2.4.5 Test of side-chain prediction ................................................................................... 35 2.5 Conclusion ...................................................................................................................... 41 Chapter 3 : A Rotamer-dependent, Atomic Statistical Potential for Assessment and Prediction of Protein Structures ..................................................................................................................... 43 3.1 Abstracts ......................................................................................................................... 43 3.2 Introduction .................................................................................................................... 44 3.3 Materials and Methods ................................................................................................... 47 3.3.1 Derivation of ROTAS ............................................................................................. 47 3.3.2 Defining the local structural environment .............................................................. 50 3.3.3 Construction of distance-dependent pairwise potential .......................................... 52 3.3.4 Construction of orientation-dependent pairwise potential ...................................... 53 3.3.5 Interaction cutoff for the ROTAS potential ............................................................ 56 3.3.6 Preparation of PDB structures ................................................................................ 57 3.3.7 Performance evaluation using decoy sets ............................................................... 57 3.4 Results and Discussion ................................................................................................... 58 3.4.1 The influence of rotameric states on atomic interactions ....................................... 58 3.4.2 Native structure recognition .................................................................................... 61 3.4.3 Best model selection ............................................................................................... 64 3.4.4 Correlation between the energy score and decoy model quality ............................ 67 3.4.5 Interaction cutoff effect on the performance .......................................................... 69 v 3.4.6 Different reference states for distance-dependent pairwise potential ..................... 71 3.5 Conclusion ...................................................................................................................... 72 Chapter 4 : Side-chain modeling with ROTAS ....................................................................... 74 4.1 Abstracts ......................................................................................................................... 74 4.2 Introduction .................................................................................................................... 75 4.3 Materials and Methods ................................................................................................... 76 4.3.1 Rotamer library ....................................................................................................... 76 4.3.2 Scoring function ...................................................................................................... 76 4.3.3 Search method ......................................................................................................... 78 4.3.4 Evaluation ............................................................................................................... 79 4.3.5 Training and testing protein sets ............................................................................. 79 4.4 Results and Discussion ................................................................................................... 80 4.5

Rotamer-Specific Statistical Potentials for Protein Structure Modeling

Bioinformatics 1: Lecture 3

BASS: Approximate Search on Large String Databases

Computational Biology Lecture 8: Substitution Matrices Saad Mneimneh

B.Sc. (Hons.) Biotech BIOT 3013 Unit-5 Satarudra P Singh

Parameter Advising for Multiple Sequence Alignment

Pairwise Statistical Significance of Local Sequence Alignment Using Sequence-Specific and Position-Specific Substitution Matrices

Deriving Amino Acid Exchange Matrices (II) and Multiple Sequence Alignment (I) Summarysummary Dayhoff’Sdayhoff’S PAMPAM--Matricesmatrices

Pairwise Sequence Alignment Algorithm by a New Measure Based

Scoring Matrices for Sequence Comparisons

Multiple Sequence Alignment

Practical Considerations of Working with Sequencing Data File Types

Amino Acid Substitution Matrices from Protein Blocks (Amino Add Sequence/Alignment Algorithms/Data Base Srching) STEVEN HENIKOFF* and JORJA G