Computational Protein Design: Assessment and Applications

COMPUTATIONAL PROTEIN DESIGN: ASSESSMENT AND APPLICATIONS

Zhixiu Li

Submitted to the faculty of the University Graduate School in partial fulfillment of the requirements for the degree Doctor of Philosophy in the School of Informatics and Computing, Indiana University

May 2015

Accepted by the Graduate Faculty, Indiana University, in partial fulfillment of the requirements for the degree of Doctor of Philosophy.

Doctoral Committee:
Yunlong Liu, PhD, Chair
Huanmei Wu, PhD
Samy Meroueh, PhD
Yaoqi Zhou, PhD

November 24, 2014

© 2015 Zhixiu Li

DEDICATION

Dedicated to my family and friends.

ACKNOWLEDGEMENTS

I would like to thank my research committee members, my family, and my friends for their help during my Ph.D. study.

Foremost, I would like to express my sincere gratitude to my advisor, Prof Yaoqi Zhou. He has provided me with excellent research advice, great insights, enthusiasm, and encouragement throughout all my research projects. His guidance and support helped me at every stage of my research and dissertation writing. I would never have been able to finish my dissertation without his help.

Besides my advisor, I would like to extend my sincerest thanks and appreciation to the rest of my research committee, Profs Yunlong Liu, Samy Meroueh, and Huanmei Wu, for all the useful discussions, encouragement, and insightful comments.

I also want to thank my colleagues, collaborators, teachers, and classmates in various projects. I shared a great time with them while learning from them. They are Profs Qizhuang Ye, Song Liu, Shiaofen Fang, Mohammad Al Hasan, James H. Hill, Jihua Wang, Drs Yuedong Yang, Huiying Zhao, Jian Zhan, Tuo Zhang, Liang Dai, Eshel Faraggi, Wenchang Xiang, Hui Huang, Md Tamjdul Hogue, Mr. Arthur Liu, Mr. Haoyu Cheng, Mr. Liang-Chin Huang, and others.
Last but not least, I would like to thank my family for their love and unconditional support throughout my life.

Zhixiu Li

COMPUTATIONAL PROTEIN DESIGN: ASSESSMENT AND APPLICATIONS

Computational protein design aims at designing amino acid sequences that can fold into a target structure and perform a desired function. Many computational design methods have been developed, and they have been applied successfully during the past two decades. However, the success rate of protein design remains too low for it to be a useful tool for biochemists who are not experts in computational biology. In this dissertation, we first developed novel computational assessment techniques and used them to assess several state-of-the-art computational design techniques. We found that significant progress was made in several important measures by two new scoring functions, from RosettaDesign and from OSCAR-design, respectively. We also developed the first machine-learning technique, called SPIN, that predicts a sequence profile compatible with a given structure using a novel nonlocal energy-based feature. The accuracy of the predicted sequences is comparable to that of RosettaDesign in terms of sequence identity to wild-type sequences. In the last two application chapters, we designed self-inhibitory peptides of Escherichia coli methionine aminopeptidase (EcMetAP) and de novo designed barstar. Several peptides were confirmed to inhibit EcMetAP with 50% inhibitory concentrations in the micromolar range. Meanwhile, the assessment of the designed barstar sequences indicates that OSCAR-design improves on RosettaDesign.

Yunlong Liu, PhD, Chair

Contents

List of Tables
List of Figures
List of Equations
Chapter 1 Introduction
  1.1 Protein: From Sequence to Structure
  1.2 Computational Protein Design
    1.2.1 Searching Algorithm
    1.2.2 Energy Function
  1.3 Overview of the Dissertation
Chapter 2 Energy Functions in De Novo Protein Design
  2.1 Abstract
  2.2 Introduction
  2.3 De Novo Designed and Structurally Validated Proteins
  2.4 Origin of Low Success Rate in Protein Design
  2.5 Energy Function in Protein Design
    2.5.1 RosettaDesign Energy Function
    2.5.2 EGAD Energy Function
    2.5.3 Liang-Grishin Energy Function
    2.5.4 Balancing Nonlocal and Local Interactions
    2.5.5 RosettaDesign-SR Energy Function
  2.6 Computational Assessment of Designed Proteins
    2.6.1 Sequence Assessment: Native Sequence Recovery
    2.6.2 Local Assessment: Secondary Structure Recovery
    2.6.3 Local Assessment: Intrinsic Disorder
    2.6.4 Surface Assessment: Solvent Accessibility Recovery
    2.6.5 Surface Assessment: Hydrophobic Patch
    2.6.6 Packing Assessment: Total Accessible Surface Area
    2.6.7 Global Structure Assessment
    2.6.8 Summary
  2.7 Community-wide Scoring Function Assessment
  2.8 Current Challenges and Future Prospects
Chapter 3 Assessment of Novel Energy Functions for Design
  3.1 Introduction
  3.2 Results
    3.2.1 Sequence Assessment: Native Sequence Recovery
    3.2.2 Local Assessment: Secondary Structure Recovery
    3.2.3 Local Assessment: Predicted Intrinsic Disorder and Low Complexity Residues
    3.2.4 Surface Assessment: Solvent Accessibility Recovery
    3.2.5 Surface Assessment: Hydrophobic Patch
    3.2.6 Packing Assessment: Total Accessible Surface Area
    3.2.7 Global Structure Assessment
  3.3 Conclusion
Chapter 4 Direct Prediction of the Profile of Sequences Compatible to a Protein Structure by Neural Networks with Fragment-Based Local and Energy-Based Nonlocal Profiles
  4.1 Abstract
  4.2 Introduction
  4.3 Methods
    4.3.1 Datasets
    4.3.2 Neural Network
    4.3.3 Input Features
    4.3.4 Output Layer
    4.3.5 Ten-fold Cross Validation and Independent Test
    4.3.6 Performance Evaluation
    4.3.7 RosettaDesign
