Bcl::Fold - De Novo Protein Structure Prediction by Assembly Of
Total Page:16
File Type:pdf, Size:1020Kb
BCL::FOLD - DE NOVO PROTEIN STRUCTURE PREDICTION BY ASSEMBLY OF SECONDARY STRUCTURE ELEMENTS By Mert Karakaş Dissertation Submitted to the Faculty of the Graduate School of Vanderbilt University In partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY in Chemical and Physical Biology December, 2011 Nashville, Tennessee Approved: Professor Jens Meiler Professor Albert Beth Professor Phoebe Stewart Professor Charles Sanders Professor Brandt Eichman To Gülfem, my parents and my brother ii ACKNOWLEDGEMENTS I am deeply grateful to my advisor Dr. Jens Meiler for his support throughout my graduate career. Being one of the first members of his laboratory, I had the great chance to work on a very challenging and large scale project from the ground up. He has provided very valuable input as well as showing great confidence in me which allowed me to grow as an independent researcher. Last six years has been a great learning experience for me. I would like to thank my colleagues, specifically Nils Woetzel and Nathan Alexander who have been invaluable in every part of my thesis project. I would also like to thank Dr. Rene Staritzbichler and Dr. Brian Weiner for their scientific contributions to BCL::Fold project. I would also like to acknowledge the members of my thesis committee Dr. Al Beth, Dr. Phoebe Stewart, Dr. Chuck Sanders and Dr. Brandt Eichman. It has not always been easy in graduate school. Fortunately I was lucky to have great friends who provided great companionship in this treacherous journey. I would like to especially thank Andrew Morin, Yoana Dimitrova, Kazım Tuncay Tekle, Can Envarlı and Cem Albayrak. I would like to show my greatest gratitude to my loving family; my mother Ufuk Karakaş, my father Mehmet Karakaş and my brother Murat Karakaş. I feel extremely fortunate to have such a great family. They have been the inspiration for me every day in my life to pursue my dreams and to become the person I am today. They have provided me with constant love and support all my life. iii Finally, I would like to thank the love of my life, my better half and my best friend Gülfem Güler. She has always been there for me with her never-ending love, understanding and encouragement. None of this would have been possible without her. iv TABLE OF CONTENTS ACKNOWLEDGEMENTS ........................................................................................... iii LIST OF TABLES ........................................................................................................ ix LIST OF FIGURES .........................................................................................................x LIST OF ABBREVIATIONS ........................................................................................ xi SUMMARY ................................................................................................................. xii I. INTRODUCTION .....................................................................................................1 Protein Structure ..................................................................................................1 Protein Structure Determination ...........................................................................2 Protein Structure Prediction .................................................................................2 Protein Structure Comparison Methods ................................................................5 Template Based Protein Structure Prediction ........................................................6 De novo Protein Structure Prediction ...................................................................7 Protein Structure Prediction using Limited Experimental Restraints .....................9 BCL::Fold .......................................................................................................... 10 Contact Prediction .............................................................................................. 13 BCL::Contact ..................................................................................................... 14 BioChemistry Library ........................................................................................ 15 II. BCL::CONTACT – LOW CONFIDENCE FOLD RECOGNITION HITS BOOST PROTEIN CONTACT PREDICTION AND DE NOVO STRUCTURE DETERMINATION ................................................................................................ 16 Introduction ....................................................................................................... 16 Methods ............................................................................................................. 19 Contact Definitions and Contact Types ......................................................... 19 Protein Data Sets and Training Procedures ................................................... 20 Numerical Representation ............................................................................ 21 ANN Training and ROC Curve Analysis ...................................................... 25 Rosetta Model Building Guided by BCL::Contact ........................................ 25 Enrichment of native-like de-novo models ................................................... 26 RMSD and MAXN% Distributions of de-novo Models ................................ 27 Results and Discussion ....................................................................................... 28 v The Sequence-based mode correctly predicts 42% of native contacts with a 7% false positive rate while structure-based mode correctly predicts 45% of native contacts with a 2% false positive rate. .................................... 28 The Structure-based mode has been ranked as one of the three best methods in CASP6. ...................................................................................... 32 The Sequence-based mode predicted long distance contacts in CASP7 with up to 40% accuracy. ............................................................................. 33 BCL::Contact induces up to 5 Å shift in average RMSD distributions and up to 26% shift in average MAXN% distributions when guiding de-novo folding.......................................................................................................... 36 BCL::Contact enriches for native-like models by factors of up to five .......... 40 Structure-based Contact Prediction Outperforms Sequence-based Contact Prediction even for hard Fold-Recognition Targets ....................................... 41 Conclusion ......................................................................................................... 42 III. BCL::SCORE - KNOWLEDGE BASED ENERGY POTENTIALS FOR PROTEINS WITH IDEALIZED SECONDARY STRUCTURE REPRESENTATION FOR DE NOVO PROTEIN STRUCTURE PREDICTION .... 44 Introduction ....................................................................................................... 44 Results ............................................................................................................... 47 A database of 4379 chains and 3409 protein structures covers the space of topologies seen in the PDB ....................................................................... 47 The inverse Boltzman relation converts statistics into free energy functions ...................................................................................................... 48 Amino acid pair distance potential ................................................................ 48 Amino acid environment potential ................................................................ 50 Loop length potential.................................................................................... 51 β-Strand pairing potential ............................................................................. 53 Secondary structure element packing potential ............................................. 54 Contact order score....................................................................................... 56 Radius of gyration potential.......................................................................... 58 Phi Psi backbone potential ............................................................................ 60 Amino acid clash, SSE clash and Loop closure potentials ............................. 60 Amino acid pair clash ................................................................................... 61 SSE clash potential ....................................................................................... 61 Loop closure potential .................................................................................. 62 53 protein model sets have been generated using Rosetta, BCL::Fold and perturbation ........................................................................................... 64 vi Enrichment can evaluate the performance of an energy potential .................. 65 Benchmark enrichment of native like structures through potentials ............... 66 Discussion.......................................................................................................... 68 Knowledge based potentials resemble first principles of physics and chemistry ..................................................................................................... 68 Secondary structure packing resembles possible geometric arrangements ..... 69 Size dependent radius of gyration measure discriminates for compact structures .....................................................................................................