Data-Driven Methods and Models for Predicting Protein Structure Using Dynamic Fragments and Rotamers

Data-driven Methods and Models for Predicting Protein Structure using Dynamic Fragments and Rotamers Steven J. Rysavy A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy University of Washington 2014 Reading Committee: Valerie Daggett, Chair James Brinkley Ira Kalet Program Authorized to Offer Degree: Biomedical Informatics and Medical Education UMI Number: 3618332 All rights reserved INFORMATION TO ALL USERS The quality of this reproduction is dependent upon the quality of the copy submitted. In the unlikely event that the author did not send a complete manuscript and there are missing pages, these will be noted. Also, if material had to be removed, a note will indicate the deletion. UMI 3618332 Published by ProQuest LLC (2014). Copyright in the Dissertation held by the Author. Microform Edition © ProQuest LLC. All rights reserved. This work is protected against unauthorized copying under Title 17, United States Code ProQuest LLC. 789 East Eisenhower Parkway P.O. Box 1346 Ann Arbor, MI 48106 - 1346 © Copyright 2014 Steven J. Rysavy University of Washington Abstract Data-driven Methods and Models for Predicting Protein Loop Structure using Dynamic Fragments and Rotamers Steven J. Rysavy Chair of the Supervisory Committee: Professor Valerie Daggett Bioengineering Proteins play critical roles in cellular processes. A protein’s conformation directly relates to its biological function and, consequently, determination of such structure can provide great insight into a protein’s function. Using a computational technique called molecular dynamics (MD), we are able to simulate and observe protein dynamics at a much higher temporal and spatial resolution than allowed by experimental methods. Dynameomics is a research endeavor that uses MD to produce thousands of protein simulations, resulting in hundreds of terabytes of data. Using novel visual analytics techniques, we have mined the Dynameomics data warehouse for data on protein backbone segments and side-chain behavior, called fragments and rotamers, respectively. Knowledge derived from these dynamic fragments and rotamers was used to improve the quality of protein loop structure predictions. We have created novel data models to store, analyze and compare fragments and side-chain rotamers, then developed methods to predict loop structures with information inferred from these data models. Protein loop regions predicted from these fragments and rotamers produce biologically relevant structures that improve upon current protein loop prediction methods. In conjunction with the fragment and rotamer research, we produced a novel visual analytics framework called DIVE, a Data Intensive Visualization Engine. This software has been instrumental in advancing our bioinformatics research, but it is a general- purpose framework applicable to a wide range of big data problems. 1 TABLE OF CONTENTS Chapter 1: Protein Structure Prediction .................................................................................... 7 1.1 Visual Analytics .............................................................................................................. 9 1.2 Software Engineering.................................................................................................... 10 1.3 Protein Fragments ......................................................................................................... 10 1.4 Amino Acid Side Chain Rotamers ................................................................................ 11 1.5 Continuing Research ..................................................................................................... 11 1.6 Conclusions ................................................................................................................... 12 Chapter 2: DIVE: A Graph-Based Visual Analytics Framework for Big Data .................... 14 2.1 Summary ....................................................................................................................... 14 2.2 Contributions................................................................................................................. 14 2.3 Introduction ................................................................................................................... 14 2.4 The DIVE Architecture ................................................................................................. 15 2.5 Object Parsing ............................................................................................................... 22 2.6 Scripting ........................................................................................................................ 24 2.7 Data Streaming.............................................................................................................. 25 2.8 Case Study .................................................................................................................... 26 2.9 Discussion ..................................................................................................................... 28 2.10 Conclusions ................................................................................................................... 30 Chapter 3: DIVE: A Data Intensive Visualization Engine ..................................................... 39 3.1 Summary ....................................................................................................................... 39 3.2 Contributions................................................................................................................. 39 3.3 Introduction ................................................................................................................... 39 3.4 System and Implementation .......................................................................................... 40 3.5 Results ........................................................................................................................... 41 3.6 Case Studies .................................................................................................................. 43 3.7 Conclusions ................................................................................................................... 45 Chapter 4: The Dynameomics API: An Application Programming Interface for Molecular Dynamics Simulations ................................................................................................................ 56 4.1 Summary ....................................................................................................................... 56 4.2 Introduction ................................................................................................................... 56 4.3 An Object Oriented Design for MD Simulations and Experimental Structures ........... 58 4.4 Analysis Libraries ......................................................................................................... 65 4.5 Structural Libraries ....................................................................................................... 65 4.6 Implementation Details ................................................................................................. 68 4.7 Conclusions ................................................................................................................... 69 Chapter 5: Dynameomics: Data-Driven Methods and Models for Utilizing Large-Scale Protein Structure Repositories for Improving Fragment-Based Loop Prediction ............... 74 5.1 Summary ....................................................................................................................... 74 2 5.2 Introduction ................................................................................................................... 75 5.3 Results ........................................................................................................................... 78 5.4 Discussion ..................................................................................................................... 82 5.5 Methods and Materials .................................................................................................. 85 Chapter 6: Dynameomics: Comparative Data-Driven Analysis of the Correlation Between Rotameric States and Backbone Conformational Propensities and Improved Rotamer Libraries..................................................................................................................................... 109 6.1 Summary ..................................................................................................................... 109 6.2 Introduction ................................................................................................................. 110 6.3 Results ......................................................................................................................... 113 6.4 Discussion ................................................................................................................... 117 6.5 Methods and Materials ................................................................................................ 119 Chapter 7: Related and Continuing Work ............................................................................. 133 7.1 Evaluation of Cross-Linking Distances using Dynameomics .................................... 133 7.2 Comparison of Native and Denatured State Protein Fold Space Coverage ................ 134 7.3 Transition State

Load more