Title of Dissertation Goes Here in All Caps
Total Page:16
File Type:pdf, Size:1020Kb
ALGORITHMIC DEVELOPMENTS FOR SEQUENCE ANALYSIS, STRUCTURE MODELING AND FUNCTIONAL PREDICTION OF PROTEINS APPROVED BY SUPERVISORY COMMITTEE Nick V. Grishin, Ph.D.; Advisor Alexander Pertsemlidis, Ph.D.; Committee Chair Stephen R. Sprang, Ph.D. Zbyszek Otwinowski, Ph.D. I would like to thank the members of my Graduate Committee and the Graduate school. To my daughter Erica, my husband Yi, my mother, father and brother. ii ALGORITHMIC DEVELOPMENTS FOR SEQUENCE ANALYSIS, STRUCTURE MODELING AND FUNCTIONAL PREDICTION OF PROTEINS by YUAN QI DISSERTATION Presented to the Faculty of the Graduate School of Biomedical Sciences The University of Texas Southwestern Medical Center at Dallas In Partial Fulfillment of the Requirements For the Degree of DOCTOR OF PHILOSOPHY The University of Texas Southwestern Medical Center at Dallas Dallas, Texas November 2006 Copyright by Yuan Qi, 2006 All Rights Reserved iv ALGORITHMIC DEVELOPMENTS FOR SEQUENCE ANALYSIS, STRUCTURE MODELING AND FUNCTIONAL PREDICTION OF PROTEINS Publication No. Yuan Qi, Ph.D. The University of Texas Southwestern Medical Center at Dallas, 2006 Supervising Professor: Nick V. Grishin, Ph.D. Sequence, structure and function, being the three most important properties of proteins, are interrelated through homology relationships. In this post-genome era, we are equipped with abundant sequence information. Homology inference is thus of great practical importance because of its ability to make structural and functional predictions through sequence analysis. In an effort to explore and utilize the protein sequence-structure-function relationships, with homology detection and utilization as the central scheme, this work concentrates on algorithmic development of methods and systems for sequence similarity search, structure modeling and functional prediction purposes, as well as performs structure prediction and classification for specific protein families. v Three algorithmic developments are described in this dissertation. First, to facilitate identification of structurally or functionally important interactions between positions in a protein family, a program has been developed to perform positional correlation analysis of multiple sequence alignments using different methods. The program has been shown to be useful to identify functionally important position pairs or networks of correlated positions. Second, to further increase the sensitivity of sequence similarity search methods in terms of homology detection and structure modeling ability, a method has been developed by incorporating predicted secondary structure information with sequence profiles. Evaluation on PFAM-based system shows that this method provides improved structure template detection ability and generates alignment of better quality. Third, in order to systematically assess the structure modeling abilities of different sequence similarity search programs, a comprehensive evaluation system has been developed. This large-scale automatic evaluation system assesses the fold recognition ability and alignment quality of different programs from global and local perspectives using both reference-dependent and reference-independent approaches, which provides an instrument to understand the progress and limitations of the field. Two structure prediction and classification projects using manual analysis and existing tools are also described in this dissertation. First, the structure of C-terminal domain of Gyrase A is predicted through inferred homology relationship with regulator of chromosome condensation (RCC1). This prediction has been validated by experimental data. Second, a hierarchical structure classification of thioredoxin-like fold proteins has been carried out, which promotes understanding of fold definitions and sequence-structure- function relationships. vi TABLE OF CONTENTS Committee Signatures ............................................................................................................. i Dedication ...............................................................................................................................ii Title Page ...............................................................................................................................iii Copyright ...............................................................................................................................iv Abstract ...................................................................................................................................v Table of Contents ..................................................................................................................vii Prior Publications .................................................................................................................xiv List of Figures .......................................................................................................................xv List of Tables .....................................................................................................................xviii CHAPTER 1: General Introduction........................................................................................ 1 1.1 Homology Interrelates Protein Sequence, Structure And Function................................ 1 1.1.1 Homology Detection And Sequence Analysis......................................................... 1 1.1.2 Structure Modeling .................................................................................................. 2 1.1.3 Functional Prediction............................................................................................... 4 1.1.4 Structure Classification............................................................................................ 5 1.2 Overview of dissertation work........................................................................................ 6 CHAPTER 2: Structure Prediction of C-Terminal Domain of Gyrase A............................... 8 2.1 Introduction..................................................................................................................... 8 2.1.1 Background.............................................................................................................. 8 2.1.2 Objective.................................................................................................................. 9 2.2 Materials and Methods.................................................................................................. 10 vii 2.2.1 Sequence Similarity Searches................................................................................ 10 2.2.2 Multiple Sequence Alignment And Hydrophobicity Analysis .............................. 10 2.2.3 Secondary Structure Prediction And Threading .................................................... 11 2.3 Results........................................................................................................................... 12 2.3.1 PSI-BLAST Searches............................................................................................. 12 2.3.2 Secondary Structure Predictions And Fold Recognition ....................................... 13 2.3.3 Multiple Sequence Alignment ............................................................................... 14 2.4 Discussion..................................................................................................................... 15 2.4.1 Validity Of The Fold Prediction ............................................................................ 15 2.4.2 Structural Differences Between Gyra/Parc And RCC1......................................... 16 2.4.3 Functional Implications ......................................................................................... 17 2.4.4 Prediction Confirmation......................................................................................... 18 2.5 Conclusions................................................................................................................... 18 CHAPTER 3: Structural Classification of Thioredoxin-like Fold Proteins.......................... 27 3.1 Introduction................................................................................................................... 27 3.1.1 Background............................................................................................................ 27 3.1.2 Objective................................................................................................................ 28 3.2 Materials and Methods.................................................................................................. 29 3.2.1 Structural Motif Search For Thioredoxin-Like Protein Domains.......................... 29 3.2.2 Sequence-Based Classification Of The Thioredoxin-Like Protein Domains ........ 30 3.2.3 Structure And Function Based Classifications ...................................................... 31 3.3 Overall Fold Description............................................................................................... 31 viii 3.3.1 Thioredoxin-like fold............................................................................................. 31 3.3.2 Circular Permutations ............................................................................................ 33 3.4 Description Of Thioredoxin-Like Fold Families .......................................................... 34 3.4.1 Thioredoxin family ...............................................................................................