Calculating the Structure-Based Phylogenetic Relationship
Total Page:16
File Type:pdf, Size:1020Kb
CALCULATING THE STRUCTURE-BASED PHYLOGENETIC RELATIONSHIP OF DISTANTLY RELATED HOMOLOGOUS PROTEINS UTILIZING MAXIMUM LIKELIHOOD STRUCTURAL ALIGNMENT COMBINATORICS AND A NOVEL STRUCTURAL MOLECULAR CLOCK HYPOTHESIS A DISSERTATION IN Molecular Biology and Biochemistry and Cell Biology and Biophysics Presented to the Faculty of the University of Missouri-Kansas City in partial fulfillment of the requirements for the degree Doctor of Philosophy by SCOTT GARRETT FOY B.S., Southwest Baptist University, 2005 B.A., Truman State University, 2007 M.S., University of Missouri-Kansas City, 2009 Kansas City, Missouri 2013 © 2013 SCOTT GARRETT FOY ALL RIGHTS RESERVED CALCULATING THE STRUCTURE-BASED PHYLOGENETIC RELATIONSHIP OF DISTANTLY RELATED HOMOLOGOUS PROTEINS UTILIZING MAXIMUM LIKELIHOOD STRUCTURAL ALIGNMENT COMBINATORICS AND A NOVEL STRUCTURAL MOLECULAR CLOCK HYPOTHESIS Scott Garrett Foy, Candidate for the Doctor of Philosophy Degree University of Missouri-Kansas City, 2013 ABSTRACT Dendrograms establish the evolutionary relationships and homology of species, proteins, or genes. Homology modeling, ligand binding, and pharmaceutical testing all depend upon the homology ascertained by dendrograms. Regardless of the specific algorithm, all dendrograms that ascertain protein evolutionary homology are generated utilizing polypeptide sequences. However, because protein structures superiorly conserve homology and contain more biochemical information than their associated protein sequences, I hypothesize that utilizing the structure of a protein instead of its sequence will generate a superior dendrogram. Generating a dendrogram utilizing protein structure requires a unique methodology and novel bioinformatic programs to implement this methodology. Contained within this dissertation is an original methodology that permits the aforementioned structure-based iii dendrogram generation hypothesis. Additionally, I have scripted three novel bioinformatics programs required by this proposed methodology: a protein structure alignment program that proficiently superimposes distant homologs, an accurate structure-dependent sequence alignment program, and a dendrogram generation program that employs a novel structural molecular clock hypothesis. The results from this methodology support the proposed hypothesis by demonstrating that generating dendrograms utilizing protein structures is superior to those generated utilizing exclusively protein sequences. iv APPROVAL PAGE The faculty listed below, appointed by the Dean of the School of Graduate Studies, have examined a dissertation titled “Calculating the Structure-based Phylogenetic Relationship of Distantly Related Homologous Proteins Utilizing Maximum Likelihood Structural Alignment Combinatorics and a Novel Structural Molecular Clock Hypothesis”, presented by Scott G. Foy, candidate for the Doctor of Philosophy degree, and certify that in their opinion it is worthy of acceptance. Supervisory Committee Gerald Wyckoff, Ph.D., Committee Chair Department of Molecular Biology and Biochemistry Samuel Bouyain, Ph.D. Department of Molecular Biology and Biochemistry John Laity, Ph.D. Department of Cell Biology and Biophysics Thomas Menees, Ph.D. Department of Cell Biology and Biophysics Jakob Waterborg, Ph.D. Department of Cell Biology and Biophysics v CONTENTS ABSTRACT ........................................................................................................................ iii LIST OF ILLUSTRATIONS ................................................................................................. x LIST OF TABLES ..............................................................................................................xii ACKNOWLEDGEMENTS ............................................................................................... xiii Chapter I. INTRODUCTION ..................................................................................................... 1 The Molecular Clock and Dendrogram Generation ......................................... 4 Implementation of Sequence and Structure in Dendrogram Generation ................................................................................. 5 Methodological and Algorithmic Overview .................................................... 7 Methodological Overview ................................................................... 7 Novel Algorithms ............................................................................... 8 II. SABLE: STRUCTURAL ALIGNMENT BY MAXIMUM LIKELIHOOD ESTIMATION ......................................................................................................... 16 Introduction to Structural Alignment and Superpositioning .......................... 16 Structural Alignment and Superpositioning Algorithms .................... 17 Superpositioning by Structural Alignment ........................................ 20 SABLE Algorithm ....................................................................................... 22 Maximum Likelihood Structural Alignment ...................................... 22 Protein Geometry Terminology ......................................................... 23 Superimposing Measurement of SABLE ........................................... 24 Subunit Homology ............................................................................ 29 Reducing Infinite Pseudostates ......................................................... 30 vi SABLE: Phase 1 ............................................................................... 31 SABLE: Phase 2 ............................................................................... 33 SABLE: Phase 3 ............................................................................... 34 SABLE Accuracy Program ............................................................... 35 SABLE Results ............................................................................................ 36 Measuring Structural Superimposition .............................................. 37 Monomeric Pairwise Comparison ..................................................... 39 Multisubunit Pairwise Comparison ................................................... 49 Multiple Alignment Comparison ....................................................... 50 Discussion of SABLE Results ...................................................................... 56 III. UNITS: UNIVERSAL TRUE SDSA (STRUCTURE-DEPENDENT SEQUENCE ALIGNMENT) ........................................................................................................ 57 Introduction to Structure-dependent Sequence Alignments ........................... 57 Current SDSA Limitations ................................................................ 58 The UniTS Solution .......................................................................... 60 UniTS Algorithm ......................................................................................... 62 Pairwise SDSA ................................................................................. 62 The Grid ........................................................................................... 66 Residue Determination...................................................................... 67 Multiple SDSA ................................................................................. 68 Multiple SDSA Quality Assessment: Mean Standard Deviation ........................................................................... 70 UniTS Results .............................................................................................. 71 UniTS Compared to Theseus ............................................................ 72 vii UniTS Compared to DALI ................................................................ 74 UniTS Compared to Chimera ............................................................ 75 Multiple PL/PG SDSA ...................................................................... 77 Discussion of UniTS Results ........................................................................ 81 IV. PUSH: PHYLOGENETIC TREE USING STRUCTURAL HOMOLOGY .............. 82 Structural Molecular Clock Hypothesis ........................................................ 83 The Application of the Sequence-based Molecular Clock Hypothesis to Protein Structure............................................... 83 Proposal of a New Evolutionary Mechanism for Use as a Molecular Clock ............................................................ 85 Protein Structural and Function Evolution......................................... 87 Correlating Protein Structural Divergence and Evolutionary Selection Pressure ................................................. 88 A Novel Structural Molecular Clock Hypothesis ............................... 90 Structural Molecular Clock Hypothesis Discussion ........................... 92 PUSH Algorithm .......................................................................................... 93 Derivation of the Evolutionary Distance Matrix ................................ 93 Hierarchical Clustering ..................................................................... 94 PUSH Results .............................................................................................. 97 Quad PG-PL Dendrograms ............................................................... 97 Penta-PG Dendrograms .................................................................. 101 V. PUSH DISCUSSION AND GENERAL CONCLUSION ....................................... 105 Appendix