Comparison of Protein Structures by Transformation into Dihedral Angle Sequences
by
Doug L. Hoffman

A Dissertation submitted to the faculty of The University of North Carolina at Chapel Hill in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Computer Science.

Chapel Hill

Approved by:
Raj K. Singh, Advisor
Bruce W. Erickson, Reader
Jan F. Prins, Reader

© Copyright Doug L. Hoffman. All rights reserved.

DOUG L. HOFFMAN: Comparison of Protein Structures by Transformation into Dihedral Angle Sequences (Under the direction of Raj K. Singh)

ABSTRACT

Proteins are large, complex organic molecules that are essential to the existence of life. Decades of study have revealed that proteins with different amino acid sequences can possess very similar three-dimensional structures. To date, protein structure comparison methods have been accurate but costly in terms of computer time. This dissertation presents a new method for comparing protein structures using dihedral transformations. Atomic XYZ coordinates are transformed into a sequence of dihedral angles, which is then transformed into a sequence of dihedral sectors. Alignment of two sequences of dihedral sectors reveals similarities between the original protein structures. Experiments have shown that this method detects structural similarities between sequences with less than […]% amino acid sequence identity, finding structural similarities that would not have been detected using amino acid alignment techniques. Comparisons that previously took minutes or hours can be performed in seconds.

Contents

List of Tables
List of Figures
List of Abbreviations
Introduction and thesis
Background
What are proteins?
Comparison of protein structures
Previous comparison methods
Representing protein structure using dihedral angle descriptors
Dihedral sequence comparison
A characterization of the problem
Simplification of the structural representation
The dihedral transformation
Calculation of the dihedral angle
Calculation of the Cβ position for glycine
Dihedral sequence alignment
The choice of sequence alignment algorithm
Implementation in custom hardware
Computational complexity
Analysis of binification error
Direct measurement of positional uncertainty
Propagated error using partial derivatives
Propagated error using interval arithmetic
Impact of propagated error on bin size
Classes of dihedral angle descriptors
Mainchain dihedral angles
Pendant dihedral angles
Statistics of descriptor angle distributions
Construction of the score table
Relative information content scaling
Statistical diffusion of score values
Impact of the mismatch score
Experimental results
Comparison of protein structure
Dihedral sequence alignment vs. 3D structure alignment
Alignment of mbd and bab
Alignment of cd and rhe
Alignment of rcf and fxn
Discussion
Conclusions
Future work
Appendix A. Relationship between the dihedral angles and oo
  Generalized rotational transformation matrix
  Positions of the carbonyl carbon and oxygen atoms
  C_i and O_i positions in local coordinates
  C_{i-1} and O_{i-1} positions in local coordinates
  Translated positions of the carbonyl carbon and oxygen atoms
  Translated C_i and O_i positions
  Translated C_{i-1} and O_{i-1} positions
  Computation of the oo dihedral angle
Appendix B. List of protein structures used
Bibliography

List of Tables

Existing protein structure comparison methods
Bond length statistics for cdaz
Probability of correct bin membership
Pendant dihedral angle descriptors
Bin frequencies for the bb descriptors
Bin frequencies for the bo descriptors
Bin frequencies for the ob descriptors
Bin frequencies for the oo descriptors
Diffusion coefficients based on probability of correct attribution a
Numeric diffusion coefficients given probability a
Table of alignment scores based on the bb descriptor
Table of alignment scores based on the bb descriptor
Table of alignment scores based on the bb descriptor
Table of alignment scores based on the bb descriptor
Table of alignment scores based on the bo descriptor
Table of alignment scores based on the bo descriptor
Table of alignment scores based on the bo descriptor
Table of alignment scores based on the bo descriptor
Table of alignment scores based on the ob descriptor
Table of alignment scores based on the ob descriptor
Table of alignment scores based on the ob descriptor
Table of alignment scores based on the ob descriptor
Table of alignment scores based on the oo descriptor
Table of alignment scores based on the oo descriptor
Table of alignment scores based on the oo descriptor
Table of alignment scores based on the oo descriptor
Possible experimental outcomes
Table of bb descriptor alignment scores for mbd vs. bab
Table of bo descriptor alignment scores for mbd vs. bab
Table of ob descriptor alignment scores for mbd vs. bab
Table of oo descriptor alignment scores for mbd vs. bab
Alignment statistics for cd vs. rhe
Alignment statistics for rcf vs. fxn
List of protein structures used (Appendix B)

List of Figures

Definition of the φ and ψ dihedral angles
Definition of the dihedral angle defined by four consecutive Cα atoms
Definition of the oo dihedral angle
The process of structure comparison using dihedral sequences
Diagonal path graph
The BioSCAN algorithm in C
The protein backbone showing the O and Cβ pendant atoms
The atoms defining the sixteen pendant dihedral angles
Bin frequency for bb helix and sheet regions
Bin frequency for bo helix and sheet regions
Frequency plots for the bb descriptors
Frequency plots for the bo descriptors
Frequency plots for the ob descriptors
Frequency plots for the oo descriptors
Alignment of sperm whale myoglobin (mbd) and the beta chain of human hemoglobin Thionville (bab) for the bb descriptors
Alignment of sperm whale myoglobin (mbd) and the beta chain of human hemoglobin Thionville (bab) for the bb descriptors (continued)
Alignment of sperm whale myoglobin (mbd) and the beta chain of human hemoglobin Thionville (bab) for the bo descriptors
Alignment of sperm whale myoglobin (mbd) and the beta chain of human hemoglobin Thionville (bab) for the bo descriptors (continued)
Alignment of sperm whale myoglobin (mbd) and the beta chain of human hemoglobin Thionville (bab) for the ob descriptors
Alignment of sperm whale myoglobin (mbd) and the beta chain of human hemoglobin Thionville (bab) for the ob descriptors (continued)
Alignment of sperm whale myoglobin (mbd) and the beta chain of human hemoglobin Thionville (bab) for the oo descriptors
Alignment of sperm whale myoglobin (mbd) and the beta chain of human hemoglobin Thionville (bab) for the oo descriptors (continued)
3D ribbon diagrams of the helical proteins myoglobin (a) and hemoglobin beta chain (b)
Dihedral sequence alignment of the human T-cell coreceptor CD (cd) with the immunoglobulin lambda chain of human Bence-Jones protein RHE (rhe) using the bb, bo, bo and bo descriptors
Dihedral sequence alignment of the human T-cell coreceptor CD (cd) with the immunoglobulin lambda chain of human Bence-Jones protein RHE (rhe) using the oo, oo and oo descriptors
Alignment of the human T-cell coreceptor CD (cd) with the immunoglobulin lambda chain of human Bence-Jones protein RHE (rhe)
3D ribbon diagrams of the human T-cell coreceptor (a) and the immunoglobulin lambda chain of human Bence-Jones protein RHE (b)
Six alignments of dihedral sequences for the flavodoxins rcf and fxn
Alignment of sequences for the flavodoxins from Anabaena (rcf) and Clostridium MP (fxn)
Alignment of segments of the oo dihedral sequences of the flavodoxins from Anabaena (rcf) and Clostridium MP (fxn)
Three dihedral alignments for the flavodoxins rcf and fxn using the oo descriptor
3D ribbon diagrams of the flavodoxins rcf (a) and fxn (b)
3D ribbon diagrams of the flavodoxins rcf (a) and fxn (b)

List of Abbreviations

1D        One-dimensional
3D        Three-dimensional
BioSCAN   Biological Sequence Comparative Analysis Node
BLOSUM    Block Substitution Matrix
BNL       Brookhaven National Laboratory
DNA       deoxyribonucleic acid
FMN       flavin mononucleotide
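The three-stage pipeline summarized in the abstract (atomic XYZ coordinates, then a sequence of dihedral angles, then a sequence of dihedral sectors, then alignment of two sector sequences) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the dissertation's implementation. The helper names (`dihedral`, `dihedral_sequence`, `to_sectors`, `smith_waterman`, `helix_trace`) are hypothetical. The six equal 60° sectors, the flat match/mismatch/gap scores, and the use of plain Smith-Waterman local alignment are illustrative choices made here. The dissertation instead derives its score table from descriptor statistics and discusses its own choice of alignment algorithm. The dihedral formula is the standard atan2 form with the usual sign convention, applied to every run of four consecutive points.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def sub(a: Point, b: Point) -> Point:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Point, b: Point) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Point, b: Point) -> Point:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Point, p1: Point, p2: Point, p3: Point) -> float:
    """Signed dihedral angle p0-p1-p2-p3 in degrees, in [-180, 180]."""
    b0, b1, b2 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b0, b1), cross(b1, b2)  # normals of the two planes
    y = math.sqrt(dot(b1, b1)) * dot(b0, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def dihedral_sequence(points: Sequence[Point]) -> List[float]:
    """Step 1: XYZ coordinates -> one dihedral per run of four consecutive points."""
    return [dihedral(*points[i:i + 4]) for i in range(len(points) - 3)]


def to_sectors(angles: Sequence[float], n_sectors: int = 6) -> List[int]:
    """Step 2: dihedral angles -> sector indices 0..n_sectors-1.

    Equal-width sectors starting at -180 degrees are a hypothetical choice.
    """
    width = 360.0 / n_sectors
    return [math.floor((a + 180.0) / width) % n_sectors for a in angles]


def smith_waterman(s: Sequence[int], t: Sequence[int],
                   match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Step 3: best local alignment score of two sector sequences.

    Flat match/mismatch scores and a linear gap penalty stand in for a
    statistically derived score table (hypothetical values).
    """
    prev = [0] * (len(t) + 1)
    best = 0
    for i in range(1, len(s) + 1):
        cur = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            diag = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def helix_trace(n: int, radius: float = 2.3, rise: float = 1.5,
                turn_deg: float = 100.0) -> List[Point]:
    """Toy input only: points on an ideal helix (hypothetical parameters)."""
    return [(radius * math.cos(math.radians(turn_deg * i)),
             radius * math.sin(math.radians(turn_deg * i)),
             rise * i) for i in range(n)]


if __name__ == "__main__":
    a = to_sectors(dihedral_sequence(helix_trace(12)))
    b = to_sectors(dihedral_sequence(helix_trace(10)))
    print(a, b, smith_waterman(a, b))
```

Every quadruple on an ideal helix has the same dihedral angle, so both toy sequences consist of a single repeated sector, and the local alignment simply scores their overlap. Coordinates read from real structures would give varied sector strings. In that case the choice of sector boundaries and the score table become the decisive design parameters, which is the subject of the binification-error and score-table chapters listed above.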