Toward a Database of Geometric Interrelationships of Protein Secondary Structure Elements for De Novo Protein Design, Prediction and Analysis

University of New Orleans ScholarWorks@UNO University of New Orleans Theses and Dissertations Dissertations and Theses 12-17-2010 Toward a Database of Geometric Interrelationships of Protein Secondary Structure Elements for De Novo Protein Design, Prediction and Analysis Augustine Ada Orgah University of New Orleans Follow this and additional works at: https://scholarworks.uno.edu/td Recommended Citation Orgah, Augustine Ada, "Toward a Database of Geometric Interrelationships of Protein Secondary Structure Elements for De Novo Protein Design, Prediction and Analysis" (2010). University of New Orleans Theses and Dissertations. 100. https://scholarworks.uno.edu/td/100 This Thesis-Restricted is protected by copyright and/or related rights. It has been brought to you by ScholarWorks@UNO with permission from the rights-holder(s). You are free to use this Thesis-Restricted in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s) directly, unless additional rights are indicated by a Creative Commons license in the record and/or on the work itself. This Thesis-Restricted has been accepted for inclusion in University of New Orleans Theses and Dissertations by an authorized administrator of ScholarWorks@UNO. For more information, please contact [email protected]. Toward a Database of Geometric Interrelationships of Protein Secondary Structure Elements for De Novo Protein Design, Prediction and Analysis A Thesis Submitted to the Graduate Faculty of the University of New Orleans in partial fulfillment of the Requirements for the degree of Master of Science in Computer Science Bioinformatics by Augustine Ada Orgah B.S. Xavier University of Louisiana, 2008 December, 2010 Copyright 2010, Augustine Ada Orgah ii Table of Contents List of Figures....................................................................................................... v List of Tables....................................................................................................... vi Abstract.............................................................................................................. vii Chapter 1: Introduction......................................................................................... 1 1.1: About Proteins.................................................................................... 1 1.2: Protein Structure & Hierarchy ............................................................ 4 1.2.1: Protein Primary Structure...................................................... 4 1.2.2: Protein Secondary Structure................................................. 4 1.2.3: Protein Tertiary Structure...................................................... 6 1.2.4: Protein Quaternary Structure ................................................ 6 1.3: Protein Folding ................................................................................... 7 1.3: Protein Structure Prediction................................................................ 7 1.4: De Novo Protein Design .................................................................... 9 1.5: Defining Terms ................................................................................. 10 1.5.1: PDB .................................................................................... 10 1.5.2: DSSP.................................................................................. 11 1.5.3: ProtCAD.............................................................................. 12 1.5.4: MySql.................................................................................. 12 1.5.5: MySql++.............................................................................. 13 1.5.6: CATH ................................................................................. 13 Chapter 2: Project Overview .............................................................................. 15 Chapter 3: Database of Secondary Structure Elements (SSE) ......................... 18 3.1: Introduction ...................................................................................... 18 3.2: Generation of Secondary Structure Elements Dataset .................... 18 3.3: Methods/Tools ................................................................................. 19 3.4: Implementation of MySql database .................................................. 20 3.5: Results.............................................................................................. 22 3.6: Discussion/Conclusion...................................................................... 23 Chapter 4: Database of Geometric Information of Secondary Structure Elements (SSE) ........................................................................................................................... 25 4.1: Introduction ...................................................................................... 25 4.2: Generation of Geometric Dataset of Secondary Structure Elements (Helices) ................................................................................................................ 27 4.3: Generation of Non Redundant Geometric Dataset of Secondary Structure Elements (Helices) ................................................................................. 29 4.4: Generation of CATH Geometric Dataset of Secondary Structure Elements (Helices) ................................................................................................. 30 4.5: Generation of Membrane Proteins Geometric Dataset of Secondary Structure Elements (Helices) .................................................................. 30 4.6: Generation of Soluble Proteins Geometric Dataset of Secondary Structure Elements (Helices) ................................................................................. 30 4.7: Methods/Tools ................................................................................. 31 4.8: Implementation of MySql database ................................................. 31 4.9: Results ............................................................................................. 34 4.10: Discussion/Conclusion ................................................................... 40 iii Chapter 5: Overall Results ................................................................................ 43 Chapter 6: Overall Conclusions & Future Work ................................................. 48 6.1: Future Work ..................................................................................... 48 6.2: Overall Conclusions ......................................................................... 48 References: ....................................................................................................... 50 Vita .................................................................................................................... 53 iv List of Figures Figure 1: The Basic Structure of an Amino Acid................................................... 2 Figure 2: The Structures of the 20 Amino Acids................................................... 3 Figure 3: The Primary Structure of a Protein........................................................ 4 Figure 4: Helix Secondary Structure of a Protein ................................................. 5 Figure 5: Beta Sheet Secondary Structure of a Protein........................................ 5 Figure 6: The Tertiary Structure of a Protein ........................................................ 6 Figure 7: The Quaternary Structure of a Protein .................................................. 7 Figure 8: Flowchart for Secondary Structure Table............................................ 16 Figure 9: Flowchart for Geometric Data Tables.................................................. 17 Figure 10: Attributes of Database Table SecStructure ...................................... 18 Figure 11: Xgrid Architecture.............................................................................. 21 Figure 12: Sample DSSP File for Protein 200L ................................................. 23 Figure 13: Attributes of Helix Interaction Database Tables ............................... 26 Figure 14: Helix-Helix Interactions .................................................................... 29 Figure 15: Sample Geometric File for Protein 1ROP ......................................... 33 Figure 16: Helix Interactions SecStrucGeoData ................................................ 35 Figure 17: Helix Interactions CathGeoData ....................................................... 36 Figure 18: Helix Interactions NonRedundantGeoData ...................................... 36 Figure 19: Helix Interactions MembraneGeoData .............................................. 37 Figure 20: Helix Interactions SolubleGeoData .................................................. 37 Figure 21: Helix Interactions (Chothia Model) SecStrucGeoData ...................... 38 Figure 22: Helix Interactions (Chothia Model) CathGeoData ............................ 38 Figure 23: Helix Interactions (Chothia Model) NonRedundantGeoData............. 39 Figure 24: Helix Interactions (Chothia Model) MembraneGeoData.................... 39 Figure 25: Helix Interactions (Chothia Model) SolubleGeoData......................... 40 Figure 26: Entity Relationship Diagram for All Tables ....................................... 43 Figure 27: Results from Query 1 .......................................................................

Toward a Database of Geometric Interrelationships of Protein Secondary Structure Elements for De Novo Protein Design, Prediction and Analysis

Biointerface Research in Applied Chemistry

MYSTERIES of the EQUILATERAL TRIANGLE, First Published 2010

Tertiary Alphabet for the Observable Protein Structural Universe”

Functional Evaluation of the Π-Helix in the NAD(P)

From Sequence to Structure

Protocol Optimization of Molecular Dynamics Simulations of Apolipoproteins

Exploring the Functional Roles of Π-Helices and Flavin-Induced Oligomeric State Changes in Flavin Reductases of Two-Component Monooxygenase Systems

PYTHIA: Deep Learning Approach for Local Protein Conformation Prediction

Protein Secondary Structure Prediction by Fuzzy Min Max Neural Network with Compensatory Neurons

Structural and Functional Annotation of Uncharacterized Protein NCGM946K2 146 of Mycobacterium Tuberculosis: an In-Silico Approach †

Protein Folding Prediction with Genetic Algorithms

Local Protein Structures Bernard Offmann, Manoj Tyagi, Alexandre De Brevern