STACK: a Toolkit for Analysing β-Helix Proteins

Master of Science Thesis (20 points)
Salvatore Cappadona, Lars Diestelhorst

Abstract

β-helix proteins contain a solenoid fold consisting of repeated coils forming parallel β-sheets. Our goal is to formalise the intuitive notion of a β-helix in an objective algorithm. Our approach is based on first identifying residue stacks (linear spatial arrangements of residues with similar conformations) and then combining these elementary patterns to form β-coils and β-helices. Our algorithm has been implemented within STACK, a toolkit for analysing β-helix proteins. STACK distinguishes aromatic, aliphatic and amidic stacks such as the asparagine ladder. Geometrical features are computed and stored in a relational database. These features include the axis of the β-helix, the stacks, the cross-sectional shape, the area of the coils and related packing information. An interface between STACK and a molecular visualisation program enables structural features to be highlighted automatically.

Contents

1 Introduction
2 Biological Background
  2.1 Basic Concepts of Protein Structure
  2.2 Secondary Structure
  2.3 The β-Helix Fold
3 Parallel β-Helices
  3.1 Introduction
  3.2 Nomenclature
    3.2.1 Parallel β-Helix and its β-Sheets
    3.2.2 Stacks
    3.2.3 Coils
    3.2.4 The Core Region
  3.3 Description of Known Structures
    3.3.1 Helix Handedness
    3.3.2 Right-Handed Parallel β-Helices
    3.3.3 Left-Handed Parallel β-Helices
  3.4 Amyloidosis
4 The STACK Toolkit
  4.1 Identification of Structural Elements
    4.1.1 Stacks
    4.1.2 β-Coils and β-Helices
    4.1.3 The Core Residues of β-Helices
  4.2 Geometrical Analysis of Structural Elements
    4.2.1 Axis of β-Helices
    4.2.2 Shape of β-Coils
    4.2.3 The Pitch and Twist of β-Coils
    4.2.4 Area of β-Coils
    4.2.5 Orientation of Side Chains
    4.2.6 Packing of β-Coils
5 Results
6 Conclusions
A User and Installation Manual
  A.1 Readership
  A.2 Installation
    A.2.1 Requirements
    A.2.2 Installation Steps
  A.3 Configuration
    A.3.1 GENERAL Properties
    A.3.2 STORE Properties
    A.3.3 OPERATION Properties
  A.4 Starting and Using STACK
    A.4.1 The activate-Command
    A.4.2 The calculate-Command
    A.4.3 The color-Command
    A.4.4 The deactivate-Command
    A.4.5 The exit-Command
    A.4.6 The help-Command
    A.4.7 The identify-Command
    A.4.8 The import-Command
    A.4.9 The list-Command
    A.4.10 The visualize-Command
B Maintenance Manual
  B.1 Readership
  B.2 Development Environment
    B.2.1 Tools and Software Components
  B.3 STACK Architecture
    B.3.1 Files and Directory Structure
    B.3.2 Package Overview
    B.3.3 The geom and util Package
    B.3.4 The store Package
    B.3.5 The structure Package
    B.3.6 The operations Package

List of Figures

2.1 Primary, secondary, tertiary and quaternary structure of proteins
2.2 Secondary structure
2.3 Two types of β-sheet structures
2.4 Cartoon representation of a protein chain with a right-handed β-helix fold
3.1 Nomenclature of β-sheets in right-handed β-helical proteins
3.2 Packing of successive coils in PelC
3.3 Core of a β-helical protein
3.4 The right-hand rule
3.5 Left-handed and right-handed β-helices have a different cross-section
3.6 Chiral packing of isoleucines at the centre of LpxA
3.7 Schematic of a typical coil of the pectate lyase family and of a resulting helix
3.8 N-terminal end and T3 loop of BsPel
3.9 Ribbon illustration of TmAFP
3.10 Lattice matching/occupation model for TmAFP binding to ice
3.11 Typical left-handed parallel β-helix
3.12 The putative ice-binding site of SbwAFP
3.13 β-helical models of amyloids
3.14 The α-helical cap at the N-terminal of a right-handed parallel β-helix
4.1 Stacking of residues
4.2 An aromatic stack of pectate lyase from Bacillus subtilis
4.3 β-coils, repetition of stacks in the residue sequence
4.4 Highlighted β-coils of pectate lyase from Bacillus sp.
4.5 "Gap-filling" to extend the core
4.6 The core of Rgase A from Aspergillus aculeatus
4.7 Dependencies between the algorithms implemented in STACK
4.8 Axis of PelC from Erwinia chrysanthemi
4.9 Cα-trace of a β-coil as the basis for shape approximation and the parameterized vector notation of the curve
4.10 Shape of a β-coil
4.11 Pitch and twist of β-coils
4.12 Area of a β-coil
4.13 Orientation of side chains based on intersection count
4.14 Orientation of side chains based on scalar product
4.15 The packing of β-coils
B.1 STACK package diagram
B.2 UML class diagram of the store
B.3 Query related class diagram
B.4 Composition of query objects
B.5 Class hierarchy modeling β-helical proteins
B.6 Using a factory-method to create an operation

>2GJLK:_ ACKNOWLEDGEMENT
.........SYNYIHGVKKVGLDGSSSSDTGRNITYH
HNYYNDVNARLPLQRGGLVHAYNNLYTNITGSGLNVR
QNGQALITHENWFEKAINPVTSRYDGKNFGTWVLKGN
NIKDFSTYTWTADTKPYVNADSWTSTGTFPTVAYNYS
KLPYAGVGGPVSAQCVK THANK IYOU IGRAHAM .

Chapter 1 Introduction

β-helix proteins contain a solenoid domain of parallel β-strands folded into a large prism. The repeated unit of the solenoid is called a β-coil and consists of a succession of a few (usually three) β-strands. β-strands from adjacent coils stack to form parallel β-sheets that make up the faces of the prism. These faces are linked by loop regions that protrude from the helix and, in many cases, form the binding site of the helix.
The cross section of this prism is typically L-shaped in right-handed parallel β-helices and triangular in left-handed parallel β-helices.

The domain owes its stability mainly to the stacking of similar residues at equivalent positions of the coils, both inside and outside the helix. The inward side chains are mainly hydrophobic and, when they are not, maximal hydrogen bonding or electrostatic interactions neutralise their polar or charged groups.

Our goal is to formalise the intuitive notion of a β-helix in a set of objective algorithms that define a few basic features of these proteins. Consistent with the literature, we identified stacks, β-coils, core and β-helix as essential attributes to be determined (the core is defined as the helical domain of the protein, as distinguished from the protruding loop regions).

Our algorithms have been implemented within STACK, a toolkit for analysing β-helix proteins. STACK first identifies aromatic, aliphatic and amidic stacks and then combines these elementary patterns to form β-coils, core, and β-helices. Once these elements are defined, geometrical features are computed and stored in a relational database. These features include the axis of the β-helix, the cross-sectional shape, the area of the coils and related packing information. STACK is implemented in Java and runs on Unix and Windows. An interface between STACK and RasMol enables structural features to be highlighted automatically.

After a brief introduction to protein structure in Chapter 2, Chapter 3 discusses in detail the geometrical features of β-helices and presents a model, based on the parallel β-helix, which is the basis of recent speculation on the molecular make-up of amyloids. The algorithms devised and implemented in our work are described in Chapter 4. Some results are then given in Chapter 5 and conclusions are
