Computational Analysis of Protein Sequence and Structure
Robert Matthew MacCallum
September 1997

Biomolecular Structure and Modelling Group
Department of Biochemistry and Molecular Biology
University College London
Gower Street
London WC1E 6BT

A Thesis Submitted to the University of London for the Degree of Doctor of Philosophy in the Faculty of Science.

ProQuest Number: 10055396. Published by ProQuest LLC (2016). Copyright of the Dissertation is held by the Author. All rights reserved.

Abstract

This project has combined the structural and functional analysis of proteins, protein structure prediction, and sequence analysis.

A study of antibody-antigen interactions has been undertaken. We have analysed antigen-contacting residues and combining site shape in the antibody crystal structures available in the Protein Data Bank. Antigen-contacting propensities are presented for each antibody residue, allowing a new definition for the complementarity determining regions to be proposed based on observed antigen contacts. An objective means of classifying protein surfaces by gross topography has been developed and applied to the antibody combining site surfaces. The surfaces have been clustered into four topographic classes, confirming suggestions in the literature that antigen type might influence the shape of the whole combining site.
The prediction of secondary structural class and architecture from sequence composition analysis has also been investigated. Modifications to a well established geometric prediction algorithm have led to improvements in accuracy and the estimation of reliability. The hierarchical prediction of fold architectures using these methods is then discussed.

To complement the ab initio approach of class and architecture prediction, a novel sequence alignment algorithm employing direct comparisons of predicted secondary structure and sequence-derived hydrophobicity was developed and applied to fold recognition. The method, called SIVA, appears to perform well when sufficient multiple sequence information is available, although further testing, including blind public testing, is required.

Kohonen’s self-organising map was applied to protein structure and sequence information, with the hope that fold recognition using SIVA could be improved using essential sequence features extracted by this technique. This was not successful; however, the potential of the mapping approach has been illustrated, and a number of specific applications have been suggested.

Abbreviations

3D, 2D, 1D  Three-dimensional, etc.
CASP  Critical Assessment of Structure Prediction (meeting)
CATH  Structure classification (Orengo et al., 1997)
CDR  Complementarity determining region
DNA  Deoxyribonucleic acid
DSC  Secondary structure prediction algorithm (King & Sternberg, 1996)
DSSP  Dictionary of Protein Secondary Structure (Kabsch & Sander, 1983)
GA  Genetic algorithm
GOR  Garnier Osguthorpe Robson (secondary structure prediction method)
NMR  Nuclear magnetic resonance
PDB  Protein Data Bank
PHD  Secondary structure prediction algorithm (Rost & Sander, 1994)
RMS  Root mean square
RNA  Ribonucleic acid
SCOP  Structural Classification of Proteins (Murzin et al., 1995)
SIVA  Sequence-derived Information Vector Alignment
SSAP  Structural alignment algorithm (Taylor & Orengo, 1989)
SSCP  Secondary Structural Content of Proteins (Eisenhaber et al., 1996a)
SWISS-PROT  Sequence database (Bairoch & Boeckmann, 1991)
TIM  Triose phosphate isomerase

Acknowledgements

Somehow I managed to escape from the nightmare of freezers, unlabelled test-tubes and radioactivity, and find refuge with the Thornton group. It was probably the best decision I ever made. Thank you Janet for letting me stay and for all the interesting work we’ve done and talked about.

For friendship and invaluable help in and around the lab I am indebted to Andrew Martin, Rob Miller, Alex May and Roman Laskowski. Andrew helped me hone my amateur programming skills into what they are now...? I miss Rob’s company but hope we can meet up soon. Alex, thanks for providing the audio-visual light-entertainment and letting me waste your time... And Roman, thanks for all the pearls of wisdom and for being a top geezer. Comrades Maria and Chiara also gave friendship and support; good luck with your theses...

For keeping the system up, thank you Martin, although I’ll never forgive you for deserting me for four weeks at the most critical time. I hope we can continue to blow each other up over the net when I start my new job.
Also on a virtual note, I am very grateful to David Jones for answering my regular and long-winded emails. For the CATH lists and other help I thank Christine Orengo and Alex Michie.

On a personal note, I thank my family, Nico and her family, and my ‘home’ friends for their love, patience and support. Without Nico’s expert proof-reading much of this thesis would have without commas or verbs then and other strange nonsense.

On a technical note I thank my trusty Mac and the lab PC which never let me down. This thesis was prepared using the LaTeX document preparation system and was edited in Emacs, mostly on a Linux PC. A lot of stuff was written in Perl. All this software is free and works perfectly (almost) every time; thanks to all the programmers who make this kind of thing possible.

This is a non-profit making thesis.

Contents

1 Introduction
  1.1 Protein structure and function
    1.1.1 Primary structure
    1.1.2 Secondary structure
    1.1.3 Tertiary structure
    1.1.4 Quaternary structure - macromolecular assemblies
    1.1.5 Soluble, fibrous and membrane proteins
    1.1.6 Enzymes
    1.1.7 Electron transfer proteins
    1.1.8 Proteins in signalling and recognition
    1.1.9 Mechanical proteins
    1.1.10 Regulation and control of protein function
  1.2 Protein structure classification
    1.2.1 The dynamic programming alignment algorithm
    1.2.2 Structural alignment
    1.2.3 Classification schemes
  1.3 Molecular interactions of proteins
    1.3.1 Protein surfaces and binding
    1.3.2 Molecular surfaces
    1.3.3 Docking
  1.4 Prediction of protein structure and function
    1.4.1 Sequence methods
    1.4.2 Fold recognition
    1.4.3 Comparative modelling
    1.4.4 Ab initio methods
    1.4.5 CASP - Critical Assessment of Structure Prediction
  1.5 Outline of thesis
2 Antibody-antigen interactions
  2.1 Introduction
  2.2 Methods and Data
    2.2.1 Antibody structures and sequences
    2.2.2 Accessibility and burial
    2.2.3 Canonical loops and structural alignment
    2.2.4 Surface analysis
  2.3 Results
    2.3.1 Contact Analysis
    2.3.2 Antibody surface topography
    2.3.3 Predicting antigen contacts
  2.4 Discussion