A Bioinformatic Tool for Analysing the Structures of Protein Complexes by Means of Mass Spectrometry of Cross-Linked Proteins

A BIOINFORMATIC TOOL FOR ANALYSING THE STRUCTURES OF PROTEIN COMPLEXES BY MEANS OF MASS SPECTROMETRY OF CROSS-LINKED PROTEINS by Shannon L.N. Mayne Supervisor: Prof. Hugh -G. Patterton A Dissertation in fulfilment of a Masters of Science degree in Biochemistry University of the Free State 2013 DECLARATION I declare that the dissertation hereby submitted for the Magister Scientiae degree at the University of the Free State through the Faculty of Natural and Agricultural Sciences is my own work and has not been previously submitted by me at another University for any degree. I cede copyright of this dissertation in favour of the University of the Free State. ______________________ Shannon Leon Noël Mayne January 2013 ACKNOWLEDGEMENTS Thanks to staff and students at the University of the Free State, in particular: Pankaj Sharma and Gabre Kemp for experimental assistance, as well as Leon du Preez and the UFS ICT Services staff for assistance with the server access and settings. Special thanks are extended to Professor Hugh Patterton for invaluable input, guidance and indefatigable patience. Heartfelt gratitude and appreciation go to my family and closest friends for their unstinting support and understanding throughout. A postgraduate bursary from the former National Bioinformatics Network (NBN) is also gratefully acknowledged. i TABLE OF CONTENTS DECLARATION ............................................................................................................... I ACKNOWLEDGEMENTS ............................................................................................... I TABLE OF CONTENTS ................................................................................................. II LIST OF FIGURES ..................................................................................................... VIII LIST OF TABLES ....................................................................................................... XIII LIST OF EQUATIONS ................................................................................................ XIII CHAPTER 1 LITERATURE REVIEW: BIOINFORMATICS TOOLS FOR THE STRUCTURAL ELUCIDATION OF MULTI-SUBUNIT PROTEIN COMPLEXES BY MASS SPECTROMETRIC ANALYSIS OF PROTEIN-PROTEIN CROSS-LINKS. .............. 1 1.1. I NTRODUCTION .................................................................................................. 1 1.2. MS3D DATA ANALYSIS ........................................................................................ 2 1.2.1 Detection of cross-linked peptides ............................................................ 3 1.2.1.1 Non-cross-linked controls .................................................................. 6 1.2.1.2 Isotope labelling ................................................................................. 6 1.2.1.3 Post-fragmentation reporter ions ....................................................... 7 1.2.2 Matching peaks to a library of possible peptide dimers ............................ 7 1.2.3 Generating the library of peptide dimers ................................................... 8 1.2.4 Matching experimental peaks to the theoretical library ............................. 10 1.2.5 Identification of the cross-linked residues in the di-peptide ...................... 11 1.2.5.1 Generating an MS/MS fragment library ............................................. 11 1.2.5.2 Matching MS/MS spectra .................................................................. 12 1.2.5.3 Non-probabilistic scoring.................................................................... 13 1.2.5.4 Probabilistic scoring ........................................................................... 14 1.2.6 Structure modelling ................................................................................... 14 1.2.7 Data input and output ................................................................................ 15 1.2.8 Software release ....................................................................................... 15 1.3. D ISCUSSION ...................................................................................................... 17 1.4. R EFERENCES .................................................................................................... 19 ii CHAPTER 2 THE DEVELOPMENT OF ANCHORMS, A BIOINFORMATICS TOOL TO ELUCIDATE THE STRUCTURE OF PROTEIN COMPLEXES BY ANALYSIS OF MASS SPECTRA OF CHEMICALLY CROSS-LINKED COMPLEXES. ................... 31 2.1. I NTRODUCTION .................................................................................................. 31 2.2.W ORKFLOW AND ORGANISATION .......................................................................... 35 2.3. P ARSING AND DATA FORMATS .............................................................................. 40 2.3.1 Data formats commonly used for MS data ............................................... 40 2.3.2 Data file formats supported by AnchorMS ................................................ 41 2.3.3 Parsing of data files in AnchorMS and module organization ..................... 42 2.3.4 Experimental mass spectrum quality ........................................................ 44 2.4. L IBRARY CONSTRUCTION .................................................................................... 44 2.4.1 Overview of library size ............................................................................. 44 2.4.2 Restriction of library size ........................................................................... 45 2.4.3 Implementation and data structures .......................................................... 45 2.5 P RECURSOR LIBRARY CONSTRUCTION (MS 1) ....................................................... 46 2.5.1 Digestion ................................................................................................... 46 2.5.1.1 Protease cleavage model .................................................................. 46 2.5.1.2 Module organisation .......................................................................... 47 2.5.1.3 Parsing and parameters .................................................................... 48 2.5.1.4 Algorithm and workflow ...................................................................... 49 2.5.2 Cross-linking.............................................................................................. 50 2.5.2.1 Cross-linking structural information ................................................... 50 2.5.2.2 Module organization .......................................................................... 52 2.5.2.3 Algorithm and workflow ..................................................................... 52 2.5.2.4 Data structures and parsing .............................................................. 53 2.5.2.5 Cross-linking within library construction ............................................ 54 2.5.3 Modifications ............................................................................................. 54 2.5.3.1 Module organisation .......................................................................... 55 2.5.3.2 Combinatorial space and library size restriction ................................ 56 2.5.3.2.1 Fixed and variable modifications ................................................ 56 2.5.3.2.2 Combinatorial space expansion ................................................. 56 2.5.3.2.3 Combinatorial space constraint .................................................. 56 2.5.3.3 Algorithm and data structures ........................................................... 57 2.5.3.3.1 Possible modifications ............................................................... 57 iii 2.5.3.3.2 Permutations of modification states ........................................... 57 2.5.3.3.3 Molecular weight calculations .................................................... 58 2.5.3.3.4 Modifications within library construction ..................................... 58 2.5.4 Permutations ............................................................................................. 59 2.5.4.1 Numerical representation of permutations ......................................... 59 2.5.4.2 Parameters of get_permutation() ....................................................... 60 2.5.4.2.1 The number of states for each ordered unit ............................... 60 2.5.4.2.2 A maximum sum of state values for the overall system .............. 60 2.5.4.3 Algorithm implemented for calculating permutations ........................ 61 2.5.5 Generating the predicted mass spectrum: mass and charge ................... 63 2.5.5.1 Module organization and function ..................................................... 64 2.5.5.2 Calculating molecular masses: atomMW()......................................... 64 2.5.5.3 Calculating peptide masses: protMW() .............................................. 64 2.5.5.4 Calculating m/z peak values: peak() .................................................. 64 2.5.5.5 Calculating a default maximum charge: MaxChargeDetectable() ..... 65 2.6. I DENTIFICATION OF PUTATIVE DI -PEPTIDES IN THE MS 1 SPECTRUM ......................... 65 2.6.1 Peak matching........................................................................................... 65 2.6.1.1 Determining if two compared peaks match ....................................... 66 2.6.1.2 Restricting the number of peak comparisons .................................... 66 2.6.1.3 Module organisation .........................................................................

A Bioinformatic Tool for Analysing the Structures of Protein Complexes by Means of Mass Spectrometry of Cross-Linked Proteins

Microbial and Chemical Analysis of Non-Saccharomyces Yeasts from Chambourcin Hybrid Grapes for Potential Use in Winemaking

Openchrom: a Cross-Platform Open Source Software for the Mass Spectrometric Analysis of Chromatographic Data Philip Wenig*, Juergen Odermatt

(GC–GC/MS and GC/MS) Workflows

Sardine Roe As a Source of Lipids to Produce Liposomes Marta Guedes, Ana Rita Costa-Pinto, Virgínia Gonçalves, Joana Moreira- Silva, Maria Tiritan, Rui L

Chemspectra: a Web-Based Spectra Editor for Analytical Data

WHAT INFLUENCE WOULD a CLOUD BASED SEMANTIC LABORATORY NOTEBOOK HAVE on the DIGITISATION and MANAGEMENT of SCIENTIFIC RESEARCH? by Samantha Kanza

Impact of Phyllosilicates on Amino Acid Formation Under Asteroidal Conditions V

Author Correction: Complete Biosynthesis of Cannabinoids and Their Unnatural Analogues in Yeast

Processing of Chromatographic Signals How to Separate the Wheat from the Chaff

Using Free/Libre and Open Source Software in the Geological Sciences

An Integrated Approach for Mixture Analysis Using MS and NMR Techniques†

Georgetown University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Chemistry