
A BIOINFORMATIC TOOL FOR ANALYSING THE STRUCTURES OF PROTEIN COMPLEXES BY MEANS OF MASS SPECTROMETRY OF CROSS-LINKED PROTEINS

by Shannon L.N. Mayne

Supervisor: Prof. Hugh-G. Patterton

A Dissertation in fulfilment of a Masters of Science degree in Biochemistry

University of the Free State

2013

DECLARATION

I declare that the dissertation hereby submitted for the Magister Scientiae degree at the University of the Free State through the Faculty of Natural and Agricultural Sciences is my own work and has not been previously submitted by me at another University for any degree. I cede copyright of this dissertation in favour of the University of the Free State.

______________________
Shannon Leon Noël Mayne
January 2013

ACKNOWLEDGEMENTS

Thanks to staff and students at the University of the Free State, in particular: Pankaj Sharma and Gabre Kemp for experimental assistance, as well as Leon du Preez and the UFS ICT Services staff for assistance with the server access and settings.

Special thanks are extended to Professor Hugh Patterton for invaluable input, guidance and indefatigable patience.

Heartfelt gratitude and appreciation go to my family and closest friends for their unstinting support and understanding throughout.

A postgraduate bursary from the former National Bioinformatics Network (NBN) is also gratefully acknowledged.

TABLE OF CONTENTS

DECLARATION
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF EQUATIONS

CHAPTER 1 LITERATURE REVIEW: BIOINFORMATICS TOOLS FOR THE STRUCTURAL ELUCIDATION OF MULTI-SUBUNIT PROTEIN COMPLEXES BY MASS SPECTROMETRIC ANALYSIS OF PROTEIN-PROTEIN CROSS-LINKS.

1.1. INTRODUCTION
1.2. MS3D DATA ANALYSIS
    1.2.1 Detection of cross-linked peptides
        1.2.1.1 Non-cross-linked controls
        1.2.1.2 Isotope labelling
        1.2.1.3 Post-fragmentation reporter ions
    1.2.2 Matching peaks to a library of possible peptide dimers
    1.2.3 Generating the library of peptide dimers
    1.2.4 Matching experimental peaks to the theoretical library
    1.2.5 Identification of the cross-linked residues in the di-peptide
        1.2.5.1 Generating an MS/MS fragment library
        1.2.5.2 Matching MS/MS spectra
        1.2.5.3 Non-probabilistic scoring
        1.2.5.4 Probabilistic scoring
    1.2.6 Structure modelling
    1.2.7 Data input and output
    1.2.8 Software release
1.3. DISCUSSION
1.4. REFERENCES

CHAPTER 2 THE DEVELOPMENT OF ANCHORMS, A BIOINFORMATICS TOOL TO ELUCIDATE THE STRUCTURE OF PROTEIN COMPLEXES BY ANALYSIS OF MASS SPECTRA OF CHEMICALLY CROSS-LINKED COMPLEXES.

2.1. INTRODUCTION
2.2. WORKFLOW AND ORGANISATION
2.3. PARSING AND DATA FORMATS
    2.3.1 Data formats commonly used for MS data
    2.3.2 Data file formats supported by AnchorMS
    2.3.3 Parsing of data files in AnchorMS and module organization
    2.3.4 Experimental mass spectrum quality
2.4. LIBRARY CONSTRUCTION
    2.4.1 Overview of library size
    2.4.2 Restriction of library size
    2.4.3 Implementation and data structures
2.5. PRECURSOR LIBRARY CONSTRUCTION (MS1)
    2.5.1 Digestion
        2.5.1.1 Protease cleavage model
        2.5.1.2 Module organisation
        2.5.1.3 Parsing and parameters
        2.5.1.4 Algorithm and workflow
    2.5.2 Cross-linking
        2.5.2.1 Cross-linking structural information
        2.5.2.2 Module organization
        2.5.2.3 Algorithm and workflow
        2.5.2.4 Data structures and parsing
        2.5.2.5 Cross-linking within library construction
    2.5.3 Modifications
        2.5.3.1 Module organisation
        2.5.3.2 Combinatorial space and library size restriction
            2.5.3.2.1 Fixed and variable modifications
            2.5.3.2.2 Combinatorial space expansion
            2.5.3.2.3 Combinatorial space constraint
        2.5.3.3 Algorithm and data structures
            2.5.3.3.1 Possible modifications
            2.5.3.3.2 Permutations of modification states
            2.5.3.3.3 Molecular weight calculations
            2.5.3.3.4 Modifications within library construction
    2.5.4 Permutations
        2.5.4.1 Numerical representation of permutations
        2.5.4.2 Parameters of get_permutation()
            2.5.4.2.1 The number of states for each ordered unit
            2.5.4.2.2 A maximum sum of state values for the overall system
        2.5.4.3 Algorithm implemented for calculating permutations
    2.5.5 Generating the predicted mass spectrum: mass and charge
        2.5.5.1 Module organization and function
        2.5.5.2 Calculating molecular masses: atomMW()
        2.5.5.3 Calculating peptide masses: protMW()
        2.5.5.4 Calculating m/z peak values: peak()
        2.5.5.5 Calculating a default maximum charge: MaxChargeDetectable()
2.6. IDENTIFICATION OF PUTATIVE DI-PEPTIDES IN THE MS1 SPECTRUM
    2.6.1 Peak matching
        2.6.1.1 Determining if two compared peaks match
        2.6.1.2 Restricting the number of peak comparisons
        2.6.1.3 Module organisation