Bioinformatic Solutions to Complex Problems in Mass Spectrometry Based Analysis of Biomolecules
Total Page:16
File Type:pdf, Size:1020Kb
Brigham Young University BYU ScholarsArchive Theses and Dissertations 2014-07-01 Bioinformatic Solutions to Complex Problems in Mass Spectrometry Based Analysis of Biomolecules Ryan M. Taylor Brigham Young University - Provo Follow this and additional works at: https://scholarsarchive.byu.edu/etd Part of the Biochemistry Commons, and the Chemistry Commons BYU ScholarsArchive Citation Taylor, Ryan M., "Bioinformatic Solutions to Complex Problems in Mass Spectrometry Based Analysis of Biomolecules" (2014). Theses and Dissertations. 5585. https://scholarsarchive.byu.edu/etd/5585 This Dissertation is brought to you for free and open access by BYU ScholarsArchive. It has been accepted for inclusion in Theses and Dissertations by an authorized administrator of BYU ScholarsArchive. For more information, please contact [email protected], [email protected]. Bioinformatic Solutions to Complex Problems in Mass Spectrometry Based Analysis of Biomolecules Ryan M. Taylor A dissertation submitted to the faculty of Brigham Young University in partial fulfillment of the requirements for the degree of Doctor of Philosophy John T. Prince, Chair Mark Clement Steven W. Graves Barry M. Willardson Dixon J. Woodbury Department of Chemistry and Biochemistry Brigham Young University July 2014 Copyright © 2014 Ryan M. Taylor All Rights Reserved ABSTRACT Bioinformatic Solutions to Complex Problems in Mass Spectrometry Based Analysis of Biomolecules Ryan M. Taylor Department of Chemistry and Biochemistry, BYU Doctor of Philosophy Biological research has benefitted greatly from the advent of omic methods. For many biomolecules, mass spectrometry (MS) methods are most widely employed due to the sensitivity which allows low quantities of sample and the speed which allows analysis of complex samples. Improvements in instrument and sample preparation techniques create opportunities for large scale experimentation. The complexity and volume of data produced by modern MS-omic instrumentation challenges biological interpretation, while the complexity of the instrumentation, sample noise, and complexity of data analysis present difficulties in maintaining and ensuring data quality, validity, and relevance. We present a corpus of tools which improves quality assurance capabilities of instruments, provides comparison abilities for evaluating data analysis tool performance, distills ideas pertinent in MS analysis into a consistent nomenclature, enhances all lipid analysis by automatic structural classification, implements a rigorous and chemically derived lipid fragmentation prediction tool, introduces custom structural analysis approaches and validation techniques, simplifies protein analysis form SDS-PAGE sample excisions, and implements a robust peak detection algorithm. These contributions provide improved identification of biomolecules, improved quantitation, and improve data quality and algorithm clarity to the MS-omic field. Keywords: bioinformatics, quality assurance, mass spectrometry, lipidomics, proteomics, machine learning, lipid fragmentation, simulated dataset ACKNOWLEDGEMENTS To my parents for instilling a thirst for knowledge, my collaborators for sharing their knowledge, experience and judgement, my wife for all her support, my coworkers for reminding me why I enjoy research, my mentors for their guidance and counsel, particularly John who has always valued my growth first and foremost, both intellectually and spiritually, and my Heavenly Father who has never given up on me. TABLE OF CONTENTS Bioinformatic solutions to complex problems in mass spectrometry based analysis of biomolecules .......................................................................................................................................... i ABSTRACT ..........................................................................................................................................ii ACKNOWLEDGEMENTS ............................................................................................................... iii TABLE OF CONTENTS .................................................................................................................... iv LIST OF TABLES............................................................................................................................ viii LIST OF FIGURES ............................................................................................................................. ix Chapter 1 Introduction .................................................................................................................... 1 1.1 Biomolecules make life possible .......................................................................................... 1 1.2 The rise of ‘omics .................................................................................................................. 1 1.3 Mass Spectrometry enables analysis of biomolecules ........................................................ 2 1.4 State of Proteomics ................................................................................................................ 4 1.5 State of Lipidomics ............................................................................................................... 4 1.6 Data analysis is the major bottleneck ................................................................................... 5 Chapter 2 Improving nomenclature to uniquely map molecular entities to mass spectrometry signal ......................................................................................................................................... 9 2.1 Abstract .................................................................................................................................. 9 2.2 Background .......................................................................................................................... 10 2.3 Results and discussion......................................................................................................... 25 iv 2.4 Expected Objections ............................................................................................................ 33 2.5 Conclusion ........................................................................................................................... 35 Chapter 3 Sequence and Structural Characterization of Great Salt Lake Bacteriophage CW02, a Member of the T7-Like Supergroup ............................................................................................... 36 3.1 Abstract ................................................................................................................................ 36 3.2 Introduction .......................................................................................................................... 37 3.3 Materials and methods ........................................................................................................ 38 3.4 Results and discussion......................................................................................................... 56 Chapter 4 Resolving double disulfide bond patterns in SNAP25B using liquid chromatography–ion trap mass spectrometry ................................................................................... 72 4.1 Abstract ................................................................................................................................ 72 4.2 Introduction .......................................................................................................................... 73 4.3 Materials and method .......................................................................................................... 76 4.4 Results .................................................................................................................................. 80 4.5 Discussion and conclusions ................................................................................................ 91 4.6 Acknowledgements ............................................................................................................. 94 Chapter 5 Automatic lipid classification by machine learning .................................................. 95 5.1 Abstract ................................................................................................................................ 95 5.2 Introduction .......................................................................................................................... 96 5.3 Methods ................................................................................................................................ 97 v 5.4 Results ................................................................................................................................ 100 5.5 Discussion .......................................................................................................................... 102 Chapter 6 Msplinter: a structure driven approach to lipid fragmentation prediction.............. 105 6.1 Abstract .............................................................................................................................. 105 6.2 Introduction ........................................................................................................................ 106 6.3 Methods .............................................................................................................................. 108 6.4 Results ...............................................................................................................................